DeSR Dependency Parser

Tanl::Corpus Class Reference

Represents common aspects of a Corpus.

#include <Corpus.h>

Public Member Functions

 Corpus (Language const &lang)
 Corpus (Language const &lang, CorpusFormat &format)
 Create from specified CorpusFormat.
 Corpus (Language const &lang, char const *formatFile)
 Read the corpus format from file formatFile.
AttributeId attributeId (const char *name)
virtual SentenceReader * sentenceReader (std::istream *is)
virtual void print (std::ostream &os, Sentence const &sent) const
 Print the sentence in the standard format for the corpus.

Static Public Member Functions

static Corpus * create (Language const &language, char const *inputFormat)
 Factory pattern for creating a Corpus based on the provided format.
static Corpus * create (char const *language, char const *inputFormat)
static CorpusFormat * parseFormat (char const *formatFile)
 Read the corpus format from file formatFile.

Public Attributes

Language const & language
AttributeIndex index
TokenFields tokenFields

Static Protected Member Functions

static CorpusFormat * parseFormat (std::istream &is)


Detailed Description

Represents common aspects of a Corpus.

Definition at line 98 of file Corpus.h.


Constructor & Destructor Documentation

Tanl::Corpus::Corpus ( Language const &  lang  )  [inline]

Parameters:
lang the default language for sentences in the corpus.

Definition at line 108 of file Corpus.h.

Referenced by create().

Tanl::Corpus::Corpus ( Language const &  lang,
CorpusFormat &  format 
) [inline]

Create from specified CorpusFormat.

Parameters:
lang the default language for sentences in the corpus.

Definition at line 116 of file Corpus.h.

Tanl::Corpus::Corpus ( Language const &  lang,
char const *  formatFile 
)

Read the corpus format from file formatFile.

Parameters:
lang the default language for sentences in the corpus.

Definition at line 41 of file Corpus.cpp.

References Tanl::CorpusFormat::index, parseFormat(), and Tanl::CorpusFormat::tokenFields.


Member Function Documentation

AttributeId Tanl::Corpus::attributeId ( const char *  name  )  [inline]

Parameters:
name the name of the attribute to look up.

Returns:
the index (into the vector of values for tokens) of the attribute with the given name.

Definition at line 151 of file Corpus.h.
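
Example (illustrative sketch, not the Tanl implementation): the description above says that attributeId() maps an attribute name to a position in each token's vector of values. The block below shows that idea with stand-in types. AttributeIndex::insert, AttributeIndex::find, Token, attributeValue and the -1 "unknown" convention are all assumptions made for illustration; only the name-to-index idea comes from this page.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

namespace sketch {

// Stand-in for Tanl's AttributeId; the real typedef may differ.
using AttributeId = int;

// Hypothetical index: maps attribute names to positions in a token's values.
class AttributeIndex {
public:
  // Register a name (if new) and return its position.
  AttributeId insert(const std::string& name) {
    auto it = ids_.find(name);
    if (it != ids_.end()) return it->second;
    AttributeId id = static_cast<AttributeId>(ids_.size());
    ids_.emplace(name, id);
    return id;
  }

  // Return the position of a name, or -1 when unknown (assumed convention).
  AttributeId find(const std::string& name) const {
    auto it = ids_.find(name);
    return it == ids_.end() ? -1 : it->second;
  }

private:
  std::unordered_map<std::string, AttributeId> ids_;
};

// Hypothetical token: one string value per attribute, addressed by AttributeId.
struct Token {
  std::vector<std::string> values;
};

// Look up an attribute value by name; nullptr if the name or slot is missing.
inline const std::string* attributeValue(const Token& tok,
                                         const AttributeIndex& index,
                                         const std::string& name) {
  AttributeId id = index.find(name);
  if (id < 0 || static_cast<std::size_t>(id) >= tok.values.size())
    return nullptr;
  return &tok.values[static_cast<std::size_t>(id)];
}

}  // namespace sketch
```

Resolving the name once and reusing the integer id avoids a string lookup per token, which is presumably why the class exposes an id rather than a by-name accessor.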

Corpus * Tanl::Corpus::create ( Language const &  language,
char const *  inputFormat 
) [static]

Factory pattern for creating a Corpus based on the provided format.

Parameters:
language the default language for sentences in the corpus.
inputFormat either the name of a builtin format (one of CoNLL, conll08, DgaXML, Text, TokenizedText) or the name of a file containing the specification of the format.

Definition at line 53 of file Corpus.cpp.

References Corpus(), Tanl::CorpusFormat::name, and parseFormat().
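
Example (illustrative sketch, not the code in Corpus.cpp): inputFormat is documented as either a builtin format name or the path of a format file. The block below shows one way a caller could tell the two cases apart before calling create(). The helper names isBuiltinFormat, classify and FormatSource are hypothetical, and exact, case-sensitive matching is an assumption; the actual factory may resolve names differently.

```cpp
#include <cassert>
#include <cstring>

namespace sketch {

// Builtin format names exactly as listed in the documentation of create().
static const char* const kBuiltinFormats[] = {
  "CoNLL", "conll08", "DgaXML", "Text", "TokenizedText"
};

// Assumption: names are compared exactly (case-sensitively).
inline bool isBuiltinFormat(const char* inputFormat) {
  if (!inputFormat) return false;
  for (const char* name : kBuiltinFormats)
    if (std::strcmp(name, inputFormat) == 0) return true;
  return false;
}

// Where the format description is expected to come from.
enum class FormatSource { Builtin, FormatFile };

// Anything that is not a builtin name is treated as a format file path.
inline FormatSource classify(const char* inputFormat) {
  return isBuiltinFormat(inputFormat) ? FormatSource::Builtin
                                      : FormatSource::FormatFile;
}

}  // namespace sketch
```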

CorpusFormat * Tanl::Corpus::parseFormat ( char const *  formatFile  )  [static]

Read the corpus format from file formatFile.

Returns:
created format or 0 if reading failed.

Definition at line 74 of file Corpus.cpp.

Referenced by Corpus(), and create().
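
Example (illustrative sketch): parseFormat() is documented to return 0 when reading fails, so callers need to test the result before using it. The block below mirrors that contract with a stand-in CorpusFormat. Two things are assumptions, not the library's behaviour: std::unique_ptr is used here in place of the raw pointer the real function returns, and the placeholder parser simply takes the first whitespace-delimited token as the format name, because the real format-file syntax is not described on this page.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <memory>
#include <sstream>
#include <string>

namespace sketch {

// Stand-in for Tanl::CorpusFormat; only the `name` member is taken from the docs.
struct CorpusFormat {
  std::string name;
};

// Placeholder parser (assumed syntax): first token of the stream is the name.
// Returns null when nothing can be read.
inline std::unique_ptr<CorpusFormat> parseFormat(std::istream& is) {
  std::string name;
  if (!(is >> name)) return nullptr;  // empty or unreadable stream
  std::unique_ptr<CorpusFormat> format(new CorpusFormat());
  format->name = name;
  return format;
}

// Mirrors the documented contract: a null result means reading failed.
inline std::unique_ptr<CorpusFormat> parseFormat(const char* formatFile) {
  if (!formatFile) return nullptr;
  std::ifstream in(formatFile);
  if (!in) return nullptr;            // file missing or not readable
  return parseFormat(in);
}

}  // namespace sketch
```

With the real API the same check applies to the raw pointer: compare the result against 0 before dereferencing it.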

virtual SentenceReader* Tanl::Corpus::sentenceReader ( std::istream *  is  )  [virtual]

Returns:
an appropriate reader for reading sentences of the corpus from the given input stream.

Reimplemented in Tanl::TextCorpus, and Tanl::TokenizedTextCorpus.
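
Example (illustrative sketch): sentenceReader() is virtual so that each corpus type can supply its own reader for the same stream interface. The block below shows the pattern with stand-in classes. LineCorpus, LineReader, the next() method and the "one sentence per line, tokens split on whitespace" behaviour are hypothetical and do not describe Tanl::TextCorpus or Tanl::TokenizedTextCorpus; the sketch also returns std::unique_ptr and makes the base method pure virtual, whereas the real method returns a raw SentenceReader *.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

namespace sketch {

// Stand-in sentence: just a list of token strings.
using Sentence = std::vector<std::string>;

// Stand-in reader interface (hypothetical; not Tanl::SentenceReader).
struct SentenceReader {
  explicit SentenceReader(std::istream* is) : is_(is) {}
  virtual ~SentenceReader() = default;
  // Read the next sentence into `out`; return false at end of input.
  virtual bool next(Sentence& out) = 0;
protected:
  std::istream* is_;  // not owned
};

// Assumed behaviour: one sentence per line, tokens separated by whitespace.
struct LineReader : SentenceReader {
  using SentenceReader::SentenceReader;
  bool next(Sentence& out) override {
    std::string line;
    if (!std::getline(*is_, line)) return false;
    out.clear();
    std::istringstream tokens(line);
    for (std::string tok; tokens >> tok;) out.push_back(tok);
    return true;
  }
};

// Base corpus: each subclass chooses the reader that matches its format.
struct Corpus {
  virtual ~Corpus() = default;
  virtual std::unique_ptr<SentenceReader> sentenceReader(std::istream* is) = 0;
};

// Hypothetical subclass that reads line-oriented, pre-tokenized text.
struct LineCorpus : Corpus {
  std::unique_ptr<SentenceReader> sentenceReader(std::istream* is) override {
    return std::unique_ptr<SentenceReader>(new LineReader(is));
  }
};

}  // namespace sketch
```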


The documentation for this class was generated from the following files:
 Corpus.h
 Corpus.cpp
Copyright © 2005-2007 G. Attardi. Generated on 13 Aug 2009 by doxygen 1.5.7.1.