DeSR Dependency Parser
#include <Corpus.h>
Public Member Functions
Corpus (Language const &lang)
Corpus (Language const &lang, CorpusFormat &format)
    Create from specified CorpusFormat.
Corpus (Language const &lang, char const *formatFile)
    Read the corpus format from file formatFile.
AttributeId attributeId (const char *name)
virtual SentenceReader * sentenceReader (std::istream *is)
virtual void print (std::ostream &os, Sentence const &sent) const
    Print the sentence in the standard format for the corpus.
Static Public Member Functions
static Corpus * create (Language const &language, char const *inputFormat)
    Factory pattern for creating a Corpus based on the provided format.
static Corpus * create (char const *language, char const *inputFormat)
static CorpusFormat * parseFormat (char const *formatFile)
    Read the corpus format from file formatFile.
Public Attributes
Language const & language
AttributeIndex index
TokenFields tokenFields
Static Protected Member Functions
static CorpusFormat * parseFormat (std::istream &is)
Definition at line 98 of file Corpus.h.
Tanl::Corpus::Corpus (Language const & lang) [inline]
Tanl::Corpus::Corpus (Language const & lang, CorpusFormat & format) [inline]
Create from specified CorpusFormat.
Parameters:
    lang    the default language for sentences in the corpus.
Tanl::Corpus::Corpus (Language const & lang, char const * formatFile)
Read the corpus format from file formatFile.
Parameters:
    lang    the default language for sentences in the corpus.
Definition at line 41 of file Corpus.cpp.
References Tanl::CorpusFormat::index, parseFormat(), and Tanl::CorpusFormat::tokenFields.
AttributeId Tanl::Corpus::attributeId (const char * name) [inline]
static Corpus * Tanl::Corpus::create (Language const & language, char const * inputFormat) [static]
Factory pattern for creating a Corpus based on the provided format.
Parameters:
    lang           the default language for sentences in the corpus.
    inputFormat    either the name of a builtin format (CoNLL, conll08, DgaXML, Text or TokenizedText) or the name of a file containing the specification of the format.
Definition at line 53 of file Corpus.cpp.
References Corpus(), Tanl::CorpusFormat::name, and parseFormat().
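The dispatch described for create() (a builtin format name versus the name of a format file) can be sketched as follows. This is only an illustration under stated assumptions, not the code in Corpus.cpp: SketchCorpus, isBuiltinFormat and createSketchCorpus are hypothetical stand-ins, the name matching is assumed to be case-sensitive, and the sketch returns a std::unique_ptr whereas the documented create() returns a raw Corpus pointer.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>

// Hypothetical stand-in for a corpus object; NOT the Tanl::Corpus class.
struct SketchCorpus {
    std::string language;     // default language for sentences
    std::string format;       // builtin format name or format file name
    bool fromFile = false;    // true when 'format' was treated as a file name
};

// Builtin format names as listed in the parameter description above.
// Assumption: names are compared case-sensitively.
inline bool isBuiltinFormat(const std::string& name) {
    static const std::set<std::string> builtin = {
        "CoNLL", "conll08", "DgaXML", "Text", "TokenizedText"};
    return builtin.count(name) != 0;
}

// Factory sketch: a builtin name selects a known format; any other value
// is treated as the name of a file holding the format specification.
// A real implementation would parse that file; the sketch only records it.
inline std::unique_ptr<SketchCorpus> createSketchCorpus(
        const std::string& language, const std::string& inputFormat) {
    auto corpus = std::make_unique<SketchCorpus>();
    corpus->language = language;
    corpus->format = inputFormat;
    corpus->fromFile = !isBuiltinFormat(inputFormat);
    return corpus;
}
```

Calling createSketchCorpus("en", "CoNLL") takes the builtin branch, while createSketchCorpus("en", "my.format") takes the format-file branch; both the language code and the file name here are made up for illustration.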
CorpusFormat * Tanl::Corpus::parseFormat (char const * formatFile) [static]
Read the corpus format from file formatFile.
Definition at line 74 of file Corpus.cpp.
virtual SentenceReader* Tanl::Corpus::sentenceReader (std::istream * is) [virtual]
Reimplemented in Tanl::TextCorpus and Tanl::TokenizedTextCorpus.
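Because sentenceReader() is virtual, a subclass can return a reader suited to its own input layout. The sketch below illustrates that mechanism with hypothetical stand-in types (SketchReader, SketchLineReader, SketchCorpusBase, SketchLineCorpus); none of them are Tanl classes, and the two input layouts (one token per line, and one sentence per line) are assumptions chosen for illustration, not the documented behavior of Tanl::TextCorpus or Tanl::TokenizedTextCorpus.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

using SketchSentence = std::vector<std::string>;

// Hypothetical base reader: one token per line, a blank line ends a sentence.
struct SketchReader {
    explicit SketchReader(std::istream* is) : in_(is) {}
    virtual ~SketchReader() = default;
    // Fills 'sent' with the next sentence; returns false at end of input.
    virtual bool next(SketchSentence& sent) {
        sent.clear();
        std::string line;
        while (std::getline(*in_, line)) {
            if (line.empty()) {
                if (!sent.empty()) return true;
                continue;  // skip leading or repeated blank lines
            }
            sent.push_back(line);
        }
        return !sent.empty();
    }
protected:
    std::istream* in_;
};

// Hypothetical alternative reader: one sentence per line,
// tokens separated by whitespace.
struct SketchLineReader : SketchReader {
    using SketchReader::SketchReader;
    bool next(SketchSentence& sent) override {
        sent.clear();
        std::string line;
        while (std::getline(*in_, line)) {
            std::istringstream words(line);
            std::string w;
            while (words >> w) sent.push_back(w);
            if (!sent.empty()) return true;  // empty lines are skipped
        }
        return false;
    }
};

// Stand-in for the corpus base class with a virtual reader factory.
struct SketchCorpusBase {
    virtual ~SketchCorpusBase() = default;
    virtual std::unique_ptr<SketchReader> sentenceReader(std::istream* is) {
        return std::make_unique<SketchReader>(is);
    }
};

// Stand-in subclass that reimplements sentenceReader().
struct SketchLineCorpus : SketchCorpusBase {
    std::unique_ptr<SketchReader> sentenceReader(std::istream* is) override {
        return std::make_unique<SketchLineReader>(is);
    }
};
```

Code that holds only a SketchCorpusBase reference still gets the subclass reader, which is the point of making the factory method virtual.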