DeSR Dependency Parser |
Namespaces | |
namespace | io |
Platform independent IO. | |
Classes | |
struct | Options |
Options describes a set of command-line options. More... | |
class | OptionStream |
Given the traditional argc and argv for command-line arguments, extract options from them following the stream model. More... | |
struct | FileAction |
Perform action on each file in directory tree. More... | |
class | Configuration |
A Configuration object that holds all the configuration variables. More... | |
class | Var |
Configuration variable. More... | |
class | VarDefault |
Configuration variable with default value. More... | |
class | conf |
class | conf< bool > |
A conf<bool> is a Var for containing the value of a Boolean configuration variable. More... | |
class | conf< Dictionary > |
class | conf< float > |
A conf<float> is a Var for containing the value of a float configuration variable. More... | |
class | conf< int > |
A conf<int> is a Var for containing the value of an integer configuration variable. More... | |
class | conf< PatternSet > |
A conf_PatternSet contains a list of shell wildcard patterns. More... | |
class | conf_set |
A conf_set contains a set of configuration values. More... | |
class | conf< std::string > |
A conf<string> is a configuration variable containing a string value. More... | |
class | conf< std::vector< std::string > > |
A conf_vector contains a set of configuration values. More... | |
class | ExcludeFile |
An ExcludeFile contains the set of filename patterns to exclude during either indexing or extraction. More... | |
class | FileType |
A FileType maps a filename pattern to a file type. More... | |
class | IncludeFile |
An IncludeFile contains the set of filename patterns to include during either indexing or extraction. More... | |
class | PatternList |
A PatternList contains a list of shell wildcard patterns. More... | |
class | PatternMap |
A PatternMap maps a shell wildcard pattern to an object of type T. More... | |
class | PatternSet |
A PatternSet contains a set of shell wildcard patterns. More... | |
class | Error |
Base class for all errors reported. More... | |
class | LogicError |
Base class for errors due to programming errors. More... | |
class | RuntimeError |
Base class for errors due to run time problems. More... | |
class | AssertionError |
Thrown if an internal consistency check fails. More... | |
class | UnimplementedError |
Thrown when an attempt to use an unimplemented feature is made. More... | |
class | InvalidArgumentError |
Thrown when an invalid argument is supplied to the API. More... | |
class | ConfigFileError |
Thrown when reading a configuration file fails. More... | |
class | FileError |
Thrown when opening a file fails. More... | |
class | MmapError |
Thrown when mmap fails mapping a file to memory. More... | |
class | FormatError |
Wrong index format file. More... | |
class | DocNotFoundError |
Thrown when an attempt is made to access a document which is not in the collection. More... | |
class | InternalError |
Thrown when an internal inconsistency occurs. More... | |
class | IndexingError |
Thrown during indexing. More... | |
class | RangeError |
Thrown when an element is out of range. More... | |
class | ReaderError |
Thrown when reader fails interpreting document format. More... | |
class | CollectionError |
Thrown for miscellaneous collection errors. More... | |
class | NetworkError |
Thrown when there is a communications problem with a remote collection. More... | |
class | MemoryError |
Thrown when there is a communications problem with a remote collection. More... | |
class | OpeningError |
Thrown when opening a collection fails. More... | |
class | TableError |
Thrown when accessing a database Table fails. More... | |
class | ParserError |
class | QueryError |
Thrown when an SQL query fails. More... | |
class | InvalidResultError |
Thrown when trying to access invalid data. More... | |
class | SystemError |
Thrown when a system call fails. More... | |
class | IOError |
class | Set |
A Set is a set but with the addition of a contains() member function, one that returns a simpler bool result indicating whether a given element is in the set. More... | |
class | Set< char const * > |
Specialize Set for C-style strings so as not to have a reference (implemented as a pointer) to a char const*. More... | |
class | Timer |
Typedefs | |
typedef std::map< char const *, char const * > | Dictionary |
A conf_dictionary contains a dictionary. | |
typedef conf_set< std::string > | conf_stringset |
A conf_set contains a set of configuration string values. | |
typedef FileType | MimeType |
A MimeType maps a mime type to a document reader type. | |
typedef char | char_t |
typedef unsigned | DocID |
DocID is a numeric ID for documents in IXE collections. | |
typedef unsigned short | HitPosition |
Word position in document. | |
typedef unsigned short | Occurrences |
Number of occurrences of a word in a document. | |
typedef short | TermColor |
TermColor is a numeric ID of a 'color' attribute for the word. | |
typedef unsigned | TermID |
typedef unsigned | Count |
typedef unsigned | Size |
typedef unsigned char | byte |
typedef Set< char const * > | chars_set |
Functions | |
OptionStream & | operator>> (OptionStream &os, OptionStream::Option &o) |
Parse and extract an option from an option stream (argv values). | |
bool | is_indexable_file (char const *path) |
void | mapDir (char const *pathname, FileAction &action, bool recurse_subdirectories, bool follow_symbolic_links, int verbosity) |
Perform action on each file in directory tree. | |
void | showTime (char const *msg, struct::timeval &l0, struct::timeval &l1, ostream &out) |
Display the duration of an interval in sec, microseconds. | |
int | url_decode (char *dest, char const *src, int len) |
Decodes any %## encoding in the given string. | |
char * | url_encode (char const *s) |
Returns a string in which all non-alphanumeric characters except "-_.!~*'()," have been replaced with a percent (%) sign followed by two hex digits. | |
int | url_encode (char *dst, char const *s) |
void | reverseURLdomain (char *revDomain, char const *url, Size len) |
Return the URL's site in reverse. | |
void | unreverseURLdomain (char *domain, char const *revDomain) |
Size | availableMemory () |
Detect the available memory. | |
void | cgi_parse (map< char const *, char const * > &keyMap, char *qstart) |
Parse a cgi query into a map of key values. | |
bool | file_empty (char const *path) |
Set the limit for the given resource to its maximum value. | |
off_t | file_size (char const *path) |
bool | file_exists (char const *path) |
bool | file_exists (std::string const &path) |
bool | is_directory (char const *path) |
bool | is_directory (std::string const &path) |
bool | is_plain_file (char const *path) |
bool | is_plain_file (std::string const &path) |
bool | is_symbolic_link (char const *path) |
bool | is_symbolic_link (std::string const &path) |
void | showTime (char const *msg, Timeval &l0, Timeval &l1, std::ostream &out=std::cerr) |
void | cgi_parse (std::map< char const *, char const * > &keyMap, char *qstart) |
char const * | next_token_line (char const *&start, const char *sep, char esc= '\\') |
Simple string tokenizer, which returns the next token within a line. | |
Variables | |
struct stat | stat_buf |
conf< bool > | VerboseConfig ("VerboseConfig", false) |
FileType | fileTypes |
MimeType | mimeTypes |
int const | WordMaxSize = 25 |
int const | WordMinSize = 2 |
char const | ConfigFileDefault [] = "ixe.conf" |
char const | TableNameDefault [] = "INDEX/docinfo" |
int const | FilesGrowDefault = 100 |
char const | IndexExt [] = ".fti" |
char const | PostingExt [] = ".pst" |
char const | TableExt [] = ".bdb" |
char const | ContentsExt [] = ".gz" |
int const | ResultsMaxDefault = 10 |
char const | TempDirectoryDefault [] = "/tmp" |
int const | WordPercentMaxDefault = 100 |
int const | Word_Threshold = 60000 |
int const | max_columns = 64 |
int const | max_prefix_lists = 5000 |
int const | Max_CursorAll_Hits = 20 |
int const | Postings_Segment_Size = 1024 |
int const | Min_Postings_Table = 4096 |
DocID const | noDocID = 0 |
HitPosition const | noPosition = 0 |
HitPosition const | maxPosition = (HitPosition)-1 |
Occurrences const | maxOccurrences = (Occurrences)-1 |
TermColor const | noColor = -1 |
TermColor const | color_Not_Found = -2 |
int const | num_bigrams = 256*256 + 1 |
char const | version [] = "1.6" |
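The sentinel constants maxPosition and maxOccurrences are written as casts of -1 to an unsigned type. A minimal sketch of why that yields the largest representable value, restating the typedefs locally instead of including the IXE headers:

```cpp
#include <limits>

// Local restatements of the typedefs listed above; this sketch does not
// include the actual IXE headers.
typedef unsigned short HitPosition;
typedef unsigned short Occurrences;

HitPosition const maxPosition = (HitPosition)-1;
Occurrences const maxOccurrences = (Occurrences)-1;

// Converting -1 to an unsigned type wraps modulo 2^N, so the result is the
// maximum value of that type regardless of its width.
static_assert(maxPosition == std::numeric_limits<HitPosition>::max(),
              "(HitPosition)-1 is the largest HitPosition");
static_assert(maxOccurrences == std::numeric_limits<Occurrences>::max(),
              "(Occurrences)-1 is the largest Occurrences");
```

Because the checks are static_assert declarations, a mismatch would be reported at compile time.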
All the IXE classes are declared in namespace IXE.
The IXE Toolkit is a set of modular C++ classes and utilities for indexing and querying documents.
typedef unsigned IXE::DocID |
typedef unsigned short IXE::HitPosition |
typedef short IXE::TermColor |
bool IXE::file_empty | ( | char const * | path | ) | [inline] |
Set the limit for the given resource to its maximum value.
resource | The ID for the resource as given in sys/resources.h. |
This can't be an ordinary function since the type "resource" isn't int on some systems.
SEE ALSO
W. Richard Stevens. "Advanced Programming in the Unix Environment," Addison-Wesley, Reading, MA, 1993. pp. 180-184. File test functions.
char const * IXE::next_token_line | ( | char const *& | start, | |
const char * | sep, | |||
char | esc = '\\' | |||
) |
Simple string tokenizer, which returns the next token within a line.
A token is a sequence of characters delimited by characters in sep, except if preceded by esc.
sep | sequence of delimiting characters |
ptr | Advances ptr to the end of the token. |
esc | is an escape character for line continuation |
Definition at line 175 of file conf.cpp.
Referenced by Parser::conf_feature::parseValue(), IXE::FileType::parseValue(), IXE::conf< std::vector< std::string > >::parseValue(), IXE::conf_set< T >::parseValue(), IXE::conf< int >::parseValue(), IXE::conf< float >::parseValue(), and IXE::conf< bool >::parseValue().
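The tokenizing rule described above (separators end a token unless preceded by the escape character) can be sketched as follows. This is not the IXE implementation: nextToken is a hypothetical helper that returns a std::string rather than a char const*, and treating the escape as "copy the next character literally" is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper, not IXE::next_token_line: returns the next token in
// the C string at `cursor` and advances `cursor` to the end of that token.
// An empty result means no tokens remain.
std::string nextToken(char const*& cursor, char const* sep, char esc = '\\')
{
    assert(cursor != nullptr && sep != nullptr);

    // Skip leading separator characters.
    while (*cursor && std::strchr(sep, *cursor))
        ++cursor;

    std::string token;
    while (*cursor) {
        if (*cursor == esc && cursor[1]) {
            // Assumption: an escaped character is copied literally,
            // even when it is one of the separators.
            token += cursor[1];
            cursor += 2;
        } else if (std::strchr(sep, *cursor)) {
            break;  // an unescaped separator ends the token
        } else {
            token += *cursor++;
        }
    }
    return token;
}
```

Calling nextToken repeatedly with the same cursor walks through the line one token at a time, in the same spirit as the by-reference start parameter in the signature above.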
OptionStream& IXE::operator>> | ( | OptionStream & | os, | |
OptionStream::Option & | o | |||
) |
Parse and extract an option from an option stream (argv values).
Options begin with either a '-' for short options or a "--" for long options. Either a '-' or "--" by itself explicitly ends the options; however, the difference is that '-' is returned as the first non-option whereas "--" is skipped entirely.
When there are no more options, the OptionStream converts to bool as false. The OptionStream's shift() member is the number of options parsed, which the caller can use to adjust argc and argv.
Short options can take an argument either as the remaining characters of the same argv or in the next argv (unless the next argv looks like an option by beginning with a '-').
Long option names can be abbreviated so long as the abbreviation is unambiguous. Long options can take an argument either directly after a '=' in the same argv or in the next argv (but without an '='), unless the next argv looks like an option by beginning with a '-'.
os | The OptionStream to extract options from | |
o | The option to deposit into. |
Definition at line 120 of file OptionStream.cpp.
References IXE::OptionStream::argc, IXE::OptionStream::argv, IXE::OptionStream::end, IXE::OptionStream::err, IXE::OptionStream::index, IXE::OptionStream::next_c, and IXE::OptionStream::specs.
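The classification rules described for operator>> can be illustrated with a small standalone sketch. It does not use the OptionStream class: ArgKind, ParsedArg, classifyArg and resolveLongName are hypothetical names, and two behaviors are assumptions (the characters after a short option letter are always treated as its argument, and an exact long-name match wins over a longer name sharing the same prefix).

```cpp
#include <string>
#include <vector>

enum class ArgKind { ShortOption, LongOption, EndKeep, EndSkip, NonOption };

struct ParsedArg {
    ArgKind kind;
    std::string name;   // option name without leading dashes
    std::string value;  // inline argument, or the argument itself for NonOption
};

// Classify a single argv entry following the rules described above.
ParsedArg classifyArg(std::string const& arg)
{
    if (arg == "-")  return { ArgKind::EndKeep, "", "" };  // ends options, kept as non-option
    if (arg == "--") return { ArgKind::EndSkip, "", "" };  // ends options, skipped
    if (arg.compare(0, 2, "--") == 0) {
        std::string::size_type eq = arg.find('=', 2);
        if (eq == std::string::npos)
            return { ArgKind::LongOption, arg.substr(2), "" };
        return { ArgKind::LongOption, arg.substr(2, eq - 2), arg.substr(eq + 1) };
    }
    if (arg.size() > 1 && arg[0] == '-')
        return { ArgKind::ShortOption, arg.substr(1, 1), arg.substr(2) };
    return { ArgKind::NonOption, "", arg };
}

// Resolve a possibly abbreviated long option name. Returns the unique name
// that starts with `abbrev`, or an empty string if none or several match.
std::string resolveLongName(std::vector<std::string> const& names,
                            std::string const& abbrev)
{
    std::string match;
    bool ambiguous = false;
    for (std::string const& n : names) {
        if (n == abbrev)
            return n;  // assumption: an exact match is never ambiguous
        if (n.compare(0, abbrev.size(), abbrev) == 0) {
            if (!match.empty())
                ambiguous = true;
            match = n;
        }
    }
    return ambiguous ? std::string() : match;
}
```

For instance, with long names "verbose", "version" and "help", the abbreviation "verb" resolves to "verbose" while "ver" is rejected as ambiguous.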
void IXE::reverseURLdomain | ( | char * | revDomain, | |
char const * | url, | |||
Size | len | |||
) |
void IXE::showTime | ( | char const * | msg, | |
struct::timeval & | l0, | |||
struct::timeval & | l1, | |||
ostream & | out | |||
) |
int IXE::url_decode | ( | char * | dest, | |
char const * | src, | |||
int | len | |||
) |
Decodes any %## encoding in the given string.
Definition at line 177 of file util.cpp.
Referenced by cgi_parse().
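Decoding of %## sequences can be sketched as below. This is not the code in util.cpp: percentDecode and hexValue are hypothetical names, the sketch works on std::string instead of the dest/src/len buffers of IXE::url_decode, and leaving a malformed % sequence unchanged is an assumption.

```cpp
#include <string>

// Value of a hexadecimal digit, or -1 if `c` is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: replaces each well-formed %XX sequence with the byte
// it encodes; anything else is copied through unchanged.
std::string percentDecode(std::string const& in)
{
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```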
int IXE::url_encode | ( | char * | dst, | |
char const * | s | |||
) |
char * IXE::url_encode | ( | char const * | s | ) |
Returns a string in which all non-alphanumeric characters except "-_.!~*'()," have been replaced with a percent (%) sign followed by two hex digits.
This is the encoding described in RFC 2396 for protecting literal characters from being interpreted as special URL delimiters, and for protecting URLs from being mangled by transmission media with character conversions (like some email systems).
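The encoding rule stated for url_encode can be sketched as follows. percentEncode is a hypothetical name and not the IXE function; it returns a std::string instead of a char*, and emitting uppercase hex digits is an assumption, since the description does not specify the case.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: keeps alphanumerics and the characters "-_.!~*'(),"
// and replaces every other byte with '%' followed by two hex digits.
std::string percentEncode(std::string const& in)
{
    static char const hex[] = "0123456789ABCDEF";  // assumption: uppercase
    static std::string const keep = "-_.!~*'(),";

    std::string out;
    for (unsigned char c : in) {
        if (std::isalnum(c) || keep.find(static_cast<char>(c)) != std::string::npos) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];   // high nibble
            out += hex[c & 15];   // low nibble
        }
    }
    return out;
}
```

Note that std::isalnum depends on the current C locale; in the default "C" locale only ASCII letters and digits qualify.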
According to RFC 2396, only alphanumerics, the unreserved characters "-_.!~*'()", and reserved characters ";/?:@&=+$,", used for their reserved purposes may be used unencoded within a URL. (