DeSR Dependency Parser

IXE Namespace Reference

Global configuration parameters.


Namespaces

namespace  io
 Platform independent IO.

Classes

struct  Options
 Options describes a set of command-line options.
class  OptionStream
 Given the traditional argc and argv for command-line arguments, extract options from them following the stream model.
struct  FileAction
 Perform action on each file in directory tree.
class  Configuration
 A Configuration object that holds all the configuration variables.
class  Var
 Configuration variable.
class  VarDefault
 Configuration variable with a default value.
class  conf
class  conf< bool >
 A conf<bool> is a Var for containing the value of a Boolean configuration variable.
class  conf< Dictionary >
class  conf< float >
 A conf<float> is a Var for containing the value of a float configuration variable.
class  conf< int >
 A conf<int> is a Var for containing the value of an integer configuration variable.
class  conf< PatternSet >
 A conf_PatternSet contains a list of shell wildcard patterns.
class  conf_set
 A conf_set contains a set of configuration values.
class  conf< std::string >
 A conf<string> is a configuration variable containing a string value.
class  conf< std::vector< std::string > >
 A conf_vector contains a list of configuration string values.
class  ExcludeFile
 An ExcludeFile contains the set of filename patterns to exclude during either indexing or extraction.
class  FileType
 A FileType maps a filename pattern to a file type.
class  IncludeFile
 An IncludeFile contains the set of filename patterns to include during either indexing or extraction.
class  PatternList
 A PatternList contains a list of shell wildcard patterns.
class  PatternMap
 A PatternMap maps a shell wildcard pattern to an object of type T.
class  PatternSet
 A PatternSet contains a set of shell wildcard patterns.
class  Error
 Base class for all errors reported.
class  LogicError
 Base class for errors due to programming errors.
class  RuntimeError
 Base class for errors due to run-time problems.
class  AssertionError
 Thrown if an internal consistency check fails.
class  UnimplementedError
 Thrown when an attempt is made to use an unimplemented feature.
class  InvalidArgumentError
 Thrown when an invalid argument is supplied to the API.
class  ConfigFileError
 Thrown when reading a configuration file fails.
class  FileError
 Thrown when opening a file fails.
class  MmapError
 Thrown when mmap fails to map a file into memory.
class  FormatError
 Thrown when an index file has the wrong format.
class  DocNotFoundError
 Thrown when an attempt is made to access a document that is not in the collection.
class  InternalError
 Thrown when an internal inconsistency occurs.
class  IndexingError
 Thrown when an error occurs during indexing.
class  RangeError
 Thrown when an element is out of range.
class  ReaderError
 Thrown when a reader fails to interpret a document format.
class  CollectionError
 Thrown for miscellaneous collection errors.
class  NetworkError
 Thrown when there is a communications problem with a remote collection.
class  MemoryError
 Thrown when a memory-related error occurs.
class  OpeningError
 Thrown when opening a collection fails.
class  TableError
 Thrown when accessing a database Table fails.
class  ParserError
class  QueryError
 Thrown when an SQL query fails.
class  InvalidResultError
 Thrown when trying to access invalid data.
class  SystemError
 Thrown when a system call fails.
class  IOError
class  Set
 A Set is a standard set with the addition of a contains() member function that returns a simple bool indicating whether a given element is in the set.
class  Set< char const * >
 Specialization of Set for C-style strings that avoids holding a reference (implemented as a pointer) to a char const*.
class  Timer
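The Set entry above describes a set with an added contains() member. A minimal sketch, assuming Set simply derives from std::set; the names SketchSet, SketchCharsSet and CStrLess are hypothetical, and comparing C strings by content (rather than by address) is an assumption about the char const* specialization:

```cpp
#include <cassert>
#include <cstring>
#include <functional>
#include <set>

// Hypothetical generic Set: a std::set plus a contains() member that
// returns a plain bool. Not the IXE declaration.
template <class T, class Compare = std::less<T>>
class SketchSet : public std::set<T, Compare> {
public:
    bool contains(T const& value) const {
        // this-> is required: find/end live in a dependent base class.
        return this->find(value) != this->end();
    }
};

// Assumption: C-style strings are ordered by content, not by pointer value.
struct CStrLess {
    bool operator()(char const* a, char const* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Counterpart of the Set<char const*> specialization: contains() takes
// the pointer by value, so no reference to a char const* is involved.
class SketchCharsSet : public std::set<char const*, CStrLess> {
public:
    bool contains(char const* value) const {
        return find(value) != end();
    }
};
```

Publicly deriving from a standard container is usually discouraged because std::set has no virtual destructor; it is tolerable in a sketch like this only because the derived classes add a non-virtual helper and are never deleted through a base-class pointer.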

Typedefs

typedef std::map< char const *, char const * > Dictionary
 A conf_dictionary contains a dictionary.
typedef conf_set< std::string > conf_stringset
 A conf_set contains a set of configuration string values.
typedef FileType MimeType
 A MimeType maps a mime type to a document reader type.
typedef char char_t
typedef unsigned DocID
 DocID is a numeric ID for documents in IXE collections.
typedef unsigned short HitPosition
 Word position in document.
typedef unsigned short Occurrences
 Number of occurrences of a word in a document.
typedef short TermColor
 TermColor is a numeric ID of a 'color' attribute for the word.
typedef unsigned TermID
typedef unsigned Count
typedef unsigned Size
typedef unsigned char byte
typedef Set< char const * > chars_set

Functions

OptionStream & operator>> (OptionStream &os, OptionStream::Option &o)
 Parse and extract an option from an option stream (argv values).
bool is_indexable_file (char const *path)
void mapDir (char const *pathname, FileAction &action, bool recurse_subdirectories, bool follow_symbolic_links, int verbosity)
 Perform action on each file in directory tree.
void showTime (char const *msg, struct timeval &l0, struct timeval &l1, ostream &out)
 Display the duration of an interval in seconds and microseconds.
int url_decode (char *dest, char const *src, int len)
 Decodes any %## encoding in the given string.
char * url_encode (char const *s)
 Returns a string in which all non-alphanumeric characters except "-_.!~*'()," have been replaced with a percent (%) sign followed by two hex digits.
int url_encode (char *dst, char const *s)
void reverseURLdomain (char *revDomain, char const *url, Size len)
 Store the URL's site, reversed, in revDomain.
void unreverseURLdomain (char *domain, char const *revDomain)
Size availableMemory ()
 Detect the available memory.
void cgi_parse (map< char const *, char const * > &keyMap, char *qstart)
 Parse a cgi query into a map of key values.
bool file_empty (char const *path)
 Test whether the file at path is empty.
off_t file_size (char const *path)
bool file_exists (char const *path)
bool file_exists (std::string const &path)
bool is_directory (char const *path)
bool is_directory (std::string const &path)
bool is_plain_file (char const *path)
bool is_plain_file (std::string const &path)
bool is_symbolic_link (char const *path)
bool is_symbolic_link (std::string const &path)
void showTime (char const *msg, Timeval &l0, Timeval &l1, std::ostream &out=std::cerr)
void cgi_parse (std::map< char const *, char const * > &keyMap, char *qstart)
char const * next_token_line (char const *&start, const char *sep, char esc= '\\')
 Simple string tokenizer that returns the next token within a line.

Variables

struct stat stat_buf
conf< bool > VerboseConfig ("VerboseConfig", false)
FileType fileTypes
MimeType mimeTypes
int const WordMaxSize = 25
int const WordMinSize = 2
char const ConfigFileDefault [] = "ixe.conf"
char const TableNameDefault [] = "INDEX/docinfo"
int const FilesGrowDefault = 100
char const IndexExt [] = ".fti"
char const PostingExt [] = ".pst"
char const TableExt [] = ".bdb"
char const ContentsExt [] = ".gz"
int const ResultsMaxDefault = 10
char const TempDirectoryDefault [] = "/tmp"
int const WordPercentMaxDefault = 100
int const Word_Threshold = 60000
int const max_columns = 64
int const max_prefix_lists = 5000
int const Max_CursorAll_Hits = 20
int const Postings_Segment_Size = 1024
int const Min_Postings_Table = 4096
DocID const noDocID = 0
HitPosition const noPosition = 0
HitPosition const maxPosition = (HitPosition)-1
Occurrences const maxOccurrences = (Occurrences)-1
TermColor const noColor = -1
TermColor const color_Not_Found = -2
int const num_bigrams = 256*256 + 1
char const version [] = "1.6"


Detailed Description

Global configuration parameters.

All the IXE classes are declared in namespace IXE.

The IXE Toolkit is a set of modular C++ classes and utilities for indexing and querying documents.


Typedef Documentation

typedef unsigned IXE::DocID

DocID is a numeric ID for documents in IXE collections.

DocIDs start at 1; 0 is reserved to mean no ID (see noDocID).

Definition at line 44 of file ixe.h.

typedef unsigned short IXE::HitPosition

Word position in document.

Positions start at 1; 0 is reserved to mean no position (see noPosition).

Definition at line 51 of file ixe.h.

typedef short IXE::TermColor

TermColor is a numeric ID of a 'color' attribute for the word.

The color may represent a META NAME or an HTML tag.

Definition at line 65 of file ixe.h.


Function Documentation

bool IXE::file_empty ( char const *  path  )  [inline]

Test whether the file at path is empty.

This is one of the file test functions, together with file_size(), file_exists(), is_directory(), is_plain_file(), and is_symbolic_link().

Definition at line 111 of file util.h.
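The file test functions listed in this namespace are not documented individually. A minimal sketch, assuming they are thin wrappers over POSIX stat() and lstat(); the _sketch suffix marks these as hypothetical stand-ins, not the util.h code, and treating a path that cannot be stat'ed as "not empty" is an assumption:

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>

// Hypothetical sketches of file test helpers built on POSIX stat()/lstat().
bool file_exists_sketch(char const* path) {
    struct stat sb;
    return ::stat(path, &sb) == 0;
}

bool is_directory_sketch(char const* path) {
    struct stat sb;
    return ::stat(path, &sb) == 0 && S_ISDIR(sb.st_mode);
}

bool is_plain_file_sketch(char const* path) {
    struct stat sb;
    return ::stat(path, &sb) == 0 && S_ISREG(sb.st_mode);
}

// lstat() does not follow the final symlink, so it can report the link itself.
bool is_symbolic_link_sketch(char const* path) {
    struct stat sb;
    return ::lstat(path, &sb) == 0 && S_ISLNK(sb.st_mode);
}

// Assumption: a path that cannot be stat'ed is not reported as empty.
bool file_empty_sketch(char const* path) {
    struct stat sb;
    return ::stat(path, &sb) == 0 && sb.st_size == 0;
}

// std::string overloads forward to the C-string versions.
bool file_exists_sketch(std::string const& path) { return file_exists_sketch(path.c_str()); }
bool is_directory_sketch(std::string const& path) { return is_directory_sketch(path.c_str()); }
```

The namespace-level stat_buf variable hints that the real helpers may share a single struct stat; the sketch uses a local buffer in each call instead, which keeps the functions reentrant.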

char const * IXE::next_token_line ( char const *&  start,
const char *  sep,
char  esc = '\\' 
)

Simple string tokenizer that returns the next token within a line.

A token is a sequence of characters delimited by characters in sep, except where a delimiter is preceded by esc.

Parameters:
start position at which scanning begins
sep sequence of delimiting characters
esc escape character, also used for line continuation
Returns:
the first token from start. Advances start to the end of the token.

Definition at line 175 of file conf.cpp.

Referenced by Parser::conf_feature::parseValue(), IXE::FileType::parseValue(), IXE::conf< std::vector< std::string > >::parseValue(), IXE::conf_set< T >::parseValue(), IXE::conf< int >::parseValue(), IXE::conf< float >::parseValue(), and IXE::conf< bool >::parseValue().
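The tokenizer contract above (delimiters in sep, escapes via esc, start advanced past the token) can be illustrated with a minimal sketch. next_token_sketch() is hypothetical; skipping leading delimiters, dropping the escape character, and treating esc followed by a newline as a line continuation are assumptions, and std::string is returned instead of char const* so the sketch needs no static buffer:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical escape-aware tokenizer in the spirit of next_token_line().
// Returns an empty string when no tokens remain.
std::string next_token_sketch(char const*& start, char const* sep, char esc = '\\') {
    // Skip leading delimiters (the *start check stops strchr matching '\0').
    while (*start && std::strchr(sep, *start))
        ++start;
    std::string token;
    while (*start && !std::strchr(sep, *start)) {
        if (*start == esc && start[1] != '\0') {
            ++start;                 // drop the escape character
            if (*start == '\n') {    // escaped newline: line continuation
                ++start;
                continue;
            }
        }
        token += *start++;           // escaped delimiters are kept literally
    }
    return token;
}
```

For example, with sep set to " ", the input `a b\ c` yields the tokens `a` and `b c`, because the escaped space is kept as part of the second token.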

OptionStream& IXE::operator>> ( OptionStream &  os,
OptionStream::Option &  o 
)

Parse and extract an option from an option stream (argv values).

Options begin with either a '-' for short options or a "--" for long options. Either a '-' or "--" by itself explicitly ends the options; however, the difference is that '-' is returned as the first non-option whereas "--" is skipped entirely.

When there are no more options, the OptionStream converts to bool as false. The OptionStream's shift() member returns the number of options parsed, which the caller can use to adjust argc and argv.

Short options can take an argument either as the remaining characters of the same argv or in the next argv, unless the next argv looks like an option (that is, begins with a '-').

Long option names can be abbreviated as long as the abbreviation is unambiguous. Long options can take an argument either directly after an '=' in the same argv or in the next argv (without an '='), unless the next argv looks like an option (that is, begins with a '-').

Parameters:
os The OptionStream to extract options from
o The option to deposit into.
Returns:
The passed-in OptionStream.

Definition at line 120 of file OptionStream.cpp.

References IXE::OptionStream::argc, IXE::OptionStream::argv, IXE::OptionStream::end, IXE::OptionStream::err, IXE::OptionStream::index, IXE::OptionStream::next_c, and IXE::OptionStream::specs.
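The option-parsing rules described for operator>> can be exercised without the OptionStream class. A minimal, self-contained sketch of those rules; LongSpec, ParsedOption and parse_options() are hypothetical names, not the IXE API, and letting an exact long-name match win over longer names that share its prefix is an assumption not stated above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical long-option description (not the IXE Options struct).
struct LongSpec {
    char const* name;
    bool takes_arg;
};

// One parsed option; an unknown or ambiguous long option gets the name "?".
struct ParsedOption {
    std::string name;
    std::string arg;
};

// Short options listed in short_with_arg take an argument. Returns the index
// of the first non-option argument, similar to what shift() lets callers do.
int parse_options(int argc, char const* const argv[], char const* short_with_arg,
                  std::vector<LongSpec> const& longs, std::vector<ParsedOption>& out) {
    int i = 1;
    while (i < argc) {
        char const* a = argv[i];
        if (a[0] != '-' || a[1] == '\0')     // plain word, or "-": first non-option
            break;
        if (std::strcmp(a, "--") == 0) {     // "--" ends options and is skipped
            ++i;
            break;
        }
        if (a[1] == '-') {                   // long option, possibly abbreviated
            char const* name = a + 2;
            char const* eq = std::strchr(name, '=');
            std::size_t len = eq ? static_cast<std::size_t>(eq - name) : std::strlen(name);
            LongSpec const* match = nullptr;
            int hits = 0;
            for (LongSpec const& s : longs) {
                if (std::strncmp(s.name, name, len) != 0)
                    continue;
                match = &s;
                if (std::strlen(s.name) == len) {  // assumption: exact match wins
                    hits = 1;
                    break;
                }
                ++hits;
            }
            ParsedOption o;
            o.name = (hits == 1) ? match->name : "?";
            if (hits == 1 && match->takes_arg) {
                if (eq)
                    o.arg = eq + 1;                          // --name=value
                else if (i + 1 < argc && argv[i + 1][0] != '-')
                    o.arg = argv[++i];                       // --name value
            }
            out.push_back(o);
        } else {                             // cluster of short options, e.g. -ab
            for (char const* c = a + 1; *c; ++c) {
                ParsedOption o;
                o.name = std::string(1, *c);
                bool needs_arg = std::strchr(short_with_arg, *c) != nullptr;
                if (needs_arg) {
                    if (c[1] != '\0')
                        o.arg = c + 1;                       // -ofile
                    else if (i + 1 < argc && argv[i + 1][0] != '-')
                        o.arg = argv[++i];                   // -o file
                }
                out.push_back(o);
                if (needs_arg)
                    break;                   // the rest of this argv was consumed
            }
        }
        ++i;
    }
    return i;
}
```

With long options verbose, version and level, the abbreviation `--verb` resolves to verbose, while `--ver` is ambiguous and is reported as "?".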

void IXE::reverseURLdomain ( char *  revDomain,
char const *  url,
Size  len 
)

Store the URL's site, reversed, in revDomain.

Parameters:
revDomain buffer that receives the reversed site
url the URL

Definition at line 277 of file util.cpp.

void IXE::showTime ( char const *  msg,
struct timeval &  l0,
struct timeval &  l1,
ostream &  out 
)

Display the duration of an interval in seconds and microseconds.

Parameters:
msg Comment message
l0 interval start
l1 interval end
out stream that receives the output

Definition at line 142 of file util.cpp.
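Computing an interval from two timeval values requires borrowing a second when the microsecond difference is negative. A minimal sketch in the spirit of showTime(); show_time_sketch() is hypothetical and its output format is an assumption, since the real format is not documented here:

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <sys/time.h>

// Hypothetical sketch: print the interval between l0 and l1 as
// whole seconds plus microseconds.
void show_time_sketch(char const* msg, timeval const& l0, timeval const& l1,
                      std::ostream& out = std::cerr) {
    long sec = static_cast<long>(l1.tv_sec - l0.tv_sec);
    long usec = static_cast<long>(l1.tv_usec - l0.tv_usec);
    if (usec < 0) {   // borrow one second so usec stays in [0, 999999]
        --sec;
        usec += 1000000;
    }
    out << msg << ": " << sec << " sec, " << usec << " usec\n";
}
```

In practice l0 and l1 would be filled by gettimeofday() immediately before and after the section being timed.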

int IXE::url_decode ( char *  dest,
char const *  src,
int  len 
)

Decodes any %## encoding in the given string.

Returns:
the difference in size between src and dest.

Definition at line 177 of file util.cpp.

Referenced by cgi_parse().
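A minimal sketch of %## decoding with the same parameters and return convention as url_decode() (decode len bytes of src into dest, return how many bytes shorter dest is). url_decode_sketch() and hex_value() are hypothetical; copying malformed escapes unchanged, leaving '+' as is, and requiring dest to hold len + 1 bytes are assumptions:

```cpp
#include <cassert>
#include <cctype>

// Value of one hex digit, or -1 if c is not a hex digit.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical sketch: decode %XY escapes from the first len bytes of src.
int url_decode_sketch(char* dest, char const* src, int len) {
    int out = 0;
    for (int i = 0; i < len; ++i) {
        if (src[i] == '%' && i + 2 < len) {   // both hex digits are in range
            int hi = hex_value(src[i + 1]);
            int lo = hex_value(src[i + 2]);
            if (hi >= 0 && lo >= 0) {
                dest[out++] = static_cast<char>(hi * 16 + lo);
                i += 2;                       // skip the two consumed digits
                continue;
            }
        }
        dest[out++] = src[i];                 // ordinary or malformed byte
    }
    dest[out] = '\0';
    return len - out;                         // size difference src vs dest
}
```

For example, decoding `a%20b%2Fc` (9 bytes) produces `a b/c` (5 bytes), so the returned difference is 4.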

int IXE::url_encode ( char *  dst,
char const *  s 
)

Writes the URL-encoded form of s into dst.

Returns:
the length of the encoded string

Definition at line 250 of file util.cpp.

char * IXE::url_encode ( char const *  s  ) 

Returns a string in which all non-alphanumeric characters except "-_.!~*'()," have been replaced with a percent (%) sign followed by two hex digits.

This is the encoding described in RFC 2396 for protecting literal characters from being interpreted as special URL delimiters, and for protecting URLs from being mangled by transmission media with character conversions (like some email systems).

According to RFC 2396, only alphanumerics, the unreserved characters "-_.!~*'()", and the reserved characters ";/?:@&=+$," used for their reserved purposes may be used unencoded within a URL.

See also:
http://www.faqs.org/rfcs/rfc2396.html

Definition at line 223 of file util.cpp.
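The encoding rule above (keep alphanumerics and "-_.!~*'(),", percent-encode every other byte) can be sketched as follows. url_encode_sketch() is hypothetical; it returns std::string rather than a char* buffer, uses uppercase hex digits, and relies on std::isalnum in the default "C" locale, all of which are assumptions:

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <string>

// Hypothetical sketch: keep ASCII alphanumerics and -_.!~*'(), unchanged,
// and replace every other byte with '%' followed by two hex digits.
std::string url_encode_sketch(char const* s) {
    static char const keep[] = "-_.!~*'(),";
    static char const hex[] = "0123456789ABCDEF";
    std::string out;
    for (; *s; ++s) {
        unsigned char c = static_cast<unsigned char>(*s);
        if (std::isalnum(c) || std::strchr(keep, c)) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];      // high nibble
            out += hex[c & 0x0F];    // low nibble
        }
    }
    return out;
}
```

For example, `a b/c` encodes to `a%20b%2Fc`, and a literal `%` becomes `%25`.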

 
Copyright © 2005-2007 G. Attardi. Generated on 13 Aug 2009 by doxygen 1.5.7.1.