DeSR Dependency Parser

Tanl::Text::RegExp::Pattern Class Reference

Regular Expression matching.

#include <RegExp.h>

Public Member Functions

 Pattern (std::string &expression, int cflags=0)
 Pattern (char const *expression, int cflags=0)
Pattern & operator= (Pattern const &other)
 Assignment.
bool test (std::string const &str, int eflags=0)
 Tests whether the pattern matches the given string str.
bool test (char const *str, size_t len=0, int eflags=0)
 Tests whether the pattern matches the given string str, within the given length len.
int matchSize (std::string const &text, int eflags=0)
 Computes the size of the match.
int match (const char *start, const char *end, MatchGroups &pos, int eflags=0)
 Matches the text between start and end and returns the matching positions in pos, expressed as byte offsets from start.
int match (std::string const &text, MatchGroups &pos, int eflags=0)
 Matches the text in text and returns the matching positions in pos, expressed as byte offsets from the beginning of text.
std::vector< std::string > match (std::string const &str, int eflags=0)
std::string replace (std::string &text, std::string &with, bool replaceAll=false)
 Replaces the first substring matching the expression within text with the string with.

Static Public Member Functions

static std::string escape (std::string &str)
 Escapes all meta characters.
static const unsigned char * setLocale (char const *locale)
 Set the locale for use during matching.

Static Public Attributes

static const unsigned char * CharTables = Pattern::setLocale(setlocale(LC_CTYPE, 0))
 The current chartable to use for matching.


Detailed Description

Regular Expression matching.

A pattern is compiled from a regular expression and used in matching. Regular expressions are written using the Perl 5 syntax.

A simple use, testing whether a string matches a pattern, is:

      Pattern p("a*b");
      bool b = p.test("aaab");
  
In order to extract the portions of the string that match, MatchGroups can be used:
      Pattern p("(a*)b");
      MatchGroups m(2);
      string s("daaab");
      int n = p.match(s, m);
  
n is the number of groups matched: group 0 represents the substring captured by the whole pattern.
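
To make the group numbering and byte offsets concrete, here is a minimal sketch that does not use Tanl at all: it swaps in std::regex (ECMAScript grammar rather than Perl 5 syntax, though this simple pattern behaves the same) and a plain vector of offset pairs in place of MatchGroups. The names Offsets, matchOffsets and offsetsExample are hypothetical, introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Stand-in for MatchGroups (assumption): one [begin, end) byte-offset pair per group.
using Offsets = std::vector<std::pair<std::size_t, std::size_t>>;

// Hypothetical helper, not part of Tanl: searches text and fills pos with the
// offsets of group 0 (the whole match) followed by each parenthesized group.
// Returns 0 if nothing matches, otherwise the number of groups reported.
int matchOffsets(const std::regex& re, const std::string& text, Offsets& pos) {
    pos.clear();
    std::smatch m;
    if (!std::regex_search(text, m, re))
        return 0;
    for (std::size_t i = 0; i < m.size(); ++i) {
        if (!m[i].matched) {  // optional group that did not take part in the match
            pos.emplace_back(std::string::npos, std::string::npos);
            continue;
        }
        const std::size_t begin = static_cast<std::size_t>(m.position(i));
        pos.emplace_back(begin, begin + static_cast<std::size_t>(m.length(i)));
    }
    return static_cast<int>(pos.size());
}

// Usage with the pattern and subject string from the example above.
void offsetsExample() {
    Offsets pos;
    const int n = matchOffsets(std::regex("(a*)b"), "daaab", pos);
    assert(n == 2);                                    // group 0 plus one capture
    assert(pos[0].first == 1 && pos[0].second == 5);   // "aaab"
    assert(pos[1].first == 1 && pos[1].second == 4);   // "aaa"
}
```
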

Definition at line 114 of file RegExp.h.


Constructor & Destructor Documentation

Tanl::Text::RegExp::Pattern::Pattern ( std::string &  expression,
int  cflags = 0 
)

Parameters:
expression the regular expression
cflags a combination of CompileFlags
NOTE. The ISO 8859-15 (Latin-9) locale is used by default: ensure that the locale files for LC_CTYPE=en_US.iso885915 are installed in the OS. This can be changed using setLocale().

Tanl::Text::RegExp::Pattern::Pattern ( char const *  expression,
int  cflags = 0 
)

Parameters:
expression the regular expression
cflags a combination of CompileFlags
NOTE. The ISO 8859-15 (Latin-9) locale is used by default: ensure that the locale files for LC_CTYPE=en_US.iso885915 are installed in the OS. This can be changed using setLocale().

Definition at line 42 of file RegExp.cpp.

References CharTables.


Member Function Documentation

std::vector<std::string> Tanl::Text::RegExp::Pattern::match ( std::string const &  str,
int  eflags = 0 
)

Parameters:
str the text to match.
eflags any combination of EvaluateFlags
Returns:
a vector<string>: [0] is the substring matched, [1 - n] are the subexpressions captured with '()'
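
The shape of the returned vector can be sketched with std::regex as a stand-in for this class (an assumption; the real implementation is PCRE-based and is not shown in this page). The helper matchStrings and the sample pattern are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper, not part of Tanl: returns the whole match at index 0
// followed by one string per parenthesized group, or an empty vector when
// the pattern does not match anywhere in str.
std::vector<std::string> matchStrings(const std::regex& re, const std::string& str) {
    std::vector<std::string> groups;
    std::smatch m;
    if (std::regex_search(str, m, re)) {
        for (const auto& sub : m)
            groups.push_back(sub.str());
    }
    return groups;
}

// Usage: two capture groups give a vector of three strings.
void matchStringsExample() {
    const std::vector<std::string> g =
        matchStrings(std::regex("(\\d+)-(\\d+)"), "range 10-25");
    assert(g.size() == 3);
    assert(g[0] == "10-25");  // substring matched by the whole pattern
    assert(g[1] == "10");     // first '()' group
    assert(g[2] == "25");     // second '()' group
}
```
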

int Tanl::Text::RegExp::Pattern::match ( std::string const &  text,
MatchGroups &  pos,
int  eflags = 0 
)

Matches the text in text and returns the matching positions in pos, expressed as byte offsets from the beginning of text.

Parameters:
text the string to match.
pos the identified matching positions.
eflags any combination of EvaluateFlags
Returns:
0 if there is no match, otherwise the number of matched groups.

int Tanl::Text::RegExp::Pattern::match ( const char *  start,
const char *  end,
MatchGroups &  pos,
int  eflags = 0 
)

Matches the text between start and end and returns the matching positions in pos, expressed as byte offsets from start.

Parameters:
start start of the text to match.
end end of the text to match.
pos the identified matching positions.
eflags any combination of EvaluateFlags
Returns:
0 if there is no match, otherwise the number of matched groups.

Definition at line 144 of file RegExp.cpp.

References Tanl::Text::RegExp::MatchGroups::size().

Referenced by Tanl::TokenSentenceReader::MoveNext(), and Tanl::ConllXSentenceReader::MoveNext().
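
Because positions are reported relative to start, matching a slice in the middle of a larger buffer yields offsets into the slice, not into the buffer. The sketch below illustrates that point with std::regex over a pointer range; it is a stand-in, and matchRange is a hypothetical helper that reports only the whole match.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>

// Hypothetical helper, not part of Tanl: searches the bytes in [start, end)
// and reports the whole match as byte offsets measured from start.
bool matchRange(const std::regex& re, const char* start, const char* end,
                std::size_t& from, std::size_t& to) {
    assert(start != nullptr && start <= end);
    std::cmatch m;
    if (!std::regex_search(start, end, m, re))
        return false;
    from = static_cast<std::size_t>(m[0].first - start);
    to = static_cast<std::size_t>(m[0].second - start);
    return true;
}

// Usage: only the second field of the buffer is searched.
void matchRangeExample() {
    const char* buf = "id=42;id=77";  // 11 bytes
    std::size_t from = 0, to = 0;
    const bool found = matchRange(std::regex("\\d+"), buf + 6, buf + 11, from, to);
    assert(found);
    assert(from == 3 && to == 5);  // "77" inside the slice "id=77"
}
```

To translate such an offset back into the enclosing buffer, add the distance between the buffer start and start (6 in the usage above).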

int Tanl::Text::RegExp::Pattern::matchSize ( std::string const &  text,
int  eflags = 0 
)

Computes the size of the match.

Parameters:
text the text to match.
eflags any combination of EvaluateFlags.
Returns:
0 if there is no match, otherwise the size of the match
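
A minimal sketch of this contract, using std::regex as a stand-in and assuming that the match is searched for anywhere in the text (this page does not say whether it is anchored). matchSizeSketch is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, not part of Tanl: length in bytes of the first match
// of re in text, or 0 when there is no match.
int matchSizeSketch(const std::regex& re, const std::string& text) {
    std::smatch m;
    return std::regex_search(text, m, re) ? static_cast<int>(m.length(0)) : 0;
}

void matchSizeExample() {
    assert(matchSizeSketch(std::regex("a*b"), "daaab") == 4);  // "aaab"
    assert(matchSizeSketch(std::regex("x+"), "daaab") == 0);   // no match
}
```

In this sketch a pattern that matches the empty string also yields 0, so a zero result cannot by itself distinguish an empty match from no match.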

Pattern& Tanl::Text::RegExp::Pattern::operator= ( Pattern const &  other  )  [inline]

Assignment.

Hack to avoid freeing _pcre twice.

Definition at line 155 of file RegExp.h.

References _errorCode, _pcre, _pcre_extra, and subpatterns.

std::string Tanl::Text::RegExp::Pattern::replace ( std::string &  text,
std::string &  with,
bool  replaceAll = false 
)

Replaces the first substring matching the expression within text with the string with.

If replaceAll is true, all occurrences are replaced.
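
The difference between the two modes can be sketched with std::regex_replace, which is a stand-in here and not the implementation behind this member. The helper replaceSketch is hypothetical. Note that std::regex_replace interprets $1-style references in the replacement string; whether this class does the same is not stated on this page.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, not part of Tanl: returns a copy of text in which the
// first match of re (or every match, when replaceAll is true) is replaced.
std::string replaceSketch(const std::regex& re, const std::string& text,
                          const std::string& with, bool replaceAll = false) {
    const std::regex_constants::match_flag_type flags =
        replaceAll ? std::regex_constants::format_default
                   : std::regex_constants::format_first_only;
    return std::regex_replace(text, re, with, flags);
}

void replaceExample() {
    const std::regex re("a+");
    assert(replaceSketch(re, "caab daab", "x") == "cxb daab");       // first only
    assert(replaceSketch(re, "caab daab", "x", true) == "cxb dxb");  // all
}
```
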

const unsigned char * Tanl::Text::RegExp::Pattern::setLocale ( char const *  locale  )  [static]

Set the locale for use during matching.

Use "en_US.iso885915" or similar for recognizing ISO 8859-15 (Latin-9) letters.

Definition at line 30 of file RegExp.cpp.

References CharTables.
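
The initializer of CharTables passes setlocale(LC_CTYPE, 0), which in standard C only queries the name of the current locale. Since the notes above ask for the locale files to be installed, a quick check with the standard library alone may help; the sketch below does not touch Tanl, and both helper names are hypothetical.

```cpp
#include <cassert>
#include <clocale>
#include <string>

// Returns the name of the current LC_CTYPE locale without changing it
// (a null second argument makes setlocale a pure query).
std::string currentCtypeLocale() {
    const char* name = std::setlocale(LC_CTYPE, nullptr);
    return name ? name : "";
}

// Tries to switch LC_CTYPE to the named locale. std::setlocale returns a
// null pointer and leaves the locale unchanged when the request cannot be
// honored, for example when the locale is not installed.
bool trySetCtypeLocale(const char* locale) {
    return std::setlocale(LC_CTYPE, locale) != nullptr;
}

void localeExample() {
    // At program start the "C" locale is in effect.
    assert(currentCtypeLocale() == "C");
    assert(trySetCtypeLocale("C"));
}
```

A caller could run trySetCtypeLocale("en_US.iso885915") first and pass the name to Pattern::setLocale() only when it succeeds; what setLocale() itself does with an unavailable locale is not documented here.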

bool Tanl::Text::RegExp::Pattern::test ( char const *  str,
size_t  len = 0,
int  eflags = 0 
)

Tests whether the pattern matches the given string str, within the given length len.

Parameters:
str the string to match.
len the length of the string to match.
eflags any combination of EvaluateFlags
Returns:
true if the pattern matches

Definition at line 91 of file RegExp.cpp.
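
The effect of limiting the match to len bytes can be sketched with std::regex as a stand-in. Two assumptions are made and are not confirmed by this page: that the default len of 0 means "use the whole null-terminated string", and that the test is a search, not an anchored match. testPrefix is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <regex>

// Hypothetical helper, not part of Tanl: true if re matches somewhere in the
// first len bytes of str.
bool testPrefix(const std::regex& re, const char* str, std::size_t len = 0) {
    assert(str != nullptr);
    if (len == 0)
        len = std::strlen(str);  // assumption: 0 selects the full string
    return std::regex_search(str, str + len, re);
}

void testPrefixExample() {
    assert(testPrefix(std::regex("a*b"), "aaab"));      // whole string
    assert(!testPrefix(std::regex("a*b"), "aaab", 3));  // "aaa" has no 'b'
    assert(testPrefix(std::regex("a+"), "aaab", 3));    // "aaa" matches a+
}
```
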

bool Tanl::Text::RegExp::Pattern::test ( std::string const &  str,
int  eflags = 0 
)

Tests whether the pattern matches the given string str.

Parameters:
str the string to match.
eflags any combination of EvaluateFlags
Returns:
true if the pattern matches

Referenced by Parser::State::predicates(), and Parser::ParseState::transition().


The documentation for this class was generated from the following files:

 RegExp.h
 RegExp.cpp

Copyright © 2005-2007 G. Attardi. Generated on 13 Aug 2009 by doxygen 1.5.7.1.