DeSR Dependency Parser |
#include <RegExp.h>
Public Member Functions | |
Pattern (std::string &expression, int cflags=0) | |
Pattern (char const *expression, int cflags=0) | |
Pattern & | operator= (Pattern const &other) |
Assignment. | |
bool | test (std::string const &str, int eflags=0) |
Tests whether the pattern matches the given string str. | |
bool | test (char const *str, size_t len=0, int eflags=0) |
Tests whether the pattern matches the given string str, within the given length len. | |
int | matchSize (std::string const &text, int eflags=0) |
Computes the size of the match. | |
int | match (const char *start, const char *end, MatchGroups &pos, int eflags=0) |
Matches the text between start and end and returns the matching positions in pos, expressed as byte offsets from start. | |
int | match (std::string const &text, MatchGroups &pos, int eflags=0) |
Matches the text in text and returns the matching positions in pos, expressed as byte offsets from the start of text. | |
std::vector< std::string > | match (std::string const &str, int eflags=0) |
std::string | replace (std::string &text, std::string &with, bool replaceAll=false) |
Replaces the first substring matching the expression within text with the string with. | |
Static Public Member Functions | |
static std::string | escape (std::string &str) |
Escapes all meta characters. | |
static const unsigned char * | setLocale (char const *locale) |
Set the locale for use during matching. | |
Static Public Attributes | |
static const unsigned char * | CharTables = Pattern::setLocale(setlocale(LC_CTYPE, 0)) |
The current chartable to use for matching. |
A pattern is compiled from a regular expression and used in matching. Regular expressions are written using the Perl 5 syntax.
A simple use for testing whether a string matches a pattern is:
    Pattern p("a*b");
    bool b = p.test("aaab");
In order to extract the portions of the string that match, MatchGroups can be used:
    Pattern p("(a*)b");
    MatchGroups m(2);
    std::string s("daaab");
    int n = p.match(s, m);
n is the number of groups matched: group 0 represents the substring captured by the whole pattern.
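The group numbering described above can be tried out without the DeSR headers. The following is a minimal sketch that uses the standard library's std::regex as a stand-in for Pattern (an assumption made for illustration only: std::regex defaults to ECMAScript syntax rather than Perl 5, and MatchGroups is not used). The helper name groupsOf is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper, not part of RegExp.h: returns the text of every group
// for the first match of `expression` in `text`. Element 0 is the whole
// match, element 1 the first parenthesized group, and so on.
// Returns an empty vector when nothing matches.
std::vector<std::string> groupsOf(const std::string& expression,
                                  const std::string& text) {
    std::regex re(expression);  // ECMAScript syntax (stand-in for Perl 5)
    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_search(text, m, re)) {
        for (std::size_t i = 0; i < m.size(); ++i)
            groups.push_back(m[i].str());
    }
    return groups;
}
```

Calling groupsOf("(a*)b", "daaab") yields two entries: "aaab" for group 0 (the whole match) and "aaa" for group 1.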
Definition at line 114 of file RegExp.h.
Tanl::Text::RegExp::Pattern::Pattern | ( | std::string & | expression, | |
int | cflags = 0 | |||
) |
expression | the regular expression | |
cflags | a combination of CompileFlags |
Tanl::Text::RegExp::Pattern::Pattern | ( | char const * | expression, | |
int | cflags = 0 | |||
) |
expression | the regular expression | |
cflags | a combination of CompileFlags |
Definition at line 42 of file RegExp.cpp.
References CharTables.
std::vector<std::string> Tanl::Text::RegExp::Pattern::match | ( | std::string const & | str, | |
int | eflags = 0 | |||
) |
str | the text to match. | |
eflags | any combination of EvaluateFlags |
int Tanl::Text::RegExp::Pattern::match | ( | std::string const & | text, | |
MatchGroups & | pos, | |||
int | eflags = 0 | |||
) |
Matches the text in text and returns the matching positions in pos, expressed as byte offsets from the start of text.
text | the string to match. | |
pos | the identified matching positions. | |
eflags | any combination of EvaluateFlags |
int Tanl::Text::RegExp::Pattern::match | ( | const char * | start, | |
const char * | end, | |||
MatchGroups & | pos, | |||
int | eflags = 0 | |||
) |
Matches the text between start and end and returns the matching positions in pos, expressed as byte offsets from start.
start | start of the text to match. | |
end | end of the text to match. | |
pos | the identified matching positions. | |
eflags | any combination of EvaluateFlags |
Definition at line 144 of file RegExp.cpp.
References Tanl::Text::RegExp::MatchGroups::size().
Referenced by Tanl::TokenSentenceReader::MoveNext(), and Tanl::ConllXSentenceReader::MoveNext().
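To make "byte offsets from start" concrete, here is a sketch that searches a [start, end) character range and reports one offset pair per group. It uses std::regex instead of Pattern and MatchGroups, so the helper name groupOffsets, its return type, and the (-1, -1) convention for unset groups are all assumptions of this example, not the library's behavior.

```cpp
#include <cstddef>
#include <regex>
#include <utility>
#include <vector>

// Hypothetical helper, not part of RegExp.h: searches [start, end) and
// returns one (begin, end) byte-offset pair per group, measured from `start`.
// A group that did not take part in the match is reported as (-1, -1).
// Returns an empty vector when nothing matches.
std::vector<std::pair<std::ptrdiff_t, std::ptrdiff_t>>
groupOffsets(const std::regex& re, const char* start, const char* end) {
    std::vector<std::pair<std::ptrdiff_t, std::ptrdiff_t>> offsets;
    std::cmatch m;
    if (std::regex_search(start, end, m, re)) {
        for (std::size_t i = 0; i < m.size(); ++i) {
            if (m[i].matched)
                offsets.emplace_back(m[i].first - start, m[i].second - start);
            else
                offsets.emplace_back(-1, -1);
        }
    }
    return offsets;
}
```

For the text "daaab" and the expression "(a*)b", group 0 spans offsets 1 to 5 and group 1 spans offsets 1 to 4, where each end offset points one past the last matched byte.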
int Tanl::Text::RegExp::Pattern::matchSize | ( | std::string const & | text, | |
int | eflags = 0 | |||
) |
Computes the size of the match.
text | the text to match. | |
eflags | any combination of EvaluateFlags. |
Pattern& Tanl::Text::RegExp::Pattern::operator= | ( | Pattern const & | other | ) |
Assignment.
Hack to avoid freeing _pcre twice.
Definition at line 155 of file RegExp.h.
References _errorCode, _pcre, _pcre_extra, and subpatterns.
std::string Tanl::Text::RegExp::Pattern::replace | ( | std::string & | text, | |
std::string & | with, | |||
bool | replaceAll = false | |||
) |
Replaces the first substring matching the expression within text with the string with.
If replaceAll is true, all occurrences are replaced.
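The difference between replacing the first occurrence and replacing all occurrences can be sketched with std::regex_replace. This is a stand-in, not the implementation of Pattern::replace: the helper name replaceSketch is hypothetical, it returns a new string and leaves text untouched, and std::regex_replace interprets sequences such as $1 in the replacement string, which Pattern::replace may or may not do.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of RegExp.h: mirrors the replaceAll switch.
// With replaceAll == false only the first match is replaced; otherwise
// every non-overlapping match is replaced.
std::string replaceSketch(const std::string& text,
                          const std::string& expression,
                          const std::string& with,
                          bool replaceAll = false) {
    std::regex re(expression);
    std::regex_constants::match_flag_type flags =
        replaceAll ? std::regex_constants::format_default
                   : std::regex_constants::format_first_only;
    return std::regex_replace(text, re, with, flags);
}
```

With text "a-b-c", expression "-" and replacement "+", the default call gives "a+b-c", while passing true for replaceAll gives "a+b+c".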
const unsigned char * Tanl::Text::RegExp::Pattern::setLocale | ( | char const * | locale | ) | [static] |
Set the locale for use during matching.
Use "en_US.iso885915" or similar for recognizing ISO 8859-15 (Latin-9) letters.
Definition at line 30 of file RegExp.cpp.
References CharTables.
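The initializer of CharTables passes setlocale(LC_CTYPE, 0) to setLocale, that is, it queries the current locale rather than changing it. The sketch below shows the two standard C library behaviors involved; it does not call Pattern::setLocale, and the helper names currentCtypeLocale and switchCtypeLocale are hypothetical.

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: returns the name of the current LC_CTYPE locale.
// Passing a null pointer to std::setlocale only queries; nothing changes.
std::string currentCtypeLocale() {
    const char* name = std::setlocale(LC_CTYPE, nullptr);
    return name ? name : "";
}

// Hypothetical helper: tries to switch LC_CTYPE to the named locale.
// std::setlocale returns a null pointer when the locale is not available
// on the system, in which case the current locale is left unchanged.
bool switchCtypeLocale(const char* locale) {
    return std::setlocale(LC_CTYPE, locale) != nullptr;
}
```

A program that has not called setlocale starts in the "C" locale, so a locale such as "en_US.iso885915" has to be selected explicitly, and it must be installed on the system for the switch to succeed.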
bool Tanl::Text::RegExp::Pattern::test | ( | char const * | str, | |
size_t | len = 0, | |||
int | eflags = 0 | |||
) |
Tests whether the pattern matches the given string str, within the given length len.
str | the string to match. | |
len | the length of the string to match. | |
eflags | any combination of EvaluateFlags |
Definition at line 91 of file RegExp.cpp.
bool Tanl::Text::RegExp::Pattern::test | ( | std::string const & | str, | |
int | eflags = 0 | |||
) |
Tests whether the pattern matches the given string str.
str | the string to match. | |
eflags | any combination of EvaluateFlags |
Referenced by Parser::State::predicates(), and Parser::ParseState::transition().