DeSR Dependency Parser |
Namespaces | |
namespace | RegExp |
Regular Expression matching. | |
namespace | Unicode |
Utilities to handle UTF-8 strings. | |
Classes | |
class | Char |
Representation of Unicode characters. More... | |
class | Utf8Char |
This is just a type specifier for use in CharBuffer. More... | |
class | CChar |
This is just a type specifier for use in CharBuffer. More... | |
class | CharBuffer |
A text buffer that provides a random access iterator through it. More... | |
class | Encoding |
class | HtmlTokenizer |
Similar to StringTokenizer, except that it skips HTML tags. More... | |
class | StreamTokenizer |
class | String |
String class. This class stores and manipulates strings of characters defined according to ISO 10646. More... | |
class | StringTokenizer |
class | Suffixes |
List of string suffixes. More... | |
struct | eqstr |
struct | eqstrcase |
struct | WordIndex |
Associates an ID to each word in a set. More... | |
class | WordSetBase |
class | WordSet |
Set of words. More... | |
struct | NormEqual |
Compare strings by normalizing to lowercase and discarding non-alphanumeric characters. More... | |
struct | NormHash |
class | NormWordSet |
Typedefs | |
typedef unsigned short | UCS2 |
UCS2 holds a single UTF-16 code unit. | |
typedef int | UCS4 |
UCS4 represents a Unicode code point. | |
Functions | |
char | iso8859_to_ascii (char c) |
Convert an 8-bit ISO 8859-1 (Latin 1) character to its closest 7-bit ASCII equivalent. | |
bool | operator== (const String &s1, const String &s2) |
bool | operator== (const String &s1, const std::string &s2) |
bool | operator== (const String &s1, const char *s2) |
bool | operator== (const std::string &s1, const String &s2) |
bool | operator== (const char *s1, const String &s2) |
bool | operator!= (const String &s1, const String &s2) |
bool | operator< (const String &s1, const String &s2) |
bool | operator> (const String &s1, const String &s2) |
bool | operator<= (const String &s1, const String &s2) |
bool | operator>= (const String &s1, const String &s2) |
String | operator+ (const String &s1, const String &s2) |
String | operator+ (const String &s1, String::CharType *c) |
String | operator+ (String::CharType *c, const String &s1) |
String | operator+ (const String &s1, String::CharType c) |
String | operator+ (String::CharType c, const String &s1) |
bool | strStartsWith (const char *s1, const char *init) |
Determine whether string s1 starts with the sequence in init, disregarding case. | |
void | itoa (register long n, register char *s) |
Convert a long integer to a string. | |
void | to_lower (register char *d, register char const *s) |
Convert a string to lower case. | |
char * | to_lower (register char *s) |
Destructively convert a string to lower case. | |
string & | to_lower (string &s) |
Convert a string to lower case. | |
void | to_upper (register char *d, register char const *s) |
Convert a string to upper case. | |
char * | to_upper (register char *s) |
Destructively convert a string to upper case. | |
string & | to_upper (string &s) |
Convert a string to upper case. | |
char const * | next_token (char const *&ptr, const char *sep, char esc) |
Simple string tokenizer, with escape. | |
char * | strstr (const char *haystack, const char *needle, size_t count) |
Variant of strstr() which limits search to count characters in haystack. | |
std::string | operator+ (const std::string s, const int i) |
std::string | operator+ (const int i, const std::string s) |
std::string | operator+ (const std::string s, const unsigned i) |
std::string | operator+ (const unsigned i, const std::string s) |
void | itoa (long, char *) |
String utilities. | |
char | to_lower (char c) |
char * | to_lower (char *) |
std::string & | to_lower (std::string &) |
char | to_upper (char c) |
char * | to_upper (char *) |
std::string & | to_upper (std::string &) |
int | strncasecmp (const char *s1, const char *s2) |
bool | strempty (const char *s) |
Test for empty string. | |
Variables | |
char const | iso8859_map [256] |
char Tanl::Text::iso8859_to_ascii | ( | char | c | ) | [inline] |
Convert an 8-bit ISO 8859-1 (Latin 1) character to its closest 7-bit ASCII equivalent.
(This mostly means that accents are stripped.)
This function exists to ensure that the value of the character used to index the iso8859_map[] vector declared above is unsigned.
c | The character to be converted. |
International Standards Organization. "ISO 8859-1: Information Processing -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1," 1987.
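The point about unsigned indexing can be illustrated with a minimal sketch. It is not the DeSR implementation: the table built by make_map() is a hypothetical stand-in for iso8859_map[256] with only a handful of entries filled in, and the sketch namespace is likewise an assumption. What it shows is the cast to unsigned char before the lookup, which matters on platforms where plain char is signed and bytes above 0x7F would otherwise produce a negative index.

```cpp
#include <array>

namespace sketch {

// Hypothetical stand-in for iso8859_map[256]: identity for 7-bit ASCII,
// '?' for everything else, plus a few hand-picked Latin-1 entries.
inline std::array<char, 256> make_map() {
    std::array<char, 256> m{};
    for (int i = 0; i < 256; ++i)
        m[i] = i < 128 ? static_cast<char>(i) : '?';
    m[0xC0] = 'A';  // A with grave accent
    m[0xE0] = 'a';  // a with grave accent
    m[0xE9] = 'e';  // e with acute accent
    m[0xF1] = 'n';  // n with tilde
    m[0xFC] = 'u';  // u with diaeresis
    return m;
}

inline char iso8859_to_ascii(char c) {
    static const std::array<char, 256> map = make_map();
    // Where char is signed, bytes >= 0x80 are negative; cast before indexing.
    return map[static_cast<unsigned char>(c)];
}

}  // namespace sketch
```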
void Tanl::Text::itoa | ( | register long | n, | |
register char * | s | |||
) |
Convert a long integer to a string.
n | The long integer to be converted. | |
s | A pointer to the string. |
Definition at line 59 of file strings.cpp.
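A conversion with this signature could look like the sketch below. This is an illustration under assumptions, not the code in strings.cpp: it assumes decimal output, a leading minus sign for negative values, and a caller-supplied buffer large enough for the digits, the sign, and the terminating NUL. The format_long() helper and the sketch namespace are hypothetical and exist only to make the example easy to call.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

namespace sketch {

// Writes the decimal form of n into s. The caller must provide room for
// the digits, an optional sign, and the terminating NUL.
inline void itoa(long n, char* s) {
    assert(s != nullptr);
    // Negate through unsigned arithmetic so the most negative long is safe.
    unsigned long u = n < 0 ? 0UL - static_cast<unsigned long>(n)
                            : static_cast<unsigned long>(n);
    char* p = s;
    do {  // emit digits least significant first
        *p++ = static_cast<char>('0' + u % 10);
        u /= 10;
    } while (u != 0);
    if (n < 0) *p++ = '-';
    *p = '\0';
    std::reverse(s, p);  // put the sign and digits in reading order
}

// Hypothetical convenience wrapper around the buffer-based form.
inline std::string format_long(long n) {
    char buf[24];  // enough for a 64-bit long with sign and NUL
    itoa(n, buf);
    return std::string(buf);
}

}  // namespace sketch
```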
char const * Tanl::Text::next_token | ( | char const *& | ptr, | |
const char * | sep, | |||
char | esc | |||
) |
Simple string tokenizer, with escape.
A token is a sequence of characters delimited by characters in sep, except when preceded by esc.
ptr | Advanced to the end of the token. |
sep | Sequence of delimiting characters. |
esc | Escape character for line continuation. |
Definition at line 223 of file strings.cpp.
References next_token().
Referenced by next_token().
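The description leaves several details open, so the following is only a sketch of one plausible reading. It assumes that leading separators are skipped, that the return value points to the start of the token (or is null when no token remains), that the token spans from the returned pointer up to the updated ptr, and that an esc character keeps the following character inside the token. The actual function in strings.cpp may differ on any of these points; the split() helper and the sketch namespace are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

namespace sketch {

// Returns the start of the next token and advances ptr to its end,
// or returns nullptr when only separators (or nothing) remain.
inline const char* next_token(const char*& ptr, const char* sep, char esc) {
    assert(ptr != nullptr && sep != nullptr);
    // Skip leading separators.
    while (*ptr != '\0' && std::strchr(sep, *ptr) != nullptr) ++ptr;
    if (*ptr == '\0') return nullptr;
    const char* start = ptr;
    while (*ptr != '\0') {
        if (*ptr == esc && ptr[1] != '\0') {
            ptr += 2;  // keep the escaped character inside the token
            continue;
        }
        if (std::strchr(sep, *ptr) != nullptr) break;  // unescaped separator
        ++ptr;
    }
    return start;
}

// Hypothetical helper: collect all tokens of text as strings.
inline std::vector<std::string> split(const char* text, const char* sep,
                                      char esc) {
    std::vector<std::string> out;
    const char* cur = text;
    while (const char* tok = next_token(cur, sep, esc))
        out.emplace_back(tok, cur);  // token is the range [tok, cur)
    return out;
}

}  // namespace sketch
```

Under these assumptions, splitting `a b\ c d` on spaces with a backslash as esc yields three tokens, the second of which still contains the backslash and the escaped space.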
bool Tanl::Text::strempty | ( | const char * | s | ) | [inline] |
string& Tanl::Text::to_lower | ( | string & | s | ) |
Convert a string to lower case.
s | The string to be converted. |
Definition at line 121 of file strings.cpp.
char* Tanl::Text::to_lower | ( | register char * | s | ) |
Destructively convert a string to lower case.
s | The string to be converted. |
Definition at line 105 of file strings.cpp.
References to_lower().
void Tanl::Text::to_lower | ( | register char * | d, | |
register char const * | s | |||
) |
Convert a string to lower case.
d | The destination string. | |
s | The string to be converted. |
Definition at line 90 of file strings.cpp.
Referenced by strStartsWith(), and to_lower().
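The three to_lower() overloads could relate to each other as in the sketch below. The cross-references above suggest that the in-place form calls the copying form, and the sketch follows that layout, but the bodies are assumptions rather than the code in strings.cpp. In particular, the use of std::tolower in the default C locale and the sketch namespace are choices made for this illustration only.

```cpp
#include <cassert>
#include <cctype>
#include <string>

namespace sketch {

// Copying form: writes the lower-cased s into d, including the NUL.
// d must have room for strlen(s) + 1 characters; d == s is allowed.
inline void to_lower(char* d, const char* s) {
    assert(d != nullptr && s != nullptr);
    while (*s != '\0') {
        // Cast to unsigned char: std::tolower is undefined for negative
        // values other than EOF.
        *d++ = static_cast<char>(std::tolower(static_cast<unsigned char>(*s++)));
    }
    *d = '\0';
}

// Destructive form: converts s in place by reusing the copying form.
inline char* to_lower(char* s) {
    to_lower(s, s);
    return s;
}

// std::string form: converts in place and returns the same object.
inline std::string& to_lower(std::string& s) {
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

}  // namespace sketch
```

The to_upper() overloads would mirror this structure with std::toupper in place of std::tolower.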
string& Tanl::Text::to_upper | ( | string & | s | ) |
Convert a string to upper case.
s | The string to be converted. |
Definition at line 172 of file strings.cpp.
char* Tanl::Text::to_upper | ( | register char * | s | ) |
Destructively convert a string to upper case.
s | The string to be converted. |
Definition at line 156 of file strings.cpp.
References to_upper().
void Tanl::Text::to_upper | ( | register char * | d, | |
register char const * | s | |||
) |
Convert a string to upper case.
d | The destination string. | |
s | The string to be converted. |
Definition at line 141 of file strings.cpp.
Referenced by to_upper().