INTRODUCTION
DeSR is a dependency parser for natural language sentences.
Among its notable features:
- accuracy: close to the state of the art
- efficiency: it can parse up to 200 sentences/sec
- multilingual: it can be trained on annotated corpora in different languages
- customizable: features used in training can be customized.
Technique
DeSR is a shift-reduce dependency parser, which uses a variant of the approach
of Yamada and Matsumoto (2003).
Dependency structures are built by scanning the input from left to right and
deciding at each step whether to perform a shift or to create a dependency
between two adjacent tokens.
DeSR, however, uses a different set of rules and includes additional rules for
handling non-projective dependencies, which allow parsing to be performed
deterministically in a single pass.
The algorithm also produces fully labeled dependency trees.
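The single-pass, left-to-right procedure described above can be illustrated with a minimal sketch. It assumes a generic three-action transition system (SHIFT, LEFT, RIGHT) over a stack and an input buffer; it is not DeSR's actual rule set, it omits the non-projective rules and the dependency labels, and the names parse and make_oracle are hypothetical. A lookup into known gold heads stands in for the trained classifier.

```python
def parse(words, choose_action):
    """Build a head index for every word in one left-to-right pass.

    Token 0 is an artificial root. `choose_action(stack, buffer)` plays the
    role of the classifier and returns "SHIFT", "LEFT" or "RIGHT".
    """
    heads = [None] * (len(words) + 1)
    stack = [0]                                # partially processed tokens
    buffer = list(range(1, len(words) + 1))    # tokens still to be read
    while buffer or len(stack) > 1:
        action = choose_action(stack, buffer)
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))
        elif action == "LEFT" and len(stack) > 2:
            # The token below the top becomes a dependent of the top token.
            dependent = stack.pop(-2)
            heads[dependent] = stack[-1]
        elif action == "RIGHT" and len(stack) > 1:
            # The top token becomes a dependent of the token below it.
            dependent = stack.pop()
            heads[dependent] = stack[-1]
        else:
            raise ValueError(f"action {action!r} not applicable")
    return heads[1:]


def make_oracle(gold_heads):
    """Return a stand-in for the classifier that reads actions off a gold tree.

    `gold_heads[i - 1]` is the head of token i (0 means root). Only
    projective trees are supported by this toy oracle.
    """
    gold = [None] + list(gold_heads)

    def choose_action(stack, buffer):
        if len(stack) > 2 and gold[stack[-2]] == stack[-1]:
            return "LEFT"
        if (len(stack) > 1 and gold[stack[-1]] == stack[-2]
                and all(gold[b] != stack[-1] for b in buffer)):
            # Attach the top token only once all its dependents are attached.
            return "RIGHT"
        return "SHIFT"

    return choose_action


words = ["She", "eats", "fresh", "fish"]
gold_heads = [2, 0, 4, 2]
predicted_heads = parse(words, make_oracle(gold_heads))
```

In a real parser the oracle is used only to generate training examples; at parsing time the classifier chooses the action from features of the current stack and buffer.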
A classifier is used for learning and predicting the proper parsing
action.
The parser can be configured by selecting among several learning algorithms
(Averaged Perceptron, Maximum Entropy, memory-based learning using TiMBL,
support vector machines using libSVM), by providing user-defined feature models,
and by selecting input/output formats (including the CoNLL shared task format).
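Of the learning algorithms listed above, the averaged perceptron is the easiest to sketch. The example below is a minimal, generic multiclass averaged perceptron that maps a set of string features to a parsing action. It is not DeSR's implementation: the class name, the feature strings and the naive averaging scheme are all assumptions made for illustration.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Toy multiclass perceptron over sets of string features."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.weights = defaultdict(float)  # (feature, action) -> weight
        self.totals = defaultdict(float)   # sum of each weight over all steps
        self.steps = 0

    def score(self, features, action):
        return sum(self.weights.get((f, action), 0.0) for f in features)

    def predict(self, features):
        # Ties are broken in favour of the action listed first.
        return max(self.actions, key=lambda a: self.score(features, a))

    def update(self, features, gold_action):
        """One online step: correct the weights if the guess was wrong."""
        self.steps += 1
        guess = self.predict(features)
        if guess != gold_action:
            for f in features:
                self.weights[(f, gold_action)] += 1.0
                self.weights[(f, guess)] -= 1.0
        # Naive averaging: accumulate the current weights after every step.
        for key, value in self.weights.items():
            self.totals[key] += value

    def averaged_weights(self):
        """Weights averaged over all update steps."""
        return {k: v / self.steps for k, v in self.totals.items()}


def train(model, examples, epochs=5):
    for _ in range(epochs):
        for features, gold_action in examples:
            model.update(features, gold_action)
    return model


# Hypothetical training examples: features of a parser state and the action taken.
examples = [
    ({"top=DET", "next=NOUN"}, "LEFT"),
    ({"top=VERB", "next=NOUN"}, "RIGHT"),
    ({"top=ROOT"}, "SHIFT"),
]
model = train(AveragedPerceptron(["SHIFT", "LEFT", "RIGHT"]), examples)
```

Averaging the weights over all steps usually generalizes better than keeping only the final weights; production implementations compute the average lazily instead of touching every weight at every step.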
Download
Sources can be downloaded from SourceForge. The following files are also
available, with binaries precompiled for Linux Fedora Core 5:
- Linux/Unix: desr (32 bit version), desr64 (64 bit version)
- sample configuration file: desr.conf
- model for Spanish: spanish.AP.gz
Usage
With the parser and the configuration file in the same directory, run:
desr -t -m modelFile trainFile
to produce a model from a training corpus in CoNLL format.
To parse sentences in CoNLL format, use:
desr -m modelFile parseFile > parsedFile
If you plan to use the downloaded model file, first gunzip it.
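Both commands above read and write files in CoNLL format. As a rough illustration, the sketch below reads such a file, assuming the ten-column, tab-separated CoNLL-X layout with a blank line between sentences. Whether a given DeSR configuration uses exactly this variant is an assumption, and the names CONLL_FIELDS and read_conll are hypothetical.

```python
# Column names of the CoNLL-X shared task format (assumed layout).
CONLL_FIELDS = [
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
]


def read_conll(lines):
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        token = dict(zip(CONLL_FIELDS, line.split("\t")))
        # HEAD is an index into the sentence (0 = root); "_" means unknown.
        if token.get("head", "_").isdigit():
            token["head"] = int(token["head"])
        sentence.append(token)
    if sentence:
        yield sentence


sample = [
    "1\tShe\tshe\tPRP\tPRP\t_\t2\tSBJ\t_\t_",
    "2\teats\teat\tVBZ\tVBZ\t_\t0\tROOT\t_\t_",
    "",
    "1\tFish\tfish\tNN\tNN\t_\t0\tROOT\t_\t_",
]
sentences = list(read_conll(sample))
```

To read a parsed file, pass an open file object instead of the sample list, for example read_conll(open("parsedFile", encoding="utf-8")).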
For a full list of options, type:
desr -h
Several classifiers are available, including
Maximum Entropy (-aME), Averaged Perceptron (-aAP), MBL (-aMBL) and SVM (-aSVM).
The algorithm, as well as the features to be used, can also be specified in
the configuration file desr.conf.
Use the SecondOrder option with care, since it may considerably
increase the model size.
References