DeSR Dependency Parser

Introduction

DeSR is a dependency parser for natural language sentences.

Among its notable features:

accuracy: close to the state of the art
efficiency: it can parse up to 200 sentences/sec
multilingual: it can be trained on annotated corpora for different languages
customizable: features used in training can be customized.

Technique

DeSR is a shift-reduce dependency parser that uses a variant of the approach of Yamada and Matsumoto (2003). Dependency structures are built by scanning the input from left to right and deciding at each step whether to perform a shift or to create a dependency between two adjacent tokens. DeSR, however, uses a different set of rules and includes additional rules for non-projective dependencies, which allow parsing to be performed deterministically in a single pass. The algorithm also produces fully labeled dependency trees.

A classifier is used for learning and predicting the proper parsing action. The parser can be configured by selecting among several learning algorithms (Averaged Perceptron, Maximum Entropy, memory-based learning using TiMBL, support vector machines using libSVM), by providing user-defined feature models, and by selecting input-output formats (including the CoNLL shared task format).
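
The shift-or-attach loop can be illustrated with a minimal Python sketch. This is not DeSR's actual rule set: it is a simplified scheme with three actions (SHIFT, LEFT, RIGHT) that handles only projective trees and produces unlabeled arcs. The trained classifier is replaced by a hypothetical oracle that reads the gold heads. All names used here (parse, make_oracle, the action strings) are assumptions made for the example.

```python
def parse(n_tokens, choose_action):
    """Parse tokens 1..n_tokens; return {dependent: head}, where 0 is the root."""
    stack = [0]                              # 0 is an artificial root node
    buffer = list(range(1, n_tokens + 1))    # tokens still to be read
    heads = {}
    while buffer or len(stack) > 1:
        action = choose_action(stack, buffer, heads)
        if action == "LEFT" and len(stack) >= 2 and stack[-2] != 0:
            # The token below the top becomes a dependent of the top.
            dep = stack.pop(-2)
            heads[dep] = stack[-1]
        elif action == "RIGHT" and len(stack) >= 2:
            # The top becomes a dependent of the token below it.
            dep = stack.pop()
            heads[dep] = stack[-1]
        elif buffer:
            # SHIFT (also used when the requested action is not applicable).
            stack.append(buffer.pop(0))
        else:
            # Nothing left to shift: attach the top downward so the loop ends.
            dep = stack.pop()
            heads[dep] = stack[-1]
    return heads


def make_oracle(gold):
    """Stand-in for the classifier: picks actions by looking at gold heads."""
    def choose(stack, buffer, heads):
        if len(stack) >= 2:
            below, top = stack[-2], stack[-1]
            if below != 0 and gold[below] == top:
                return "LEFT"
            # Attach the top only once all of its own dependents are attached.
            top_done = all(heads.get(d) == top for d, h in gold.items() if h == top)
            if gold[top] == below and top_done:
                return "RIGHT"
        return "SHIFT"
    return choose


# "She eats fish": token 2 (eats) is the root, tokens 1 and 3 depend on it.
gold = {1: 2, 2: 0, 3: 2}
result = parse(3, make_oracle(gold))
print(result == gold)
```

In a trained parser the choose_action function would be a classifier applied to features of the current state, which is where the learning algorithms and feature models mentioned above come in.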

Download

Sources can be downloaded from SourceForge. The following precompiled binaries for Linux (Fedora Core 5) and supporting files are also available:

Linux/Unix: desr (32-bit version), desr64 (64-bit version)
sample configuration file: desr.conf
model for Spanish: spanish.AP.gz

Usage

If the parser and the configuration file are in the same directory, run:

   desr -t -m modelFile trainFile

to produce a model from a training corpus in CoNLL format.

To parse sentences in CoNLL format, use:

   desr -m modelFile parseFile > parsedFile
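
For readers unfamiliar with the input format, the following Python sketch reads sentences from a CoNLL file. It assumes the ten-column, tab-separated CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) with a blank line between sentences. The function name read_conll and the sample sentence with its tags are made up for the illustration and are not part of DeSR.

```python
from typing import Dict, Iterable, Iterator, List

# Column names of the CoNLL-X format, in order.
COLUMNS = ["id", "form", "lemma", "cpostag", "postag",
           "feats", "head", "deprel", "phead", "pdeprel"]


def read_conll(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(dict(zip(COLUMNS, line.split("\t"))))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


# Hypothetical sample; tag and label names are illustrative only.
SAMPLE = (
    "1\tShe\tshe\tPRON\tPRP\t_\t2\tSBJ\t_\t_\n"
    "2\teats\teat\tVERB\tVBZ\t_\t0\tROOT\t_\t_\n"
    "3\tfish\tfish\tNOUN\tNN\t_\t2\tOBJ\t_\t_\n"
    "\n"
)

sentences = list(read_conll(SAMPLE.splitlines()))
for token in sentences[0]:
    print(token["form"], token["head"], token["deprel"])
```

To read a real file, pass an open file object instead of SAMPLE.splitlines(), since a file object also iterates line by line.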

If you plan to use the downloaded model file, first gunzip it.

For a full list of options, type:

   desr -h

Several classifiers are available, including Maximum Entropy (-aME), Averaged Perceptron (-aAP), memory-based learning (-aMBL) and SVM (-aSVM). The algorithm, as well as the features to be used, can also be specified in the configuration file desr.conf.
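
When the parser is driven from a script, the command lines shown in this section can be assembled programmatically. The Python sketch below is a hypothetical wrapper, not part of DeSR: it uses only the options listed in this document (-t, -m and the -a algorithm switches) and assumes they can be combined in the order shown. The names desr_command and run_parse are invented for the example.

```python
import subprocess

# Algorithm switches as listed in this document.
ALGORITHMS = {"ME": "-aME", "AP": "-aAP", "MBL": "-aMBL", "SVM": "-aSVM"}


def desr_command(model_file, data_file, train=False, algorithm=None, binary="desr"):
    """Build the argument list for a training or parsing run."""
    argv = [binary]
    if train:
        argv.append("-t")
    if algorithm is not None:
        argv.append(ALGORITHMS[algorithm])   # raises KeyError for unknown names
    argv += ["-m", model_file, data_file]
    return argv


def run_parse(model_file, parse_file, parsed_file, binary="desr"):
    """Parse a file, redirecting standard output as '> parsedFile' does."""
    with open(parsed_file, "w", encoding="utf-8") as out:
        subprocess.run(desr_command(model_file, parse_file, binary=binary),
                       stdout=out, check=True)


print(" ".join(desr_command("modelFile", "trainFile", train=True, algorithm="SVM")))
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.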

Use the SecondOrder option with care, since it may considerably increase the model size.

References

G. Attardi. 2006. Experiments with a Multilanguage Non-Projective Dependency Parser. Proc. of the Tenth Conference on Natural Language Learning, New York, NY.
G. Attardi and M. Ciaramita. 2007. Tree Revision Learning for Dependency Parsing. Proc. of the Human Language Technology Conference 2007.
G. Attardi, A. Chanev, M. Ciaramita, F. Dell'Orletta and M. Simi. 2007. Multilingual Dependency Parsing and Domain Adaptation using DeSR. Proc. of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, Prague.