Linguistic Annotation
Introduction
The main goal of the DEREKO corpus is to provide a large general-purpose
resource for the German language. A linguist or an engineer using such a
resource will expect detailed yet reliable information. The state of the art
in syntactic annotation, however, shows that beyond the syntactic level of
chunks, automatic annotation has to deal with rapidly increasing
ambiguity, which ultimately decreases the quality of the annotation,
slows down annotation speed, requires more knowledge, or all three at
once.
For DEREKO, a finite-state approach to parsing was adopted to address the
problems of speed, accuracy and resources outlined above. First, finite-state
grammars can be applied efficiently, so that huge volumes of text can be
processed quickly. Second, the phenomena that can be described by finite-state
grammars coincide with those syntactic phenomena that are only moderately
ambiguous. Annotation using finite-state grammars is still very useful for
linguistic research, because it reduces the overall syntactic ambiguity, and
further annotation can build directly on it. Please browse the
documentation for details about the linguistic markup and about the
implementation of the robust and efficient annotation system. Please also have
a look at a small sample.
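
To make the idea of finite-state chunking concrete, the following is a minimal
sketch and not the DEREKO grammar itself. It assumes the input is already
part-of-speech tagged with STTS-style tags (ART, ADJA, NN); the example
sentence, the pattern and the function name chunk_nouns are hypothetical and
chosen only for illustration. A noun chunk is described by a regular
expression over the tag sequence, so it can be recognised in a single
left-to-right pass.

```python
import re

# Hypothetical POS-tagged input using STTS-style tags (an assumption of this sketch).
TAGGED = [
    ("Der", "ART"), ("kleine", "ADJA"), ("Hund", "NN"),
    ("sieht", "VVFIN"),
    ("eine", "ART"), ("Katze", "NN"),
]

# Noun chunk: optional article, any number of attributive adjectives, one noun.
# Each tag is encoded as "<TAG>" so the expression runs over tags, not words.
NOUN_CHUNK = re.compile(r"(?:<ART>)?(?:<ADJA>)*<NN>")


def chunk_nouns(tagged):
    """Return noun chunks as lists of words, found in one left-to-right pass."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)

    # Map the character offset where each tag starts back to its token index.
    offsets = {}
    pos = 0
    for i, (_, tag) in enumerate(tagged):
        offsets[pos] = i
        pos += len(tag) + 2  # +2 for the enclosing "<" and ">"
    offsets[pos] = len(tagged)

    chunks = []
    for m in NOUN_CHUNK.finditer(tag_string):
        start, end = offsets[m.start()], offsets[m.end()]
        chunks.append([word for word, _ in tagged[start:end]])
    return chunks


print(chunk_nouns(TAGGED))
```

Because the pattern never looks beyond the chunk level, it does not have to
decide attachment questions such as where a prepositional phrase belongs,
which is where most of the ambiguity mentioned above arises.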