DEREKO
DEREKO is a joint effort of IDS Mannheim (acquisition), SfS Tübingen (annotation) and IMS Stuttgart (exploitation).









Linguistic Annotation

Introduction

The main goal of the DEREKO corpus is to provide a large, general-purpose resource for the German language. A linguist or an engineer using such a resource will expect information that is both detailed and reliable. The state of the art in syntactic annotation shows, however, that beyond the level of chunks, automatic syntactic annotation has to deal with rapidly increasing ambiguity. This ultimately lowers annotation quality, slows annotation down, or requires more knowledge, or all three at once.

For DEREKO, a finite-state approach to parsing was adopted to address the problems of speed, accuracy and resources outlined above. First, finite-state grammars can be applied efficiently, so that huge volumes of text can be processed quickly. Second, the phenomena that can be described by finite-state grammars coincide with those syntactic phenomena that are only moderately ambiguous. Annotation with finite-state grammars is still very useful for linguistic research: it reduces the overall syntactic ambiguity, and further annotation can build directly on it. Please browse the documentation for details about the linguistic markup and about the implementation of the robust and efficient annotation system. Please also have a look at a small sample.
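
To illustrate the general idea of finite-state chunking, here is a minimal sketch in Python. It is not the DEREKO annotation system: the automaton, the `NC` chunk label and the function names `match_chunk` and `chunk` are hypothetical. The part-of-speech tags are borrowed from the STTS tagset, on the assumption that the input is already tagged. The automaton recognises simple noun chunks consisting of an optional article, any number of attributive adjectives, and one noun.

```python
# Transition table of a small, hypothetical automaton for noun chunks.
# States: 0 = start, 1 = after article, 2 = after adjective(s), 3 = accepting.
TRANSITIONS = {
    (0, "ART"): 1,
    (0, "ADJA"): 2,
    (0, "NN"): 3,
    (0, "NE"): 3,
    (1, "ADJA"): 2,
    (1, "NN"): 3,
    (1, "NE"): 3,
    (2, "ADJA"): 2,
    (2, "NN"): 3,
    (2, "NE"): 3,
}
ACCEPTING = {3}


def match_chunk(tags, start):
    """Return the end index of a noun chunk starting at `start`, or None."""
    state = 0
    i = start
    while i < len(tags) and (state, tags[i]) in TRANSITIONS:
        state = TRANSITIONS[(state, tags[i])]
        i += 1
        if state in ACCEPTING:
            return i
    return None


def chunk(tagged):
    """Bracket noun chunks in a list of (word, tag) pairs, left to right."""
    tags = [tag for _, tag in tagged]
    result = []
    i = 0
    while i < len(tagged):
        end = match_chunk(tags, i)
        if end is None:
            # No chunk starts here: pass the token through unchanged.
            result.append(tagged[i][0])
            i += 1
        else:
            words = " ".join(word for word, _ in tagged[i:end])
            result.append("[NC " + words + "]")
            i = end
    return result


sentence = [
    ("Der", "ART"), ("alte", "ADJA"), ("Mann", "NN"),
    ("sieht", "VVFIN"),
    ("den", "ART"), ("Hund", "NN"),
    (".", "$."),
]
print(" ".join(chunk(sentence)))
# prints: [NC Der alte Mann] sieht [NC den Hund] .
```

The sketch leaves everything outside a chunk unattached, which mirrors the point made above: the automaton only commits to structure that is cheap to recognise and weakly ambiguous, and leaves harder attachment decisions to later annotation.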





Please contact Tylman Ule for more information. Site last modified Sun Sep 26 2004.