Some thoughts on treebank design and grammars

Hendrik Feddes and Frank Schumacher
Arbeitsbereich Linguistik, Universität Münster

Abstract

One of the most important decisions to make before starting to build a treebank is what kind of grammar to use. Traditional phrase structure grammars have been shown to cause considerable problems for languages like German, which, unlike English, does not have a fixed word order and which exhibits many discontinuous phenomena. We present some ideas for grammar formalisms that try to overcome the limitations of simple phrase structure models while remaining suitable for tasks like grammar induction. We also demonstrate how these grammar conceptions could be translated into annotation schemata for corpora, and show how these schemata can be integrated into the existing corpus annotation scheme in Münster and which tools will have to be added to the existing toolset.

