DEREKO - Corpus Exploitation - Query Collection

DEREKO is a joint effort of:
Acquisition: IDS Mannheim
Annotation: SfS Tübingen
Exploitation: IMS Stuttgart



Corpus Exploitation
	Query Collection

The query collection is organized in two layers:

The first layer of queries builds recursive syntactic structures on top of the chunk annotation (see the Shallow-Parsing Stylebook for German for more details).
The second layer of queries extracts corpus evidence for specific lexicographic tasks.

For efficiency, the results of the first-layer queries are added to the corpus annotation, so the basic syntactic analyses do not have to be reconstructed for every extraction query. Among other things, these queries identify the following syntactic structures:

pre-head recursive embedding, e.g., complex APs that embed PPs and NPs;
post-head recursive embedding of genitive NPs and named entities.

Neither construction is part of the chunk analysis.
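The two first-layer constructions can be illustrated with a small sketch. It is not the actual DEREKO query code. It assumes a hypothetical chunk representation: the labels `NC`, `PC`, `AC` and `DET`, and a `case` feature, chosen only for this example. It also uses deliberately naive rules:

* PC/NC chunks directly preceding an AC chunk are merged into a complex AP.
* Each genitive NC that follows a noun chunk is attached to it as a post-head dependent.

Named entities would be handled analogously and are left out here.

```python
from dataclasses import dataclass, field


# Hypothetical chunk representation (not the DEREKO annotation format):
# a label, the tokens covered, an optional case feature, and sub-structures.
@dataclass
class Chunk:
    label: str
    tokens: list
    case: str = ""
    children: list = field(default_factory=list)


def build_pre_head_ap(chunks):
    """Merge PC/NC chunks that directly precede an AC chunk into one
    complex AP (pre-head recursive embedding). Naive illustrative rule."""
    result, pending = [], []
    for chunk in chunks:
        if chunk.label in ("PC", "NC"):
            # Could be a dependent of a following adjective; hold it back.
            pending.append(chunk)
        elif chunk.label == "AC" and pending:
            tokens = [t for c in pending for t in c.tokens] + chunk.tokens
            result.append(Chunk("AP", tokens, children=pending + [chunk]))
            pending = []
        else:
            # No adjective follows: release held-back chunks unchanged.
            result.extend(pending)
            pending = []
            result.append(chunk)
    result.extend(pending)
    return result


def attach_post_head_genitives(chunks):
    """Attach each genitive NC to the noun chunk before it. Because the
    merged NP can absorb further genitives, the embedding is recursive."""
    result = []
    for chunk in chunks:
        if (chunk.label == "NC" and chunk.case == "gen"
                and result and result[-1].label in ("NC", "NP")):
            head = result.pop()
            result.append(Chunk("NP", head.tokens + chunk.tokens,
                                head.case, children=[head, chunk]))
        else:
            result.append(chunk)
    return result


# "das auf dem Tisch liegende Buch" with hypothetical chunk labels
sentence = [Chunk("DET", ["das"]), Chunk("PC", ["auf", "dem", "Tisch"]),
            Chunk("AC", ["liegende"]), Chunk("NC", ["Buch"])]
print([c.label for c in build_pre_head_ap(sentence)])
```

Running the first function on the sample sentence yields the labels `DET`, `AP`, `NC`. Applying the second function to "das Haus / des Vaters / des Freundes" folds all three chunks into a single NP. Storing such merged structures once in the annotation is what spares later queries from rebuilding them.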

In addition, certain lexical properties of terminal nodes are added and projected from the head to its chunk. During extraction, this lexical information is used either to select chunks with specific annotations or to exclude them.
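Head-to-chunk projection and the select/exclude step can be sketched as follows. The `Token` and `Chunk` classes, the `head_index` field and the property names (`temporal`, `concrete`) are hypothetical placeholders for whatever lexical properties the real annotation carries.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str
    props: set = field(default_factory=set)  # hypothetical lexical properties


@dataclass
class Chunk:
    label: str
    tokens: list
    head_index: int  # position of the head token inside the chunk (assumed)
    props: set = field(default_factory=set)


def project_head_properties(chunk):
    """Copy the lexical properties of the head token onto the chunk."""
    chunk.props = set(chunk.tokens[chunk.head_index].props)
    return chunk


def filter_chunks(chunks, require=frozenset(), exclude=frozenset()):
    """Keep chunks carrying all required properties and none of the excluded ones."""
    return [c for c in chunks if require <= c.props and not (exclude & c.props)]


book = project_head_properties(
    Chunk("NC", [Token("das"), Token("Buch", {"concrete"})], head_index=1))
monday = project_head_properties(
    Chunk("NC", [Token("der"), Token("Montag", {"temporal"})], head_index=1))
print(filter_chunks([book, monday], exclude={"temporal"}))
```

Projecting the properties once onto the chunk means an extraction query only has to inspect the chunk, not locate its head token each time.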

The second layer of queries makes use of both the chunk annotation and the information added by the first-layer queries to extract corpus evidence for lexicographic and linguistic purposes.
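A second-layer query can then combine both sources of information. The following sketch is a hypothetical lexicographic task, not one taken from the actual query collection: it counts adjective–noun pairs.

* An adjectival chunk (`AC`) or a first-layer complex AP directly precedes a noun chunk.
* Noun chunks whose projected properties include a hypothetical `NE` marker are excluded.

Chunks are reduced to a label, a head lemma and a property set for brevity.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chunk:
    label: str  # chunk label or first-layer label, e.g. "AC", "AP", "NC"
    head: str   # lemma of the head (hypothetical field)
    props: set = field(default_factory=set)  # properties projected from the head


def extract_adj_noun_pairs(sentences, exclude=frozenset({"NE"})):
    """Count (adjective head, noun head) pairs where an AC or complex AP
    directly precedes an NC whose projected properties avoid `exclude`."""
    counts = Counter()
    for chunks in sentences:
        for left, right in zip(chunks, chunks[1:]):
            if (left.label in ("AC", "AP") and right.label == "NC"
                    and not (right.props & exclude)):
                counts[(left.head, right.head)] += 1
    return counts


sentences = [
    [Chunk("AP", "liegend"), Chunk("NC", "Buch")],
    [Chunk("AC", "rot"), Chunk("NC", "Buch")],
    [Chunk("AC", "rot"), Chunk("NC", "Buch")],
    [Chunk("AC", "neu"), Chunk("NC", "Mannheim", {"NE"})],
]
print(extract_adj_noun_pairs(sentences).most_common())
```

The query never rebuilds the complex AP itself. It simply reads the `AP` label and the projected properties left by the first layer, which is the division of labour described above.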

Please contact kcl@ims.uni-stuttgart.de for more information. Site last modified Sun Sep 26 2004.