Corpus Exploitation

Query Collection

The query collection is organized in two layers:
- The first layer of queries builds recursive syntactic structures on top of
the chunk annotation (see Shallow-Parsing
Stylebook for German for more details).
- The second layer of queries extracts corpus evidence for specific
lexicographic tasks.
The query results of the first layer are added to the corpus annotation
for efficiency reasons, so that the basic syntactic analyses do not have
to be reconstructed for each extraction query. Among others, the queries
identify the following syntactic phrases:
- pre-head recursive embedding, e.g., complex APs that embed
PPs and NPs
- post-head recursive embedding of genitive NPs and named entities
Neither construction is part of the chunk analysis.
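
As an illustration of post-head recursive embedding, the following minimal Python sketch attaches genitive NP chunks to the NP chunk that precedes them. It is not the query language used for the collection; the Chunk class, the category label "NP", and the case value "gen" are hypothetical names chosen for this example, and the always-attach-right strategy is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chunk:
    """A flat chunk as delivered by a chunker (hypothetical representation)."""
    category: str                      # e.g. "NP", "VP" (assumed labels)
    text: str
    case: Optional[str] = None         # e.g. "nom", "gen" (assumed values)
    children: List["Chunk"] = field(default_factory=list)


def attach_genitives(chunks: List[Chunk]) -> List[Chunk]:
    """Nest each genitive NP under the NP chunk directly preceding it."""
    result: List[Chunk] = []
    for chunk in chunks:
        prev = result[-1] if result else None
        is_genitive_np = chunk.category == "NP" and chunk.case == "gen"
        if prev is not None and prev.category == "NP" and is_genitive_np:
            # Walk down to the rightmost embedded NP so that chains of
            # genitives nest recursively instead of attaching as siblings.
            host = prev
            while host.children and host.children[-1].category == "NP":
                host = host.children[-1]
            host.children.append(chunk)
        else:
            result.append(chunk)
    return result


flat = [
    Chunk("NP", "das Haus", case="nom"),
    Chunk("NP", "des Vaters", case="gen"),
    Chunk("NP", "meiner Freundin", case="gen"),
    Chunk("VP", "steht"),
]
nested = attach_genitives(flat)
```

In this sketch the three flat NP chunks end up as one recursive NP followed by the verb chunk. A real first-layer query would have to deal with attachment ambiguities that this example ignores.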
In addition, certain lexical properties of terminal nodes are added to the
annotation. These properties are projected from the head to the chunk. The
lexical information is used during the extraction process either to look for
chunks with specific annotations or to exclude them.
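
The projection and its use for filtering can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation behind the query collection: the Token and Chunk classes, the head_index field, the category label "PC", and the property name "temporal" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import AbstractSet, List, Set


@dataclass
class Token:
    form: str
    lexical_props: Set[str] = field(default_factory=set)


@dataclass
class Chunk:
    category: str
    tokens: List[Token]
    head_index: int                    # position of the head token (assumed)
    props: Set[str] = field(default_factory=set)


def project_head_properties(chunk: Chunk) -> Chunk:
    """Copy the lexical properties of the head token onto the chunk."""
    chunk.props |= chunk.tokens[chunk.head_index].lexical_props
    return chunk


def select(chunks: List[Chunk], category: str,
           require: AbstractSet[str] = frozenset(),
           exclude: AbstractSet[str] = frozenset()) -> List[Chunk]:
    """Keep chunks of a category that carry all required and no excluded properties."""
    return [
        c for c in chunks
        if c.category == category
        and require <= c.props
        and not (exclude & c.props)
    ]


chunks = [
    Chunk("PC", [Token("am"), Token("Montag", {"temporal"})], head_index=1),
    Chunk("PC", [Token("im"), Token("Haus")], head_index=1),
]
for chunk in chunks:
    project_head_properties(chunk)

temporal = select(chunks, "PC", require={"temporal"})
non_temporal = select(chunks, "PC", exclude={"temporal"})
```

Because the property sits on the chunk after projection, an extraction query can test the chunk directly and does not need to inspect its terminal nodes again.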
The second layer of queries makes use of both the chunk annotation and the
information added by the first-layer queries in order to extract corpus
evidence for lexicographic and linguistic purposes.