LRE 62-080:
Adapting bilingual dictionaries for on-line COMPrehension ASSistance
The aim is to implement two bilingual dictionaries (English-French and German-English) as on-line, context-sensitive comprehension dictionaries. The project presupposes that a user has a text on an electronic medium and wants to read it. Clicking on a word will display a context-dependent translation and, on request, background information (up to the full dictionary entry) in the user's mother tongue. The system will reveal whether the word is part of a multi-word idiom and will select the appropriate translation depending on the syntactic context.
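As a rough illustration of the idiom check described above, the following minimal sketch looks for the longest multi-word entry that covers the clicked word. It is not taken from LOCOLEX: the function name `find_idiom`, the representation of idioms as tuples of lemmas, and the toy data are all assumptions made for this example.

```python
def find_idiom(lemmas, clicked, idioms):
    """Return (start, end, translation) for the longest idiom that covers
    the token at index `clicked`, or None if no idiom covers it.

    `lemmas` is the lemmatised sentence; `idioms` maps tuples of lemmas to
    a translation (a hypothetical layout, not the real dictionary format).
    """
    max_len = max((len(key) for key in idioms), default=0)
    # Try longer idioms first so that the longest match wins.
    for length in range(max_len, 1, -1):
        first = max(0, clicked - length + 1)
        last = min(clicked, len(lemmas) - length)
        for start in range(first, last + 1):
            key = tuple(lemmas[start:start + length])
            if key in idioms:
                return start, start + length, idioms[key]
    return None


# Toy data, invented for illustration only.
idioms = {("kick", "the", "bucket"): "casser sa pipe"}
lemmas = ["he", "kick", "the", "bucket", "yesterday"]

print(find_idiom(lemmas, 3, idioms))  # (1, 4, 'casser sa pipe')
print(find_idiom(lemmas, 0, idioms))  # None
```

Matching on lemmas rather than surface forms is what lets an inflected form such as "kicked" be recognised as part of the idiom; a real system would also have to allow for idioms with variable or discontinuous parts, which this sketch ignores.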
The starting points of the project are the English-French SGML-marked, machine-readable Oxford-Hachette dictionary, the type-setter tape of the Collins German-English dictionary, and a prototype called LOCOLEX that is already under development by the coordinator. The Compass project will improve this prototype by tuning its performance, adding the German-English language pair, adapting it to the specific needs of comprehending a foreign language, and implementing a user interface that integrates LOCOLEX into the user's environment.
The LOCOLEX prototype carries out a morphological analysis of the sentence in which the selected word occurs and a stochastic disambiguation of the word class information. The information is then matched against the dictionary. Each entry is structured as a tree: when a word with several meanings occurs in a context that offers no exploitable features for selecting the appropriate sense, the information associated with the most general node is displayed, allowing the user to zoom into the appropriate sub-sense.
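The fallback to the most general node can be sketched as follows. This is only an illustration under stated assumptions, not the LOCOLEX implementation: the `SenseNode` class, the use of feature sets to license a sense, and the toy entry are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SenseNode:
    """One node of a dictionary entry arranged as a tree (hypothetical layout)."""
    translation: str
    features: frozenset = frozenset()  # context features this sense requires
    children: list = field(default_factory=list)


def select_sense(node, context):
    """Descend the tree while the context singles out exactly one child.

    If no child, or more than one child, is licensed by the context
    features, the current (more general) node is returned, so that the
    user can zoom in manually.
    """
    while True:
        matching = [
            child for child in node.children
            if child.features and child.features <= context
        ]
        if len(matching) != 1:
            return node
        node = matching[0]


# Toy entry, invented for illustration only.
entry = SenseNode("banque; rive", children=[
    SenseNode("banque", frozenset({"noun", "finance"})),
    SenseNode("rive", frozenset({"noun", "river"})),
])

print(select_sense(entry, frozenset({"noun"})).translation)           # banque; rive
print(select_sense(entry, frozenset({"noun", "river"})).translation)  # rive
```

In this sketch the context features would come from the morphological analysis and word class disambiguation; where they are too weak to pick a sub-sense, the general node is shown instead of a guess.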
The dictionaries will be adapted to comprehension needs by filtering out non-relevant information and many contextualising indicators, by decreasing the metalanguage and by reinforcing the treatment of the multi-word lexemes. The hierarchical structure of the dictionaries will be made explicit by transforming the source text of both dictionaries into lexical databases. The conversions starting from SGML and type-setting tape will be compared and conversion guidelines will be drawn up. Lexical gaps, missing words or collocations detected by the statistical analysis of text corpora will be filled. The human look-up process will be analysed to design a user-friendly human-computer interface. The consortium will carry out the following actions:
The project will provide methods and tools aimed at facilitating the reuse of existing lexica and at creating machine-processable lexical resources. It differs from other existing projects that convert printed dictionaries into computer-tractable ones in that the dictionaries are developed to meet a specific purpose: foreign language comprehension. Secondary results will be an improvement of the University of Tübingen German tagger, and a contrastive study with encoding guidelines for the two dictionary conversions, one starting from SGML and the other from type-setter format.
The Compass consortium intends to collaborate with the EAGLES lexicon committee and will develop contacts with partners of the ACQUILEX 2 project.
Mrs. Annie ZAENEN
Rank Xerox Research Centre
Immeuble Le Quartz
6, chemin de Maupertuis
F 38420 Meylan
Tel.: +33 76 61 50 50
Fax.: +33 76 61 50 99
e-mail: annie.zaenen@xerox.fr