LRE 62-080: COMPASS


Adapting bilingual dictionaries for on-line COMPrehension ASSistance

Objectives

With texts increasingly available on electronic media (e.g. word processors, CD-ROM), paper-based bilingual dictionaries are no longer as much a part of the normal working environment. The COMPASS project will evaluate, complement and convert common dictionaries to make them suitable for a computer-based translator of words and multi-word idioms. Moreover, bilingual dictionaries have traditionally been shaped by the requirements of people either composing in the foreign language or translating into or out of it. The project will therefore concentrate on adapting existing bilingual dictionaries for foreign-language comprehension, and on evaluating users' responses to comprehension tools.

The aim is to implement two bilingual dictionaries (English-French and German-English) as on-line, context-sensitive comprehension dictionaries. This presupposes that the user has a text in electronic form that he or she wants to read. Clicking on a word will display a context-dependent translation and, on request, background information (up to the full dictionary entry) in the user's mother tongue. The system will indicate whether the word is part of a multi-word idiom and will select the appropriate translation according to the syntactic context.
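Detecting whether a clicked word belongs to a multi-word idiom can be sketched as a longest-match search over the token windows that contain the clicked position. The idiom table and its French translations below are hypothetical sample data, and longest match is an assumed strategy; the source does not describe how idioms are stored or matched.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy idiom table: lower-cased token sequence -> translation.
# Real entries would come from the converted dictionary.
IDIOMS: Dict[Tuple[str, ...], str] = {
    ("kick", "the", "bucket"): "casser sa pipe",
    ("by", "and", "large"): "dans l'ensemble",
    ("at", "large"): "en liberté",
}


def find_idiom(tokens: List[str], clicked: int) -> Optional[Tuple[int, int, str]]:
    """Return (start, end, translation) of the longest idiom covering tokens[clicked].

    `end` is exclusive. Returns None if the clicked word is not part of any idiom.
    """
    lowered = [t.lower() for t in tokens]
    max_len = max((len(k) for k in IDIOMS), default=0)
    best: Optional[Tuple[int, int, str]] = None
    # Try every window that contains the clicked position, up to the longest idiom length.
    for start in range(max(0, clicked - max_len + 1), clicked + 1):
        for end in range(clicked + 1, min(len(tokens), start + max_len) + 1):
            key = tuple(lowered[start:end])
            if key in IDIOMS and (best is None or end - start > best[1] - best[0]):
                best = (start, end, IDIOMS[key])
    return best
```

For example, clicking on "large" in "By and large, the plan worked" reports the whole idiom "By and large" rather than the single word.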

Approach and Methodology

The project is founded on the insight that recent advances in parsing technology may make it possible for the look-up device itself to detect relevant features of the syntactic context of a word or phrase. At the same time, dictionaries of significant size can now be stored on a hand-held or laptop device. Such a device could therefore display the text being read and offer context-sensitive look-up of unknown words and phrases. The system could also keep a record of what the reader needed to look up, which the reader may later wish to review or memorise.
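The look-up record mentioned above could be kept as a simple frequency count per lemma, together with the sentence in which the word was first met. The sketch below assumes that the look-up step has already reduced the clicked word to a lemma; the class and method names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LookupHistory:
    """Hypothetical per-reader record of look-ups, kept for later review."""

    counts: Counter = field(default_factory=Counter)
    first_context: Dict[str, str] = field(default_factory=dict)

    def record(self, lemma: str, sentence: str) -> None:
        # Count every look-up; remember only the first sentence the lemma was met in.
        self.counts[lemma] += 1
        self.first_context.setdefault(lemma, sentence)

    def review_list(self, n: int = 10) -> List[Tuple[str, int, str]]:
        # Most frequently looked-up lemmas first: likely candidates to memorise.
        return [(lemma, c, self.first_context[lemma])
                for lemma, c in self.counts.most_common(n)]
```

Ranking by look-up frequency is only one possible policy; a real tool might also weight recency.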

The starting points of the project are the SGML-marked, machine-readable Oxford-Hachette English-French dictionary, the typesetter tape of the Collins German-English dictionary, and LOCOLEX, a prototype already under development by the coordinator. The COMPASS project will improve this prototype by tuning its performance, adding the German-English language pair, adapting it to the specific needs of foreign-language comprehension, and implementing a user interface that integrates LOCOLEX into the user's environment.

The LOCOLEX prototype performs a morphological analysis of the sentence in which the selected word occurs, followed by a stochastic disambiguation of the word-class information. The result is then matched against the dictionary. When a word with several meanings occurs in a context that offers no exploitable features for selecting the appropriate sense, the entry is structured as a tree: the information associated with the most general node is displayed, and the user can zoom in on the appropriate subsense.
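The source does not say which stochastic model LOCOLEX uses. A common choice for word-class disambiguation is a bigram hidden Markov model decoded with the Viterbi algorithm, and the sketch below assumes that approach. It takes the readings produced by morphological analysis, picks the most probable tag sequence for the sentence, and uses the chosen (lemma, tag) pair as the dictionary key. The analyses, probabilities and dictionary are hypothetical toy values chosen to illustrate the French form "ferme" (adjective "firm", noun "farm", or a form of the verb "fermer", "to close").

```python
import math
from typing import Dict, List, Tuple

Reading = Tuple[str, str]  # (lemma, word-class tag)

# Hypothetical morphological analyses: all readings of each word form.
ANALYSES: Dict[str, List[Reading]] = {
    "il": [("il", "PRON")],
    "ferme": [("ferme", "ADJ"), ("ferme", "NOUN"), ("fermer", "VERB")],
    "la": [("le", "DET"), ("le", "PRON")],
    "porte": [("porte", "NOUN"), ("porter", "VERB")],
}

# Hypothetical bigram transition probabilities P(tag | previous tag); "<s>" = sentence start.
TRANS: Dict[Tuple[str, str], float] = {
    ("<s>", "PRON"): 0.3, ("<s>", "DET"): 0.4,
    ("PRON", "VERB"): 0.6, ("PRON", "NOUN"): 0.05, ("PRON", "ADJ"): 0.05,
    ("VERB", "DET"): 0.5, ("VERB", "PRON"): 0.1,
    ("NOUN", "DET"): 0.1, ("ADJ", "DET"): 0.1,
    ("DET", "NOUN"): 0.8, ("DET", "ADJ"): 0.1, ("DET", "VERB"): 0.01,
}

# Hypothetical emission probabilities P(word | tag).
EMIT: Dict[Tuple[str, str], float] = {
    ("il", "PRON"): 0.2, ("la", "DET"): 0.2, ("la", "PRON"): 0.05,
    ("ferme", "ADJ"): 0.01, ("ferme", "NOUN"): 0.01, ("ferme", "VERB"): 0.01,
    ("porte", "NOUN"): 0.01, ("porte", "VERB"): 0.01,
}

FLOOR = 1e-4  # probability assumed for unseen transitions and emissions

# Hypothetical French-English entries keyed by (lemma, tag).
DICTIONARY: Dict[Reading, str] = {
    ("ferme", "ADJ"): "firm", ("ferme", "NOUN"): "farm", ("fermer", "VERB"): "to close",
}


def viterbi(words: List[str]) -> List[Reading]:
    """Return the most probable reading for each word (bigram HMM, log space)."""
    if not words:
        return []
    forms = [w.lower() for w in words]
    # Unknown forms get a single noun reading (an assumption of this sketch).
    readings = [ANALYSES.get(f, [(f, "NOUN")]) for f in forms]
    # best[i][k] = (log score of best path ending in reading k of word i, back-pointer)
    best: List[List[Tuple[float, int]]] = []
    for i, form in enumerate(forms):
        column = []
        for _, tag in readings[i]:
            emit = math.log(EMIT.get((form, tag), FLOOR))
            if i == 0:
                column.append((math.log(TRANS.get(("<s>", tag), FLOOR)) + emit, -1))
                continue
            scores = [best[i - 1][j][0] + math.log(TRANS.get((prev_tag, tag), FLOOR))
                      for j, (_, prev_tag) in enumerate(readings[i - 1])]
            j_best = max(range(len(scores)), key=scores.__getitem__)
            column.append((scores[j_best] + emit, j_best))
        best.append(column)
    # Trace back from the highest-scoring reading of the last word.
    k = max(range(len(best[-1])), key=lambda j: best[-1][j][0])
    path: List[Reading] = []
    for i in range(len(forms) - 1, -1, -1):
        path.append(readings[i][k])
        k = best[i][k][1]
    return path[::-1]


def translate(words: List[str], clicked: int) -> str:
    """Look up the clicked word using its disambiguated (lemma, tag) reading."""
    return DICTIONARY.get(viterbi(words)[clicked], "?")
```

With these toy values, `translate(["Il", "ferme", "la", "porte"], 1)` selects the verb reading and returns "to close", while `translate(["la", "ferme"], 1)` selects the noun and returns "farm".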

The dictionaries will be adapted to comprehension needs by filtering out irrelevant information and many contextualising indicators, by reducing the metalanguage, and by strengthening the treatment of multi-word lexemes. The hierarchical structure of the dictionaries will be made explicit by converting the source text of both dictionaries into lexical databases. The two conversions, one starting from SGML and one from typesetting tape, will be compared, and conversion guidelines will be drawn up. Lexical gaps, such as missing words or collocations, detected through statistical analysis of text corpora will be filled. The human look-up process will be analysed in order to design a user-friendly human-computer interface. The consortium will carry out the following actions:

  1. Specification of the necessary features of bilingual comprehension dictionaries,
  2. Development of methods to analyse and evaluate existing bilingual on-line dictionaries and to adapt them for language comprehension,
  3. Validation of these methods on the existing English-French and German-English dictionaries,
  4. Implementation of a user interface that integrates LOCOLEX into the user's working environment,
  5. Evaluation and testing of the system with users reading foreign-language texts.
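The tree-structured entries described above, with the most general node shown first, and the filtering of information that matters only for writing in the foreign language, might be represented as follows. This is a sketch under assumptions: the node layout, the field names in `PRODUCTION_ONLY` and the sample entry for French "bureau" are hypothetical, since the source specifies neither the database schema nor exactly which indicators are filtered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical fields judged irrelevant for comprehension
# (information mainly useful when writing in the foreign language).
PRODUCTION_ONLY = {"collocates", "grammar_for_encoding"}


@dataclass
class SenseNode:
    label: str  # short gloss in the reader's mother tongue
    info: Dict[str, str] = field(default_factory=dict)
    children: List["SenseNode"] = field(default_factory=list)


def filter_for_comprehension(node: SenseNode) -> SenseNode:
    """Return a copy of the sense tree without production-only fields."""
    kept = {k: v for k, v in node.info.items() if k not in PRODUCTION_ONLY}
    return SenseNode(node.label, kept, [filter_for_comprehension(c) for c in node.children])


def zoom(root: SenseNode, path: Tuple[int, ...] = ()) -> Tuple[str, List[str]]:
    """Follow child indices in `path`; return that node's gloss and its subsense labels."""
    node = root
    for index in path:
        node = node.children[index]
    return node.label, [child.label for child in node.children]


# Hypothetical sample entry for French "bureau".
bureau = SenseNode("desk; office", {"pos": "noun"}, [
    SenseNode("desk", {"collocates": "bureau en chêne"}),
    SenseNode("office", {}, [SenseNode("study (room)"), SenseNode("agency, department")]),
])
```

Calling `zoom` with an empty path shows only the most general gloss and the labels of its subsenses; each further index zooms one level deeper, mirroring the interaction described for LOCOLEX.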

Exploitation and Future Prospects

The project addresses the large number of people who have some knowledge of a foreign language but not enough to read it efficiently. Since texts on electronic media (CD-ROM, on-line newspapers, electronic mail) are becoming increasingly common, the number of potential users of this type of device is growing rapidly. The coordinator, Xerox, may therefore integrate a further development of the prototype into one of its commercial products.

The project will provide methods and tools aimed at facilitating the reuse of existing lexica and at creating machine-processable lexical resources. It differs from other projects that convert printed dictionaries into computer-tractable ones in that the dictionaries are developed for a specific purpose: foreign-language comprehension. Secondary results will be an improved version of the University of Tübingen German tagger, and a contrastive study and encoding guidelines for converting the two dictionaries from SGML and typesetter formats.

The COMPASS consortium intends to collaborate with the EAGLES lexicon committee and will develop contacts with partners in the ACQUILEX 2 project.

Contact Point

Mrs. Annie ZAENEN
Rank Xerox Research Centre
Immeuble Le Quartz
6, chemin de Maupertuis
F 38420 Meylan
Tel.: +33 76 61 50 50
Fax: +33 76 61 50 99
e-mail: annie.zaenen@xerox.fr

Partners

Helmut Feldweg / Seminar für Sprachwissenschaft / Universität Tübingen / feldweg@sfs.nphil.uni-tuebingen.de

Start Date: March 1994
Duration: 24 months
Resources: 152 person-months
Estimated total cost: 1,182,181 ECU