The education of the students will follow an `apprenticeship' model. That is, the students will pursue their individual research within a collaborative, project-based research environment. The programme will offer graduate students from Bulgaria and CEE the opportunity to pursue their research and education within the broad computational, linguistic, mathematical and philosophical framework now necessary for advanced research in computational linguistics and knowledge representation. The default working language of the programme will be English.
Though knowledge representation and computational linguistics clearly address broadly similar research problems, research within each of these fields has hitherto been largely ignorant of research within the other. This ignorance is doubly unfortunate, since interdisciplinary research in knowledge representation and computational linguistics would be likely to yield important scientific advances in the representation, use and acquisition of linguistic knowledge, advances with obvious potential for industrial application in products as diverse as message understanding software, automatic language acquisition devices, user-friendly network navigators, intelligent information retrievers, and machine-aided translation tools. However, the ignorance the two fields have of each other both fosters and is fostered by a wide gulf between the educations received by students of knowledge representation and students of computational linguistics. Attempts to break this vicious circle began in the mid 1980's with the founding of several institutionalised interdisciplinary programmes of education that included computational linguistics and knowledge representation. However, the enormous political and economic changes that occurred at the end of the 1980's meant that such programmes were extremely rare in CEE.
CLARK will be an international programme of graduate education in knowledge representation and computational linguistics, with sites at the SfS in Germany and the LML in Bulgaria. More specifically, the programme will supervise a number of graduate students - primarily from Bulgaria and CEE - and will guide them to the completion of their graduate degrees. All doctoral students supported by a CLARK fellowship are expected to complete their doctoral theses during the 24 months of funding that is available to them under the CLARK programme. In addition, the CLARK programme will produce high-quality teaching materials to be aired and tested at annual CLARK summer schools, the first in Bulgaria and the second in Germany. At the second summer school all doctoral students funded by a CLARK fellowship are expected to present the results of their doctoral thesis research.
The ubiquity of knowledge within linguistics provides far too wide a field for us to teach in its entirety. Consequently, we have decided to concentrate on a part of this field that readily exploits our existing expertise. We will teach methods for representing, acquiring and using linguistic knowledge represented as formal theories in feature logics. We do so in the belief that feature logics bestow a number of advantages over existing symbolic approaches to representing linguistic knowledge. By far the most important advantage is that feature logics simultaneously support both linguistic and knowledge representation formalisms. On the one hand, feature logics have been extensively used to support formalisms for various unification grammars such as the Lexical Functional Grammar (LFG) of [Kaplan and Bresnan 1982] (see [Johnson 1988]) and constraint grammars such as the Head-driven Phrase Structure Grammar (HPSG) of [Pollard and Sag 1994] (see [King 1989], [Carpenter 1992] and [Pollard 1998]). On the other, similarities between feature logics and knowledge representation languages have long been recognised (see [Backofen, Trost and Uszkoreit 1991], [Nebel and Smolka 1991], [Manandhar 1993] and [Simov 1995b]). These include set denoting symbols, binary attributes, full Boolean connectivity, and constraints over the domains and ranges of attributes. Such similarities allow us to see feature logics as knowledge representation languages, and to try to equip them with the services necessary for the acquisition and use of linguistic knowledge.
Simov is a research fellow at the LML, where he helped to create several large morphological dictionaries of Bulgarian (see [Simov et al. 1990], [Simov et al. 1992], [Simov and Popov 1996] and [Popov, Simov and Vidinska 1997]), and led two research projects concerned with knowledge representation (see [Simov and Boynov 1994] and [Simov 1997]). His current research interests include logic-based knowledge representation languages (terminological and feature logics, Conceptual Graphs, the Knowledge Interchange Format (KIF), and the Knowledge Query and Manipulation Language (KQML)), knowledge acquisition and reuse, natural language processing systems (grammar engineering, acquisition of linguistic knowledge, morphology, HPSG), and adaptive user interfaces. His expertise in knowledge representation includes using SRL as a knowledge representation language (see [Simov 1995b]). His work with morphological grammars and dictionaries has given him considerable practical experience in the management of very large linguistic knowledge bases. Moreover, King and Simov have successfully worked together for some time on a variety of research topics, including the complexity of SRL modelability (see [King, Simov and Aldag 1998]) and the management of knowledge represented as SRL theories (see [King and Simov 1998]).
The central idea of SFB 340 is to make insights and results of linguistic theories utilisable for the development of computational linguistics. Research concentrates on the requirements and standards which have to be fulfilled in order to successfully integrate structural descriptions of linguistic phenomena and algorithmic realisations of linguistic processes within the development of language-understanding and/or language-generating systems. Altogether, SFB 340 consists of fifteen subprojects, seven of which are located in Tübingen. The range of subjects includes theoretical syntax and semantics as well as their application in automatic text analysis.
The Graduiertenkolleg ``Integriertes Linguistik-Studium'', associated with the Seminar für Sprachwissenschaft, is a graduate school for interdisciplinary language-related studies funded by the Deutsche Forschungsgemeinschaft and located in Tübingen. The Graduiertenkolleg focuses on the interface of theoretical and computational linguistics with computer science, philosophy of mind, logic and linguistic studies of individual languages (particularly Slavic, Romance and Germanic languages). It is expected that there will be significant collaboration between the twelve doctoral and two post-doctoral fellows of the Graduiertenkolleg and the advanced training and research measures outlined in the current proposal.
The LML is a research institution within the Bulgarian Academy of Sciences where projects are in progress on natural language processing (ranging from the semantics of affixes to the syntactic and semantic analysis of sentences and texts), formalisms for knowledge representation, and large computer dictionaries. The LML has also enjoyed a long association with the St. Kliment Okhridski University, Sofia, Bulgaria: members of the LML give courses in knowledge representation and natural language processing and supervise master's students, while students of the university participate in the projects of the LML.
Both the SfS and the LML have a good record of participation in international research collaborations. For example, the SfS and LML are individually partners in a number of East-West European collaborative projects funded by the European Union Copernicus initiative. Such projects include:
TELRI: Trans-European Language Resources Infrastructure. Goal: to create a viable network between leading language and language technology centres in Europe in order to provide a neutral platform where public domain language resources and expertise in language technology are built up, made available and disseminated. Partners: 22 university and academic institutions including the LML.
GLOSSER. Goal: to apply language processing techniques - morphological processing and corpora analysis - to computer-assisted language learning (CALL). Partners: Groningen University, The Netherlands; Rank Xerox Research Centre, France; Morphologics, Hungary; Tartu University, Estonia; LML.
BILEDITA: Bilingual Electronic Dictionaries and Intelligent Text Alignment. Goal: to provide a uniform dictionary format for existing electronic dictionaries, to create a uniform lexical encoding scheme in terms of both form and content of the lexical entries, and to elaborate a uniform morphological model. Partners: München University, Germany; University of Paris 7, France; University of Warsaw, Poland; TOO Information Systems and Technologies Ltd., Russia; LML.
STEEL: Developing Specialised Translation/Foreign Language Understanding Tools for Eastern European Languages. Goal: to extend the functionality of existing translation aid tools to languages of CEE (Czech and Polish) with a special interest in providing translation assistance for technical and specialised documentation. Partners: Rank Xerox Research Centre, France; Université Louis Lumière, France; Charles University, Czech Republic; Moravia translations a.s., Czech Republic; University of Warsaw, Poland; Lexis, Poland; SfS.
In addition, the SfS will be a partner in a future project ``Extending Computational Grammars by Learning'' funded by the European Union Training and Mobility for Researchers (TMR) initiative. Though not an East-West European collaboration, both the overall aim of the TMR project and several of its individual subprojects - particularly the SfS-based ``Feature Estimation'' subproject concerned with estimating parameters in stochastic feature logics - are of great relevance to the CLARK programme. Even more relevant to the CLARK programme are the fruitful research collaborations the SfS and LML have already enjoyed. Through several visits by Simov and Nevelin Boynov to the SfS and by King to the LML since March 1994, a close working relationship has developed, culminating in a number of published papers and the port of a natural language processing system.
In addition to collaborative research, both host institutions are eager to support international graduate education. For example, the SfS, via the Tübingen office of the International Centre (IC), has taken an active interest in the harmonisation of doctoral education in East and West Europe by actively seeking to implement the policy document ``IC Programs for Ph.D. Students'' that arose in part from the Volkswagen-Stiftung funded ``IC Conference on Training of Ph.D. Students'' in Budapest, 1995. This is reflected in several IC sponsored visits to the SfS by East European scholars (including three visits by Simov), and the hosting of the Volkswagen-Stiftung funded ``IC Workshop on Computational Linguistics'' in 1996. Though the CLARK programme is not strictly part of the IC initiative, we enjoy excellent relations with the IC Tübingen office, and both we and the IC see the CLARK programme as a test implementation of several key recommendations of ``IC Programs for Ph.D. Students''.
The first line addressed the philosophical repercussions of formulating linguistic theories as formal theories in a feature logic, and considered such fundamental questions as what it is for a theory to be true, how the truth of a theory might be subject to experimental verification or falsification, and what the ontological status (if any) of a theory is. Interim results appeared in [King 1994a] and a number of conference presentations, but personal misgivings about the philosophical adequacy of the results led King to research further rather than publish [King 1994a]. This research has since borne fruit in [King in prep.].
The second line studied the mathematical properties of SRL. [King 1989] presented a logic for SRL entailment and showed that this logic is sound and complete. Stephan Kepser, then of the SfS, showed in his master's thesis [Kepser 1994] that the logic was also decidable. The philosophical research undertaken for [King in prep.] unearthed a number of important relations in SRL - including modelability, existential prediction and universal prediction - that were subsequently explored mathematically. For example, Bjørn Aldag of the SfS showed in his master's thesis [Aldag 1997] that no finitary, sound and complete logic for existential prediction can exist, and [King, Simov and Aldag 1998] established that co-r.e. completeness is a least upper bound (though not a lower bound) on the complexity of modelability.
Though built with accuracy more than efficiency in mind, Troll proved to be a surprisingly good system, performing as well as other HPSG implementation systems on the market. Consequently, B4 won funding not only for a new three-year term, but also for a sister project B8 ``An HPSG-Fragment for German''. Götz has since developed a constraint-logic programming successor ConTroll to Troll (see [Götz and Meurers 1997a] and [Götz and Meurers 1997b]), and B8 has both written a large German fragment in pure HPSG fashion as a formal theory in an extension of SRL (see [Hinrichs et al. 1997]) and implemented this theory using ConTroll (see [Courbet et al. 1997]). This theory is an invaluable resource for the programme, since it constitutes a large body of linguistic knowledge that is already represented as a feature-logic theory and implemented as a constraint-logic program.
As principal investigator of SFB 340 project B8, Hinrichs oversees the specification and implementation of a core grammar fragment of German which includes all major syntactic constructions of German and puts special emphasis on adequate linguistic analyses of constituent questions, parentheticals, and verbal clusters.
The first project researched methodologies for establishing semantic correspondences between knowledge bases represented in various knowledge representation languages (see [Simov 1995a]). This methodology was then applied to the problem of using the ACLRN knowledge representation language as a query language for relational databases. The problem was overcome by building an ACLRN knowledge base together with a semantic correspondence between the terminological part of the knowledge base and the relational schemata of the relational database (see [Popova 1994]).
The second project constructed an explicit representation of the control of inference procedures in implementations of declarative knowledge representation languages. This representation allows an expert in some knowledge domain to encode control information to suit specific tasks over that domain. The project developed a special normal form for SRL theories - construed as knowledge bases - and an indexing technique over such normal forms (see [Simov 1997]). The indexing technique enables the automatic reordering of a theory so that the theory exhibits certain relations between elements of the knowledge represented by the theory. The indexing technique also supports the reorganisation of a theory to suit those requirements of a user that are based on knowledge that is not represented by the theory, such as the environment in which the theory is to be used and the type of problem to be decided.
The first approach classified lexical items with respect to a set of morphological classes. The classification itself was done by means of an index over the morphological classes, such that a lexicon writer need only provide minimal information about a particular word in order for it to be correctly classified. In addition to the index, an editor was built to enable checking and editing of the morphological class to which a target word is assigned. The project constructed a Bulgarian dictionary with 32 000 entries and a module for the analysis and synthesis of wordforms (see [Simov et al. 1990] and [Simov et al. 1992]). The system was also applied to Russian nouns, adjectives and verbs, and German nouns.
The second approach automatically constructed a Bulgarian morphological dictionary by extracting the relevant linguistic knowledge about the morphological classes of words from two machine-readable dictionaries. Starting with a minimal grammar of Bulgarian word formation - sufficient to analyse the information in the two machine-readable dictionaries - the project arrived at a complete morphological grammar and a morphological dictionary with 75 000 entries (see [Simov and Popov 1996] and [Popov, Simov and Vidinska 1997]), which together can be used as the morphological component in a system to automatically process the Bulgarian language.
Classificatory systems are widespread in linguistics. For example, consider the declension of German nouns. Each German noun has eight declensions, a singular and a plural declension for each of the cases nominative, accusative, genitive and dative. The declension of each German noun can be classified according to how its nominative singular declension is modified by affixation and umlauting, and almost all German nouns exhibit one of a small set of patterns. This set of patterns constitutes a classification in which each pattern is a label that indicates the class of German nouns whose declensions exhibit the pattern. There is also an index, since the nominative singular, genitive singular and nominative plural declensions of almost all German nouns determine all their declensions. Indeed, all German dictionaries exploit this index, in that each entry for a noun gives only its nominative singular, genitive singular and nominative plural declensions.
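The German noun example can be made concrete with a minimal sketch in Python. It is an illustration only, not part of any system described in this proposal: the function names `pattern` and `group_by_pattern`, the shape of the class labels, and the five-noun lexicon are all assumptions, and the sketch covers only nouns whose key forms extend the nominative singular by suffixation, optionally with umlaut in the plural. The three key forms play the role of the index, and the derived label plays the role of the class.

```python
from collections import defaultdict

# Map umlauted vowels to their plain counterparts.
UMLAUT = str.maketrans("äöüÄÖÜ", "aouAOU")

def pattern(nom_sg, gen_sg, nom_pl):
    """Return a hypothetical class label (genitive suffix, umlaut?, plural suffix).

    Assumes both key forms extend the nominative singular by suffixation,
    optionally with umlaut in the plural; other nouns are out of scope.
    """
    gen_suffix = gen_sg[len(nom_sg):]
    pl_stem, pl_suffix = nom_pl[:len(nom_sg)], nom_pl[len(nom_sg):]
    same_base = pl_stem.translate(UMLAUT) == nom_sg.translate(UMLAUT)
    umlaut = same_base and pl_stem != nom_sg
    return (gen_suffix, umlaut, pl_suffix)

def group_by_pattern(lexicon):
    """Group nouns into classes, one class per pattern label."""
    classes = defaultdict(list)
    for nom_sg, gen_sg, nom_pl in lexicon:
        classes[pattern(nom_sg, gen_sg, nom_pl)].append(nom_sg)
    return dict(classes)

# Dictionary-style entries: nominative singular, genitive singular,
# nominative plural.
LEXICON = [
    ("Tag", "Tages", "Tage"),
    ("Hund", "Hundes", "Hunde"),
    ("Kind", "Kindes", "Kinder"),
    ("Vater", "Vaters", "Väter"),
    ("Frau", "Frau", "Frauen"),
]
CLASSES = group_by_pattern(LEXICON)
```

In this toy lexicon `Tag` and `Hund` fall into the same class, while `Vater` is separated from them by its genitive suffix and its umlauted plural.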
Typically, the deduction of a classificatory system from a linguistic theory is done by hand. However, hand-crafting a classificatory system is time-consuming, a very significant factor in, say, machine translation, where different classificatory systems for several languages may be required. Moreover, a hand-crafted classificatory system risks unwittingly violating the theory from which it is deduced, with potentially disastrous consequences if, say, the classification wrongly conflates two distinct classes, or the index puts some linguistic objects into inappropriate classes. For these two reasons alone, a device that effectively generates an accurate classificatory system from a general linguistic theory would be very beneficial to computational linguistics. In [King and Simov 1998] we showed that there exists a device to automatically deduce a classificatory system from a finite SRL theory.
Introduction to Head-Driven Phrase Structure Grammar: This introductory course focused on the mathematical, computational and linguistic aspects of the theory of HPSG. It included a comparison between typed feature logics and knowledge representation formalisms such as Kl-One, and emphasised the constraint-based approach to representing linguistic knowledge inherent in HPSG.
Advanced Research Seminar on Head-Driven Phrase Structure Grammar: This course is offered on a regular basis for graduate and advanced undergraduate students. In the past it has been co-taught by Tilman Höhle of Tübingen University, Paola Monachesi now of Utrecht University, Utrecht, The Netherlands, and Carl Pollard of the Ohio State University, Columbus, Ohio, USA. Topics have included the study of German and Romance Languages in HPSG, the syntax-semantics interface, logical foundations of HPSG (with contributions from King), extensions to the expressivity of HPSG's constraint language (with contributions from Frank Richter and Manfred Sailer of the SfS), as well as computational aspects of HPSG (with contributions from Dale Gerdemann, Thilo Götz and Detmar Meurers of the SfS).
Implementation of HPSG Grammar Fragments in ConTroll: This course offered hands-on experience to advanced students who are interested in the formalisation of linguistic knowledge in the ConTroll system. The course introduced the theoretical concepts and implementational realisation of the main ingredients of constraint grammars: highly structured lexical representations, constituent structure, and the encoding of well-formedness constraints on grammatical representations. The course combined background lectures with hands-on laboratory sessions. The lectures focused on HPSG, but included a comparison to other constraint and unification grammar formalisms (such as LFG and Extended Categorial Grammar).
Mathematical Logic for Students of the Humanities: This introductory course taught mathematical logic to students with a background in the humanities. Assuming only arithmetic (not set theory), the course covered the syntax and semantics of propositional logic, the soundness and completeness of propositional logic, the syntax and semantics of countable first-order predicate logic, and the soundness and completeness (sketched) of countable first-order predicate logic.
Set Theory and Mathematical Logic: This advanced course taught set theory and mathematical logic to students with some background in mathematics. The set theory part of the course covered Frege set theory, the class paradoxes, the class/set distinction, basic axiomatic set theory (the existence, pair, union and replacement axioms), ordinals and cardinals, the infinity axiom and its independence, ordinal and cardinal arithmetic, the choice axiom and well ordering, the power axiom and Cantor's theorem, and the foundation axiom and Zermelo-Fraenkel set theory. The mathematical logic part of the course covered the syntax and semantics of first-order predicate logic, the soundness, completeness and compactness of first-order predicate logic, Löwenheim and Skolem's cardinality theorems, Gödel's incompleteness theorems, and Herbrand's theorem and logic programming.
Recursion Theory: This introductory course taught recursion theory to students with mixed backgrounds in linguistics and computer science. The course covered Turing machines, semi-Thue systems, equivalences between the two, decidability and undecidability, diagonalisation and pumping techniques, and the Chomsky hierarchy.
Lecture notes or highly recommended textbooks exist for all of these courses.
From Unification to Constraint: This advanced course taught a variety of formalisms for HPSG to students with mixed backgrounds in mathematics, computer science and linguistics. The course covered unification formalisms in which algebraic operations are used to construct models of complexes of partial information about linguistic objects, Carpenter feature logic formalisms in which formulae are used to denote models of complexes of partial information about objects, and King feature logic formalisms in which formulae are used to denote sets of objects. A short (5x2 hours) version of this course was presented at the 8th European Summer School in Logic, Language and Information, Prague, the Czech Republic, 1996.
Speciate Reentrant Logic: This advanced course taught SRL and its HPSG applications to students with mixed backgrounds in mathematics, computer science and linguistics. The course covered the syntax and semantics of SRL, the soundness and completeness of SRL, the `junk slot' encoding of relations, expressing HPSG grammars as SRL theories, linguistic truth and exhaustive models, and tokens, types and feature structures.
Lecture notes exist for all of these courses, including the short summer school course. The lecture notes for the SRL course are being rewritten for publication as [King in prep.].
As mentioned earlier, the scope for applying knowledge representation techniques to linguistics is too broad for the programme to cover in its entirety. Consequently, the programme will offer projects that exploit the existing research strengths of the scientific coordinators in representing, acquiring and using linguistic knowledge represented as formal theories in feature logics. Examples of suitable projects include the following.
Head-driven Phrase Structure Grammar: HPSG grows ever more sophisticated, making new demands of its underlying formalism. Though SRL was created to deal with the most fundamental aspect of an HPSG formalism, namely the classical interpretation of descriptions, the logic has been surprisingly adept at handling these new demands. For example, accounts within SRL have been made of such HPSG exotica as linear precedence (see [Richter and Sailer 1995]) and lexical rules (see [Meurers and Minnen 1997]). Nonetheless, we will develop SRL in order to ensure that it stays abreast of the formal requirements of the most recent advances in HPSG. For example, the ability to recursively define relations among objects is unquestionably one of the most pressing of these requirements. While SRL has hitherto been able to encode all such definitions using so-called ``junk slots'', this technique is both extremely counterintuitive and cumbersome. Better would be an elegant and general method for recursively defining relations that can either be compiled into SRL, or, if necessary, expressed within a properly stronger extension of SRL. Research on such an extension has already begun between King and Simov, and Frank Richter and Manfred Sailer of B8, in order to facilitate B8 in formalising an HPSG account of a large German fragment as an SRL theory. This research has already yielded a tentative specification for Relational SRL (RSRL). We will continue this work and arrive at either an embedding of RSRL in SRL or a well understood logic for a properly more expressive RSRL.
Corpus Linguistics: The computational linguistics research community suffers from a number of sharp divisions over approaches to the problem of computationally implementing natural language, such as ``symbolic'' versus ``neural'', ``deterministic'' versus ``stochastic'', and ``theoretical'' versus ``corpus based''. Collaboration across these divides is relatively rare, to the detriment of the entire research community. However, recent times have seen an increase in the number of journals and conferences calling for ``hybrid'' approaches to computational linguistics that bring together hitherto disparate research traditions. By virtue of SRL, HPSG based upon SRL stands in the ``symbolic + deterministic + theoretical'' tradition. However, we will undertake research to bridge the ``theoretical'' versus ``corpus based'' distinction. [Lager 1996] extended the Prolog computer language to include sufficient mark-up features that corpus linguistics can be performed within a formal framework suitable for theoretical linguistics. We will similarly extend SRL, so that theory- and corpus-based linguistics can be performed within a single formal language better suited to linguistics than Prolog. We have picked this divide to bridge because linguistic corpora represent a vast source of linguistic facts that could be used to automatically construct huge linguistic knowledge bases, provided the facts can be extracted in some way. For example, given an SRL with mark-up capabilities, a [King and Simov 1998] device for this extended SRL, and good corpus-analysis software, the device could readily work in tandem with the corpus-analysis software to automatically digest large corpora and classify the words (or any other suitable target linguistic item) in the corpora.
Description Logics: Kl-One-like knowledge representation languages have been successfully used to represent extralinguistic knowledge in several natural language processing systems (see [Franconi 1994] and [Quantz et al. 1995]). In order to represent such knowledge in SRL without the need for an interface to an external knowledge representation system, we will extend SRL to include relational attributes, general conditions on a type hierarchy, number restrictions and an object level. (Notice that Natali Alt of the SfS has already added relational attributes to SRL in her master's thesis [Alt 1996].) These extensions would allow easy conversion to SRL theories of existing knowledge bases represented in certain Kl-One-based systems.
Tailoring a classificatory system involves changing the granularity of the classes in the system to ensure that the classification truly captures the intuitions of the theory from which it is deduced. Some of the classes produced by the [King and Simov 1998] device can be too general for the linguistic task at hand, while others can be too specific: sometimes it is necessary to make fine distinctions between very specific classes, sometimes it is necessary to allow very general classes. A possible solution to the first problem is for the device to produce not a classificatory system but a classificatory hierarchy built upon a classificatory system, as suggested in the conclusion of [King and Simov 1998]. The second problem can be solved by allowing the user to specify new constraints either on the whole classificatory system or on part of it (via a classificatory hierarchy). These new constraints can be of two forms: low-level constraints imposed when the user wishes to globally change an entire theory; and high-level constraints imposed when the user wishes to locally change specific classes that agree with knowledge already represented by a theory. The new constraints can be achieved by extending the [King and Simov 1998] device.
The [King and Simov 1998] device comprises three algorithms, Class, Index and Clause. The Class algorithm deduces a classification from a finite SRL theory. The Index algorithm deduces an index tree - a finite tree-like structure comprising queries and possible responses to them - from a classification. The Clause algorithm subsequently classifies objects on the basis of the responses of a human or computer oracle to queries read from the index tree by Clause. However, the queries in an index tree need not be those most appropriate to pose to a given oracle. To overcome this problem we must impose additional constraints on which queries can occur in an index tree. If the oracles understand enough of the input theory then the problem can be solved by modifying Index to construct, where possible, index trees that comprise only queries from a predefined set of queries. However, if the oracles understand too little of the input theory then the problem can be solved by constructing a semantically equivalent theory that the oracles can understand, together with a `translation' between the theories.
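The interplay of an index tree and an oracle can be illustrated with a minimal Python sketch. This is not the Index or Clause algorithm of [King and Simov 1998], which operate on SRL theories; it is a toy stand-in in which a classification is assumed to be a table from class labels to responses, and the names `build_index` and `classify`, the query names and the class labels are all hypothetical. Passing an explicit list of admissible queries mirrors the idea of restricting an index tree to a predefined set of queries.

```python
def build_index(classes, queries):
    """Build a toy index tree from a classification.

    classes: dict mapping a class label (a string) to a dict that maps
             each query to that class's response.
    queries: the admissible queries, in order of preference.
    A tree is either a class label (leaf) or a pair
    (query, {response: subtree}).
    """
    if len(classes) == 1:
        return next(iter(classes))
    for i, query in enumerate(queries):
        responses = {desc[query] for desc in classes.values()}
        if len(responses) > 1:  # this query separates some classes
            rest = queries[:i] + queries[i + 1:]
            return (query, {
                r: build_index(
                    {c: d for c, d in classes.items() if d[query] == r}, rest)
                for r in sorted(responses)
            })
    raise ValueError("classes cannot be separated by the admissible queries")

def classify(tree, oracle):
    """Walk the tree, posing each query to the oracle, until a leaf."""
    while isinstance(tree, tuple):
        query, branches = tree
        tree = branches[oracle(query)]
    return tree

# Hypothetical classification with invented query names and labels.
CLASSIFICATION = {
    "class_Tag":   {"gen": "es", "umlaut": False, "pl": "e"},
    "class_Kind":  {"gen": "es", "umlaut": False, "pl": "er"},
    "class_Vater": {"gen": "s",  "umlaut": True,  "pl": ""},
    "class_Frau":  {"gen": "",   "umlaut": False, "pl": "en"},
}
TREE = build_index(CLASSIFICATION, ["gen", "umlaut", "pl"])

# The oracle here is a lookup table; it is only asked the queries on
# the path actually taken through the tree.
answers = {"gen": "es", "pl": "er"}
RESULT = classify(TREE, answers.__getitem__)
```

In this run the oracle is never asked about `umlaut`, because that query does not separate the classes that remain once the genitive suffix is known.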
The Clause algorithm queries an external oracle in order to elicit sufficient information to classify an object. However, certain inferential processes, such as parsing and generation, themselves pose queries, but to a classification, in order to further instantiate their current information about some objects. [Simov 1997] developed a new indexing technique that involves all of the information in an SRL theory. This indexing technique allows automatic reordering of the evaluation of a query with respect to a theory. It also allows the representation of expert knowledge. We will extend this work in several directions. Firstly, we will extend the current technique to cover the extensions of SRL described earlier. Secondly, in order to use the huge volume of existing natural language text corpora as a source of control information, we expect to incorporate a stochastic mechanism to evaluate the success or failure of an inferential process with respect to a given task and set of texts. Thirdly, we will combine this indexing technique with the modularisation technique of [Simov 1995a] in order to allow the extraction of relevant optimal theories with respect to given tasks. For example, extracting the relevant morphological knowledge and data from a rich morphological grammar and dictionary in order to support a spell checker requires the development of a simplified morphological theory and an appropriate control strategy with respect to the problem of spell checking.
The problem of reclassification is to find - for each object classified under an old classificatory system - the new class it occupies under the new classificatory system. In order to minimise the information needed to reclassify objects, the system must find the differences between the new and old classificatory systems and establish appropriate correspondences among the new and old classes. Once the correspondences are established the system can query an external oracle for additional information about those objects for which the system has insufficient information to reclassify automatically. We are currently considering a process in which the Index algorithm of the [King and Simov 1998] device is directed to construct an index tree that comprises old queries, where possible, so that the Clause algorithm can then take advantage of information already present in the old classification.
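The benefit of an index tree that reuses old queries can be sketched as follows. This is only an illustration under simplifying assumptions, not the process under consideration for the [King and Simov 1998] device: index trees are nested tuples over yes/no queries, the answers recorded during the old classification are kept in a dictionary, and `reclassify`, `NEW_TREE` and the query names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple, Union

# An index tree is a class name (leaf) or a triple (query, yes_subtree, no_subtree).
Tree = Union[str, tuple]


def reclassify(
    new_tree: Tree,
    old_answers: Dict[str, bool],
    oracle: Callable[[str], bool],
) -> Tuple[str, List[str]]:
    """Find an object's class under the new index tree.

    Answers recorded under the old classification are reused; the oracle
    is consulted only for queries that were never answered before.
    Returns the new class and the list of queries actually posed.
    """
    answers = dict(old_answers)  # do not modify the caller's record
    asked: List[str] = []
    tree = new_tree
    while not isinstance(tree, str):
        query, yes_tree, no_tree = tree
        if query not in answers:
            answers[query] = oracle(query)
            asked.append(query)
        tree = yes_tree if answers[query] else no_tree
    return tree, asked


# Hypothetical new index tree: it keeps the old queries "is_verbal" and
# "has_case" and introduces one new query, "is_finite".
NEW_TREE: Tree = (
    "is_verbal",
    ("is_finite", "finite_verb", ("has_case", "participle", "infinitive")),
    "noun",
)

# Answers already recorded for one object under the old classification.
OLD_ANSWERS = {"is_verbal": True, "has_case": False}

new_class, posed = reclassify(NEW_TREE, OLD_ANSWERS, lambda q: {"is_finite": False}[q])
```

Here the oracle is consulted once, for the new query `is_finite`, while the two old answers are taken from the record.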
Note that classifying objects using a classificatory system can itself indicate changes to the linguistic theory from which the classificatory system was deduced. For example, suppose that a linguistic object cannot be properly assigned a class via the index of a classificatory system. This indicates a fault not with the classificatory system but rather with the theory from which the system was deduced. Simply put, the theory overlooked the object. Clearly, the theory must be modified to accommodate the object, but several problems arise in such a circumstance. What of the existing theory can remain unchanged, and what must be modified? What should those modifications be? We will equip the [King and Simov 1998] device with an abduction mechanism offering appropriate changes to the input theory.
Abduction is an inference rule that modifies a theory in order to accommodate an observation that is at variance with the original theory. Since abduction is known to be unsound, we cannot expect it to work correctly when unconstrained. We will therefore investigate constrained and supervised abduction within a hierarchy of theories, such that an application of abduction to explain an observation at variance with the hierarchy prefers to modify theories low in the hierarchy. For example, suppose that a lexicon writer is constructing a morphological dictionary with respect to a given morphological grammar. The dictionary must be consistent with the grammar. In addition, the morphological grammar must be consistent with universal grammar and possibly other grammars, such as a syntactic grammar or a semantic grammar. If the lexicon writer finds a new morphological fact that conflicts with the morphological grammar then the abduction rule would offer one or more explanations of this fact within the morphological grammar but not within the other grammars.
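The preference for modifying theories low in the hierarchy can be sketched with a toy model. It is a deliberately simplified stand-in for abduction, not the mechanism to be investigated: each theory merely lists the admissible values of some features, a repair merely admits one more value, and the names `Theory`, `propose_repairs`, `min_depth` and all feature names and values are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Theory:
    name: str
    depth: int  # 0 = top of the hierarchy; larger numbers are lower
    allowed: Dict[str, Set[str]] = field(default_factory=dict)

    def conflicts(self, observation: Dict[str, str]) -> List[Tuple[str, str]]:
        """Return the (feature, value) pairs of the observation that this theory rejects."""
        return [
            (feature, value)
            for feature, value in observation.items()
            if feature in self.allowed and value not in self.allowed[feature]
        ]


def propose_repairs(
    hierarchy: List[Theory],
    observation: Dict[str, str],
    min_depth: int,
) -> Optional[List[Tuple[str, str, str]]]:
    """Offer repairs only within theories at depth >= min_depth.

    Returns a list of (theory name, feature, value) extensions, lowest
    theories first, or None if the observation conflicts with a protected
    theory, in which case a human supervisor would have to decide.
    """
    repairs: List[Tuple[str, str, str]] = []
    for theory in sorted(hierarchy, key=lambda t: t.depth, reverse=True):
        for feature, value in theory.conflicts(observation):
            if theory.depth < min_depth:
                return None
            repairs.append((theory.name, feature, value))
    return repairs


universal = Theory("universal grammar", 0, {"category": {"noun", "verb"}})
morphological = Theory("morphological grammar", 1, {"plural_suffix": {"-i", "-ove"}})
hierarchy = [universal, morphological]

# A new fact that conflicts with the morphological grammar only.
offered = propose_repairs(hierarchy, {"category": "noun", "plural_suffix": "-ishta"}, min_depth=1)
# A fact that conflicts with the protected universal grammar.
refused = propose_repairs(hierarchy, {"category": "gerund", "plural_suffix": "-i"}, min_depth=1)
```

In the first case the only repair offered lies within the morphological grammar; in the second no repair is offered, because the conflict lies in a theory that the constraint protects.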
The way a theory is modified strongly influences reclassification. If the modification is due to an application of an abductive rule within the [King and Simov 1998] device then the reclassification procedure has full knowledge about the modifications to the theory, and can thus build the correspondences between the old and new classifications automatically. But if the modification is external to the device then the reclassification procedure has limited knowledge about the modifications, and must thus consult an external oracle in order to find the right correspondences.
Mathematical Logic: Mathematical logic underlies both feature logics and knowledge representation languages. The course will present the intuitions and techniques of mathematical logic so that students will be able to recognise and exploit mathematical logic in linguistics and knowledge representation. The course materials will be based on existing materials, supplemented and updated as necessary.
In order to advertise the CLARK programme outside of the host institutions, we will also submit the best courses from our summer schools to larger international schools, such as the European Summer School in Logic, Language and Information.
Knowledge Representation: The course will present an overview of the basic notions of knowledge representation and reasoning. The materials will be example-based in order to give an intuitive understanding of the problems in the area. The course materials will be developed in the programme.
SRL and HPSG: The course will forge the linguistic half of our chosen link between linguistics and knowledge representation. In addition to the mathematical properties of SRL, the course will address the application of SRL to HPSG, focusing particularly on SRL formulations of such notions as linguistic truth and linguistic knowledge. The course material will be based on [King in prep.].
Declarative Knowledge Representation Languages: The course will complement the previous course, and forge the knowledge representation half of our chosen link between linguistics and knowledge representation. The course will present the syntax and semantics of a number of knowledge representation languages based on similar ontological assumptions: objects, sets of objects, and relations over objects. The course will include Kl-One-based languages, Conceptual Graphs, KIF and KQML. The course will also cover general topics such as structures of knowledge bases, functional interfaces to knowledge bases, inference techniques, the open- and closed-world assumptions, and knowledge translations. The course materials will be based on translations of existing Bulgarian materials, and will be further developed in the programme.
Knowledge Management: The course will be based on our existing and ongoing research on knowledge management. The course will include topics such as creating classificatory systems from finite SRL theories, tailoring classificatory systems, indexing over the classes in a classificatory system, reclassification and abduction. The course materials will be developed in the programme.
Implementation of HPSG Grammar Fragments in ConTroll: This course will offer practical hands-on experience to linguists and computer scientists interested in the formalisation of linguistic knowledge in the ConTroll system. The course will be taught in an interactive fashion in a computer laboratory and will combine background lectures with practical exercises on how to specify grammars in ConTroll. Students will be given the opportunity to undertake individualised grammar projects for modelling theoretically and empirically significant syntactic constructions of their native language. The background lectures will introduce the relevant mathematical and computational aspects of the ConTroll system and will focus on the main ingredients of constraint grammars: highly structured lexical representations, constituent structure, and encoding well-formedness constraints on grammatical representations. The course materials will be based on existing on-line teaching materials that have been developed by Hinrichs and Meurers for a course taught at the invitation of the program committee for the 9th Annual European Summer School in Logic, Language and Information, Aix-en-Provence, France, 1997.
Computational Morphology: The course will cover the basic notions in morphology - such as paradigm, lexeme, wordform, affixation and word - as well as problems with the automatic processing of the morphology of a natural language. The main formal approaches, such as two-level morphology, will be presented. Practical problems, such as the construction of a large morphological dictionary, will also be covered. The course materials will be developed in the programme.
[Alt 1996] Natali I. Alt. A typed feature logic with set-valued attributes as a foundation for LP rules. Master's thesis. Seminar für Sprachwissenschaft, Eberhard-Karls-Universität, Tübingen, Germany. 1996.
[Backofen, Trost and Uszkoreit 1991] Rolf Backofen, Harald Trost and Hans Uszkoreit. Linking typed feature formalisms and terminological knowledge representation languages in natural language front-ends. DFKI research report RR-91-28. DFKI, Saarbrücken, Germany. 1991.
[Carpenter 1992] Bob Carpenter. The Logic of Typed Feature Structures. Cambridge Tracts in Theoretical Computer Science, number 32. Cambridge University Press, Cambridge, England. 1992.
[Courbet et al. 1997] Elisabeth Courbet, Kordula De Kuthy, Detmar Meurers, Frank Richter and Manfred Sailer. Ein HPSG-Fragment des Deutschen, Teil 2: Implementierung. Sonderforschungsbereich 340 technical report. Sonderforschungsbereich 340, Seminar für Sprachwissenschaft, Eberhard-Karls-Universität, Tübingen, Germany. 1997. In German.
[Franconi 1994] Enrico Franconi. Description logics for natural language processing. In The Working Notes of the 1994 AAAI Fall Symposium on ``Knowledge Representation for Natural Language Processing in Implemented Systems''. New Orleans, USA. 1994.
[Gerdemann and King 1994] Dale Gerdemann and Paul J. King. The correct and efficient implementation of appropriateness specifications for typed feature structures. In Proceedings of COLING'94, volume 2, pages 956-960. Kyoto, Japan. 1994.
[Götz and Meurers 1997a] Thilo Götz and W. Detmar Meurers. Interleaving universal principles and relational constraints over typed feature logic. In Proceedings of the 35th Meeting of the ACL and 8th Conference of the EACL. Madrid, Spain. 1997.
[Götz and Meurers 1997b] Thilo Götz and W. Detmar Meurers. The ConTroll system as large grammar development platform. In Proceedings of the ACL/EACL post-conference workshop on Computational Environments for Grammar Development and Linguistic Engineering. Madrid, Spain. 1997.
[Hinrichs, Kathol and Nakazawa 1997] Erhard W. Hinrichs, Andreas Kathol and Tsuneko Nakazawa (editors). Complex Predicates in Non-derivational Syntax. Syntax and Semantics Series. Academic Press, San Diego, California, USA. In press.
[Hinrichs and Nakazawa 1989] Erhard W. Hinrichs and Tsuneko Nakazawa. Flipped out: Aux in German. In Proceedings of the 25th Regional Meeting of the Chicago Linguistic Society. Chicago, Illinois, USA. 1989.
[Hinrichs and Nakazawa 1994] Erhard W. Hinrichs and Tsuneko Nakazawa. Linearizing finite Aux in German complex VPs. In John Nerbonne, Klaus Netter and Carl Pollard (editors), German in Head-Driven Phrase Structure Grammar. CSLI Lecture Notes, number 46. CSLI, Stanford, California, USA. 1994.
[Hinrichs and Nakazawa 1996] Erhard W. Hinrichs and Tsuneko Nakazawa. Applying lexical rules under subsumption. In Proceedings of COLING'96. Copenhagen, Denmark. 1996.
[Hinrichs and Nakazawa 1997a] Erhard W. Hinrichs and Tsuneko Nakazawa. PVP and split-NP topicalization in German. In Georgia Green and Beth Levine (editors), Studies in Head-Driven Phrase Structure Grammar. Cambridge University Press, Cambridge, England. In press.
[Hinrichs and Nakazawa 1997b] Erhard W. Hinrichs and Tsuneko Nakazawa. Third construction and VP extraposition in German. In [Hinrichs, Kathol and Nakazawa 1997].
[Hinrichs and Nakazawa in prep.] Erhard W. Hinrichs and Tsuneko Nakazawa. VP relatives in German. Paper presented at the International Conference on Head-Driven Phrase Structure Grammar. In Andreas Kathol, Jean-Pierre Koenig and Gert Webelhuth (editors), Studies in Constraint-Based Lexicalism. CSLI, Stanford, California, USA. In preparation.
[Hinrichs et al. 1994] Erhard W. Hinrichs, Dale Gerdemann, Paul J. King, Guido Minnen, and Thilo Götz. Ergebnisbericht des Teilprojekt B4 ``Constraints on Grammar for Efficient Generation''. In Sonderforschungsbereich 340 ``Sprachtheoretische Grundlagen für die Computerlinguistik'': Arbeits- und Ergebnisbericht 1992-1993-1994, pages 145-187. Sonderforschungsbereich 340, Seminar für Sprachwissenschaft, Eberhard-Karls-Universität, Tübingen, Germany. 1994.
[Hinrichs et al. 1997] Erhard W. Hinrichs, Frank Richter, Detmar Meurers, Manfred Sailer and Heike Winhart. Ein HPSG-Fragment des Deutschen, Teil 1: Theorie. Sonderforschungsbereich 340 technical report 95. Sonderforschungsbereich 340, Seminar für Sprachwissenschaft, Eberhard-Karls-Universität, Tübingen, Germany. 1997. In German.
[Johnson 1988] Mark Johnson. Attribute-Value Logic and the Theory of Grammar. CSLI Lecture Notes, number 16. CSLI, Stanford, California, USA. 1988.
[Kaplan and Bresnan 1982] Ronald M. Kaplan and Joan Bresnan. Lexical-functional grammar: A formal system for grammatical representation. In Joan Bresnan (editor), The Mental Representation of Grammatical Relations, chapter 4, pages 173-281. MIT Press, Cambridge, Massachusetts, USA. 1982.
[Kepser 1994] Stephan Kepser. A satisfiability algorithm for a typed feature logic. Master's thesis. Seminar für Sprachwissenschaft, Eberhard-Karls-Universität, Tübingen, Germany. 1994.
[King 1989] Paul J. King. A Logical Formalism for Head-Driven Phrase Structure Grammar. Doctoral thesis. Department of Mathematics, University of Manchester, Manchester, England. 1989.
[King 1994a] Paul J. King. An expanded logical formalism for Head-driven Phrase Structure Grammar. Sonderforschungsbereich 340 technical report 59. Sonderforschungsbereich 340, Seminar für Sprachwissenschaft, Eberhard-Karls-Universität, Tübingen, Germany. 1994.
[King 1994b] Paul J. King. Reconciling Austinian and Russellian accounts of the liar paradox. The Journal of Philosophical Logic, volume 23, number 5, pages 451-494. 1994.
[King 1994c] Paul J. King. Typed feature structures as descriptions. In Proceedings of COLING'94, volume 2, pages 1250-1254. Kyoto, Japan. 1994.
[King in prep.] Paul J. King. Truth and Verification in Head-driven Phrase Structure Grammar. In preparation.
[King and Simov 1998] Paul J. King and Kiril Iv. Simov. The automatic deduction of classificatory systems from linguistic theories. In Grammars, volume 1, number 2, 1998.
[King, Simov and Aldag 1998] Paul J. King, Kiril Iv. Simov and Bjørn Aldag. The complexity of modelability in finite and computable signatures of a constraint logic for head-driven phrase structure grammar. In The Journal of Logic, Language and Information. In press.
[Lager 1996] Torbjörn Lager. A Logical Approach to Computational Corpus Linguistics. Doctoral thesis. Department of Linguistics, University of Göteborg, Göteborg, Sweden. 1996.
[Manandhar 1993] Suresh K. Manandhar. Relational Extensions to Feature Logic: Applications to Constraint Based Grammars. Doctoral thesis. Department of Artificial Intelligence, Faculty of Science and Engineering, University of Edinburgh, Edinburgh, Scotland. 1993.
[Meurers and Minnen 1997] W. Detmar Meurers and Guido Minnen. A computational treatment of lexical rules in HPSG as covariation in lexical entries. Computational Linguistics, volume 23, number 4. 1997.
[Nebel and Smolka 1991] Bernhard Nebel and Gert Smolka. Attributive description formalisms...and the rest of the world. DFKI research report RR-91-15. DFKI, Saarbrücken, Germany. 1991.
[Pollard 1998] Carl J. Pollard. Strong generative capacity in HPSG. In Gert Webelhuth, Jean-Pierre Koenig and Andreas Kathol (editors), Lexical and Constructional Aspects of Linguistic Explanation. CSLI, Stanford, California, USA. 1998.
[Pollard and Sag 1987] Carl J. Pollard and Ivan A. Sag. Information-Based Syntax and Semantics. CSLI Lecture Notes, number 13. CSLI, Stanford, California, USA. 1987.
[Pollard and Sag 1994] Carl J. Pollard and Ivan A. Sag. Head-Driven Phrase Structure Grammar. University of Chicago Press, Chicago, Illinois, USA. 1994.
[Popov, Simov and Vidinska 1997] Dimitar G. Popov, Kiril Iv. Simov and Svetlomira M. Vidinska. A Dictionary of Writing, Pronunciation and Punctuation of Bulgarian Language. Atlantis SD, Sofia, Bulgaria. In press. In Bulgarian.
[Popova 1994] Maria P. Popova. Kl-One knowledge representation language as query language to a relational database. Master's thesis. Faculty of Mathematics and Computer Science, St. Kliment Okhridski University, Sofia, Bulgaria. 1994. In Bulgarian.
[Quantz et al. 1995] J. Joachim Quantz, Guido Dunker, Manfred Gehrke, Uwe Küssner and Birte Schmitz. FLEX-based disambiguation in VERBMOBIL. In Alex Borgida, Maurizio Lenzerini, Daniele Nardi and Bernhard Nebel (editors), Proceedings of International Workshop on Description Logics. Dipartimento di Informatica e Sistematica, Università di Roma ``La Sapienza'', Rome, Italy. 1995.
[Richter and Sailer 1995] Frank Richter and Manfred Sailer. Remarks on linearization: reflections on the treatment of LP-rules in HPSG in a typed feature logic. Master's thesis. Seminar für Sprachwissenschaft, Eberhard-Karls-Universität, Tübingen, Germany. 1995.
[Simov 1995a] Kiril Iv. Simov. Communication among knowledge bases. Technical report. Linguistic Modelling Laboratory, Bulgarian Academy of Sciences, Sofia, Bulgaria. 1995. In Bulgarian.
[Simov 1995b] Kiril Iv. Simov. Declarative knowledge representation languages: Kl-One family, speciate re-entrant logic, conceptual graphs - an overview. Technical report. Linguistic Modelling Laboratory, Bulgarian Academy of Sciences, Sofia, Bulgaria. 1995.
[Simov 1997] Kiril Iv. Simov. Control of inference in declarative knowledge bases. Technical report. Linguistic Modelling Laboratory, Bulgarian Academy of Sciences, Sofia, Bulgaria. 1997. In Bulgarian.
[Simov and Boynov 1994] Kiril Iv. Simov and Nevelin P. Boynov. Conceptual graphs: the structure of the knowledge base and sublanguages. In Proceedings of the 1st Workshop on Conceptual Structure. Melbourne, Australia. 1994.
[Simov and Popov 1996] Kiril Iv. Simov and Dimitar G. Popov. Creating a morphological dictionary of the Bulgarian Language. In Proceedings of COMPLEX'96 Conference. Budapest, Hungary. 1996.
[Simov et al. 1990] Kiril Simov, Galia Angelova and Elena Paskaleva. MORPHO-ASSISTANT: The proper treatment of morphological knowledge. In Proceedings of COLING'90, volume 3, pages 453-457. Helsinki, Finland. 1990.
[Simov et al. 1992] Kiril Simov, Elena Paskaleva, Mariana Damova and Milena Slavcheva. MORPHO-ASSISTANT - a knowledge based system for Bulgarian morphology. Demo description in Proceedings of Demo Descriptions of Third Conference on Natural Language Application. Trento, Italy. 1992.
Frank Richter (fr@sfs.uni-tuebingen.de)