<?xml version="1.0" standalone="yes"?>
<Paper uid="J96-2002">
  <Title>DATR: A Language for Lexical Knowledge Representation</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Irregular lexemes are standardly regular in some respect. Most are just like regular lexemes except that they deviate in one or two characteristics. What is needed is a natural way of saying &quot;this lexeme is regular except for this property.&quot; One obvious approach is to use nonmonotonicity and inheritance machinery to capture such lexical irregularity (and subregularity), and much recent research into the design of representation languages for natural language lexicons has thus made use of nonmonotonic inheritance networks (or &quot;semantic nets&quot;) as originally developed for more general representation purposes in Artificial Intelligence. Daelemans, De Smedt, and Gazdar (1992) provide a rationale for, and an introduction to, this body of research and we will not rehearse the content of that paper here, nor review the work cited there. [1] DATR is a rather spartan nonmonotonic language for defining inheritance networks with path/value equations. In keeping with its intendedly minimalist character, it lacks many of the constructs embodied either in general-purpose knowledge representation languages or in contemporary grammar formalisms. But the present paper seeks to show that the language is nonetheless sufficiently expressive to represent concisely the structure of lexical information at a variety of levels of language description. [1] [...] Calder (1994), Copestake (1992), Daelemans (1994), Daelemans and De Smedt (1994), Ide, Le Maitre, and Véronis (1994), Lascarides et al. (1996), Mellish and Reiter (1993), Mitamura and Nyberg (1992), Penn and Thomason (1994), Reiter and Mellish (1992), Young (1992), and Young and Rounds (1993). © 1996 Association for Computational Linguistics. Computational Linguistics, Volume 22, Number 2.</Paragraph>
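The idea of a lexeme that is "regular except for this property" can be illustrated with a minimal Python sketch of default inheritance over path/value pairs. It is not DATR and reproduces neither DATR's syntax nor its inference rules. The node names, the `LEXICON` dictionary, the `{root}` template convention, and the `lookup` function are all assumptions introduced here for illustration only.

```python
# Hypothetical toy lexicon (not DATR): each node maps attribute paths,
# written as tuples, to values. A node may name a parent from which it
# inherits every path that it does not define itself.
LEXICON = {
    "VERB": {
        "parent": None,
        "facts": {
            ("syn", "cat"): "verb",
            ("mor", "past"): "{root}ed",
            ("mor", "pres", "part"): "{root}ing",
        },
    },
    # A regular verb states only its root and inherits the rest.
    "Walk": {"parent": "VERB", "facts": {("mor", "root"): "walk"}},
    # An irregular verb overrides one path and inherits the rest.
    "Sing": {
        "parent": "VERB",
        "facts": {("mor", "root"): "sing", ("mor", "past"): "sang"},
    },
}


def lookup(node, path):
    """Return the value of `path` at `node`, falling back to its ancestors.

    The nearest definition wins, so a local value overrides an inherited
    one. This overriding is what makes the scheme nonmonotonic.
    """
    current = node
    while current is not None:
        facts = LEXICON[current]["facts"]
        if path in facts:
            value = facts[path]
            # Templates are filled with the root of the node that was
            # originally queried, not of the ancestor holding the template.
            root = lookup(node, ("mor", "root")) if "{root}" in value else ""
            return value.format(root=root)
        current = LEXICON[current]["parent"]
    raise KeyError(f"{node}: no value for {path}")


if __name__ == "__main__":
    print(lookup("Walk", ("mor", "past")))
    print(lookup("Sing", ("mor", "past")))
    print(lookup("Sing", ("mor", "pres", "part")))
```

In this sketch `Sing` states only its root and its exceptional past form, and every other property comes from `VERB`. Removing the override would make `Sing` fully regular, which is the sense in which the irregular entry is regular by default.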
    <Paragraph position="1"> The development of DATR has been guided by a number of concerns, which we summarize here. Our objective has been a language that (i) has an explicit theory of inference, (ii) has an explicit declarative semantics, (iii) can be readily and efficiently implemented, (iv) has the necessary expressive power to encode the lexical entries presupposed by work in the unification grammar tradition, and (v) can express all the evident generalizations and subgeneralizations about such entries. Our first publications on DATR (Evans and Gazdar 1989a, 1989b) provided a formal theory of inference (i) and a formal semantics (ii) for DATR and we will not recapitulate that material here. With respect to (iii), the core inference engine for DATR can be coded in a page of Prolog (see, e.g., Gibbon 1993, 50). At the time of writing, we know of a dozen different implementations of the language, some of which have been used with large DATR lexicons in the context of big NLP systems (e.g., Andry et al. 1992; Cahill 1993a, 1994; Cahill and Evans 1990). We will comment further on implementation matters in Section 5, below. However, the main purpose of the present paper is to exhibit the use of DATR for lexical description (iv) and the way it makes it relatively easy to capture lexical generalizations and subregularities at a variety of analytic levels (v). We will pursue (iv) and (v) in the context of an informal example-based introduction to the language and to techniques for its use, and we will make frequent reference to the DATR-based lexical work that has been done since 1989.</Paragraph>
    <Paragraph position="2"> The paper is organized as follows: Section 2 uses an analysis of English verbal morphology to provide an informal introduction to DATR. Section 3 describes the language more precisely: its syntax, inferential and default mechanisms, and the use of abbreviatory variables. Section 4 describes a wide variety of DATR techniques, including case constructs and parameters, Boolean logic, finite-state transduction, lists and DAGs, lexical rules, and ways to encode ambiguity and alternation. Section 5 explores more technical issues relating to the language, including functionality and consistency, multiple inheritance, modes of use, and existing implementations. Section 6 makes some closing observations. Finally, an appendix to the paper replies to the points made in the critical literature on DATR.</Paragraph>
  </Section>
</Paper>