<?xml version="1.0" standalone="yes"?>
<Paper uid="E85-1025">
  <Title>TOWARDS A DICTIONARY SUPPORT ENVIRONMENT FOR REALTIME PARSING</Title>
  <Section position="2" start_page="0" end_page="171" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Recent developments in linguistics, and especially in grammatical theory - for example, Generalised Phrase Structure Grammar (GPSG) (Gazdar et al., In Press), Lexical Functional Grammar (LFG) (Kaplan &amp; Bresnan, 1982) - and in natural language parsing frameworks - for example, Functional Unification Grammar (FUG) (Kay, 1984a), PATR-II (Shieber, 1984) - make it feasible to consider the implementation of efficient systems for the syntactic analysis of substantial fragments of natural language. These developments also demonstrate that if natural language processing systems are to handle the grammatical and logical idiosyncrasies of individual lexical items elegantly and efficiently, then the lexicon must be a central component of the parsing system. Real-time parsing imposes stringent requirements on a dictionary support environment; at the very least it must allow frequent and rapid access to the information in the dictionary via the dictionary head words.</Paragraph>
    <Paragraph position="1"> The idea of using the machine-readable source of a published dictionary has occurred to a wide range of researchers - for spelling correction, lexical analysis, thesaurus construction and machine translation, to name but a few applications - but very few have used such a dictionary to support a natural language parsing system. Most of the work on automated dictionaries has concentrated on extracting lexical or other information in, essentially, batch processing (e.g. Amsler, 1981; Walker &amp; Amsler, 1983), or on developing dictionary servers for office automation systems (Kay, 1984b).</Paragraph>
    <Paragraph position="2"> Few parsing systems have substantial lexicons, and even those which employ very comprehensive grammars (e.g. Robinson, 1982; Bobrow, 1978) consult relatively small lexicons, typically generated by hand. Two exceptions to this generalisation are the Linguistic String Project (Sager, 1981) and the Epistle Project (Heidorn et al., 1982). The former employs a dictionary of fewer than 10,000 words, most of which are specialist medical terms; the latter has well over 100,000 entries, gathered from machine-readable sources. However, their grammar formalism and the limited grammatical information supplied by the dictionary make this achievement, though impressive, theoretically less interesting.</Paragraph>
    <Paragraph position="3"> We chose to employ the Longman Dictionary of Contemporary English (Procter, 1978; henceforth LDOCE) as the machine-readable source for our dictionary environment because this dictionary has several properties which make it uniquely appropriate for use as the core knowledge base of a natural language processing system. Most prominent among these are the rich grammatical subcategorisations of the 60,000 entries; the large amount of information concerning phrasal verbs, noun compounds and idioms; the individual subject, collocational and semantic codes for the entries; and the consistent use of a controlled 'core' vocabulary in defining the words throughout the dictionary.</Paragraph>
    <Paragraph position="4"> (Michiels (1982) gives further description and discussion of LDOCE from the perspective of natural language processing.) The problem of utilising LDOCE in natural language processing falls into two areas. Firstly, we must provide a dictionary environment which links the dictionary to our existing natural language processing systems in the appropriate fashion. Secondly, we must restructure the information in the dictionary in such a way that these systems are able to utilise it effectively. These two tasks form the subject matter of the next two sections.</Paragraph>
  </Section>
</Paper>