File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/90/c90-3085_abstr.xml

Size: 4,028 bytes

Last Modified: 2025-10-06 13:46:59

<?xml version="1.0" standalone="yes"?>
<Paper uid="C90-3085">
  <!-- [Figure: NAS architecture diagram; labels: Textual Stories, Parser, Structures, Lexicon, Template, Filter, Words, Indexes, Terms] -->
  <Title>An Architectural Overview of NAS</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
USA
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> This project note describes a system that receives, parses, indexes, and routes news reports. The core of this automatic indexer is a parser based on Government-Binding Theory, which derives the thematic and binding relationships among the arguments in the sentences of stories. These syntactic structures are interpreted by a semantic processor that is linked to conceptual representations of terms from a controlled indexing vocabulary. As a result, the system can index news with respect to a large set of terms that denote the content of the articles.</Paragraph>
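The flow in the abstract (receive, parse, interpret, index, route) can be pictured as a chain of stages. The sketch below is hypothetical throughout: the `Story` and `Pipeline` classes, the subscriber profiles, and the two stand-in stages are illustrative assumptions, and the toy parser only splits sentences into words rather than performing any Government-Binding analysis.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical record type; the real NAS data structures are not described here.
@dataclass
class Story:
    story_id: str
    text: str
    terms: set[str] = field(default_factory=set)  # filled in by the indexer


@dataclass
class Pipeline:
    parse: Callable[[str], list[dict]]           # text -> parse structures
    interpret: Callable[[list[dict]], set[str]]  # structures -> index terms
    profiles: dict[str, set[str]]                # subscriber -> terms of interest

    def index(self, story: Story) -> Story:
        # Parse, then let the semantic processor map structures to vocabulary terms.
        story.terms = self.interpret(self.parse(story.text))
        return story

    def route(self, story: Story) -> list[str]:
        # Deliver the story to every subscriber whose profile shares a term with it.
        return sorted(
            name for name, wanted in self.profiles.items()
            if wanted.intersection(story.terms)
        )

    def receive(self, story: Story) -> list[str]:
        return self.route(self.index(story))


def toy_parse(text: str) -> list[dict]:
    # Stand-in for the parser: one "structure" per sentence, holding its words.
    return [{"words": s.lower().split()} for s in text.split(".") if s.strip()]


def toy_interpret(structures: list[dict]) -> set[str]:
    # Stand-in for the semantic processor: a single hard-wired rule.
    terms: set[str] = set()
    for s in structures:
        if "bought" in s["words"] and "from" in s["words"]:
            terms.add("FOREIGN TRADE")
    return terms


nas = Pipeline(toy_parse, toy_interpret,
               {"trade-desk": {"FOREIGN TRADE"}, "sports-desk": {"SPORTS"}})
print(nas.receive(Story("s1", "China bought 6,000 tonnes of wheat from the United States.")))
# -> ['trade-desk']
```

Keeping the parser and the semantic processor as pluggable callables mirrors the separation the abstract describes: routing depends only on the assigned index terms, not on how they were derived.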
    <Paragraph position="1"> BACKGROUND</Paragraph>
    <Paragraph position="2"> With the rapidly increasing volume of text being generated, transmitted, processed, and stored, it has become critical that information retrieval and routing be highly efficient, both in processing time and in accuracy. To this end, indexing techniques have become the primary focus of much research, and yet these methods have relied on automatic keyword identification in texts. This is not to say that natural language techniques have not been examined for their relevance to indexing and retrieval (cf. /Sparck Jones and Kay 1973/, /Walker, Karlgren, and Kay 1977/, and, more recently, /Salton and Smith 1989/). Rather, most systems rely on the presence or absence of keywords, supplemented by mechanisms such as proximity constraints, statistical weighting, word-stem truncation, and boolean retrieval expressions. However, these methods do not take into account the syntactic and semantic structure inherent in the text being indexed. That is, they make virtually no use of the fact that what is being processed is natural language and not a collection of arbitrary character strings.</Paragraph>
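The keyword mechanisms listed above can be illustrated with a small boolean matcher that supports word-stem truncation (a trailing `*`) and a proximity operator. This is a hypothetical sketch, not any particular retrieval system: the nested-tuple query syntax is an assumption, and statistical weighting is omitted. Applied to a sample wire sentence, a query for the topic words fails even though the story is plainly about trade.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lower-cased word tokens; "6,000" becomes "6" and "000".
    return re.findall(r"\w+", text.lower())


def term_positions(term: str, toks: list[str]) -> list[int]:
    # A trailing "*" requests word-stem truncation: "trad*" matches trade, trading, ...
    if term.endswith("*"):
        stem = term[:-1]
        return [i for i, t in enumerate(toks) if t.startswith(stem)]
    return [i for i, t in enumerate(toks) if t == term]


def evaluate(query, toks: list[str]) -> bool:
    # Queries are nested tuples, e.g. ("AND", "wheat", ("NOT", "flour")).
    if isinstance(query, str):
        return bool(term_positions(query, toks))
    op, *args = query
    if op == "AND":
        return all(evaluate(a, toks) for a in args)
    if op == "OR":
        return any(evaluate(a, toks) for a in args)
    if op == "NOT":
        return not evaluate(args[0], toks)
    if op == "NEAR":
        # Proximity constraint: the two terms occur at most k words apart.
        a, b, k = args
        return any(k >= abs(i - j)
                   for i in term_positions(a, toks)
                   for j in term_positions(b, toks))
    raise ValueError(f"unknown operator {op!r}")


story = tokenize("China bought 6,000 tonnes of wheat from the United States.")
print(evaluate(("AND", "foreign", "trad*"), story))     # -> False
print(evaluate(("NEAR", "bought", "wheat", 5), story))  # -> True
```

Every operator here inspects only surface strings and their positions, which is exactly the limitation the paragraph identifies.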
    <Paragraph position="3"> Natural language processing (NLP) can make its most valuable contributions to those aspects of indexing where keyword approaches fail, viz., the assignment of terms to text based on its semantic or conceptual content. This involves deriving abstract relationships among conceptual units. For example, consider a story stating: (1) China bought 6,000 tonnes of wheat from the United States.</Paragraph>
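One way to picture the conceptual assignment for (1) is to represent the parser's output as a predicate with thematic-role arguments and to let a rule map that structure to an index term. The role labels, the `TRANSFER_VERBS` and `COUNTRIES` sets, and the rule itself are hypothetical illustrations, not the representation NAS uses.

```python
from dataclasses import dataclass

# Hypothetical lexical knowledge; a real system would draw on its lexicon.
TRANSFER_VERBS = {"buy", "sell", "import", "export", "purchase"}
COUNTRIES = {"China", "United States", "Japan", "Brazil", "Canada"}


@dataclass
class Clause:
    predicate: str         # lemma of the main verb
    roles: dict[str, str]  # thematic role -> filler, e.g. {"agent": "China"}


def assign_terms(clause: Clause) -> set[str]:
    terms: set[str] = set()
    agent = clause.roles.get("agent")
    # The other party is the source for "buy" and the goal for "sell".
    other = clause.roles.get("source") or clause.roles.get("goal")
    if (clause.predicate in TRANSFER_VERBS
            and agent in COUNTRIES and other in COUNTRIES and agent != other):
        terms.add("FOREIGN TRADE")
    return terms


# Sentence (1): China bought 6,000 tonnes of wheat from the United States.
s1 = Clause("buy", {"agent": "China", "theme": "wheat", "source": "United States"})
print(assign_terms(s1))  # -> {'FOREIGN TRADE'}
```

The rule never looks for the words "foreign" or "trade"; it fires on the relationship between the predicate and its role fillers.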
    <Paragraph position="4"> The plausible categorization of (1) is that it is about foreign trade. However, the phrase &quot;foreign trade&quot; does not appear in (1), and it is invariably absent from foreign trade stories in general. Furthermore, it is extremely unlikely that such foreign trade stories could be retrieved efficiently, i.e., with a few simple keyword queries. The central issue is that although the particulars (e.g., country names and types of commodities) vary, the basic meanings of foreign trade stories are equivalent at some level, and this level is valuable for indexing purposes. This suggests that systems operating at a conceptual level could index text in ways that permit highly effective retrieval.</Paragraph>
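The argument that foreign trade stories share a meaning despite differing particulars can be illustrated with an inverted index keyed on controlled-vocabulary terms instead of words. In this minimal sketch the stories, their pre-derived participants, and the one-rule `concepts` function are all hypothetical; the point is only that a single term lookup retrieves stories that a phrase search misses.

```python
from collections import defaultdict

# Hypothetical lexical knowledge used by the toy concept rule below.
COUNTRIES = {"China", "United States", "Japan", "Brazil", "Egypt", "Argentina"}
TRANSFER_VERBS = {"buy", "sell", "import", "export"}

# Hypothetical stories: (predicate, participant, other participant, text).
# The predicate and participants stand in for what a parser would derive.
stories = {
    "s1": ("buy", "China", "United States",
           "China bought 6,000 tonnes of wheat from the United States."),
    "s2": ("sell", "Japan", "Brazil", "Japan sold 40,000 cars to Brazil."),
    "s3": ("import", "Egypt", "Argentina", "Egypt imported soybeans from Argentina."),
    "s4": ("buy", "a local mill", "a farm cooperative",
           "A local mill bought grain from a farm cooperative."),
}


def concepts(predicate: str, party: str, other: str) -> set[str]:
    # Toy rule: a transfer of goods between two different countries is foreign trade.
    if (predicate in TRANSFER_VERBS and party != other
            and {party, other}.issubset(COUNTRIES)):
        return {"FOREIGN TRADE"}
    return set()


# Inverted index keyed on controlled-vocabulary terms rather than on words.
index = defaultdict(set)  # term -> story ids
for sid, (pred, party, other, _text) in stories.items():
    for term in concepts(pred, party, other):
        index[term].add(sid)

print(sorted(index["FOREIGN TRADE"]))  # -> ['s1', 's2', 's3']

# A keyword search for the phrase itself finds none of them.
keyword_hits = [sid for sid, s in stories.items() if "foreign trade" in s[3].lower()]
print(keyword_hits)  # -> []
```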
    <Paragraph position="5"> The assumption that NLP technology can provide the means of categorizing text guides several recent efforts. In particular, /Hayes et al. 1988/, /Kuhns 1988/, and /Rau and Jacobs 1988/ each describe systems that characterize news reports with results that could not be obtained by keyword methods alone. Since a news analysis system (NAS) was first reported in /Kuhns 1988/, a number of major enhancements to its design and underlying functionality have been incorporated.</Paragraph>
    <Paragraph position="6"> This paper reports on the current state of NAS and its components.</Paragraph>
  </Section>
</Paper>