<?xml version="1.0" standalone="yes"?>
<Paper uid="H90-1069">
  <Title>Towards Understanding Text with a Very Large Vocabulary</Title>
  <Section position="1" start_page="0" end_page="354" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> In order to meet the information processing demands of the next decade, natural language systems must have the capability of processing very large amounts of text, commonly called &amp;quot;messages&amp;quot;, from highly diverse sources written in any of a few dozen languages. One of the key issues in building systems with this scale of competence is handling large numbers of different words and word senses.</Paragraph>
    <Paragraph position="1"> Natural language understanding systems today are typically limited to vocabularies of less than 10,000 words; tomorrow's systems will need vocabularies at least 5 times that to effectively handle the volume and diversity of messages needing to be processed.</Paragraph>
    <Paragraph position="2"> One method of handling large vocabularies is simply increasing the size of the lexicon. Research efforts at IBM \[Chodorow, et al. 1988; Neff, et al. 1989\], Bell Labs \[Church, et al. 1989\], New Mexico State University \[Wilks 1987\], and elsewhere have used mechanical processing of on-line dictionaries to infer at least minimal syntactic and semantic information from dictionary definitions. However, even assuming a very large lexicon already exists, it can never be complete. Systems aiming for coverage of unrestricted language in broad domains must continually deal with new words and novel word senses.</Paragraph>
    <Paragraph position="3"> Systems with very large lexicons have the additional problems of an exploding search space, of disambiguating multiple syntactic and semantic possibilities when full interpretations are possible, and of combining partial interpretations into something meaningful when a full interpretation is not found. For instance, in The Wall Street Journal, the average sentence length is 21 words, more than twice the average sentence length of the corpus for the Air Travel Information System used in spoken language systems research. If the worst case complexity of a parser is n 3, then the search space can be eight times worse than in spoken language interfaces.</Paragraph>
    <Paragraph position="4"> A key element of our approach to these problems is the use of probabilistic models to control the greatly increased search space inherent in large vocabularies. We have observed that the state of the art in natural language processing (NLP) today is analogous to that in speech processing roughly prior to 1980, when purely knowledge-based approaches required much detailed, hand-crafted knowledge from several sources (e.g., acoustic, phonetic, etc.). Speech systems then, like NLP systems today, were brittle, required much hand-crafting, were limited in accuracy, and were not scalable. A revolution in speech technology has occurred since 1980, when probabilistic models were incorporated into the control structure for combining multiple sources of knowledge (providing improved accuracy and increased scalability) and as algorithms for training the system on large bodies (&amp;quot;corpora&amp;quot;) of data were applied (providing reduced cost in moving the technology to a new application domain).</Paragraph>
    <Paragraph position="5"> We are exploring the use of probabilistic models and training in NLP in a new pilot study, whose overall goal is to increase the robustness, precision, and scalability of natural language understanding systems. In the initial phase of the study, we are addressing issues raised by the huge vocabularies in ope n texts. We are experimenting with a variety of techniques for disambiguating word uses, selecting syntactic interpretations, and acquiring information about new words--techniques that can be applied both when a word is initially encountered and in handling the word more effectively the next time it is encountered: This paper reports the results of the first three months of this new effort. We have applied techniques from speech processing, such as &amp;quot;tri-tag&amp;quot; models and probability models on context-free grammars. We report on our initial experiments in using tri-tag models for hypothesizing parts of speech, as well as new results on the size of the corpus needed for training these models, and their use in processing unknown words. We discuss our use of a context-free probabilistic language model to help in selecting the correct parse from among multiple parses. Finally, we present a preliminary approach to the problem of learning the lexical syntax of new words in context and using our probabilistic language model to aid in selecting the interpretation to learn from.</Paragraph>
  </Section>
</Paper>