File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/91/h91-1044_intro.xml

Size: 5,037 bytes

Last Modified: 2025-10-06 14:04:59

<?xml version="1.0" standalone="yes"?>
<Paper uid="H91-1044">
  <Title>Parsing the Voyager Domain Using Pearl</Title>
  <Section position="2" start_page="0" end_page="231" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> All natural language grammars are ambiguous. Even tightly fitting natural language grammars are ambiguous in some ways.</Paragraph>
    <Paragraph position="1"> Loosely fitting grammars, which are necessary for handling the variability and complexity of unrestricted text and speech, are worse. The standard technique for dealing with this ambiguity, pruning grammars by hand, is painful, time-consuming, and usually arbitrary. The solution which many people have proposed is to use stochastic models to train statistical grammars automatically from a large corpus.</Paragraph>
    <Paragraph position="2"> Attempts to apply statistical techniques to natural language parsing have exhibited varying degrees of success. These successful and unsuccessful attempts have suggested to us that:  be applied with restraint (poor estimates of context are worse than none [6]).</Paragraph>
    <Paragraph position="3"> * Interactive, interleaved architectures are preferable to pipeline architectures in NLU systems, because they use more of the available information in the decision-making process.</Paragraph>
    <Paragraph position="4"> We have constructed a stochastic parser, Pearl, which is based on these ideas.</Paragraph>
    <Paragraph position="5"> The development of the Pearl parser is an effort to combine recently developed statistical models into a single tool which incorporates all of these models into the decision-making component of a parser. While we have only attempted to incorporate a few simple statistical models into this parser, Pearl is structured in a way which allows any number of syntactic, semantic, and other knowledge sources to contribute to parsing decisions. The current implementation of Pearl uses Church's part-of-speech assignment trigram model, a simple probabilistic unknown word model, and a conditional probability model for grammar rules based on part-of-speech trigrams and parent rules.</Paragraph>
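The idea of conditioning a grammar rule's probability on its parent rule and a part-of-speech trigram can be illustrated with a small relative-frequency estimate. This is only a sketch under stated assumptions: the event format, the toy data, and the function name `estimate` are hypothetical, and this section does not specify how Pearl actually computes or smooths these probabilities.

```python
from collections import Counter

# Hypothetical training events: (rule, parent_rule, pos_trigram).
# A rule is written as (left-hand side, right-hand side).
# The data below is made up for illustration; it is not Voyager data.
NP_SHORT = ("NP", ("det", "n"))
NP_LONG = ("NP", ("det", "n", "PP"))
VP_PARENT = ("VP", ("v", "NP"))
PP_PARENT = ("PP", ("p", "NP"))

events = [
    (NP_SHORT, VP_PARENT, ("v", "det", "n")),
    (NP_SHORT, VP_PARENT, ("v", "det", "n")),
    (NP_LONG, VP_PARENT, ("v", "det", "n")),
    (NP_SHORT, PP_PARENT, ("p", "det", "n")),
]


def estimate(training_events):
    """Relative-frequency estimate of P(rule | parent rule, POS trigram)."""
    joint = Counter(training_events)
    context = Counter((parent, tri) for _, parent, tri in training_events)
    return {
        (rule, parent, tri): count / context[(parent, tri)]
        for (rule, parent, tri), count in joint.items()
    }


probs = estimate(events)
p_short = probs[(NP_SHORT, VP_PARENT, ("v", "det", "n"))]
print(round(p_short, 3))  # 2 of the 3 events in this context use NP_SHORT
```

With counts this sparse, unsmoothed estimates are unreliable, which is one reason a real system would treat such context models with caution.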
    <Paragraph position="6"> By combining multiple knowledge sources and using a chart-parsing framework, Pearl attempts to handle a number of difficult problems. Pearl has the capability to parse word lattices, an ability which is useful in recognizing idioms in text processing, as well as in speech processing. The parser uses probabilistic training from a corpus to disambiguate between grammatically acceptable structures, such as determining prepositional phrase attachment and conjunction scope. Finally, Pearl maintains a well-formed substring table within its chart to allow for partial parse retrieval. Partial parses are useful both for error-message generation and for processing ungrammatical or incomplete sentences.</Paragraph>
    <Paragraph position="7"> For preliminary tests of Pearl's capabilities, we are using the Voyager direction-finding domain, a spoken-language system developed at MIT. We have selected this domain for a number of reasons. First, it exhibits the attachment regularities which we are trying to capture with the context-sensitive probability model. Also, since both MIT and Unisys have developed parsers and grammars for this domain, there are existing parsers with which we can compare Pearl. Finally, Pearl's dependence on a parsed corpus to train its models and to derive its grammar required that we use a domain for which a parsed corpus existed. A corpus of 1100 parsed sentences was generated by the Unisys PUNDIT Language Understanding System. These parse trees were evaluated to be semantically correct by PUNDIT's semantics component, although no hand-verification of this corpus was performed. PUNDIT's parser uses a string grammar with many complicated, hand-generated restrictions. The goal of the experiments we performed was to reproduce (or improve upon) the parsing accuracy of PUNDIT using just the context-free backbone of the PUNDIT grammar, without the hand-generated restrictions and, equally important, without the benefit of semantic analysis. In a test on 40 Voyager sentences excluded from the training material, Pearl has shown promising results in handling part-of-speech assignment, prepositional phrase attachment, and unknown word categorization. Pearl correctly parsed 35 out of 40, or 87.5%, of these sentences, where a correct parse is defined to mean one which would produce a correct response from the Voyager system. We will describe the details of this experiment later. In this paper, we will first explain our contribution to the stochastic models which are used in Pearl: a context-free grammar with context-sensitive conditional probabilities. Then, we will describe the parser's architecture and the parsing algorithm.
Finally, we will give the results of experiments we performed using Pearl which explore its capabilities.</Paragraph>
  </Section>
</Paper>