<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0301">
  <Title>A Linear Observed Time Statistical Parser Based on Maximum Entropy Models</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> This paper presents a statistical parser for natural language that finds one or more scored syntactic parse trees for a given input sentence. The parsing accuracy--roughly 87% precision and 86% recall--surpasses the best previously published results on the Wall St. Journal domain. The parser consists of the following three conceptually distinct parts: 1. A set of procedures that use certain actions to incrementally construct parse trees.</Paragraph>
    <Paragraph position="1"> 2. A set of maximum entropy models that compute probabilities of the above actions, and effectively &quot;score&quot; parse trees.</Paragraph>
    <Paragraph position="2"> * The author acknowledges the support of ARPA grant N66001-94C-6043.</Paragraph>
    <Paragraph position="3"> 3. A search heuristic which attempts to find the highest scoring parse tree for a given input sentence. The maximum entropy models used here are similar in form to those in (Ratnaparkhi, 1996; Berger, Della Pietra, and Della Pietra, 1996; Lau, Rosenfeld, and Roukos, 1993). The models compute the probabilities of actions based on certain syntactic characteristics, or features, of the current context. The features used here are defined in a concise and simple manner, and their relative importance is determined automatically by applying a training procedure on a corpus of syntactically annotated sentences, such as the Penn Treebank (Marcus, Santorini, and Marcinkiewicz, 1994). Although creating the annotated corpus requires much linguistic expertise, creating the feature set for the parser itself requires very little linguistic effort.</Paragraph>
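The form of such a conditional maximum entropy model can be illustrated with a minimal sketch: the probability of an action given a context is the exponentiated sum of the weights of the active features, normalized over all candidate actions. The `weights` and `features` names below are hypothetical illustrations, not the paper's actual feature set or trained parameters.

```python
import math

def maxent_prob(action, context, actions, weights, features):
    """P(action | context) under a conditional maximum entropy model:
    exp(sum of weights for features active in the context, paired with
    the action), normalized over all candidate actions."""
    def unnormalized(a):
        return math.exp(sum(weights.get((f, a), 0.0) for f in features(context)))
    z = sum(unnormalized(a) for a in actions)  # normalizing constant
    return unnormalized(action) / z

# Toy illustration: one feature favoring "shift" when the context word is "the".
weights = {("w=the", "shift"): 1.0}       # hypothetical trained weight
features = lambda ctx: ["w=" + ctx]       # hypothetical feature extractor
actions = ["shift", "reduce"]
```

In training, the feature weights would be fit to the annotated corpus; here they are fixed by hand purely to show the functional form.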
    <Paragraph position="4"> Also, the search heuristic is very simple, and its observed running time on a test sentence is linear with respect to the sentence length. Furthermore, the search heuristic returns several scored parses for a sentence, and this paper shows that a scheme to pick the best parse from the 20 highest scoring parses could yield a dramatically higher accuracy of 93% precision and recall.</Paragraph>
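One way a heuristic can return several scored parses while staying linear in observed time is a beam-style search: at each word, expand the surviving partial derivations and keep only a bounded number of the highest-scoring ones. This is a sketch of that general idea, not the paper's specific procedure; `advance` and `score` are hypothetical stand-ins for the action proposer and the model score.

```python
import heapq

def beam_search(sentence, advance, score, beam_size=20):
    """Beam-style search sketch: at each of the len(sentence) steps,
    expand each partial derivation via advance(derivation, word) and
    retain only the beam_size highest-scoring candidates, so the work
    per word is bounded and observed time grows linearly with length.
    Returns the surviving derivations sorted best-first."""
    beam = [()]  # start with a single empty derivation
    for word in sentence:
        candidates = [d + (a,) for d in beam for a in advance(d, word)]
        beam = heapq.nlargest(beam_size, candidates, key=score)
    return sorted(beam, key=score, reverse=True)
```

Because several scored derivations survive, a reranking scheme can then pick the best parse from the top 20, as the paper's oracle experiment suggests.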
    <Paragraph position="5"> Sections 2, 3, and 4 describe the tree-building procedures, the maximum entropy models, and the search heuristic, respectively. Section 5 describes experiments with the Penn Treebank and section 6 compares this paper with previously published works.</Paragraph>
  </Section>
</Paper>