<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0105">
  <Title>Probabilistic Parsing of Unrestricted English Text, With a Highly-Detailed Grammar</Title>
  <Section position="2" start_page="0" end_page="19" type="abstr">
    <SectionTitle>
1. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> This article describes a grammar-based probabilistic parser, and presents experimental results for the parser as trained and tested on a large, highly varied treebank of unrestricted English text. Probabilistic decision trees are utilized as a mea.ns of prediction, roughly as in (Jelinek et al., 1994; Magermau, 1995), and as in these references, training is supervised, and in particular is treebank-based. In all other respects, our work departs from previous research on broad--coverage</Paragraph>
    <Paragraph position="2"> probabilistic parsing, which either attempts to learn to predict gr~rarn~tical structure of test data directly from a training treebank (Brill, 1993; Collins, 1996; Eisner, 1996; Jelinek et al., 1994; Magerman, 1995; S~kine and Orishman, 1995; Sharman et al., 1990), or employs a grammar and sometimes a dictionary to capture linguistic expertise directly (Black et al., 1993a; GrinBerg et al., 1995; Schabes; 1992), but arguably at a less detailed and informative level than in the research reported here.</Paragraph>
    <Paragraph position="3"> In what follows, Section 2 explains the contribution to the prediction process of the grammar and of the lexical generalizations created by our grammarian. Section 3 shows; from a formal standpoint, how prediction is carried out, and more generally how the parser operates. Section 4 presents experimental results. Finally, Section 5 details our efforts to radically expand the size of our training corpus by employing techniques of treebank conversion.</Paragraph>
    <Paragraph position="4">  2. HOW THE GB.AiVIMAB. AND LEXICAL GENERALIZATIONS HELP 2.1. How the Grammar Helps  Figure 1 shows a sampling of parsed sentences from the one-million-word ATR/Lancaster 'IYeebauk of General English (Black et al., 1996), which we employ for training; smoothing and testing our parser. The Treebank consists of a correct parse for each sentence it contains; with respect to the ATR English Grammar. 1 Every non-terminal node is labelled with the n~rae of the ATR English Grammar rule 2 that generates the node; and each word is labelled with one of the 2843 tags in the Gramm,r's tagset. 3 Together, the bracket locations, rule names, and lexical tags of a Treebank parse specify a unique parse within the Gr~rnra~r. In the Grammar parse, rule names and lexical tags are replaced by bundles of feature/value pairs. Each node contains values for 66 features, and there are 12 values per feature, on average.</Paragraph>
    <Paragraph position="5"> Prediction in our parser is conditioned partially on questions about feature values of words and non-terminal nodes. For instance, when we predict whether a constituent has ended, we ask how many words until the next finite verb; the next comma; the next noun; etc. In tagging, we ask if the same word has already occurred in the sentence; and ff so, what its value is for various features. By labelling Treeb~n~ nodes with Gr~ramar rule names, and not with phrasal and clausal n~raes, as in other (non-gr~rarnar-based) treebanks' (Eyes and Leech, 1993; Garside and McEnery, 1993; Marcus et al., 1993), we gain access to all information provided by the Grammar regarding each ~reebank node.</Paragraph>
    <Paragraph position="6"> It would be difficult to attempt to induce this information from the Treebank alone. The parent of a rule in the Grammar often contains feature values that are not derived from any of its children. Further~ the parent inherits some feature values from one child, and some from another. Each rule in the Grammar is associated with a primary and secondary head, and head information is passed up the parse tree. Finally, extensive Boolean conditions are imposed on the application of each individual rule. These conditions are intended to permit only useful applications of a given rule, and reflect experience gained by parsing millions of words with the Grammar, and crucially, by generalizing this experience in ways believed appropriate.</Paragraph>
    <Paragraph position="7"> Since the ATR English Grammar was created specifically for use in machine parsing; some of its features are designed expressly to facilitate parse prediction. For example, the feature  and Two (Non-Sequential) from Chinese Take-Out Food Flier &amp;quot;np_modification&amp;quot; helps to predict attachment events by carrying up to the top node of each noun phrase, data as to how much more modification the noun phrase can probably take. At one extreme, a noun phrase may not have been modified at all so far, and so, other things being equal, it is a prime target for post-modification. At the other extreme, it may already have been modified in a way that tends not to permit further modification, such as a noun phrase followed immediately by a postmodffying comparative phrase (&amp;quot;Such as can understand the topic (may attend)&amp;quot;; &amp;quot;More reasons than you can imagine (were adduced)&amp;quot;).</Paragraph>
    <Paragraph position="8"> Another feature of this type is &amp;quot;det_pos', which reveals, concerning a noun phrase, whether it includes a determiner phrase, and if so, what type. Determinerless noun phrases tend to have different chances of occurring in certain gT~rnrnatical constructions than noun phrases with determiners, and this feature makes it possible for our models to take account of this tendency. Note that it is far from trivial to capture and then percolate this information up a treebank parse without a grammar: demarcation of the determl-er phrase in each case is involved, along with identification of the type of determiner phrase, and other steps.</Paragraph>
    <Paragraph position="9"> The ATR English Grammar is particularly detailed and comprehensive, and this both helps in parse prediction and enhances the value of output that is correctly parsed by our system. For instance, complete syntactic and semantic analysis is performed on all nominal compounds, e.g. ~he Third Annual Long Branch, New Jersey Rod and Gun Club Picnic and Turkey Shoot&amp;quot;, or &amp;quot;high fidelity equipment&amp;quot;. Further, the full range of attachment sites is available within the Gr~mm~r for sentential and phrasal modffers, so that differences in meaning can be accurately reflected in parses. For instance, in &amp;quot;She didn't attend because she was tired, and didn't call for the same reason,&amp;quot; the phrases &amp;quot;because she was tied&amp;quot; and &amp;quot;for the same reason&amp;quot; should probably postmodify their entire respective verb phrases, &amp;quot;didn't attend&amp;quot; and &amp;quot;didn't call&amp;quot;, for maximum clarity. A full range of</Paragraph>
    <Paragraph position="11"> attachment sites are available in the Grsmm~r, are used precisely in the ~Preeba~k, and are required to be handled correctly by our parser for its output to be considered correct.</Paragraph>
    <Paragraph position="12"> 2.2. How Lexical Generalizations Help Prediction in our parser is conditioned not only on questions about feature values of words and non-terminal nodes, but also on questions about &amp;quot;raw&amp;quot; words, wordstrings, and whole sentences. One category of contextual question asks about characteristics of a sentence as a whole. For instance, very short &amp;quot;sentences&amp;quot; in our trsJulng data tend to be free-standing noun phrases or other non-sententiai units. Many of these are titles, speaker-turn indicators, etc. So we ask about the length of the overall &amp;quot;sentence&amp;quot; in all models. In tagging, for instance, there tend not to be any finite verbs in these contexts, and this fact helps with the task of differentiating, say, preterit forms from past participles functioning adjectivally, e.g. &amp;quot;Said plaintiff and plaintiff's counsel:&amp;quot;. Similarly, the first and last words of a sentence can be powerful predictors. If the first word of a sentence is a typical beginning for sentential premodifying phrases (e.g. &amp;quot;Since&amp;quot;), and if there is just one comma in the sentence, and that comma occurs in the fn'st quadrant, then there is a good chance that the overall structure of the sentence is: premodlfying phrase, then main clause.</Paragraph>
    <Paragraph position="13"> Effective questions about words and expressions, for the purpose of predicting the semantic portion of the lexical tags, are essential to the success of our models. One strategy we utilize is to identify contexts strongly associated with a given semantic event. For instance, the context: FirstName ~X&amp;quot; LastName (e.g. Edward &amp;quot;Stubby&amp;quot; Smith) is one of many that are associated with the semantic category NickName.</Paragraph>
    <Paragraph position="14"> 2.3. Formulating Grsmmar and Lexical Questions For Prediction We have developed a flexible language for formulating grammar-based and lexically-based questions about Treeb~n~ text. The ~nswers to these questions are made available to the models in our parser.</Paragraph>
    <Paragraph position="15"> The language provides facilities for navigating a parse tree, determining feature values of a given node, and m~Hng simple boolean or arithmetic computations. In addition, it allows us to translate answers returned by the question into a more natural format for input to the decision-tree models. The language provides easy access to word and tag nodes at any offset from the begln~ng or end of the sentence. It also provides a reference position--the &amp;quot;cu~ent&amp;quot; node, i.e. the node about which a prediction is being made. It is easy to navigate from any node to previous nodes, parent/child nodes, and word/tag nodes relative to the node's constituent boundaries. The navigational commands are recursive, so that, for example, one can arrive at a grandchild of a node by asking about a child's child.</Paragraph>
    <Paragraph position="16"> There is nothing in the language itself which restricts the context which can be used in models.</Paragraph>
    <Paragraph position="17"> For example, changing a bigram tagger into a trigram tagger requires only adding questions about the additional nodes. More generally, the ability to ask questions about the entire sentence (and, in the future, document), means that the '~context&amp;quot; is of variable length.</Paragraph>
    <Paragraph position="18"> Every question b~-s access to the current parse state, which cont~i~ everything known or predicted about the parse tree up to the time the question is asked. Any of this information is available for a selected node. For word nodes, this includes membership on vocabulary lists, whether the word contains various pref~.xes, s~mxes, substrings, etc. In addition, for tag and nontermi~l nodes, the name of the label and the values of all the Gr~mmar's features (including those based on information propagated up the parse tree from lower down) at that node are also available. Finally, for nonterm~nal nodes, general information about the number of children, span, constituent boundaries, etc. is available.</Paragraph>
    <Paragraph position="19">  . t Answers to the questions are of various types: Boolean, categorical, integer, sets of integers. But we transform all these types of answers into binary strings. Some transformations are obvious. Boolean values, for example, are mapped to a single bit. Other transformations are based on clustering, either expert or automatic. For example, the sets of tags and rule labels have been clustered by our team gr~:mm~trian, while a vocabulary of about 60,000 words has been clustered by machine (Brown et al., 1992; Ushioda~ 1996a; Ushioda, 1996b).</Paragraph>
    <Paragraph position="20"> 3. HOW PREDICTION IS CARRIED OUT 3.1. System Design  The ATR parser is a probabilistic parser which uses decision-tree models. A parse is built up from a succession of parse states, each of which represents a partial parse tree. Transition between states is accomplished by one of the following steps: (1) assigning syntax to a word; (2) assigning semantics to a word; (3) deciding whether the current parse tree node is the last node of a constituent; (4) assigning a (rule) label to an internal node of the parse tree. Note that the first two steps together determine the tag for a word, and the third determines the topology of the tree. Working from the bottom up, left to right, constrains the parser to produce a unique derivation for each parse state. Alternatively; we can tag the entire sentence first, then work from tags up, left to right, which also yields a unique derivation for each parse state.</Paragraph>
    <Paragraph position="21"> Statistical models corresponding to each type of step provide estimates of the probability of each step's outcome. 4 Each model uses as input the answers to a set of questions about context designed specifically for that model by our team grammarian, using the language described in Section 2.3. Thus the probability of each decision depends on features extracted from the context, including information about any word(s) in the sentence and any tags and parse structure already predicted. The estimated probability of any parse state is the product of the probabilities of each step taken to reach that state. Strictly speaking, we estimate relative likelihoods rather than probabilities, since we make no attempt to normMize over all possible parses for a given sentence.</Paragraph>
    <Paragraph position="22"> Given a set of models for estimating the probabilities of parse steps, the problem of predicting a parse reduces to searching the space of possible parses for the most likely one. We use a chart parser (Ka~mi~ 1965) to build a compact representation of all legal parses for the sentence, which in turn constrains the search to consider only those parse steps guaranteed to lead to a complete (legal) parse. Even so, because the Grs.mm~r generates a large number of parses for each sentence, s it is not feasible to rau~ the parses exhaustively. Fortunately, incomplete parse states are assigned probabilities, which can be used to guide a search by r, ling out unlikely parses without constructing the complete parse. We have found that a greedy search, which chooses the most likely outcome for each parsing step, usually finds a good candidate parse. Occasionally, though, choosing a less likely step at one point leads to a parse with higher overall likelihood. To allow for this possibility, we use the greedy candidate parse to &amp;quot;seed&amp;quot; the stack-based decoder described in (Jelinek, 1969). There is some freedom in the order in which the parsing steps are taken. The context in which a model makes its prediction includes any parts of the parse tree which have already been built.</Paragraph>
    <Paragraph position="23"> Hence, the order chosen determines what information is available to each model. We choose to tag the entire sentence first, producing an N-best list of tag sequences. Specifically, starting from a sequence of words, we first tag the sentence as follows:  Next, starting from the tag of the first word, which is the left-most leaf node of the parse tree, we take the following steps: * estimate the probability that the current node of the parse tree is the last child of its parent (e.g. the probability that a constituent ends at this node); * if a constituent is deemed to end at this node, estimate the probability of possible rule labels for that consitutent, i.e. of only those rules which are known to lead to legal parses; make that node the current node; and return to the first step; * otherwise, make the top of the next subtree to the right the current node and return to the first step.</Paragraph>
    <Paragraph position="24"> This approach decouples the search over tag sequences from the search over parse trees.</Paragraph>
  </Section>
class="xml-element"></Paper>