File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/84/p84-1005_metho.xml
Size: 14,594 bytes
Last Modified: 2025-10-06 14:11:37
<?xml version="1.0" standalone="yes"?> <Paper uid="P84-1005"> <Title>A STOCHASTIC APPROACH TO SENTENCE PARSING</Title> <Section position="2" start_page="0" end_page="17" type="metho"> <SectionTitle> I. INTRODUCTION </SectionTitle> <Paragraph position="0"> To prepare a grammar which can parse arbitrary sentences taken from a natural corpus is a difficult task. One of the most serious problems is the potentially unbounded number of ambiguities. Pure syntactic analysis with an imprudent grammar will sometimes result in hundreds of parses.</Paragraph> <Paragraph position="1"> With prepositional phrase attachments and conjunctions, for example, it is known that the actual growth of ambiguities can be approximated by the Catalan numbers \[Knuth\], the number of ways to insert parentheses into a formula of M terms: 1, 2, 5, 14, 42, 132, 429, 1430, 4862, ... The five ambiguities in the following sentence, with its three ambiguous constructions, can be well explained with this number.</Paragraph> <Paragraph position="2"> "I saw a man in a park with a scope." This Catalan number is essentially exponential, and \[Martin\] reported a syntactically ambiguous sentence with 455 parses: "List the sales of products produced in 1973 with the products produced in 1972." On the other hand, throughout the long history of natural language understanding work, semantic and pragmatic constraints are known to be indispensable, and it has been recommended that they be represented in some formal way and referred to during or after the syntactic analysis process.</Paragraph> <Paragraph position="3"> However, representing semantic and pragmatic constraints (which are usually domain sensitive) in a well-formed way is a very difficult and expensive task. A lot of effort in that direction has been expended, especially in Artificial Intelligence, using semantic networks, frame theory, etc. However, to our knowledge no one has ever succeeded in preparing them except in relatively small restricted domains \[Winograd, Sibuya\].</Paragraph> <Paragraph position="4"> Faced with this situation, we propose in this paper to use statistics as a device for reducing ambiguities. In other words, we propose a scheme for grammatical inference as defined by \[Fu\], a stochastic augmentation of a given grammar; furthermore, we propose to use the resultant statistics as a device for semantic and pragmatic constraints. Within this stochastic framework, semantic and pragmatic constraints are expected to be coded implicitly in the statistics. A simple bottom-up parse referring to the grammar rules as well as the statistics will assign relative probabilities among ambiguous derivations.
And these relative probabilities should be useful for filtering meaningless garbage parses, because high probabilities will be assigned to the parse trees corresponding to meaningful interpretations and low probabilities, hopefully 0.0, to other parse trees which are grammatically correct but are not meaningful.</Paragraph> <Paragraph position="5"> Most importantly, the stochastic augmentation of a grammar will be done automatically by feeding in a set of sample sentences from the relevant domain in which we are interested, while the preparation of semantic and pragmatic constraints in the form of the usual semantic network, for example, has to be done by human experts for each specific domain.</Paragraph> <Paragraph position="6"> This paper first introduces the basic ideas of the automatic training process which derives the statistics from given example sentences, and then shows how it works with experimental results.</Paragraph> <Paragraph position="7"> II. GRAMMATICAL INFERENCE OF A STOCHASTIC GRAMMAR A. Estimation of Markov Parameters for sample texts Assume a Markov source model as a collection of states connected to one another by transitions which produce symbols from a finite alphabet. To each transition t from a state s is associated a probability q(s,t), which is the probability that t will be chosen next when s is reached.</Paragraph> <Paragraph position="8"> When output sentences {B(i)} from this Markov model are observed, we can estimate the transition probabilities {q(s,t)} through an iteration process in the following way: 1. Make an initial guess of {q(s,t)}.</Paragraph> <Paragraph position="9"> 2. Parse each output sentence B(i). Let d(i,j) be the j-th derivation of the i-th output sentence B(i).</Paragraph> <Paragraph position="10"> 3. Then the probability p(d(i,j)) of each derivation d(i,j) can be defined in the following way: p(d(i,j)) is the product of the probabilities of all the transitions q(s,t) which contribute to that derivation d(i,j).</Paragraph> <Paragraph position="12"> 4. From this p(d(i,j)), the Bayes a posteriori estimate of the count c(s,t,i,j), how many times the transition t from state s is used on the derivation d(i,j), can be estimated as follows:</Paragraph> <Paragraph position="13"> c(s,t,i,j) = n(s,t,i,j) * p(d(i,j)) / Σ_j' p(d(i,j'))</Paragraph> <Paragraph position="14"> where n(s,t,i,j) is the number of times the transition t from state s is used in the derivation d(i,j), and the sum in the denominator runs over all derivations of the sentence B(i).</Paragraph> <Paragraph position="15"> Obviously, c(s,t,i,j) becomes n(s,t,i,j) in the unambiguous case.</Paragraph> <Paragraph position="16"> 5. From this c(s,t,i,j), a new estimate of the probabilities q'(s,t) can be calculated:</Paragraph> <Paragraph position="17"> q'(s,t) = Σ_i Σ_j c(s,t,i,j) / Σ_t' Σ_i Σ_j c(s,t',i,j)</Paragraph> <Paragraph position="18"> where t' in the denominator ranges over all transitions leaving state s. 6. Replace {q(s,t)} with this new estimate {q'(s,t)} and repeat from step 2.</Paragraph> <Paragraph position="19"> Through this process, asymptotic convergence holds for the entropy of {q(s,t)}, and the {q(s,t)} will approach the real transition probabilities \[Baum-1970, 1972\].</Paragraph> <Paragraph position="22"> Further optimized versions of this algorithm can be found in \[Bahl-1983\] and have been successfully used for estimating the parameters of various Markov models which approximate speech processes \[Bahl-1978, 1980\].</Paragraph>
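As a concrete illustration of steps 3-5, the following Python fragment sketches one re-estimation pass. It is only a minimal sketch under stated assumptions, not the original implementation: it assumes the parser of step 2 has already produced, for each sentence B(i), its derivations d(i,j) as lists of (state, transition) pairs, and that q has an entry for every transition encountered; the function and variable names are hypothetical.

    from collections import defaultdict

    def reestimate(q, derivations_per_sentence):
        # q: dict mapping (state, transition) -> current probability q(s,t)
        # derivations_per_sentence: one entry per sentence B(i); each entry is a
        # list of derivations d(i,j), each derivation a list of (state, transition)
        # pairs used in it (with repetitions, so the counts n(s,t,i,j) fall out).
        counts = defaultdict(float)            # c(s,t) summed over i and j
        for derivations in derivations_per_sentence:
            # step 3: p(d(i,j)) = product of q(s,t) over the transitions used
            probs = []
            for d in derivations:
                p = 1.0
                for s, t in d:
                    p *= q[(s, t)]
                probs.append(p)
            total = sum(probs)
            if total == 0.0:
                continue                       # no derivation has probability mass; skip
            # step 4: c(s,t,i,j) = n(s,t,i,j) * p(d(i,j)) / sum over j' of p(d(i,j'))
            for d, p in zip(derivations, probs):
                weight = p / total
                for s, t in d:
                    counts[(s, t)] += weight
            # step 5: renormalize over the transitions leaving each state
        state_totals = defaultdict(float)
        for (s, t), c in counts.items():
            state_totals[s] += c
        return {(s, t): c / state_totals[s] for (s, t), c in counts.items()}

Step 6 then amounts to calling this function repeatedly, feeding each new estimate back in, until the probabilities stop changing.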
<Paragraph position="23"> B. Extension to Context-free Grammar This procedure for automatically estimating Markov source parameters can easily be extended to context-free grammars in the following manner.</Paragraph> <Paragraph position="24"> Assume that each state in the Markov model corresponds to a possible sentential form based on a given context-free grammar. Then each transition corresponds to the application of a context-free production rule to the previous state, i.e. the previous sentential form. For example, the state NP.VP can be reached from the state S by applying the rule S->NP VP, the state ART.NOUN.VP can be reached from the state NP.VP by applying the rule NP->ART NOUN to the first NP of the state NP.VP, and so on.</Paragraph> <Paragraph position="25"> Since the derivations correspond to sequences of state transitions among the states defined above, parsing over the set of sentences given as training data will enable us to count how many times each transition is fired for the given sample sentences.</Paragraph> <Paragraph position="26"> For example, transitions from the state S to the state NP.VP may occur for almost every sentence because the corresponding rule, 'S->NP VP', must be used to derive the most frequent declarative sentences; the transition from the state ART.NOUN.VP to the state 'every'.NOUN.VP may happen 103 times; etc. If we associate each grammar rule with an a priori probability as an initial guess, then the Bayes a posteriori estimate of the number of times each transition will be traversed can be calculated from the initial probabilities and the actual counts observed, as described above.</Paragraph> <Paragraph position="27"> Since each production is expected to occur independently of the context, the new estimate of the probability for a rule will be calculated at each iteration step by masking the contexts. That is, the Bayes estimated counts from all of the transitions which correspond to a single context-free rule (all transitions between states like xxx.A.yyy and xxx.B.C.yyy correspond to the production rule 'A->B C', regardless of the contents of xxx and yyy) are tied together to get the new probability estimate of the corresponding rule.</Paragraph> <Paragraph position="28"> Renewing the probabilities of the rules with these new estimates, the same steps are repeated until they converge.</Paragraph> <Paragraph position="29"> III. EXPERIMENTATION A. Base Grammar As the basis of this research, the grammar developed by Prof. S. Kuno in the 1960's for the machine translation project at Harvard University \[Kuno-1963, 1966\] was chosen, with few modifications. The set of grammar specifications in that grammar, which are in Greibach normal form, were translated into a form which is favorable to our method. The 2118 original rules were rewritten as 5241 rules in Chomsky normal form.</Paragraph> <Paragraph position="30"> B. Parser A bottom-up context-free parser based on the Cocke-Kasami-Young algorithm was developed especially for this purpose. Special emphasis was put on the design of the parser to get better performance in highly ambiguous cases. That is, alternative links, the dotted links shown in the figure below, are introduced to reduce the number of intermediate substructures as far as possible.</Paragraph> <Paragraph position="31"> [Figure: parse substructures shared between competing analyses via a dotted alternative link]</Paragraph> </Section>
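Since the rewritten grammar is in Chomsky normal form, a CKY-style bottom-up parser of the kind described above can fill a chart and score constituents with the rule probabilities. The sketch below is a generic probabilistic CKY recognizer under assumed data layouts, not the authors' parser: it keeps only the single best-scoring analysis per constituent instead of the alternative-link forest, and the names `cky_best`, `lexical`, and `binary` are illustrative.

    import math
    from collections import defaultdict

    def cky_best(words, lexical, binary):
        # Viterbi-style CKY for a grammar in Chomsky normal form.
        # lexical: dict word -> list of (part-of-speech, prob) for lexical rules
        # binary:  dict (B, C) -> list of (A, prob) for each rule A -> B C
        # Returns a chart: (i, j) -> {A: best log-probability of A spanning words[i:j]}
        n = len(words)
        chart = defaultdict(dict)
        for i, w in enumerate(words):
            for pos, p in lexical.get(w, []):
                chart[(i, i + 1)][pos] = math.log(p)
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):          # split point between the two children
                    for b, pb in chart[(i, k)].items():
                        for c, pc in chart[(k, j)].items():
                            for a, pr in binary.get((b, c), []):
                                score = math.log(pr) + pb + pc
                                if score > chart[(i, j)].get(a, float('-inf')):
                                    chart[(i, j)][a] = score
        return chart

The same chart, kept with back-pointers and without the maximization, would enumerate all ambiguous derivations whose probabilities the iteration of Section II redistributes.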
<Section position="3" start_page="17" end_page="18" type="metho"> <SectionTitle> C. Test Corpus </SectionTitle> <Paragraph position="0"> Training sentences were selected from magazines (31 articles from Reader's Digest and Datamation) and from IBM correspondence. Among the 5528 sentences selected from the magazine articles, 3582 sentences were successfully parsed, with 0.89 seconds of CPU time per sentence (IBM 3033-UP) and with 48.5 ambiguities per sentence. The average sentence length in this corpus was 10.85 words.</Paragraph> <Paragraph position="1"> From the corpus of IBM correspondence, 1001 sentences, 12.65 words long on average, were chosen, and 624 sentences were successfully parsed, with an average of 13.5 ambiguities.</Paragraph> <Paragraph position="2"> D. Resultant Stochastic Context-free Grammar After a certain number of iterations, probabilities were successfully associated with all of the grammar rules and the lexical rules, as shown below: (a) IT4 -> HELP 0.98788; (b) IT4 -> SEE 0.00931; (c) SE -> PRN VX PD 0.28754; (d) SE -> AAA 4X VX PD 0.25530. In the above list, (a) means that &quot;HELP&quot; will be generated from the part-of-speech &quot;IT4&quot; with probability 0.98788, and (b) means that &quot;SEE&quot; will be generated from the part-of-speech &quot;IT4&quot; with probability 0.00931. (c) means that the non-terminal &quot;SE (sentence)&quot; will generate the sequence &quot;PRN (pronoun)&quot;, &quot;VX (predicate)&quot; and &quot;PD (period or post-sentential modifiers followed by period)&quot; with probability 0.28754. (d) means that &quot;SE&quot; will generate the sequence &quot;AAA (article, adjective, etc.)&quot;, &quot;4X (subject noun phrase)&quot;, &quot;VX&quot; and &quot;PD&quot; with probability 0.25530. The remaining lines are to be interpreted similarly.</Paragraph> <Paragraph position="3"> E. Parse Trees with Probabilities Parse trees were printed as shown below, including the relative probabilities of each parse.</Paragraph> <Paragraph position="4"> [Figure: three parse trees, identified by A, B and C, for the sentence 'We do not utilize outside art services directly.', each with its relative probability] This example shows that the sentence 'We do not utilize outside art services directly.' was parsed in three different ways. The differences are shown as the differences of the sub-trees identified by A, B and C in the figure.</Paragraph> <Paragraph position="5"> The numbers following the identifiers are the relative probabilities. As shown in this case, the correct parse, the third one, got the highest relative probability, as was expected.</Paragraph>
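The relative probabilities attached to the parses of one sentence are simply the derivation probabilities of Section II normalized so that the alternatives for that sentence sum to one. A minimal sketch; the numeric values in the comment are made up for illustration and are not from the experiment:

    def relative_probabilities(derivation_probs):
        # Normalize the absolute derivation probabilities p(d(i,j)) of one
        # sentence so that its ambiguous parses sum to 1.0.
        total = sum(derivation_probs)
        return [p / total for p in derivation_probs]

    # Hypothetical example with three parses of one sentence:
    # relative_probabilities([2e-12, 1e-12, 7e-12]) -> [0.2, 0.1, 0.7]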
<Paragraph position="6"> F. Result 63 ambiguous sentences from the magazine corpus and 21 ambiguous sentences from the IBM correspondence were chosen at random from the sample sentences, and their parse trees with probabilities were manually examined, as shown in the table below: Number of sentences which got the highest prob. on the most natural parse: 54. Number of sentences which did not get the highest prob. on the most natural parse: 5. Number of sentences with no correct parse: (count missing in the source text). Taking into consideration that the grammar is not tailored for this experiment in any way, the result is quite satisfactory.</Paragraph> <Paragraph position="8"> The only erroneous case in the IBM corpus is due to a grammar problem. That is, in this grammar, such modifier phrases as TO-infinitives, prepositional phrases, adverbials, etc. after the main verb are derived from the 'end marker' of the sentence, i.e. the period, rather than from the relevant constituent being modified. The parse tree in the previous figure is a typical example: the adverb 'DIRECTLY' is derived from the 'PERIOD' rather than from the verb 'UTILIZE'. This simplified handling of dependencies does not keep information between modifying and modified phrases and, as a result, will cause problems where the dependencies play crucial roles in the analysis. This error occurred in a sentence '... is going to work out', where two interpretations exist for the phrase 'to work': 'to work' modifies 'period' as 1. a TO-infinitive phrase, or 2. a prepositional phrase. Ignoring the relationship to the previous context 'is going', the second interpretation got the higher probability, because prepositional phrases occur more frequently than TO-infinitive phrases if the context is not taken into account.</Paragraph> <Paragraph position="9"> IV. CONCLUSION The result from the trials suggests the strong potential of this method. It also suggests some application possibilities of this method, such as refining, minimizing, and optimizing a given context-free grammar. It will also be useful for giving a disambiguation capability to a given ambiguous context-free grammar.</Paragraph> <Paragraph position="10"> In this experiment, an existing grammar was used with few modifications; therefore, only statistics due to the syntactic differences of the sub-structured units were gathered. Applying this method to the collection of statistics which relate more to semantics should be investigated as the next step of this project. Introduction into the grammar of dependency relationships among sub-structured units, semantically categorized parts-of-speech, head-word inheritance among sub-structured units, etc. might be essential for this purpose. More investigation should be done in this direction.</Paragraph> </Section> </Paper>