<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1051"> <Title>AUTOMATIC GRAMMAR ACQUISITION</Title>
<Section position="4" start_page="0" end_page="268" type="metho"> <SectionTitle> 2. LEARNING A CONTEXT-FREE GRAMMAR </SectionTitle>
<Paragraph position="0"> A context-free grammar is acquired from a parsed Treebank corpus by straightforward memorization of the grammar rules used in each training example. Figure 1 shows a typical parse tree from our training corpus; Table 1 shows the grammar rules extracted from it.</Paragraph>
<Paragraph position="2"> In order to parse new sentences, a simple bottom-up chart parser is combined with the acquired grammar. In our experiments, the parser is run until a single parse tree is found that spans all of the words in the input sentence. If no such tree is found, then the minimum number of disjoint fragments is returned such that the set of fragments spans the entire input sentence. Because the acquired grammar is highly ambiguous, the returned parse tree depends on the order in which the grammar rules are applied. To account for this sensitivity to rule order, we repeat our experiments several times with different rule orderings. We have found that, although different orderings produce different parse trees, the overall accuracy of the results does not differ significantly. As expected, the high degree of rule ambiguity, together with our procedure that returns a single parse tree for each sentence, yields rather poor performance. Nevertheless, the performance of this system serves as a baseline that we use to assess the performance of other systems based on alternative grammar types.</Paragraph> </Section>
<Section position="5" start_page="268" end_page="269" type="metho"> <SectionTitle> 3. LEARNING A CONTEXT-DEPENDENT GRAMMAR </SectionTitle>
<Paragraph position="0"> In this experiment, we closely follow the approach of Simmons and Yu, with extensions to accommodate grammar rules of a form derivable from the Treebank. Unlike our other experiments, the grammar rules in this experiment are situation/action rules for a shift-reduce parser. In the following sections, we describe the parser and the rules that select its actions based on contextual information.</Paragraph>
<Section position="1" start_page="268" end_page="268" type="sub_section"> <SectionTitle> 3.1. Shift-Reduce Parser </SectionTitle>
<Paragraph position="0"> The shift-reduce parser consists of two primary data structures: a five-position input buffer and an unlimited-depth push-down stack. New words arriving at the parser flow, in the order in which they are received, through the input buffer. Shift operations remove the leading word from the input buffer and push it onto the top of the stack.</Paragraph>
<Paragraph position="1"> When this occurs, all other words are shifted one position toward the front of the buffer, and the next unprocessed word is moved into the last buffer position. Reduction operations remove two or more elements from the top of the stack and replace them with a single constituent. Reduction operations are equivalent to constructing parse subtrees, as in Figure 2.</Paragraph> </Section>
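The following short Python sketch (an illustration added here, not code from the paper) shows the parser state just described: a five-position input buffer, an unlimited-depth push-down stack, and the shift and reduce operations. The class and method names, and the representation of constituents as (label, children) tuples, are assumptions made for illustration.

    from collections import deque

    BUFFER_SIZE = 5  # the five-position input buffer described in Section 3.1

    class ShiftReduceState:
        """Minimal sketch of the shift-reduce parser state; details not given
        in the paper (names, constituent representation) are illustrative."""

        def __init__(self, words):
            self.pending = deque(words)   # words not yet visible in the buffer
            self.buffer = []              # at most five leading words
            self.stack = []               # unlimited-depth push-down stack
            self._refill()

        def _refill(self):
            # keep the buffer filled with the next unprocessed words
            while len(self.buffer) < BUFFER_SIZE and self.pending:
                self.buffer.append(self.pending.popleft())

        def shift(self):
            # remove the leading word from the buffer and push it onto the stack;
            # remaining words move toward the front and the next word enters last
            word = self.buffer.pop(0)
            self.stack.append(word)
            self._refill()

        def reduce(self, n, label):
            # pop n elements from the top of the stack and replace them with a
            # single constituent, i.e. build a parse subtree as in Figure 2
            children = self.stack[-n:]
            del self.stack[-n:]
            self.stack.append((label, children))

        def done(self):
            # final state: empty buffer and a single element on the stack
            return not self.buffer and not self.pending and len(self.stack) == 1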
<Section position="2" start_page="268" end_page="268" type="sub_section"> <SectionTitle> 3.2. Context-dependent Rules </SectionTitle>
<Paragraph position="0"> The determination of what action the parser takes in a particular situation is governed by context-dependent rules. Constraints given by the rules are matched against actual situations in a two-part process. First, for a rule to be applicable, a hard constraint specified by two or more elements on the top of the stack must be satisfied. Next, those rules that satisfy this condition are ordered by preference based on soft constraints specified by context elements of the stack and buffer.</Paragraph>
<Paragraph position="1"> Hard constraints for reduction rules are determined directly by the reductions themselves. For example, to apply a rule reducing {DT JJ NN ...} to {NP ...}, the top three stack elements must be NN, JJ, and DT. For shift operations, the hard constraints are always given by the top two stack elements.</Paragraph>
<Paragraph position="2"> Soft constraints are specified by a two-part context consisting of a stack portion and a buffer portion. The stack portion comprises the three stack positions directly below the hard constraint, while the buffer portion comprises the entire five-element buffer. Soft constraints are scored by a weighted sum of the number of matches between rule and situation contexts. These weights were hand-tuned to maximize parsing accuracy.</Paragraph> </Section>
<Section position="3" start_page="268" end_page="269" type="sub_section"> <SectionTitle> 3.3. Learning Shift-Reduce Rules </SectionTitle>
<Paragraph position="0"> In order to train the shift-reduce parser, it is first necessary to convert the Treebank parse trees into sequences of shift and reduce operations. A simple tree-walk algorithm that performs the conversion is shown in Figure 4.</Paragraph>
<Paragraph position="1"> Training examples are presented successively to the parser. For each example, all rules with satisfied hard constraints are formed into a list. Next, a shorter list is formed by extracting only those rules that best satisfy the soft constraints. If the correct parser action is among those specified by the shortened list of rules, then no new rule needs to be acquired. Otherwise, a new rule is formed from the current context and parser action, and is stored in the hash table of rules. When training is complete, the rule matching mechanism can present a short list of possible rules, one of which is guaranteed to be correct, for every situation presented in the training examples.</Paragraph>
<Paragraph position="2"> Parsing a sentence is treated as a search problem, the goal of which is to find a sequence of actions that leads from an initial parser state to a final state. The initial state for a sentence is characterized by an empty stack and a buffer filled with the first five words of the sentence. The final state is characterized by an empty buffer and a single element at the top of the stack. Valid transitions between states are determined by the rules acquired during training.</Paragraph>
<Paragraph position="3"> Given a parser state, the rule matching mechanism returns a list of rules specifying actions that cause transitions to new states. The rules are guaranteed to be legal by the hard constraints, but vary in the degree to which the soft constraints on context are satisfied. Each alternative rule corresponds to a different syntactic interpretation of a phrase, only one of which is correct. The premise, put forth by Simmons and Yu, is that the use of context information significantly reduces this ambiguity.</Paragraph>
<Paragraph position="4"> To parse a sentence, a beam search is used to find a path through the state space that maximizes the soft constraints specifying context. Upon completion, a list of shift and reduce operations is returned. These operations correspond directly to a parse tree for the input sentence.</Paragraph> </Section> </Section>
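As a rough illustration of the tree-to-operations conversion used for training (the paper's Figure 4 is not reproduced here, so the details are assumed), the following Python sketch walks a parse tree bottom-up, emitting a shift for each leaf and a reduce for each completed constituent; the (label, children) tuple representation is again an assumption.

    def tree_to_operations(tree):
        """Sketch of a tree-walk that converts a parse tree into a shift/reduce
        sequence. A tree is either a leaf token or a (label, children) pair."""
        ops = []

        def walk(node):
            if isinstance(node, tuple):            # internal node: (label, [children])
                label, children = node
                for child in children:
                    walk(child)
                # the children are now the top len(children) stack elements
                ops.append(("reduce", len(children), label))
            else:                                  # leaf: shift the word onto the stack
                ops.append(("shift", node))

        walk(tree)
        return ops

    # e.g. ("NP", ["DT", "JJ", "NN"]) yields three shifts followed by
    # ("reduce", 3, "NP"), mirroring the {DT JJ NN ...} -> {NP ...} example above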
<Section position="6" start_page="269" end_page="269" type="metho"> <SectionTitle> 4. LEARNING A PROBABILISTIC CONTEXT-FREE GRAMMAR </SectionTitle>
<Paragraph position="0"> In this experiment, probabilities are used to select one among the set of alternative parse trees derivable for an input sentence.</Paragraph>
<Paragraph position="1"> A straightforward evaluation of the probability of a parse tree is obtained from the probabilities of the individual grammar rules that make up the tree. For each rule r of the form α → β, the rule probability is given by P(r) = P(β | α). Then, given a parse tree t constructed according to a derivation D(t), the probability of t is the product of all the conditional rule probabilities in the derivation:</Paragraph>
<Paragraph position="2"> P(t) = ∏_{(α → β) ∈ D(t)} P(β | α) </Paragraph>
<Paragraph position="3"> Using the Treebank training corpus, P(β | α) is estimated by counting the number of times the rule α → β appears in the training data, divided by the number of times the nonterminal symbol α appears. In order to parse new sentences, a simple bottom-up chart parser is extended to include probability measures. In particular, an extra field is added to the chart structure in order to store a probability value corresponding to each completed edge. When multiple derivations result in the same edge, the probability value stored is the maximum among the competing theories. When parsing a sentence, all possible derivations are considered, and the derivation with the highest probability is then returned.</Paragraph> </Section> </Paper>
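The following Python sketch (an illustration added here, not code from the paper) shows the relative-frequency estimate of P(β | α) described in Section 4 and the computation of a tree's probability as the product of its rule probabilities; the (label, children) tree representation and the function names are assumptions.

    from collections import Counter

    def estimate_rule_probabilities(trees):
        """Count each rule alpha -> beta in the training trees and divide by
        the count of the nonterminal alpha (relative-frequency estimate)."""
        rule_counts = Counter()
        lhs_counts = Counter()

        def collect(node):
            if isinstance(node, tuple):                    # (label, [children])
                label, children = node
                rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
                rule_counts[(label, rhs)] += 1
                lhs_counts[label] += 1
                for child in children:
                    collect(child)

        for tree in trees:
            collect(tree)
        return {rule: count / lhs_counts[rule[0]]
                for rule, count in rule_counts.items()}

    def tree_probability(tree, probs):
        """P(t): product of the conditional rule probabilities in the derivation."""
        if not isinstance(tree, tuple):                    # leaves contribute no rule
            return 1.0
        label, children = tree
        rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
        p = probs.get((label, rhs), 0.0)                   # unseen rules get probability 0
        for child in children:
            p *= tree_probability(child, probs)
        return p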