<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1045">
  <Title>A Better N-Best List: Practical Determinization of Weighted Finite Tree Automata</Title>
  <Section position="4" start_page="351" end_page="353" type="metho">
    <SectionTitle>
3 Grammars, Recognizers, and Transducers
</SectionTitle>
    <Paragraph position="0"> Notice that we have introduced a start state, modified the notion of initial assignment, and changed the arity of nullary symbols to unary symbols. This is to make tree automata more palatable to those accustomed to string automata and to allow for a useful graphical interpretation.</Paragraph>
    <Paragraph position="1">  Formally, a tree recognizer is defined by a ranked alphabet, an initial state, a set of final states, and a finite set of transitions, each from a vector of states to one state, reading a symbol of the corresponding arity. Consider the following tree recognizer. As with string automata, it is helpful to have a visualization to understand what the recognizer is recognizing. Figure 2 provides a visualization of the recognizer above. Notice that some members of the transition set are drawn as arcs with multiple (and ordered) tails. This is the key difference in visualization between string and tree automata: to capture the arity of the symbol being read we must visualize the automaton as an ordered hypergraph.</Paragraph>
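To make the hypergraph reading concrete, the following is a minimal sketch of a bottom-up tree recognizer; the states, symbols, and transitions are invented for illustration and are not the paper's example.

```python
# Minimal bottom-up tree recognizer (all names hypothetical).
# A tree is a (symbol, children) pair; leaves have an empty child tuple.
# Following the paper, nullary symbols are treated as unary: a leaf is
# read from the designated start state.
TRANSITIONS = {
    (("q_start",), "a"): "qa",   # leaf symbol a, read from the start state
    (("qa", "qa"), "g"): "qg",   # binary symbol g over two qa states
    (("qg",), "r"): "qr",        # unary symbol r
}
FINAL_STATES = {"qr"}

def run(tree, start="q_start"):
    """Return the state reached after reading `tree` bottom-up,
    or None if no transition applies."""
    symbol, children = tree
    if not children:
        return TRANSITIONS.get(((start,), symbol))
    states = tuple(run(child, start) for child in children)
    if None in states:
        return None
    return TRANSITIONS.get((states, symbol))

def accepts(tree):
    return run(tree) in FINAL_STATES

leaf = ("a", ())
print(accepts(("r", (("g", (leaf, leaf)),))))   # True
```

Each entry in TRANSITIONS is one hyperedge: an ordered tail of input states and a single head state, labeled by the symbol read.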
    <Paragraph position="2"> The function of the transitions in the hypergraph visualization leads us to refer to the vector of states as an input vector of states, and to the single state as an output state. We will refer to this set of transitions as the transition set of the recognizer.</Paragraph>
    <Paragraph position="3"> In string automata, a path through a recognizer consists of a sequence of edges that can be followed from a start to an end state. The concatenation of the labels of the edges of a path, typically in a left-to-right order, forms a string in the recognizer's language. In tree automata, however, a hyperpath through a recognizer consists of a sequence of hyperedges that can be followed, sometimes in parallel, from a start to an end state. (The number attached to a symbol denotes its arity.)</Paragraph>
    <Paragraph position="4">  We arrange the labels of the hyperedges to form a tree in the recognizer's language but must now consider proper order in two dimensions. The proper vertical order is specified by the order of application of transitions, i.e., the labels of transitions followed earlier are placed lower in the tree than the labels of transitions followed later. The proper horizontal order within one level of the tree is specified by the order of states in a transition's input vector. In the example recognizer, two trees are valid; notice that one of them may be recognized in two different hyperpaths.</Paragraph>
    <Paragraph position="5"> Like tree recognizers, tree transducers read tree input and decide whether the input is in the language, but they simultaneously produce some output as well. Since we wish to associate a weight with every acceptable tree in a language, we will consider transducers that produce weights as their output. Note that in transitioning from recognizers to transducers we are following the convention established in (Mohri, 1997), where a transducer with weight outputs is used to represent a weighted recognizer. One may consider the determinization of tree-to-weight transducers as equivalent to the determinization of weighted tree recognizers.</Paragraph>
    <Paragraph position="6"> Formally, a bottom-up tree-to-weight transducer is defined like a recognizer (with a ranked alphabet, an initial state, and a set of final states), together with: a finite set of transitions from a vector of states to one state, each reading a symbol of the corresponding arity and outputting some weight; an initial weight function mapping into the weight set; and a final weight function mapping into the weight set.</Paragraph>
    <Paragraph position="7"> We must also specify a convention for propagating the weight calculated in every transition. This can be explicitly defined for each transition, but we will simplify matters by defining the propagation of the weight to a destination state as the multiplication of the weight at each source state with the weight of the production.</Paragraph>
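Under this multiplicative convention, a nondeterministic transducer can assign several weights to one tree, one per hyperpath. The sketch below (states, symbols, and weights are hypothetical, not the paper's example) computes every (state, weight) outcome:

```python
from itertools import product

# (input-state vector, symbol) -> list of (output state, weight);
# more than one entry for a pair makes the transducer nondeterministic.
# All states, symbols, and weights here are hypothetical.
W_TRANSITIONS = {
    (("q0",), "a"): [("qa", 0.5)],
    (("qa", "qa"), "g"): [("qg", 0.4), ("qh", 0.6)],   # two competing hyperedges
    (("qg",), "r"): [("qr", 0.9)],
    (("qh",), "r"): [("qr", 0.3)],
}
FINALS = {"qr"}

def transduce(tree, start="q0"):
    """Return every (state, weight) pair reachable by some hyperpath;
    the weight reaching a state is the product of the weights at the
    source states and the transition's own weight."""
    symbol, children = tree
    if not children:                       # leaf, read from the start state
        return list(W_TRANSITIONS.get(((start,), symbol), []))
    child_outcomes = [transduce(c, start) for c in children]
    out = []
    for combo in product(*child_outcomes):
        states = tuple(s for s, _ in combo)
        w_in = 1.0
        for _, w in combo:
            w_in *= w
        for dest, w in W_TRANSITIONS.get((states, symbol), []):
            out.append((dest, w_in * w))
    return out

leaf = ("a", ())
tree = ("r", (("g", (leaf, leaf)),))
# The same tree is accepted along two hyperpaths with different weights:
print(sorted(w for s, w in transduce(tree) if s in FINALS))
```

The recursion mirrors the hypergraph: each child subtree is resolved first, and a transition fires only when its whole input vector is available.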
    <Paragraph position="8">  We modify the previous example by adding weights as follows, forming a tree-to-weight transducer (the remaining components are as before). Figure 3 shows the addition of weights onto the automaton, forming the above transducer. Notice that one tree yields the weight 0.036, while another yields the weight 0.012 or 0.054, depending on the hyperpath followed.</Paragraph>
    <Paragraph position="9"> This transducer is an example of a nonsubsequential transducer. A tree transducer is subsequential if for each vector v of states and each label there is at most one transition in the transition set with input vector v and that label. These restrictions ensure a subsequential transducer yields a single output for each possible input, that is, it is deterministic in its output. Because we will reason about the destination state of a transducer transition and the weight of a transducer transition separately, we make the following definition: for a given transition with input vector v and a given label, we write one function selecting its destination state and another selecting its weight, with equivalent shorthand forms for each.</Paragraph>
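The subsequentiality condition is easy to check mechanically: no two transitions may share an input vector and a label. A sketch, assuming transitions are stored as (vector, label, destination, weight) tuples:

```python
def is_subsequential(transitions):
    """True iff no two transitions share an (input vector, label) pair."""
    seen = set()
    for vec, label, _dest, _weight in transitions:
        if (vec, label) in seen:
            return False
        seen.add((vec, label))
    return True

# Hypothetical example: two hyperedges with the same input vector and
# label make the transducer nonsubsequential.
ambiguous = [
    (("qa", "qa"), "g", "qg", 0.4),
    (("qa", "qa"), "g", "qh", 0.6),
]
print(is_subsequential(ambiguous))   # False
```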
  </Section>
  <Section position="5" start_page="353" end_page="354" type="metho">
    <SectionTitle>
4 Determinization
</SectionTitle>
    <Paragraph position="0"> We do not consider any of these cases in this work.</Paragraph>
    <Paragraph position="1"> Determinizing a tree-to-weight transducer can be thought of as a two-stage process. First, the structure of the automaton must be determined such that a single hyperpath exists for each recognized input tree. This is achieved by a classic powerset construction, i.e., a state must be constructed in the output transducer that represents all the possible reachable destination states given an input and a label. Because we are working with tree automata, our input is a vector of states, not a single state. A comparable powerset construction on unweighted tree automata and a proof of correctness can be found in (Comon et al., 1997).</Paragraph>
    <Paragraph position="2"> The second consideration in weighted determinization is proper propagation of weights. For this we will use the concept of the residual weight from (Mohri, 1997). In the construction of states in the output transducer we represent not only a subset of states of the input transducer, but also a number associated with each of these states, called the residual. Since we want the output transducer's hyperpath for a particular input tree to have as its associated weight the sum of the weights of all of the input transducer's hyperpaths for that tree, we replace a set of hyperedges in the input transducer that have the same input state vector and label with a single hyperedge in the output transducer bearing that label and the sum of the input hyperedges' weights. The destination state of the new hyperedge represents the states reachable by the input transducer's applicable hyperedges and, for each state, the proportion of the weight from the relevant transition.</Paragraph>
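The bookkeeping for one merge step can be sketched as follows; the destination states and weights are hypothetical. Competing hyperedges that share an input vector and label are replaced by one hyperedge carrying their summed weight, and the new destination state records each old destination with its residual share:

```python
def merge_hyperedges(edges):
    """edges: (dest_state, weight) pairs sharing one (input vector, label).
    Returns the merged hyperedge weight (the sum) and the new destination
    state: each old destination paired with its residual share of the sum."""
    total = sum(w for _, w in edges)
    residuals = frozenset((q, w / total) for q, w in edges)
    return total, residuals

# Two competing hyperedges with hypothetical weights 0.5 and 1.5:
total, state = merge_hyperedges([("qg", 0.5), ("qh", 1.5)])
print(total)            # 2.0
print(sorted(state))    # [('qg', 0.25), ('qh', 0.75)]
```

Multiplying a residual back by the merged edge weight recovers the original hyperedge weight, which is what keeps the total weight of each input tree unchanged.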
    <Paragraph position="3"> Figure 4 shows the determinization of a portion of the example transducer. Note that the hyperedge leading to one state in the input transducer contributes part of the weight on the output transducer hyperedge, and the hyperedge leading to the other state contributes the remainder. This is reflected in the state construction in the output transducer. The complete determinization of the example transducer is shown in Figure 5.</Paragraph>
    <Paragraph position="4"> To encapsulate the representation of states from the input transducer and their associated residual weights, we define a state in the output transducer as a set of (state, weight) tuples. Since the algorithm builds new states progressively, we will need to represent a vector of states from the output transducer, typically depicted as v. We may construct the vector pair (q, w) from v, where q is a vector of states of the input transducer and w is a vector of residual weights, by choosing a (state, weight) pair from each output state in v.</Paragraph>
    <Paragraph position="5"> For a vector v of output transducer states and a label, three sets are defined as follows: the first is the set of vector pairs (q, w) constructed from v where each q is an input vector in a transition with that label; the second is the set of unique transitions, paired with the appropriate vector pair, for each (q, w) in the first set; and the third is the set of states reachable from the transitions in the second set.</Paragraph>
    <Paragraph position="10"> The consideration of vectors of states on the incident edge of transitions effects two noticeable changes on the algorithm as it is presented in (Mohri, 1997). The first, relatively trivial, change is the inclusion of the residuals of multiple states in the calculation of weights and residuals on lines 16 and 17. The second change is the production of vectors for consideration. Whereas the string-based algorithm considered newly-created states in turn, we must consider newly-available vectors. For each newly created state, newly available vectors can be formed by using that state with the other states of the output transducer. This operation is performed on lines 7 and 22 of the algorithm.</Paragraph>
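The vector-formation step can be sketched with a hypothetical helper: whenever a state is created, every vector over the enlarged state set that uses the new state at least once becomes newly available.

```python
from itertools import product

def new_vectors(known_states, new_state, arity):
    """All state vectors of the given arity that use the newly created
    state at least once, combined with the previously known states."""
    all_states = sorted(known_states | {new_state})
    return [v for v in product(all_states, repeat=arity)
            if new_state in v]

# With known states A, B and new state C, 5 of the 9 arity-2 vectors
# contain C and become newly available for consideration:
vecs = new_vectors({"A", "B"}, "C", 2)
print(len(vecs))   # 5
```

Vectors made only of previously known states have already been considered, which is why only the vectors containing the new state need to be enqueued.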
  </Section>
  <Section position="6" start_page="354" end_page="356" type="metho">
    <SectionTitle>
5 Empirical Studies
</SectionTitle>
    <Paragraph position="0"> We now turn to some empirical studies. We examine the practical impact of the presented work by showing: That the multiple derivation problem is pervasive in practice and determinization is effective at removing duplicate trees.</Paragraph>
    <Paragraph position="1"> That duplication causes misleading weighting of individual trees and that the summing achieved by weighted determinization corrects this error, leading to re-ordering of the n-best list.</Paragraph>
    <Paragraph position="2"> That weighted determinization positively affects end-to-end system performance.</Paragraph>
    <Paragraph position="3"> We also compare our results to a commonly used technique for estimation of n-best lists, i.e., summing over the top derivations to get weight estimates of the top unique elements.</Paragraph>
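This "crunching" baseline can be sketched as follows, with a hypothetical 4-best derivation list; summing derivation weights per tree can reorder the list relative to the best single derivation:

```python
from collections import defaultdict

def crunch(derivations):
    """derivations: (tree, weight) pairs from a k-best derivation list.
    Sum the weights of derivations that yield the same tree, then rank
    the unique trees by their estimated total weight."""
    totals = defaultdict(float)
    for tree, w in derivations:
        totals[tree] += w
    return sorted(totals.items(), key=lambda kv: -kv[1])

# Hypothetical 4-best derivation list: t2 has two derivations whose
# summed weight overtakes the single best derivation of t1.
kbest = [("t1", 0.30), ("t2", 0.25), ("t2", 0.20), ("t3", 0.10)]
print([t for t, w in crunch(kbest)])   # ['t2', 't1', 't3']
```

The estimate is only as good as the truncation: derivations below the k-best cutoff contribute nothing to their tree's total, which is the source of the error measured below.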
    <Section position="1" start_page="354" end_page="356" type="sub_section">
      <SectionTitle>
5.1 Machine translation
</SectionTitle>
      <Paragraph position="0"> We obtain packed-forest English outputs from 116 short Chinese sentences computed by a string-to-tree machine translation system based on (Galley et al., 2004). The system is trained on all Chinese-English parallel data available from the Linguistic Data Consortium. The decoder for this system is a CKY algorithm that negotiates the space described in (DeNeefe et al., 2005). No language model was used in this experiment.</Paragraph>
      <Paragraph position="1"> Each forest contains a large number of English parse trees. We remove cycles from each forest, apply our determinization algorithm, and extract the n-best trees using a variant of (Huang and Chiang, 2005). The effects of weighted determinization on an n-best list are obvious to casual inspection. Figure 7 shows the improvement in quality of the top 10 trees from our example translation after the application of the determinization algorithm.</Paragraph>
      <Paragraph position="2"> The improvement observed circumstantially holds up to quantitative analysis as well. The forests obtained by the determinized grammars have between 1.39% and 50% of the number of trees of their undeterminized counterparts. On average, the determinized forests contain 13.7% of the original number of trees. Since a determinized forest contains no repeated trees but contains exactly the same unique trees as its undeterminized counterpart, this indicates that an average of 86.3% of the trees in an undeterminized MT output forest are duplicates. (As in (Mohri, 1997), determinization may be applicable to some automata that recognize infinite languages. In practice, cycles in tree automata of MT results are almost never desired, since these represent recursive insertion of words.)</Paragraph>
      <Paragraph position="3"> Weighted determinization also causes a surprisingly large amount of n-best reordering. In 77.6% of the translations, the tree regarded as "best" is different after determinization. This means that in a large majority of cases, the tree with the highest weight is not recognized as such in the undeterminized list because its weight is divided among its multiple derivations. Determinization allows these instances and their associated weights to combine and puts the highest weighted tree, not the highest weighted derivation, at the top of the list.</Paragraph>
      <Paragraph position="4">  We can compare our method with the more commonly used method of "crunching" n-best lists, where the list is taken longer than the desired number of unique trees. The duplicate sentences in the trees are combined, hopefully resulting in at least the desired number of unique members with an estimation of the true tree weight for each unique tree. Our results indicate this is a rather crude estimation. When the top 500 derivations of the translations of our test corpus are summed, only 50.6% of them yield an estimated highest-weighted tree that is the same as the true highest-weighted tree.</Paragraph>
      <Paragraph position="5"> As a measure of the effect weighted determinization and its consequent re-ordering have on an actual end-to-end evaluation, we obtain Bleu scores for our 1-best translations from determinization, and compare them with the 1-best translations from the undeterminized forest and the 1-best translations from the top-500 "crunching" method. The results are tabulated in Figure 6. Note that in 26.7% of cases determinization did not terminate in a reasonable amount of time. For these sentences we used the best parse from top-500 estimation instead. It is not surprising that determinization may occasionally take a long time; even for a language of monadic trees (i.e., strings) the determinization algorithm is NP-complete, as implied by (Casacuberta and de la Higuera, 2000) and, e.g., (Dijkstra, 1959).</Paragraph>
    </Section>
    <Section position="2" start_page="356" end_page="356" type="sub_section">
      <SectionTitle>
5.2 Data-Oriented Parsing
</SectionTitle>
      <Paragraph position="0"> Weighted determinization of tree automata is also useful for parsing. Data-Oriented Parsing (DOP) methodology is to calculate weighted derivations, but as noted in (Bod, 2003), it is the highest ranking parse, not derivation, that is desired. Since (Sima'an, 1996) showed that finding the highest ranking parse is an NP-complete problem, it has been common to estimate the highest ranking parse by the previously described "crunching" method. Figure 8 compares the use of the best derivation (undeterminized), the estimate of the best tree (top-500), and the true best tree (determinized) for selection of parse output.</Paragraph>
      <Paragraph position="1"> We create a DOP-like parsing model by extracting and weighting a subset of subtrees from sections 2-21 of the Penn Treebank and use a DOP-style parser to generate packed forest representations of parses of the 2416 sentences of section 23. The forests again contain a large number of parse trees. We then remove cycles and apply weighted determinization to the forests. The number of trees in each determinized parse forest is reduced by a factor of at least 2.1. On average, the number of trees is reduced by a factor of 900,000, demonstrating a much larger number of duplicate parses prior to determinization than in the machine translation experiment. The top-scoring parse after determinization is different from the top-scoring parse before determinization for 49.1% of the forests, and when the determinization method is "approximated" by crunching the top-500 parses from the undeterminized list, only 55.9% of the top-scoring parses are the same, indicating the crunching method is not a very good approximation of determinization. We use the standard F-measure combination of recall and precision to score the top-scoring parse in each method against reference parses. The results are tabulated in Figure 8. Note that in 16.9% of cases determinization did not terminate. For those sentences we used the best parse from top-500 estimation instead.</Paragraph>
    </Section>
  </Section>
</Paper>