<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1109">
  <Title>An All-Subtrees Approach to Unsupervised Parsing</Title>
  <Section position="5" start_page="866" end_page="867" type="metho">
    <SectionTitle>
3 U-DOP
</SectionTitle>
    <Paragraph position="0"> U-DOP extends DOP1 to unsupervised parsing (Bod 2006). Its key idea is to assign all unlabeled binary trees to a set of sentences and to next use (in principle) all subtrees from these binary trees to parse new sentences. U-DOP thus proposes one of the richest possible models in bootstrapping trees.</Paragraph>
    <Paragraph position="1"> Previous models like Klein and Manning's (2002, 2005) CCM model limit the dependencies to &amp;quot;contiguous subsequences of a sentence&amp;quot;. This means that CCM neglects dependencies that are non-contiguous such as between more and than in &amp;quot;BA carried more people than cargo&amp;quot;. Instead, U-DOP's all-subtrees approach captures both contiguous and non-contiguous lexical dependencies. null As with most other unsupervised parsing models, U-DOP induces trees for p-o-s strings rather than for word strings. The extension to word strings is straightforward as there exist highly accurate unsupervised part-of-speech taggers (e.g.</Paragraph>
    <Paragraph position="2"> Schutze 1995) which can be directly combined with unsupervised parsers.</Paragraph>
    <Paragraph position="3"> To give an illustration of U-DOP, consider the WSJ p-o-s string NNS VBD JJ NNS which may correspond for instance to the sentence Investors suffered heavy losses. U-DOP starts by assigning all possible binary trees to this string, where each root node is labeled S and each internal node is labeled X. Thus NNS VBD JJ NNS has a total of five binary trees shown in figure 4 -- where for readability we add words as well.</Paragraph>
    <Paragraph position="4">  (Investors suffered heavy losses) While we can efficiently represent the set of all binary trees of a string by means of a chart, we need to unpack the chart if we want to extract subtrees from this set of binary trees. And since the total number of binary trees for the small WSJ10 is almost 12 million, it is doubtful whether we can apply the unrestricted U-DOP model to such a corpus. U-DOP therefore randomly samples a large subset from the total number of parse trees from the chart (see Bod 2006) and next converts the subtrees from these parse trees into a PCFG-reduction (Goodman 2003). Since the computation of the most probable parse tree is NP-complete (Sima'an 1996), U-DOP estimates the most probable tree from the 100 most probable derivations using Viterbi n-best parsing. We could also have used the more efficient k-best hypergraph parsing technique by Huang and Chiang (2005), but we have not yet incorporated this into our implementation.</Paragraph>
    <Paragraph position="5"> To give an example of the dependencies that U-DOP can take into account, consider the following subtrees in figure 5 from the trees in  figure 4 (where we again add words for readability). These subtrees show that U-DOP takes into account both contiguous and non-contiguous substrings.</Paragraph>
    <Paragraph position="6">  Of course, if we only had the sentence Investors suffered heavy losses in our corpus, there would be no difference in probability between the five parse trees in figure 4. However, if we also have a different sentence where JJ NNS ( heavy losses) appears in a different context, e.g. in Heavy losses were reported, its covering subtree gets a relatively higher frequency and the parse tree where heavy losses occurs as a constituent gets a higher total probability.</Paragraph>
  </Section>
  <Section position="6" start_page="867" end_page="868" type="metho">
    <SectionTitle>
4 ML-DOP
</SectionTitle>
    <Paragraph position="0"> ML-DOP (Bod 2000) extends DOP with a maximum likelihood reestimation technique based on the expectation-maximization (EM) algorithm (Dempster et al. 1977) which is known to be statistically consistent (Shao 1999). ML-DOP reestimates DOP's subtree probabilities in an iterative way until the changes become negligible. The following exposition of ML-DOP is heavily based on previous work by Bod (2000) and Magerman (1993).</Paragraph>
    <Paragraph position="1"> It is important to realize that there is an implicit assumption in DOP that all possible derivations of a parse tree contribute equally to the total probability of the parse tree. This is equivalent to saying that there is a hidden component to the model, and that DOP can be trained using an EM algorithm to determine the maximum likelihood estimate for the training data. The EM algorithm for this ML-DOP model is related to the Inside-Outside algorithm for context-free grammars, but the reestimation formula is complicated by the presence of subtrees of depth greater than 1. To derive the reestimation formula, it is useful to consider the state space of all possible derivations of a tree. The derivations of a parse tree T can be viewed as a state trellis, where each state contains a partially constructed tree in the course of a leftmost derivation of T. st denotes a state containing the tree t which is a subtree of T. The state trellis is defined as follows.</Paragraph>
    <Paragraph position="2"> The initial state, s0, is a tree with depth zero, consisting of simply a root node labeled with S. The final state, sT, is the given parse tree T.</Paragraph>
    <Paragraph position="3"> A state st is connected forward to all states stf such that tf = t deg t', for some t' . Here the appropriate t' is defined to be tf [?] t.</Paragraph>
    <Paragraph position="4"> A state st is connected backward to all states stb such that t = tb deg t', for some t' . Again, t' is defined to be t [?] tb.</Paragraph>
    <Paragraph position="5"> The construction of the state lattice and assignment of transition probabilities according to the ML-DOP model is called the forward pass. The probability of a given state, P(s), is referred to as a(s). The forward probability of a state st is computed recursively</Paragraph>
    <Paragraph position="7"> The backward probability of a state, referred to as b(s), is calculated according to the following recursive formula:</Paragraph>
    <Paragraph position="9"> where the backward probability of the goal state is set equal to the forward probability of the goal state, b(sT) = a(sT).</Paragraph>
    <Paragraph position="10"> The update formula for the count of a subtree t is (where r(t) is the root label of t):  The updated probability distribution, P'(t  |r(t)), is defined to be</Paragraph>
    <Paragraph position="12"> In practice, ML-DOP starts out by assigning the same relative frequencies to the subtrees as DOP1, which are next reestimated by the formulas above.</Paragraph>
    <Paragraph position="13"> We may in principle start out with any initial parameters, including random initializations, but since ML estimation is known to be very sensitive to the initialization of the parameters, it is convenient to start with parameters that are known to perform well.</Paragraph>
    <Paragraph position="14"> To avoid overtraining, ML-DOP uses the subtrees from one half of the training set to be trained on the other half, and vice versa. This crosstraining is important since otherwise UML-DOP would assign the training set trees their empirical frequencies and assign zero weight to all other subtrees (cf. Prescher et al. 2004). The updated probabilities are iteratively reestimated until the decrease in cross-entropy becomes negligible.</Paragraph>
    <Paragraph position="15"> Unfortunately, no compact PCFG-reduction of ML-DOP is known. As a consequence, parsing with ML-DOP is very costly and the model has hitherto never been tested on corpora larger than OVIS (Bonnema et al. 1997). Yet, we will show that by clever pruning we can extend our experiments not only to the WSJ, but also to the German NEGRA and the Chinese CTB. (Zollmann and Sima'an 2005 propose a different consistent estimator for DOP, which we cannot go into here).</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML