<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1510">
  <Title>Probabilistic models for disambiguation of an HPSG-based chart generator</Title>
  <Section position="4" start_page="95" end_page="97" type="metho">
    <SectionTitle>
3 Disambiguation models for chart generation
</SectionTitle>
    <Section position="1" start_page="95" end_page="96" type="sub_section">
      <SectionTitle>
3.1 Packed representation of a chart
</SectionTitle>
      <Paragraph position="0"> As mentioned in Section 2.3, to estimate log-linear models for HPSG generation, we need all alternative derivation trees a40 a5a8a44 a9 generated from the input a44 . However, the size of a40 a5a8a44a10a9 is exponential to the cardinality of a44 and they cannot be enumerated explicitly.</Paragraph>
      <Paragraph position="1"> This problem is especially serious in wide-coverage grammars because such grammars are designed to cover a wide variety of linguistic phenomena, and thus produce many realizations. In this section, we present a method of making the estimation tractable which is similar to a technique developed for HPSG parsing.</Paragraph>
      <Paragraph position="2"> When estimating log-linear models, we map a40 a5a8a44 a9 in the chart into a packed representation called a feature forest, intuitively an &amp;quot;AND-OR&amp;quot; graph. Miyao and Tsujii (2005) represented a set of HPSG parse  trees using a feature forest and succeeded in estimating a1 a5a1a0 a4a2 a9 given a sentence a2 and a parse tree a0 using dynamic programming without unpacking the chart. If a40 a5a8a44 a9 is represented in a feature forest in generation, a1 a5a1a0 a4a44a10a9 can also be estimated in the same way.</Paragraph>
      <Paragraph position="3"> Figure 3 shows a feature forest representing the chart in Figure 2. Each node corresponds to either a lexical entry or a tuple of a36 a7a4a3 a28 a7a6a5 a28 a7 a18 a38 where a7a6a3 , a7a6a5 and a7 a18 are respectively the mother edge, the left daughter, and the right daughter in a single rule application. Nodes connected by dotted lines represent OR-nodes, i.e., equivalence classes in the same cell. Feature functions are assigned to OR-nodes.</Paragraph>
      <Paragraph position="4"> By doing so, we can capture important features for disambiguation in HPSG, i.e., combinations of a mother and its daughter(s). Nodes connected by solid arrows represent AND-nodes corresponding to the daughters of the parent node. By using feature forests, we can efficiently pack the node generated more than once in the set of trees. For example, the nodes corresponding to &amp;quot;the book&amp;quot; in &amp;quot;He bought the book.&amp;quot; and &amp;quot;the book he bought&amp;quot; are identical and described only once in the forest. The merits of using forest representations in generation instead of lattices or simple enumeration are discussed thoroughly by Langkilde (2000).</Paragraph>
    </Section>
    <Section position="2" start_page="96" end_page="97" type="sub_section">
      <SectionTitle>
3.2 Model variation
</SectionTitle>
      <Paragraph position="0"> We implemented and compared four different disambiguation models as Velldal and Oepen (2005) did. Throughout the models, we assigned a score called figure-of-merit (FOM) on each edge and calculated the FOM of a mother edge by dynamic programming. FOM represents the log probability of an edge which is not normalized.</Paragraph>
      <Paragraph position="1"> Baseline model We started with a simple baseline model, a1 a5a1a0 a4</Paragraph>
      <Paragraph position="3"> in the input semantic representation a44 and a41 is a lexical entry assigned to a0 . The FOM of the mother edge</Paragraph>
      <Paragraph position="5"> erence distribution (Miyao and Tsujii, 2005), i.e., a13 a11 is estimated to maximize the likelihood of the training data a15a1 , which is calculated with the following equation.</Paragraph>
      <Paragraph position="7"> Bigram model The second model is a log-linear model with only one feature that corresponds to bigram probabilities for adjacent word-pairs in the sentence. We estimated a bigram language model using a part of the British National Corpus as training data2. In the chart each edge is identified with the first and last word in the phrase as well as its feature structure and covered relations. When two edges are combined, a11 a5a8a7 a3 a9 is computed as</Paragraph>
      <Paragraph position="9"> last word of the left daughter, a2 a18 is the first word of the right daughter, and a1 a2 a11a4a3 a18a6a5 a3 represents a log probability of a bigram. Contrary to the method of Velldal and Oepen (2005) where the input is a set of sentences and a1a8a7a10a9 a3 a18a6a5 a3 is computed on a whole sentence, we computed a1 a2 a11a11a3 a18a12a5 a3 on each phrase as Langkilde (2000) did The language model can be extended to a37 -gram if each edge holds last a37 a1 a25 words although the number of edges increase.</Paragraph>
      <Paragraph position="10"> Syntax model The third model incorporates a variety of syntactic features and lexical features where a11 a5a8a7a6a3a21a9 is computed as a11 a5a8a7a4a5a9 a13 a11 a5a8a7 a18 a9 a13</Paragraph>
      <Paragraph position="12"> binations of atomic features shown in Table 1. The atomic features and their combinations are imported from the previous work on HPSG parsing (Miyao and Tsujii, 2005). We defined three types of feature combinations to capture the characteristics of binary and unary rule applications and root edges as described below.</Paragraph>
      <Paragraph position="14"> An example of extracted features is shown in Figure 4 where &amp;quot;bought the book&amp;quot; is combined with its subject &amp;quot;he&amp;quot;. Since the mother edge is a root edge, two features (a15 a18a6a55a12a55 a20 and a15 a2 a11 a7 a5 a18a56a13 ) are extracted from this node. In the a15 a18a6a55a56a55 a20 feature, the phrasal category SYM becomes S (sentence), the head word  RULE the name of the applied schema DIST the distance between the head words of the daughters COMMA whether a comma exists between daughters and/or inside of daughter phrases SPAN the number of words dominated by the phrase SYM the symbol of the phrasal category (e.g., NP, VP) WORD the surface form of the head word LE the lexical entry assigned to the head word WORD becomes &amp;quot;bought&amp;quot;, and its lexical entry LE becomes that of transitive verbs. In the a15 a2 a11 a7 a5 a18a12a13 feature, properties of the left and right daughters are instantiated in addition to those of the mother edge. Combined model The fourth and final model is the combination of the syntax model and the bigram model. This model is obtained by simply adding the bigram feature to the syntax model.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="97" end_page="98" type="metho">
    <SectionTitle>
4 Iterative beam search
</SectionTitle>
    <Paragraph position="0"> For efficient statistical generation with a wide-coverage grammar, we reduce the search space by pruning edges during generation. We use beam search where edges with low FOMs are pruned during generation. We use two parameters, a37 and a57 : in each cell, the generator prunes except for top a37 edges, and edges whose FOMs are lower than that of the top edge a57 are also pruned.</Paragraph>
    <Paragraph position="1"> Another technique for achieving efficiency is iterative generation which is adopted from iterative CKY parsing (Tsuruoka and Tsujii, 2004). When beam width is too narrow, correct edges to constitute a correct sentence may be discarded during gen- null eration and it causes degradation in coverage, i.e., the ratio the generator successfully outputs a sentence. The appropriate beam width depends on inputs and cannot be predefined. In iterative generation, the process of chart generation is repeated with increasing beam width until a complete sentence is generated or the beam width exceeds the predefined maximum.</Paragraph>
  </Section>
class="xml-element"></Paper>