<?xml version="1.0" standalone="yes"?> <Paper uid="P05-2026"> <Title>A Domain-Specific Statistical Surface Realizer</Title>
<Section position="4" start_page="151" end_page="151" type="metho"> <SectionTitle> 2.1 Outline </SectionTitle>
<Paragraph position="0"> The core of our method is a heuristic search of the space of possible SS trees. Our search goal is to find the N best complete SS trees that express the given semantic structure. We take 'best' here to mean the trees with the highest conditional likelihood given that they express the right semantic structure. </Paragraph>
<Paragraph position="1"> If S is our semantic structure and LM is our statistical language model, we want to find syntactic trees T that maximize P_LM(T|S). </Paragraph>
<Paragraph position="2"> In order to search the space of trees, we build up trees by expanding one node at a time. During the search, then, we deal with incomplete trees, that is, trees with some nodes not fully expanded. This means that we need a way to determine how promising an incomplete tree T is: i.e., how good the best complete trees are that can be built up by expanding T. As it turns out (Section 2.2), we can efficiently approximate the function P_LM(T|S) for an incomplete tree, and this function is a good heuristic for the maximum likelihood of a complete tree extended from T. </Paragraph>
<Paragraph position="3"> Here is an outline of the algorithm:
* Start with a root tree.
- Take the top N trees and expand one node in each.
- Score each expanded tree for P_LM(T|S), and put it in the search order accordingly.
- Repeat until we find enough trees that satisfy S.
* Complete the trees.
* Linearize and lexicalize the trees.
* Rank the complete trees according to some scoring function. </Paragraph>
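<Paragraph position="4"> As a rough illustration of the outline above, the following is a minimal sketch of the search loop, assuming a priority queue ordered by the heuristic score. The helper names (heuristic_score, expand_one_node, satisfies) are hypothetical placeholders for the components described in this section, not the implementation used in the paper.
```python
import heapq
import itertools

def nbest_search(root_tree, semantic_structure, n_best,
                 heuristic_score, expand_one_node, satisfies):
    """Best-first search over incomplete SS trees, following the outline above.

    Returns up to n_best trees in which the whole semantic structure appears;
    these would then be completed, linearized, lexicalized, and ranked.
    """
    counter = itertools.count()  # tie-breaker so trees themselves are never compared
    # heapq is a min-heap, so scores are negated to pop the best tree first.
    frontier = [(-heuristic_score(root_tree, semantic_structure),
                 next(counter), root_tree)]
    found = []
    while frontier and len(found) < n_best:
        _, _, tree = heapq.heappop(frontier)      # take the current best tree
        for expanded in expand_one_node(tree):    # expand one node
            if satisfies(expanded, semantic_structure):
                # All of S appears in the tree: set it aside for completion
                # and ranking rather than expanding it further (Section 2.4).
                found.append(expanded)
            else:
                # Score the expanded tree and put it in the search order.
                score = heuristic_score(expanded, semantic_structure)
                heapq.heappush(frontier, (-score, next(counter), expanded))
    return found
```
</Paragraph>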
</Section>
<Section position="5" start_page="151" end_page="153" type="metho"> <SectionTitle> 2.2 Heuristic </SectionTitle>
<Paragraph position="0"> Our search goal is to maximize P_LM(T|S). (Henceforth we abbreviate P_LM as just P.) Ideally, then, we would at each step expand the incomplete tree that can be extended to the highest-likelihood complete tree, i.e. the tree whose best complete extension T′ has the highest value of P(T′|S). Since finding this maximum explicitly is not feasible, we use the heuristic P(T|S). By Bayes' rule, P(T|S) = P(S|T)P(T)/P(S), where P(S) is a normalizing factor, P(T) can be easily calculated using the language model (as the product of the probabilities of the node expansions that appear in T), and P(S|T) can be approximated as described below. Moreover, P(T|S) is the sum of P(T′|S) over the complete trees T′ that extend T, so it never underestimates the best extension, since the maximum is one of the terms in the sum. This fact is analogous to showing that P(T|S) is an admissible heuristic (in the sense of A* search). </Paragraph>
<Paragraph position="1"> We can see how to calculate P(T|S) in practice by decomposing the structure of a complete tree T′ that extends T. Because T′ extends T, the top of T′ is identical to T. The semantic tree S will have some of its nodes in T, and some in the part of T′ that extends beyond T. Let a(S,T) be the set containing the highest nodes in S that are not in T. Each node s ∈ a(S,T) is the root node of a subtree in T′, and each of these subtrees can be considered separately. </Paragraph>
<Paragraph position="2"> First we consider how these subtrees are joined to the nodes in T. The condition of consistent ordering requires that each node in a(S,T) be a descendant in T′ of its parent in S, and moreover that it not be a descendant of any of its siblings in S. Let sib be a set of siblings in a(S,T), and let p be their semantic parent. Then p is the root node of a subtree of T, called T_p. We designate the T-set of sib as the set of leaves of T_p that are not descended from any nodes in S below p; in particular, they are not descended from any other siblings of the nodes in sib. Then in T′ all of the nodes in sib must descend from the T-set of sib. In other words, there is a set of subtrees of T′ which are rooted at the nodes in the T-set of sib, and all of the nodes in sib appear in these subtrees such that none of them is descended from any other. </Paragraph>
<Paragraph position="3"> This analysis sets us up to rewrite P(T|S) in terms of sums over these various subtrees. We write X → Y for the event that every node in the set Y appears as a descendant of some node in the set X. Rather than calculating the resulting expression exactly, we introduce an approximation to our heuristic function: for sets X and Y, we approximate P(X → Y) by the product, over the nodes y in Y, of AL1{P(x → y) : x ∈ X}, where AL1 is the 'At-least-one' function. This corresponds to two simplifications: first, we drop the restriction that no node be descended from its semantic sibling; second, we assume that the probabilities of each node descending from X are independent from one another. </Paragraph>
<Paragraph position="4"> That is, given the probabilities of a set of events, the At-least-one function gives the probability of at least one of the events occurring. For independent events, AL1{} = 0 and AL1{p_1, ..., p_n} = 1 - (1 - p_1)(1 - p_2)...(1 - p_n). This means that we can approximate P(S|T) as a product of factors of this form, one for each of the subtrees identified above. </Paragraph>
<Paragraph position="5"> The calculation of P(T|S) has thus been reduced to finding P(x → y) for individual nodes. These values are retrieved from the inheritance table, described below. </Paragraph>
<Paragraph position="6"> Note that when we expand a single node of an incomplete tree, only a few factors in this approximation change. Rather than recalculating each tree's score from scratch, we cache intermediate results and recompute only the terms that change. This allows for efficient calculation of the heuristic function. </Paragraph>
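<Paragraph position="7"> To make the approximation concrete, here is a minimal sketch of the At-least-one function and the approximated descent probability, assuming the individual P(x → y) values are looked up in a precomputed inheritance table (Section 2.3). The function and table names are illustrative, not the paper's code.
```python
def at_least_one(probs):
    """AL1: probability that at least one of a set of independent events occurs.

    AL1{} = 0, and AL1{p1, ..., pn} = 1 - (1 - p1)(1 - p2)...(1 - pn).
    """
    miss = 1.0
    for p in probs:
        miss *= 1.0 - p
    return 1.0 - miss


def approx_descent_prob(x_nodes, y_nodes, inheritance_table):
    """Approximate P(X -> Y): the probability that every node in Y appears
    as a descendant of some node in X.

    Applies the two simplifications from Section 2.2: the sibling restriction
    is dropped and descent events are treated as independent.
    inheritance_table[(x, y)] is assumed to hold P(x -> y).
    """
    prob = 1.0
    for y in y_nodes:
        prob *= at_least_one(inheritance_table.get((x, y), 0.0) for x in x_nodes)
    return prob
```
</Paragraph>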
</Section>
<Section position="6" start_page="153" end_page="153" type="metho"> <SectionTitle> 2.3 Inheritance Table </SectionTitle>
<Paragraph position="0"> The inheritance table (IT) allows us to predict the potential descendants of an incomplete tree. For each pair of SS nodes x and y, the IT stores P(x → y), the probability that y will eventually appear as a descendant of x. The IT is precomputed once from the language model; the same IT is used for all queries. </Paragraph>
<Paragraph position="1"> We can compute the IT using an iterative process. Consider the transformation T that takes a distribution Q(x → y) to a new distribution T(Q) such that T(Q)(x → y) is equal to 1 when x = y, and otherwise is equal to a sum over the possible expansions z of x, where each expansion contributes P_LM(z|x), the probability of the expansion z according to the language model, times the probability under Q that y descends from one of the nodes introduced by z. </Paragraph>
<Paragraph position="2"> The defining property of the IT's distribution P is that T(P) = P. We can use this property to compute the table iteratively. Begin by setting Q_0(x → y) to 1 when x = y and to 0 otherwise, and then repeatedly apply the transformation, computing Q_{n+1} = T(Q_n). When this process converges, the limiting function is the correct inheritance distribution. </Paragraph>
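<Paragraph position="3"> A minimal sketch of this fixed-point computation follows. It assumes the language model is available as a table mapping each node label x to its possible expansions (each a probability P_LM(z|x) together with the child labels introduced by z), and it combines the children of an expansion with the At-least-one function under the same independence assumption as Section 2.2; both the data layout and that combination rule are illustrative assumptions rather than a specification of the paper's implementation.
```python
def compute_inheritance_table(expansions, nodes, iterations=50):
    """Iteratively approximate P(x -> y), the probability that y eventually
    appears as a descendant of x.

    expansions[x] is assumed to be a list of (prob, children) pairs, where
    prob is P_LM(z|x) and children are the node labels introduced by the
    expansion z.
    """
    # Q_0: 1 on the diagonal, 0 elsewhere.
    q = {(x, y): 1.0 if x == y else 0.0 for x in nodes for y in nodes}
    for _ in range(iterations):
        new_q = {}
        for x in nodes:
            for y in nodes:
                if x == y:
                    new_q[(x, y)] = 1.0  # T(Q)(x -> y) = 1 when x = y
                    continue
                total = 0.0
                for prob, children in expansions.get(x, []):
                    # Probability under Q that y descends from at least one
                    # child introduced by this expansion (At-least-one).
                    miss = 1.0
                    for c in children:
                        miss *= 1.0 - q[(c, y)]
                    total += prob * (1.0 - miss)
                new_q[(x, y)] = total
        q = new_q
    return q
```
</Paragraph>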
</Section>
<Section position="7" start_page="153" end_page="153" type="metho"> <SectionTitle> 2.4 Completing Trees </SectionTitle>
<Paragraph position="0"> A final important issue is termination. Ordinarily, it would be sensible to remove a tree from the search order only when it is a goal state, that is, a complete tree that satisfies S. However, this turns out not to be the best approach here, due to a quirk of our heuristic. P(T|S) has two non-constant factors, P(S|T) and P(T). Once all of the nodes in S appear in an incomplete tree T, P(S|T) = 1, and so it will not increase as the tree is expanded further. Moreover, with each node expanded, P(T) decreases. This means that we are unlikely to make progress beyond the point where all of the semantic content appears in a tree. </Paragraph>
<Paragraph position="1"> An effective way to deal with this is to remove trees from the search order as soon as P(S|T) reaches 1. When the search terminates by finding enough of these 'almost complete' trees, the trees are then completed: we find the optimal complete trees by repeatedly expanding the N most likely almost-complete trees (ranked by P(T)) until sufficiently many complete trees are found. </Paragraph> </Section>
<Section position="8" start_page="153" end_page="154" type="metho"> <SectionTitle> 3 Implementation </SectionTitle>
<Section position="1" start_page="153" end_page="154" type="sub_section"> <SectionTitle> 3.1 Representation </SectionTitle>
<Paragraph position="0"> Our semantic representation is based on the HALogen input structure (Langkilde-Geary, 2002). The meaning of a sentence is represented by a tree whose nodes are each marked with a concept and a semantic role; for example, the meaning of the sentence &quot;Turn left at the second traffic light&quot; is represented by a structure of this kind, and this structure is used as the example generation query in Section 4. The syntax model we use is a statistical dependency grammar. As we outlined in Section 2, the semantic and syntactic structures are attached to one another in an SS tree. In order to accommodate the requirement that each semantic node is attached to no more than one syntactic node, collocations like &quot;traffic light&quot; or &quot;John Hancock Tower&quot; are treated as single syntactic nodes. It can also be convenient to extend this idea, treating phrases like &quot;turn around&quot; or &quot;thank you very much&quot; as atomic. In the case where a concept attaches to a multi-word expression, but where it is inconvenient to treat the expression as a syntactic atom, we adopt the convention of attaching the concept to the hierarchically dominant word in the expression. For instance, the concept of turning can be attached to the expression &quot;make a turn&quot;; in this case we attach the concept to the word &quot;make&quot;, and not to &quot;turn&quot;. </Paragraph>
<Paragraph position="1"> The nodes of an SS tree are (word, part of speech, concept, semantic role) 4-tuples, where the concept and role are left empty for function words, and the word and part of speech are left empty for concepts with no direct syntactic correlate. Generally we omit the word itself from the tree in order to mitigate sparsity issues; the words are added to the final full tree by a lexical choice module. </Paragraph>
<Paragraph position="2"> We use a domain-trained language model based on the same dependency structure as our syntactic-semantic representations. The currently implemented model calculates the probability of expansions given a parent node based on an explicit tabular representation of the distribution P(z|x) for each x. This language model is also used to score and rank generated sentences. </Paragraph> </Section>
<Section position="2" start_page="154" end_page="154" type="sub_section"> <SectionTitle> 3.2 Corpus and Annotation </SectionTitle>
<Paragraph position="0"> Training this language model requires an annotated corpus of in-domain text. Our main corpus comes from transcripts of direction-giving in a simulation context, collected using the &quot;Wizard of Oz&quot; set-up described in (Cheng et al., 2004). For development and testing, we extracted approximately 600 instructions, divided into training and test sets. The training set was used to train the language model used for search, the lexical choice module, and the scoring function. Both sets underwent four partially automated stages of annotation. </Paragraph>
<Paragraph position="1"> First, we tag words with their part of speech, using the Brill tagger with a manually modified lexicon and transformation rules for our domain (Brill, 1995). </Paragraph>
<Paragraph position="2"> Second, the words are disambiguated and assigned a concept tag. For this we construct a domain ontology, which is used to automatically tag the unambiguous words and to prompt for human disambiguation in the remaining cases. The third step is to assign semantic roles. This is accomplished by using a list of contextual rules, similar to the rules used by the Brill tagger. For example, the rule
CON intersection PREV1OR2OR3WD at : spatial-locating
assigns the role &quot;spatial-locating&quot; to a word whose concept is &quot;intersection&quot; if the word &quot;at&quot; appears one, two, or three words before it. A segment of the corpus was automatically annotated using such rules; a human annotator then made corrections and added new rules, repeating these steps until the corpus was fully annotated with semantic roles. </Paragraph>
<Paragraph position="3"> After the first three stages, the sentence &quot;Turn left at the next intersection&quot; is annotated as follows:
turn/VB/maketurn left/RB/$leftright/direction at/IN the/DT next/JJ/first/modifier intersection/NN/intersection/spatial-locating </Paragraph>
<Paragraph position="4"> The final annotation step is parsing. For this we use an approach similar to Pereira and Schabes' grammar induction from partially bracketed text (Pereira and Schabes, 1992). First we annotate a segment of the corpus. Then we use the inside-outside algorithm to simultaneously train a dependency grammar and complete the annotation. We then manually correct a further segment of the annotation, and repeat until acceptable parses are obtained. </Paragraph> </Section>
<Section position="3" start_page="154" end_page="154" type="sub_section"> <SectionTitle> 3.3 Rendering </SectionTitle>
<Paragraph position="0"> Linearizing an SS tree amounts to deciding the order of the branches and whether each appears on the left or the right side of the head. We built this information into our language model, so a grammar rule for expanding a node includes full ordering information. This makes the linearization step trivial, at the cost of adding sparsity to the language model. </Paragraph>
<Paragraph position="1"> Lexicalization could be relegated to the language model in the same way, by including lexemes in the representation of each node, but again this would incur sparsity costs. The other option is to delegate lexical choice to a separate module, which takes an SS tree and assigns a word to each node. We use a hybrid approach: content words are assigned using a lexical choice module, while most function words are included explicitly in the language model. The current lexical choice module simply assigns each unlabeled node the most likely word conditioned on its (POS, concept, role) triple, as observed in the training corpus. </Paragraph>
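<Paragraph position="2"> As an illustration of this last step, here is a minimal sketch of such a lexical choice module, assuming the annotated training corpus is available as a list of (word, POS, concept, role) tuples; the function names and data layout are illustrative assumptions, not the paper's exact implementation.
```python
from collections import Counter, defaultdict

def train_lexical_choice(corpus_tuples):
    """Count word frequencies for each (POS, concept, role) triple.

    corpus_tuples is assumed to be an iterable of (word, pos, concept, role)
    tuples taken from the annotated training corpus.
    """
    counts = defaultdict(Counter)
    for word, pos, concept, role in corpus_tuples:
        counts[(pos, concept, role)][word] += 1
    return counts

def choose_word(counts, pos, concept, role):
    """Assign an unlabeled node the most likely word conditioned on its
    (POS, concept, role) triple, as observed in training."""
    candidates = counts.get((pos, concept, role))
    if not candidates:
        return None  # unseen triple; a real module would need some back-off
    return candidates.most_common(1)[0][0]
```
</Paragraph>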
</Section> </Section>
<Section position="9" start_page="154" end_page="155" type="metho"> <SectionTitle> 4 Example </SectionTitle>
<Paragraph position="0"> We take the semantic structure presented in Section 3.1 as an example generation query. The search stage terminates when 100 trees that embed this semantic structure have been found. The best-scoring result is a lexicalized tree that is finally rendered thus: turn left at the second traffic light. </Paragraph> </Section>
<Section position="10" start_page="155" end_page="155" type="metho"> <SectionTitle> 5 Preliminary Results </SectionTitle>
<Paragraph position="0"> For initial testing, we separated the annotated corpus into a 565-sentence training set and a 57-sentence test set. We automatically extracted semantic structures from the test set, then used these structures as generation queries, returning only the highest-ranked sentence for each query. The generated results were then evaluated by three independent human annotators along two dimensions: (1) Is the generated sentence grammatical? (2) Does the generated sentence have the same meaning as the original sentence? For 11 of the 57 sentences (19%), query extraction failed due to inadequate grammar coverage. Of the 46 instances where a query was successfully extracted, 3 queries (7%) timed out without producing output. Averaging the annotators' judgments, 1 generated sentence (2%) was ungrammatical, and 3 generated sentences (7%) had different meanings from their originals. The remaining 39 queries (85%) produced output that was both grammatical and faithful to the original sentence's meaning. </Paragraph> </Section> </Paper>