<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0602">
  <Title>A Statistical Semantic Parser that Integrates Syntax and Semantics</Title>
  <Section position="5" start_page="9" end_page="10" type="metho">
    <SectionTitle>
3 Semantic Parsing Framework
</SectionTitle>
    <Paragraph position="0"> This section describes our basic framework for semantic parsing, which is based on a fairly standard approach to compositional semantics (Jurafsky and Martin, 2000). First, a statistical parser is used to construct a SAPT that captures the semantic interpretation of individual words and the basic predicate-argument structure of the sentence.</Paragraph>
    <Paragraph position="1"> Next, a recursive procedure is used to compositionally construct an MR for each node in the SAPT from the semantic label of the node and the MR's  of its children. Syntactic structure provides information of how the parts should be composed. Ambiguities arise in both syntactic structure and the semantic interpretation of words and phrases. By integrating syntax and semantics in a single statistical parser that produces an SAPT, we can use both semantic information to resolve syntactic ambiguities and syntactic information to resolve semantic ambiguities. null In a SAPT, each internal node in the parse tree is annotated with a semantic label. Figure 1 shows the SAPT for a simple sentence in the CLANG domain. The semantic labels which are shown after dashes are concepts in the domain. Some type concepts do not take arguments, like team and unum (uniform number). Some concepts, which we refer to as predicates, take an ordered list of arguments, like player and bowner (ball owner). The predicate-argument knowledge, C3, specifies, for each predicate, the semantic constraints on its arguments. Constraints are specified in terms of the concepts that can fill each argument, such as player(team, unum) and bowner(player). A special semantic label D2D9D0D0 is used for nodes that do not correspond to any concept in the domain.</Paragraph>
    <Paragraph position="2"> Figure 2 shows the basic algorithm for building an MR from an SAPT. Figure 3 illustrates the  construction of the MR for the SAPT in Figure 1.</Paragraph>
    <Paragraph position="3"> Nodes are numbered in the order in which the construction of their MR's are completed. The first step, GETSEMANTICHEAD, determines which of a node's children is its semantic head based on having a matching semantic label. In the example, node N3 is determined to be the semantic head of the sentence, since its semantic label, bowner, matches N8's semantic label. Next, the MR of the semantic head is constructed recursively. The semantic head of N3 is clearly N1. Since N1 is a part-of-speech (POS) node, its semantic label directly determines its MR, which becomes bowner( ). Once the MR for the head is constructed, the MR of all other (non-head) children are computed recursively, and COMPOSEMR assigns their MR's to fill the arguments in the head's MR to construct the complete MR for the node. Argument constraints are used to determine the appropriate filler for each argument. Since, N2 has a null label, the MR of N3 also becomes bowner( ). When computing the MR for N7, N4 is determined to be the head with the MR: player( , ). COMPOSEMR then assigns N5's MR to fill the team argument and N6's MR to fill the unum argument to construct N7's complete MR: player(our, 2). This MR in turn is composed with the MR for N3 to yield the final MR for the sentence: bowner(player(our,2)).</Paragraph>
    <Paragraph position="4"> For MRL's, such as CLANG, whose syntax does not strictly follow a nested set of predicates and arguments, some final minor syntactic adjustment of the final MR may be needed. In the example, the final MR is (bowner (player our CU2CV)). In the following discussion, we ignore the difference between these two.</Paragraph>
    <Paragraph position="5"> There are a few complications left which require special handling when generating MR's, like coordination, anaphora resolution and non-compositionality exceptions. Due to space limitations, we do not present the straightforward techniques we used to handle them.</Paragraph>
  </Section>
  <Section position="6" start_page="10" end_page="11" type="metho">
    <SectionTitle>
4 Corpus Annotation
</SectionTitle>
    <Paragraph position="0"> This section discusses how sentences for training SCISSOR were manually annotated with SAPT's.</Paragraph>
    <Paragraph position="1"> Sentences were parsed by Collins' head-driven model 2 (Bikel, 2004) (trained on sections 02-21 of the WSJ Penn Treebank) to generate an initial syntactic parse tree. The trees were then manually corrected and each node augmented with a semantic label.</Paragraph>
    <Paragraph position="2"> First, semantic labels for individual words, called semantic tags, are added to the POS nodes in the tree. The tag null is used for words that have no corresponding concept. Some concepts are conveyed by phrases, like &amp;quot;has the ball&amp;quot; for bowner in the previous example. Only one word is labeled with the concept; the syntactic head word (Collins, 1997) is preferred. During parsing, the other words in the phrase will provide context for determining the semantic label of the head word.</Paragraph>
    <Paragraph position="3"> Labels are added to the remaining nodes in a bottom-up manner. For each node, one of its children is chosen as the semantic head, from which it will inherit its label. The semantic head is chosen as the child whose semantic label can take the MR's of the other children as arguments. This step was done mostly automatically, but required some manual corrections to account for unusual cases.</Paragraph>
    <Paragraph position="4"> In order for COMPOSEMR to be able to construct the MR for a node, the argument constraints for its semantic head must identify a unique concept to fill each argument. However, some predicates take multiple arguments of the same type, such as point.num(num,num), which is a kind of point that represents a field coordinate in CLANG.</Paragraph>
    <Paragraph position="5"> In this case, extra nodes are inserted in the tree with new type concepts that are unique for each argument. An example is shown in Figure 4 in which the additional type concepts num1 and num2 are introduced. Again, during parsing, context will be used to determine the correct type for a given word.</Paragraph>
    <Paragraph position="6"> The point label of the root node of Figure 4 is the concept that includes all kinds of points in CLANG.</Paragraph>
    <Paragraph position="7"> Once a predicate has all of its arguments filled, we  use the most general CLANG label for its concept (e.g. point instead of point.num). This generality avoids sparse data problems during training.</Paragraph>
  </Section>
  <Section position="7" start_page="11" end_page="11" type="metho">
    <SectionTitle>
5 Integrated Parsing Model
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
5.1 Collins Head-Driven Model 2
</SectionTitle>
      <Paragraph position="0"> Collins' head-driven model 2 is a generative, lexicalized model of statistical parsing. In the following section, we follow the notation in (Collins, 1997).</Paragraph>
      <Paragraph position="1"> Each non-terminal CG in the tree is a syntactic label, which is lexicalized by annotating it with a word, DB, and a POS tag, D8 D7DDD2 . Thus, we write a non-terminal as CGB4DCB5, where X is a syntactic label and</Paragraph>
      <Paragraph position="3"> CX. CGB4DCB5 is then what is generated by the generative model.</Paragraph>
      <Paragraph position="4"> Each production C4C0CB B5 CAC0CB in the PCFG is in the form:</Paragraph>
      <Paragraph position="6"> where C0 is the head-child of the phrase, which inherits the head-word CW from its parent C8. C4  are left and right modifiers of C0.</Paragraph>
      <Paragraph position="7"> Sparse data makes the direct estimation of C8B4CAC0CBCYC4C0CBB5 infeasible. Therefore, it is decomposed into several steps - first generating the head, then the right modifiers from the head outward, then the left modifiers in the same way. Syntactic subcategorization frames, LC and RC, for the left and right modifiers respectively, are generated before the generation of the modifiers. Subcat frames represent knowledge about subcategorization preferences. The final probability of a production is composed from the following probabilities:  1. The probability of choosing a head constituent label H: C8 CW B4C0CYC8BNCWB5.</Paragraph>
      <Paragraph position="8"> 2. The probabilities of choosing the left and right subcat frames LC and RC: C8  Where A1 is the measure of the distance from the head word to the edge of the constituent,</Paragraph>
      <Paragraph position="10"> The model stops generating more modifiers when CBCCC7C8 is generated.</Paragraph>
    </Section>
    <Section position="2" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
5.2 Integrating Semantics into the Model
</SectionTitle>
      <Paragraph position="0"> We extend Collins' model to include the generation of semantic labels in the derivation tree. Unless otherwise stated, notation has the same meaning as in Section 5.1. The subscript D7DDD2 refers to the syntactic part, and D7CTD1 refers to the semantic part. We redefine CG and DC to include semantics, each non-terminal CG is now a pair of a syntactic la- null ure 5 shows a lexicalized SAPT (but omitting D8</Paragraph>
      <Paragraph position="2"> Similar to the syntactic subcat frames, we also condition the generation of modifiers on semantic subcat frames. Semantic subcat frames give semantic subcategorization preferences; for example, player takes a team and a unum. Thus C4BV and CABV are now: CWC4BV  new definitions of CGB4DCB5, C4BV and CABV. The implementation of semantic subcat frames is similar to syntactic subcat frames. They are multisets specifying the semantic labels which the head requires in its left or right modifiers.</Paragraph>
      <Paragraph position="3"> As an example, the probability of generating the phrase &amp;quot;our player 2&amp;quot; using NP-[player](player) AX</Paragraph>
    </Section>
    <Section position="3" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
5.3 Smoothing
</SectionTitle>
      <Paragraph position="0"> Since the left and right modifiers are independently generated in the same way, we only discuss smoothing for the left side. Each probability estimation in the above generation steps is called a parameter. To reduce the risk of sparse data problems, the parameters are decomposed as follows:  separately in the syntactic and semantic outputs. We make the independence assumption that the syntactic output is only conditioned on syntactic features, and semantic output on semantic ones. Note that the syntactic and semantic parameters are still integrated in the model to find the globally most likely parse. The syntactic parameters are the same as in Section 5.1 and are smoothed as in (Collins, 1997). We've also tried different ways of conditioning syntactic output on semantic features and vice versa, but they didn't help. Our explanation is the integrated syntactic and semantic parameters have already captured the benefit of this integrated approach in our experimental domains.</Paragraph>
      <Paragraph position="1"> Since the semantic parameters do not depend on any syntactic features, we omit the D7CTD1 subscripts in the following discussion. As in (Collins, 1997),  Note this smoothing is different from the syntactic counterpart. This is due to the difference between POS tags and semantic tags; namely, semantic tags are generally more specific.</Paragraph>
      <Paragraph position="2"> Table 1 shows the various levels of back-off for each semantic parameter. The probabilities from these back-off levels are interpolated using the techniques in (Collins, 1997). All words occurring less than 3 times in the training data, and words in test data that were not seen in training, are unknown words and are replaced with the &amp;quot;UNKNOWN&amp;quot; token. Note this threshold is smaller than the one used in (Collins, 1997) since the corpora used in our experiments are smaller.</Paragraph>
    </Section>
    <Section position="4" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
5.4 POS Tagging and Semantic Tagging
</SectionTitle>
      <Paragraph position="0"> For unknown words, the POS tags allowed are limited to those seen with any unknown words during training. Otherwise they are generated along with the words using the same approach as in (Collins, 1997). When parsing, semantic tags for each known word are limited to those seen with that word during training data. The semantic tags allowed for an unknown word are limited to those seen with its associated POS tags during training.</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="11" end_page="13" type="metho">
    <SectionTitle>
6 Experimental Evaluation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="11" end_page="13" type="sub_section">
      <SectionTitle>
6.1 Methodology
</SectionTitle>
      <Paragraph position="0"> Two corpora of NL sentences paired with MR's were used to evaluate SCISSOR. For CLANG, 300 pieces of coaching advice were randomly selected from the log files of the 2003 RoboCup Coach Competition. Each formal instruction was translated into English by one of four annotators (Kate et al., 2005). The average length of an NL sentence in this corpus is 22.52 words. For GEOQUERY, 250 questions were collected by asking undergraduate students to generate English queries for the given database. Queries were then manually translated  into logical form (Zelle and Mooney, 1996). The average length of an NL sentence in this corpus is 6.87 words. The queries in this corpus are more complex than those in the ATIS database-query corpus used in the speech community (Zue and Glass, 2000) which makes the GEOQUERY problem harder, as also shown by the results in (Popescu et al., 2004). The average number of possible semantic tags for each word which can represent meanings in CLANG is 1.59 and that in GEOQUERY is 1.46.</Paragraph>
      <Paragraph position="1"> SCISSOR was evaluated using standard 10-fold cross validation. NL test sentences are first parsed to generate their SAPT's, then their MR's were built from the trees. We measured the number of test sentences that produced complete MR's, and the number of these MR's that were correct. For CLANG, an MR is correct if it exactly matches the correct representation, up to reordering of the arguments of commutative operators like and. For GEOQUERY, an MR is correct if the resulting query retrieved the same answer as the correct representation when submitted to the database. The performance of the parser was then measured in terms of precision (the percentage of completed MR's that were correct) and recall (the percentage of all sentences whose MR's were correctly generated).</Paragraph>
      <Paragraph position="2"> We compared SCISSOR's performance to several previous systems that learn semantic parsers that can map sentences into formal MRL's. CHILL (Zelle and Mooney, 1996) is a system based on Inductive Logic Programming (ILP). We compare to the version of CHILL presented in (Tang and Mooney, 2001), which uses the improved COCKTAIL ILP system and produces more accurate parsers than the original version presented in (Zelle and Mooney, 1996). SILT is a system that learns symbolic, pattern-based, transformation rules for mapping NL sentences to formal languages (Kate et al., 2005). It comes in two versions, SILT-string, which maps NL strings directly to an MRL, and SILT-tree, which maps syntactic  parse trees (generated by the Collins parser) to an MRL. In the GEOQUERY domain, we also compare to the original hand-built parser GEOBASE.</Paragraph>
    </Section>
  </Section>
  <Section position="9" start_page="13" end_page="14" type="metho">
    <SectionTitle>
6.2 Results
</SectionTitle>
    <Paragraph position="0"> Figures 6 and 7 show the precision and recall learning curves for GEOQUERY, and Figures 8 and 9 for CLANG. Since CHILL is very memory intensive, it could not be run with larger training sets of the CLANG corpus.</Paragraph>
    <Paragraph position="1"> Overall, SCISSOR gives the best precision and recall results in both domains. The only exception is with recall for GEOQUERY, for which CHILL is slightly higher. However, SCISSOR has significantly higher precision (see discussion in Section 7).</Paragraph>
    <Paragraph position="2">  Results on a larger GEOQUERY corpus with 880 queries have been reported for PRECISE (Popescu et al., 2003): 100% precision and 77.5% recall. On the same corpus, SCISSOR obtains 91.5% precision and 72.3% recall. However, the figures are not comparable. PRECISE can return multiple distinct SQL queries when it judges a question to be ambiguous and it is considered correct when any of these SQL queries is correct. Our measure only considers the top result. Due to space limitations, we do not present complete learning curves for this corpus.</Paragraph>
  </Section>
  <Section position="10" start_page="14" end_page="14" type="metho">
    <SectionTitle>
7 Related Work
</SectionTitle>
    <Paragraph position="0"> We first discuss the systems introduced in Section</Paragraph>
  </Section>
  <Section position="11" start_page="14" end_page="15" type="metho">
    <SectionTitle>
6. CHILL uses computationally-complex ILP meth-
</SectionTitle>
    <Paragraph position="0"> ods, which are slow and memory intensive. The string-based version of SILT uses no syntactic information while the tree-based version generates a syntactic parse first and then transforms it into an MR. In contrast, SCISSOR integrates syntactic and semantic processing, allowing each to constrain and inform the other. It uses a successful approach to statistical parsing that attempts to find the SAPT with maximum likelihood, which improves robustness compared to purely rule-based approaches. However, SCISSOR requires an extra training input, gold-standard SAPT's, not required by these other systems. Further automating the construction of training SAPT's from sentences paired with MR's is a subject of on-going research.</Paragraph>
    <Paragraph position="1"> PRECISE is designed to work only for the specific task of NL database interfaces. By comparison, SCISSOR is more general and can work with other MRL's as well (e.g. CLANG). Also, PRECISE is not a learning system and can fail to parse a query it considers ambiguous, even though it may not be considered ambiguous by a human and could potentially be resolved by learning regularities in the training data.</Paragraph>
    <Paragraph position="2"> In (Lev et al., 2004), a syntax-driven approach is used to map logic puzzles described in NL to an MRL. The syntactic structures are paired with hand-written rules. A statistical parser is used to generate syntactic parse trees, and then MR's are built using compositional semantics. The meaning of open-category words (with only a few exceptions) is considered irrelevant to solving the puzzle and their meanings are not resolved. Further steps would be needed to generate MR's in other domains like CLANG and GEOQUERY. No empirical results are reported for their approach.</Paragraph>
    <Paragraph position="3"> Several machine translation systems also attempt to generate MR's for sentences. In (et al., 2002), an English-Chinese speech translation system for limited domains is described. They train a statistical parser on trees with only semantic labels on the nodes; however, they do not integrate syntactic and semantic parsing.</Paragraph>
    <Paragraph position="4"> History-based models of parsing were first introduced in (Black et al., 1993). Their original model also included semantic labels on parse-tree nodes, but they were not used to generate a formal MR. Also, their parsing model is impoverished compared to the history included in Collins' more recent model. SCISSOR explores incorporating semantic labels into Collins' model in order to produce a complete SAPT which is then used to generate a formal MR.</Paragraph>
    <Paragraph position="5"> The systems introduced in (Miller et al., 1996; Miller et al., 2000) also integrate semantic labels into parsing; however, their SAPT's are used to pro- null duce a much simpler MR, i.e., a single semantic frame. A sample frame is AIRTRANSPORTATION which has three slots - the arrival time, origin and destination. Only one frame needs to be extracted from each sentence, which is an easier task than our problem in which multiple nested frames (predicates) must be extracted. The syntactic model in (Miller et al., 2000) is similar to Collins', but does not use features like subcat frames and distance measures. Also, the non-terminal label CG is not further decomposed into separately-generated semantic and syntactic components. Since it used much more specific labels (the cross-product of the syntactic and semantic labels), its parameter estimates are potentially subject to much greater sparse-data problems.</Paragraph>
  </Section>
class="xml-element"></Paper>