<?xml version="1.0" standalone="yes"?>
<Paper uid="P97-1021">
  <Title>A DOP Model for Semantic Interpretation*</Title>
  <Section position="4" start_page="0" end_page="159" type="metho">
    <SectionTitle>
2 Data-Oriented Syntactic Analysis
</SectionTitle>
    <Paragraph position="0"> So far, the data-oriented processing method has mainly been applied to corpora with simple syntactic annotations, consisting of labelled trees. Let us illustrate this with a very simple imaginary example.</Paragraph>
    <Paragraph position="1"> Suppose that a corpus consists of only two trees: [Figure 1: the two trees of the toy corpus.] We employ one operation for combining subtrees, called composition, indicated as $\circ$; this operation identifies the leftmost nonterminal leaf node of one tree with the root node of a second tree (i.e., the second tree is substituted on the leftmost nonterminal leaf node of the first tree). A new input sentence like "A woman whistles" can now be parsed by combining subtrees from this corpus. For instance: [Figure: derivation and resulting parse for "A woman whistles".] Thus, a parse tree can have many derivations involving different corpus-subtrees. DOP estimates the probability of substituting a subtree $t$ on a specific node as the probability of selecting $t$ among all subtrees in the corpus that could be substituted on that node. This probability is equal to the number of occurrences of a subtree $t$, divided by the total number of occurrences of subtrees $t'$ with the same root node label as $t$: $P(t) = |t| / \sum_{t' : \mathrm{root}(t') = \mathrm{root}(t)} |t'|$. The probability of a derivation $t_1 \circ \ldots \circ t_n$ can be computed as the product of the probabilities of the subtrees this derivation consists of: $P(t_1 \circ \ldots \circ t_n) = \prod_i P(t_i)$. (Footnote 1: Here $t \circ u \circ v \circ w$ should be read as $((t \circ u) \circ v) \circ w$.) The probability of a parse tree is equal to the probability that any of its distinct derivations is generated, which is the sum of the probabilities of all derivations of that parse tree. Let $t_{id}$ be the $i$-th subtree in the derivation $d$ that yields tree $T$; then the probability of $T$ is given by: $P(T) = \sum_d \prod_i P(t_{id})$.</Paragraph>
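    <Paragraph> The three probabilities defined above can be illustrated with a minimal sketch. The bracketed subtree strings and their counts are invented for illustration and are not taken from the paper's figures; the function names are likewise assumptions of this sketch, not part of any DOP implementation.

```python
from collections import Counter
from fractions import Fraction
from math import prod

# Hypothetical multiset of corpus subtrees, written as bracketed strings.
# The first symbol after "(" is the root label. All counts are invented.
SUBTREE_COUNTS = Counter({
    "(S NP VP)": 2,
    "(S NP (VP whistles))": 1,
    "(S NP (VP sings))": 1,
    "(NP a woman)": 1,
    "(NP a man)": 1,
    "(VP whistles)": 1,
    "(VP sings)": 1,
})


def root_label(subtree):
    """Return the root label of a bracketed subtree string."""
    return subtree[1:].split()[0].rstrip(")")


def subtree_probability(subtree):
    """P(t) = |t| divided by the summed counts of subtrees with the same root."""
    same_root_total = sum(
        count for other, count in SUBTREE_COUNTS.items()
        if root_label(other) == root_label(subtree)
    )
    return Fraction(SUBTREE_COUNTS[subtree], same_root_total)


def derivation_probability(derivation):
    """P(t1 o ... o tn) = product of the probabilities of its subtrees."""
    return prod(subtree_probability(t) for t in derivation)


def parse_probability(derivations):
    """P(T) = sum of the probabilities of all derivations of T."""
    return sum(derivation_probability(d) for d in derivations)


# Two distinct derivations that yield the same parse of "a woman whistles".
DERIVATIONS = [
    ["(S NP VP)", "(NP a woman)", "(VP whistles)"],
    ["(S NP (VP whistles))", "(NP a woman)"],
]

print(parse_probability(DERIVATIONS))
```

    Exact fractions are used so that the sum over derivations can be checked by hand; a real parser would more likely work with floating-point or log probabilities.</Paragraph>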
    <Paragraph position="2"> The DOP method differs from other statistical approaches, such as Pereira and Schabes (1992), Black et al. (1993) and Briscoe (1994), in that it does not predefine or train a formal grammar; instead it takes subtrees directly from annotated sentences in a treebank with a probability proportional to the number of occurrences of these sub-trees in the treebank. Bod (1993b) shows that DOP can be implemented using context-free parsing techniques. To select the most probable parse, Bod (1993a) gives a Monte Carlo approximation algorithm. Sima'an (1995) gives an efficient polynomial algorithm for a sub-optimal solution.</Paragraph>
    <Paragraph position="3"> The model was tested on the Air Travel Information System (ATIS) corpus as analyzed in the Penn Treebank (Marcus et al. (1993)), achieving better test results than other stochastic grammars (cf. Bod (1996), Sima'an (1996a), Goodman (1996)). On Penn's Wall Street Journal corpus, the data-oriented processing approach has been tested by Sekine and Grishman (1995) and by Charniak (1996). Though Charniak only uses corpus-subtrees smaller than depth 2 (which in our experience constitutes a less-than-optimal version of the data-oriented processing method), he reports that it "outperforms all other non-word-based statistical parsers/grammars on this corpus". For an overview of data-oriented language processing, we refer to (Bod and Scha, 1996).</Paragraph>
  </Section>
  <Section position="5" start_page="159" end_page="162" type="metho">
    <SectionTitle>
3 Data-Oriented Semantic Analysis
</SectionTitle>
    <Paragraph position="0"> To use the DOP method not just for syntactic analysis, but also for semantic interpretation, four steps must be taken: 1. decide on a formalism for representing the meanings of sentences and surface-constituents; 2. annotate the corpus-sentences and their surface-constituents with such semantic representations; 3. establish a method for deriving the meaning representations associated with arbitrary corpus-subtrees and with compositions of such subtrees; 4. reconsider the probability calculations.</Paragraph>
    <Paragraph position="1"> We now discuss these four steps.</Paragraph>
    <Section position="1" start_page="159" end_page="160" type="sub_section">
      <SectionTitle>
3.1 Semantic formalism
</SectionTitle>
      <Paragraph position="0"> The decision about the representational formalism is to some extent arbitrary, as long as it has a well-defined model theory and is rich enough for representing the meanings of sentences and constituents that are relevant for the intended application domain. For our exposition in this paper we will use a well-known standard formalism: extensional type theory (see Gamut (1991)), i.e., a higher-order logical language that combines lambda-abstraction with connectives and quantifiers. The first implemented system for data-oriented semantic interpretation, presented in Bonnema (1996), used a different logical language, however. And in many application contexts it probably makes sense to use an A.I.-style language which highlights domain structure (frames, slots, and fillers), while limiting the use of quantification and negation (see section 5).</Paragraph>
    </Section>
    <Section position="2" start_page="160" end_page="161" type="sub_section">
      <SectionTitle>
3.2 Semantic annotation
</SectionTitle>
      <Paragraph position="0"> We assume a corpus that is already syntactically annotated as before: with labelled trees that indicate surface constituent structure. Now the basic idea, taken from van den Berg et al. (1994), is to augment this syntactic annotation with a semantic one: to every meaningful syntactic node, we add a type-logical formula that expresses the meaning of the corresponding surface-constituent. If we were to carry out this idea in a completely direct way, the toy corpus of Figure 1 might, for instance, turn into the toy corpus of Figure 5.</Paragraph>
      <Paragraph position="1"> Van den Berg et al. indicate how a corpus of this sort may be used for data-oriented semantic interpretation. Their algorithm, however, requires a procedure which can inspect the semantic formula of a node and determine the contribution of the semantics of a lower node, in order to be able to "factor out" that contribution. The details of this procedure have not been specified. However, van den Berg et al. also propose a simpler annotation convention which avoids the need for this procedure, and which is computationally more effective: an annotation convention which indicates explicitly how the semantic formula for a node is built up on the basis of the semantic formulas of its daughter nodes.</Paragraph>
      <Paragraph position="2"> Using this convention, the semantic annotation of the corpus trees is indicated as follows: * For every meaningful lexical node a type logical formula is specified that represents its meaning.</Paragraph>
      <Paragraph position="3"> * For every meaningful non-lexical node a formula schema is specified which indicates how its meaning representation may be put together out of the formulas assigned to its daughter nodes.</Paragraph>
      <Paragraph position="4"> In the examples below, these schemata use the variable d1 to indicate the meaning of the leftmost daughter constituent, d2 to indicate the meaning of the second daughter constituent, etc. Using this notation, the semantically annotated version of the toy corpus of Figure 1 is the toy corpus rendered in Figure 6. This kind of semantic annotation is what will be used in the construction of the corpora described in section 5 of this paper. It may be noted that the rather oblique description of the semantics of the higher nodes in the tree would easily lead to mistakes if annotation were carried out completely manually. An annotation tool that makes the expanded versions of the formulas visible for the annotator is obviously called for. Such a tool was developed by Bonnema (1996); it will be briefly described in section 5.</Paragraph>
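      <Paragraph> The annotation convention with daughter variables d1, d2, ... can be sketched as a bottom-up evaluation. This is a hedged illustration only: meanings are encoded as Python values and functions rather than type-logical formulas, the tiny model (DOMAIN, MAN, WOMAN, WHISTLE) is invented, and the schema d1(d2) at the NP and S nodes is an assumption of the sketch, since the figures of the paper are not reproduced here.

```python
# Toy model (an assumption of this sketch): a domain of entities and the
# extensions of three predicates.
DOMAIN = {"john", "mary"}
MAN = {"john"}
WOMAN = set()
WHISTLE = {"john", "mary"}


# A tree is (label, semantics, daughters). For a lexical node the semantics is
# the meaning itself; for a non-lexical node it is a schema over d1, d2, ...
def lexical(label, meaning_value):
    return (label, meaning_value, ())


def meaning(tree):
    """Expand the schemata bottom-up into the meaning of the whole tree."""
    _label, semantics, daughters = tree
    if not daughters:
        return semantics
    return semantics(*(meaning(daughter) for daughter in daughters))


def apply_d1_to_d2(d1, d2):
    """The schema "d1(d2)": apply the first daughter to the second."""
    return d1(d2)


def sentence(noun_extension):
    """Build an annotated tree for "a NOUN whistles"."""
    det = lexical("Det", lambda p: lambda q: any(p(x) and q(x) for x in DOMAIN))
    noun = lexical("N", lambda x: x in noun_extension)
    verb = lexical("VP", lambda x: x in WHISTLE)
    noun_phrase = ("NP", apply_d1_to_d2, (det, noun))
    return ("S", apply_d1_to_d2, (noun_phrase, verb))


print(meaning(sentence(MAN)), meaning(sentence(WOMAN)))
```

      Because every non-lexical node only states how to combine its daughters, the same schema can be reused unchanged when a different noun is substituted.</Paragraph>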
      <Paragraph position="5"> This annotation convention obviously assumes that the meaning representation of a surface-constituent can in fact always be composed out of the meaning representations of its subconstituents.</Paragraph>
      <Paragraph position="6"> This assumption is not unproblematic. To maintain it in the face of phenomena such as non-standard quantifier scope or discontinuous constituents creates complications in the syntactic or semantic analyses assigned to certain sentences and their constituents. It is therefore not clear yet whether our current treatment ought to be viewed as completely general, or whether a treatment in the vein of van den Berg et al. (1994) should be worked out.</Paragraph>
    </Section>
    <Section position="3" start_page="161" end_page="161" type="sub_section">
      <SectionTitle>
3.3 The meanings of subtrees and their
compositions
</SectionTitle>
      <Paragraph position="0"> As in the purely syntactic version of DOP, we now want to compute the probability of a (semantic) analysis by considering the most probable way in which it can be generated by combining subtrees from the corpus. We can do this in virtually the same way. The only novelty is a slight modification in the process by which a corpus tree is decomposed into subtrees, and a corresponding modification in the composition operation which combines subtrees.</Paragraph>
      <Paragraph position="1"> If we extract a subtree out of a tree, we replace the semantics of the new leaf node with a unification variable of the same type. Correspondingly, when the composition operation substitutes a subtree at this node, this unification variable is unified with the semantic formula on the substituting tree. (It is required that the semantic type of this formula matches the semantic type of the unification variable.) A simple example will make this clear. First, let us consider what subtrees the corpus makes available now. As an example, Figure 7 shows one of the decompositions of the annotated corpus sentence "A man whistles". We see that by decomposing the tree into two subtrees, the semantics at the breakpoint node N: man is replaced by a variable. Now an analysis for the sentence "A woman whistles" can, for instance, be generated in the way shown in Figure 8.</Paragraph>
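      <Paragraph> The modified decomposition and composition can be sketched as follows. The Node class, the functions cut, compose and words, and the string encoding of types and formulas are all hypothetical names chosen for this illustration; a unification variable is modelled simply as a missing (None) semantics on an open leaf, which is a simplification of real unification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str                      # syntactic category
    sem_type: str                   # semantic type, e.g. "(e,t)"
    semantics: Optional[str]        # formula or schema; None is a variable
    children: list = field(default_factory=list)
    word: Optional[str] = None      # set on lexical nodes only


def cut(tree, path):
    """Split tree at the node reached by path (a list of child indices).

    Returns (upper, lower). In upper, the cut node is an open leaf whose
    semantics is a variable of the same type; lower is the cut-off subtree.
    """
    if not path:
        return Node(tree.label, tree.sem_type, None), tree
    i = path[0]
    new_child, lower = cut(tree.children[i], path[1:])
    children = tree.children[:i] + [new_child] + tree.children[i + 1:]
    return Node(tree.label, tree.sem_type, tree.semantics, children, tree.word), lower


def compose(upper, lower):
    """Substitute lower at the leftmost open leaf of upper, checking types."""
    done = False

    def walk(node):
        nonlocal done
        is_open = node.semantics is None and not node.children and node.word is None
        if is_open and not done:
            if (node.label, node.sem_type) != (lower.label, lower.sem_type):
                raise ValueError("category or semantic type mismatch")
            done = True
            return lower
        children = [walk(child) for child in node.children]
        return Node(node.label, node.sem_type, node.semantics, children, node.word)

    result = walk(upper)
    if not done:
        raise ValueError("no open substitution site")
    return result


def words(tree):
    """Yield of a tree; open leaves contribute no words."""
    if tree.word:
        return [tree.word]
    return [w for child in tree.children for w in words(child)]


a_man_whistles = Node("S", "t", "d1(d2)", [
    Node("NP", "((e,t),t)", "d1(d2)", [
        Node("Det", "((e,t),((e,t),t))", "a", word="a"),
        Node("N", "(e,t)", "man", word="man"),
    ]),
    Node("VP", "(e,t)", "whistle", word="whistles"),
])

upper, lower = cut(a_man_whistles, [0, 1])            # break at N: man
woman = Node("N", "(e,t)", "woman", word="woman")
print(words(compose(upper, woman)))
```

      Both functions copy the nodes they touch, so the corpus tree itself is left unchanged and can be decomposed again at other nodes.</Paragraph>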
    </Section>
    <Section position="4" start_page="161" end_page="162" type="sub_section">
      <SectionTitle>
3.4 The Statistical Model of Data-Oriented
Semantic Interpretation
</SectionTitle>
      <Paragraph position="0"> We now define the probability of an interpretation of an input string.</Paragraph>
      <Paragraph position="1"> Given a partially annotated corpus as defined above, the multiset of corpus subtrees consists of all subtrees with a well-defined top-node semantics that are generated by applying to the trees of the corpus the decomposition mechanism described above. The probability of substituting a subtree $t$ on a specific node is the probability of selecting $t$ among all subtrees in the multiset that could be substituted on that node. This probability is equal to the number of occurrences of a subtree $t$, divided by the total number of occurrences of subtrees $t'$ with the same root node label as $t$: $P(t) = |t| / \sum_{t' : \mathrm{root}(t') = \mathrm{root}(t)} |t'|$.</Paragraph>
      <Paragraph position="3"> A derivation of a string is a tuple of subtrees, such that their composition results in a tree whose yield is the string. The probability of a derivation $t_1 \circ \ldots \circ t_n$ is the product of the probabilities of these subtrees: $P(t_1 \circ \ldots \circ t_n) = \prod_i P(t_i)$.</Paragraph>
      <Paragraph position="5"> A tree resulting from a derivation of a string is called a parse of this string. The probability of a parse is the probability that any of its derivations occurs; this is the sum of the probabilities of all its derivations. Let $t_{id}$ be the $i$-th subtree in the derivation $d$ that yields tree $T$; then the probability of $T$ is given by: $P(T) = \sum_d \prod_i P(t_{id})$.</Paragraph>
      <Paragraph position="7"> An interpretation of a string is a formula which is provably equivalent to the semantic annotation of the top node of a parse of this string. The probability of an interpretation $I$ of a string is the sum of the probabilities of the parses of this string with a top node annotated with a formula that is provably equivalent to $I$. Let $t_{idp}$ be the $i$-th subtree in the derivation $d$ that yields parse $p$ with interpretation $I$; then the probability of $I$ is given by: $P(I) = \sum_p \sum_d \prod_i P(t_{idp})$.</Paragraph>
      <Paragraph position="9"> We choose the most probable interpretation of a string $s$ as the most appropriate interpretation of $s$.</Paragraph>
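      <Paragraph> Selecting the most probable interpretation amounts to grouping parses by their top-node formula and summing their probabilities. The sketch below makes one strong simplifying assumption: provable equivalence is replaced by a hypothetical normalize function that only ignores whitespace. The formulas and parse probabilities are invented for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def normalize(formula):
    """Stand-in for provable equivalence: only whitespace is ignored here."""
    return "".join(formula.split())


def most_probable_interpretation(parses):
    """parses: iterable of (top-node formula, parse probability) pairs."""
    totals = defaultdict(Fraction)
    for formula, probability in parses:
        totals[normalize(formula)] += probability
    best = max(totals, key=totals.get)
    return best, totals[best]


# Invented parse probabilities for one input string.
PARSES = [
    ("exists x. woman(x) and whistle(x)", Fraction(3, 10)),
    ("forall x. woman(x) implies whistle(x)", Fraction(4, 10)),
    ("exists x.woman(x) and whistle(x)", Fraction(2, 10)),
]

print(most_probable_interpretation(PARSES))
```

      In this invented example the single most probable parse carries the universal reading, while the existential reading wins once the two parses that share it are summed.</Paragraph>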
      <Paragraph position="10"> In Bonnema (1996) a semantic extension of the DOP parser of Sima'an (1996a) is given. But instead of computing the most likely interpretation of a string, it computes the interpretation of the most likely combination of semantically annotated subtrees. As was shown in Sima'an (1996b), the most likely interpretation of a string cannot be computed in deterministic polynomial time. It is not yet known how often the most likely interpretation and the interpretation of the most likely combination of semantically enriched subtrees do actually coincide.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="162" end_page="164" type="metho">
    <SectionTitle>
4 Implementations
</SectionTitle>
    <Paragraph position="0"> The first implementation of a semantic DOP-model yielded rather encouraging preliminary results on a semantically enriched part of the ATIS-corpus. Implementation details and experimental results can be found in Bonnema (1996), and Bod et al. (1996).</Paragraph>
    <Paragraph position="1"> We repeat the most important observations: * Data-oriented semantic interpretation seems to be robust; of the sentences that could be parsed, a significantly higher percentage received a correct semantic interpretation (88%), than an exactly correct syntactic analysis (62%).</Paragraph>
    <Paragraph position="2"> * The coverage of the parser was rather low (72%), because of the sheer number of different semantic types and constructs in the trees. * The parser was fast: on the average six times as fast as a parser trained on syntax alone.</Paragraph>
    <Paragraph position="3"> The current implementation is again an extension of Sima'an (1996a), by Bonnema (see footnote 2). In our experiments, we notice a robustness and speed-up comparable to our experience with the previous implementation. Besides that, we observe higher accuracy and higher coverage, due to a new method of organizing the information in the tree-bank before it is used for building the actual parser.</Paragraph>
    <Paragraph position="4"> A semantically enriched tree-bank will generally contain a wealth of detail. This makes it hard for a probabilistic model to estimate all parameters. In sections 4.1 and 4.2, we discuss a way of generalizing over semantic information in the tree-bank, before a DOP-parser is trained on the material. We automatically learn a simpler, less redundant representation of the same information. The method is employed in our current implementation.</Paragraph>
    <Section position="1" start_page="162" end_page="163" type="sub_section">
      <SectionTitle>
4.1 Simplifying the tree-bank
</SectionTitle>
      <Paragraph position="0"> A tree-bank annotated in the manner described above consists of tree-structures with syntactic and semantic attributes at every node. The semantic attributes are rules that indicate how the meaning-representation of the expression dominated by that node is built up out of its parts. Every instance of a semantic rule at a node has a semantic type associated with it. These types usually depend on the lexical instantiations of a syntactic-semantic structure. If we decide to view subtrees as identical iff their syntactic structure, the semantic rule at each node, and the semantic type of each node are identical, any fine-grained type-system will cause a huge increase in different instantiations of subtrees. In the two tree-banks we tested on, there are many sub-trees that differ in semantic type, but otherwise share the same syntactic/semantic structure. Disregarding the semantic types completely, on the other hand, will cause syntactic constraints to govern both syntactic substitution and semantic unification. The semantic types of constituents often give rise to differences in semantic structure. If this type information is not available during parsing, important clues will be missing, and loss of accuracy will result.</Paragraph>
      <Paragraph position="1"> Apparently, we do need some of the information present in the types of semantic expressions. Ignoring semantic types will result in loss of accuracy, but distinguishing all different semantic types will result in loss of coverage and generalizing power. With these observations in mind, we decided to group the types, and relax the constraints on semantic unification. In this approach, every semantic expression, and every variable, has a set of types associated with it. In our semantic DOP model, we modify the constraints on semantic unification as follows: a variable can be unified with an expression if the intersection of their respective sets of types is not empty. The semantic types are classified into sets that can be distinguished on the basis of their behavior in the tree-bank. We let the tree-bank data decide which types can be grouped together, and which types should be distinguished. This way we can generalize over semantic types, and exploit relevant type-information in the parsing process at the same time. In learning the optimal grouping of types, we have two concerns: keeping the number of different sets of types to a minimum, and increasing the semantic determinacy of syntactic structures enhanced with type-information. We say that a subtree T, with type-information at every node, is semantically determinate iff we can determine a unique, correct semantic rule for every CFG rule R (see footnote 3) occurring in T. (Footnote 2: With thanks to Khalil Sima'an for fruitful discussions, and for the use of his parser.)</Paragraph>
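      <Paragraph> The relaxed unification constraint can be stated in a few lines. This is a minimal sketch: the function name can_unify and the two example groupings of types are hypothetical and are not groupings reported in the paper.

```python
def can_unify(variable_types, expression_types):
    """Relaxed constraint: unify iff the two sets of types share a member."""
    return not set(variable_types).isdisjoint(expression_types)


# Hypothetical groupings, as a learned classification might produce them.
NP_LIKE = frozenset({"e", "((e,t),t)"})
PREDICATE_LIKE = frozenset({"(e,t)", "(e,(e,t))"})

print(can_unify(NP_LIKE, {"e"}), can_unify(NP_LIKE, PREDICATE_LIKE))
```

      With singleton sets this reduces to the strict requirement that the types be identical, so the original constraint is a special case of the relaxed one.</Paragraph>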
      <Paragraph position="2"> Semantic determinacy is very attractive from a computational point of view: if our processed tree-bank has semantic determinacy, we do not need to involve the semantic rules in the parsing process. Instead, the parser yields parses containing information regarding syntax and semantic types, and the actual semantic rules can be determined on the basis of that information. In the next section we will elaborate on how we learn the grouping of semantic types from the data.</Paragraph>
    </Section>
    <Section position="2" start_page="163" end_page="164" type="sub_section">
      <SectionTitle>
4.2 Classification of semantic types
</SectionTitle>
      <Paragraph position="0"> The algorithm presented in this section proceeds by grouping semantic types occurring with the same syntactic label into mutually exclusive sets, and assigning to every syntactic label an index that indicates to which set of types its corresponding semantic type belongs. It is an iterative, greedy algorithm. In every iteration a tuple, consisting of a syntactic category and a set of types, is selected: the tuple whose distinction in the tree-bank leads to the greatest increase in semantic determinacy that could be found. Iteration continues until the increase in semantic determinacy is below a certain threshold.</Paragraph>
      <Paragraph position="1"> Before giving the algorithm, we need some definitions. (Footnote 3: By "CFG rule", we mean a subtree of depth 1, without a specified root-node semantics, but with the features relevant for substitution, i.e. syntactic category and semantic type. Since the subtree of depth 1 is the smallest structural building block of our DOP model, semantic determinacy of every CFG rule in a subtree means the whole subtree is semantically determinate.)</Paragraph>
      <Paragraph position="2"> tuples(): tuples(T) is the set of all pairs (c, s) in a tree-bank T, where c is a syntactic category, and s is the set of all semantic types that a constituent of category c in T can have.</Paragraph>
      <Paragraph position="3"> apply(): if c is a category, s is a set of types, and T is a tree-bank, then apply((c, s), T) yields a tree-bank T', by indexing each instance of category c in T, such that the c constituent is of semantic type $t \in s$, with a unique index i.</Paragraph>
      <Paragraph position="4"> amb(): if T is a tree-bank, then amb(T) yields an $n \in \mathbb{N}$, such that n is the sum of the frequencies of all CFG rules R that occur in T with more than one corresponding semantic rule.</Paragraph>
      <Paragraph position="5"> The algorithm starts with a tree-bank $T_0$; in $T_0$, the cardinality of tuples($T_0$) equals the number of different syntactic categories in $T_0$.</Paragraph>
      <Paragraph position="7"> $2^s$ is the powerset of $s$. In the implementation, a limit can be set to the cardinality of $s' \in 2^s$, to avoid excessively long processing time. Obviously, the iteration will always end if we require the threshold $\delta$ to be &gt; 0. When the algorithm finishes, $\tau_0, \ldots, \tau_{i-1}$ contain the category/set-of-types pairs that took the largest steps towards semantic determinacy, and are therefore distinguished in the tree-bank. The semantic types not occurring in any of these pairs are grouped together, and treated as equivalent.</Paragraph>
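      <Paragraph> The greedy procedure described in this section can be sketched as follows. This is a reconstruction from the prose only, not the authors' listing: the tree-bank is flattened into a list of depth-1 rule occurrences, the names rule_key, apply_tuple, candidates and classify are hypothetical, and candidate type sets are restricted to non-empty proper subsets of bounded size rather than the full powerset. The three-rule tree-bank is invented.

```python
from collections import defaultdict
from itertools import combinations

# A rule occurrence is (parent, children, semantic_rule). Every node is a
# (category, semantic_type) pair; an indexing maps such pairs to a group index.


def rule_key(node, indexing):
    category, _sem_type = node
    return (category, indexing.get(node, 0))


def amb(treebank, indexing):
    """Summed frequency of CFG rules seen with more than one semantic rule."""
    seen = defaultdict(list)
    for parent, children, sem_rule in treebank:
        cfg_rule = (rule_key(parent, indexing),
                    tuple(rule_key(child, indexing) for child in children))
        seen[cfg_rule].append(sem_rule)
    return sum(len(rules) for rules in seen.values() if len(set(rules)) > 1)


def tuples(treebank, indexing):
    """Pairs (category, set of types) for every currently undivided group."""
    groups = defaultdict(set)
    for parent, children, _sem_rule in treebank:
        for node in (parent, *children):
            groups[rule_key(node, indexing)].add(node[1])
    return [(category, frozenset(types)) for (category, _), types in groups.items()]


def apply_tuple(tuple_, indexing, new_index):
    """Give every (category, type) pair named by the tuple the index new_index."""
    category, types = tuple_
    updated = dict(indexing)
    updated.update({(category, sem_type): new_index for sem_type in types})
    return updated


def candidates(treebank, indexing, max_size):
    for category, types in tuples(treebank, indexing):
        for size in range(1, min(max_size, len(types) - 1) + 1):
            for subset in combinations(sorted(types), size):
                yield (category, frozenset(subset))


def classify(treebank, delta=0, max_size=2):
    """Greedily distinguish tuples until the gain no longer exceeds delta."""
    indexing, chosen = {}, []
    while True:
        new_index = len(chosen) + 1
        current = amb(treebank, indexing)
        scored = [(current - amb(treebank, apply_tuple(c, indexing, new_index)), c)
                  for c in candidates(treebank, indexing, max_size)]
        if not scored:
            return indexing, chosen
        gain, best = max(scored, key=lambda pair: pair[0])
        if not gain > delta:
            return indexing, chosen
        indexing = apply_tuple(best, indexing, new_index)
        chosen.append(best)


VP, V = ("VP", "(e,t)"), ("V", "(e,(e,t))")
TREEBANK = [
    (VP, (V, ("NP", "e")), "d1(d2)"),
    (VP, (V, ("NP", "e")), "d1(d2)"),
    (VP, (V, ("NP", "((e,t),t)")), "lambda x. d2(lambda y. d1(y)(x))"),
]

INDEXING, CHOSEN = classify(TREEBANK)
print(amb(TREEBANK, {}), amb(TREEBANK, INDEXING), CHOSEN)
```

      Since every accepted step lowers the ambiguity count by more than delta and the count cannot drop below zero, the loop terminates for any non-negative delta.</Paragraph>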
      <Paragraph position="8"> Note that the algorithm cannot be guaranteed to achieve full semantic determinacy. The degree of semantic determinacy reached depends on the consistency of annotation, annotation errors, the granularity of the type system, peculiarities of the language, in short: on the nature of the tree-bank. To force semantic determinacy, we assign a unique index to those rare instances of categories, i.e., left-hand sides of CFG rules, that do not have any distinguishing features to account for their differing semantic rule.</Paragraph>
      <Paragraph position="9"> Now the resulting tree-bank embodies a function from CFG rules to semantic rules. We store this function in a table, and strip all semantic rules from the trees. As the experimental results in the next section show, using a tree-bank obtained in this way for data-oriented semantic interpretation results in high coverage and good probability estimations.</Paragraph>
    </Section>
  </Section>
</Paper>