<?xml version="1.0" standalone="yes"?>
<Paper uid="P92-1023">
  <Title>GPSM: A GENERALIZED PROBABILISTIC SEMANTIC MODEL FOR AMBIGUITY RESOLUTION</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
GPSM: A GENERALIZED PROBABILISTIC
SEMANTIC MODEL FOR AMBIGUITY RESOLUTION
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
*Behavior Design Corporation
</SectionTitle>
    <Paragraph position="0"> No. 28, 2F, R&amp;D Road II, Science-Based Industrial Park Hsinchu, TAIWAN 30077, R.O.C.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> In natural language processing, ambiguity resolution is a central issue, and can be regarded as a preference assignment problem. In this paper, a Generalized Probabilistic Semantic Model (GPSM) is proposed for preference computation. An effective semantic tagging procedure is proposed for annotating semantic features. A semantic score function is derived from a general score function, which integrates lexical, syntactic and semantic preference under a uniform formulation. The semantic score measure shows substantial improvement in structural disambiguation over a syntax-based approach.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="177" type="metho">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> In a large natural language processing system, such as a machine translation system (MTS), ambiguity resolution is a critical problem. Various rule-based and probabilistic approaches have been proposed to resolve various kinds of ambiguity problems on a case-by-case basis.</Paragraph>
    <Paragraph position="1"> In rule-based systems, a large number of rules are used to specify linguistic constraints for resolving ambiguity. Any parse that violates the semantic constraints is regarded as ungrammatical and rejected. Unfortunately, because every &amp;quot;rule&amp;quot; tends to have exceptions and uncertainty, and ill-formedness contributes significantly to the error rate of a large practical system, such &amp;quot;hard rejection&amp;quot; approaches fail to deal with these situations. A better way is to find all possible interpretations and place the emphasis on preference, rather than well-formedness (e.g., \[Wilks 83\]). However, most of the known approaches for assigning preference depend heavily on heuristics such as counting the number of constraint satisfactions. Therefore, most such preference measures cannot be objectively justified. Moreover, it is hard and costly to acquire, verify and maintain the consistency of a large fine-grained rule base by hand.</Paragraph>
    <Paragraph position="2"> Probabilistic approaches greatly relieve the knowledge acquisition problem because they are usually trainable, consistent and easy to meet certain optimum criteria. They can also provide more objective preference measures for &amp;quot;soft rejection.&amp;quot; Hence, they are attractive for a large system. The current probabilistic approaches have a wide coverage including lexical analysis \[DeRose 88, Church 88\], syntactic analysis \[Garside 87, Fujisaki 89, Su 88, 89, 91b\], restricted semantic analysis \[Church 89, Liu 89, 90\], and experimental translation systems \[Brown 90\]. However, there is still no integrated approach for modeling the joint effects of lexical, syntactic and semantic information on preference evaluation.</Paragraph>
    <Paragraph position="3"> A generalized probabilistic semantic model (GPSM) will be proposed in this paper to overcome the above problems. In particular, an integrated formulation for lexical, syntactic and semantic knowledge will be used to derive the semantic score for semantic preference evaluation.</Paragraph>
    <Paragraph position="4"> Application of the model to structural disambiguation is investigated. Preliminary experiments show about 10%-14% improvement of the semantic score measure over a model that uses syntactic information only.</Paragraph>
  </Section>
  <Section position="5" start_page="177" end_page="177" type="metho">
    <SectionTitle>
2. Preference Assignment Using
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="177" end_page="177" type="sub_section">
      <SectionTitle>
Score Function
</SectionTitle>
      <Paragraph position="0"> In general, a particular semantic interpretation of a sentence can be characterized by a set of lexical categories (or parts of speech), a syntactic structure, and the semantic annotations associated with it. Among the various interpretations of a sentence, the best choice should be the most probable semantic interpretation for the given input words.</Paragraph>
      <Paragraph position="1"> In other words, the interpretation that maximizes the following score function \[Su 88, 89, 91b\] or analysis score \[Chen 91\] is preferred: Score (Semi, Synj, Lexk, Words) = P(Semi, Synj, Lexk | Words) = P(Semi | Synj, Lexk, Words) P(Synj | Lexk, Words) P(Lexk | Words) ... (1)</Paragraph>
      <Paragraph position="3"> where (Lexk, Synj, Semi) refers to the kth set of lexical categories, the jth syntactic structure and the ith set of semantic annotations for the input Words. The three component functions are referred to as semantic score (Ssem), syntactic score (Ssyn) and lexical score (Slex), respectively. The global preference measure will be referred to as compositional score or simply as score. In particular, the semantic score accounts for the semantic preference on a given set of lexical categories and a particular syntactic structure for the sentence.</Paragraph>
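The decomposition into Ssem, Ssyn and Slex can be sketched in a few lines (a minimal illustration; the probabilities below are made up, and the log-domain summation is an implementation choice to avoid underflow, not part of the paper):

```python
import math

def score(s_sem, s_syn, s_lex):
    """Compositional score: product of the semantic, syntactic and
    lexical component scores, summed here in the log domain so that
    long products of small probabilities do not underflow."""
    return s_sem + s_syn + s_lex

# Two competing interpretations of the same words, with made-up
# component probabilities.
reading_a = score(math.log(0.4), math.log(0.2), math.log(0.9))
reading_b = score(math.log(0.1), math.log(0.5), math.log(0.9))

# The preferred interpretation is the one maximizing the score.
best = "a" if reading_a > reading_b else "b"
```

Maximizing the sum of logs is equivalent to maximizing the product of the three component probabilities.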
      <Paragraph position="4"> Various formulations for the lexical score and syntactic score have been studied extensively in our previous works \[Su 88, 89, 91b, Chiang 92\] and in other literature. Hence, we will concentrate on the formulation of the semantic score.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="177" end_page="179" type="metho">
    <SectionTitle>
3. Semantic Tagging
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="177" end_page="179" type="sub_section">
      <SectionTitle>
Canonical Form of Semantic
Representation
</SectionTitle>
      <Paragraph position="0"> Given the formulation in Eqn. (1), first we will show how to extract the abstract objects (Semi, Synj, Lexk) from a semantic representation. In general, a particular interpretation of a sentence can be represented by an annotated syntax tree (AST), which is a syntax tree annotated with feature structures in the tree nodes. Figure 1 shows an example of an AST. The annotated version of a node A is denoted as A \[fA\] in the figure, where fA is the feature structure associated with node A. Because an AST preserves both syntactic and semantic information, it can be converted to other deep structure representations easily. Therefore, without loss of generality, the AST representation will be used as the canonical form of semantic representation for preference evaluation. The techniques used here, of course, can be applied to other deep structure representations as well.</Paragraph>
      <Paragraph position="2"> \[Figure 1. Annotated Syntax Tree (AST) and Phrase Levels (PL).\]</Paragraph>
      <Paragraph position="3"> The hierarchical AST can be represented by a set of phrase levels, such as L1 through L8 in Figure 1. Formally, a phrase level (PL) is a set of symbols corresponding to a sentential form of the sentence. The phrase levels in Figure 1 are derived from a sequence of rightmost derivations, which is commonly used in an LR parsing mechanism. For example, L5 and L4 correspond to the rightmost derivation B F c4 =&gt; B c3 c4. Note that the first phrase level L1 consists of all lexical categories c1 ... cn of the terminal words (w1 ...</Paragraph>
      <Paragraph position="4"> wn). A phrase level with each symbol annotated with its feature structure is called an annotated phrase level (APL). The i-th APL is denoted as Γi. For example, L5 in Figure 1 has an annotated phrase level Γ5 = {B \[fB\], F \[fF\], c4 \[fc4\]} as its counterpart, where fc4 is the atomic feature of the lexical category c4, which comes from the lexical item of the 4th word w4. With the above notations, the score function can be re-formulated as follows: Score (Semi, Synj, Lexk, Words) = P(Γ1m, L1m, c1n | w1n) ... (2)</Paragraph>
      <Paragraph position="6"> where c1n (a short form for {c1 ... cn}) is the kth set of lexical categories (Lexk), L1m ({L1 ... Lm}) is the jth syntactic structure (Synj), and Γ1m ({Γ1 ... Γm}) is the ith set of semantic annotations (Semi) for the input words w1n ({w1 ... wn}). A good encoding scheme for the Γi's will allow us to take semantic information into account without using redundant information. Hence, we will show how to annotate a syntax tree so that various interpretations can be characterized differently.</Paragraph>
      <Paragraph position="8"> Semantic Tagging A popular linguistic approach to annotating a tree is to use a unification-based mechanism. However, much information irrelevant to disambiguation might be included. An effective encoding scheme should be simple yet preserve most of the discriminative information for disambiguation. Such an encoding scheme can be accomplished by associating each phrase structure rule A → X1X2... XM with a head list (Xi1, Xi2, ... XiM). The head list is formed by arranging the child nodes (X1, X2, ..., XM) in descending order of importance to the compositional semantics of their mother node A. For this reason, Xi1, Xi2 and Xij are called the primary, secondary and j-th heads of A, respectively.</Paragraph>
      <Paragraph position="9"> The compositional semantic features of the mother node A can be represented as an ordered list of the feature structures of its children, where the order is the same as in the head list. For example, for S ~ NP VP, we have a head list (VP, NP), because VP is the (primary) head of the sentence.</Paragraph>
      <Paragraph position="10"> When composing the compositional semantics of S, the features of VP and NP will be placed in the first and second slots of the feature structure of S, respectively.</Paragraph>
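The head-list reordering can be sketched as follows (the head lists and feature names are illustrative assumptions; only the S → NP VP ordering comes from the text):

```python
# Head lists per rule: children reordered by importance to the mother's
# compositional semantics. The S -> NP VP ordering follows the text;
# the NP rule is an assumed example.
HEAD_LISTS = {
    ("S", ("NP", "VP")): ("VP", "NP"),
    ("NP", ("Det", "N")): ("N", "Det"),
}

def compose(mother, children, features):
    """Place each child's feature structure into the mother's slots in
    head-list order: primary head first, then secondary, and so on."""
    order = HEAD_LISTS[(mother, tuple(children))]
    return [features[c] for c in order]

# For S -> NP VP, the VP features land in the first slot of S.
s_features = compose("S", ["NP", "VP"], {"NP": "anim", "VP": "sta"})
```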
      <Paragraph position="11"> Because not all children and all features in a feature structure are equally significant for disambiguation, it is not really necessary to annotate a node with the feature structures of all its children. Instead, only the most important N children of a node are needed to characterize the node, and only the most discriminative feature of a child needs to be passed to its mother node.</Paragraph>
      <Paragraph position="12"> In other words, an N-dimensional feature vector, called a semantic N-tuple, could be used to characterize a node without losing much information for disambiguation. The first feature in the semantic N-tuple comes from the primary head, and is thus called the head feature of the semantic N-tuple. The other features come from the other children in the order of the head list. (Compare these notions with the linguistic sense of head and head feature.) An annotated node can thus be approximated as A ≈ A(f1, f2, ..., fN), where fj = HeadFeature(Xij) is the (primary) head feature of its j-th head (i.e., Xij) in the head list. Non-head features of a child node Xij will not be percolated up to its mother node. The head feature of A itself, in this case, is f1. For a terminal node, the head feature will be the semantic tag of the corresponding lexical item; other features in the N-tuple will be tagged as φ (NULL).</Paragraph>
      <Paragraph position="13"> Figure 2 shows two possible annotated syntax trees for the sentence &amp;quot;... saw the boy in the park.&amp;quot; For instance, the &amp;quot;loc(ation)&amp;quot; feature of &amp;quot;park&amp;quot; is percolated to its mother NP node as the head feature; it then serves as the secondary head feature of its grandmother node PP, because the NP node is the secondary head of PP. Similarly, the VP node in the left tree is annotated as VP(sta,anim) according to its primary head saw(sta,φ) and secondary head NP(anim,in).</Paragraph>
      <Paragraph position="14"> The VP(sta,in) node in the right tree is tagged differently, which reflects a different attachment preference of the prepositional phrase.</Paragraph>
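The percolation behind these annotations can be sketched as below (N = 2; the tree shapes and head-list orderings are assumed from the example, with children already listed in head-list order, and None standing in for the NULL tag φ):

```python
def annotate(node):
    """Annotate a node with a semantic 2-tuple: the primary head
    feature of its first head, then that of its second head.
    Leaves carry (semantic_tag, None)."""
    label, body = node
    if isinstance(body, str):                     # lexical leaf
        return label, (body, None)
    tuples = [annotate(child)[1] for child in body]
    second = tuples[1][0] if len(tuples) > 1 else None
    return label, (tuples[0][0], second)

# Noun attachment: "saw [the boy in the park]".
np_noun = ("NP", [("N", "anim"), ("PP", [("P", "in"), ("NP", "loc")])])
vp_noun = ("VP", [("V", "sta"), np_noun])

# Verb attachment: "saw [the boy] [in the park]" (PP as secondary head).
vp_verb = ("VP", [("V", "sta"),
                  ("PP", [("P", "in"), ("NP", "loc")]),
                  ("NP", [("N", "anim")])])
```

Running `annotate` on the two trees yields VP(sta,anim) for the noun attachment and VP(sta,in) for the verb attachment, mirroring the two taggings in Figure 2.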
      <Paragraph position="15"> \[Figure 2. Two annotated syntax trees for &amp;quot;... saw the boy in the park.&amp;quot; Legend: sta: stative verb; def: definite article; loc: location; anim: animate.\] By this simple mechanism, the major characteristics of the children, namely the head features, can be percolated to higher syntactic levels, and</Paragraph>
      <Paragraph position="17"> their correlation and dependency can be taken into account in preference evaluation even if they are far apart. In this way, different interpretations will be tagged differently. The preference on a particular interpretation can thus be evaluated from the distribution of the annotated syntax trees. Based on the above semantic tagging scheme, a semantic score will be proposed to evaluate the semantic preference on various interpretations for a sentence. Its performance improvement over syntactic score \[Su 88, 89, 91b\] will be investigated.</Paragraph>
      <Paragraph position="18"> Consequently, a brief review of the syntactic score evaluation method is given before going into details of the semantic score model. (See the cited references for details.)</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="179" end_page="180" type="metho">
    <SectionTitle>
4. Syntactic Score
</SectionTitle>
    <Paragraph position="0"> According to Eqn. (2), the syntactic score can be formulated as follows \[Su 88, 89, 91b\]: Ssyn(Synj) = P(L2, ..., Lm | L1) ≈ Π t=2..m P(Lt | Lt-1) ≈ Π t=2..m P(At → X1X2... XM | αt, βt) ... (3)</Paragraph>
    <Paragraph position="2"> where αt, βt are the left context and right context under which the derivation At → X1X2... XM occurs. (Assume that Lt = {αt, At, βt} and Lt-1 = {αt, X1X2... XM, βt}.) When L left context</Paragraph>
    <Paragraph position="4"> symbols in αt and R right context symbols in βt are consulted to evaluate the syntactic score, it is said to operate in LLRR mode of operation. When the context is ignored, such an L0R0 mode of operation reduces to a stochastic context-free grammar.</Paragraph>
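One way to read the context modes (an interpretive sketch, not the paper's implementation): each rule probability is conditioned on the last L left-context symbols and the first R right-context symbols, and L0R0 collapses to plain SCFG rule probabilities:

```python
# Hypothetical context-dependent rule probabilities, keyed by
# (rule, L left-context symbols, R right-context symbols).
PROBS = {
    ("NP -> Det N", ("V",), ("P",)): 0.6,  # L1R1: one symbol each side
    ("NP -> Det N", (), ()): 0.4,          # L0R0: context ignored (SCFG)
}

def rule_prob(rule, left_context, right_context, L, R):
    """Consult L left-context and R right-context symbols (LLRR mode)."""
    left = tuple(left_context[-L:]) if L else ()
    right = tuple(right_context[:R]) if R else ()
    return PROBS[(rule, left, right)]
```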
    <Paragraph position="5"> To avoid the normalization problem \[Su 91b\] arising from the different numbers of transition probabilities for different syntax trees, an alternative formulation of the syntactic score is to evaluate the transition probabilities between configuration changes of the parser. For instance, the configuration of an LR parser is defined by its stack contents and input buffer. For the AST in Figure 1, the parser configurations after the read of c1, c2, c3, c4 and $ (end-of-sentence) are equivalent to L1, L2, L4, L5 and L8, respectively. Therefore, the syntactic score can be approximated as \[Su 91b\]: Ssyn(Synj) ≈ P(L2 | L1) P(L4 | L2) P(L5 | L4) P(L8 | L5) ... (4)</Paragraph>
    <Paragraph position="7"> In this way, the number of transition probabilities in the syntactic scores of all AST's will be kept the same as the sentence length.</Paragraph>
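The configuration-based approximation can be sketched as follows (the transition probabilities are made up; the point is that every tree over the same sentence gets the same number of factors):

```python
import math

def syntactic_score(transition_probs):
    """Syntactic score as a product of transition probabilities between
    successive parser configurations."""
    return math.prod(transition_probs)

# Hypothetical transitions L1 -> L2 -> L4 -> L5 -> L8 from Figure 1:
# one factor per word of the four-word sentence, regardless of the
# shape of the syntax tree.
probs = [0.9, 0.8, 0.5, 0.7]
s_syn = syntactic_score(probs)
```

Because the number of factors depends only on the sentence length, scores of competing trees for the same sentence are directly comparable without extra normalization.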
  </Section>
</Paper>