<?xml version="1.0" standalone="yes"?>
<Paper uid="P90-1005">
  <Title>STRUCTURAL DISAMBIGUATION WITH CONSTRAINT PROPAGATION</Title>
  <Section position="3" start_page="0" end_page="31" type="metho">
    <SectionTitle>
1 INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> We are interested in an efficient treatment of structural ambiguity in natural language analysis. It is known that &quot;every-way&quot; ambiguous constructs, such as prepositional attachment in English, have a Catalan number of ambiguous parses (Church and Patil 1982), which grows at a faster than exponential rate (Knuth 1975). A parser should be provided with a disambiguation mechanism that does not involve generating such a combinatorial number of parse trees explicitly.</Paragraph>
    <Paragraph position="1"> We have developed a parsing method in which an intermediate parsing result is represented as a data structure called a constraint network. Every solution that satisfies all the constraints simultaneously corresponds to an individual parse tree. No explicit parse trees are generated until ultimately necessary. Parsing and successive disambiguation are performed by adding new constraints to the constraint network.</Paragraph>
    <Paragraph position="2"> Newly added constraints are efficiently propagated over the network by Constraint Propagation (Waltz 1975, Montanari 1976) to remove inconsistent values.</Paragraph>
    <Paragraph position="3"> In this paper, we present the basic ideas of a formal grammatical theory called Constraint Dependency Grammar (CDG for short) that makes this parsing technique possible. CDG has a reasonable time bound in its parsing, while its weak generative capacity is strictly greater than that of Context Free Grammar (CFG).</Paragraph>
    <Paragraph position="4"> We give the definition of CDG in the next section.</Paragraph>
    <Paragraph position="5"> Then, in Section 3, we describe the parsing method based on constraint propagation, using a step-by-step example. Formal properties of CDG are discussed in Section 4.</Paragraph>
  </Section>
  <Section position="4" start_page="31" end_page="31" type="metho">
    <SectionTitle>
2 CDG: DEFINITION
</SectionTitle>
    <Paragraph position="0"> Let a sentence $s = w_1 w_2 \ldots w_n$ be a finite string on a finite alphabet $\Sigma$. Let $R = \{r_1, r_2, \ldots, r_k\}$ be a finite set of role-ids. Suppose that each word $i$ in a sentence $s$ has $k$ different roles $r_1(i), r_2(i), \ldots, r_k(i)$. Roles are like variables, and each role can have a pair &lt;a, d&gt; as its value, where the label a is a member of a finite set $L = \{a_1, a_2, \ldots, a_l\}$ and the modifiee d is either $1 \le d \le n$ or a special symbol nil. An analysis of the sentence s is obtained by assigning appropriate values to the $n \times k$ roles (we can regard this situation as one in which each word has a frame with k slots, as shown in Figure 1).</Paragraph>
    <Paragraph position="1"> An assignment A of a sentence s is a function that assigns values to the roles. Given an assignment A, the label and the modifiee of a role x are determined.</Paragraph>
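    <Paragraph position="2"> We define the following four functions to represent the various aspects of the role x, assuming that x is an $r_j$-role of the word i: pos(x) is the position i; rid(x) is the role id $r_j$; lab(x) is the label of x; and mod(x) is the modifiee of x. We also define word(i) as the terminal symbol occurring at the position i (footnote 1). An individual grammar $G = \langle \Sigma, R, L, C \rangle$ in the CDG theory determines a set of possible assignments of a given sentence, where</Paragraph>
    <Paragraph position="4"> $\Sigma$ is a finite set of terminal symbols, R is a finite set of role-ids, L is a finite set of labels, and C is a constraint that an assignment A should satisfy.</Paragraph>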
    <Paragraph position="5"> A constraint C is a logical formula of the form</Paragraph>
    <Paragraph position="7"> where the variables $x_1, x_2, \ldots, x_p$ range over the set of roles in an assignment A and each subformula $P_i$ consists only of the following vocabulary:  * Variables: $x_1, x_2, \ldots, x_p$ * Constants: elements and subsets of</Paragraph>
    <Paragraph position="9"> * Function symbols: word(), pos(), rid(), lab(), and mod() Footnote 1: In this paper, when referring to a word, we purposely use the position $(1, 2, \ldots, n)$ of the word rather than the word itself $(w_1, w_2, \ldots, w_n)$, because the same word can occur in many different positions in a sentence. For readability, however, we sometimes use the notation $word_{position}$.</Paragraph>
    <Paragraph position="10"> * Predicate symbols: =, &lt;, &gt;, and $\in$ * Logical connectors: &amp;, |, $\lnot$, and $\Rightarrow$ Specifically, we call a subformula $P_i$ a unary constraint when $P_i$ contains only one variable, and a binary constraint when $P_i$ contains exactly two variables. The semantics of the functions have been defined above. The semantics of the predicates and the logical connectors are defined as usual, except that comparing an expression containing nil with another value by the inequality predicates always yields the truth value false.</Paragraph>
    <Paragraph position="11"> These conditions guarantee that, given an assignment A, it is possible to compute whether the values of $x_1, x_2, \ldots, x_p$ satisfy C in constant time, regardless of the sentence length n.</Paragraph>
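    <Paragraph> The nil rule and the constant-time evaluation of a constraint can be illustrated with a minimal sketch. It is not part of the formalism itself: the names NIL, Role, greater_than, less_than, and the two sample constraints are hypothetical and chosen only for this example, and a role value is modelled as a (label, modifiee) pair stored next to the position and role id.

```python
from typing import NamedTuple, Optional

NIL = None  # stands in for the special symbol nil (an assumption of this sketch)


class Role(NamedTuple):
    """One role of one word under a given assignment."""
    pos: int             # position of the word in the sentence (1-based)
    rid: str             # role id, for example "governor"
    lab: str             # label part of the role value
    mod: Optional[int]   # modifiee part: a word position, or NIL


def greater_than(a, b):
    """Inequality predicate: any comparison that involves nil is false."""
    if a is NIL or b is NIL:
        return False
    return a > b


def less_than(a, b):
    """The mirrored inequality, with the same nil rule."""
    return greater_than(b, a)


def modifies_to_the_right(x: Role) -> bool:
    """Sample unary constraint: the modifiee lies to the right of the word."""
    return less_than(x.pos, x.mod)


def same_label_same_modifiee_implies_same_role(x: Role, y: Role) -> bool:
    """Sample binary constraint: equal label and modifiee force x and y to be one role."""
    if x.lab == y.lab and x.mod == y.mod:
        return x == y
    return True
```

Each check inspects a fixed number of fields, so its cost does not depend on the sentence length n. A role whose modifiee is nil fails modifies_to_the_right because the inequality involving nil evaluates to false.</Paragraph>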
    <Section position="1" start_page="31" end_page="31" type="sub_section">
      <SectionTitle>
Definition
</SectionTitle>
      <Paragraph position="0"> * [...] is generated iff there exists an assignment A that satisfies the constraint C.</Paragraph>
      <Paragraph position="1"> * L(G) is the language generated by the grammar G iff L(G) is the set of all sentences generated by G.</Paragraph>
      <Paragraph position="3"> The formula P1 of the constraint C1 is the conjunction of the following four subformulas (an informal description is attached to each constraint):</Paragraph>
      <Paragraph position="5"> &quot;A determiner (D) modifies a noun (N) on the right with the label DET.&quot; &quot;No two words can modify the same word with the same label.&quot; Analyzing a sentence with G1 means assigning a label-modifiee pair to the only role &quot;governor&quot; of each word so that the assignment satisfies (G1-1) to (G1-4) simultaneously. For example, sentence (1) is analyzed as shown in Figure 2 provided that the words &quot;a,&quot; &quot;dog,&quot; and &quot;runs&quot; are given parts-of-speech D, N, and V, respectively (the subscript attached to the words indicates the position of the word in the sentence).</Paragraph>
      <Paragraph position="7"> Thus, sentence (1) is generated by the grammar G1. On the other hand, sentences (2) and (3) are not generated since there are no proper assignments for such sentences.</Paragraph>
      <Paragraph position="8">  (2) A runs.</Paragraph>
      <Paragraph position="9"> (3) Dog dog runs.</Paragraph>
      <Paragraph position="10">  We can graphically represent the parsing result of sentence (1) as shown in Figure 3 if we interpret the governor role of a word as a pointer to the syntactic governor of the word. Thus, the syntactic structure produced by a CDG is usually a dependency structure</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="31" end_page="36" type="metho">
    <SectionTitle>
3 PARSING WITH
CONSTRAINT PROPAGATION
</SectionTitle>
    <Paragraph position="0"> CDG parsing is done by assigning values to $n \times k$ roles, whose values are selected from a finite set $L \times \{1, 2, \ldots, n, \mathrm{nil}\}$. Therefore, CDG parsing can be viewed as a constraint satisfaction problem over a finite domain. Many interesting artificial intelligence problems, including graph coloring and scene labeling, are classified in this group of problems, and much effort has been spent on the development of efficient techniques to solve these problems. Constraint propagation (Waltz 1975, Montanari 1976), sometimes called filtering, is one such technique. One advantage of the filtering algorithm is that it allows new constraints to be added easily so that a better solution can be obtained when many candidates remain. Usually, CDG parsing is done in the following three steps: 1. Form an initial constraint network using a &quot;core&quot; grammar.</Paragraph>
    <Paragraph position="1"> 2. Remove local inconsistencies by filtering.</Paragraph>
    <Paragraph position="2"> 3. If any ambiguity remains, add new constraints and go to Step 2.</Paragraph>
    <Paragraph position="3">  In this section, we will show, through a step-by-step example, that the filtering algorithms can be effectively used to narrow down the structural ambiguities of CDG parsing.</Paragraph>
    <Paragraph position="4"> The Example We use a PP-attachment example. Consider sentence (4). Because of the three consecutive prepositional phrases (PPs), this sentence has many structural ambiguities.</Paragraph>
    <Paragraph position="5"> (4) Put the block on the floor on the table in the room. [Figure 4: dependency diagram for sentence (4) over V1 NP2 PP3 PP4 PP5] One of the possible syntactic structures is shown in Figure 4 (footnote 2).</Paragraph>
    <Paragraph position="6"> To simplify the following discussion, we treat the grammatical symbols V, NP, and PP as terminal symbols (words), since the analysis of the internal structures of such phrases is irrelevant to the point being made. The correspondence between such simplified dependency structures and the equivalent phrase structures should be clear. Formally, the input sentence that we will parse with CDG is (5).</Paragraph>
    <Paragraph position="7"> (5) V1 NP2 PP3 PP4 PP5 First, we consider a &quot;core&quot; grammar that contains purely syntactic rules only. We define a CDG</Paragraph>
    <Paragraph position="9"/>
    <Paragraph position="11"> &quot;Modification links do not cross each other.&quot; where the formula P2 is the conjunction of the following unary and binary constraints:</Paragraph>
    <Paragraph position="13"> &quot;If a PP modifies a V, its label should be LOC.&quot; Footnote 2: In linguistics, arrows are usually drawn in the opposite direction in a dependency diagram: from a governor (modifiee) to its dependent (modifier). In this paper, however, we draw an arrow from a modifier to its modifiee in order to emphasize that this information is contained in a modifier's role.</Paragraph>
    <Paragraph position="14"> According to the grammar G2a, sentence (5) has 14 different syntactic structures.</Paragraph>
    <Paragraph position="16"> We do not generate these syntactic structures one by one, since the number of the structures may grow more rapidly than exponentially when the sentence becomes long. Instead, we build a packed data structure, called a constraint network, that contains all the syntactic structures implicitly. Explicit parse trees can be generated whenever necessary, but doing so may take more than exponential computation time.</Paragraph>
    <Paragraph position="17"> Formation of initial network Figure 5 shows the initial constraint network for sentence (5). A node in a constraint network corresponds to a role. Since each word has only one role, governor, in the grammar G2, the constraint network has five nodes corresponding to the five words in the sentence. In the figure, the node labeled V1 represents the governor role of the word V1, and so on. A node is associated with a set of possible values that the role can take, called a domain. The domains of the initial constraint network are computed by examining unary constraints ((G2a-1) to (G2a-5) in our example). For example, the label of the role of the word V1 must be ROOT and its modifiee must be nil according to the unary constraint (G2a-5), and therefore the domain of the corresponding node is a singleton set {&lt;ROOT,nil&gt;}. In the figure, values are abbreviated by concatenating the initial letter of the label and the modifiee, such as Rnil for &lt;ROOT,nil&gt;, O1 for &lt;OBJ,1&gt;, and so on.</Paragraph>
    <Paragraph position="18"> An arc in a constraint network represents a binary constraint imposed on two roles. Each arc is associated with a two-dimensional matrix called a constraint matrix, whose xy-elements are either 1 or 0. The rows and the columns correspond to the possible values of each of the two roles. The value 0 indicates that this particular combination of role values violates the binary constraints. A constraint matrix is calculated by generating every possible pair of values and by checking its validity according to the binary constraints. For example, the case in which governor(PP3) = &lt;LOC,1&gt; and governor(PP4) = &lt;POSTMOD,2&gt; violates the binary constraint (G2a-6), so the L1-P2 element of the constraint matrix between PP3 and PP4 is set to zero.</Paragraph>
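    <Paragraph> The construction of a constraint matrix can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: the two domains are reconstructed from the abbreviations L1, P2, and P3 used in the text, and links_cross is one possible reading of the informal description of (G2a-6), &quot;Modification links do not cross each other&quot;. All names are hypothetical.

```python
# Candidate values (label, modifiee) for the nodes PP3 and PP4.  These domains
# are an assumption reconstructed from the abbreviations used in the text.
DOMAIN = {
    3: [("LOC", 1), ("POSTMOD", 2)],
    4: [("LOC", 1), ("POSTMOD", 2), ("POSTMOD", 3)],
}


def links_cross(pos_x, mod_x, pos_y, mod_y):
    """True when the end points of the two links strictly interleave."""
    a, b = sorted((pos_x, mod_x))
    c, d = sorted((pos_y, mod_y))
    return (c > a and b > c and d > b) or (a > c and d > a and b > d)


def constraint_matrix(i, j):
    """Rows follow DOMAIN[i], columns follow DOMAIN[j]; 1 = allowed, 0 = violated."""
    return [
        [0 if links_cross(i, vi[1], j, vj[1]) else 1 for vj in DOMAIN[j]]
        for vi in DOMAIN[i]
    ]


matrix_3_4 = constraint_matrix(3, 4)
```

Under these assumptions the only zero in matrix_3_4 is the L1-P2 element (row 0, column 1), which agrees with the example given in the text.</Paragraph>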
    <Paragraph position="19"> The reader should not confuse the undirected arcs in a constraint network with the directed modification links in a dependency diagram. An arc in a constraint network represents the existence of a binary constraint between two nodes, and has nothing to do with the modifier-modifiee relationships. The possible modification relationships are represented as the modifiee part of the domain values in a constraint network.</Paragraph>
    <Paragraph position="20"> A constraint network contains all the information needed to produce the parsing results. No grammatical knowledge is necessary to recover parse trees from a constraint network. A simple backtrack search can generate the 14 parse trees of sentence (5) from the constraint network shown in Figure 5 at any time. Therefore, we regard a constraint network as a packed representation of parsing results.</Paragraph>
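    <Paragraph> A simple backtrack search over such a network might look like the sketch below. The initial domains are an assumption (Figure 5 is not reproduced in the text), the only binary constraint modelled is the no-crossing condition, and the function names are hypothetical.

```python
# Assumed initial domains (label, modifiee) for V1 NP2 PP3 PP4 PP5.
DOMAINS = {
    1: [("ROOT", None)],
    2: [("OBJ", 1)],
    3: [("LOC", 1), ("POSTMOD", 2)],
    4: [("LOC", 1), ("POSTMOD", 2), ("POSTMOD", 3)],
    5: [("LOC", 1), ("POSTMOD", 2), ("POSTMOD", 3), ("POSTMOD", 4)],
}


def compatible(pos_x, val_x, pos_y, val_y):
    """Binary check: the two modification links must not cross."""
    if val_x[1] is None or val_y[1] is None:
        return True  # a nil modifiee draws no link at all
    a, b = sorted((pos_x, val_x[1]))
    c, d = sorted((pos_y, val_y[1]))
    crossing = (c > a and b > c and d > b) or (a > c and d > a and b > d)
    return not crossing


def backtrack(domains, order=None, partial=None):
    """Yield every complete assignment that passes all pairwise checks."""
    order = sorted(domains) if order is None else order
    partial = {} if partial is None else partial
    if len(partial) == len(order):
        yield dict(partial)
        return
    pos = order[len(partial)]
    for value in domains[pos]:
        if all(compatible(p, v, pos, value) for p, v in partial.items()):
            partial[pos] = value
            yield from backtrack(domains, order, partial)
            del partial[pos]


parses = list(backtrack(DOMAINS))
```

With the assumed domains, len(parses) is 14, matching the number of parse trees stated above. The search consults only the domains and the pairwise check, which reflects the remark that no further grammatical knowledge is needed to recover the parses.</Paragraph>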
    <Section position="1" start_page="34" end_page="34" type="sub_section">
      <SectionTitle>
Filtering
</SectionTitle>
      <Paragraph position="0"> A constraint network is said to be arc consistent if, for any constraint matrix, there are no rows and no columns that contain only zeros. A node value corresponding to such a row or a column cannot participate in any solution, so it can be abandoned without further checking. The filtering algorithm identifies such inconsistent values and removes them from the domains. Removing a value from one domain may make another value in another domain inconsistent, so the process is propagated over the network until the network becomes arc consistent.</Paragraph>
      <Paragraph position="1"> Filtering does not generate solutions, but may significantly reduce the search space. In our example, the constraint network shown in Figure 5 is already arc consistent, so nothing can be done by filtering at this point.</Paragraph>
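      <Paragraph> The filtering step can be sketched with a deliberately naive fixpoint loop. This is not the algorithm of Mohr and Henderson (1986) cited later in the paper; it only illustrates the idea of deleting unsupported values until nothing changes. The domains and the no-crossing check are assumptions reconstructed from the text, and all names are hypothetical.

```python
# Assumed initial domains (label, modifiee) for V1 NP2 PP3 PP4 PP5.
DOMAINS = {
    1: [("ROOT", None)],
    2: [("OBJ", 1)],
    3: [("LOC", 1), ("POSTMOD", 2)],
    4: [("LOC", 1), ("POSTMOD", 2), ("POSTMOD", 3)],
    5: [("LOC", 1), ("POSTMOD", 2), ("POSTMOD", 3), ("POSTMOD", 4)],
}


def no_crossing(pos_x, val_x, pos_y, val_y):
    """Binary constraint: the two modification links must not cross."""
    if val_x[1] is None or val_y[1] is None:
        return True
    a, b = sorted((pos_x, val_x[1]))
    c, d = sorted((pos_y, val_y[1]))
    return not ((c > a and b > c and d > b) or (a > c and d > a and b > d))


def filter_network(domains, allowed):
    """Delete values with no support in some other node until a fixpoint."""
    domains = {pos: list(values) for pos, values in domains.items()}
    changed = True
    while changed:
        changed = False
        for i in domains:
            for j in domains:
                if i == j:
                    continue
                supported = [
                    vi for vi in domains[i]
                    if any(allowed(i, vi, j, vj) for vj in domains[j])
                ]
                if len(supported) != len(domains[i]):
                    domains[i] = supported
                    changed = True
    return domains


unchanged = filter_network(DOMAINS, no_crossing)

# Hypothetical restriction, only to show propagation: force PP4 to POSTMOD,2.
narrowed = dict(DOMAINS)
narrowed[4] = [("POSTMOD", 2)]
propagated = filter_network(narrowed, no_crossing)
```

Under these assumptions the initial network is returned unchanged, in line with the observation that it is already arc consistent. In the hypothetical narrowed network, the value LOC,1 of PP3 and the value POSTMOD,3 of PP5 lose their support and are removed.</Paragraph>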
    </Section>
    <Section position="2" start_page="34" end_page="35" type="sub_section">
      <SectionTitle>
Adding New Constraints
</SectionTitle>
      <Paragraph position="0"> To illustrate how we can add new constraints to narrow down the ambiguity, let us introduce additional constraints (G2b-1) and (G2b-2), assuming that appropriate syntactic and/or semantic features are attached to each word and that the function fe(i) is provided to access these features.</Paragraph>
      <Paragraph position="2"> &quot;No verb can take two locatives.&quot; Note that these constraints are not purely syntactic. Any kind of knowledge, syntactic, semantic, or even pragmatic, can be applied in CDG parsing as long as it is expressed as a unary or binary constraint on word-to-word modifications.</Paragraph>
      <Paragraph position="3"> Each value or pair of values is tested against the newly added constraints. In the network in Figure 5, the value P3 (i.e., &lt;POSTMOD,3&gt;) of the node PP4 (i.e., &quot;on the table (PP4)&quot; modifies &quot;on the floor (PP3)&quot;) violates the constraint (G2b-1), so we remove P3 from the domain of PP4. Accordingly, corresponding rows and columns in the four constraint matrices adjacent to the node PP4 are removed. The binary constraint (G2b-2) affects the elements of the constraint matrices. For the matrix between the nodes PP3 and</Paragraph>
      <Paragraph position="5"> Since the sentence is still ambiguous, let us consider another constraint.</Paragraph>
      <Paragraph position="6"> (G2c-1) $\mathrm{lab}(x) = \mathrm{POSTMOD},\ \mathrm{lab}(y) = \mathrm{POSTMOD},\ \mathrm{mod}(x) = \mathrm{mod}(y),\ \mathit{on} \in \mathrm{fe}(\mathrm{pos}(x)),\ \mathit{on} \in \mathrm{fe}(\mathrm{pos}(y)) \Rightarrow x = y$ &quot;No object can be on two distinct objects.&quot; This sets the P2-P2 element of the matrix PP3-PP4 to zero. Filtering on this network again results in the network shown in Figure 8, which is unambiguous, since every node has a singleton domain. Recovering the dependency structure (the one in Figure 4) from this network is straightforward.</Paragraph>
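      <Paragraph> The whole disambiguation sequence can be replayed in a short sketch. Several parts are assumptions: the formula of (G2b-1) is not available in the text, so its effect is modelled simply by starting without the value POSTMOD,3 for PP4; (G2b-2) is read as forbidding two LOC links to the same modifiee; the feature on is assumed to belong to PP3 and PP4; and the filter is a naive fixpoint loop rather than an optimized algorithm. All names are hypothetical.

```python
ON_FEATURE = {3, 4}  # assumed: "on the floor" (PP3) and "on the table" (PP4)


def crossing(px, mx, py, my):
    """True when two modification links strictly interleave."""
    if mx is None or my is None:
        return False
    a, b = sorted((px, mx))
    c, d = sorted((py, my))
    return (c > a and b > c and d > b) or (a > c and d > a and b > d)


def make_check(with_g2c):
    """Build the binary check, optionally including the reading of (G2c-1)."""
    def allowed(px, vx, py, vy):
        (lx, mx), (ly, my) = vx, vy
        if crossing(px, mx, py, my):
            return False                     # links may not cross
        if lx == "LOC" and ly == "LOC" and mx == my:
            return False                     # reading of (G2b-2)
        if (with_g2c and lx == "POSTMOD" and ly == "POSTMOD" and mx == my
                and px in ON_FEATURE and py in ON_FEATURE):
            return False                     # reading of (G2c-1)
        return True
    return allowed


def filter_network(domains, allowed):
    """Delete values with no support in some other node until a fixpoint."""
    domains = {pos: list(values) for pos, values in domains.items()}
    changed = True
    while changed:
        changed = False
        for i in domains:
            for j in domains:
                if i == j:
                    continue
                kept = [vi for vi in domains[i]
                        if any(allowed(i, vi, j, vj) for vj in domains[j])]
                if len(kept) != len(domains[i]):
                    domains[i] = kept
                    changed = True
    return domains


# Assumed domains after (G2b-1) has removed POSTMOD,3 from PP4.
start = {
    1: [("ROOT", None)],
    2: [("OBJ", 1)],
    3: [("LOC", 1), ("POSTMOD", 2)],
    4: [("LOC", 1), ("POSTMOD", 2)],
    5: [("LOC", 1), ("POSTMOD", 2), ("POSTMOD", 3), ("POSTMOD", 4)],
}
after_g2b = filter_network(start, make_check(with_g2c=False))
after_g2c = filter_network(after_g2b, make_check(with_g2c=True))
```

Under these assumptions, after_g2c leaves every node with a single value: PP3 keeps POSTMOD,2, PP4 keeps LOC,1, and PP5 keeps POSTMOD,4, so the network is unambiguous as the text describes.</Paragraph>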
    </Section>
    <Section position="3" start_page="35" end_page="36" type="sub_section">
      <SectionTitle>
Related Work
</SectionTitle>
      <Paragraph position="0"> PP4, the element in row L1 (&lt;LOC,1&gt;) and column L1 (&lt;LOC,1&gt;) is set to zero, since both are modifications to V1 with the label LOC. Similarly, the L1-L1 elements of the matrices PP3-PP5 and PP4-PP5 are set to zero. The modified network is shown in Figure 6, where the updated elements are indicated by asterisks.</Paragraph>
      <Paragraph position="1"> Note that the network in Figure 6 is not arc consistent. For example, the L1 row of the matrix PP3-PP4 consists of all zero elements. The filtering algorithm identifies such locally inconsistent values and eliminates them until there are no more inconsistent values left. The resultant network is shown in Figure 7. This network implicitly represents the remaining four parses of sentence (5).</Paragraph>
      <Paragraph position="2"> Several researchers have proposed variant data structures for representing a set of syntactic structures. Chart (Kaplan 1973) and shared, packed forest (Tomita 1987) are packed data structures for context-free parsing. In these data structures, a substring that is recognized as a certain phrase is represented as a single edge or node regardless of how many different readings are possible for this phrase. Since the production rules are context free, it is unnecessary to check the internal structure of an edge when combining it with another edge to form a higher edge. However, this property is true only when the grammar is purely context-free. If one introduces context sensitivity by attaching augmentations and controlling the applicability of the production rules, different readings of the same string with  the same nonterminal symbol have to be represented by separate edges, and this may cause a combinatorial explosion.</Paragraph>
      <Paragraph position="3"> Seo and Simmons (1988) propose a data structure called a syntactic graph as a packed representation of context-free parsing. A syntactic graph is similar to a constraint network in the sense that it is dependency-oriented (nodes are words) and that an exclusion matrix is used to represent the co-occurrence conditions between modification links. A syntactic graph is, however, built after context-free parsing and is therefore used to represent only context-free parse trees. The formal descriptive power of syntactic graphs is not known. As will be discussed in Section 4, the formal descriptive power of CDG is strictly greater than that of CFG and hence, a constraint network can represent non-context-free parse trees as well.</Paragraph>
      <Paragraph position="4"> Sugimura et al. (1988) propose the use of a constraint logic program for analyzing modifier-modifiee relationships of Japanese. An arbitrary logical formula can be a constraint, and a constraint solver called CIL (Mukai 1985) is responsible for solving the constraints. The generative capacity and the computational complexity of this formalism are not clear. The above-mentioned works seem to have concentrated on the efficient representation of the output of a parsing process, and lacked the formalization of a structural disambiguation process, that is, they did not specify what kind of knowledge can be used in what way for structural disambiguation. In CDG parsing, any knowledge is applicable to a constraint network as long as it can be expressed as a constraint between two modifications, and an efficient filtering algorithm effectively uses it to reduce structural ambiguities.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="36" end_page="37" type="metho">
    <SectionTitle>
4 FORMAL PROPERTIES
</SectionTitle>
    <Paragraph position="0"> Weak Generative Capacity of CDG Consider the language $L_{ww} = \{ww \mid w \in (a+b)^*\}$, the language of strings that are obtained by concatenating an arbitrary string over the alphabet {a,b} with itself. $L_{ww}$ is known to be non-context-free (Hopcroft and Ullman 1979), and is frequently mentioned when discussing the non-context-freeness of the &quot;respectively&quot; construct (e.g. &quot;A, B, and C do D, E, and F, respectively&quot;) of various natural languages (e.g., Savitch et al. 1987). Although there</Paragraph>
    <Paragraph position="2"> is no context-free grammar that generates $L_{ww}$, the grammar $G_{ww} = \langle \Sigma, L, R, C \rangle$ shown in Figure 9 generates it (Maruyama 1990). An assignment given to a sentence &quot;aabaab&quot; is shown in Figure 10. On the other hand, any context-free language can be generated by a degree=2 CDG. This can be proved by constructing a constraint dependency grammar $G_{CDG}$ from an arbitrary context-free grammar $G_{CFG}$ in Greibach Normal Form, and by showing that the two grammars generate exactly the same language. Since $G_{CFG}$ is in Greibach Normal Form, it is easy to make a one-to-one correspondence between a word in a sentence and a rule application in a phrase-structure tree. The details of the proof are given in Maruyama (1990). This, combined with the fact that $G_{ww}$ generates $L_{ww}$, means that the weak generative capacity of CDG with degree=2 is strictly greater than that of CFG.</Paragraph>
    <Paragraph position="3"> Computational complexity of CDG parsing Let us consider a constraint dependency grammar $G = \langle \Sigma, R, L, C \rangle$ with arity=2 and degree=k. Let n be the length of the input sentence. Consider the space complexity of the constraint network first. In CDG parsing, every word has k roles, so there are $n \times k$ nodes in total. A role can have $n \times l$ possible values, where l is the size of L, so the maximum domain size is $n \times l$. Binary constraints may be imposed on arbitrary pairs of roles, and therefore the number of constraint matrices is at most proportional to $(nk)^2$. Since the size of a constraint matrix is $(nl)^2$, the total space complexity of the constraint network is $O(l^2 k^2 n^4)$. Since k and l are grammatical constants, it is $O(n^4)$ for the sentence length n.</Paragraph>
    <Paragraph position="4"> As the initial formation of a constraint network takes a computation time proportional to the size of the constraint network, the time complexity of the initial formation of a constraint network is $O(n^4)$.</Paragraph>
    <Paragraph position="5"> The complexity of adding new constraints to a constraint network never exceeds the complexity of the initial formation of a constraint network, so it is also bounded by $O(n^4)$.</Paragraph>
    <Paragraph position="6"> The most efficient filtering algorithm developed so far runs in $O(ea^2)$ time, where e is the number of arcs and a is the size of the domains in a constraint network (Mohr and Henderson 1986). Since the number of arcs is at most $O((nk)^2)$, filtering can be performed in $O((nk)^2 (nl)^2)$, which is $O(n^4)$ once the grammatical constants are dropped.</Paragraph>
    <Paragraph position="7"> Thus, in CDG parsing with arity 2, both the initial formation of a constraint network and filtering are bounded in $O(n^4)$ time.</Paragraph>
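    <Paragraph> The counting argument behind the bound can be checked with a few lines of arithmetic. The function name and the sample values (n = 5 words, k = 1 role, l = 4 labels, taking the labels ROOT, OBJ, LOC, and POSTMOD mentioned in Section 3) are assumptions made only for this illustration.

```python
def network_size_bound(n, k, l):
    """Upper bounds used in the space analysis of a constraint network."""
    nodes = n * k                    # one node per role of each word
    domain = n * l                   # largest possible domain of a node
    matrices = nodes ** 2            # at most one matrix per pair of nodes
    cells_per_matrix = domain ** 2   # rows times columns of one matrix
    return {
        "nodes": nodes,
        "domain": domain,
        "matrices": matrices,
        "cells": matrices * cells_per_matrix,
    }


bound = network_size_bound(n=5, k=1, l=4)
```

For these values the total number of matrix cells equals l**2 * k**2 * n**4, that is 16 * 1 * 625 = 10000, which is the product that the asymptotic bound abbreviates.</Paragraph>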
  </Section>
</Paper>