<?xml version="1.0" standalone="yes"?> <Paper uid="P87-1033"> <Title>A Unification Method for Disjunctive Feature Descriptions</Title> <Section position="4" start_page="236" end_page="238" type="metho"> <SectionTitle> 3 The Algorithm: Unification by Successive Approximation </SectionTitle> <Paragraph position="0"> In this section we give a complete algorithm for unifying two feature-descriptions, where one or both may contain disjunction. This algorithm is designed so that it can be used as a relatively efficient approximation method, with an optional step to perform complete consistency checking when necessary.</Paragraph> <Paragraph position="1"> Given two feature-descriptions, the strategy of the unification algorithm is to unify the definite components of the descriptions first, and examine the compatibility of indefinite components later. Disjuncts are eliminated from the description when they are inconsistent with definite information. This strategy avoids exploring disjuncts more than once when they are inconsistent with definite information.</Paragraph> <Paragraph position="2"> The exact algorithm is described in Figure 3. It has three major steps.</Paragraph> <Paragraph position="3"> In the first step, the definite components of the two descriptions are unified together, producing a DG structure, new-def, which represents the definite information of the result. 
This step can be performed by existing unification algorithms for DGs.</Paragraph> <Paragraph position="4"> In the second step, the indefinite components of both descriptions are checked for compatibility with new-def, using the function CHECK-INDEF, which is defined in Figure 4.</Paragraph> <Paragraph position="5"> CHECK-INDEF uses the function CHECK-DISJ, defined in Figure 5, to check the compatibility of each disjunction with the DG structure given by the parameter cond. The compatibility of two DGs can be checked by almost the same procedure as unification, but the two structures being checked are not actually merged as they are in unification.</Paragraph> <Paragraph position="6"> In the third major step, if any disjunctions remain, and it is necessary to do so, disjuncts of different disjunctions are considered in groups, to check whether they are compatible together. This step is performed by the function NWISE-CONSISTENCY, defined in Figure 6.</Paragraph> <Paragraph position="7"> When the parameter n to NWISE-CONSISTENCY has the value 1, one disjunct is checked for compatibility with all other disjunctions of the description in a pairwise manner. The pairwise manner of checking compatibility can be generalized to groups of any size by increasing the value of the parameter n.</Paragraph> <Paragraph position="8"> While this third step of the algorithm is necessary in order to ensure consistency of disjunctive descriptions, it is not necessary to use it every time a description is built during a parse. 
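The three steps described above can be illustrated with a minimal sketch. It is not the procedure of Figures 3 to 6: it assumes that a DG can be approximated by a flat dictionary from feature paths to atomic values (no reentrancy, no embedded disjunction), and the names unify, check_indef, pairwise_consistency and unify_desc, as well as the toy grammar, are hypothetical.

```python
def unify(a, b):
    """Step 1 helper: merge two flat feature maps, or return None on a clash."""
    merged = dict(a)
    for path, value in b.items():
        if merged.get(path, value) != value:
            return None
        merged[path] = value
    return merged


def check_indef(definite, disjunctions):
    """Step 2: drop disjuncts that clash with the definite part.

    A disjunction reduced to one disjunct is merged into the definite part,
    and the pass is repeated because the merge may rule out other disjuncts.
    Returns (definite, remaining disjunctions), or None on inconsistency.
    """
    changed = True
    while changed:
        changed = False
        remaining = []
        for disj in disjunctions:
            live = [d for d in disj if unify(definite, d) is not None]
            if not live:
                return None          # every alternative is inconsistent
            if len(live) == 1:
                definite = unify(definite, live[0])
                changed = True
            else:
                remaining.append(live)
        disjunctions = remaining
    return definite, disjunctions


def pairwise_consistency(definite, disjunctions):
    """Step 3 with n = 1: hypothesize each disjunct and test it against the rest."""
    def survives(d, others):
        hypothesis = unify(definite, d)
        return hypothesis is not None and check_indef(hypothesis, others) is not None

    for i, disj in enumerate(disjunctions):
        others = disjunctions[:i] + disjunctions[i + 1:]
        live = [d for d in disj if survives(d, others)]
        if not live:
            return None
        if len(live) != len(disj):
            # Something was eliminated: re-run step 2, then start step 3 again.
            result = check_indef(definite, others[:i] + [live] + others[i:])
            return None if result is None else pairwise_consistency(*result)
    return definite, disjunctions


def unify_desc(f, g):
    """Unify two descriptions given as (definite map, list of disjunctions)."""
    definite = unify(f[0], g[0])                    # step 1
    if definite is None:
        return None
    result = check_indef(definite, f[1] + g[1])     # step 2
    if result is None:
        return None
    return pairwise_consistency(*result)            # step 3 (optional in the paper)


# Hypothetical toy data, not the descriptions used in the paper's figures.
grammar = ({"rank": "clause"}, [
    [{"voice": "passive", "transitivity": "trans", "goal_person": "2"},
     {"voice": "active", "transitivity": "trans"}],
    [{"transitivity": "intrans"},
     {"transitivity": "trans", "goal_person": "3"}],
    [{"number": "sing"}, {"number": "pl"}],
])
constituent = ({"number": "pl"}, [])
result = unify_desc(grammar, constituent)
print(result)
```

In this toy run, step 2 only removes the number : sing disjunct; the other two disjunctions are resolved in step 3, which leaves a fully definite result. Returning a new pair, instead of modifying the arguments, mirrors the point that unified descriptions do not become permanently linked.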
In practice, we find that the performance of the algorithm can be tuned by using this step only at strategic points during a parse, since it is the most inefficient step of the algorithm. [Fragment of Figure 4] Function CHECK-INDEF (desc, cond) Returns feature-description: where desc is a feature-description, and cond is a DG.</Paragraph> <Paragraph position="10"> [Fragment of Figure 5] where disj is a disjunction of feature-descriptions, and cond is a DG.</Paragraph> <Paragraph position="11"> Let new-disj = ∅ (a set of feature-descriptions). For each disjunct in disj:</Paragraph> <Paragraph position="13"> otherwise: (keep this disjunction in result)</Paragraph> <Paragraph position="15"> In our application, using the Earley chart parsing method, it has proved best to use NWISE-CONSISTENCY only when building descriptions for complete edges, but not when building descriptions for active edges.</Paragraph> <Paragraph position="16"> Note that two feature-descriptions do not become permanently linked when they are unified, unlike unification for DG structures. The result of unifying two descriptions is a new description, which is satisfied by the intersection of the sets of structures that satisfy the two given descriptions. The new description contains all the information that is contained in either of the given descriptions, subtracting any disjuncts which are no longer compatible.</Paragraph> </Section> <Section position="5" start_page="238" end_page="239" type="metho"> <SectionTitle> 4 An example </SectionTitle> <Paragraph position="0"> In order to illustrate the effect of each step of the algorithm, let us consider an example of unifying the description of a known constituent with the description of a portion of a grammar. This exemplifies the predominant type of structure building operation needed in a parsing program for Functional Unification Grammar. The example given here is deliberately simple, in order to illustrate how the algorithm works with a minimum amount of detail. 
It is not intended as an example of a linguistically motivated grammar.</Paragraph> <Paragraph position="1"> Let us trace what happens when the two descriptions are unified. After step 1 of the algorithm, the definite components of the two descriptions have been unified, and their indefinite components have been conjoined together.</Paragraph> <Paragraph position="2"> In step 2 of the algorithm each of the disjuncts of DESC.INDEFINITE is checked for compatibility with DESC.DEFINITE, using the function CHECK-INDEF. In this case, all disjuncts are compatible with the definite information, except for one: the disjunct of the third disjunction which contains the feature Number : Sing. This disjunct is eliminated, and the only remaining disjunct in the disjunction (i.e., the disjunct containing Number : Pl) is unified with DESC.DEFINITE. The result after this step is shown in Figure 9. The four disjuncts that remain are numbered for convenience.</Paragraph> <Paragraph position="3"> In step 3, NWISE-CONSISTENCY is used with 1 as the value of the parameter n. A new description is hypothesized by unifying disjunct (1) with the definite component of the description (i.e., NEW-DESC.DEFINITE). Then disjuncts (3) and (4) are checked for compatibility with this hypothesized structure: (3) is not compatible, because the values of the Transitivity features do not unify.</Paragraph> <Paragraph position="5"> Disjunct (4) is also incompatible, because it has Goal : Person : 3, and the hypothesized description has [&lt;Subj&gt;, &lt;Goal&gt;], along with Subj : Person : 2. Therefore, since there is no compatible disjunct among (3) and (4), the hypothesis that (1) is compatible with the rest of the description has been shown to be invalid, and (1) can be eliminated. It follows that disjunct (2) should be unified with the definite part of the description. 
Now disjuncts (3) and (4) are checked for compatibility with the definite component of the new description: (3) is no longer compatible, but (4) is compatible. Therefore, (3) is eliminated, and (4) is unified with the definite information.</Paragraph> <Paragraph position="6"> No disjunctions remain in the result, as shown in Figure 10.</Paragraph> </Section> <Section position="6" start_page="239" end_page="239" type="metho"> <SectionTitle> 5 Complexity of the Algorithm </SectionTitle> <Paragraph position="0"> Referring to Figure 3, note that the function UNIFY-DESC may terminate after any of the three major steps. After each step it may detect inconsistency between the two descriptions and terminate, returning failure, or it may terminate because no disjunctions remain in the description. Therefore, it is useful to examine the complexity of each of the three steps independently.</Paragraph> <Paragraph position="1"> Let n represent the total number of symbols in the combined description f ∧ g, and d represent the total number of disjuncts (in both top-level and embedded disjunctions) contained in f ∧ g.</Paragraph> <Paragraph position="2"> Step 1. This step performs the unification of two DG structures. Ait-Kaci [1] has shown how this operation can be performed in almost linear time by the UNION/FIND algorithm. Its time complexity has an upper bound of O(n log n). Since an unknown amount of a description may be contained in the definite component, this step of the algorithm also requires O(n log n) time.</Paragraph> <Paragraph position="3"> Step 2. For this step we examine the complexity of the function CHECK-INDEF. There are two nested loops in CHECK-INDEF, each of which may be executed at most once for each disjunct in the description. The inner loop checks the compatibility of two DG structures, which requires no more time than unification. Thus, in the worst case, CHECK-INDEF requires O(d^2 n log n) time.</Paragraph> <Paragraph position="4"> Step 3. 
NWISE-CONSISTENCY requires at most O(2^(d/2)) time. In this step, NWISE-CONSISTENCY is called at most (d/2) - 1 times. Therefore, the overall complexity of step 3 is O(2^(d/2)). Discussion. While the worst case complexity of the entire algorithm is O(2^d), an exponential, it is significant that it often terminates before step 3, even when a large number of disjunctions are present in one of the descriptions. Thus, in many practical cases the actual cost of the algorithm is bounded by a polynomial that is at most d^2 n log n. Since d must be less than n, this complexity function is almost cubic. Even when step 3 must be used, the number of remaining disjunctions is often much fewer than d/2, so the exponent is usually a small number. The algorithm performs well in most cases, because the three steps are ordered in increasing complexity, and the number of disjunctions can only decrease during unification.</Paragraph> </Section> <Section position="7" start_page="239" end_page="241" type="metho"> <SectionTitle> 6 Implementation </SectionTitle> <Paragraph position="0"> The algorithm presented in the previous sections has been implemented and tested as part of a general parsing method for Systemic Functional Grammar, which is described in [3]. The algorithm was integrated with the structure building module of the PATR-II system [10], written in the Zetalisp programming language.</Paragraph> <Paragraph position="1"> While the feature-description corresponding to a grammar may have hundreds of disjunctions, the descriptions that result from parsing a sentence usually have only a small number of disjunctions, if any at all. Most disjunctions in a systemic grammar represent possible alternative values that some particular feature may have (along with the grammatical consequences entailed by choosing particular values for the feature). 
In the analysis of a particular sentence most features have a unique value, and some features are not present at all.</Paragraph> <Paragraph position="2"> When disjunction remains in the description of a sentence after parsing, it usually represents ambiguity or an under-specified part of the grammar.</Paragraph> <Paragraph position="3"> With this implementation of the algorithm, sentences of up to 10 words have been parsed correctly, using a grammar which contains over 300 disjunctions. The time required for most sentences is in the range of 10 to 300 seconds, running on Lisp machine hardware.</Paragraph> <Paragraph position="4"> The fact that sentences can be parsed at all with a grammar containing this many disjunctions indicates that the algorithm is performing much better than its theoretical worst case time of O(2^d) (see footnote 2). The timings, shown in Table 1, obtained from the experimental parser for systemic grammar also indicate that a dramatic increase in the number of disjunctions in the grammar does not result in an exponential increase in parse time. G98 is a grammar containing 98 disjunctions, and G440 is a grammar containing 440 disjunctions. The total time used to parse each sentence is given in seconds.</Paragraph> <Paragraph position="5"> Table 1: Timings obtained from a systemic parser.
Sentence | G98 | G440
Nigel has been speaking English. | 22.9 | 144.3
Nigel has been speaking English to me. | 28.6 | 203.5</Paragraph> <Paragraph position="6"> Footnote 2: Consider that 2^300 ≈ 10^90, and 10^90 is taken to be a rough estimate of the number of particles in the universe.</Paragraph> </Section> </Paper>