<?xml version="1.0" standalone="yes"?> <Paper uid="P91-1033"> <Title>FEATURE LOGIC WITH WEAK SUBSUMPTION CONSTRAINTS</Title> <Section position="3" start_page="0" end_page="257" type="metho"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Many of the current constraint-based grammar formalisms, e.g. FUG [Kay 79, Kay 85], LFG [Kaplan/Bresnan 82], HPSG [Pollard/Sag 87], PATR-II [Shieber et al. 83] and its derivatives, model linguistic knowledge in recursive feature structures. Feature (or functional) equations, as in LFG, or feature terms, as in FUG or STUF [Bouma et al. 88], are used as constraints to describe declaratively what properties should be assigned to a linguistic entity. In the last few years, the study of the formal semantics and formal properties of logics involving such constraints has made substantial progress [Kasper/Rounds 86, Johnson 87, Smolka 88, Smolka 89], e.g., by making precise which sublanguages of predicate logic they correspond to. This paves the way not only for reliable implementations of these formalisms, but also for extensions of the basic logic with a precisely defined meaning. The extension we present here, weak subsumption constraints, is a mechanism of one-way information flow, often proposed for a logical treatment of coordination in a feature-based unification grammar.¹ It can be informally described as a device which enables us to require that one part of a (solution) feature structure be subsumed by (be an instance of) another part. (¹ Another application would be type inference in a grammar formalism (or programming language) that uses a type discipline with polymorphic types.)</Paragraph> <Paragraph position="1"> Consider the following example of a coordination with &quot;and&quot;, taken from [Shieber 89].
(1) Pat hired [NP a Republican] and [NP a banker].</Paragraph> <Paragraph position="2"> (2) *Pat hired [NP a Republican] and [AP proud of it].</Paragraph> <Paragraph position="3"> Clearly (2) is ungrammatical, since the verb &quot;hire&quot; requires a noun phrase as object complement and this requirement has to be fulfilled by both coordinated complements. This subcategorization requirement is generally modeled in a unification-based grammar using equations which cause the features of a complement (or the parts thereof encoding its type) to be unified with features encoding the requirements of the respective position in the subcategorization frame of the verb. Thus we could assume that for a coordination the type-encoding features of each element have to be &quot;unified into&quot; the respective position in the subcategorization frame. This entails that the coordinated elements are taken to be of one single type, which can then be viewed as the type of the whole coordination. This approach works fine for the verb &quot;hire&quot;, but certain verbs, used very frequently, do not require this strict identity.</Paragraph> <Paragraph position="4"> (3) Pat has become [NP a banker] and [AP very conservative].</Paragraph> <Paragraph position="5"> (4) Pat is [AP healthy] and [PP of sound mind].</Paragraph> <Paragraph position="6"> The verb &quot;become&quot; may have either noun-phrase or adjective-phrase complements, &quot;to be&quot; allows prepositional and verb phrases in addition, and these may appear intermixed in a coordination. In order to allow for such &quot;polymorphic&quot; type requirements, we want to</Paragraph> <Paragraph position="7"> state that (the types of) coordinated arguments should each be an instance of the respective requirement from the verb.
Expressed as a general rule for (constituent) coordination, we want the structures of coordinated phrases to be instances of the structure of the coordination. Using subsumption constraints, the rule basically looks like this:</Paragraph> <Paragraph position="9"> With an encoding of the types like the one proposed in HPSG we can model the subcategorization requirements for &quot;to be&quot; and &quot;to become&quot; as generalizations of all allowed types (cf.</Paragraph> <Paragraph position="10"> Fig. 1).</Paragraph> <Paragraph position="11"> (Fig. 1 gives the category encodings as feature matrices over n, v and bar; on the standard decomposition these are NP = [n: +, v: -, bar: 2], AP = [n: +, v: +, bar: 2], VP = [n: -, v: +, bar: 2], PP = [n: -, v: -, bar: 2].) A similar treatment of constituent coordination has been proposed in [Kaplan/Maxwell 88], where the coordinated elements are required to be in a set of feature structures and where the feature structure of the whole set is defined as the generalization (greatest lower bound w.r.t. subsumption) of its elements. This entails the requirement stated above, namely that the structure of the coordination subsumes those of its elements. In fact, it seems that especially in the context of set-valued feature structures (cf. [Rounds 88]) we need some method of inheritance of constraints, since if we want to state general combination rules which apply to set-valued objects as well, we would like constraints imposed on them to also affect their members in a principled way.</Paragraph> <Paragraph position="12"> Recently it turned out that a feature logic involving subsumption constraints based on the generally adopted notion of subsumption for feature graphs is undecidable (cf. [Dörre/Rounds 90]). In the present paper we therefore investigate a weaker notion of subsumption, which we can roughly characterize as relaxing the constraint that an instance of a feature graph contain all of its path equivalences. Observe that path equivalences play no role in the subcategorization requirements in our examples above.
</Paragraph> </Section> <Section position="4" start_page="257" end_page="258" type="metho"> <SectionTitle> 2 Feature Algebras </SectionTitle> <Paragraph position="0"> In this section we define the basic structures which are possible interpretations of feature descriptions, the expressions of our feature logic.</Paragraph> <Paragraph position="1"> Instead of restricting ourselves to a specific interpretation, as in [Kasper/Rounds 86] where feature structures are defined as a special kind of finite automata, we employ an open-world semantics as in predicate logic. We adopt most of the basic definitions from [Smolka 89]. The mathematical structures which serve us as interpretations are called feature algebras.</Paragraph> <Paragraph position="2"> We begin by assuming pairwise disjoint sets of symbols L, A and V, called the sets of features (or labels), atoms (or constants) and variables, respectively. Generally we use the letters f, g, h for features, a, b, c for atoms, and x, y, z for variables. The letters s and t always denote variables or atoms. We assume that there are infinitely many variables.</Paragraph> <Paragraph position="3"> A feature algebra 𝒜 is a pair (D^𝒜, ·^𝒜) consisting of a nonempty set D^𝒜 (the domain of 𝒜) and an interpretation ·^𝒜 defined on L and A such that * a^𝒜 ∈ D^𝒜 for a ∈ A. (atoms are constants) * If a ≠ b then a^𝒜 ≠ b^𝒜. (unique name assumption) * If f is a feature then f^𝒜 is a unary partial function on D^𝒜. (features are functional) * No feature is defined on an atom.</Paragraph> <Paragraph position="4"> Notation. We write function symbols on the right, following the notation for record fields in computer languages, so that f(d) is written df. If f is defined at d, we write df↓, and otherwise df↑. We use p, q, r to denote strings of features, called paths. The interpretation function ·^𝒜 is straightforwardly extended to paths: for the empty path ε, ε^𝒜 is the identity on D^𝒜; for a path p = f1 ...
fn, p^𝒜 is the unary partial function which is the composition of the functions f1^𝒜, ..., fn^𝒜, where f1^𝒜 is applied first. A feature algebra of special interest is the Feature Graph Algebra ℱ, since it is canonical in the sense that whenever there exists a solution for a formula in basic feature logic in some feature algebra, then there is also one in the Feature Graph Algebra. The same holds if we extend our logic to subsumption constraints (see [Dörre/Rounds 90]). A feature graph is a rooted and connected directed graph. The nodes are either variables or atoms, where atoms may appear only as terminal nodes. The edges are labeled with features, and for every node no two outgoing edges may be labeled with the same feature.</Paragraph> <Paragraph position="5"> We formalize feature graphs as pairs (s0, E), where s0 ∈ V ∪ A is the root and E ⊆ V × L × (V ∪ A) is a set of triples, the edges. The following conditions hold: 1. If s0 ∈ A, then E = ∅.</Paragraph> <Paragraph position="6"> 2. If (x, f, s) and (x, f, t) are in E, then s = t. 3. If (x, f, s) is in E, then E contains edges leading from the root s0 to the node x.</Paragraph> <Paragraph position="7"> Let G = (x0, E) be a feature graph containing an edge (x0, f, s). The subgraph under f of G (written G/f) is the maximal feature graph (s, E') such that E' ⊆ E.</Paragraph> <Paragraph position="8"> Now it is clear how the Feature Graph Algebra ℱ is to be defined. D^ℱ is the set of all feature graphs. The interpretation of an atom a is a^ℱ = (a, ∅), and for a feature f we let Gf^ℱ = G/f, if this is defined. It is easy to verify that ℱ is a feature algebra.</Paragraph> <Paragraph position="9"> Feature graphs are normally seen as data objects containing information. From this viewpoint there exists a natural preorder, called the subsumption preorder, that orders feature graphs according to their informational content, thereby abstracting away from variable names.
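The three feature-graph conditions and the subgraph operation G/f can be implemented directly. The following is a minimal sketch, not taken from the paper: a graph is a root plus a set of (source, feature, target) triples, and atoms are marked by a leading quote, a purely illustrative convention.

```python
def is_atom(node):
    # Illustrative convention: atoms are strings with a leading quote.
    return isinstance(node, str) and node.startswith("'")

def check_feature_graph(root, edges):
    """Check conditions 1-3 for a pair (root, edges)."""
    # Condition 1: an atomic root has no edges.
    if is_atom(root) and edges:
        return False
    seen = {}
    for (src, f, tgt) in edges:
        # No feature is defined on an atom.
        if is_atom(src):
            return False
        # Condition 2: features are functional.
        if (src, f) in seen and seen[(src, f)] != tgt:
            return False
        seen[(src, f)] = tgt
    # Condition 3: every edge source is reachable from the root.
    reachable = {root}
    frontier = [root]
    while frontier:
        node = frontier.pop()
        for (src, f, tgt) in edges:
            if src == node and tgt not in reachable:
                reachable.add(tgt)
                frontier.append(tgt)
    return all(src in reachable for (src, f, tgt) in edges)

def subgraph_under(root, edges, f):
    """G/f: the maximal subgraph rooted at the f-successor of the root."""
    targets = [t for (s, lab, t) in edges if s == root and lab == f]
    if not targets:
        return None                      # Gf is undefined
    new_root = targets[0]
    keep, visited, frontier = set(), {new_root}, [new_root]
    while frontier:                      # keep only edges reachable from new_root
        node = frontier.pop()
        for (s, lab, t) in edges:
            if s == node:
                keep.add((s, lab, t))
                if t not in visited:
                    visited.add(t)
                    frontier.append(t)
    return (new_root, keep)
```

For example, the graph with root x, edge f to y and edge g from y to atom a satisfies all three conditions, and its subgraph under f is rooted at y.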
We do not introduce subsumption on feature graphs here directly; instead we define a subsumption order on feature algebras in general. Let 𝒜 and ℬ be feature algebras. A simulation between 𝒜 and ℬ is a relation Δ ⊆ D^𝒜 × D^ℬ satisfying the following conditions: 1. if (a^𝒜, d) ∈ Δ then d = a^ℬ, for each atom a, and 2. for any d ∈ D^𝒜, e ∈ D^ℬ and f ∈ L: if df^𝒜↓ and (d, e) ∈ Δ, then ef^ℬ↓ and (df^𝒜, ef^ℬ) ∈ Δ.</Paragraph> <Paragraph position="10"> Notice that the union of two simulations and the transitive closure of a simulation are also simulations.</Paragraph> <Paragraph position="11"> A partial homomorphism γ between 𝒜 and ℬ is a simulation between the two which is a partial function. If 𝒜 = ℬ we also call γ a partial endomorphism.</Paragraph> <Paragraph position="12"> Definition. Let 𝒜 be a feature algebra. The (strong) subsumption preorder ⊑^𝒜 and the weak subsumption preorder ≼^𝒜 of 𝒜 are defined as follows: * d (strongly) subsumes e (written d ⊑^𝒜 e) iff there is a partial endomorphism γ such that dγ = e.</Paragraph> <Paragraph position="13"> * d weakly subsumes e (written d ≼^𝒜 e) iff there is a simulation Δ such that d Δ e.</Paragraph> <Paragraph position="14"> It can be shown (see [Smolka 89]) that the subsumption preorder of the feature graph algebra coincides with the subsumption order usually defined on feature graphs, e.g. in [Kasper/Rounds 86].</Paragraph> <Paragraph position="15"> Example: Consider the feature algebra depicted in Fig. 2, which consists of the elements {1, 2, 3, 4, 5, a, b}, where a and b shall be (the pictures of) atoms and f, g, i and j shall be features whose interpretations are as indicated.</Paragraph> <Paragraph position="16"> Now, element 1 does not strongly subsume 3, since it does not hold for 3 that its f-value equals its g-value.
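In a finite feature algebra, weak subsumption can be decided by computing the greatest simulation through iterated refinement: start from the largest relation compatible with the atom condition and delete pairs until condition 2 holds. The sketch below is illustrative, not from the paper; the algebra is encoded as a domain list, a set of atoms, and one dict of partial functions per feature.

```python
def greatest_simulation(domain, atoms, features):
    """features: dict mapping a feature name to a dict d -> df."""
    # Start from the full relation minus pairs violating condition 1:
    # a pair (a, e) with a an atom forces e == a.
    rel = set()
    for d in domain:
        for e in domain:
            if d in atoms and e != d:
                continue
            rel.add((d, e))
    # Refine: drop (d, e) whenever some feature defined on d is
    # undefined on e, or maps the pair outside the current relation.
    changed = True
    while changed:
        changed = False
        for (d, e) in list(rel):
            for f, table in features.items():
                if d in table:
                    if e not in table or (table[d], table[e]) not in rel:
                        rel.discard((d, e))
                        changed = True
                        break
    return rel
```

On an algebra shaped like the example (a hypothetical reconstruction of Fig. 2: element 1 has both f- and g-value 2, element 3 has f-value 4 and g-value 5, with 4 and 5 carrying distinct atoms), the pair (1, 3) survives refinement even though no endomorphism could map 2 to both 4 and 5: weak but not strong subsumption.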
However, the simulation Δ demonstrates that they stand in the weak subsumption relation: 1 ≼ 3.</Paragraph> </Section> <Section position="5" start_page="258" end_page="258" type="metho"> <SectionTitle> 3 Constraints </SectionTitle> <Paragraph position="0"> To describe feature algebras we use a relational language similar to the language of feature descriptions in LFG or path equations in PATR-II. Our syntax of constraints allows for the forms xp ≐ yq, xp ≐ a, xp ≼ yq, where p and q are paths (possibly empty), a ∈ A, and x and y are variables. A feature clause is a finite set of constraints of the above forms. As usual we interpret constraints with respect to a variable assignment, in order to make sure that variables are interpreted uniformly in the whole set. An assignment is a mapping α of variables to the elements of some feature algebra. A constraint φ is satisfied in 𝒜 under assignment α, written (𝒜, α) ⊨ φ, as follows: (𝒜, α) ⊨ xp ≐ yq iff α(x)p^𝒜 = α(y)q^𝒜; (𝒜, α) ⊨ xp ≐ a iff α(x)p^𝒜 = a^𝒜; (𝒜, α) ⊨ xp ≼ yq iff α(x)p^𝒜 ≼^𝒜 α(y)q^𝒜.</Paragraph> <Paragraph position="1"> The solutions of a clause C in a feature algebra 𝒜 are those assignments which satisfy each constraint in C. Two clauses C1 and C2 are equivalent iff they have the same set of solutions in every feature algebra 𝒜.</Paragraph> <Paragraph position="2"> The problem we want to consider is the following: Given a clause C with symbols from V, L and A, does C have a solution in some feature algebra? We call this problem the weak semiunification problem in feature algebras.²</Paragraph> </Section> <Section position="6" start_page="258" end_page="261" type="metho"> <SectionTitle> 4 An Algorithm </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="258" end_page="259" type="sub_section"> <SectionTitle> 4.1 Presolved Form </SectionTitle> <Paragraph position="0"> We give a solution algorithm for feature clauses based on normalization, i.e.
the goal is to define a normal form which exhibits unsatisfiability, and rewrite rules which transform each feature clause into normal form. The normal form we present here is actually only half the way to a solution, but we show below that solutions can be generated from it with the use of a standard algorithm.</Paragraph> <Paragraph position="1"> First we introduce the restricted syntax of the normal form. Clauses containing only constraints of the following forms are called simple: xf ≐ y, x ≐ s, x ≼ y, where s is either a variable or an atom. Through the introduction of auxiliary variables, each feature clause can be restated in linear time as an equisatisfiable simple feature clause whose solutions are extensions of the solutions of the original clause. This step is trivial.</Paragraph> <Paragraph position="2"> A feature clause C is called presolved iff it is simple and satisfies the following conditions.</Paragraph> <Paragraph position="3"> ²The analogous problem for (strong) subsumption constraints is undecidable, even if we restrict ourselves to finite feature algebras. Actually, this problem could be shown to be equivalent to the semiunification problem for rational trees, i.e. first-order terms which may contain cycles. The interested reader is referred to [Dörre/Rounds 90].</Paragraph> <Paragraph position="4"> If x ≼ y, xf ≐ x' and yf ≐ y' are in C, then x' ≼ y' is in C (downward propagation closure).</Paragraph> <Paragraph position="5"> In the first step our algorithm attempts to transform feature clauses to presolved form, thereby solving the equational part. In the simplification rules (cf. Fig. 3) we have adapted some of Smolka's rules for feature clauses including complements [Smolka 89]. In the rules, [x/s]C denotes the clause C where every occurrence of x has been replaced with s, and φ & C denotes the feature clause {φ} ∪ C provided φ ∉ C.</Paragraph> <Paragraph position="6"> Theorem 1 Let C be a simple feature clause.</Paragraph> <Paragraph position="7"> Then 1.
if C can be rewritten to D using one of the rules, then D is a simple feature clause equivalent to C; 2. for every non-presolved simple feature clause one of the rewrite rules applies; 3. there is no infinite chain C → C1 → C2 → ...</Paragraph> <Paragraph position="8"> Proof.³ The first part can be verified straightforwardly by inspecting the rules. The same holds for the second part. To show the termination claim, first observe that the application of the last two rules can safely be postponed until none of the others can apply any more, since they only introduce subsumption constraints, which cannot feed the other rules. Now, call a variable x isolated in a clause C if C contains an equation x ≐ y and x occurs exactly once in C. The first rule strictly increases the number of isolated variables and no rule ever decreases it. Applications of the second and third rule decrease the number of equational constraints or the number of features appearing in C, which no other rule increases. Finally, the last two rules strictly increase the number of subsumption constraints for a constant set of variables. Hence, no infinite chain of rewriting steps can be produced. □ (³ Part of this proof has been directly adapted from [Smolka 89].)</Paragraph> <Paragraph position="9"> We will show now that the presolved form can be seen as a nondeterministic finite automaton; satisfiability can then be read off from its deterministic equivalent, if that is of a special, trivially verifiable form, called clash-free.</Paragraph> </Section> <Section position="2" start_page="259" end_page="260" type="sub_section"> <SectionTitle> 4.2 The Transition Relation δ_C of a Presolved Clause C </SectionTitle> <Paragraph position="0"> The intuition behind this construction is that subsumption constraints basically enforce that information about one variable (and the space reachable from it) has to be inherited by (copied to) another variable. For example, the constraints x ≼ y and xp ≐ a entail that yp ≐ a has to hold as well.
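This inheritance behaviour corresponds to the transitivity and downward propagation closures of the subsumption part of a clause. A small sketch, with an illustrative clause encoding (feature constraints xf ≐ x' as a dict, subsumption constraints as a set of pairs) that is not the paper's:

```python
def close_subsumptions(feat, subs):
    """feat: dict (x, f) -> x'.  subs: set of pairs (x, y) for x below y."""
    subs = set(subs)
    changed = True
    while changed:
        changed = False
        new = set()
        # transitivity closure: x below y, y below z gives x below z
        for (x, y) in subs:
            for (y2, z) in subs:
                if y2 == y and (x, z) not in subs:
                    new.add((x, z))
        # downward propagation closure:
        # x below y, xf = x', yf = y' gives x' below y'
        for (x, y) in subs:
            for (x2, f), x_prime in feat.items():
                if x2 == x and (y, f) in feat:
                    pair = (x_prime, feat[(y, f)])
                    if pair not in subs:
                        new.add(pair)
        if new:
            subs |= new
            changed = True
    return subs
```

For instance, from the constraints x ≼ y, xf ≐ x1 and yf ≐ y1 the closure derives x1 ≼ y1, which is exactly how information under x becomes visible under y without any copying.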
Now, if we have a constraint x ≼ y, we could think of actually copying the information found under x to y, e.g. xf ≐ x' would be copied to yf ≐ y', where y' is a new variable, and x' would be linked to y' by x' ≼ y'.</Paragraph> <Paragraph position="1"> However, this treatment is hard to control in the presence of cycles, which can always occur. Instead of actually copying, we can also regard a constraint x ≼ y as a pointer from y back to x, leading us to the information which is needed to construct the local solution of y. To extend this view we regard the whole presolved clause C as a finite automaton: take variables and atoms as nodes, a feature constraint as an arc labeled with the feature, and constraints x ≐ s and y ≼ x as ε-moves from x to s or y. We can show then that C is unsatisfiable iff there is some x from which we reach atom a via path p such that we can also reach b (≠ a) via p, or such that there is a path starting from x of which p is a proper prefix.</Paragraph> <Paragraph position="2"> Formally, let the NFA N_C of a presolved clause C be defined as follows.⁴ (⁴ From this point of view the difference between weak and strong subsumption can be captured in the type of information they enforce to be inherited. Strong subsumption requires path equivalences to be inherited (x ≼ y and xp ≐ xq implies yp ≐ yq), whereas weak subsumption does not.) Its states are the variables occurring in C (V_C) plus the atoms plus the state q_F and the initial state q_0. The set of final states is V_C ∪ {q_F}. The alphabet of N_C is V_C ∪ L ∪ A ∪ {ε}.
The transition relation is defined as follows:⁵ (⁵ If L or A are infinite, we restrict ourselves to the sets of symbols actually occurring in C. ⁶ Notice that if x ≐ s ∈ C, then s either is an atom or occurs only once. Thus it is pointless to have an arc from s to x, since we either already have the maximum of information for s, or s will not provide any new arcs.)</Paragraph> <Paragraph position="4"/> <Paragraph position="5"> As usual, let δ̂_C be the extension of δ_C to paths.</Paragraph> <Paragraph position="6"> Notice that xpa ∈ L(N_C) iff (x, p, a) ∈ δ̂_C.</Paragraph> <Paragraph position="7"> The language accepted by this automaton contains strings of the forms xp or xpa, where a string xp indicates that in a solution α the object α(x)p^𝒜 should be defined, and xpa tells us further that this object should be a^𝒜.</Paragraph> <Paragraph position="8"> A set of strings of (V × L*) ∪ (V × L* × A) is called clash-free iff it does not contain a string xpa together with xpb (where a ≠ b) or together with xpf. It is clear that the property of a regular language of being clash-free with respect to L and A can be read off immediately from a DFA D for it: if D contains a state q with δ(q, a) ∈ F and either δ(q, b) ∈ F (where a ≠ b) or δ(q, f) ∈ F, then it is not clash-free; otherwise it is.</Paragraph> <Paragraph position="9"> We now present our central theorem.</Paragraph> <Paragraph position="10"> Theorem 2 Let C0 be a feature clause, C its presolved form and N_C the NFA as constructed above. Then the following conditions are equivalent:</Paragraph> <Paragraph position="11"> 1. L(N_C) is clash-free. 2. There exists a finite feature algebra 𝒜 and an assignment α such that (𝒜, α) ⊨ C0, provided the set of atoms is finite.</Paragraph> <Paragraph position="12"> 3. There exists a feature algebra 𝒜 and an assignment α such that (𝒜, α) ⊨ C0.</Paragraph> <Paragraph position="13"> Proof.
See Appendix A.</Paragraph> <Paragraph position="14"> Now the algorithm consists of the following simple or well-understood steps: 1: (a) Solve the equational constraints of C, which can be done using standard unification methods, exemplified by rules 1) to 3).</Paragraph> <Paragraph position="15"> (b) Make the set of weak subsumption constraints transitively and &quot;downward&quot; closed (rules 4) and 5)). 2: The result, interpreted as an NFA, is made deterministic using standard methods and tested for being clash-free.</Paragraph> </Section> <Section position="3" start_page="260" end_page="260" type="sub_section"> <SectionTitle> 4.3 Determining Clash-Freeness Directly </SectionTitle> <Paragraph position="0"> For the purpose of proving the algorithm correct it was easiest to assume that clash-freeness is determined after transforming the NFA of the presolved form into a deterministic automaton. However, this translation step has a worst-case time complexity which is exponential in the number of states. In this section we consider a technique to determine clash-freeness directly from the NFA representation of the presolved form in polynomial time. We do not go into implementational details, though; instead we are concerned with describing the different steps from a logical point of view. It can be assumed that there is still room left for optimizations which improve efficiency.</Paragraph> <Paragraph position="1"> In a first step we eliminate all the ε-transitions from the NFA N_C. We will still call the result N_C. For every pair of a variable node x and an atom node a, let N_C[x, a] be the (sub-)automaton of all states of N_C reachable from x, but with the atom a being the only final state. Thus, N_C[x, a] accepts exactly the language of all strings p for which xpa ∈ L(N_C).
Likewise, let N_C[x, ā] be the (sub-)automaton of all states of N_C reachable from x, but where every atom node besides a is in the set of final states, as well as every node with an outgoing feature arc. The set accepted by this machine contains every string p such that xpb ∈ L(N_C) (b ≠ a) or xpf ∈ L(N_C). L(N_C) is clash-free if and only if the intersection of these two machines is empty for every x and a.</Paragraph> </Section> <Section position="4" start_page="260" end_page="261" type="sub_section"> <SectionTitle> 4.4 Complexity </SectionTitle> <Paragraph position="0"> Let us now examine the complexity of the different steps of the algorithm.</Paragraph> <Paragraph position="1"> We know that Part 1(a) can be done in nearly linear time (using the efficient union/find technique to maintain equivalence classes of variables and vectors of features for each representative), the result being smaller than or equal in size to C0. Part 1(b) may blow up the clause to a size at most quadratic in the number of distinct variables n, since we cannot have more subsumption constraints than this. For every new subsumption constraint, trying to apply rule 4) might involve at most 2n membership tests to check whether we are actually adding a new constraint, whereas for rule 5) this number depends only on the size of L. Hence, we stay within cubic time up to this point.</Paragraph> <Paragraph position="2"> Determining whether the presolved form is clash-free from the NFA representation is done in three steps. The ε-free representation of N_C does not increase the number of states. If n, a and l are the numbers of variables, atoms and features, respectively, in the initial clause, then the number of edges is in any case smaller than (n + a)² · l, since there are only n + a states. This computation can be performed in time of an order less than O((n + a)³).</Paragraph> <Paragraph position="3"> Second, we have to build the intersections of N_C[x, a] and N_C[x, ā] for every x and a.
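The emptiness test for each such intersection can be carried out on the fly, exploring only the reachable part of the product machine. The following sketch assumes ε-transitions have already been eliminated; the NFA representation (a start-state set, a final-state set, and a dict from (state, symbol) to successor sets) is illustrative, not the paper's.

```python
def intersection_empty(nfa1, nfa2):
    """True iff L(nfa1) and L(nfa2) have no common string."""
    starts1, finals1, delta1 = nfa1
    starts2, finals2, delta2 = nfa2
    # Explore reachable product states; the intersection is nonempty
    # iff a pair of final states is reachable.
    frontier = [(q1, q2) for q1 in starts1 for q2 in starts2]
    seen = set(frontier)
    while frontier:
        q1, q2 = frontier.pop()
        if q1 in finals1 and q2 in finals2:
            return False
        symbols = set(s for (q, s) in delta1 if q == q1)
        for s in symbols:
            for r1 in delta1.get((q1, s), ()):
                for r2 in delta2.get((q2, s), ()):
                    if (r1, r2) not in seen:
                        seen.add((r1, r2))
                        frontier.append((r1, r2))
    return True
```

Since only reachable product states are generated, this stays within the cross-product bound discussed below while often visiting far fewer states in practice.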
Intersection of two NFAs is done by building a cross-product machine, requiring maximally O((n + a)⁴ · l) time and space.⁷ The test for emptiness of these intersection machines is again trivial and can be performed in constant time.</Paragraph> <Paragraph position="4"> Hence, we estimate a total time and space complexity of order n · a · (n + a)⁴ · l.</Paragraph> <Paragraph position="5"> ⁷This is an estimate for the number of edges, since the number of states is below (n + a)². As usual, we assume appropriate data structures where we can neglect the order of access times. Probably the space (and time) complexity can be reduced further, since we do not actually need the representations of the intersection machines except for testing whether they can accept anything.</Paragraph> </Section> </Section> </Paper>