<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1019"> <Title>Sydney, July 2006. (c) 2006 Association for Computational Linguistics Partially Specified Signatures: a Vehicle for Grammar Modularity</Title> <Section position="6" start_page="145" end_page="146" type="metho"> <SectionTitle> 4 Desiderata </SectionTitle> <Paragraph position="0"> To better understand the needs of grammar developers we carefully explored two existing grammars: the LINGO grammar matrix (Bender et al., 2002), which is a basis grammar for the rapid development of cross-linguistically consistent grammars; and a grammar of a fragment of Modern Hebrew, focusing on inverted constructions (Melnik, 2006). These grammars were chosen because they are comprehensive enough to reflect the kind of data that large-scale grammars encode, but not so large as to encumber this process. Motivated by these two grammars, we experimented with ways to divide the signatures of grammars into modules and with different methods of module interaction. This process resulted in the following desiderata for a beneficial solution for signature modularization: 1. The grammar designer should be provided with as much flexibility as possible. Modules should not be unnecessarily constrained.</Paragraph> <Paragraph position="1"> 2. Signature modules should provide means for specifying partial information about the components of a grammar.</Paragraph> <Paragraph position="2"> 3. A good solution should enable one module to refer to types defined in another. Moreover, it should enable the designer of module Mi to use a type defined in Mj without specifying the type explicitly. Rather, some of the attributes of the type can be (partially) specified, e.g., its immediate subtypes or its appropriateness conditions.</Paragraph> <Paragraph position="3"> 4.
While modules can specify partial information, it must be possible to deterministically extend a module (which can be the result of the combination of several modules) into a full type signature.</Paragraph> <Paragraph position="4"> 5. Signature combination must be associative and commutative: the order in which modules are combined must not affect the result. The solution we propose below satisfies these requirements.[1]</Paragraph> </Section> <Section position="7" start_page="146" end_page="148" type="metho"> <SectionTitle> 5 Partially specified signatures </SectionTitle> <Paragraph position="0"> We define partially specified signatures (PSSs), also referred to as modules below, which are structures containing partial information about a signature: part of the subsumption relation and part of the appropriateness specification. We assume enumerable, disjoint sets TYPE of types and FEAT of features, over which signatures are defined.</Paragraph> <Paragraph position="1"> We begin, however, by defining partially labeled graphs, of which PSSs are a special case.</Paragraph> <Paragraph position="2"> [1] The examples in the paper are inspired by actual grammars but are obviously much simplified.</Paragraph> <Paragraph position="3"> Definition 5 A partially labeled graph (PLG) over TYPE and FEAT is a finite, directed labeled graph S = ⟨Q, T, ≼, Ap⟩, where: 1. Q is a finite, nonempty set of nodes, disjoint from TYPE and FEAT.</Paragraph> <Paragraph position="4"> 2. T : Q → TYPE is a partial function, marking some of the nodes with types.</Paragraph> <Paragraph position="5"> 3. ≼ ⊆ Q × Q is a relation specifying (immediate) subsumption.</Paragraph> <Paragraph position="6"> 4. Ap ⊆ Q × FEAT × Q is a relation specifying appropriateness.</Paragraph> <Paragraph position="7"> Definition 6 A partially specified signature (PSS) over TYPE and FEAT is a PLG S = ⟨Q, T, ≼, Ap⟩, where: 1. T is one-to-one.</Paragraph> <Paragraph position="8"> 2.
≼ is antireflexive; its reflexive-transitive closure, denoted ≼*, is antisymmetric. 3. (a) (Relaxed Upward Closure) for all q1, q'1, q2 ∈ Q and F ∈ FEAT, if (q1, F, q2) ∈ Ap and q1 ≼* q'1, then there exists q'2 ∈ Q such that q2 ≼* q'2 and (q'1, F, q'2) ∈ Ap; and (b) (Maximality) for all q1, q2 ∈ Q and F ∈ FEAT, if (q1, F, q2) ∈ Ap then for all q'2 ∈ Q such that q'2 ≼* q2 and q2 ≠ q'2, (q1, F, q'2) ∉ Ap.</Paragraph> <Paragraph position="9"> A PSS is a finite directed graph whose nodes denote types and whose edges denote the subsumption and appropriateness relations. Nodes can be marked by types through the function T, but can also be anonymous (unmarked). Anonymous nodes facilitate reference, in one module, to types that are defined in another module. T is one-to-one since we assume that two marked nodes denote different types.</Paragraph> <Paragraph position="10"> The ≼ relation specifies an immediate subsumption order over the nodes, with the intention that this order hold later for the types denoted by nodes. This is why ≼* is required to be a partial order. The type hierarchy of a type signature is a bounded complete partial order (BCPO), but current approaches (Copestake, 2002) relax this requirement to allow more flexibility in grammar design. PSS subsumption is also a partial order but not necessarily a bounded complete one. After all modules are combined, the resulting subsumption relation will be extended to a BCPO (see section 7), but any intermediate result can be a general partial order. Relaxing the BCPO requirement also helps guarantee the associativity of module combination.</Paragraph> <Paragraph position="11"> Consider now the appropriateness relation. In contrast to type signatures, Ap is not required to be a function. Rather, it is a relation which may specify several appropriate nodes for the values of a feature F at a node q.
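The conditions of Definition 6 can be checked mechanically on a small graph. The following Python sketch is illustrative only and is not part of the original formalism: the function names `closure` and `is_pss`, the encoding of the subsumption and Ap relations as sets of tuples, and the node identifiers are all assumptions, and the example graph is only loosely modelled on the module S1 of Example 1, whose figure is not reproduced here.

```python
def closure(nodes, sub):
    """Reflexive-transitive closure of the immediate-subsumption pairs."""
    reach = {(q, q) for q in nodes} | set(sub)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(reach):
            for (c, d) in list(reach):
                if b == c and (a, d) not in reach:
                    reach.add((a, d))
                    changed = True
    return reach


def is_pss(nodes, typing, sub, ap):
    """Check the conditions of Definition 6 (hypothetical encoding).

    A pair (a, b) in sub means that a immediately subsumes b;
    typing maps only the marked nodes to their types.
    """
    star = closure(nodes, sub)
    # Condition 1: T is one-to-one.
    one_to_one = len(set(typing.values())) == len(typing)
    # Condition 2: antireflexive relation, antisymmetric closure.
    antireflexive = all(a != b for (a, b) in sub)
    antisymmetric = all(a == b or (b, a) not in star for (a, b) in star)
    # Condition 3a: every arc (q1, F, q2) is matched at each node q1 subsumes.
    upward = all(
        any((q1p, f, q2p) in ap and (q2, q2p) in star for q2p in nodes)
        for (q1, f, q2) in ap
        for q1p in nodes
        if (q1, q1p) in star
    )
    # Condition 3b: no arc to a node that properly subsumes another target.
    maximal = all(
        not (q2p != q2 and (q2p, q2) in star and (q1, f, q2p) in ap)
        for (q1, f, q2) in ap
        for q2p in nodes
    )
    return one_to_one and antireflexive and antisymmetric and upward and maximal


# Assumed example: x1 and x2 stand for the two anonymous AGR values.
NODES = {"cat", "n", "v", "gerund", "x1", "x2"}
TYPING = {"cat": "cat", "n": "n", "v": "v", "gerund": "gerund"}
SUB = {("cat", "n"), ("cat", "v"), ("n", "gerund"), ("v", "gerund")}
AP = {("n", "AGR", "x1"), ("v", "AGR", "x2"),
      ("gerund", "AGR", "x1"), ("gerund", "AGR", "x2")}

print(is_pss(NODES, TYPING, SUB, AP))
print(is_pss(NODES, TYPING, SUB, AP - {("gerund", "AGR", "x2")}))
```

Under this encoding the first call accepts the graph, while the second rejects it: once one of the two AGR arcs leaving gerund is dropped, the arc inherited from v has no counterpart at gerund and relaxed upward closure fails. This is why the sketch keeps Ap as a set of triples, so that a node may carry several arcs for the same feature.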
The intention is that the eventual value of Approp(T(q),F) be the least upper bound (lub) of the types of all those nodes q' such that (q, F, q') ∈ Ap. This relaxation allows more ways for modules to interact. We do restrict the Ap relation, however. Condition 3a enforces a relaxed version of upward closure. Condition 3b disallows redundant appropriateness arcs: if two nodes are appropriate for the same node and feature, then they should not be related by subsumption. The feature introduction condition of type signatures is not enforced by PSSs. This, again, results in more flexibility for the grammar designer; the condition is restored after all modules are combined (see section 7). Example 1 A simple PSS S1 is depicted in Figure 1, where solid arrows represent the ≼ (subsumption) relation and dashed arrows, labeled by features, the Ap relation. S1 stipulates two subtypes of cat, n and v, with a common subtype, gerund. The feature AGR is appropriate for all three categories, with distinct (but anonymous) values for Approp(n, AGR) and Approp(v, AGR).</Paragraph> <Paragraph position="12"> Approp(gerund, AGR) will eventually be the lub of Approp(n, AGR) and Approp(v, AGR), hence the multiple outgoing AGR arcs from gerund.</Paragraph> <Paragraph position="13"> Observe that in S1, ≼ is not a BCPO, Ap is not a function and the feature introduction condition does not hold.</Paragraph> <Paragraph position="14"> We impose an additional restriction on PSSs: a PSS is well-formed if any two different anonymous nodes are distinguishable, i.e., if each node is unique with respect to the information it encodes. If two nodes are indistinguishable then one of them can be removed without affecting the information encoded by the PSS. The existence of indistinguishable nodes in a PSS unnecessarily increases its size, resulting in inefficient processing. Given a PSS S, it can be compacted into a PSS, compact(S), by unifying all the indistinguishable nodes in S.
compact(S) encodes the same information as S but does not include indistinguishable nodes. Two nodes, only one of which is anonymous, can still be otherwise indistinguishable. Such nodes will, eventually, be coalesced, but only after all modules are combined (to ensure the associativity of module combination). The detailed computation of the compacted PSS is suppressed for lack of space.</Paragraph> <Paragraph position="16"/> </Section> <Section position="8" start_page="148" end_page="150" type="metho"> <SectionTitle> 6 Module combination </SectionTitle> <Paragraph position="0"> We now describe how to combine modules, an operation we call merge below. When two modules are combined, nodes that are marked by the same type are coalesced along with their attributes.</Paragraph> <Paragraph position="1"> Nodes that are marked by different types cannot be coalesced and must denote different types. The main complication arises when two anonymous nodes are considered: such nodes are coalesced only if they are indistinguishable.</Paragraph> <Paragraph position="2"> The merge of two modules is performed in several stages. First, the two graphs are unioned (this is a simple pointwise union of the coordinates of the graph, see definition 7). Then the resulting graph is compacted, coalescing nodes marked by the same type as well as indistinguishable anonymous nodes. However, the resulting graph does not necessarily maintain the relaxed upward closure and maximality conditions, and therefore some modifications are needed. This is done by Ap-Closure, see definition 8. Finally, the addition of appropriateness arcs may turn two anonymous distinguishable nodes into indistinguishable ones and therefore another compaction operation is needed (definition 9).</Paragraph> <Paragraph position="3"> Ap-Closure adds to a PLG the arcs required for it to maintain the relaxed upward closure and maximality conditions.
First, arcs are added (Ap') to maintain upward closure (these arcs relate elements that originate in different modules and are connected through shared elements). Then, redundant arcs are removed to maintain the maximality condition (the removed arcs may be added by Ap' but may also exist in Ap). Notice that</Paragraph> <Paragraph position="5"> Two PSSs can be merged only if the resulting subsumption relation is indeed a partial order, where the only obstacle can be the antisymmetry of the resulting relation. The combination of the appropriateness relations, in contrast, cannot cause the merge operation to fail because any violation of the appropriateness conditions in PSSs can be deterministically resolved.</Paragraph> <Paragraph position="7"> In the merged module, pairs of nodes marked by the same type and pairs of indistinguishable anonymous nodes are coalesced. An anonymous node cannot be coalesced with a typed node, even if they are otherwise indistinguishable, since that would make the combination operation non-associative. Anonymous nodes are assigned types only after all modules are combined, see section 7.1.</Paragraph> <Paragraph position="8"> If a node has multiple outgoing Ap-arcs labeled with the same feature, these arcs are not replaced by a single arc, even if the lub of the target nodes exists in the resulting PSS. Again, this is done to guarantee the associativity of the merge operation.</Paragraph> <Paragraph position="9"> Example 3 Figure 4 depicts a naive agreement module, S5. Combined with S1 of Figure 1, S1 ⋒ S5 = S5 ⋒ S1 = S6, where S6 is depicted in Figure 5. All dashed arrows are labeled AGR, but these labels are suppressed for readability.</Paragraph> <Paragraph position="10"> Example 4 Let S7 and S8 be the PSSs depicted in Figures 6 and 7, respectively. Then S7 ⋒ S8 = S8 ⋒ S7 = S9, where S9 is depicted in Figure 8.
By standard convention, Ap arcs that can be inferred by upward closure are not depicted.</Paragraph> <Paragraph position="11"> Proposition 2 Given two mergeable PSSs S1, S2, S1 ⋒ S2 is a well-formed PSS.</Paragraph> <Paragraph position="12"> Proposition 3 PSS merge is commutative: for any two PSSs, S1, S2, S1 ⋒ S2 = S2 ⋒ S1. In particular, either both are defined or both are undefined.</Paragraph> <Paragraph position="13"> Proposition 4 PSS merge is associative: for all S1, S2, S3, (S1 ⋒ S2) ⋒ S3 = S1 ⋒ (S2 ⋒ S3).</Paragraph> <Paragraph position="14"> 7 Extending PSSs to type signatures When developing large-scale grammars, the signature can be distributed among several modules. A PSS encodes only partial information and therefore is not required to conform to all the constraints imposed on ordinary signatures. After all the modules are combined, however, the PSS must be extended into a signature. This process is done in four stages, each dealing with one property: 1.</Paragraph> <Paragraph position="15"> Name resolution: assigning types to anonymous nodes (section 7.1); 2. Determinizing Ap, converting it from a relation to a function (section 7.2); 3. Extending ≼ to a BCPO. This is done using the algorithm of Penn (2000); 4. Extending Ap to a full appropriateness specification by enforcing the feature introduction condition: Again, we use the</Paragraph> <Section position="1" start_page="149" end_page="150" type="sub_section"> <SectionTitle> 7.1 Name resolution </SectionTitle> <Paragraph position="0"> By the definition of a well-formed PSS, each anonymous node is unique with respect to the information it encodes among the anonymous nodes, but there may exist a marked node encoding the same information. The goal of the name resolution procedure is to assign a type to every anonymous node, by coalescing it with a similar marked node, if one exists.
If no such node exists, or if there is more than one such node, the anonymous node is given an arbitrary type.</Paragraph> <Paragraph position="1"> The name resolution algorithm iterates as long as there are nodes to coalesce. In each iteration, the set of similar typed nodes is computed for each anonymous node. Then, using this computation, each anonymous node is coalesced with its similar typed node, if such a node uniquely exists. After coalescing all such pairs, the resulting PSS may no longer be well-formed and is therefore compacted. Compaction can give rise to more pairs that need to be coalesced, and therefore the above procedure is repeated. When no pairs that need to be coalesced are left, the remaining anonymous nodes are assigned arbitrary names and the algorithm halts. The detailed algorithm is suppressed for lack of space.</Paragraph> <Paragraph position="2"> Example 5 Let S6 be the PSS depicted in Figure 5. Executing the name resolution algorithm on this module results in the PSS of Figure 9 (AGR labels are suppressed for readability). The two anonymous nodes in S6 are coalesced with the nodes marked nagr and vagr, as per their attributes. Cf. Figure 1, in particular how two anonymous nodes in S1 are assigned types from</Paragraph> </Section> <Section position="2" start_page="150" end_page="150" type="sub_section"> <SectionTitle> 7.2 Appropriateness consolidation </SectionTitle> <Paragraph position="0"> For each node q, the set of outgoing appropriateness arcs with the same label F, {(q,F,q')}, is replaced by the single arc (q,F,ql), where ql is marked by the lub of the types of all q'. If no lub exists, a new node is added and is marked by the lub.
The result is that the appropriateness relation is a function, and upward closure is preserved; feature introduction is dealt with separately.</Paragraph> <Paragraph position="1"> The input to the following procedure is a PSS whose typing function, T, is total; its output is a PSS whose typing function, T, is total and whose appropriateness relation is a function. Let S = ⟨Q, T, ≼, Ap⟩ be a PSS. For each q ∈ Q and F ∈ FEAT, let</Paragraph> <Paragraph position="3"> 1. Find a node q and a feature F for which |target(q, F)| > 1 and for all q' ∈ Q such that q' ≼* q, |target(q', F)| ≤ 1. If no such pair exists, halt.</Paragraph> <Paragraph position="4"> 2. If target(q, F) has a lub, p, then: (a) for all q' ∈ target(q, F), remove the arc (q, F, q') from Ap.</Paragraph> <Paragraph position="5"> (b) add the arc (q, F, p) to Ap.</Paragraph> <Paragraph position="6"> (c) for all q' ∈ Q such that q ≼* q', if (q', F, p) ∉ Ap then add (q', F, p) to Ap.</Paragraph> <Paragraph position="7"> (d) go to (1).</Paragraph> <Paragraph position="8"> 3. Otherwise: (a) Add a new node, p, to Q with:</Paragraph> <Paragraph position="10"> (b) Mark p with a fresh type from NAMES.</Paragraph> <Paragraph position="11"> (c) For all q' ∈ Q such that q ≼* q', add (q', F, p) to Ap.</Paragraph> <Paragraph position="12"> (d) For all q' ∈ target(q, F), remove the arc (q, F, q') from Ap.</Paragraph> <Paragraph position="13"> (e) Add (q, F, p) to Ap.</Paragraph> <Paragraph position="14"> (f) go to (1).</Paragraph> <Paragraph position="15"> The order in which nodes are selected in step 1 of the algorithm is from supertypes to subtypes. This is done to preserve upward closure. In addition, when replacing a set of outgoing appropriateness arcs with the same label F, {(q,F,q')}, by a single arc (q,F,ql), ql is added as an appropriate value for F and all the subtypes of q. Again, this is done to preserve upward closure.
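The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: target(q, F) is taken to be the set of nodes reached from q by Ap arcs labeled F, a new node is placed below all the targets and above all their common subtypes, fresh types are simply named new1, new2, and so on, and the inheritance of features by a new node is omitted. All function names, and the example graph (loosely modelled on the module of Example 6, whose figure is not reproduced here), are hypothetical.

```python
def closure(nodes, sub):
    """Reflexive-transitive closure of the immediate-subsumption pairs."""
    reach = {(q, q) for q in nodes} | set(sub)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(reach):
            for (c, d) in list(reach):
                if b == c and (a, d) not in reach:
                    reach.add((a, d))
                    changed = True
    return reach


def target(ap, q, f):
    """Assumed definition: all nodes reached from q by an Ap arc labeled f."""
    return {t for (s, g, t) in ap if s == q and g == f}


def least_upper_bound(nodes, star, ts):
    """Return (lub or None, set of common subtypes) for the node set ts."""
    uppers = {p for p in nodes if all((t, p) in star for t in ts)}
    least = [p for p in uppers if all((p, u) in star for u in uppers)]
    return (least[0] if least else None), uppers


def consolidate(nodes, typing, sub, ap):
    """Turn Ap into a function, working from supertypes to subtypes."""
    nodes, typing, sub, ap = set(nodes), dict(typing), set(sub), set(ap)
    fresh = 0
    while True:
        star = closure(nodes, sub)
        # Step 1: several targets here, but at no proper supertype.
        candidates = sorted(
            (q, f) for (q, f, _) in ap
            if len(target(ap, q, f)) > 1
            and not any(len(target(ap, s, f)) > 1
                        for s in nodes if s != q and (s, q) in star)
        )
        if not candidates:
            return nodes, typing, sub, ap
        q, f = candidates[0]
        ts = target(ap, q, f)
        p, uppers = least_upper_bound(nodes, star, ts)
        if p is None:
            # Step 3: no lub exists, so add a fresh node (assumed placement).
            fresh += 1
            p = "new%d" % fresh
            nodes.add(p)
            typing[p] = p
            sub |= {(t, p) for t in ts} | {(p, u) for u in uppers}
            star = closure(nodes, sub)
        # Replace the arcs at q and propagate the single arc to its subtypes.
        ap -= {(q, f, t) for t in ts}
        ap |= {(s, f, p) for s in nodes if (q, s) in star}


NODES = {"cat", "n", "v", "gerund", "agr", "nagr", "vagr"}
TYPING = {q: q for q in NODES}
SUB = {("cat", "n"), ("cat", "v"), ("n", "gerund"), ("v", "gerund"),
       ("agr", "nagr"), ("agr", "vagr")}
AP = {("n", "AGR", "nagr"), ("v", "AGR", "vagr"),
      ("gerund", "AGR", "nagr"), ("gerund", "AGR", "vagr")}

nodes2, typing2, sub2, ap2 = consolidate(NODES, TYPING, SUB, AP)
print(sorted(ap2))
```

On this assumed input the arcs of n and v are left untouched, while the two AGR arcs leaving gerund are replaced by a single arc to a fresh node placed below both nagr and vagr.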
If a new node is added (step 3), then its appropriate features and values are inherited from its immediate supertypes. During the iterations of the algorithm, condition 3b (maximality) of the definition of a PSS may be violated but the resulting graph is guaranteed to be a PSS.</Paragraph> <Paragraph position="16"> Example 6 Consider the PSS depicted in Figure 9. Executing the appropriateness consolidation algorithm on this module results in the module depicted in Figure 10. AGR labels are suppressed.</Paragraph> </Section> </Section> </Paper>