<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1007">
  <Title>OT Syntax: Decidability of Generation-based Optimization</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Undecidability for unrestricted OT
</SectionTitle>
    <Paragraph position="0"> Assume that the candidate set is characterized by a context-free grammar (cfg) $G_1$, plus one additional candidate 'yes'. There are two constraints ($C_1 \gg C_2$): $C_1$ is violated if the candidate is neither 'yes' nor a structure generated by a second cfg $G_2$; $C_2$ is violated only by 'yes'. Now, 'yes' is in the language defined by this system iff there are no structures in $G_1$ that are also in $G_2$ (any structure generated by both grammars violates neither constraint and therefore beats 'yes'). But the emptiness problem for the intersection of two context-free languages is known to be undecidable, so the optimization task for unrestricted OT is undecidable too.3 However, it is not in the spirit of OT to have extremely powerful individual constraints; the explanatory power should rather arise from the interaction of simple constraints.</Paragraph>
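    <Paragraph><![CDATA[ The two-constraint system above can be illustrated with a minimal sketch. It assumes small finite sets as stand-ins for the structures generated by the two cfgs (real context-free languages are in general infinite, which is exactly why the question is undecidable); the function names violations and winners are hypothetical and not part of the original construction.

```python
def violations(candidate, g2_structures):
    """Return the violation profile (C1, C2) of one candidate.

    C1 is violated if the candidate is neither 'yes' nor generated by G2.
    C2 is violated only by 'yes'.
    """
    c1 = 0 if candidate == "yes" or candidate in g2_structures else 1
    c2 = 1 if candidate == "yes" else 0
    return (c1, c2)


def winners(g1_structures, g2_structures):
    """Optimal candidates under the ranking C1 >> C2.

    Tuples compare lexicographically, which models strict domination:
    C2 only matters among candidates that tie on C1.
    """
    candidates = set(g1_structures) | {"yes"}
    best = min(violations(c, g2_structures) for c in candidates)
    return {c for c in candidates if violations(c, g2_structures) == best}


# Finite stand-ins (assumption): the two sets share no structure, so 'yes' wins.
print(winners({"a b", "a a b"}, {"b a"}))
# Here "a b" is in both sets; it violates nothing and beats 'yes'.
print(winners({"a b", "a a b"}, {"a b", "b"}))
```

Deciding whether 'yes' is optimal thus amounts to deciding whether the two sets are disjoint; the sketch can only do this because the sets are finite. ]]></Paragraph>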
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 OT-LFG
</SectionTitle>
    <Paragraph position="0"> Following (Bresnan, 2000; Kuhn, 2000; Kuhn, 2001), we define a restricted OT system based on Lexical-Functional Grammar (LFG) representations: c(ategory) structure/f(unctional) structure pairs $\langle T, \Phi \rangle$ like $\langle$(4),(5)$\rangle$. Each c-structure tree node is mapped to a node in the f-structure graph by the function $\phi$. The mapping is specified by f-annotations in the grammar rules (below category symbols, cf. (2)) and lexicon entries (3).4</Paragraph>
    <Paragraph position="1"> 2 Most computational OT work so far focuses on candidates and constraints expressible as regular languages/rational relations, based on (Frank and Satta, 1998) (e.g., (Eisner, 1997; Karttunen, 1998; Gerdemann and van Noord, 2000)). 3 Cf. also (Johnson, 1998) for the sketch of an undecidability argument and (Kuhn, 2001, 4.2, 6.3) for further constructions. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 48-55.</Paragraph>
    <Paragraph position="3"> $\uparrow$ abbreviates $\phi(\mathcal{M}(*))$, i.e., the f-structure corresponding to the present node's mother category.</Paragraph>
    <Paragraph position="4"> The correct f-structure for a sentence is the minimal model satisfying all properly instantiated f-annotations. In OT-LFG, the universe of possible candidates is defined by an LFG $G_{inviol}$ (encoding inviolable principles, like an X-bar scheme). A particular candidate set is the set $Gen_{G_{inviol}}(\Phi_{in})$, i.e., the c-/f-structure pairs in $G_{inviol}$ which have the input $\Phi_{in}$ as their f-structure. Constraints are expressed as local configurations in the c-/f-structure pairs. They have one of the following implicational forms:5 (6) $\frac{A}{S} \Rightarrow \frac{A'}{S'}$ where $A, A'$ are descriptions of nonterminals of $G_{inviol}$; $S$ [...]</Paragraph>
    <Paragraph position="6"> of $G_{inviol}$; $A, A'$ refer to the mother in a local subtree configuration, $B, B'$ refer to the same daughter category; $\alpha, \alpha', \beta, \beta'$ are regular expressions over nonterminals [...]</Paragraph>
    <Paragraph position="8"> Any of the descriptions can be maximally unspecific; (6) can for example be instantiated by the OPSPEC constraint ($\uparrow$ OP)=+ $\Rightarrow$ (DF $\uparrow$) (an operator must be the value of a discourse function, (Bresnan, 2000)) with the category information unspecified.</Paragraph>
    <Paragraph position="9"> An OT-LFG system $\mathcal{O}$ is thus characterized by a base grammar and a set of constraints, with a language-specific ranking relation $\gg_L$:</Paragraph>
    <Paragraph position="11"> The evaluation function Eval picks the most harmonic candidates from a set of candidates, based on the constraints and their ranking. The language (set of analyses)6 generated by an OT system is defined as</Paragraph>
    <Paragraph position="13"/>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 LFG generation
</SectionTitle>
    <Paragraph position="0"> Our decidability proof for generation-based optimization builds on the result of (Kaplan and Wedekind, 2000) (K&amp;W00) that LFG generation produces context-free languages.</Paragraph>
    <Paragraph position="1">  string of the c-structure part of the analyses. (8) Given an arbitrary LFG grammar $G$ and a cycle-free f-structure $\Phi$, a cfg $G'$ can be constructed that generates exactly the strings to which $G$ assigns the f-structure $\Phi$. I will refer to the resulting cfg $G'$ as $KW(G, \Phi)$. K&amp;W00 present a constructive proof, folding all f-structural contributions of lexical entries and LFG rules into the c-structural rewrite rules (which is possible since we know in advance the range of f-structural objects that can instantiate the f-structure metavariables in the rules). I illustrate the specialization steps with grammar (2) and lexicon (3) and for generation from f-structure (5).</Paragraph>
    <Paragraph position="2"> Initially, the generalized format of right-hand sides in LFG rules is converted to the standard context-free notation (resolving regular expressions by explicit disjunction or recursive rules). F-structure (5) contains five substructures: the root f-structure, plus the embedded f-structures under the paths SUBJ, COMP, COMP SUBJ, and COMP OBJ.</Paragraph>
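    <Paragraph><![CDATA[ The enumeration of substructures can be sketched as follows. F-structure (5) itself is not reproduced in this section, so the nested dictionary F5 below is a hypothetical stand-in: only its embedding under SUBJ, COMP, COMP SUBJ and COMP OBJ is taken from the text, while the PRED values are assumed for illustration. The function name substructure_paths is likewise an assumption.

```python
def substructure_paths(fstructure, prefix=()):
    """Yield the feature path of every (sub-)f-structure, root first.

    An f-structure is modelled as a nested dict; atomic values are strings.
    The empty tuple stands for the root f-structure.
    """
    yield prefix
    for feature, value in fstructure.items():
        if isinstance(value, dict):
            yield from substructure_paths(value, prefix + (feature,))


# Hypothetical stand-in for f-structure (5); PRED values are assumed.
F5 = {
    "PRED": "think",
    "SUBJ": {"PRED": "John"},
    "COMP": {
        "PRED": "see",
        "SUBJ": {"PRED": "Mary"},
        "OBJ": {"PRED": "Titanic"},
    },
}

paths = list(substructure_paths(F5))
for path in paths:
    # One variable per path would be introduced in the grammar specialization.
    print(" ".join(path) or "(root)")
```

Each of the five paths corresponds to one candidate value for the metavariables in the annotated rules. ]]></Paragraph>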
    <Paragraph position="3"> Any relevant metavariable ($\uparrow$, $\downarrow$) in the grammar must end up instantiated to one of these. So for each path from the root f-structure, a distinct variable is introduced: $v$, subscripted with the (abbreviated and possibly empty) feature path: $v, v_s, v_c, v_{cs}, v_{co}$. Rule augmentation step 1 adds to each category name a concrete f-structure to which the category corresponds. So for FP, we get FP:$v$, FP:$v_s$, FP:$v_c$, FP:$v_{cs}$, and FP:$v_{co}$. The rules are multiplied out to cover all combinations of augmented categories obeying the original f-annotations.7 Step 2 adds a set of instantiated f-annotation schemes to each symbol, based on the instantiation of metavariables from step 1. One instance of the lexicon entry Mary looks as follows:</Paragraph>
    <Paragraph position="5"> The rules are again multiplied out to cover all combinations for which the set of f-constraints on the mother is the union of all daughters' f-constraints, plus the appropriately instantiated rule-specific annotations. So, for the VP rule based on the categories NP:$v_{co}$:</Paragraph>
    <Paragraph position="7"> With this bottom-up construction it is ensured that each new category ROOT:$v$:$\{\ldots\}$ (corresponding to the original root symbol) contains a complete possible collection of instantiated f-constraints. To exclude analyses whose f-structure is not $\Phi$ (for which we are generating strings), a new start symbol is introduced "above" the original root symbol. Only for the sets of f-constraints that have $\Phi$ as their minimal model, rules of the form ROOT$'$ $\rightarrow$ ROOT:$v$:$\{\ldots\}$ are introduced (this also excludes inconsistent f-constraint sets).</Paragraph>
    <Paragraph position="8"> With the cfg $KW(G, \Phi)$, standard techniques for cfg's can be applied, e.g., if there are infinitely many possible analyses for a given f-structure, the smallest one(s) can be produced, based on the pumping lemma for context-free languages. Grammar (2) does indeed produce infinitely many analyses for the input f-structure (5). It overgenerates in several respects: The functional projection FP can be stacked due to recursions like the following (with the augmented FP reoccurring in the F$'$ rules):</Paragraph>
    <Paragraph position="10"> for that in (3), so $KW$((2),(5)) generates an arbitrary number of thats on top of any FP. A similar repetition effect will arise for the auxiliary had.8 Other choices in generation arise from the freedom of generating the subject in the specifier of VP or FP and from the possibility of (unbounded) topicalization of [...] (10) a. John thought that Titanic, Mary had seen.</Paragraph>
    <Paragraph position="11"> b. Titanic, John thought that Mary had seen.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 LFG generation in OT-LFG
</SectionTitle>
    <Paragraph position="0"> While grammar (2) would be considered defective as a classical LFG grammar, it constitutes a reasonable example of a candidate generation grammar ($G_{inviol}$) in OT. Here, it is the OT constraints that enforce language-specific restrictions, so $G_{inviol}$ has to ensure that all candidates are generated in the first place. For instance, expletive elements such as do in Who do you know will arise by passing a recursion in the cfg constructed during generation. A candidate containing such a vacuous cycle can still become the winner of the OT competition if the Faithfulness constraint punishing expletives is outranked by some constraint favoring an aspect of the recursive structure. So the harmony is increased by going through the recursion a certain number of times. It is for this very reason that Who do you know is predicted to be grammatical in English.</Paragraph>
    <Paragraph position="1"> So, in OT-LFG it is not sufficient to apply just the $KW$ construction; I use an additional step: prior to application of $KW$, the LFG grammar $G_{inviol}$ is converted to a different form $F_{\mathcal{C}}(G_{inviol})$ (depending on the constraint set $\mathcal{C}$), which is still an LFG grammar but has category symbols which reflect local constraint violations. When the $KW$ construction is applied to $F_{\mathcal{C}}(G_{inviol})$, all "pumping" structures generated by the cfg $KW(F_{\mathcal{C}}(G_{inviol}), \Phi_{in})$</Paragraph>
    <Paragraph position="3"> can indeed be ignored since all OT-relevant candidates are already contained in the finite set of non-recursive structures. So, finally the ranking of the constraints is taken into consideration in order to determine the harmony of the candidates in this finite subset.</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 The conversion $F_{\mathcal{C}}(G_{inviol})$
</SectionTitle>
    <Paragraph position="0"> Preprocessing. Like K&amp;W00, I assume an initial conversion of the c-structure part of rules into standard context-free form, i.e., the right-hand side is a category string rather than a regular expression. This ensures that for a given local subtree, each constraint (of form (6) or (7)) can be applied only a finite number of times: if $m$ is the arity of the longest right-hand side of a rule, the maximal number of local violations is $m$ (since some constraints of type (7) can be instantiated to all daughters).</Paragraph>
    <Paragraph position="1"> Grammar conversion With the number of local violations bounded, we can encode all candidate distinctions with respect to constraint violations at the local-subtree level with finite means: The set of categories in the newly constructed LFG grammar</Paragraph>
    <Paragraph position="3"> The rules in $F_{\mathcal{C}}(G_{inviol})$ are constructed in such a way that for each rule</Paragraph>
    <Paragraph position="5"> are included such that a39 a51a41 (the number of violations of constraint a18 a51 incurred local to the rule) and the</Paragraph>
    <Paragraph position="7"> does not match the condition a74 ;</Paragraph>
    <Paragraph position="9"> does not match the condition a74 ;</Paragraph>
    <Paragraph position="11"> if Xa111 does not match a85 , or Xa3 . . . Xa111 a0 a3 do not match a84 , or Xa111 a1 a3 . . . Xa38 do not match a87 ; ii.</Paragraph>
    <Paragraph position="13"> Note that the constraint profile of the daughter categories does not play any role in the determination of constraint violations local to the subtree under consideration (only the local violation counts $n_i$ are restricted by the conditions (12) and (13)). So for each new rule type, all combinations of constraint profiles on the daughters are constructed (creating a large but finite number of rules).9 This ensures that no sentence that can be parsed (or generated) by $G_{inviol}$ is excluded from $F_{\mathcal{C}}(G_{inviol})$ (as stated by fact (14)):10 [...] matches both the antecedent ($A$) and the consequent ($A'$) category description of a constraint of form (6), three clauses apply: (12b), (12c), and (12d). So, we get two new rules with the count of 0 local violations of the constraint and two rules with count 1, with a difference in the f-annotations.</Paragraph>
    <Paragraph position="14"> 10 Providing all possible combinations of augmented category symbols on the right-hand rule sides in $F_{\mathcal{C}}(G)$ ensures that the newly constructed rules can be reached from the root symbol in a derivation. It is also guaranteed that whenever a rule $r$ in $G$ contributes to an analysis, at least one of the rules constructed from $r$ will contribute to the corresponding analysis in $F_{\mathcal{C}}(G)$. This is ensured since the subclauses in (12) and (13) cover the full space of logical possibilities.</Paragraph>
    <Paragraph position="15"> We can overload the function name Cat with a function applying to the set of analyses produced by an LFG grammar $G$ by defining Cat($G$) $= \{ \langle T, \Phi \rangle \mid \langle T', \Phi \rangle \in G$, $T$ is derived from $T'$ by applying Cat to all category symbols $\}$.</Paragraph>
    <Paragraph position="16"> Coverage preservation of the $F_{\mathcal{C}}$ construction holds also for the projected c-category skeleton (cf. the argumentation in fn. 10): (15) C-structure level coverage preservation. For an LFG grammar $G$: Cat($F_{\mathcal{C}}(G)$) $= G$. Each category in $F_{\mathcal{C}}(G)$ encodes the number of local violations for all constraints. Since all constraints are locally evaluable by assumption, all constraints violated by a candidate analysis have to be incurred local to some subtree. Hence the total number of constraint violations incurred by a candidate can be computed by simply summing over all category-encoded local violation profiles: (16) Total number of constraint violations. Let Nodes($T$) be the multiset of categories occurring in the c-structure tree $T$; then the total number of violations of constraint $C_i$ incurred by an analysis $\langle T, \Phi \rangle \in$ [...] Since $F_{\mathcal{C}}(G_{inviol})$ is a standard LFG grammar, we can apply the $KW$ construction to it to get a cfg for a given f-structure $\Phi_{in}$. The category symbols then have the form X:$\langle n_1, \ldots, n_k \rangle$:$v$:$S$, with $v$ and $S$ arising from the $KW$ construction. We can overload the projection function Cat again such that Cat(X:$\langle n_1, \ldots, n_k \rangle$:$v$:$S$) $=$ X for all augmented category symbols of the new format; likewise Cat($G$) for $G$ a cfg.
Since the $F_{\mathcal{C}}$ construction (strongly) preserves the language generated, coverage preservation holds also after the application of $KW$ to $F_{\mathcal{C}}(G_{inviol})$ and [...] But since the symbols in $F_{\mathcal{C}}(G_{inviol})$ reflect local constraint violations, Cat($KW(F_{\mathcal{C}}(G_{inviol}), \Phi_{in})$) has the property that all instances of recursion in the resulting cfg create candidates that are at most as harmonic as their non-recursive counterparts. Assuming a projection function CatCount(X:$\langle n_1, \ldots, n_k \rangle$:$v$:$S$) $=$ X:$\langle n_1, \ldots, n_k \rangle$, we can state more formally: (18) If $T_1$ and $T_2$ are CatCount projections of trees produced by the cfg $KW(F_{\mathcal{C}}(G_{inviol}), \Phi_{in})$, using exactly the same rules, and $T_2$ contains a superset of the nodes that [...] from $\langle n_1^1 \ldots n_i^1 \ldots n_k^1 \rangle = Total_{\mathcal{C}}(T_1)$, and $\langle n_1^2 \ldots n_i^2 \ldots n_k^2 \rangle = Total_{\mathcal{C}}(T_2)$. This fact follows from the definition of Total (16): the violation counts in the additional nodes in $T_2$ will add to the total of constraint violations (and if none of the additional nodes contains any local constraint violation at all, the total will be the same as in $T_1$).</Paragraph>
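    <Paragraph><![CDATA[ The summation described in (16) can be sketched as follows. This is a minimal illustration under assumptions: a tree node is modelled as a triple (category, local violation profile, daughters), and the two toy trees and the function name total are hypothetical, not taken from the paper.

```python
def total(tree):
    """Sum the local violation profiles of all nodes in a tree.

    A node is a triple (category, profile, daughters), where profile is a
    tuple with one local violation count per constraint.
    """
    _category, profile, daughters = tree
    result = list(profile)
    for daughter in daughters:
        for i, count in enumerate(total(daughter)):
            result[i] += count
    return tuple(result)


# Toy trees (assumed): t2 goes through an FP recursion once more than t1.
t1 = ("FP", (0, 1), [("VP", (1, 0), [])])
t2 = ("FP", (0, 1), [("FP", (0, 1), [("VP", (1, 0), [])])])

print(total(t1))  # profile of the smaller tree
print(total(t2))  # the extra FP node adds its local violations
```

Because t2 contains every node of t1 plus one more, each component of its total is at least as large as the corresponding component for t1, which is the monotonicity that the comparison of recursive and non-recursive trees relies on. ]]></Paragraph>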
    <Paragraph position="17"> Intuitively, the effect of the augmentation of the category format is that certain recursions in the pure $KW$ construction (which one may think of as a loop) are unfolded, leading to a longer loop. The new loop is sufficiently large to make all relevant distinctions.</Paragraph>
    <Paragraph position="18"> This result can be directly exploited in processing: if all non-recursive analyses are generated (of which there are only finitely many) it is guaranteed that a subset of the optimal candidates is among them. If the grammar does not contain any violation-free recursion, we even know that we have generated all optimal candidates.</Paragraph>
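    <Paragraph><![CDATA[ Generating the finite set of non-recursive analyses can be sketched as follows. The sketch makes a simplifying assumption: it treats a derivation as non-recursive when no nonterminal repeats along a root-to-leaf path, and it tracks categories rather than rules. The toy grammar and the function name nonrecursive_trees are hypothetical.

```python
from itertools import product


def nonrecursive_trees(grammar, symbol, on_path=frozenset()):
    """Yield derivation trees in which no nonterminal repeats on a path.

    grammar maps each nonterminal to a list of right-hand sides (lists of
    symbols); any symbol without an entry is treated as a terminal.
    """
    if symbol not in grammar:      # terminal: a leaf
        yield symbol
        return
    if symbol in on_path:          # expanding again would close a recursion
        return
    path = on_path | {symbol}
    for rhs in grammar[symbol]:
        options = [list(nonrecursive_trees(grammar, d, path)) for d in rhs]
        for daughters in product(*options):
            yield (symbol, list(daughters))


# Toy cfg (assumed): FP can be stacked by an arbitrary number of "that".
GRAMMAR = {
    "FP": [["that", "FP"], ["VP"]],
    "VP": [["sees"]],
}

trees = list(nonrecursive_trees(GRAMMAR, "FP"))
print(trees)
```

The stacking rule is never completed because its FP daughter would repeat a category on the current path, so the enumeration terminates even though the grammar generates infinitely many trees. ]]></Paragraph>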
    <Paragraph position="19">  (19) A recursion with the derivation path $B \rightarrow \ldots \rightarrow B$ is called violation-free iff all categories dominated by the upper occurrence of $B$, but not dominated by the lower occurrence of $B$, have the form $A{:}\langle n_1, n_2, \ldots, n_k \rangle$ with</Paragraph>
    <Paragraph position="21"> Note that if there is an applicable violation-free recursion, the set of optimal candidates is infinite; so if the constraint set is set up properly in a linguistic analysis, one would assume that violation-free recursion should not arise. (Kuhn, 2000) excludes the application of such recursions by a condition similar to offline parsability (which excludes vacuous recursions over a string in parsing), but with the $KW$ construction, this condition is not necessary for decidability of the generation-based optimization task. The cfg produced by $KW$ can be transformed further to only generate the optimal candidates according to the constraint ranking $\gg_L$ of the OT system $\mathcal{O} = \langle G_{inviol}, \langle \mathcal{C}, \gg_L \rangle \rangle$, eliminating all but the violation-free recursions in the grammar: [...] is finite and can be easily computed, by keeping track of the rules already used in an analysis.</Paragraph>
    <Paragraph position="22"> b. Redefine Evala120a122a121a55a123a124a19a125a103a126 to apply on a set of context-free analyses with augmented category symbols with counts of local constraint violations: Evala120a122a121a55a123a124a19a125a127a126 a24 a1 a25 a13a93a107 a109 a10 a1 a115 a109 is maximally harmonic in a1 , under ranking a96 a97a99a134 Using the function Total defined in (16), this function is straightforward to compute for finite sets, i.e., in particular Evala120a122a121a55a123a124 a125 a126 a24 a1 a16a5a2a4 a129a130 a25 . c. Augment the category format further by one index  symbols replaced by the indexed symbols -, (ii) the rules in a39a42a41a30a24 a19 a121 a24 a81 a62a64a56a65a2a62a66a68a67 a25a56a78a119a113 a62a64 a25 , in which the mother category and all daughter categories are of the form X:a31a21a20 a3 a78a27a25a42a25a27a25a22a20 a30 a32 :a15 :a7 , a20 a8 a13 a33 for a0 a13 a57 a25 a25a32 (with the new index a33 added), and (iii) one rule Sa13a16a15a4 a129a130 a10 Sa111 :a6 for each of the indexed versions Sa111 :a6 of the start symbols of  With the index introduced in step (20c), the original recursion in the cfg is eliminated in all but the violation-free cases. The grammar Cata69 a15a20a19a22a21a23  i.e., the set of c-structures for the optimal candidates for input f-structure a113 a62a64 according to the OT system</Paragraph>
    <Paragraph position="24"> 11The projection function Cat is again overloaded to also remove the index on the categories.</Paragraph>
    <Paragraph position="25"> 12Like K&amp;W00, I make the assumption that the input f-structure in generation is fully specified (i.e., all the candidates have the form a31a110a109a63a78a119a113 a62a64 a32 ), but the result can be extended to allow for the addition of a finite amount of f-structure information in generation. Then, the specified routine is computed separately for each possible f-structural extension and the results are compared in the end.</Paragraph>
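    <Paragraph><![CDATA[ The redefined Eval of step (20b) can be sketched for a finite candidate set as follows. The sketch assumes that each candidate has already been reduced to its total violation profile, listed from the highest-ranked to the lowest-ranked constraint; the candidate names, the profiles and the function name eval_optimal are hypothetical.

```python
def eval_optimal(candidates):
    """Return the maximally harmonic candidates of a finite set.

    candidates maps a candidate name to its total violation profile, a tuple
    ordered from the highest-ranked to the lowest-ranked constraint.
    Lexicographic tuple comparison models strict domination: a lower-ranked
    constraint only decides between candidates that tie on all higher ones.
    """
    best = min(candidates.values())
    return {name for name, profile in candidates.items() if profile == best}


# Assumed profiles for four candidates under three ranked constraints.
profiles = {
    "T1": (0, 2, 1),
    "T2": (0, 1, 3),
    "T3": (1, 0, 0),
    "T4": (0, 1, 3),
}

print(sorted(eval_optimal(profiles)))
```

T3 has the fewest violations overall but loses on the highest-ranked constraint, and T2 and T4 tie, so more than one candidate can be optimal. ]]></Paragraph>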
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
8 Proof
</SectionTitle>
    <Paragraph position="0"> To prove fact (21) we will show that the c-structure of an arbitrary candidate analysis generated from</Paragraph>
    <Paragraph position="2"> other candidates are equally or less harmonic.</Paragraph>
    <Paragraph position="3"> Take an arbitrary candidate c-structure a1 generated from a5 a50a43a52 with a15 a50a53a52a55a54a56a50a58a57a60a59 such that a1 a0</Paragraph>
    <Paragraph position="5"> [...] candidates $T'$ generated from $\Phi_{in}$ are equally or less harmonic than $T$. Assume there were a $T'$ that is more harmonic than $T$. Then there must be some constraint $C_i \in \mathcal{C}$, such that $T'$ violates $C_i$ fewer times than $T$ does, and $C_i$ is ranked higher than any other constraint in which $T$ and $T'$ differ. Constraints have to be incurred within some local subtree; so $T$ must contain a local violation configuration that $T'$ does not contain, and by the construction (12)/(13) the [...] construction step (20b)). This gives us a contradiction with our assumption.</Paragraph>
    <Paragraph position="6"> (ii) $F_{\mathcal{C}}(T)$ contains a recursion and $F_{\mathcal{C}}(T')$ is free of recursion. If the recursion in $F_{\mathcal{C}}(T)$ is violation-free, then there is an equally harmonic recursion-free candidate $T''$. But this $T''$ is also less harmonic than $F_{\mathcal{C}}(T')$, such that it would have been excluded [...] would also be excluded (for lack of the relevant rules in the non-recursive part). On the other hand, if it were the recursion in $F_{\mathcal{C}}(T)$ that incurred the additional violation (as compared to [...] contains a recursion. If this recursion is violation-free, we can pick the equally harmonic candidate avoiding the recursion to be our $F_{\mathcal{C}}(T')$, and we are back to cases (i) and (ii). Likewise, if the recursion in $F_{\mathcal{C}}(T')$ does incur some violation, not using the recursion leads to an even more harmonic candidate, for which again cases (i) and (ii) will apply. All possible cases lead to a contradiction with the assumptions, so no candidate is more harmonic than</Paragraph>
    <Paragraph position="8"> We still have to prove that if the c-structure a1 of a candidate analysis generated from a5 a50a43a52 with a15a51a50a43a52a55a54a70a50a53a57a60a59 is equally or more harmonic than all other candidates, then it is contained in Cata69 a15 a19a22a21a23  We can use the constraint marking construction a0 a101 and the a1 a2 construction to construct the tree a1 a0 with augmented category symbols of the analysis a1 . The result of K&amp;W00 plus (17) guarantee that Cata69 a1 a0 a71 a37 a1 . Now, there has to be a homomorphism from the categories in a1 a0 to the categories of some analysis in a15 a19a22a21a23  Since we know that a1 is equally or more harmonic than any other candidate generated from a5 a50a43a52 , we know that the augmented tree a1 a0 either contains no recursion or only violation-free recursion. If it does contain such violation-free recursions we map all categories a2 on the recursion paths to the indexed form a2 :a48 , and furthermore consider the variant of a1 a0 avoiding the recursion(s). For our (non-recursive) tree, there is guaranteed to be a counterpart in the finite set of non-recursive trees in a15a20a19a22a21a23</Paragraph>
  </Section>
</Paper>