A Probabilistic Corpus-Driven Model for Lexical-Functional Analysis (P98-1022)

2. A DOP model based on Lexical-Functional representations

Representations

The definition of a well-formed representation for utterance-analyses follows from LFG theory: every utterance is annotated with a c-structure, an f-structure and a mapping φ between them. The c-structure is a tree that describes the surface constituent structure of an utterance; the f-structure is an attribute-value matrix marking the grammatical relations of subject, predicate and object, as well as providing agreement features and semantic forms; and φ is a correspondence function that maps nodes of the c-structure into units of the f-structure (Kaplan & Bresnan 1982; Kaplan 1989). The following figure shows a representation for the utterance Kim eats. (We leave out some features to keep the example simple.)

(1) [Figure omitted in this extraction: the c-structure S over NP (Kim) and VP (eats), with φ links into the f-structure [SUBJ [PRED 'Kim', NUM SG], TENSE PRES, PRED 'eat(SUBJ)'].]

Note that the φ correspondence function gives an explicit characterization of the relation between the superficial and underlying syntactic properties of an utterance, indicating how certain parts of the string carry information about particular units of underlying structure. As such, it will play a crucial role in our definition of the decomposition and composition operations of LFG-DOP. In (1) we see, for instance, that the NP node maps to the subject f-structure, and the S and VP nodes map to the outermost f-structure.

It is generally the case that the nodes in a subtree carry information only about the f-structure units that the subtree's root gives access to. The notion of accessibility is made precise in the following definition:

An f-structure unit f is φ-accessible from a node n iff either n is φ-linked to f (that is, f = φ(n)) or f is contained within φ(n) (that is, there is a chain of attributes that leads from φ(n) to f).

All the f-structure units in (1) are φ-accessible from, for instance, the S node and the VP node, but the TENSE and top-level PRED are not φ-accessible from the NP node.
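The φ-accessibility definition is easy to operationalize. The following Python sketch uses an encoding of our own choosing (not the paper's implementation): f-structures as nested dicts, φ as a dict from c-structure node labels to (sub-)f-structure units, with object identity standing in for LFG's structure sharing.

```python
# A minimal sketch of phi-accessibility, assuming a hypothetical encoding:
# f-structures are nested dicts; phi maps node labels to (sub-)f-structures.

def contained_units(fstruct):
    """Yield fstruct and every unit reachable via a chain of attributes."""
    yield fstruct
    for value in fstruct.values():
        if isinstance(value, dict):
            yield from contained_units(value)

def phi_accessible(unit, node, phi):
    """f is phi-accessible from n iff f = phi(n) or f is contained in phi(n)."""
    return any(u is unit for u in contained_units(phi[node]))

# Representation (1) for Kim eats:
subj = {"PRED": "Kim", "NUM": "SG"}
f1 = {"SUBJ": subj, "TENSE": "PRES", "PRED": "eat(SUBJ)"}
phi = {"S": f1, "NP": subj, "VP": f1}
print(phi_accessible(subj, "S", phi))  # True: the subject is accessible from S
print(phi_accessible(f1, "NP", phi))   # False: the outer units are not accessible from NP
```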
According to LFG theory, c-structures and f-structures must satisfy certain formal well-formedness conditions. A c-structure/f-structure pair is a valid LFG representation only if it satisfies the Nonbranching Dominance, Uniqueness, Coherence and Completeness conditions (Kaplan & Bresnan 1982). Nonbranching Dominance demands that no c-structure category appears twice in a nonbranching dominance chain; Uniqueness asserts that there can be at most one value for any attribute in the f-structure; Coherence prohibits the appearance of grammatical functions that are not governed by the lexical predicate; and Completeness requires that all the functions that a predicate governs appear as attributes in the local f-structure.

Decomposition operations

Many different DOP models are compatible with the system of LFG representations. In this paper we outline a basic LFG-DOP model which extends the operations of Tree-DOP to take correspondences and f-structure features into account. The decomposition operations for this model will produce fragments of the composite LFG representations. These will consist of connected subtrees whose nodes are in φ-correspondence with sub-units of f-structures. We extend the Root and Frontier decomposition operations of Tree-DOP so that they also apply to the nodes of the c-structure while respecting the fundamental principles of c-structure/f-structure correspondence.

When a node is selected by the Root operation, all nodes outside of that node's subtree are erased, just as in Tree-DOP. Further, for LFG-DOP, all φ links leaving the erased nodes are removed and all f-structure units that are not φ-accessible from the remaining nodes are erased. Root thus maintains the intuitive correlation between nodes and the information in their corresponding f-structures. For example, if Root selects the NP in (1), then the f-structure corresponding to the S node is erased, giving (2) as a possible fragment:

(2) [NP Kim] with f-structure [PRED 'Kim', NUM SG]

In addition the Root operation deletes from the remaining f-structure all semantic forms that are local to f-structures that correspond to erased c-structure nodes, and it thereby also maintains the fundamental two-way connection between words and meanings. Thus, if Root selects the VP node so that the NP is erased, the subject semantic form 'Kim' is also deleted:

(3) [VP eats] with f-structure [SUBJ [NUM SG], TENSE PRES, PRED 'eat(SUBJ)']
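To make the Root operation concrete, here is a small sketch under the same hypothetical dict encoding, with φ expressed as attribute paths into the outermost f-structure; the helper pred_of, which records which node licenses which PRED, is our own illustrative device, not the authors' code.

```python
import copy

def get(f, path):
    """Follow a chain of attributes into a nested-dict f-structure."""
    for attr in path:
        f = f[attr]
    return f

def root(selected, subtree, phi, fstruct, pred_of):
    """Sketch of Root: keep the selected node's subtree, drop the phi links of
    erased nodes, keep only the f-structure phi-accessible from the new root,
    and delete semantic forms (PRED) local to f-structures of erased nodes."""
    kept = subtree[selected]
    prefix = phi[selected]
    new_f = copy.deepcopy(get(fstruct, prefix))        # phi-accessible units only
    for node in set(phi) - kept:                       # every erased node
        path = pred_of.get(node)
        if path is not None and path[:len(prefix)] == prefix:
            get(new_f, path[len(prefix):]).pop("PRED", None)
    new_phi = {n: phi[n][len(prefix):] for n in kept}  # re-rooted phi links
    return new_phi, new_f

# Selecting the VP of (1) erases the NP and the subject's semantic form, as in (3):
f1 = {"SUBJ": {"PRED": "Kim", "NUM": "SG"}, "TENSE": "PRES", "PRED": "eat(SUBJ)"}
phi1 = {"S": (), "NP": ("SUBJ",), "VP": ()}
print(root("VP", {"VP": {"VP"}}, phi1, f1, {"NP": ("SUBJ",), "VP": ()}))
# -> ({'VP': ()}, {'SUBJ': {'NUM': 'SG'}, 'TENSE': 'PRES', 'PRED': 'eat(SUBJ)'})
```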
As with Tree-DOP, the Frontier operation then selects a set of frontier nodes and deletes all subtrees they dominate. Like Root, it also removes the φ links of the deleted nodes and erases any semantic form that corresponds to any of those nodes. Frontier does not delete any other f-structure features. This reflects the fact that all features are φ-accessible from the fragment's root even when nodes below the frontier are erased. For instance, if the VP in (1) is selected as a frontier node, Frontier erases the predicate 'eat(SUBJ)':

(4) [S [NP Kim] VP] with f-structure [SUBJ [PRED 'Kim', NUM SG], TENSE PRES]

Note that the Root and Frontier operations retain the subject's NUM feature in the VP-rooted fragment (3), even though the subject NP is not present. This reflects the fact, usually encoded in particular grammar rules or lexical entries, that verbs of English carry agreement features for their subjects. On the other hand, fragment (4) retains the predicate's TENSE feature, reflecting the possibility that English subjects might also carry information about their predicate's tense. Subject-tense agreement as encoded in (4) is a pattern seen in some languages (e.g. the split-ergativity pattern of languages like Hindi, Urdu and Georgian) and thus there is no universal principle by which fragments such as (4) can be ruled out. But in order to represent directly the possibility that subject-tense agreement is not a dependency of English, we also allow an S fragment in which the TENSE feature is deleted, as in (5).

(5) [S [NP Kim] VP] with f-structure [SUBJ [PRED 'Kim', NUM SG]]

Fragment (5) is produced by a third decomposition operation, Discard, defined to construct generalizations of the fragments supplied by Root and Frontier. Discard acts to delete combinations of attribute-value pairs subject to the following restriction: Discard does not delete pairs whose values φ-correspond to remaining c-structure nodes.

This condition maintains the essential correspondences of LFG representations: if a c-structure and an f-structure are paired in one fragment provided by Root and Frontier, then Discard also pairs that c-structure with all generalizations of that fragment's f-structure. Fragment (5) results from applying Discard to the TENSE feature in (4). Discard also produces fragments such as (6), where the subject's number in (3) has been deleted:

(6) [VP eats] with f-structure [SUBJ [ ], TENSE PRES, PRED 'eat(SUBJ)']

Again, since we have no language-specific knowledge apart from the corpus, we have no basis for ruling out fragments like (6). Indeed, it is quite intuitive to omit the subject's number in fragments derived from sentences with past-tense verbs or modals. Thus the specification of Discard reflects the fact that LFG representations, unlike LFG grammars, do not indicate unambiguously the c-structure source (or sources) of their f-structure feature values.

The composition operation

In LFG-DOP the operation for combining fragments, again indicated by ∘, is carried out in two steps. First the c-structures are combined by leftmost substitution subject to the category-matching condition, just as in Tree-DOP. This is followed by the recursive unification of the f-structures corresponding to the matching nodes. The result retains the φ correspondences of the fragments being combined. A derivation for an LFG-DOP representation R is a sequence of fragments the first of which is labeled with S and for which the iterative application of the composition operation produces R.

We show in (7) the effect of the LFG composition operation using two fragments from representations of an imaginary corpus containing the sentences Kim eats and People ate. The VP-rooted fragment is substituted for the VP in the first fragment, and the second f-structure unifies with the first f-structure, resulting in a representation for the new sentence Kim ate.

(7) [Figure omitted in this extraction: composition of an S-rooted Kim fragment with a VP-rooted ate fragment, yielding a representation for Kim ate.]

This representation satisfies the well-formedness conditions and is therefore valid. Note that in LFG-DOP, as in Tree-DOP, the same representation may be produced by several derivations involving different fragments. Another valid representation for the sentence Kim ate could be composed from a fragment for Kim that does not preserve the number feature, leading to a representation which is unmarked for number. The probability models we discuss below have the desirable property that they tend to assign higher probabilities to more specific representations.

The following derivation produces a valid representation for the intuitively ungrammatical string People eats:

(8) [Figure omitted in this extraction: a derivation for People eats built from fragments in which a conflicting NUM feature has been discarded.]
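The f-structure unification in these compositions, which also enforces the Uniqueness condition, can be sketched over the same hypothetical nested-dict encoding; a None result stands for an inconsistency that blocks the composition.

```python
def unify(f1, f2):
    """Recursive unification of nested-dict f-structures; None signals a
    Uniqueness violation (conflicting atomic values). A sketch, not the
    authors' machinery."""
    result = dict(f1)
    for attr, v2 in f2.items():
        if attr not in result:
            result[attr] = v2
        elif isinstance(result[attr], dict) and isinstance(v2, dict):
            sub = unify(result[attr], v2)
            if sub is None:
                return None
            result[attr] = sub
        elif result[attr] != v2:
            return None                      # inconsistent atomic values
    return result

# Composing Kim with a VP fragment for ate, as in (7):
kim = {"SUBJ": {"PRED": "Kim", "NUM": "SG"}}
ate = {"SUBJ": {}, "PRED": "eat(SUBJ)"}      # TENSE omitted for brevity
print(unify(kim, ate))                       # the two f-structures merge

# And the role of Discard in (8): fully specified fragments clash on NUM,
# while a discarded-NUM fragment unifies, licensing People eats:
people = {"SUBJ": {"PRED": "people", "NUM": "PL"}}
eats_sg = {"SUBJ": {"NUM": "SG"}, "PRED": "eat(SUBJ)"}
eats_u  = {"SUBJ": {}, "PRED": "eat(SUBJ)"}
print(unify(people, eats_sg))                # None: Uniqueness violation
print(unify(people, eats_u))                 # a PL representation for People eats
```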
This system of fragments and composition thus provides a representational basis for a robust model of language comprehension in that it assigns at least some representations to many strings that would generally be regarded as ill-formed. A correlate of this advantage, however, is the fact that it does not offer a direct formal account of metalinguistic judgments of grammaticality. Nevertheless, we can reconstruct the notion of grammaticality by means of the following definition: A sentence is grammatical with respect to a corpus if and only if it has at least one valid representation with at least one derivation whose fragments are produced only by Root and Frontier and not by Discard.

Thus the system is robust in that it assigns three representations (singular, plural, and unmarked as the subject's number) to the string People eats, based on fragments for which the number feature of people, eats, or both has been discarded. But unless the corpus contains non-plural instances of people or non-singular instances of eats, there will be no Discard-free derivation and the string will be classified as ungrammatical (with respect to the corpus).

Probability models

As in Tree-DOP, an LFG-DOP representation R can typically be derived in many different ways. If each derivation D has a probability P(D), then the probability of deriving R is again the probability of producing it by any of its derivations. This is the sum of the individual derivation probabilities:

(9) P(R) = Σ_{D derives R} P(D)

An LFG-DOP derivation is also produced by a stochastic branching process which at each step makes a random selection from a competition set of competing fragments. Let CP(f | CS) denote the probability of choosing a fragment f from a competition set CS containing f. Then the probability of a derivation D = <f1, f2, ..., fk> is

(10) P(<f1, f2, ..., fk>) = Π_i CP(f_i | CS_i)

where, as in Tree-DOP, CP(f | CS) is expressed in terms of fragment probabilities P(f) by the formula

(11) CP(f | CS) = P(f) / Σ_{f' ∈ CS} P(f')

Tree-DOP is the special case where there are no conditions of validity other than the ones that are enforced at each step of the stochastic process by the composition operation. This is not generally the case and is certainly not the case for the Completeness Condition of LFG representations: Completeness is a property of a final representation that cannot be evaluated at any intermediate step of the process. However, we can define probabilities for the valid representations by sampling only from such representations in the output of the stochastic process. The probability of sampling a particular valid representation R is given by

(12) P(R | R is valid) = P(R) / Σ_{R' is valid} P(R')

This formula assigns probabilities to valid representations whether or not the stochastic process guarantees validity. The valid representations for a particular utterance u are obtained by a further sampling step and their probabilities are given by:

(13) P(R | R is valid and yields u) = P(R) / Σ_{R' is valid and yields u} P(R')

The formulas (9) through (13) will be part of any LFG-DOP probability model. The models will differ only in how the competition sets are defined, and this in turn depends on which well-formedness conditions are enforced on-line during the stochastic branching process and which are evaluated by the off-line validity sampling process.
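Formulas (10) and (11) translate directly into code. In the sketch below (our own illustrative framing), P(f) is taken proportional to a fragment's corpus frequency, and exact fractions keep the arithmetic transparent.

```python
from fractions import Fraction

def cp(f, cs, freq):
    """Formula (11): P(f) divided by the total probability of the competition
    set. P(f) is taken proportional to corpus frequency, an assumption of this
    sketch (the normalizing constants cancel inside the ratio)."""
    return Fraction(freq[f], sum(freq[g] for g in cs))

def derivation_prob(fragments, competition_sets, freq):
    """Formula (10): the product of the step-wise competition probabilities."""
    p = Fraction(1)
    for f, cs in zip(fragments, competition_sets):
        p *= cp(f, cs, freq)
    return p

# Anticipating the section-3 corpus: 16 units of S-rooted fragment mass, of
# which S/NP-VP/U has frequency 2, gives the competition probability 2/16.
s_freq = {"S/NP-VP/U": 2, "all-other-S-fragments": 14}   # schematic stand-ins
print(cp("S/NP-VP/U", s_freq, s_freq))                   # 1/8, i.e. 2/16
```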
One model, which we call M1, is a straightforward extension of Tree-DOP's probability model. It computes the competition sets only on the basis of the category-matching condition, leaving all other well-formedness conditions for off-line sampling. Thus for M1 the competition sets are defined simply in terms of the categories of a fragment's c-structure root node. Suppose that F_{i-1} = f1 ∘ f2 ∘ ... ∘ f_{i-1} is the current subanalysis at the beginning of step i in the process, that LNC(F_{i-1}) denotes the category of the leftmost nonterminal node of the c-structure of F_{i-1}, and that r(f) is now interpreted as the root-node category of f's c-structure component. Then the competition set for the i-th step is

(14) CS_i = {f : r(f) = LNC(F_{i-1})}

Since these competition sets depend only on the category of the leftmost nonterminal of the current c-structure, they group together all fragments with the same root category, independent of any other properties they may have or that a particular derivation may have. The competition probability for a fragment can be expressed by the formula

(15) CP(f) = P(f) / Σ_{f' : r(f') = r(f)} P(f')

We see that the choice of a fragment at a particular step in the stochastic process depends only on the category of its root node; other well-formedness properties of the representation are not used in making fragment selections. Thus, with this model the stochastic process may produce many invalid representations; we rely on sampling of valid representations and the conditional probabilities given by (12) and (13) to take the Uniqueness, Coherence, and Completeness Conditions into account.

Another possible model (M2) defines the competition sets so that they take a second condition, Uniqueness, into account in addition to the root-node category. For M2 the competing fragments at a particular step in the stochastic derivation process are those whose c-structures have the same root-node category as LNC(F_{i-1}) and whose f-structures are consistently unifiable with the f-structure of F_{i-1}. Thus the competition set for the i-th step is

(16) CS_i = {f : r(f) = LNC(F_{i-1}) and f is unifiable with the f-structure of F_{i-1}}

Although it is still the case that the category-matching condition is independent of the derivation, the unifiability requirement means that the competition sets vary according to the representation produced by the sequence of previous steps in the stochastic process. Unifiability must be determined at each step in the process to produce a new competition set, and the competition probability remains dependent on the particular step:

(17) CP(f | CS_i) = P(f) / Σ_{f' ∈ CS_i} P(f')

On this model we again rely on sampling and the conditional probabilities (12) and (13) to take just the Coherence and Completeness Conditions into account.

In model M3 we define the stochastic process to enforce three conditions, Coherence, Uniqueness and category-matching, so that it only produces representations with well-formed c-structures that correspond to coherent and consistent f-structures. The competition probabilities for this model are given by the obvious extension of (17). It is not possible, however, to construct a model in which the Completeness Condition is enforced during the derivation process.
This is because the satisfiability of the Completeness Condition depends not only on the results of previous steps of a derivation but also on the following steps (see Kaplan & Bresnan 1982). This nonmonotonic property means that the appropriate step-wise competition sets cannot be defined and that this condition can only be enforced at the final stage of validity sampling.

In each of these three models the category-matching condition is evaluated on-line during the derivation process, while other conditions are evaluated either on-line or off-line by the after-the-fact sampling process. LFG-DOP is crucially different from Tree-DOP in that at least one validity requirement, the Completeness Condition, must always be left to the post-derivation process. Note that a number of other models are possible which enforce other combinations of these three conditions.

3. Illustration and properties of LFG-DOP

We illustrate LFG-DOP using a very small corpus consisting of the two simplified LFG representations shown in (18):

(18) [Figure omitted in this extraction: simplified representations for John fell, whose subject carries NUM SG, and People walked, whose subject carries NUM PL.]

The fragments from this corpus can be composed to provide representations for the two observed sentences plus two new utterances, John walked and People fell. This is sufficient to demonstrate that the probability models M1 and M2 assign different probabilities to particular representations. We have omitted the TENSE feature and the lexical categories N and V to reduce the number of fragments we have to deal with. Applying the Root and Frontier operators systematically to the first corpus representation produces the fragments in the first column of (19), while the second column shows the additional f-structure that is associated with each c-structure by the Discard operation.

(19) [Table omitted in this extraction: the fragments obtained from John fell.]

A total of 12 fragments are produced from this representation, and by analogy 12 fragments with either PL or unmarked NUM values will also result from People walked. Note that the [S NP VP] fragment with the unspecified NUM value is produced for both sentences and thus its corpus frequency is 2. There are 14 other S-rooted fragments, 4 NP-rooted fragments, and 4 VP-rooted fragments; each of these occurs only once.

These fragments can be used to derive three different representations for John walked (singular, plural, and unmarked as the subject's number). To facilitate the presentation of our derivations and probability calculations, we denote each fragment by an abbreviated name that indicates its c-structure root-node category, the sequence of its frontier-node labels, and whether its subject's number is SG, PL, or unmarked (indicated by U). Thus the first fragment in (19) is referred to as S/John-fell/SG, and the unmarked fragment that Discard produces from it is referred to as S/John-fell/U. Given this naming convention, we can specify one of the derivations for John walked by the expression S/NP-VP/U ∘ NP/John/SG ∘ VP/walked/U, corresponding to an analysis in which the subject's number is marked as SG. The fragment VP/walked/U of course comes from People walked, the second corpus sentence, and does not appear in (19). Under model M1, only the category-matching condition is enforced during the stochastic branching process, and the competition sets are fixed independent of the derivation.
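The difference between M1's fixed competition sets (14) and M2's derivation-dependent ones (16) can be sketched in a few lines, reusing the unify helper from the composition sketch above; the Fragment record and the fragment inventory below are schematic stand-ins anticipating the discussion that follows.

```python
from collections import namedtuple

Fragment = namedtuple("Fragment", ["name", "root", "fstruct"])

def competition_set(fragments, lnc, current_f=None):
    """One derivation step's competition set. With current_f=None this is M1's
    definition (14): root-category matching only. Passing the derivation's
    current f-structure adds M2's unifiability requirement (16)."""
    cs = [f for f in fragments if f.root == lnc]
    if current_f is not None:
        cs = [f for f in cs if unify(current_f, f.fstruct) is not None]
    return cs

# After NP/John/SG has been chosen, M2's VP competition set shrinks to 3,
# since VP/walked/PL is no longer consistent (see the discussion below):
vps = [Fragment("VP/fell/SG",   "VP", {"SUBJ": {"NUM": "SG"}}),
       Fragment("VP/fell/U",    "VP", {"SUBJ": {}}),
       Fragment("VP/walked/PL", "VP", {"SUBJ": {"NUM": "PL"}}),
       Fragment("VP/walked/U",  "VP", {"SUBJ": {}})]
so_far = {"SUBJ": {"PRED": "John", "NUM": "SG"}}
print(len(competition_set(vps, "VP")))           # 4 under M1
print(len(competition_set(vps, "VP", so_far)))   # 3 under M2
```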
The probability of choosing the fragment S/NP-VP/U, given that an S-rooted fragment is required, is always 2/16, its frequency divided by the sum of the frequencies of all the S fragments. Similarly, the probability of then choosing NP/John/SG to substitute at the NP frontier node is 1/4, since the NP competition set contains 4 fragments, each with frequency 1. Thus, under model M1 the probability of producing the complete derivation S/NP-VP/U ∘ NP/John/SG ∘ VP/walked/U is 2/16 × 1/4 × 1/4 = 2/256. This probability is small because it indicates the likelihood of this derivation compared to other derivations for John walked and for the three other analyzable strings. The computation of the other M1 derivation probabilities for John walked is left to the reader. There are 5 different derivations for the representation with SG number and 5 for the PL number, while there are only 3 ways of producing the unmarked number U. The conditional probabilities for the particular representations (SG, PL, U) can be calculated by (9) and (13), and are given below.

P(NUM=SG | valid and yield = John walked) = .353
P(NUM=PL | valid and yield = John walked) = .353
P(NUM=U  | valid and yield = John walked) = .294

We see that the two specific representations are equally likely and each of them is more probable than the representation with unmarked NUM.
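These M1 numbers can be checked with exact fractions. The five SG derivations parallel those listed for M2 in Table 1 below, except that M1's competition sets stay at 4 NP-rooted and 4 VP-rooted competitors throughout.

```python
from fractions import Fraction as F

# M1 derivation probabilities for John walked (fixed competition sets:
# 16 units of S-rooted mass, 4 NP-rooted fragments, 4 VP-rooted fragments).
sg = F(2,16)*F(1,4)*F(1,4) + 2 * F(1,16)*F(1,4)*F(1,4) + 2 * F(1,16)*F(1,4)
pl = sg                                    # the PL derivations mirror the SG ones
u  = F(2,16)*F(1,4)*F(1,4) + 2 * F(1,16)*F(1,4)
total = sg + pl + u
print([round(float(x / total), 3) for x in (sg, pl, u)])   # [0.353, 0.353, 0.294]
```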
Model M2 produces a slightly different distribution of probabilities. Under this model, the consistency requirement is used in addition to the root-category matching requirement to define the competition sets at each step of the branching process. This means that the first fragment that instantiates the NUM feature to either SG or PL constrains the competition sets for the following choices in a derivation. Thus, having chosen the NP/John/SG fragment in the derivation S/NP-VP/U ∘ NP/John/SG ∘ VP/walked/U, only 3 VP fragments instead of 4 remain in the competition set at the next step, since the VP/walked/PL fragment is no longer available. The probability for this derivation under model M2 is therefore 2/16 × 1/4 × 1/3 = 2/192, slightly higher than the probability assigned to it by M1. Table 1 shows the complete set of derivations and their M2 probabilities for John walked.

Derivation                                NUM  Probability
S/NP-VP/U ∘ NP/John/SG ∘ VP/walked/U      SG   2/16 × 1/4 × 1/3
S/NP-VP/SG ∘ NP/John/SG ∘ VP/walked/U     SG   1/16 × 1/3 × 1/3
S/NP-VP/SG ∘ NP/John/U ∘ VP/walked/U      SG   1/16 × 1/3 × 1/3
S/NP-walked/U ∘ NP/John/SG                SG   1/16 × 1/4
S/John-VP/SG ∘ VP/walked/U                SG   1/16 × 1/3
P(NUM=SG and yield = John walked) = 35/576 = .061
P(NUM=SG | valid and yield = John walked) = 70/182 = .38

S/NP-VP/U ∘ NP/John/U ∘ VP/walked/PL      PL   2/16 × 1/4 × 1/4
S/NP-VP/PL ∘ NP/John/U ∘ VP/walked/PL     PL   1/16 × 1/3 × 1/3
S/NP-VP/PL ∘ NP/John/U ∘ VP/walked/U      PL   1/16 × 1/3 × 1/3
S/NP-walked/PL ∘ NP/John/U                PL   1/16 × 1/3
S/John-VP/U ∘ VP/walked/PL                PL   1/16 × 1/4
P(NUM=PL and yield = John walked) = 33.5/576 = .058
P(NUM=PL | valid and yield = John walked) = 67/182 = .37

S/NP-VP/U ∘ NP/John/U ∘ VP/walked/U       U    2/16 × 1/4 × 1/4
S/NP-walked/U ∘ NP/John/U                 U    1/16 × 1/4
S/John-VP/U ∘ VP/walked/U                 U    1/16 × 1/4
P(NUM=U and yield = John walked) = 22.5/576 = .039
P(NUM=U | valid and yield = John walked) = 45/182 = .25

Table 1: Derivations and M2 probabilities for John walked

The total probability for the derivations that produce John walked is .158, and the conditional probabilities for the three representations are:

P(NUM=SG | valid and yield = John walked) = .38
P(NUM=PL | valid and yield = John walked) = .37
P(NUM=U  | valid and yield = John walked) = .25

For model M2 the unmarked representation is less likely than under M1, and now there is a slight bias in favor of the value SG over PL. The SG value is favored because it is carried by substitutions for the leftmost word of the utterance and thus reduces competition for subsequent choices. The value PL would be more probable for the sentence People fell.
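Table 1's totals can likewise be verified with exact arithmetic:

```python
from fractions import Fraction as F

# Re-deriving Table 1: each term is one M2 derivation's product of step-wise
# competition probabilities, grouped by the NUM value of the result.
sg = F(2,16)*F(1,4)*F(1,3) + 2 * F(1,16)*F(1,3)*F(1,3) + F(1,16)*F(1,4) + F(1,16)*F(1,3)
pl = F(2,16)*F(1,4)*F(1,4) + 2 * F(1,16)*F(1,3)*F(1,3) + F(1,16)*F(1,3) + F(1,16)*F(1,4)
u  = F(2,16)*F(1,4)*F(1,4) + 2 * F(1,16)*F(1,4)
total = sg + pl + u
print(sg == F(35, 576), round(float(total), 3))             # True 0.158
print([round(float(x / total), 2) for x in (sg, pl, u)])    # [0.38, 0.37, 0.25]
```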
Thus both models give higher probability to the more specific representations. Moreover, M1 assigns the same probability to SG and PL, whereas M2 does not: M2 reflects a left-to-right bias (which might be psycholinguistically interesting, a so-called primacy effect), whereas M1 is, like Tree-DOP, order independent.

It turns out that all LFG-DOP probability models (M1, M2 and M3) display a preference for the most specific representation. This preference partly depends on the number of derivations: specific representations tend to have more derivations than generalized (i.e., unmarked) representations, and consequently tend to get higher probabilities, other things being equal. However, this preference also depends on the number of feature values: the more feature values, the longer the minimal derivation length must be in order to get a preference for the most specific representation (Cormons, forthcoming).

The bias in favor of more specific representations, and consequently fewer Discard-produced feature generalizations, is especially interesting for the interpretation of ill-formed input strings. Bod & Kaplan (1997) show that in analyzing an intuitively ungrammatical string like These boys walks, there is a probabilistic accumulation of evidence for the plural interpretation over the singular and unmarked one (for all models M1, M2 and M3). This is because both These and boys carry the PL feature while only walks is a source for the SG feature, leading to more derivations for the PL reading of These boys walks. In the case of "equal evidence", as in the ill-formed string Boys walks, model M1 assigns the same probability to PL and SG, while models M2 and M3 prefer the PL interpretation due to their left-to-right bias.