<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0604"> <Title>Probing the space of grammatical variation: induction of cross-lingual grammatical constraints from treebanks</Title> <Section position="5" start_page="23" end_page="23" type="metho"> <SectionTitle> 3 Maximum Entropy modelling </SectionTitle> <Paragraph position="0"> The MaxEnt framework offers a mathematically sound way to build a probabilistic model for SOI, which combines different linguistic cues. Given a linguistic context c and an outcome a[?]A that depends on c, in the MaxEnt framework the conditional probability distribution p(a|c) is estimated on the basis of the assumption that no a priori constraints must be met other than those related to a set of features fj(a,c) of c, whose distribution is derived from the training data. It can be proven that the probability distribution p satisfying the above assumption is the one with the highest entropy, is unique and has the following exponential form (Berger et al. 1996):</Paragraph> <Paragraph position="2"> where Z(c) is a normalization factor, fj(a,c) are the values of k features of the pair (a,c) and correspond to the linguistic cues of c that are relevant to predict the outcome a. Features are extracted from the training data and define the constraints that the probabilistic model p must satisfy. The parameters of the distribution a1, ..., ak correspond to weights associated with the features, and determine the relevance of each feature in the overall model. In the experiments reported below feature weights have been estimated with the Generative Iterative Scaling (GIS) algorithm implemented in the AMIS software (Miyao and Tsujii 2002).</Paragraph> <Paragraph position="3"> We model SOI as the task of predicting the correct syntactic function ph [?] {subject, object} of a noun occurring in a given syntactic context s. This is equivalent to building the conditional probability distribution p(ph|s) of having a syntactic function ph in a syntactic context s.</Paragraph> <Paragraph position="4"> Adopting the MaxEnt approach, the distribution p can be rewritten in the parametric form of (1), with features corresponding to the linguistic contextual cues relevant to SOI. The context s is a pair <vs, ns>, where vs is the verbal head and ns its nominal dependent in s. This notion of s departs from more traditional ways of describing an SOI context as a triple of one verb and two nouns in a certain syntactic configuration (e.g, SOV or VOS, etc.). In fact, we assume that SOI can be stated in terms of the more local task of establishing the grammatical function of a noun n observed in a verb-noun pair. This simplifying assumption is consistent with the claim in MacWhinney et al. (1984) that SVO word order is actually derivative from SV and VO local patterns and downplays the role of the transitive complex construction in sentence processing.</Paragraph> <Paragraph position="5"> Evidence in favour of this hypothesis also comes from corpus data: for instance, in ISST complete subject-verb-object configurations represent only 26% of the cases, a small percentage if compared to the 74% of verb tokens appearing with either a subject or an object only; a similar situation can be observed in PDT where complete subject-verb-object configurations occur in only 20% of the cases. 
Due to the comparative sparseness of canonical SVO constructions in Czech and Italian, it seems more reasonable to assume that children pay a great deal of attention to both SV and VO units as cues in sentence perception (Matthews et al. in press).</Paragraph> <Paragraph position="6"> Reconstruction of the whole lexical SVO pattern can accordingly be seen as the end point of an acquisition process whereby smaller units are re-analyzed as being part of more comprehensive constructions. This hypothesis is more in line with a distributed view of canonical constructions as derivative of more basic local positional patterns, working together to yield more complex and abstract constructions. Last but not least, taking verb-noun pairs as the relevant context for SOI allows us to simultaneously model the interaction of word order variation with pro-drop.</Paragraph> </Section> <Section position="6" start_page="23" end_page="24" type="metho"> <SectionTitle> 4 Feature selection </SectionTitle> <Paragraph position="0"> The most important part of any MaxEnt model is the selection of the context features whose weights are to be estimated from data distributions. Our feature selection strategy is grounded in the assumption that features should correspond to theoretically and typologically well-motivated contextual cues.</Paragraph> <Paragraph position="1"> This also allows us to evaluate the probabilistic model with respect to its consistency with current linguistic generalizations. In turn, the model can be used as a probe into the correspondence between theoretically motivated constraints and their actual distribution in corpus data. Animacy. This is the main semantic feature, which tests whether the noun in s is animate or inanimate (cf. section 2). The centrality of this cue for grammatical relation assignment is widely supported by typological evidence (cf. Aissen 2003, Croft 2003).</Paragraph> <Paragraph position="2"> The Animacy Markedness Hierarchy - representing the relative markedness of the associations between grammatical functions and animacy degrees - is actually assigned the role of a functional universal principle in grammar. The hierarchy is reported below, with each item in these scales being less marked than the elements to its right: Subj/Human > Subj/Animate > Subj/Inanimate; Obj/Inanimate > Obj/Animate > Obj/Human. Markedness hierarchies have also been interpreted as probabilistic constraints estimated from corpus data (Bresnan et al. 2001). In our MaxEnt model we have used a reduced version of the animacy markedness hierarchy in which human and animate nouns have both been subsumed under the general class animate.</Paragraph> <Paragraph position="3"> Definiteness tests the degree of "referentiality" of the noun in a context pair s. As with animacy, definiteness has been claimed to be associated with grammatical functions, giving rise to the following universal markedness hierarchy (Aissen 2003):</Paragraph> <Section position="1" start_page="24" end_page="24" type="sub_section"> <SectionTitle> Definiteness Markedness Hierarchy </SectionTitle> <Paragraph position="0"> Subj/Pronoun > Subj/Name > Subj/Definite > Subj/Indefinite > Subj/Non-specific; Obj/Non-specific > Obj/Indefinite > Obj/Definite > Obj/Name > Obj/Pronoun.</Paragraph> <Paragraph position="2"> According to this hierarchy, subjects with a low degree of definiteness are more marked than subjects with a high degree of definiteness (for objects the reverse pattern holds). Given the importance assigned to the definiteness markedness hierarchy in current linguistic research, we have included the definiteness cue in the MaxEnt model.
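To make the connection with the parametric form in (1) concrete, the sketch below shows one way of encoding such cues as binary features of a verb-noun pair and of combining them into p(φ|s). The attribute names, the feature inventory and the toy weights are illustrative assumptions only, not the actual feature set or the weights estimated with AMIS.

    def features(pair, outcome):
        """Active contextual cues f_j(a, c) for reading the noun of a
        verb-noun pair as `outcome` ('subj' or 'obj')."""
        cues = {
            "agreement": pair["agrees_with_verb"],
            "preverbal": pair["noun_precedes_verb"],
            "animate": pair["noun_is_animate"],
            "pron_name": pair["definiteness"] == "PronName",
            "def_article": pair["definiteness"] == "DefArt",
        }
        # One feature per (cue, candidate function) combination.
        return {name + "&" + outcome for name, value in cues.items() if value}

    def p(outcome, pair, alpha):
        """p(a|c) = (1/Z(c)) * prod_j alpha_j^f_j(a,c), with binary features."""
        def score(a):
            prod = 1.0
            for f in features(pair, a):
                prod *= alpha.get(f, 1.0)  # unseen features leave the score unchanged
            return prod
        z = score("subj") + score("obj")   # normalization factor Z(c)
        return score(outcome) / z

    # Toy weights, loosely patterned after the kind of values reported in Table 4.
    alpha = {"agreement&subj": 4.0, "preverbal&subj": 3.0, "animate&subj": 2.0,
             "agreement&obj": 0.5, "preverbal&obj": 0.4, "animate&obj": 0.7}
    pair = {"agrees_with_verb": True, "noun_precedes_verb": False,
            "noun_is_animate": True, "definiteness": "DefArt"}
    print(p("subj", pair, alpha))  # about 0.96: the noun is read as a subject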
In our experiments, for Italian we used a compact version of the definiteness scale: the definiteness cue tests whether the noun in the context pair i) is a name or a pronoun, ii) has a definite article, iii) has an indefinite article, or iv) is a bare noun (i.e. with no article). It is worth noting that bare nouns are usually placed at the bottom end of the definiteness scale. Since Czech has no articles, we only distinguish between proper names and common nouns.</Paragraph> </Section> </Section> <Section position="7" start_page="24" end_page="26" type="metho"> <SectionTitle> 5 Testing the model </SectionTitle> <Paragraph position="0"> The Italian MaxEnt model was trained on 14,643 verb-subject/object pairs extracted from ISST.</Paragraph> <Paragraph position="1"> For Czech, we used a training corpus of 37,947 verb-subject/object pairs extracted from PDT. In both cases, the training set was obtained by extracting all verb-subject and verb-object dependencies headed by an active verb, excluding all cases where the position of the nominal constituent is grammatically determined (e.g. clitic objects, relative clauses). It is interesting to note that in both training sets the proportion of subject and object relations is nearly the same: 63.06% and 65.93% verb-subject pairs and 36.94% and 34.07% verb-object pairs for Italian and Czech respectively.</Paragraph> <Paragraph position="2"> The test corpus consists of a set of verb-noun pairs randomly extracted from the reference treebanks: 1,000 pairs for Italian and 1,373 for Czech. For Italian, 559 pairs contained a subject and 441 an object; for Czech, 905 pairs contained a subject and 468 an object.</Paragraph> <Paragraph position="3"> Evaluation was carried out by calculating the percentage of correctly assigned relations over the total number of test pairs (accuracy). As our model always assigns exactly one syntactic relation to each test pair, accuracy equals both standard precision and recall.</Paragraph> <Section position="1" start_page="25" end_page="26" type="sub_section"> <SectionTitle> Italian </SectionTitle> <Paragraph position="0"> We assumed a baseline score of 56% for Italian and 66% for Czech, corresponding to the result yielded by a naive model assigning to each test pair the most frequent relation in the training corpus, i.e. subject. Experiments were carried out with the general features illustrated in section 4: verb agreement, case (for Czech only), word order, noun animacy and noun definiteness.</Paragraph> <Paragraph position="1"> Accuracy on the test corpus is 88.4% for Italian and 85.4% for Czech. A detailed error analysis for the two languages is reported in Table 3, showing that in both languages subject identification is particularly problematic. In Czech, prototypically mistaken subjects are post-verbal (71.14%), inanimate (72.64%), ambiguously case-marked (70.65%) and agreeing with the verb (70.15%), where the reported percentages refer to the whole error set. Likewise, Italian mistaken subjects typically occur in post-verbal position (71.55%), are mostly inanimate (64.66%) and agree with the verb (61.21%). Interestingly, in both languages, the highest number of errors occurs when a) the noun has the least prototypical syntactic and semantic properties for O or S (relative to word order and noun animacy) and b) morpho-syntactic features such as agreement and case are neutralised.
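These figures can be reproduced with a simple evaluation routine. The sketch below assumes test records annotated with the gold relation, the predicted relation and the relevant cues, under hypothetical field names; with the test sets described above it yields the accuracy and baseline scores reported here.

    def evaluate(test_pairs):
        """Accuracy, majority-class baseline, and an error profile
        computed over the whole error set."""
        gold = [t["gold"] for t in test_pairs]
        pred = [t["predicted"] for t in test_pairs]

        accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
        majority = max(set(gold), key=gold.count)   # 'subject' in both treebanks
        baseline = gold.count(majority) / len(gold)

        errors = [t for t in test_pairs if t["gold"] != t["predicted"]]
        def share(condition):                       # fraction of the error set
            return sum(condition(e) for e in errors) / len(errors) if errors else 0.0

        profile = {
            "post-verbal": share(lambda e: not e["noun_precedes_verb"]),
            "inanimate":   share(lambda e: not e["noun_is_animate"]),
            "agreeing":    share(lambda e: e["agrees_with_verb"]),
        }
        return accuracy, baseline, profile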
This shows that MaxEnt is able to home in on the core linguistic properties that govern the distribution of S and O in Italian and Czech, while remaining uncertain in the face of somewhat peripheral and occasional cases.</Paragraph> <Paragraph position="2"> A further way to evaluate the goodness of fit of our model is to inspect the weights associated with feature values in the two languages. They are reported in Table 4, where grey cells highlight the preference of each feature value for either subject or object identification. In both languages, agreement with the verb strongly relates to the subject relation. For Czech, nominative case is strongly associated with subjects, while the other cases are associated with objects. Moreover, in both languages preverbal subjects are strongly preferred over preverbal objects; animate subjects are preferred over animate objects; and pronouns and proper names are typically subjects.</Paragraph> <Paragraph position="3"> Let us now try to relate these feature values to the markedness hierarchies reported in section 4. Interestingly enough, if we rank the Italian Anim and Inanim values for subjects and objects, we observe that they distribute consistently with the Animacy Markedness Hierarchy: Subj/Anim > Subj/Inanim and Obj/Inanim > Obj/Anim. This is confirmed by the Czech results. Similarly, by ranking the Italian values for the definiteness features in the Subj column by decreasing weight we obtain the following ordering: PronName > DefArt > IndefArt > NoArt, which fits nicely with the Definiteness Markedness Hierarchy in section 4. The so-called "markedness reversal" is replicated with a good degree of approximation if we focus on the values for the same features in the Obj column: the PronName feature represents the most marked option, followed by IndefArt, DefArt and NoArt (the latter two showing the same feature value). The only exception is the relative ordering of IndefArt and DefArt, which, however, show very close values. The same seems to hold for Czech, where the feature ordering for Subj is PronName > DefArt/IndefArt/NoArt and the reverse is observed for Obj.</Paragraph> <Section position="2" start_page="26" end_page="26" type="sub_section"> <SectionTitle> 5.1 Evaluating comparative feature salience </SectionTitle> <Paragraph position="0"> The relative salience of the different constraints acting on SOI can be inferred by comparing the weights associated with individual feature values. For instance, Goldwater and Johnson (2003) show that MaxEnt can successfully be applied to learn constraint rankings in Optimality Theory, by taking the parameter weights <α1, ..., αk> as the ranking values of the constraints.</Paragraph> <Paragraph position="1"> Table 5 illustrates the constraint ranking for the two languages, ordered by decreasing weight values for both S and O. Note that, although not all constraints are applicable in both languages, the weights associated with the applicable constraints exhibit the same relative salience in Czech and Italian.
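Under the reading proposed by Goldwater and Johnson (2003), such a ranking can be obtained simply by sorting feature values by their estimated weights; the sketch below does so and checks whether the induced ordering respects a given markedness scale. The weight values shown are made up for illustration and are not those of Tables 4 and 5.

    def ranking(weights, outcome):
        """Constraints relevant to `outcome`, ordered by decreasing weight,
        i.e. by decreasing ranking value in the OT reading of the model."""
        relevant = {f: w for f, w in weights.items() if f.endswith("&" + outcome)}
        return sorted(relevant, key=relevant.get, reverse=True)

    def respects(order, scale):
        """True if the features in `scale` occur in `order` in that relative order."""
        positions = [order.index(f) for f in scale if f in order]
        return positions == sorted(positions)

    # Illustrative weights only, not the estimated Italian values.
    weights = {"PronName&subj": 3.2, "DefArt&subj": 1.8, "IndefArt&subj": 1.1,
               "NoArt&subj": 0.6, "Anim&subj": 2.4, "Inanim&subj": 0.8}
    print(ranking(weights, "subj"))
    print(respects(ranking(weights, "subj"),
                   ["PronName&subj", "DefArt&subj", "IndefArt&subj", "NoArt&subj"]))  # True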
This seems to suggest the existence of a rather dominant (if not universal) salience scale of S and O processing constraints, in spite of the considerable difference in the marking strategies adopted by the two languages.</Paragraph> <Paragraph position="2"> As the relative weight of each constraint crucially depends on its overall interaction with the other constraints on a given processing task, absolute weight values can vary considerably from language to language, with a resulting impact on the distribution of S and O constructions. For example, the possibility of overtly and unambiguously marking a direct object with case inflection leaves more room for the preverbal use of objects in Czech. Conversely, the lack of case marking in Italian considerably limits the preverbal distribution of direct objects. This evidence, however, appears to be an epiphenomenon of the interaction of fairly stable and invariant preferences, reflecting common functional tendencies in language processing. As shown in Table 5, while the constraint ranking largely confirms the interplay between animacy and word order in Italian, Czech does not contradict it but rather re-modulates it somewhat, due to the "perturbation" factors introduced by its richer battery of case markers.</Paragraph> </Section> </Section> </Paper>