<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0509">
  <Title>subject/object learning</Title>
  <Section position="5" start_page="73" end_page="74" type="metho">
    <SectionTitle>
3 A Maximum Entropy model of SOI
</SectionTitle>
    <Paragraph position="0"> The Maximum Entropy (ME) framework offers a mathematically sound way to build a probabilistic model for SOI, which combines different linguistic cues. Given a linguistic context c and an outcome a[?]A that depends on c, in the ME framework the conditional probability distribution p(a|c) is estimated on the basis of the assumption that no a priori constraints must be met other than those related to a set of features fj(a,c) of c, whose distribution is derived from the training data. It can be proven that the probability distribution p satisfying the above assumption is the one with the highest entropy, is unique and has the following expone ntial form (Berger et al. 1996):</Paragraph>
    <Paragraph position="2"> where Z(c) is a normalization factor, fj(a,c) are the values of k features of the pair (a,c) and correspond to the linguistic cues of c that are relevant to predict the outcome a. Features are extracted from the training data and define the constraints that the probabilistic model p must satisfy. The parameters of the distribution a1, ..., ak correspond to weights associated with the features, and determine the relevance of each feature in the overall model. In the experiments reported below feature weights have been estimated with the Generative Iterative Scaling (GIS) algorithm implemented in the AMIS software (Miyao and Tsujii 2002).</Paragraph>
    <Paragraph position="3"> We model SOI as the task of predicting the correct syntactic function f [?] {subject, object} of a noun occurring in a given syntactic context s. This is equivalent to build the conditional probability distribution p(f |s) of having a syntactic function f in a syntactic context s. Adopting the ME approach, the distribution p can be rewritten in the parametric form of (1), with features corresponding to the linguistic contextual cues relevant to SOI. The context s is a pair &lt;vs , ns&gt;, where vs is the verbal head and ns its nominal dependent in s.</Paragraph>
    <Paragraph position="4"> This notion of s departs from more traditional ways of describing an SOI context as a triple of one verb and two nouns in a certain syntactic configuration (e.g, SOV or VOS, etc.). In fact, we assume that SOI can be stated in terms of the more  local task of establishing the grammatical function of a noun n observed in a verb-noun pair. This simplifying assumption is consistent with the claim in MacWhinney et al. (1984) that SVO word order is actually derivative from SV and VO local patterns and downplays the role of the transitive complex construction in sentence processing. Evidence in favour of this hypothesis also comes from corpus data: in ISST, there are 4,072 comp lete subject-verb-object-configurations, a small number if compared to the 11,584 verb tokens appearing with either a subject or an object only. Due to the comparative sparseness of canonical SVO constructions in Italian, it seems more reasonable to assume that children should pay a great deal of attention to both SV and VO units as cues in sentence perception (Matthews et al. 2004). Reconstruction of the whole lexical SVO pattern can accordingly be seen as the end point of an acquisition process whereby smaller units are re-analyzed as being part of more comprehensive constructions. This hypothesis is more in line with a distributed view of canonical constructions as derivative of more basic local positional patterns, working together to yield more complex and abstract constructions. Last but not least, assuming verb-noun pairs as the relevant context for SOI allows us to simultaneously model the interaction of word order variation with pro-drop in Italian.</Paragraph>
  </Section>
  <Section position="6" start_page="74" end_page="79" type="metho">
    <SectionTitle>
4 Feature selection
</SectionTitle>
    <Paragraph position="0"> The most important part of any ME model is the selection of the context features whose weights are to be estimated from data distributions. Our feature selection strategy is grounded on the main assumption that features should correspond to linguistically and psycholinguistically well-motivated contextual cues. This allows us to evaluate the probabilistic model also with respect to its ability to replicate psycholinguistic experimental results and to be consistent with linguistic generalizations.</Paragraph>
    <Paragraph position="1"> Features are binary functions fki,f (f ,s), which test whether a certain cue ki for the function f occurs in the context s. For our ME model of SOI, we have selected the following types of features: Word order tests the position of the noun wrt the verb, for instance:  Animacy tests whether the noun in s is animate or inanimate (cf. SS.2). The centrality of this cue in Italian is widely supported by psycholinguistic evidence. Another source of converging evidence comes from functional and typological linguistic research. For instance, Aissen (2003) argues for the universal value of the following hierarchy representing the relative markedness of the associations between grammatical functions and animacy degrees (with each item in these scale been less marked than the elements to its right):</Paragraph>
    <Section position="1" start_page="74" end_page="75" type="sub_section">
      <SectionTitle>
Animacy Markedness Hierarchy
</SectionTitle>
      <Paragraph position="0"> Subj/Human &gt; Subj/Animate &gt; Subj/Inanimate Obj/Inanimate &gt; Obj/Animate &gt; Obj/Human  Markedness hierarchies have also been interpreted as probabilistic constraints estimated form corpus data (Bresnan et al. 2001, Ovrelid 2004). In our ME model we have used a reduced version of the animacy markedness hierarchy in which human and animate nouns have been both subsumed under the general class animate.</Paragraph>
      <Paragraph position="1"> Definiteness tests the degree of &amp;quot;referentiality&amp;quot; of the noun in a context pair s. Like for animacy, definiteness has been claimed to be associated with grammatical functions, giving rise to the following  According to this hierarchy, subjects with a low degree of definiteness are more marked than subjects with a high degree of definiteness (for objects the reverse pattern holds). Given the importance assigned to the definiteness markedness hierarchy in current linguistic research, we have included the definiteness cue in the ME model. It is worth remarking that, unlike animacy, in psycholinguistic experiments definiteness has not been assigned any effective role in SOI. This makes testing this cue in a computational model even more interesting, as a way to evaluate its effective contribution to Italian SOI. In our experiments, we have used a &amp;quot;compact&amp;quot; version of the definiteness scale: the definiteness cue tests whether the noun in the context  pair i) is a name or a pronoun ii) has a definite article iii), has an indefinite article or iv) is a &amp;quot;bare&amp;quot; noun (i.e. with no article). It is worth saying that &amp;quot;bare&amp;quot; nouns are usually placed at the bottom end of the definiteness scale.</Paragraph>
      <Paragraph position="2"> The three types of features above only refer to nominal cues in the context pairs. Nevertheless, specific lexical properties of the verb can also be resorted to in SOI. The probability for ns to be sub-ject or object may also depend on the specific lexical preferences of vs. To take this lexical factor into account, we add a set of lexical cues to the three general feature types above. Lexical cues test animacy with respect to a specific verb vk:  Lexical features provide evidence of the prope nsity of a given verb to have an animate (inanimate) subject or object. In fact, the verb argument structure and thematic properties may well influence the possible distribution of animate (inanimate) subjects and objects, thus overriding more general tendencies. By including lexical cues, we are thus able to test the interplay of lexical constraints with general grammatical ones.</Paragraph>
      <Paragraph position="3"> Note that in our ME model we have not included agreement as a feature, in spite of its prominent role in Italian. The fact that agreement is often inconclusive for SOI (SS.2) suggests that children must also acquire the ability to deal with the interplay of various concurrent constraints, none of which is singularly sufficient for the task completion this type of competence. It is exactly this area of syntactic competence that we wanted to explore with the experiments reported below (cf.</Paragraph>
      <Paragraph position="4"> MacWhinney et al. 1984, who similarly abstract from the dominant role of case in Ge rman SOI).</Paragraph>
      <Paragraph position="5"> 5 Testing feature configurations for SOI The ME model for Italian SOI has been trained on 18,205 verb-subject/object pairs extracted from ISST. The training set was obtained by extracting all verb-subject and verb-object dependencies headed by an active verb occurring in a finite verbal construction and by excluding all cases where the position of the nominal constituent was grammatically constrained (e.g. clitic objects, relative clauses). Two different feature configurations have been used for training: [?] non-lexical feature configuration (NLC), including only general features acting as global constraints: namely word order, noun animacy and noun definiteness; [?] lexical fe ature configuration (LC), including word order, noun animacy and definiteness, and information about the verb head.</Paragraph>
      <Paragraph position="6"> The test corpus consists of 645 verb-noun pairs extracted from contexts where agreement happens to be neutralized. Of them, 446 contained a subject (either pre- or post-verbal) and 199 contained an object (either pre- or post-verbal). The two feature configurations were evaluated by calculating the percentage of correctly assigned relations over the total number of test pairs (accuracy). As our model always assigns one syntactic relation to each test pair, accuracy equals both standard precision and recall. Finally, we have assumed a baseline score of 69%, corresponding to the result yielded by a dumb model assigning to each test pair the most frequent relation in the training corpus, i.e. subject.</Paragraph>
    </Section>
    <Section position="2" start_page="75" end_page="76" type="sub_section">
      <SectionTitle>
5.1 Non-lexical feature configuration
</SectionTitle>
      <Paragraph position="0"> Our first experiment was carried out with NLC.</Paragraph>
      <Paragraph position="1"> The accuracy on the test corpus is 91.5%; most errors (i.e. 96.4%) relate to the postverbal position, with 44 mistaken subjects (42 inanimate) and 9 mistaken objects (all animate). The score was confirmed by a 10-fold cross-validation on the whole training set (89.3% accuracy).</Paragraph>
      <Paragraph position="2"> A further way to evaluate the goodness of the model is by inspecting the weights associated with  The grey cells in Table 1 highlight the preference of each feature value for either subject or object identif ication: e.g. preverbal subjects are strongly preferred over preverbal objects; animate subjects  are preferred over animate objects, etc. Interestingly, if we rank the Anim and Inanim values for subjects and objects, we can observe tha t they distribute consistently with the Animacy Markedness Hierarchy reported in SS.4: Subj /Anim &gt; Subj/Inanim and Obj/Inanim &gt; Obj/Anim. Sim ilarly, by ranking the values of the definiteness features in the Subj column by decreasing weight values we obtain the following ordering: PronName &gt; DefArt &gt; IndefArt &gt; NoArt, which nicely fits in with the Definiteness Markedness Hierarchy in SS.4. The so-called &amp;quot;markedness reversal&amp;quot; is observed if we focus on the values for the same features in the Obj column: the PronName feature represents the most marked option, followed by DefArt. The only exception is represented by the relative ordering of IndefArt and NoArt which however show very close values.</Paragraph>
      <Paragraph position="3"> Evaluating feature salience In order to evaluate the most reliable cues in Italian SOI, we have analysed the model predictions for different bundles of feature values. For each of the  The model shows a neat preference for subject when the noun is preverbal. Instead, when the noun is postverbal, function assignment is de facto decided by the noun animacy. Conversely, definiteness features have a much more secondary role: the y can re-enforce (or weaken) the preference expressed by animacy, but they do not have the strength to determine SOI.</Paragraph>
      <Paragraph position="4"> The relative salience of the different constraints acting on SOI can also be inferred by comparing the weights associated with individual feature values. For instance, Goldwater and Johnson (2003) show that ME can be successfully applied to learn constraint rankings in Optimality Theory, by assuming the parameter weights a1, ..., ak as the ranking values of the constraints. The following table lists the 16 general constraints of the model by increasing weight values:</Paragraph>
    </Section>
    <Section position="3" start_page="76" end_page="77" type="sub_section">
      <SectionTitle>
Feature Weight
</SectionTitle>
      <Paragraph position="0"> [Table 3: the 16 general constraints of the model, listed by increasing weight value; columns: Feature, Weight.]</Paragraph>
      <Paragraph position="2"> The rankings in Table 3 can be used to derive the relative salience of each constraint. Lower ranked constraints correspond to more marked syntactic config urations that are then disfavoured in SOI.</Paragraph>
      <Paragraph position="3"> Notice that the two animacy constraints Anim_Obj and Anim_Subj are respectively placed near the bottom and the top end of the scale. Notwithstanding the low position of Postverbal_Subj, animacy is thus able to override the word order constraint and to produce a strong tendency to identify animate nouns as subjects, even when they appear in postverbal position (cf. Table 2 above). The constraint ranking thus confirms the interplay between animacy and word order in Italian, with the former playing a decisive role in assigning the syntactic function of postverbal nouns. On the other hand,  the constraints involving noun definiteness occupy a more intermediate position in the general ranking, with very close values. This is again consistent with the less decisive role of this feature type in SOI, as shown above.</Paragraph>
    </Section>
    <Section position="4" start_page="77" end_page="78" type="sub_section">
      <SectionTitle>
5.2 Lexical feature configuration
</SectionTitle>
      <Paragraph position="0"> In this experiment the general features reported in Table 1 have been integrated with 4,316 verb-specific features as the ones exemplified below for the verb dire 'say': dire_animSog 1.228213e+00 dire_noanimSog 7.028484e-01 dire_animOgg 3.645964e-01 dire_noanimOgg 1.321887e+00 whose associated weights show the strong preference of this verb to take animate subjects as opposed to inanimate ones as well as a preference for inanimate objects with respect to animate ones. The results achieved with LC on the test corpus show a significant improvement with respect to those obtained with NLC: the accuracy is now 95.5%, with a 4% improvement, confirmed by a 10-fold cross-validation (94.9%). Also in this case, most of the errors relate to the pos tverbal position (i.e. 27 out of 29), partitioned into 26 mistaken subjects and 1 mistaken object. Lexical features have been resorted to to solve most of the NLC errors (i.e. 34 out of 55). It is interesting to note however that lexical features can also be misleading. The LC results include 8 new errors, suggesting that lexical features do not always provide conclusive evidence: in fact, in 185 cases out of 645 test VN pairs (i.e. 28.7% of the cases) general features are preferred over lexical ones. It is also worth mentioning that the ranking of general animacy and definiteness features in LC actually fits in with the respective markedness hierarchies even with a better approximation than the one produced by NLC. Finally, the relative prominence of the different global features confirms the trend in Table 2, with word order being predominant in pre-verbal pos ition and animacy playing a major role with pos tverbal nouns.</Paragraph>
      <Paragraph position="1"> Both feature configurations of the ME model thus appear to comply with linguistic and psycho-linguistic gene ralizations on SOI. On the linguistic side, the constraints learnt by the model are consistent with universal markedness hierarchies for grammatical relations. Secondly, the prominence of the various constraints in the model fits in well with psycholinguistic data. Consistently with the results in Bates et al. (1984), the model confirms the great impact of noun animacy in Italian, although in this case its key role seems to be more directly limited to the postverbal position. Conversely, the preverbal position is by itself a very strong cue for subject interpretation.</Paragraph>
      <Paragraph position="2"> 6 High frequency verbs and SOI Frequency is known to play a major influence in language learning. In morphology, for example, highly frequent lexical items tend to be shorter forms, more readily accessible in the mental lexicon, independently stored as whole items (Stemberger and MacWhinney 1986) and fairly resistant to morphological overgeneralization through time, thus establishing a correlation between irregular inflected forms and frequency. Frequency has also been assigned a key role in the acquisition of syntactic constructions. In fact, Goldberg (1998) and Ninio (1999) have independently argued for the existence of a causal relation between early exposure to highly frequent light verbs and acquisition of abstract syntax-semantics mappings (constructions). Light verbs such as want, put and go tend to be very frequent, because they are applicable in a wider range of contexts and are learned and used at an early la nguage maturation stage The main idea is that children's early use of these high frequency verbs is conducive to the acquisition of abstract constructional properties generalizing over partic ular instances.</Paragraph>
      <Paragraph position="3"> Goldberg et al. (2004) motivate this hypothesis by observing that light verbs have high input frequency in the child's developmental environment and, at the same time, exhibit a low degree of semantic specialization. Hence, she argues, it takes a little abstraction step for a child to jump from actual instances of use of light verbs to the syntax-semantics association of their underlying construction. On the other hand, Ninio (1999) grounds the facilitatory role of highly frequent verbs on their being &amp;quot;pathbreaking&amp;quot; prototypes of the construction they instantiate, since they are the best models of the relevant combinatorial and semantic properties of their construction in a relatively undiluted fashion. However, in the case of light verb constructions, the correlation between high frequency  and construction prototypicality and extension is tenuous. In fact, it is difficult to argue that frequent light verbs such as see, want or do exhibit a high degree of both semantic and constructional trans itivity (Goldberg et al. 2004). This is reminiscent of the morphological behaviour of very frequent word forms in infle ctional languages, as most of these forms are highly fused and show a general tendency towards irregular inflection and low morphological prototypicality. Furthermore, it is difficult to reconcile the &amp;quot;pathbreaking&amp;quot; view with the observation that frequently observed linguistic units are memorized in full, as unanalyzed wholes.</Paragraph>
    </Section>
    <Section position="5" start_page="78" end_page="79" type="sub_section">
      <SectionTitle>
6.1 Testing the role of frequency
</SectionTitle>
      <Paragraph position="0"> To address these open issues and put the alleged &amp;quot;pathbreaking&amp;quot; role of light verbs to the challenging test of a probabilistic model, we carried out a second battery of experiments to learn the general, non-lexical constraints from two training corpora of roughly equivalent size where overall type and token verb frequencies were controlled for. Both corpora are a subset of the original training set: 1. skewed frequency corpus (SF) - it includes 5,261 context pairs, obtained by selecting 15 verbs occurring more than 100 times in ISST (figures in parentheses give their token frequency): essere 'be' (2406), avere 'have' (708), fare 'do, make' (527), dire 'say, tell' (275), dare 'give' (173), vedere 'see' (134), andare 'go' (126), sembrare 'seem' (124), cercare 'try' (122), mettere 'put' (122), portare 'take' (121), trovare 'find' (112), volere 'want' (105), lasciare 'leave' (105), riu scire 'manage' (101). It is worth noticing that this set includes typical &amp;quot;pathbreaking&amp;quot; verbs; 2. balanced frequency corpus (BF) - this corpus includes 5,373 context pairs selected in such a way to ensure that every verb type in the original training set is attested in BF and occurs at most 6 times. For verbs occurring with a higher frequency, the pairs to be included in BF have been randomly selected. null Thus SF and BF represent two opposite training situations: SF contains few types with very high token frequencies, while BF contains a high number of verb types (i.e. 1457), with very low and uniform token frequency. These training sets resemble the structure of linguistic input used by Goldberg et al. (2004) for their experiments. In that case, one group of subjects was exposed to linguistic inputs in which some verbs occurred with a much higher frequency than the others; a second group of subjects was instead exposed to linguistic stimuli in which every verb occurred with roughly equal frequency. Therefore, by training our ME model on SF and BF we are able to evaluate the effective role of high token frequency verbs in driving syntactic learning.</Paragraph>
      <Paragraph position="1"> The ME model with the general features only (i.e. NLC) was first trained on SF, and then tested on the 645-pair corpus in SS.5, showing a 90% accuracy. The same ME model was then trained on BF, and then tested on the 645-pair corpus, scoring a 87% accuracy. The ME model trained on the skewed frequency data thus outperforms the model trained on BF in a statistically significant way (?2 = 4.97; a=0.05; p-value = 0.025).</Paragraph>
      <Paragraph position="2"> By using a training set formed only by the verbs with the highest token frequency, the model has thus been able to acquire robust syntactic constraints for SOI. Once these constraints have been applied to unseen events, the model has achieved a performance comparable to the one of the general models in SS.5. This is somehow even more signif icant if we consider that the training set was now formed by less than one-third of the pairs on which the models in SS.5 were trained. Data quantity aside, the most relevant fact is that it is the way verb frequencies are distributed to determine the learning path, with a significant positive effect produced by high token frequency verbs. In the model trained on SF, feature ranking is also governed by markedness relations, and the relative prominence of the various constraints is utterly similar to the one discussed in SS.5. In other terms, the results of this experiment prove that frequent verbs are actually able to act as &amp;quot;catalysts&amp;quot; of the syntactic acquis ition process. It is possible for children to converge on the correct generalizations governing SOI in Italian, just by relying on the linguistic evidence provided by the most frequent verbs.</Paragraph>
      <Paragraph position="3"> This view suggests a way out of the apparent paradox of the &amp;quot;pathbreaking&amp;quot; hypothesis: highly frequent verbs can be assumed to provide stable and consistent multiple probabilistic cues for the assignment of subject/object relations. The existence of pos itional patterns that occur with high token frequency may well provide a deeply entrenched and highly salient set of distributional cues that act as probabilistic constraints on constructional ge neralizations. We hypothesize that similar constructions of other less frequent verbs  are processed, for lack of more specific overriding information, in the light of these constraints. Since processing is the result of a &amp;quot;conspiracy&amp;quot; of distributed constraints, &amp;quot;pathbreaking&amp;quot; prototypes need not be real construction exemplars but highly schematic patterns. We proved that highly frequent local positional patterns offer the right sort of constraint conspiracy.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>