<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1305"> <Title>A Developmental Model of Syntax Acquisition in the Construction Grammar Framework with Cross-Linguistic Validation in English and Japanese</Title> <Section position="3" start_page="36" end_page="39" type="metho"> <SectionTitle> 3 Learning Experiments </SectionTitle> <Paragraph position="0"> Three sets of results will be presented. First, the demonstration of the model's sentence-to-meaning mapping for a reduced set of constructions is presented as a proof of concept. This will be followed by a test of generalization to a new, extended set of grammatical constructions.</Paragraph> <Paragraph position="1"> Finally, in order to assess the cross-linguistic validity of the underlying principles, the model is tested with Japanese, a free word-order language that is qualitatively quite distinct from English.</Paragraph> <Section position="1" start_page="36" end_page="37" type="sub_section"> <SectionTitle> 3.1 Proof of Concept with Two Constructions
3.1.1 Initial Learning of Active Forms for Simple Event Meanings </SectionTitle> <Paragraph position="0"> The first experiment examined learning with <sentence, meaning> pairs whose sentences were only in the active voice, corresponding to grammatical forms 1 and 2.</Paragraph> <Paragraph position="1"> 1. Active: The block pushed the triangle.
2. Dative: The block gave the triangle to the moon.</Paragraph> <Paragraph position="2"> For this experiment, the model was trained on 544 <sentence, meaning> pairs. Again, meaning is coded in a predicate-argument format, e.g. push(block, triangle) for sentence 1. During the first 200 trials (scene/sentence pairs), the value a in Eqn. 1 was 1, and thereafter it was 0. This was necessary in order to avoid the effect of erroneous (random) syntactic knowledge on semantic learning in the initial learning stages. (Footnote 6: In Eqn. 7, index i = 1 to 25, corresponding to the size of the scene and word vectors; indices m and k = 1 to 6, corresponding to the dimensions of the predicted scene array and the predicted references array, respectively.)</Paragraph> <Paragraph position="3"> Evaluation of the performance of the model after this training indicated error-free performance for all sentences. That is, the PredictedScene generated from each sentence corresponded to the actual scene paired with that sentence. An important test of language learning is the ability to generalize to new sentences that have not previously been encountered. Generalization of this form also yielded error-free performance. In this experiment, only two grammatical constructions were learned, along with the lexical mapping of words to their scene referents. Word meaning provides the basis for extracting more complex syntactic structure; these word meanings are therefore fixed and used for the subsequent experiments.</Paragraph> <Paragraph position="4"> The second experiment examined learning with the introduction of passive grammatical forms, thus employing grammatical forms 1-4.</Paragraph> <Paragraph position="5"> 3. Passive: The triangle was pushed by the block.
4. Dative Passive: The moon was given to the triangle by the block.</Paragraph> <Paragraph position="6"> A new set of <sentence, scene> pairs was generated that employed two- and three-argument constructions, in both active and passive grammatical forms, for the narration. Word meanings learned in Experiment 1 were used, so only the structural mapping from grammatical structure to scene structure was learned.</Paragraph>
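<Paragraph> To make the format of this training data concrete, the following is a minimal sketch of how such <sentence, meaning> pairs could be represented; the class names (Event, TrainingPair) and the tuple layout are illustrative assumptions, not the model's actual data structures.

```python
# Minimal sketch of the <sentence, meaning> training pairs described
# above. Names and layout are illustrative assumptions, not the paper's
# actual implementation.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Event:
    predicate: str              # e.g. "push" or "give"
    arguments: Tuple[str, ...]  # ordered roles: (agent, object[, recipient])

@dataclass
class TrainingPair:
    sentence: List[str]         # tokenized surface form
    meaning: List[Event]        # one event here; dual events for relatives

pairs = [
    TrainingPair("the block pushed the triangle".split(),
                 [Event("push", ("block", "triangle"))]),
    TrainingPair("the block gave the triangle to the moon".split(),
                 [Event("give", ("block", "triangle", "moon"))]),
]
```
</Paragraph>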
<Paragraph position="7"> With exposure to fewer than 100 <sentence, scene> pairs, error-free performance was achieved. Note that only the WordToReferent mappings were retained from Experiment 1; thus, the four grammatical forms were learned from the initial naive state. This means that the ConstructionIndex and ConstructionInventory mechanism correctly discriminates and learns the mappings for the different grammatical constructions. In the generalization test, the learned values were fixed, and the model demonstrated error-free performance on new sentences, in all four grammatical forms, that had not been used during the training.</Paragraph> <SectionTitle> 3.1.3 Relative Forms for Complex Events </SectionTitle> <Paragraph position="8"> The complexity of the scenes/meanings and corresponding grammatical forms in the previous experiments was quite limited. Here we consider complex <sentence, scene> mappings that involve relativised sentences and dual-event scenes. A small corpus of complex <sentence, scene> pairs was generated, corresponding to grammatical construction types 5-10.</Paragraph> <Paragraph position="9"> After exposure to fewer than 100 sentences generated from these relativised constructions, the model performed without error for these six construction types. In the generalization test, the learned values were fixed, and the model demonstrated error-free performance on new sentences, in all six grammatical forms, that had not been used during the training.</Paragraph> <Paragraph position="10"> The objective of the final experiment was to verify that the model was capable of learning the 10 grammatical forms together in a single learning session. Training material from the previous experiments was employed that exercised the ensemble of 10 grammatical forms. After exposure to fewer than 150 <sentence, scene> pairs, the model performed without error. Likewise, in the generalization test, the learned values were fixed, and the model demonstrated error-free performance on new sentences, in all ten grammatical forms, that had not been used during the training.</Paragraph> <Paragraph position="11"> This set of experiments in ideal conditions demonstrates a proof of concept for the system, though several open questions can be posed on the basis of these results. First, while the demonstration with 10 grammatical constructions is interesting, we can ask whether the model will generalize to an extended set of constructions. Second, we know that English is quite restricted with respect to its word order, and we can thus ask whether the theoretical framework of the model will generalize to free word-order languages such as Japanese. These questions are addressed in the following three sections.</Paragraph> </Section> <Section position="2" start_page="37" end_page="38" type="sub_section"> <SectionTitle> 3.2 Generalization to Extended Construction Set </SectionTitle> <Paragraph position="0"> As illustrated above, the model can accommodate 10 distinct form-meaning mappings, or grammatical constructions, including constructions involving "dual" events in the meaning representation that correspond to relative clauses.</Paragraph>
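<Paragraph> As an illustration of such a dual-event meaning, consider the hypothetical relativised example below; the sentence is invented for illustration, while the predicate-argument coding follows the format used above.

```python
# Hypothetical dual-event meaning for a relativised sentence of the kind
# discussed above; the sentence itself is an invented illustration.
relative_example = {
    "sentence": "the block that pushed the triangle touched the moon".split(),
    "meaning": [("push", ("block", "triangle")),  # event of the relative clause
                ("touch", ("block", "moon"))],    # event of the main clause
}
```
</Paragraph>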
<Paragraph position="1"> Still, this is a relatively limited size for the construction inventory. The current experiment demonstrates how the model generalizes to a number of new and different relative phrases, as well as to additional sentence types, including conjoined (John took the key and opened the door), reflexive (The boy said that the dog was chased by the cat), and reflexive pronoun (The block said that it pushed the cylinder) forms, for a total of 38 distinct abstract grammatical constructions. The consideration of these sentence types requires us to address how their meanings are represented.</Paragraph> <Paragraph position="2"> Conjoined sentences are represented by the two corresponding events, e.g. took(John, key), open(John, door) for the conjoined example above.</Paragraph> <Paragraph position="3"> Reflexives are represented, for example, as said(boy), chased(cat, dog). This assumes that, for reflexive verbs (e.g. said, saw), the meaning representation includes the second event as an argument of the first. Finally, for the reflexive pronoun types, the pronoun's referent is made explicit in the meaning representation, as in said(block), push(block, cylinder) for "The block said that it pushed the cylinder." For this testing, the ConstructionInventory is implemented as a lookup table in which the ConstructionIndex is paired with the corresponding SentenceToScene mapping during a single learning trial (a schematic sketch is given at the end of this section). Based on the tenets of the construction grammar framework (Goldberg 1995), if a sentence is encountered whose form (i.e. ConstructionIndex) does not have a corresponding entry in the ConstructionInventory, then a new construction is defined. Thus, one exposure to a sentence of a new construction type allows the model to generalize to any new sentence of that type. In this sense, developing the capacity to handle a simple initial set of constructions leads to a highly extensible system. Using the training procedures described above, with a pre-learned lexicon (WordToReferent), the model successfully learned all of the constructions, and demonstrated generalization to new sentences on which it had not been trained.</Paragraph> <Paragraph position="4"> That the model can accommodate these 38 different grammatical constructions with no modifications indicates its capacity to generalize. This translates into a (partial) validation of the hypothesis that, across languages, thematic role assignment is encoded by a limited set of parameters, including word order and grammatical marking, and that distinct grammatical constructions have distinct and identifying ensembles of these parameters. However, these results have been obtained with English, which is a relatively fixed word-order language; a more rigorous test of this hypothesis would involve testing with a free word-order language such as Japanese.</Paragraph> </Section>
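<Paragraph> As referenced above, the lookup-table behavior of the ConstructionInventory can be pictured with the following minimal sketch. The class and method names are illustrative assumptions, and the ConstructionIndex is treated abstractly as any hashable key, not as the paper's actual encoding.

```python
# Minimal sketch of the ConstructionInventory as a lookup table, as
# described above. Names and types are illustrative, not the actual
# implementation; the construction index is assumed to be hashable.
class ConstructionInventory:
    def __init__(self):
        self.table = {}  # ConstructionIndex -> SentenceToScene mapping

    def lookup_or_learn(self, construction_index, sentence_to_scene):
        # An unrecognized form defines a new construction in a single
        # learning trial; thereafter, every sentence producing the same
        # index retrieves the same SentenceToScene mapping, which is what
        # permits one-shot generalization to all sentences of that type.
        if construction_index not in self.table:
            self.table[construction_index] = sentence_to_scene
        return self.table[construction_index]
```
</Paragraph>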
<Section position="3" start_page="38" end_page="38" type="sub_section"> <SectionTitle> 3.3 Generalization to Japanese </SectionTitle> <Paragraph position="0"> The current experiment tests the model with sentences in Japanese. Unlike English, Japanese allows extensive liberty in the ordering of words, with grammatical roles explicitly marked by the postpositional function words -ga, -ni, -wo and -ni-yotte.</Paragraph> <Paragraph position="1"> This word-order flexibility of Japanese with respect to English is illustrated here with the English active and passive ditransitive forms, each of which can be expressed in four different common manners in Japanese; the four passive variants are:
2.1 Circle-ga block-ni-yotte triangle-ni watasareta.
2.2 Block-ni-yotte circle-ga triangle-ni watasareta.
2.3 Block-ni-yotte triangle-ni circle-ga watasareta.
2.4 Triangle-ni circle-ga block-ni-yotte watasareta.</Paragraph> <Paragraph position="2"> In the active Japanese sentences, the postpositional function words -ga, -ni and -wo explicitly mark agent, recipient and object, whereas in the passive these are marked by -ni-yotte, -ga and -ni, respectively. For both the active and passive forms, there are four different legal word-order permutations that preserve and rely on this marking. Japanese thus provides an interesting test of the model's ability to accommodate such freedom in word order.</Paragraph> <Paragraph position="3"> Employing the same method as described in the previous experiment, we expose the model to <sentence, meaning> pairs generated from 26 Japanese constructions that employ the equivalents of the active, passive and relative forms and their permutations. We predicted that, by processing the -ga, -ni, -ni-yotte and -wo markers as closed-class elements, the model would be able to discriminate and identify the distinct grammatical constructions and learn the corresponding mappings. Indeed, the model successfully discriminates between all of the construction types, based on the ConstructionIndex unique to each construction type, and associates the correct SentenceToScene mapping with each of them. As with the English constructions, once learned, a given construction could generalize to new untrained sentences.</Paragraph> <Paragraph position="4"> This demonstration with Japanese is an important validation that, at least for this subset of constructions, the construction-based model is applicable both to fixed word-order languages such as English and to free word-order languages such as Japanese. This also provides further validation for the proposal of Bates and MacWhinney (Bates et al. 1982) that thematic roles are indicated by a constellation of cues, including grammatical markers and word order.</Paragraph> </Section>
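<Paragraph> To illustrate how closed-class markers alone can discriminate these word-order permutations, here is a minimal sketch. The separated-token representation, the marker set, and the tuple-shaped index are illustrative assumptions, not the model's actual ConstructionIndex encoding.

```python
# Illustrative sketch: deriving a construction-identifying key from the
# ordered closed-class elements of a sentence. Postpositions are shown as
# separate tokens for simplicity; the marker set and tuple-shaped key are
# assumptions, not the paper's actual ConstructionIndex encoding.
CLOSED_CLASS = {"-ga", "-ni", "-wo", "-ni-yotte"}

def construction_index(tokens):
    # Open-class items are abstracted to a placeholder, so the index
    # reflects only the ordered pattern of closed-class markers.
    return tuple(tok if tok in CLOSED_CLASS else "*" for tok in tokens)

s21 = ["circle", "-ga", "block", "-ni-yotte", "triangle", "-ni", "watasareta"]
s22 = ["block", "-ni-yotte", "circle", "-ga", "triangle", "-ni", "watasareta"]
print(construction_index(s21))   # ('*', '-ga', '*', '-ni-yotte', '*', '-ni', '*')
print(construction_index(s21) != construction_index(s22))  # True: distinct constructions
```
</Paragraph>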
<Section position="4" start_page="38" end_page="39" type="sub_section"> <SectionTitle> 3.4 Effects of Noise </SectionTitle> <Paragraph position="0"> The model relies on the lexical categorization of open- vs. closed-class words, both for learning lexical semantics and for building the ConstructionIndex for phrasal semantics. While we can cite strong evidence that this capability is expressed early in development (Shi et al. 1999), it is still likely that there will be errors in lexical categorization. The performance of the model in learning lexical and phrasal semantics for active transitive and ditransitive structures is thus examined under different levels of lexical categorization error. A lexical categorization error consists of a given word being assigned to the wrong category and processed as such (e.g. an open-class word being processed as a closed-class word, or vice-versa).</Paragraph> <Paragraph position="1"> Figure 2 illustrates the performance of the model with random errors of this type, introduced at levels of 0 to 20 percent (categorization of an open-class word as a closed-class word, or vice-versa), in terms of scene interpretation errors over training epochs. The 0% trace indicates performance in the absence of noise, with a rapid elimination of errors. The successive introduction of categorization errors yields a corresponding progressive impairment in learning. While sensitive to these errors, the system demonstrates the desired graceful degradation, with interpretation errors progressively increasing as categorization errors rise to 20 percent. In order to further assess the learning that occurred in the presence of noise, we then tested performance on noise-free input after training with noise. The interpretation error values in these conditions were 0.0, 0.4, 2.3, 20.7 and 33.6 (out of a maximum of 44) for training with 0, 5, 10, 15 and 20 percent lexical categorization errors, respectively. This indicates that up to 10 percent input lexical categorization errors still allows almost error-free learning. At 15 percent input errors, the model still performs significantly better than random behavior (~45 interpretation errors per epoch).</Paragraph> <Paragraph position="2"> Other than reducing the lexical and phrasal learning rates, no effort was made to optimize performance for these degraded conditions; thus there remains a certain degree of freedom for improvement. The main point is that the model does not fail catastrophically in the presence of lexical categorization errors.</Paragraph>
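<Paragraph> As a schematic picture of this noise manipulation, the sketch below flips each word's open/closed-class label with a given probability before the word is processed; the function name and the two-way category coding are illustrative assumptions, not the model's implementation.

```python
import random

# Schematic sketch of the lexical categorization noise described above:
# with probability `error_rate`, a word's open/closed-class label is
# flipped and the word is then processed under the wrong category. The
# "open"/"closed" coding is an illustrative assumption.
def corrupt_categories(tagged_sentence, error_rate, rng=random):
    corrupted = []
    for word, category in tagged_sentence:
        if rng.random() < error_rate:
            category = "closed" if category == "open" else "open"
        corrupted.append((word, category))
    return corrupted

# Example: at 20% noise, roughly one word in five is miscategorized.
sentence = [("the", "closed"), ("block", "open"),
            ("pushed", "open"), ("the", "closed"), ("triangle", "open")]
noisy = corrupt_categories(sentence, error_rate=0.20)
```
</Paragraph>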
</Section> </Section> <Section position="4" start_page="39" end_page="39" type="metho"> <SectionTitle> 4 Discussion </SectionTitle> <Paragraph position="0"> This research demonstrates an implementation of a model of sentence-to-meaning mapping in the developmental and neuropsychologically inspired construction grammar framework. The strength of the model is that, with relatively simple innate learning mechanisms, it can acquire a variety of grammatical constructions in English and Japanese based on exposure to <sentence, meaning> pairs, with only the lexical categories of open vs. closed class being prespecified. This lexical categorization can be provided by frequency analysis and/or acoustic properties specific to the two classes (Blanc et al. 2003; Shi et al. 1999). The model learns grammatical constructions and generalizes in a systematic manner to new sentences within the class of learned constructions. This demonstrates the cross-linguistic validity of our implementation of the construction grammar approach (Goldberg 1995, Tomasello 2003) and of the cue competition model for the coding of grammatical structure (Bates et al. 1982). The point of the Japanese study was to demonstrate this cross-linguistic validity, i.e. that nothing extra was needed beyond the identification of constructions based on lexical category information. Of course, a better model for Japanese, Hungarian, etc., one that exploits the explicit marking of the grammatical roles of NPs, would have been interesting, but it wouldn't have worked for English! The obvious weakness is that the model does not go further: it cannot accommodate new construction types without first being exposed to a training example of a well-formed <sentence, meaning> pair.</Paragraph> <Paragraph position="1"> Interestingly, however, this appears to reflect a characteristic stage of human development, in which the infant relies on the use of constructions that she has previously heard (see Tomasello 2003). Further on in development, however, as pattern-finding mechanisms operate on statistically relevant samples of this data, the child begins to recognize structural patterns, corresponding for example to noun phrases (rather than solitary nouns) in relative clauses. When this is achieved, these phrasal units can be inserted into existing constructions, thus providing the basis for on-the-fly processing of novel relativised constructions. This suggests how the abstract construction model can be extended to a more generalized compositional capability. We are currently addressing this issue in an extension of the proposed model, in which the recognition of linguistic markers (e.g. "that", and directly successive NPs) is learned to signal embedded relative phrases (see Miikkulainen 1996).</Paragraph> <Paragraph position="2"> Future work will address the impact of ambiguous input. The classical example "John saw the girl with the telescope" implies that a given grammatical form can map onto multiple meaning structures. In order to avoid this violation of the one-to-one form-to-meaning mapping, we must concede that form is influenced by context. Thus, the model will fail in the same way that humans do, and should be able to succeed in the same way that humans do. That is, when context is available to disambiguate, the ambiguity can be resolved. This will require maintenance of the recent discourse context, and its influence on grammatical construction selection to reduce ambiguity.</Paragraph> </Section> </Paper>