File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/04/w04-1303_metho.xml
Size: 26,730 bytes
Last Modified: 2025-10-06 14:09:16
<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1303"> <Title>Putting Meaning into Grammar Learning</Title> <Section position="3" start_page="18" end_page="20" type="metho"> <SectionTitle> 2 Overview of the learning problem </SectionTitle> <Paragraph position="0"> We begin with an informal description of our learning task, to be formalized below. At all stages of language learning, children are assumed to exploit general cognitive abilities to make sense of the flow of objects and events they experience. To make sense of linguistic events -- sounds and gestures used in their environments for communicative purposes -- they also draw on specifically linguistic knowledge of how forms map to meanings, i.e., constructions. Comprehension consists of two stages: identifying the constructions involved and how their meanings are related (analysis), and matching these constructionally sanctioned meanings to the actual participants and relations present in context (resolution). The set of linguistic constructions will typically provide only a partial analysis of the utterance in the given context; when this happens, the agent may still draw on general inference to match even a partial analysis to the context. The goal of construction learning is to acquire a useful set of constructions, or grammar. This grammar should allow constructional analysis to produce increasingly complete interpretations of utterances in context, thus requiring minimal recourse to general resolution and inference procedures. In the limit the grammar should stabilize, while still being useful for comprehending novel input. A useful grammar should also reflect the statistical properties of the input data, in that more frequent or specific constructions should be learned before less frequent and more general constructions.</Paragraph> <Paragraph position="1"> Formally, we define our learning task as follows: given an initial grammar G and a sequence of training examples consisting of an utterance paired with its context, find the best grammar G' to fit seen data and generalize to new data. The remainder of this section describes the hypothesis space, prior knowledge and input data relevant to the task.</Paragraph> <Section position="1" start_page="18" end_page="19" type="sub_section"> <SectionTitle> 2.1 Hypothesis space: embodied constructions </SectionTitle> <Paragraph position="0"> The space of possible grammars (or sets of constructions) is defined by Embodied Construction Grammar (ECG), a computationally explicit unification-based formalism for capturing insights from the construction grammar and cognitive linguistics literature (Bergen and Chang, in press; Chang et al., 2002). ECG is designed to support the analysis process mentioned above, which determines what constructions and schematic meanings are present in an utterance, resulting in a semantic specification (or semspec).1 [Footnote 1: ECG is intended to support a simulation-based model of language understanding, with the semspec parameterizing a simulation using active representations (or embodied schemas) to produce context-sensitive inferences. See Bergen and Chang (in press) for details.] We highlight a few relevant aspects of the formalism, exemplified in Figure 1. Each construction has sections labeled form and meaning listing the entities (or roles) and constraints (type constraints marked with :, filler constraints marked with ←, and identification (or coindexation) constraints marked with ↔) of the respective domains. These two sections, also called the form and meaning poles, capture the basic intuition that constructions are form-meaning pairs. (As a toy illustration, such a form-meaning pair might be rendered as the data structure sketched below.)
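The formal ECG notation appears in Figure 1; purely as an illustration, a lexical form-meaning pair such as THROW might be rendered as a simple data structure. The Python encoding and field names below are our assumptions, not the paper's implementation:

```python
# Illustrative only: a toy rendering of an ECG-style form-meaning pair.
# Role and type names follow the paper's THROW example; the encoding is ours.
from dataclasses import dataclass, field

@dataclass
class Pole:
    type: str                                     # type constraint (":")
    roles: dict = field(default_factory=dict)     # role -> filler ("<-")

@dataclass
class Construction:
    name: str
    form: Pole                                    # form pole (subscript f)
    meaning: Pole                                 # meaning pole (subscript m)
    constituents: list = field(default_factory=list)
    constraints: list = field(default_factory=list)  # coindexation ("<->")

# Lexical THROW: a form whose orth role is bound to the string "throw",
# paired with a meaning constrained to be of type Throw.
THROW = Construction(
    name="THROW",
    form=Pole(type="Word", roles={"orth": "throw"}),
    meaning=Pole(type="Throw"),
)
```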
A subscripted f or m allows reference to the form or meaning pole of any construction, and the keyword self allows self-reference. Thus, the THROW construction simply links a form whose orthography role (or feature) is bound to the string &quot;throw&quot; to a meaning that is constrained to be of type Throw, a separately defined conceptual schema corresponding to throwing events (including roles for a thrower and throwee). (Although not shown in the examples, the formalism also includes a subcase of notation for expressing constructional inheritance.) Figure 1 shows the representation of the lexical THROW and the lexically specific THROW-TRANSITIVE construction (licensing expressions like You throw the ball).</Paragraph> <Paragraph position="1"> Multi-unit constructions such as the THROW-TRANSITIVE construction also list their constituents, each of which is itself a form-meaning construction. These multi-unit constructions serve as the target representation for the specific learning task at hand. The key representational insight here is that the form and meaning constraints typically involve relations among the form and meaning poles of the constructional constituents.</Paragraph> <Paragraph position="2"> For current purposes we limit the potential form relations to word order, although many other form relations are in principle allowed. In the meaning domain, the primary relation is identification, or unification, between two meaning entities. In particular, we will focus on role-filler bindings, in which a role of one constituent is identified with another constituent or with one of its roles. The example construction pairs two word order constraints over its constituents' form poles with two identification constraints over its constituents' meaning poles (these specify the fillers of the thrower and throwee roles of a Throw event, respectively).</Paragraph> <Paragraph position="3"> Note that both lexical constructions and the multi-unit constructions needed to express grammatical patterns can be seen as graphs of varying complexity. Each domain (form or meaning) can be represented as a subgraph of elements and relations among them. Lexical constructions involve a simple mapping between these two subgraphs, whereas complex constructions with constituents require structured relational mappings over the two domains, that is, mappings between form and meaning relations whose arguments are themselves linked by known constructions.</Paragraph> </Section> <Section position="2" start_page="19" end_page="20" type="sub_section"> <SectionTitle> 2.2 Prior knowledge </SectionTitle> <Paragraph position="0"> The model makes a number of assumptions based on the child language literature about prior knowledge brought to the task, including conceptual knowledge, lexical knowledge and the language comprehension process described earlier. Figure 2 depicts how these are related in a simple example; each is described in more detail below.</Paragraph> <Paragraph position="1"> [Figure 2: Analysis of the utterance I throw the ball, with form elements on the left, meaning elements (conceptual schemas) on the right and constructions linking the two domains in the center.] Conceptual knowledge is represented using an ontology of typed feature structures, or schemas, as sketched below.
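A minimal sketch of such a schema ontology follows; the Throw roles track the paper's description, while the other types, the parent links and the encoding itself are illustrative assumptions:

```python
# Toy schema ontology: typed feature structures with role type constraints.
# Throw's roles (thrower, throwee) follow the paper; everything else is ours.
SCHEMAS = {
    "Entity": {"parent": None,     "roles": {}},
    "Human":  {"parent": "Entity", "roles": {}},
    "Ball":   {"parent": "Entity", "roles": {}},
    "Action": {"parent": "Entity", "roles": {}},
    "Throw":  {"parent": "Action", "roles": {"thrower": "Human",
                                             "throwee": "Entity"}},
}

def subtype(t, ancestor):
    """Walk the parent chain to test a type constraint."""
    while t is not None:
        if t == ancestor:
            return True
        t = SCHEMAS[t]["parent"]
    return False

assert subtype("Ball", "Entity")       # a ball may fill the throwee role
assert not subtype("Ball", "Human")    # but not the thrower role
```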
These include schemas for people, objects (e.g. Ball in the figure), locations, and actions familiar to children by the time they enter the two-word stage (typically toward the end of the second year). Actions like the Throw schema referred to in the example THROW construction and in the figure have roles whose fillers are subject to type constraints, reflecting children's knowledge of what kinds of entities can take part in different events.</Paragraph> <Paragraph position="2"> The input to learning includes a set of lexical constructions, represented using the ECG formalism, linking simple forms (i.e. words) to specific conceptual items. Examples of these include the I and BALL constructions in the figure, as well as the THROW construction formally defined in Figure 1. Lexical learning is not the focus of the current work, but a number of previous computational approaches have shown how simple mappings may be acquired from experience (Regier, 1996; Bailey, 1997; Roy and Pentland, 1998).</Paragraph> <Paragraph position="3"> As mentioned earlier, the ECG construction formalism is designed to support processes of language use. In particular, the model makes use of a construction analyzer that identifies the constructions responsible for a given utterance, much as a syntactic parser in a traditional language understanding system identifies which parse rules are responsible. In this case, however, the basic representational unit is a form-meaning pair. The analyzer must therefore also supply a semantic interpretation, called the semspec, indicating which conceptual schemas are involved and how they are related. The analyzer is also required to be robust to input that is not covered by its current grammar, since that situation is the norm during language learning.</Paragraph> <Paragraph position="4"> Bryant (2003) describes an implemented construction analyzer program that meets these needs. The construction analyzer takes as input a set of ECG constructions (linguistic knowledge), a set of ECG schemas (conceptual knowledge) and an utterance. The analyzer draws on partial parsing techniques previously applied to syntactic parsing (Abney, 1996): utterances not covered by known constructions yield partially filled semspecs, and unknown forms in the input are skipped. As a result, even a small set of simple constructions can provide skeletal interpretations of complex utterances.</Paragraph> <Paragraph position="5"> Figure 2 gives an iconic representation of the result of analyzing the utterance I throw the ball using the THROW-TRANSITIVE and THROW constructions shown earlier, along with some additional lexical constructions (not shown). The analyzer matches each input form with its lexical construction (if available) and corresponding meaning, and then matches the clausal construction by checking the relevant word order relations (implicitly represented by the dotted arrow in the figure) and role bindings (denoted by the double-headed arrows within the meaning domain) asserted on its candidate constituents. Note that at the stage shown, no construction for the has yet been learned, resulting in a partial analysis; a toy sketch of this skip-and-match behavior follows.
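The actual analyzer of Bryant (2003) is a chart-based partial parser and far richer than this; the lexicon and representation below are our own illustrative stand-ins:

```python
# Toy partial analysis: match each word against known lexical constructions,
# skip unknown forms, and return a partial semspec (a list of schema labels).
LEXICON = {"i": "Speaker", "you": "Addressee",
           "throw": "Throw", "ball": "Ball"}

def analyze(utterance):
    semspec, skipped = [], []
    for word in utterance.lower().split():
        if word in LEXICON:
            semspec.append({"schema": LEXICON[word], "word": word})
        else:
            skipped.append(word)    # robustness: unknown forms are skipped
    return semspec, skipped

semspec, skipped = analyze("I throw the ball")
# -> schemas for Speaker, Throw and Ball; "the" is skipped, mirroring the
#    partial analysis described above (no construction for "the" yet).
```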
At an even earlier stage of learning, before the THROW-TRANSITIVE construction is learned, the lexical constructions are matched without producing the role-filler bindings on the Throw action schema.</Paragraph> <Paragraph position="6"> Finally, note that the semspec produced by constructional analysis (right-hand side of the figure) must be matched to the current situational context using a contextual interpretation, or resolution, process. Like other resolution (e.g. reference resolution) procedures, this process relies on category/type constraints and (provisional) identification bindings. The resolution procedure attempts to unify each schema and constraint appearing in the semspec with a type-compatible entity or relation in the context. In the example, the schemas on the right-hand side of the figure should be identified during resolution with particular schema instances available in context (e.g., the Speaker schema should be linked to the specific contextually available discourse speaker, the Ball schema to a particular ball instance, etc.).</Paragraph> </Section> <Section position="3" start_page="20" end_page="20" type="sub_section"> <SectionTitle> 2.3 Input data </SectionTitle> <Paragraph position="0"> The input is characterized as a set of input tokens, each consisting of an utterance form (a string of known and novel word-forms) paired with a specific communicative context (a set of linked conceptual schemas corresponding to the participants, salient scene and discourse information available in the situation). The learning model receives only positive examples, as in the child learning case. Note, however, that the interpretation a given utterance has in context depends on the current state of linguistic knowledge. Thus the same utterance at different stages may lead to different learning behavior.</Paragraph> <Paragraph position="1"> The specific training corpus used in learning experiments is a subset of the Sachs corpus of the CHILDES database of parent-child transcripts (Sachs, 1983; MacWhinney, 1991), with additional annotations made by developmental psychologists as part of a study of motion utterances (Dan I. Slobin, p.c.). These annotations indicate semantic and pragmatic features available in the scene. A simple feature structure representation of a sample input token is shown here; boxed numbers indicate that the relevant entities are identified: [Figure: feature structure representation of a sample input token, pairing the utterance form with linked schemas for the participants, the salient scene and the discourse context.]</Paragraph> <Paragraph position="3"> Many details have been omitted, and a number of simplifying assumptions have been made. But the rough outline given here nevertheless captures the core computational problem faced by the child learner in acquiring multi-word constructions in a framework putting meaning on par with form.</Paragraph> </Section> </Section> <Section position="4" start_page="20" end_page="23" type="metho"> <SectionTitle> 3 Learning algorithms </SectionTitle> <Paragraph position="0"> We model the learning task as a search through the space of possible grammars, with new constructions incrementally added based on encountered data.
As in the child learning situation, the goal of learning is to converge on an optimal set of constructions, i.e., a grammar that is both general enough to encompass significant novel data and specific enough to accurately predict previously seen data.</Paragraph> <Paragraph position="1"> A suitable overarching computational framework for guiding the search is provided by the minimum description length (MDL) heuristic (Rissanen, 1978), which is used to find the optimal analysis of data in terms of (a) a compact representation of the data (i.e., a grammar); and (b) a compact means of describing the original data in terms of the compressed representation (i.e., constructional analyses using the grammar). The MDL heuristic exploits a tradeoff between competing preferences for smaller grammars (encouraging generalization) and for simpler analyses of the data (encouraging the retention of specific/frequent constructions).</Paragraph> <Paragraph position="2"> The rest of this section makes the learning framework concrete. Section 3.1 describes several heuristics for moving through the space of grammars (i.e., how to update a grammar with new constructions based on input data), and Section 3.2 describes how to choose among these candidate moves to find optimal points in the search space (i.e., specific MDL criteria for evaluating new grammars). These specifications extend previous methods to accommodate the relational structures of the ECG formalism and the process-based assumptions of the model.</Paragraph> <Section position="1" start_page="21" end_page="21" type="sub_section"> <SectionTitle> 3.1 Updating the grammar </SectionTitle> <Paragraph position="0"> The grammar may be updated in three ways: hypothesis, forming new structured maps to account for mappings present in the input but unexplained by the current grammar; reorganization, exploiting regularities in the set of known constructions (merging two similar constructions into a more general construction, or composing two constructions that co-occur into a larger construction); and reinforcement, incrementing the weight associated with constructions that are successfully used during comprehension.</Paragraph> <Paragraph position="1"> Hypothesis. The first operation addresses the core computational challenge of learning new structured maps. The key idea here is that the learner is assumed to have access to a partial analysis based on linguistic knowledge, as well as a fuller situation interpretation it can infer from context. Any difference between the two can directly prompt the formation of new constructions that will improve the agent's ability to handle subsequent instances of similar utterances in similar contexts. In particular, certain form and meaning relations that are unmatched by the analysis but present in context may be mapped using the procedure in Figure 3.</Paragraph> <Paragraph position="2"> Figure 3: Hypothesize construction. Given utterance u in situational context C and current grammar G: 1. Call the construction analysis/resolution processes on (u, C, G) to produce a semspec consisting of form and meaning graphs F and M. Nodes and edges of F and M are marked as matched or unmatched by the analysis. 2. Find rel_f(A_f, B_f), an unmatched edge in F corresponding to an unused form relation over the matched form poles of two constructs A and B. 3. Find rel_m(A_m, B_m), an unmatched edge (or subgraph) in M corresponding to an unused meaning relation (or set of bindings) over the corresponding matched meaning poles A_m and B_m, such that rel_f and rel_m are pseudo-isomorphic. 4. Create a new construction with constituents A and B, pairing the form constraint rel_f with the meaning constraint rel_m.
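A compressed sketch of this procedure follows, under strong simplifying assumptions: word order is taken as the only form relation, a single role-filler binding as the only meaning relation, and only the simplest (strictly isomorphic) case of the pseudo-isomorphism test is checked. All data structures and names are illustrative, not the paper's:

```python
# Toy version of "hypothesize construction": pair an unused word-order
# relation with an unused contextual role binding over the same two
# constituents.

def hypothesize(constituents, context_bindings, used_bindings):
    """constituents: dicts with 'name', 'pos' (word index), 'schema'.
    context_bindings: set of (schema, role, filler_schema) seen in context.
    used_bindings: bindings already explained by the current grammar."""
    new_cxns = []
    for a in constituents:
        for b in constituents:
            if a["pos"] >= b["pos"]:
                continue                        # form relation: a before b
            for (schema, role, filler) in context_bindings - used_bindings:
                # pseudo-isomorphism (simplest case): the meaning relation
                # directly connects the two constituents' meaning poles.
                if schema == a["schema"] and filler == b["schema"]:
                    new_cxns.append({
                        "constituents": (a["name"], b["name"]),
                        "form": f'{a["name"]} before {b["name"]}',
                        "meaning": f'{schema}.{role} <-> {b["name"]}.m',
                    })
    return new_cxns

cxns = hypothesize(
    [{"name": "THROW", "pos": 0, "schema": "Throw"},
     {"name": "BALL",  "pos": 2, "schema": "Ball"}],   # "throw the ball"
    context_bindings={("Throw", "throwee", "Ball")},
    used_bindings=set())
# -> one THROW-BALL candidate pairing word order with the throwee binding
```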
The algorithm creates new constructions mapping form and meaning relations whose arguments are already constructionally mapped. It is best illustrated by example, based on the sample input token shown in Section 2.3 and depicted schematically in Figure 4. Given the utterance &quot;throw the ball&quot; and a grammar including constructions for throw and ball (but not the), the analyzer produces a semspec including a Ball schema and a Throw schema, without indicating any relations between them. The resolution process matches these schemas to the actual context, which includes a particular throwing event in which the addressee (Naomi) is the thrower of a particular ball. The resulting resolved analysis looks like Figure 4 but without the new construction (marked with dashed lines): the two lexical constructions are shown mapping to particular utterance forms and contextual items.</Paragraph> <Paragraph position="6"> [Figure 4: Hypothesizing a relational mapping for the utterance throw ball. Heavy solid lines indicate structures matched during analysis; heavy dotted lines indicate the newly hypothesized mapping.] Next, an unmatched form relation (the word order edge between throw and ball) is found, followed by a corresponding unmatched meaning relation (the binding between the Throw.throwee role and the specific Ball in context); these are shown in the figure using heavy dashed lines. Crucially, these relations meet the condition in step 3 that the relations be pseudo-isomorphic. This condition captures three common patterns of relational form-meaning mappings, i.e., ways in which a meaning relation rel_m over A_m and B_m can be correlated with a form relation rel_f over A_f and B_f (e.g., word order); these are illustrated in Figure 5, where we assume a simple form relation. The condition enforces structural similarity between the two relations while recognizing that constructions may involve relations that are not strictly isomorphic. (The example mapping shown in the figure is strictly isomorphic.) The resulting construction is shown formally in Figure 6.</Paragraph> <Paragraph position="7"> Reorganization. Besides hypothesizing constructions based on new data, the model also allows new constructions to be formed via constructional reorganization, essentially by applying general categorization principles to the current grammar, as described in Figure 7.</Paragraph> <Paragraph position="8"> For example, the THROW-BALL construction and a similar THROW-BLOCK construction can be merged into a general THROW-OBJECT construction; the resulting subcase constructions each retain the appropriate type constraint. Similarly, a general HUMAN-THROW construction and the THROW-OBJECT construction may occur in many analyses in which they compete for the THROW constituent. Since they have compatible constraints in both form and meaning (in the latter case based on the same conceptual Throw schema), repeated co-occurrence may lead to the formation of a larger construction that includes all three constituents.</Paragraph>
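Figure 7 below states the reorganization operations; as a rough illustration of the merge criterion only, with an ad hoc similarity threshold of our own devising:

```python
# Toy merge check (cf. Figure 7 below): same number of constituents, small
# ontological distance between corresponding constituent types, and large
# constraint overlap. Thresholds and representations are arbitrary choices.

def ontological_distance(t1, t2, parents):
    """Steps to the closest common ancestor in the type ontology."""
    def ancestors(t):
        chain = []
        while t is not None:
            chain.append(t)
            t = parents.get(t)
        return chain
    a1, a2 = ancestors(t1), ancestors(t2)
    common = next(t for t in a1 if t in a2)   # assumes a shared root
    return a1.index(common) + a2.index(common)

def may_merge(c1, c2, parents, max_dist=2, min_overlap=0.5):
    if len(c1["constituent_types"]) != len(c2["constituent_types"]):
        return False
    dists = [ontological_distance(t1, t2, parents)
             for t1, t2 in zip(c1["constituent_types"],
                               c2["constituent_types"])]
    overlap = len(c1["constraints"] & c2["constraints"]) / \
              max(len(c1["constraints"] | c2["constraints"]), 1)
    return max(dists) <= max_dist and overlap >= min_overlap

parents = {"Ball": "Entity", "Block": "Entity", "Entity": None}
throw_ball  = {"constituent_types": ["Throw", "Ball"],
               "constraints": {"order", "throwee-binding"}}
throw_block = {"constituent_types": ["Throw", "Block"],
               "constraints": {"order", "throwee-binding"}}
print(may_merge(throw_ball, throw_block, parents))  # True -> THROW-OBJECT
```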
Figure 7: Reorganize constructions. Reorganize G to consolidate similar and co-occurring constructions: Merge: pairs of constructions with significant shared structure (same number of constituents, minimal ontological distance (i.e., distance in the type ontology) between corresponding constituents, maximal overlap in constraints) may be merged into a new construction containing the shared structure; the original constructions are rewritten as subcases of the new construction along with the non-overlapping information. Compose: pairs of constructions that co-occur frequently with compatible constraints (are part of competing analyses using the same constituent, or appear in a constituency relationship) may be composed into one construction.</Paragraph> <Paragraph position="10"> Reinforcement. Each construction is associated with a weight, which is incremented each time it is used in an analysis that is successfully matched to the context. A successful match covers a majority of the contextually available bindings.</Paragraph> <Paragraph position="11"> Both hypothesis and reorganization provide means of proposing new constructions; we now specify how proposed constructions are evaluated.</Paragraph> </Section> <Section position="2" start_page="21" end_page="23" type="sub_section"> <SectionTitle> 3.2 Evaluating grammar cost </SectionTitle> <Paragraph position="0"> The MDL criteria used in the model are based on the cost of the grammar G given the data D:

cost(G|D) = α · size(G) + β · complexity(D|G)
size(G) = Σ_{c ∈ G} size(c), with size(c) = n_c + m_c + Σ_{e ∈ c} length(e)
complexity(D|G) = Σ_{d ∈ D} score(d), with score(d) = Σ_{c ∈ d} weight_c + γ · Σ_{r ∈ d} |type_r| + height_d + semfit_d

where α and β are learning parameters that control the relative bias toward model simplicity and data compactness. The size(G) is the sum over the size of each construction c in the grammar, where n_c is the number of constituents in c, m_c is the number of constraints in c, and each element reference e in c has a length, measured as slot chain length. The cost (complexity) of the data D given G is the sum of the analysis scores of each input token d using G. This score sums, over the constructions c in the analysis of d, a weight_c reflecting relative (in)frequency, together with a term in which |type_r| denotes the number of ontology items of type r, summed over all the constituents in the analysis and discounted by parameter γ. The score also includes terms for the height of the derivation graph and the semantic fit provided by the analyzer as a measure of semantic coherence.</Paragraph> <Paragraph position="1"> In sum, these criteria favor constructions that are simply described (relative to the available meaning representations and the current set of constructions), frequently useful in analysis, and specific to the data encountered. The MDL criteria thus approximate Bayesian learning, where minimizing cost corresponds to maximizing the posterior probability: the structural prior corresponds to the grammar size, and the likelihood corresponds to the complexity of the data relative to the grammar.</Paragraph> </Section> </Section> <Section position="5" start_page="23" end_page="23" type="metho"> <SectionTitle> 4 Learning verb islands </SectionTitle> <Paragraph position="0"> The model was applied to the data set described in Section 2.3 to determine whether lexically specific multi-word constructions could be learned using the MDL learning framework described. This task represents an important first step toward general grammatical constructions, and is of cognitive interest, since item-based patterns appear to be learned on independent trajectories (i.e., each verb forms its own &quot;island&quot; of organization (Tomasello, 2003)).
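Comprehension in these experiments is scored by a coverage metric over role bindings, defined in what follows; a minimal sketch, assuming a simple set-of-bindings representation of our own:

```python
# Toy coverage metric: the share of contextually annotated role bindings
# that the current grammar's analyses account for. Representing tokens as
# (gold_bindings, matched_bindings) pairs is our simplification.
def coverage(tokens):
    """tokens: list of (gold, matched) pairs, each a set of
    (schema, role, filler) bindings, with matched a subset of gold."""
    total = sum(len(gold) for gold, _ in tokens)
    matched = sum(len(m) for _, m in tokens)
    return matched / total if total else 0.0

# E.g., the throw subset contains 45 bindings to the thrower, throwee and
# goal roles; a grammar accounting for 30 of them scores 30/45 ~ 0.67.
token = ({("Throw", "thrower", "Naomi"), ("Throw", "throwee", "ball")},
         {("Throw", "throwee", "ball")})
print(coverage([token]))   # 0.5
```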
We give results for three verb islands: drop (n = 10 examples), throw and fall.</Paragraph> <Paragraph position="1"> Given the small corpus sizes, the focus for this experiment is not on the details of the statistical learning framework but instead on a qualitative evaluation of whether learned constructions improve the model's comprehension over time, and how verbs may differ in their learning trajectories.</Paragraph> <Paragraph position="2"> Qualitatively, the model first learned item-specific constructions as expected (e.g. throw bear, throw books, you throw), and later in learning generalized over different event participants (throw OBJECT, PERSON throw, etc.).</Paragraph> <Paragraph position="3"> A quantitative measure of comprehension over time, coverage, was defined as the percentage of the total role bindings in the data accounted for at each learning step. This metric indicates how new constructions incrementally improve the model's comprehensive capacity, as shown in Figure 8. The throw subset, for example, contains 45 bindings to the roles of the Throw schema (thrower, throwee, and goal location). At the start of learning, the model has no combinatorial constructions and can account for none of these. But the model gradually amasses constructions with greater coverage, and by the tenth input token, it learns new constructions that account for the majority of the bindings in the data.</Paragraph> <Paragraph position="4"> The learning trajectories do appear distinct: throw constructions show a gradual build-up before plateauing, while fall has a more fitful climb, converging at a higher coverage rate than throw. It is interesting to note that the throw subset has a much higher percentage of imperative utterances than fall (since throwing is pragmatically more likely to be done on command); the learning strategy used in the current model focuses on relational mappings and misses the association of an imperative speech-act with the lack of an expressed agent, providing a possible explanation for the different trajectories.</Paragraph> <Paragraph position="5"> While further experimentation with larger training sets is needed, the results indicate that the model is able to acquire useful item-based constructions like those learned by children from a small number of examples. More importantly, the learned constructions permit a limited degree of generalization that allows for increasingly complete coverage (or comprehension) of new utterances, fulfilling the goal of the learning model. Differences in verb learning lend support to the verb island hypothesis and illustrate how the particular semantic, pragmatic and statistical properties of different verbs can affect their learning course.</Paragraph> </Section> </Paper>