<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0506">
  <Title>A Second Language Acquisition Model Using Example Generalization and Concept Categories</Title>
  <Section position="4" start_page="45" end_page="45" type="metho">
    <SectionTitle>
3 Previous Work
</SectionTitle>
    <Paragraph position="0"> There is almost no previous CL work explicitly addressing SLA. The only one of which we are aware is [Maritxalar97], which represents interlanguage levels using manually defined symbolic rules. No language model (in the CL sense) or automatic learning are provided.</Paragraph>
    <Paragraph position="1"> Many aspects of SLA are similar to first language acquisition. Unsupervised grammar induction from corpora is a growing CL research area ([Clark01, Klein05] and references there), mostly using statistical learning of model parameters or pattern identification by distributional criteria. The resulting models are not easily presentable to humans, and do not utilize semantics.</Paragraph>
    <Paragraph position="2"> [Edelman04] presents an elegant FLA system in which constructions and word categories are identified iteratively using a graph. [Chang04] presents an FLA system that truly supports construction grammar and is unique in its incorporation of general cognitive concepts and embodied semantics.</Paragraph>
    <Paragraph position="3"> SLA is related to machine translation (MT), since learning how to translate is a kind of acquisition of the L2. Most relevant to us here is modern example-based machine translation (EBMT) [Somers01, Carl03], due to its explicit computation of translation templates and to the naturalness of learning from a small number of examples [Brown00, Cicekli01].</Paragraph>
    <Paragraph position="4"> The Computer Assisted Language Learning (CALL) literature [Levy97, Chapelle01] is rich in project descriptions, and there are several commercial CALL software applications. In general, CALL applications focus on teacher, environment, memory and automatization aspects, and are thus complementary to the goals that we address here.</Paragraph>
  </Section>
  <Section position="5" start_page="45" end_page="47" type="metho">
    <SectionTitle>
4 Input, Learner and Language Knowledge Models
</SectionTitle>
    <Paragraph position="0"> Our ultimate goal is a comprehensive computational model of SLA that covers all aspects of the phenomenon. The present paper is a first step in that direction. Our goals here are to: explore what can be learned from example-based, small, beginner-level input corpora tailored for SLA; model a learner having a mature conceptual system; use an L2 language knowledge model that supports sentence enumeration; identify cognitively plausible and effective SL learning algorithms; and apply the model in assisting the authoring of corpora tailored for SLA.</Paragraph>
    <Paragraph position="1"> In this section we present the first three components; the learning algorithms and the application are presented in the next two sections.</Paragraph>
    <Section position="1" start_page="45" end_page="46" type="sub_section">
      <SectionTitle>
4.1 Input Model
</SectionTitle>
      <Paragraph position="0"> The input potentially available for SL learners is of high variability, consisting of meta-linguistic rules, usage examples isolated for learning purposes, usage examples partially or fully understood in context, dictionary-like word definitions, free-form explanations, and more.</Paragraph>
      <Paragraph position="1">  One of our major goals is to explore the relationship between first and second language acquisition. Methodologically, it therefore makes sense to first study input that is the most similar linguistically to that available during FLA, usage examples. As noted in section 2, a fundamental property of SLA is that learners are capable of mature understanding. Input in our model will thus consist of an ordered set of comprehensible usage examples, where an example is a pair of L1, L2 sentences such that the former is a translation of the latter in a certain understood context.</Paragraph>
      <Paragraph position="2"> We focus here on modeling beginner-level proficiency, which is qualitatively different from native-like fluency [Gass01] and should be studied before the latter.</Paragraph>
      <Paragraph position="3"> We are interested in relatively small input corpora (thousands of examples at most), because this is an essential part of SLA modeling. In addition, it is of great importance, in both theoretical and computational linguistics, to explore the limits of what can be learned from meager input.</Paragraph>
      <Paragraph position="4"> One of the main goals of SLA modeling is to discover which input is most effective for SLA, because a substantial part of learners' input can be controlled, while their time capacity is small. We thus allow our input to be optimized for SLA, by containing examples that are sub-parts of other examples and whose sole purpose is to facilitate learning those (our corpus is also optimized in the sense of covering simpler constructs and words first, but this issue is orthogonal to our model). We utilize two types of such sub-examples. First, we require that new words are always presented first on their own. This is easy to achieve in controlled teaching, and is actually very frequent in FLA as well [Clark03]. In the present paper we will assume that this completely solves the task of segmenting a sentence into words, which is reasonable for a beginner level corpus where the total number of words is relatively small. Word boundaries are thus explicitly and consistently marked.</Paragraph>
      <Paragraph position="5"> Second, the sub-example mechanism is also useful when learning a construction. For example, if the L2 sentence is 'the boy went to school' (where the L2 here is English), it could help learning algorithms if it were preceded by 'to school' or 'the boy'. Hence we do not require examples to be complete sentences.</Paragraph>
      <Paragraph position="6"> In this paper we do not deal with phonetics or writing systems, assuming L2 speech has been consistently transcribed using a quasi-phonetic writing system. Learning L2 phonemes is certainly an important task in SLA, but most linguistic and cognitive theories view it as separable from the rest of language acquisition [Fromkin02, Medin05].</Paragraph>
      <Paragraph position="7"> The input corpus we have used is a transcribed Pimsleur Japanese course, which fits the input specification above.</Paragraph>
    </Section>
    <Section position="2" start_page="46" end_page="47" type="sub_section">
      <SectionTitle>
4.2 Learner Model
</SectionTitle>
      <Paragraph position="0"> A major aspect of SLA is that learners already possess a mature conceptual system (CS), influenced by their life experience (including languages they know). Our learning algorithms utilize a CS model.</Paragraph>
      <Paragraph position="1"> We opted for being conservative: the model is only allowed to contain concepts that are clearly possessed by the learner before learning starts. Concepts that are particular to the L2 (e.g., 'noun gender' for English speakers learning Spanish) are not allowed. Examples for concept classes include fruits, colors, human-made objects, physical activities and emotions, as well as meta-linguistic concepts such as pronouns and prepositions. A single concept is simply represented by a prototypical English word denoting it (e.g., 'child', 'school'). A concept class is represented by the concepts it contains and is conveniently named using an English word or phrase (e.g., 'types of people', 'buildings', 'language names').</Paragraph>
      <Paragraph position="2"> Our learners can explicitly reason about concept inter-relationships. Is-a relationships between classes are represented when they are beyond any doubt (e.g., 'buildings' and 'people' are both 'physical things').</Paragraph>
      <Paragraph position="3"> A basic conceptual system is assumed to exist before the SLA process starts. When the input is controlled and small, as in our case, it is both methodologically valid and practical to prepare the CS manually. CS design is discussed in detail in section 6.</Paragraph>
      <Paragraph position="4"> In the model described in the present paper we do not automatically modify the CS during the learning process; CS evolution will be addressed in future models.</Paragraph>
      <Paragraph position="5"> As stated in section 1, in this paper we focus on linguistic SLA aspects and do not address issues such as human errors, motivation and attention.</Paragraph>
      <Paragraph position="6"> We thus assume that our learner possesses perfect memory and can invoke our learning algorithms without any mistakes.</Paragraph>
    </Section>
    <Section position="3" start_page="47" end_page="47" type="sub_section">
      <SectionTitle>
4.3 Language Knowledge Model
</SectionTitle>
      <Paragraph position="0"> We require our model to support a basic capability of a grammar: enumeration of language sentences (parsing will be reported in other papers). In addition, we provide a degree of certainty for each. The model's quality is evaluated by its applicability for learning corpora authoring assistance (section 6).</Paragraph>
      <Paragraph position="1"> The representation is based on construction grammar (CG), explicitly storing a set of constructions and their inter-relationships. CG is ideally suited for SLA interlanguage because it enables the representation of partial knowledge: every language form, from concrete words and sentences to the most abstract constructs, counts as a construction. The generative capacity of language is obtained by allowing constructions to replace arguments. For example, (child), (the child goes to school), (&lt;x&gt; goes to school), (&lt;x&gt; &lt;v&gt; to school) and (X goes Z) are all constructions, where &lt;x&gt;, &lt;v&gt; denote word classes and X, Z denote other constructions.</Paragraph>
      <Paragraph position="2"> SL learners can make explicit judgments as to their level of confidence in the grammaticality of utterances. To model this, our learning algorithms assign a degree of certainty (DOC) to each construction and to the possibility of it being an argument of another construction. The certainty of a sentence is a function (e.g., sum or maximum) of the DOCs present in its derivation path.</Paragraph>
      <Paragraph position="3"> Our representation is equivalent to a graph whose nodes are constructions and whose directed, labeled arcs denote the possibility of a node filling a particular argument of another node. When the graph is a-cyclic the resulting language contains a finite number of concrete sentences, easily computed by graph traversal. This is similar to [Edelman04]; we differ in our partial support for semantics through a conceptual system (section 5) and in the notion of a degree of certainty.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="47" end_page="48" type="metho">
    <SectionTitle>
5 Learning Algorithms
</SectionTitle>
    <Paragraph position="0"> Our general SLA scheme is that of incremental learning - examples are given one by one, each causing an update to the model. A major goal of our model is to identify effective, cognitively plausible learning algorithms. In this section we present a concrete set of such algorithms.</Paragraph>
    <Paragraph position="1"> Structured categorization is a major driving force in perception and other cognitive processes [Medin05]. Our learners are thus driven by the desire to form useful generalizations over the input. A generalization of two or more examples is possible when there is sufficient similarity of form and meaning between them. Hence, the basic ingredient of our learning algorithms is identifying such similarities.</Paragraph>
    <Paragraph position="2"> To identify concrete effective learning algorithms, we have followed our own inference processes when learning a foreign language from an example-based corpus (section 6). The set of algorithms described below are the result of this study. The basic form similarity algorithm is Single Word Difference (SWD). When two examples share all but a single word, a construction is formed in which that word is replaced by an argument class containing those words. For example, given 'eigo ga wakari mas' and 'nihongo ga wakari mas', the construction (&lt;eigo, nihongo&gt; ga wakari mas) ('I understand English/Japanese'), containing one argument class, is created. In itself, SWD only compresses the input, so its degree of certainty is maximal. It does not create new sentences, but it organizes knowledge in a form suitable for generalization. null The basic meaning-based similarity algorithm is Extension by Conceptual Categories (ECC). For an argument class W in a construction C, ECC attempts to find the smallest concept category U' that contains W', the set of concepts corresponding to the words in W. If no such U' exists, C is removed from the model. If U' was found, W is replaced by U, which contains the L2 words corresponding to the concepts in U'. When the replacement occurs, it is possible that not all such words have already been taught; when a new word is taught, we add it to all such classes U (easily implemented using the new word's translation, which is given when it is introduced.) In the above example, the words in W are 'eigo' and 'nihongo', with corresponding concepts 'English' and 'Japanese'. Both are contained in W', the 'language names' category, so in this case U' equals W'. The language names category contains concepts for many other language names, including Korean, so it suffices to teach our learner the Japanese word for Korean ('kankokugo') at some point in the future in order to update the construction to be (&lt;eigo, nihongo, kankokugo&gt; ga wakari mas). This creates a new sentence 'kankokugo ga wakari mas' meaning 'I understand Korean'. An  example in which U' does not equal W' is given in Table 1 by 'child' and 'car'.</Paragraph>
    <Paragraph position="3"> L2 words might be ambiguous - several concepts might correspond to a single word. Because example semantics are not explicitly represented, our system has no way of knowing which concept is the correct one for a given construction, so it considers all possibilities. For example, the Japanese 'ni' means both 'two' and 'at/in', so when attempting to generalize a construction in which 'ni' appears in an argument class, ECC would consider both the 'numbers' and 'prepositions' concepts. null The degree of certainty assigned to the new construction by ECC is a function of the quality of the match between W and U'. The more abstract is U, the lower the certainty.</Paragraph>
    <Paragraph position="4"> The main form-based induction algorithm is Shared Prefix, Generated Suffix (SPGS). Given an example 'x y' (x, y are word sequences), if there exist (1) an example of the form 'x z', (2) an example 'x', and (3) a construction K that derives 'z' or 'y', we create the construction (x K) having a degree of certainty lower than that of K. A Shared Suffix version can be defined similarly. Requirement (2) ensures that the cut after the prefix will not be arbitrary, and assumes that the lesson author presents constituents as partial examples beforehand (as indeed is the case in our corpus).</Paragraph>
    <Paragraph position="5"> SPGS utilizes the learner's current generative capacity. Assume input 'watashi wa biru o nomi mas' ('I drink beer'), previous inputs 'watashi wa america jin des' ('I am American'), 'watashi wa' ('as to me...') and an existing construction K = (&lt;biru, wain&gt; o nomi mas). SPGS would create the construction (watashi wa K), yielding the new sentence 'watashi wa wain o nomi mas' ('I drink wine').</Paragraph>
    <Paragraph position="6"> To enable faster learning of more abstract constructions, we use generalized versions of SWD and SPGS, which allow the differing or shared elements to be a construction rather than a word or a word sequence.</Paragraph>
    <Paragraph position="7"> The combined learning algorithm is: given a new example, iteratively invoke each of the above algorithms at the given order until nothing new can be learned. Our system is thus a kind of inductive programming system (see [Thompson99] for a system using inductive logic programming for semantic parsing).</Paragraph>
    <Paragraph position="8"> Note that the above algorithms treat words as atomic units, so they can only learn morphological rules if boundaries between morphemes are marked in the corpus. They are thus more useful for languages such as Japanese than, say, for Romance or Semitic languages.</Paragraph>
    <Paragraph position="9"> Our algorithms have been motivated by general cognitive considerations. It is possible to refine them even further, e.g. by assigning a higher certainty when the focus element is a prefix or a suffix, which are more conspicuous cognitively.</Paragraph>
  </Section>
class="xml-element"></Paper>