<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1711">
  <Title>CL for CALL in the Primary School</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 CL/NLP in e-Learning
2.1 CL/NLP - A Broad Distinction
</SectionTitle>
    <Paragraph position="0"> To a first approximation, CL/NLP technologies split into two broad categories - A and B. Category A (sometimes referred to as CL proper) typically includes small-coverage, proof-of-concept, often hand-crafted, knowledge- or rule-based systems.</Paragraph>
    <Paragraph position="1"> They are usually used to test a particular linguistic theory, tend to be of limited coverage and are often quite brittle. Example technologies include DCGs and many (but not all) formal grammar-based parsing and generation systems.</Paragraph>
    <Paragraph position="2"> Category B (sometimes referred to as NLP) typically includes broad-coverage systems where the lingware is often (but not always - see e.g. FST) automatically induced and processed using statistical approaches. They are usually large-scale engineering applications and very robust. Example technologies include speech processing, HMM taggers, probabilistic parsing and FST.</Paragraph>
    <Paragraph position="4"> This distinction is, of course, nothing more than a useful over-generalisation with an entire and interesting grey area existing between the two extremes. Khader et al. (2004), for example, show how a wide-coverage, robust rule-based system is used in CALL. In this paper we look at the suitability of type A and B CL/NLP technologies for primary school education, in the context of Ireland in particular.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 e-Learning
</SectionTitle>
      <Paragraph position="0"> CL is generally not to the fore in e-Learning, although it does have a potentially powerful role to play. It can help to enhance the accessibility of online teaching material (particularly when the material is not in the learner's L1), to analyse learner input and to generate simple feedback automatically. It can also be used with Computer-Mediated Communication (CMC) environments. However, to date, the use of CL/NLP in e-Learning in general has not been a mainstream focus of either the Computational Linguistics or the e-Learning community, nor has there been much CL/NLP technology transfer into commercially available and deployed systems.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 CALL
</SectionTitle>
      <Paragraph position="0"> Within the domain of e-Learning, the area with the greatest fit and potential deployment of CL/NLP resources is that of Computer-Assisted Language Learning (CALL). This paper focuses on asynchronous e-Learning for natural languages in the primary school context. CL/NLP resources lend themselves naturally to the domain of language learning, given that the &quot;raw material&quot; in both fields is language. However, attempts to successfully marry the two fields have been limited. Schulze (2003) outlines several reasons for this. Computational Linguists are specifically interested in the use of the computer in analysing, generating and processing language. They are interested in testing out linguistic theories and using the computer to confirm their hypotheses.</Paragraph>
      <Paragraph position="1"> Researchers in NLP tend to be interested in wide-coverage, robust engineering approaches. For the most part, use of their tools for language learning/teaching applications is not high on their research agenda. A review of COLING papers in the last twenty years reveals that there are very few papers that specifically deal with the use of CL/NLP in language learning. Furthermore, as Schulze (2003) points out, within the unspoken hierarchy that exists in Computer Science departments throughout the world, working with CALL is considered less prestigious than, say, working on cryptography. Thus, socio-cultural factors may have played a part in limiting the number of CL/NLP researchers interested in CALL.</Paragraph>
      <Paragraph position="2"> From a CALL researcher's or practitioner's point of view, attempts to integrate CL/NLP resources into CALL have not been very successful. Many remain unconvinced about the benefits of using CL/NLP techniques in CALL and whether they can be integrated successfully or not. They sometimes expect an 'all-singing, all-dancing' machine and are disappointed/disillusioned with the results of ICALL research, especially when they incorporate category A CL technologies. CALL practitioners generally come from a language teaching background and are often more interested in pedagogy than technology.</Paragraph>
      <Paragraph position="3"> Some feel that the technical knowledge required to integrate CL/NLP tools is beyond their scope.</Paragraph>
      <Paragraph position="4"> They may be wary of claims from CL/NLP developers that a certain CL/NLP resource will be &quot;ideal&quot; for CALL, especially if they have heard such claims before. Even if they are favourably disposed to the use of CL/NLP resources in CALL, it is often very difficult to reuse existing resources, as they demand that a certain (often non-standard) format be used for data (see Sections 4.2 and 5.2 below). Also, the interfaces to the resources may have assumed a techno-savvy or CL/NLP-savvy user, which militates against their (re)use.</Paragraph>
      <Paragraph position="5"> In summary, apart from notable exceptions (e.g. Glosser (Dokter &amp; Nerbonne, 1998) and FreeText (2001)), for various technical and non-technical reasons, CL/NLP resources have not been extensively deployed in mainstream CALL applications.</Paragraph>
      <Paragraph position="7"> One of the problems in using CL/NLP resources in CALL materials is that the coverage achieved by the CL/NLP tools has to be broad to be able to handle a general range of learner language.</Paragraph>
      <Paragraph position="8"> Furthermore, the resources must be robust as learner language will contain input that is not well-formed and this can cause problems for some CL resources. Observations such as these point to type B NLP technologies as being the better type of technologies to employ in the context of language learning. However, below we argue that this is not necessarily the case.</Paragraph>
      <Paragraph position="9"> 2.4 ICALL in the Primary School It may be natural to assume that CL/NLP resources customarily lend themselves to intermediate or advanced learners of a language, as they are more likely to have the linguistic competence to understand output generated by CL/NLP resources. Considering the other end of the language-learning spectrum, that of primary school learners, it may be perceived that CL/NLP resources could not be so easily deployed with linguistically less advanced learners - these students will not be interested in viewing concordances, morphological annotations or parse trees.</Paragraph>
      <Paragraph position="10"> However, it can be argued that there are certain natural circumstances supporting the use of even type A CL technology in CALL in this environment. Firstly, in comparison to adults, young learners have limited first language (L1) performance (Brown, 1994). The target primary school students are aged between 7 and 13 years (second to sixth class in the Irish primary school system). They tend to produce simpler sentences and have a smaller range of vocabulary than an adult. These L1 features have a number of implications - the students' L1 knowledge further constrains their emerging L2 production. Complex linguistic constructs are less likely to transfer into the target language. Effectively, the target language amounts to a controlled language.</Paragraph>
      <Paragraph position="11"> Controlled languages are better suited to type A CL systems and produce better results (Arnold et al., 1994).</Paragraph>
      <Paragraph position="12"> Secondly, the students' target language(s) (Irish and German in this context) represent a limited domain or sublanguage. The Irish curriculum is followed in primary schools from the age of 4/5.</Paragraph>
      <Paragraph position="13"> Students can take German (where it's available) during their senior years of primary school (aged 10-13) and the language domain is limited to a two-year beginners' curriculum. It is possible to anticipate students' L2 knowledge, especially since they have been following set curricula. Machine Translation (MT) can be used to highlight an example of the success of sublanguages with CL/NLP. The Meteo translation system is used successfully in Canada to translate weather forecasts bi-directionally between French and English (Hutchins and Somers, 1992). The 'weather' sublanguage has a small vocabulary, uses a telegraphic style of writing and omits tense. Primary school students' L1 and L2 performance characteristics - controlled language and limited domain - imply that some scalability problems that are sometimes encountered in certain type A CL resources can be avoided.</Paragraph>
      <Paragraph position="14"> While primary school learners will not be interested in viewing concordances or parse trees, the technology can be used, but hidden from the learner, to generate exercises and learner feedback and to present students with an animation based on information computed by the underlying CL/NLP engines embedded (but not visible) in the CALL application. In this way the learner will benefit from the technologies but not be confused by linguistic elements that are beyond their capacity as young learners.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 CL/NLP Resources for CALL
</SectionTitle>
    <Paragraph position="0"> In this paper we look at how CL/NLP resources can be integrated into CALL materials in general, as well as specifically for Primary Schools in Ireland, with a focus on CALL materials for Irish and German. This section will briefly outline how a range of CL/NLP resources can be used in this environment, while later sections will focus on the use of specific CL/NLP resources in more detail.</Paragraph>
    <Paragraph position="1"> We return to our dichotomy of A- and B-type CL/NLP systems outlined in Section 2.1. ICALL systems have used a range of technologies, including both type A and type B systems.</Paragraph>
    <Paragraph position="2"> Examples of type A-like systems include small-scale Lexical Functional Grammar (LFG)-based robust parsers to provide error recognition and feedback (Reuer, 2003) and parsing for viewing sentence structures and error diagnosis (Vandeventer Faltin, 2003). Examples of type B-like systems include a broad-coverage English LFG-based grammar for grammar checking (Khader et al., 2004), the Systran MT system to improve translation skills (La Torre, 1999) and speech recognition for pronunciation training (Menzel et al., 2001).</Paragraph>
    <Paragraph position="3"> It is relatively straightforward to integrate type B (NLP) technology into CALL applications for primary school learners. In Section 4 of this paper we show how broad-coverage FST technology can be used to morphologically analyse word forms or to generate all inflected forms given a root form. Output from a FST morphology engine is fed into an interface engine which sends the information in the appropriate format to an XML/Flash environment for animation (Koller, 2004). The learner input can be collated over time into a learner corpus and later analysed by the teacher to detect common errors amongst students. Part-Of-Speech (POS) taggers can be used to identify the parts of speech in electronic versions of learners' textbooks or a corpus collated around their curriculum (Section 5). The output can then be used for a variety of purposes, including the automatic generation of online exercises (e.g. hangman) and, together with the FST morphological engine, automatic dictionary extraction.</Paragraph>
    <Paragraph position="4"> Mainly due to scalability problems, type A CL technologies can be difficult to deploy in general ICALL systems. However, they can be used in the primary school context quite effectively. As outlined in Section 2.4, the limited linguistic performance knowledge of the learners' L1 and especially their L2 amounts to a 'controlled' language scenario and type A CL technologies can be deployed successfully. Curricula used in primary schools (in Ireland and elsewhere) represent a limited domain in which type A technologies can be highly appropriate. Small coverage DCGs, for example, can be written for the anticipated L2 learner input and can be used to provide immediate feedback to the learner.</Paragraph>
    <Paragraph position="5"> Problems associated with difficulties in building wider-coverage grammars do not present themselves in this context, as the curriculum is limited.</Paragraph>
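    <Paragraph> As a rough illustration of how a small-coverage, hand-crafted grammar could give immediate feedback on anticipated learner input, the following Python sketch recognises a single sentence pattern (pronoun, a form of 'heissen', a name) and checks subject-verb agreement. It is a plain-Python stand-in for a DCG rather than a DCG proper, and the lexicon, the function name check_name_sentence and the feedback wording are all assumptions made for this example.

```python
# Hypothetical mini-lexicon: subject pronoun -> expected present-tense form of "heissen".
# The ambiguous pronoun "sie" is deliberately left out of this tiny sketch.
EXPECTED_FORM = {
    "ich": "heisse",
    "du": "heisst",
    "er": "heisst",
    "es": "heisst",
    "wir": "heissen",
    "ihr": "heisst",
}
KNOWN_FORMS = set(EXPECTED_FORM.values())
NOT_COVERED = "Pattern not covered by this small grammar."


def check_name_sentence(sentence):
    """Recognise 'PRONOUN VERB NAME' and return feedback for the learner."""
    words = sentence.strip().rstrip(".!").split()
    if len(words) != 3:
        return NOT_COVERED
    subject, verb = words[0].lower(), words[1].lower()
    if subject not in EXPECTED_FORM or verb not in KNOWN_FORMS:
        return NOT_COVERED
    expected = EXPECTED_FORM[subject]
    if verb == expected:
        return "Correct."
    return f"Check the verb ending: with '{subject}' use '{expected}', not '{verb}'."


print(check_name_sentence("Ich heisst Anna"))
```

Because the grammar only needs to cover the sentence patterns anticipated by the curriculum, each new pattern can be added as another small rule without facing the scalability problems of a general-purpose grammar.</Paragraph>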
    <Paragraph position="6"> There are many other potential uses of CL/NLP in this context, but this paper will focus on the FST and POS tagging examples mentioned above.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 CL/NLP Resources for Irish Primary School CALL
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Background
</SectionTitle>
      <Paragraph position="0"> Irish is a compulsory subject in schools in Ireland. Students generally tend to have a negative attitude towards the language, which hinders learning (Harris &amp; Murtagh, 1999). Until recently, Irish was taught using the Audio-Lingual method (structural patterns are taught using repetitive drills) and it is only since 1999 that a new communicative curriculum (language teaching is structured around topics in terms of communicative situations) has been developed and integrated. Currently, there are very few CALL resources available for Irish (Hetherington, 2000). Those that do exist may not be as error-free as one would like, are not specifically aimed at primary school learners and are therefore not tied to the Primary School curriculum, which hinders their integration into the classroom.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 A FST-Based Morphological Engine for
Irish
</SectionTitle>
      <Paragraph position="0"> Ui Dhonnchadha (2002) has developed an analyser and generator for Irish inflectional morphology using Finite-State Transducers (Beesley and Karttunen, 2003). The FST engine contains approximately 5,000 lexical stems and generates/recognises over 50,000 unique inflected surface forms, with a total of almost 400,000 morphological descriptions (due to ambiguous surface forms). The final FST is the result of composing intermediate transducers, each encoding a different morphological process. It is useful to have a record of the morphological processes involved in mapping between lexical forms (i.e. lemmas and morphological features) and surface forms. By including a marker in the surface form each time a process is applied, a record of the morphological processes involved can be maintained and used in other applications.</Paragraph>
      <Paragraph position="1"> The morphological processes covered include: (i) internal mutations such as lenition, eclipsis, stem-internal modification and vocal prefixing; (ii) final mutation, such as vowel harmony with suffixes (broadening, slenderising and syncopation); as well as concatenative morphology (prefixing, suffixing).</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Technology - FST, Perl, XML and Flash
</SectionTitle>
      <Paragraph position="0"> Primary school learners are not interested in viewing output generated by a FST Morphology engine. The challenge in CALL applications (particularly in the primary school scenario) is to exploit underlying technology to present information in a manner appropriate to the learner. To this end we developed animation software interfaced with the output generated by the FST engine.</Paragraph>
      <Paragraph position="1"> Animation can enhance the learning process and is especially interesting for younger learners. Flash (2004) is a useful software environment for developing animations, but it is difficult for non-programmers to use and it is often difficult to use the same animation templates for different inputs. One solution is to use XML (Extensible Markup Language, XML (2004)) files as input into Flash, so that the information displayed is customisable according to the information in the input data file. We outline how animated CALL materials were developed for teaching the conjugation of verbs in the present tense in Irish.</Paragraph>
      <Paragraph position="2"> Output from the FST engine is fed to a Perl script which converts the information into a specified XML format. The XML files are then used by Flash to generate the required animation. Figure 1 outlines the software architecture. Figure 2 shows the conjugation of the verb cuir (to put) in the present tense in Irish. Figure 3 shows modified output from the FST engine to enable automatic animations to be generated (^INF indicates inflectional infix, ^PP indicates inflectional postposition and ^SUF indicates inflectional suffix for Flash).</Paragraph>
      <Paragraph position="3"> [Figure caption fragment: animation movie for past tense 1st person singular for the verb 'cuir' in Irish (Inne means yesterday).] The Flash-based interface dynamically displays XML data. It reads in XML data at runtime and generates an animation. Learners have full control over the animation: they can play, stop, rewind and skip through it. Further interaction is provided via menus to choose specific conjugations (e.g. number, person and tense). The FST-Flash interface is language-independent. The XML files contain detailed information about the different string operations and the corresponding targets. The only operations known to the Flash interface are insert, delete and replace. In this way, the animation of language data is abstracted from linguistic terms like prefixation, suffixation or lenition, thus avoiding the problem of varying definitions of these terms in different languages. The transformation of the (linguistically tagged) output from the morphology engine to the XML data necessary for animated presentation is done by Perl scripts which can be tailored specifically to each combination of language and output of a NLP tool.</Paragraph>
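      <Paragraph> To make the idea of language-independent string operations concrete, the following Python sketch reduces the mapping from a stem to an inflected form to insert, delete and replace operations and serialises them as XML. It is only an approximation under stated assumptions: the system described here uses Perl scripts driven by markers in the FST output, whereas this sketch derives the operations with a generic string diff, and the element and attribute names (animation, position, old, new) are invented for illustration.

```python
import difflib
import xml.etree.ElementTree as ET


def edit_operations(stem, surface):
    """Return (operation, position, old_text, new_text) tuples turning stem into surface."""
    matcher = difflib.SequenceMatcher(None, stem, surface, autojunk=False)
    operations = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # remaining tags are "insert", "delete" and "replace"
            operations.append((tag, i1, stem[i1:i2], surface[j1:j2]))
    return operations


def to_animation_xml(stem, surface):
    """Serialise the operations using hypothetical element and attribute names."""
    root = ET.Element("animation", {"stem": stem, "target": surface})
    for tag, position, old_text, new_text in edit_operations(stem, surface):
        ET.SubElement(root, tag, {"position": str(position), "old": old_text, "new": new_text})
    return ET.tostring(root, encoding="unicode")


# Lenition of 'cuir' in the past tense surfaces as a single insert of 'h'.
print(edit_operations("cuir", "chuir"))
print(to_animation_xml("cuir", "cuirim"))
```

Since the consumer only needs to understand three operation names, the same display logic could in principle be reused for another language by changing the script that produces the XML.</Paragraph>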
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 CL/NLP Resources for German Primary School CALL
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Background
</SectionTitle>
      <Paragraph position="0"> German is gradually being integrated into Irish primary schools through the Modern Languages in Primary School Initiative (MLPSI), which has been running since 1998. At present, over 300 schools in Ireland are involved in the MLPSI.</Paragraph>
      <Paragraph position="1"> German is taught during the senior two years of the primary school cycle (children aged 10-13).</Paragraph>
      <Paragraph position="2"> Irish students do not receive any instruction in Modern Foreign Languages (MFL) up until this point (Irish is not considered a MFL). The communicative curriculum we developed is based on a draft curriculum which was developed by the National Council for Curriculum and Assessment (NCCA) (NCCA, 2004) for teachers participating in the MLPSI.</Paragraph>
      <Paragraph position="3"> The integration of type A CL technology into CALL in this environment is ideal. The target language is restricted to a beginner's curriculum. This represents a restricted domain. Sentence constructions are simple with few structures that could present coverage or ambiguity difficulties to CL systems. Given that the target language is German, many CL tools are available for almost every aspect of language processing.</Paragraph>
      <Paragraph position="4"> In this section we will focus on the use of type B NLP technology in this environment to meet the needs of students learning German. These needs have been researched qualitatively through observation during German language lessons in a primary school in Ireland during the school year 2003/4. Irish students are native English speakers (some are also native Irish speakers) and as such are unfamiliar with nouns being associated with genders as in German. These students also require extra practice with inflecting verbs correctly.</Paragraph>
      <Paragraph position="5"> Having been asked 'Wie heisst du?', students will often respond with 'Ich heisst ...', for example. We present the use of a POS tagger in the development of a tailored corpus which subsequently feeds into the automatic generation of exercises.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Technology - POS tagging, Perl and XML
</SectionTitle>
      <Paragraph position="0"> CALL courseware generally presents users with exercises to complete after they have studied a particular topic. These are usually static in content and are very time-consuming to develop over the full set of language topics. Students are usually presented with a small number of exercises, which they will have completed in their entirety and become familiar with in a limited space of time.</Paragraph>
      <Paragraph position="1"> Larger sets of exercises prove beneficial in providing variety for the student - they will not be presented with the same set of exercises each time they visit a topic. In addition, some students will complete exercises faster than others. This puts pressure on slower students to keep up and on teachers to provide alternative work to keep faster students occupied. Larger sets of exercises mean that exercise selection can be randomised so that students are presented with new material each time they visit the courseware; slower students will feel less pressure to work at a faster pace when faster students complete additional exercises within the same language topic and teachers will not be required to provide alternative material.</Paragraph>
      <Paragraph position="2"> CL can significantly reduce the time needed to generate sets of exercises around language topics. A complete curriculum was developed around the NCCA guidelines and tagged using Helmut Schmid's TreeTagger (see TreeTagger homepage). The annotated text file was then automatically converted to XML using Perl. The corpus is divided into separate XML files for each language topic. Additional information (audio and graphic file references) was added manually to each topic file at this stage. Once the annotated corpus has been converted to XML it can feed into a number of applications such as lesson generation, automatic dictionary extraction, a concordancer and automatic generation of various exercise types. In focusing on the latter, we are particularly interested in the verbs, articles and nouns that the POS tagging identifies. Inflection and article-noun combinations can be practised when a student chooses the correct verb ending or article from a selection or types in the correct answer. A version of hangman (a game where students try to guess an unknown word by guessing letters in the word - they only get a certain number of chances for incorrect answers after which the game ends) can also be played with article-noun combinations. By simply specifying the topic section in the curriculum and the type of game, exercises are automatically generated. Each particular exercise is randomised so that the user is presented with a new variant of the problem each time they attempt an exercise or game.</Paragraph>
      <Paragraph position="3"> Previous work in automatic exercise generation from corpora highlighted a number of potential pitfalls (Wilson, 1997). Most importantly, the corpus works best when the linguistic quality of its texts is appropriate for learning a language. Long and complex sentences are best avoided. Our design employs a corpus collated and tailored around the learner's curriculum, thus avoiding this pitfall. The benefit of using CL resources here is similar to the situation in the Irish context. Exercises can be developed automatically for any verb or noun phrase within the curriculum and provide variety for the user. This removes the necessity of hand-coding each exercise and reduces the risk an</Paragraph>
    </Section>
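    <Paragraph> The route from POS-tagged text to a randomised article exercise can be sketched in a few lines of Python. This is an illustrative approximation, not the Perl and XML implementation described in this section: the sample text is invented, it assumes tagger output with one token per line in tab-separated token, tag and lemma columns using STTS-style tags (ART for articles, NN for common nouns), and the function names and the fixed list of nominative articles are assumptions.

```python
import random

# Hypothetical tagger output: token, POS tag and lemma separated by tabs.
TAGGED_TEXT = """Das\tPDS\tdie
ist\tVAFIN\tsein
der\tART\tdie
Hund\tNN\tHund
und\tKON\tund
die\tART\tdie
Katze\tNN\tKatze"""


def article_noun_pairs(tagged_text):
    """Collect (article, noun) pairs where an ART token directly precedes an NN token."""
    rows = [line.split("\t") for line in tagged_text.splitlines() if line.strip()]
    pairs = []
    for current, following in zip(rows, rows[1:]):
        if current[1] == "ART" and following[1] == "NN":
            pairs.append((current[0].lower(), following[0]))
    return pairs


def make_article_exercise(pairs, rng):
    """Pick one pair at random and build a multiple-choice gap-fill item."""
    article, noun = rng.choice(pairs)
    options = ["der", "die", "das"]  # assumes nominative definite articles only
    rng.shuffle(options)
    return {"prompt": f"___ {noun}", "options": options, "answer": article}


pairs = article_noun_pairs(TAGGED_TEXT)
exercise = make_article_exercise(pairs, random.Random())
print(exercise["prompt"], exercise["options"])
```

Each call draws a fresh item and option order, which mirrors the randomisation described above; the same list of pairs could equally feed a hangman game by hiding the letters of the noun.</Paragraph>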
  </Section>
</Paper>