<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0807">
  <Title>A Corpus-Based Grammar Tutor for Education in Language and Speech Technology</Title>
  <Section position="4" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Pedagogical considerations
</SectionTitle>
    <Paragraph position="0"> There are some pedagogical points that we wish to raise in connection with the design of a grammar tutoring system for LST students, and which we feel are inadequately addressed in existing systems of this kind.</Paragraph>
    <Paragraph position="1"> This connection is natural to us also because we offer CALL as one of the specialisations in language engineering.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 The importance of authenticity
</SectionTitle>
      <Paragraph position="0"> Several pedagogical systems support training in formal grammar writing (Gazdar and Mellish 1989; Antworth 1990; von Klopp and Dalton 1996; McConnel 1995; Beskow et al. 1997; see also Rogers 1998). In most cases these systems only deal with grammars from an abstract point of view, without calling attention to the issue of how well a grammar accounts for real language. These systems do, however, offer the students valuable facilities, e.g. allowing them to evaluate a grammar by using it to parse arbitrary strings or for random generation. For our purposes, these systems are &quot;realistic&quot; in one sense, namely in that they let students express linguistic generalisations in formalisms which are similar to those actually used by language technologists.</Paragraph>
      <Paragraph position="1"> In another sense, however, systems of this kind are spiritually kindred to the &quot;intuitive&quot; method in generative grammar, rather than to the goals of language engineering. The issue of how relevant data is to be found and used is normally left out of the picture altogether. This is a major pedagogical defect, as the step from understanding grammars as formal systems to understanding them as theories about existing language use is both crucial and intellectually demanding. It is our experience that this is one of the most difficult aspects of education in formal grammar. We consequently think that there is much to gain by the use of a tutoring system that helps the student to see how a grammar relates to a morphosyntactically annotated corpus. The aim of the work described here is to develop a system which will introduce grammar writing as an empirical process with the aim of accounting for authentic language.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Divide and conquer
</SectionTitle>
      <Paragraph position="0"> The use of a tagged corpus as a testing ground for fledgling formal grammar writers confers another advantage which is often absent from the systems referred to above. Since the aim of these systems is to train the students in writing syntactic or morphological rules, the lexicon is more often than not reduced to the absolute minimum, both in the number and in the complexity of entries, needed to illustrate how the syntactic or morphological rule system works. This is indeed a problem, but it cannot be solved simply by urging the students to compile extensive lexicons. On the contrary, there is a clear pedagogical point to the separation of the grammar from the lexicon for training purposes. Generally, it is a good principle to present new material a little at a time, in conceptually coherent portions. Otherwise, the students may get confused, and as a consequence frustrated. In this case, one would like to offer them a ready-made lexicon which should be flexible enough to accommodate a number of grammar formalisms (a &quot;poly-theoretic&quot; lexicon). A morphosyntactically tagged corpus can be made to stand in for such a lexicon, at least in some respects; in addition to the purely linguistic information contained in it, there is also (implicit) information about frequencies of occurrence in authentic language, about collocations, etc. Even if there is lexical information which will not, as a rule, be found even in a fairly richly annotated corpus (e.g. valency information and semantics), the information that can be found there still constitutes a vast improvement over the typical lexicons of grammar training systems.</Paragraph>
      <Paragraph position="1"> Conversely, the tagged corpus makes an excellent basis for exercises aiming at learning to identify the &quot;atoms&quot; of grammar, i.e. parts of speech and inflectional categories, in a realistic context.</Paragraph>
      <Paragraph position="2"> There are some tutoring systems for this purpose (e.g. Qiao 1996), including one (Mats 1999) that we have been trying out in our department recently. McEnery et al. (1995) compare another such system (the one described by McEnery et al. 1997) to traditional human teaching in a controlled evaluation procedure, and reach the conclusion that the corpus-based computer-assisted method yields slightly better learning results.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 First things first
</SectionTitle>
      <Paragraph position="0"> It is a good pedagogical principle not only to divide that which is to be learned into manageable chunks, but also to proceed from simpler to more complex knowledge. Ideally, the tutoring program should impose exactly this ordering for those students who need it (see Laurillard 1996). The morphosyntactically annotated corpus puts at the students' disposal a &quot;lexicon&quot; which will tag along, as it were, as
      * they learn to identify not only which part of speech a certain text word is, but also which inflectional information should be associated with it;
      * their grammars evolve in terminal complexity, from simple phrase structure rules with atomic terminal categories to unification-based grammars with feature structures encoding the full morphosyntactic information for each lexical unit;
      * their grammars evolve in nonterminal complexity, enabling them to analyse increasingly large portions of the corpus.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.4 Learning by inquiry
</SectionTitle>
      <Paragraph position="0"> A corpus-based grammar tutor shares with corpus-based CALL in general the trait of being eminently suited for hypothetico-deductive, problem- and data-driven learning (&quot;serendipity learning&quot;; cf. Flowerdew 1996, or &quot;learning by inquiry&quot;; see McArthur et al. 1995). By working with the program the student will develop his skills in evaluating a grammar as an account of the syntactic phenomena found in a corpus. The system will support a process of thinking that highlights important aspects of scientific reasoning. Abstract concepts such as theory, data, precision, recall and prediction are illustrated in a fairly concrete manner, as are (other) basic aspects of formal grammar.</Paragraph>
    </Section>
  </Section>
</Paper>