<?xml version="1.0" standalone="yes"?> <Paper uid="W99-0807"> <Title>A Corpus-Based Grammar Tutor for Education in Language and Speech Technology</Title> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 The grammar tutor </SectionTitle> <Paragraph position="0"> With these aims in mind, we are developing a corpus-based grammar tutoring system. We are aiming at first for a system with limited functionality--both in order not to overreach ourselves and to facilitate evaluation--which will undergo several rounds of formative evaluation (see Laurillard 1996).</Paragraph> <Paragraph position="1"> The system will provide a learning context that is realistic in important respects. The students will work with authentic linguistic material, in the form of a tagged corpus, and use pedagogically adapted versions of formalisms and tools that they will also be using later in their professional life. This is similar in spirit to the approach taken by McArthur et al. (1995), who argue persuasively for the use in education of so-called</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> ESSCOTS (Educational Support Systems based on </SectionTitle> <Paragraph position="0"> Commercial-Off-The-Shelf software). They report both an unusually short system development time and good learning results (in an experiment where they adapted a commercial Geographic Information System (GIS) for use in an educational setting). The system will be used and evaluated in the context of one or more of our LST and linguistics courses (formal syntax and computational syntax, at least, possibly also basics of grammar), starting in the autumn term of 1999. The evaluation will not be carried out as a test group-control group setup. This is mainly for practical reasons, our student population being too small for this kind of experiment. 
[Footnote 2: There are also theoretical motivations for this, as serious concerns have been voiced in the literature about the meaningfulness of such &quot;experiments&quot; in the context of computer-assisted learning (see Borin 1998).] Instead, we will use in-class observation, questionnaires and interviews with the students and teachers, and logging of student activity as our main evaluation instruments.</Paragraph> <Paragraph position="1"> The evaluations will, hopefully, yield two kinds of result. Firstly, we expect to learn something about the effectiveness of using a corpus-based computerised grammar tutor and, secondly, we will see what should be changed and what should be added in the system (this is what the &quot;formative&quot; part is about).</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Corpus and exercise types </SectionTitle> <Paragraph position="0"> As just stated, any annotated corpus could form the basis of the grammar tutor. As our point of departure, we have chosen to use a Swedish one-million-word balanced corpus, the Stockholm Umeå Corpus (SUC; Ejerhed and Källgren 1997). [3] For the first version, there are two grammar exercise types under development: The most basic exercise is to assign part of speech and morphosyntactic features to words in the corpus. This exercise exists in a preliminary version (Mats 1999), which has been used at our department with encouraging results. [4] The second step is the formulation of grammatical rules and their application to the corpus with the help of a built-in parser. Random analysis and generation with the same grammar will also be supported. The system will eventually support two formalisms, plain context-free grammar and a feature-structure based one.</Paragraph> <Paragraph position="1"> The parser helps the student to evaluate his/her grammar by making clear which analyses the grammar assigns or fails to assign to the sub-strings of the corpus. 
One grammatical category (non-terminal symbol) is selected as the one of particular interest for the moment. The parser locates all strings that are generated as instances of that category. The corpus provides the lexical nodes, i.e. the text word-category pairings.</Paragraph> <Paragraph position="2"> [Footnote 3: SUC was compiled and semi-automatically tagged in the years 1989-1996 (Ejerhed and Källgren 1997). The corpus follows the Brown Corpus format: There are 500 text chunks of approximately 2000 words each, with a genre distribution similar to that in other balanced corpora, although only the written standard language is represented. A corrected second version of the corpus is due to appear before the end of 1999.]</Paragraph> <Paragraph position="3"> [Footnote 4: It has been tried out during the spring term of 1999 with a group of computer science students taking a course in language engineering in our department (Mats 1999). The students were largely positive in their evaluation of the exercises, but they also suggested some improvements in the user interface and in the way the material was presented to the user. We will incorporate some of these suggestions in the next version of the exercise.]</Paragraph> <Paragraph position="4"> By inspecting these analyses the student will be in a position to decide, with respect to a certain category, to what extent the grammar accounts for the instances of the category and to what extent it overgenerates. The tokens found may be listed (with context) or graphically indicated in the running corpus text. This exercise will encourage the student to evaluate a grammar in terms of its precision and recall with respect to the selected category. The student's own grammar-related intuitions are, of course, important in this kind of corpus-oriented setting, as only the words of the corpus are tagged (it is not a treebank). In other words, there is no predefined right answer available (but see below). 
The evaluation of the student's performance is based instead on his/her own judgments. This is an example of so-called intrinsic feedback, which is the best kind of feedback, according to several CALL practitioners; see Laurillard 1996. Nevertheless, the system will ensure that the application of these intuitions and the reasoning about the grammar will be supported by considerations of concrete data.</Paragraph> <Paragraph position="5"> The tagged corpus may also be used for random generation. The text word-category pairings define a lexicon which generates expressions of various categories in conjunction with the student's grammar. In this way the lexical material of the corpus and the grammar are used to make grammaticality predictions. The generation exercise will mainly throw light upon how overgeneration problems are discovered and dealt with.</Paragraph> <Paragraph position="6"> The system gives some feedback about the status of the grammar. Warnings are issued if some category is left undefined. The number of rules, categories, and features used is also reported.</Paragraph> <Paragraph position="7"> This is intended to alert the student to the issue of how simple or complicated the grammar is, which matters because simplicity is one of the most important aspects of theoretical adequacy.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Parts of speech and grammars </SectionTitle> <Paragraph position="0"> The tagging provides the link between the student's grammar and the given corpus data. It is therefore crucial which categories are used. As the empirical material is a selection from a particular corpus, the tags visible to the student must be derivable from the tagset used in that corpus. 
Of course, these tags may be mapped onto the tags of the tutoring system in various ways.</Paragraph> <Paragraph position="1"> The system comes with two predefined mappings from corpus tags to grammar categories, to context-free categories on the one hand and to feature structures on the other. These mappings are defined in a file and can be revised by the teacher.</Paragraph> <Paragraph position="2"> Manipulation of this mapping can, of course, also be a part of more advanced exercises for the student. As mentioned, the system will support two grammar formalisms--corresponding to the two tagset mappings just mentioned--pure context-free grammar and a feature-structure formalism, the latter in the style of PATR-II (Shieber 1986).</Paragraph> <Paragraph position="3"> A context-free grammar is (by definition) used together with a flat taxonomy of lexical categories. As the default option, the program operates with an inventory of categories that is related to the traditional part-of-speech system, but more fine-grained.</Paragraph> <Paragraph position="4"> The feature structure tags used with the PATR-II-style formalism correspond to the full information in the tagset used, i.e., they contain primarily inflectional information, in addition to the syntactic category. This means that the corpus will mainly support constraint-based accounts of agreement phenomena. 
However, the system as such will allow descriptions dealing with arbitrary aspects of grammar.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Implementation </SectionTitle> <Paragraph position="0"> As the implementation language we have chosen Java, primarily because of its platform independence and because it is an excellent language for rapid prototyping of applications with sophisticated GUIs, but also to some extent because of its association with the Internet and the WWW (see below).</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.4 Planned developments </SectionTitle> <Paragraph position="0"> Explicit evaluation of the students' actual use of the system will, needless to say, provide the main indication of how the system should be improved and extended. The implemented exercises have nevertheless been designed to fit into a scheme of logically linked exercises, which lead the students step by step on to more complicated and difficult tasks.</Paragraph> <Paragraph position="1"> The present system could in a natural way be extended to deal with a corpus which is also pre-analysed with regard to constituent structure. A less advanced task for the student would then be to write a grammar that agrees with the given structure(s). The system would provide detailed feedback evaluating the ability of the grammar to generate the given syntactic structures. This exercise would illustrate the purely formal aspect of grammar formulation. It could preferably be used as a preparation for the exercises relying on intrinsic feedback from the student's own grammatical intuitions.</Paragraph> <Paragraph position="2"> Another valuable addition to the system would be a module that encourages the student to organise the empirical evaluation in a systematic way. The compilation and use of test suites provide an often used and simple method with this advantage. 
A test suite for a certain category is a list of known instances of the category and a list of strings that are known not to belong to the category. A test suite thus provides a collection of data against which a grammar may be automatically evaluated. The system reports the number of positive instances the grammar fails to account for and the number of overgenerations. This exercise shows how the empirical evaluation of a grammar may proceed in a more systematic fashion and encourages trial and error experimentation with the grammar formulation.</Paragraph> <Paragraph position="3"> Another dimension of difficulty is given by the two grammar formalisms. The basic idea is that the system should be a pedagogically organised toolbox for grammar formulation and corpus inspection (taking the ideas presented in Lager 1995 one step further) and this idea makes it natural to integrate various extensions into the system, such as new inspection tools and other grammar formalisms and parsers, e.g. that described in Dahllöf 1999 or finite-state formalisms for syntax (e.g. Karlsson et al. 1994) or morphology (e.g. Karttunen 1993).</Paragraph> <Paragraph position="4"> In the context of feature-structure grammars, a unification-failure explanation generator is useful. This component indicates which feature mismatch(es) made it impossible for the grammar rules to assemble a certain phrase. A simple version of this facility is implemented in Dahllöf (1999) and it has turned out to be very useful during grammar construction. Pedagogically developed versions of it would likely be valuable for students (and professionals) as it is often very difficult to see how feature assignments interact in a constraint-based grammar and to locate the source of unwanted unification failures.</Paragraph> <Paragraph position="5"> A longer-term goal would be to provide the system with intelligent error analysis and help facilities. 
This is an exciting but largely unexplored research topic in CALL, known as Intelligent CALL, or ICALL, which draws on research in the fields of Artificial Intelligence and Computational Linguistics. It would also be desirable to develop some kind of authoring interface to the system. Direct manipulation of the system's Java code would presuppose fairly advanced programming skills and this would presumably make it impossible for most teachers to adapt the system to new learning tasks. An authoring facility, allowing users to define new exercises in a suitable authoring language, would consequently extend the usefulness of the system. Such an interface can also be given a more direct pedagogical motivation: There are CALL applications where students step into the role of the teacher, as it were, designing exercises (as if) for their fellow students, and learning about the subject matter in doing so (see Borin 1998).</Paragraph> <Paragraph position="6"> In its first version, the grammar tutor will, for practical reasons, be accompanied by written instructions and conventional coursebooks. We do however intend to integrate this information in the system. [5]</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.5 Benefits from Internet use </SectionTitle> <Paragraph position="0"> As we mentioned above, our choice of Java as the programming language for the grammar tutor was only partially motivated by its status as the programming language of the World Wide Web. Rather, we chose it because it is platform-independent and because the GUI capabilities we need are built into the language. [6] Thus, the application was not built with the WWW in mind, although it is fully feasible to use it over the Internet. 
In this case, a possible division of labour could be implemented, where the exercise programs are Java applets locally executed in the student's computer, while the corpus resides in a server-side database.</Paragraph> <Paragraph position="1"> From the experiences of the CALL community, we know that the Internet can bring two distinctly different kinds of pedagogical added value to a learning situation: 1. In the first case, the pedagogical value is only incidental to the general advantage of a client-server setup, i.e. that it is easier to maintain and upgrade an application if you only have to do it once and in one location. For a CALL application, this means that data and exercises probably can be updated more often than otherwise would have been the case.</Paragraph> <Paragraph position="2"> 2. The other case revolves around using the Internet as a widely accessible, time-of-day-independent communications network.</Paragraph> <Paragraph position="3"> Thanks to the Internet, students and teachers, who may be geographically far apart, can collaborate both asynchronously and synchronously in creating an optimal virtual learning environment for some types of learning tasks (Pennington 1996; Warschauer 1996; Levy 1997; Borin 1998).</Paragraph> <Paragraph position="4"> [Footnote 5: This matter may deserve some deliberation. Benyon et al. (1997) point out that turning written coursebooks directly into hypertext rarely yields good results, and in Nygren (1996), on the basis of practical experiences of medical information systems, we are warned that paper-based information often loses in lucidity and navigability as a result of being poured into a computer.]</Paragraph> <Paragraph position="5"> [Footnote 6: The AWT and JFC class libraries.]</Paragraph> <Paragraph position="6"> The grammar tutoring system has been designed with self-study in mind, so that it is hard to see how it could benefit pedagogically other than incidentally--i.e., as in (1) above--from being made into an Internet application.</Paragraph> <Paragraph position="7"> On the other hand, positive learning effects have been noted in situations where students cooperate in front of the computer to do the exercises in a CALL program designed for self-study (Chapelle et al. 1996).</Paragraph> <Paragraph position="8"> This points to the possibility of designing for a more central role of the Internet even in a program such as the one discussed here. Thus, for instance, grammatical analysis could be carried out collaboratively (or competitively) by several students over the network.</Paragraph> </Section> </Section> </Paper>