<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0405">
  <Title>Modeling the language assessment process and result: Proposed architecture for automatic oral proficiency assessment</Title>
  <Section position="4" start_page="24" end_page="24" type="intro">
    <SectionTitle>
2 Modeling the rater
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="24" end_page="24" type="sub_section">
      <SectionTitle>
2.1 Inference &amp; Inductive Bias
</SectionTitle>
      <Paragraph position="0"> Research in machine learning has demonstrated the need for some form of inductive bias to limit the space of possible hypotheses the learning system can infer. In simple example-based concept learning, concepts are often restricted to certain classes of Boolean combinations, such as conjuncts of disjuncts, in order to make learning tractable. Recent research in automatic induction of context-free grammars, a topic of more direct interest to language learning and assessment, also attests to the importance of structuring the class of grammars that can be induced from a data set. For instance, Pereira and Schabes (1992) demonstrate that a grammar learning algorithm with only a simple binary-branching (CNF) constraint achieves less than 40% accuracy after training on an unbracketed corpus.</Paragraph>
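      <Paragraph> The tractability point can be sketched concretely. The toy learner below is an illustrative sketch, not part of any cited work; the feature names and data are invented. It restricts its hypothesis space to pure conjunctions of Boolean features, so one pass over the positive examples suffices, whereas the unrestricted space of Boolean functions over n features has size 2^(2^n):

```python
# Toy illustration of inductive bias: restrict hypotheses to
# conjunctions of Boolean features (FIND-S style). With this bias the
# learner needs only one pass over the positives; without it, the space
# of all Boolean functions over n features has size 2**(2**n).

def learn_conjunction(examples):
    """examples: list of (feature_dict, label) pairs; returns the most
    general conjunction consistent with all positive examples."""
    positives = [feats for feats, label in examples if label]
    if not positives:
        return None  # no positive evidence to generalize from
    # Start from the first positive example and drop any literal that
    # disagrees with a later positive example.
    hypothesis = dict(positives[0])
    for feats in positives[1:]:
        hypothesis = {f: v for f, v in hypothesis.items()
                      if feats.get(f) == v}
    return hypothesis

# Invented data: classify "fluent" responses from two Boolean features.
data = [
    ({"pauses_short": True, "vocab_rich": True}, True),
    ({"pauses_short": True, "vocab_rich": False}, True),
    ({"pauses_short": False, "vocab_rich": True}, False),
]
print(learn_conjunction(data))  # {'pauses_short': True}
```
</Paragraph>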
      <Paragraph position="1"> Two alternatives achieve comparable increases in grammatical accuracy. Training on partially bracketed corpora, which provides more supervision and a restriction on allowable grammars, improves accuracy to better than 90%. De Marcken (1995) finds that requiring binary branching, together with headedness and head-projection restrictions on the acquirable grammar, leads to similar improvements. These results argue strongly that simply presenting raw text or feature sequences to a machine learning program to build an automatic rating system for language assessment is of limited utility. Results will be poorer, and will require substantially more training data, than if some knowledge of the task or classifier end-state, grounded in human knowledge and linguistic theory, is applied to guide the search for classifiers.</Paragraph>
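      <Paragraph> The accuracy figures above are bracketing scores: the proportion of constituent spans proposed by the induced grammar that are consistent with a reference bracketing. A minimal sketch of PARSEVAL-style precision over span sets follows; the example spans are invented:

```python
# Minimal sketch of bracketing accuracy: the fraction of spans proposed
# by an induced grammar that also appear in the reference bracketing.
# Spans are (start, end) token indices; the data below are invented.

def bracket_precision(proposed, gold):
    """Fraction of proposed constituent spans present in the gold set."""
    proposed, gold = set(proposed), set(gold)
    if not proposed:
        return 0.0
    return len(proposed & gold) / len(proposed)

gold_spans = {(0, 5), (0, 2), (3, 5)}
proposed_spans = {(0, 5), (0, 2), (2, 4)}
# 2 of the 3 proposed spans match the gold bracketing.
print(bracket_precision(proposed_spans, gold_spans))
```
</Paragraph>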
    </Section>
    <Section position="2" start_page="24" end_page="24" type="sub_section">
      <SectionTitle>
2.2 Encoding Linguistic Knowledge
</SectionTitle>
      <Paragraph position="0"> Why, then, if it is necessary to encode human knowledge in order to make machine learning practical, do we not simply encode each piece of the relevant assessment knowledge from the person to the machine? Here again, parallels with other areas of Natural Language Processing (NLP) and Artificial Intelligence (AI) provide guidance. While both rule-based, hand-crafted grammars and expert systems have played a useful role, they require substantial labor to construct and become progressively more difficult to maintain as the number of rules and rule interactions increases. Furthermore, this labor is not transferable to a new (sub-)language or topic, and such knowledge is difficult to encode in a way that allows for graceful degradation.</Paragraph>
      <Paragraph position="1"> Another challenge for primarily hand-crafted approaches is identifying relevant features and their relative importance. As is often noted, human assessment of language proficiency is largely holistic. Even skilled raters have difficulty identifying the features they use, and quantifying their weights, in determining an assessment. Finally, even when identifiable, these features may not be directly available to a computer system. For instance, in phonology, human listeners perceive categorical distinctions between phonemes (Eimas et al., 1971; Thibodeau and Sussman, 1979), whereas acoustic measures vary continuously.</Paragraph>
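      <Paragraph> The phoneme example can be made concrete with a minimal sketch: a continuous acoustic cue (voice onset time) is collapsed to a discrete /b/ versus /p/ label at a fixed boundary. The 30 ms threshold below is illustrative only, not a value from this paper:

```python
# Illustration of categorical perception: listeners report a discrete
# /b/ vs /p/ contrast, while the underlying acoustic cue (voice onset
# time, in ms) varies continuously. The 30 ms boundary is illustrative.

VOT_BOUNDARY_MS = 30.0

def perceived_phoneme(vot_ms: float) -> str:
    """Map a continuous VOT measurement to a categorical label."""
    return "/p/" if vot_ms >= VOT_BOUNDARY_MS else "/b/"

# A continuum of inputs collapses onto just two categories.
for vot in [5.0, 20.0, 29.0, 31.0, 60.0]:
    print(vot, perceived_phoneme(vot))
```
</Paragraph>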
      <Paragraph position="2"> We appeal to machine learning techniques in the acoustic module, as well as in the pooling of information from both acoustic and non-acoustic features.</Paragraph>
    </Section>
  </Section>
</Paper>