<?xml version="1.0" standalone="yes"?>
  <Title>A real-time multiple-choice question generation for language testing a preliminary study-</Title>
  <Section position="8" start_page="18" end_page="19" type="evalu">
    <SectionTitle>
5 Evaluation
</SectionTitle>
    <Paragraph position="0"> To examine the quality of the questions generated by the current system, we have evaluated the blank positions determined by a Naive Bayes classifier and a KNN classifier (K=3) with a certainty of more than 50 percent in 10 articles.</Paragraph>
    <Paragraph position="1"> Among 3138 words in total, 361 blanks were made and they were manually evaluated according to their possibility of being a multiple-choice question, with an assumption of having alternatives of the same part of speech. The blank positions were categorized into three groups, which are E (possible to make a question), and D (difficult, but possible to make a question), NG (not possible or not suitable e.g. on a punctuation). The guideline for deciding E or D was if a question is on a grammar rule, or it requires more semantic understanding, for instance, a background knowledge  .</Paragraph>
    <Paragraph position="2"> Table 2. shows the comparison of the number of blank positions decided by the two classifiers, each with a breakdown for each evaluation. The number in braces shows the proportion of the blanks with a certain evaluation over the total number of blanks made by the classifier. The rightmost column I shows the number of the same blank positions selected by both classifiers.</Paragraph>
    <Paragraph position="3"> The KNN classifier tends to be more accurate and seems to be more robust, although given the fact that it produces less blanks. The fact that an instance-based algorithm exceeds Naive Bayes, whose decision depends on the whole data, can be ascribed to a mixed nature of the training data. For example, blanks for grammar questions might have different features from ones for vocabulary questions.</Paragraph>
    <Paragraph position="4"> The result we sampled has exhibited another problem of Naive Bayes algorithm. In two articles among the data, it has shown the tendency to make a blank on be-verbs. Naive Bayes tends to choose the  A blank on a verbs or a part of idioms (as [according] to) was evaluated as E, most of the blanks on an adverbs, and (as [now]) were D and a blank on a punctuation or a quotation mark was NG.</Paragraph>
    <Paragraph position="5">  same word as a blank position, therefore generates many questions on the same word in one article.</Paragraph>
    <Paragraph position="6"> Another general problem of these methods would be that the blank positions are decided without consideration of one another; the question will be sometimes too difficult when another blank is next to or in the vicinity of the blank.</Paragraph>
  </Section>
</Paper>