<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0203"> <Title>A real-time multiple-choice question generation for language testing a preliminary study-</Title> <Section position="8" start_page="18" end_page="19" type="evalu"> <SectionTitle> 5 Evaluation </SectionTitle> <Paragraph position="0"> To examine the quality of the questions generated by the current system, we evaluated the blank positions determined by a Naive Bayes classifier and a KNN classifier (K=3) with a certainty of more than 50 percent in 10 articles.</Paragraph> <Paragraph position="1"> Among 3138 words in total, 361 blanks were made, and they were manually evaluated according to their suitability as multiple-choice questions, assuming alternatives of the same part of speech. The blank positions were categorized into three groups: E (possible to make a question), D (difficult, but possible to make a question), and NG (not possible or not suitable, e.g. on a punctuation mark). The guideline for deciding between E and D was whether a question is on a grammar rule or requires more semantic understanding, for instance, background knowledge.</Paragraph> <Paragraph position="2"> Table 2 shows the comparison of the number of blank positions decided by the two classifiers, each with a breakdown for each evaluation. The number in braces shows the proportion of the blanks with a certain evaluation over the total number of blanks made by the classifier. The rightmost column I shows the number of the same blank positions selected by both classifiers.</Paragraph> <Paragraph position="3"> The KNN classifier tends to be more accurate and seems to be more robust, although it produces fewer blanks. The fact that an instance-based algorithm outperforms Naive Bayes, whose decision depends on the whole data, can be ascribed to the mixed nature of the training data. 
For example, blanks for grammar questions might have different features from those for vocabulary questions.</Paragraph> <Paragraph position="4"> The results we sampled exhibited another problem of the Naive Bayes algorithm. In two articles among the data, it showed a tendency to make a blank on be-verbs. Naive Bayes tends to choose the same word as a blank position and therefore generates many questions on the same word in one article.</Paragraph> <Paragraph position="5"> A blank on a verb or a part of an idiom (as [according] to) was evaluated as E, most of the blanks on adverbs (as [now]) were D, and a blank on a punctuation mark or a quotation mark was NG.</Paragraph> <Paragraph position="6"> Another general problem of these methods is that the blank positions are decided without consideration of one another; a question will sometimes be too difficult when another blank is next to or in the vicinity of its blank.</Paragraph> </Section></Paper>