<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0203">
<Title>A real-time multiple-choice question generation for language testing - a preliminary study -</Title>
<Section position="4" start_page="0" end_page="0" type="metho">
<SectionTitle>2 System Design</SectionTitle>
<Paragraph position="0"> The system we have implemented works in a simple pipelined manner: it takes an HTML file and turns it into a quiz session. The process of converting the input into multiple-choice questions comprises extracting features, deciding the blank positions, and choosing the wrong alternatives (called distractors), all of which is done instantly when the user feeds in the input. When the user submits their answers, the system shows the text with the correct answers as well as overall feedback.</Paragraph>
</Section>
<Section position="5" start_page="0" end_page="17" type="metho">
<SectionTitle>3 Methodology</SectionTitle>
<Paragraph position="0"> The process of deciding blank positions in a given text follows a standard machine learning framework: a classifier is first trained on training data (i.e., TOEIC questions) and then applied to unseen test data (i.e., the input text). In the current system, the mechanism for choosing distractors is implemented with the simplest possible algorithm; its investigation is left to future work.</Paragraph>
<Section position="1" start_page="17" end_page="17" type="sub_section">
<SectionTitle>3.1 Preparing the Training Data</SectionTitle>
<Paragraph position="0"> The training data is a collection of fill-in-the-blank questions from a TOEIC preparation book (Matsuno et al., 2000). As shown in the box below, a question consists of a sentence with a missing word (or words) and four alternatives, one of which best fits into the blank.</Paragraph>
<Paragraph position="1"> Many people showed up early to [ ] for the position that was open.</Paragraph>
<Paragraph position="2"> 1. apply 2. appliance 3. applies 4. application</Paragraph>
<Paragraph position="3"> The training instances are obtained from 100 questions by shifting the blank position. The original position is labeled true, while sentences with the blank in a shifted position are initially labeled false. The question shown above therefore yields the instances "[ ] people showed up early to apply for the position that was open.", "Many [ ] showed up early to apply for the position that was open.", and so on, all of which are labeled false except the one with the original blank position. In total, 1,962 instances (100 true and 1,862 false) were obtained.</Paragraph>
<Paragraph position="4"> The label true here is meant to indicate that it is possible to make a question from the sentence with a blank in the specified position, whereas many of the shifted positions labeled false could also be good blanks. Semi-supervised learning (a method for identifying the class of unclassified instances in a data set where only some of the instances are classified) is therefore conducted in the following manner to retrieve the instances that are potentially true among those initially labeled false. We retrieved the 13 instances (shown in Table 1) which had initially been labeled false but were classified as true in a test-on-train run, with a certainty of more than 0.5, by a Naive Bayes classifier. The labels of those instances were changed to true before re-training the classifier. In this way, a training set with 113 true instances was obtained.</Paragraph>
</Section>
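To make the procedure above concrete, the following is a minimal Python sketch of the blank-shifting instance generation and the semi-supervised relabelling step. It is our own illustration, not the authors' code (the actual system is a Java servlet): the function name blank_shift_instances is hypothetical, the feature set is reduced to the non-POS features, and scikit-learn's MultinomialNB stands in for the paper's Naive Bayes classifier.

```python
# Illustrative sketch of Section 3.1: build one instance per token
# position (only the original blank position is labelled True), then
# flip false labels whose test-on-train P(true) exceeds 0.5 before
# re-training.  The paper's POS features are omitted for brevity.
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB


def blank_shift_instances(tokens, original_blank):
    """One instance per token position; only the original blank
    position is labelled True, every shifted position False."""
    instances = []
    for i, tok in enumerate(tokens):
        features = {
            "word": tok,                    # the word removed by the blank
            "position": i,                  # position in the sentence
            "sentence_length": len(tokens),
            "word_length": len(tok),
        }
        instances.append((features, i == original_blank))
    return instances


# Toy example: the TOEIC question quoted above, blank at "apply".
sentence = ("Many people showed up early to apply for the position "
            "that was open").split()
feats, labels = zip(*blank_shift_instances(sentence, original_blank=6))

vectorizer = DictVectorizer()
X = vectorizer.fit_transform(feats)
y = list(labels)

clf = MultinomialNB()
clf.fit(X, y)

# Test-on-train: flip false labels whose P(true) exceeds 0.5,
# then re-train on the relabelled set.
true_col = list(clf.classes_).index(True)
p_true = clf.predict_proba(X)[:, true_col]
y_relabelled = [True if (not lab and p > 0.5) else lab
                for lab, p in zip(y, p_true)]
clf.fit(X, y_relabelled)
```

On the actual training set, this test-on-train pass is the step that flips the 13 instances mentioned above, yielding the 113 true instances.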
<Section position="2" start_page="17" end_page="18" type="sub_section">
<SectionTitle>3.2 Deciding Blank Positions</SectionTitle>
<Paragraph position="0"> For the current system we use news articles from BBC.com, which consist of approximately 200-500 words. The test text goes through tagging and feature extraction in the same manner as the training data, and the instances are classified as true or false. Seven features were used: the word, its POS, the POS of the previous word, the POS of the next word, the position in the sentence, the sentence length, and the word length. The result of classifying an instance is obtained along with a certainty value between 0.0 and 1.0 for each class, which indicates how certain it is that the instance belongs to that class. The positions of the blanks are decided according to the certainty of the classification, so that as many blanks (i.e., questions) as the user has specified are generated.</Paragraph>
</Section>
<Section position="3" start_page="18" end_page="18" type="sub_section">
<SectionTitle>3.3 Choosing Distractors</SectionTitle>
<Paragraph position="0"> In the current version of the system, the distractors are chosen randomly from the same article, excluding punctuation and any word identical to one of the other alternatives.</Paragraph>
</Section>
</Section>
<Section position="6" start_page="18" end_page="18" type="metho">
<SectionTitle>4 Current system</SectionTitle>
<Paragraph position="0"> The real-time system we are presenting is implemented as a Java servlet, one of whose main screens is shown in Figure 1. The tagger used here is the TreeTagger (Schmid, 1994), which uses the Penn Treebank tagset. (Figure 1: the question session page, with an enlarged answer selector.)</Paragraph>
<Paragraph position="1"> The current version of the system is available at http://www.iii.u-tokyo.ac.jp/~qq36126/mcwa1/. The interface of the system consists of three sequenced web pages, namely 1) the parameter selection page, 2) the quiz session page, and 3) the result page.</Paragraph>
<Paragraph position="2"> The parameter selection page shows the list of articles linked from the top page of the BBC website, along with option selectors for the number of blanks (5-30) and the classifier (Naive Bayes or Nearest Neighbors).</Paragraph>
<Paragraph position="3"> The question session page is shown in Figure 1. It displays the headline and the image from the chosen article under the title and a brief instruction. The alternatives are shown in option selectors, which are placed in the article text.</Paragraph>
<Paragraph position="4"> The result page shows the text with the right answers, displayed in green when the user's choice is correct and in red when it is wrong.</Paragraph>
</Section>
</Paper>
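As a companion to Sections 3.2 and 3.3, the sketch below shows one way the described question-generation step can be realized in Python: candidate blank positions are ranked by the classifier's certainty for the true class, as many blanks as the user requested are taken from the top of the ranking, and distractors are drawn at random from the same article, excluding punctuation and words identical to the other alternatives. The function generate_questions, its parameters, and the use of random.sample are our own illustrative choices; the actual system is a Java servlet and may differ in detail.

```python
# Illustrative sketch of blank selection (Section 3.2) and random
# distractor selection (Section 3.3); not the authors' implementation.
import random
import string


def generate_questions(tokens, p_true, n_blanks, n_alternatives=4):
    """tokens: the article as a list of words;
    p_true: classifier certainty, per position, that a blank there
    makes a good question (computed as in the previous sketch);
    n_blanks: number of questions the user asked for (5-30)."""
    # Rank candidate positions by certainty, keep the top n_blanks,
    # and restore document order for presentation.
    ranked = sorted(range(len(tokens)), key=lambda i: p_true[i], reverse=True)
    blank_positions = sorted(ranked[:n_blanks])

    # Distractor pool: distinct words of the same article, minus punctuation.
    pool = sorted({t for t in tokens if t not in string.punctuation})

    questions = []
    for pos in blank_positions:
        answer = tokens[pos]
        # Distractors are drawn at random from the same article,
        # excluding the correct answer; assumes the article has at
        # least n_alternatives - 1 other distinct words.
        distractors = random.sample([w for w in pool if w != answer],
                                    k=n_alternatives - 1)
        questions.append({"position": pos,
                          "answer": answer,
                          "alternatives": sorted(distractors + [answer])})
    return questions
```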