File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/05/p05-1027_metho.xml
Size: 12,653 bytes
Last Modified: 2025-10-06 14:09:42
<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1027">
<Title>Question Answering as Question-Biased Term Extraction: A New Approach toward Multilingual QA</Title>
<Section position="4" start_page="216" end_page="218" type="metho">
<SectionTitle> 3 QBTE Model 1 </SectionTitle>
<Paragraph position="0"> This section presents a framework, QBTE Model 1, for constructing a QA system from question-answer pairs based on the QBTE Approach. When a user gives a question, the framework finds answers to the question in the following two steps.</Paragraph>
<Paragraph position="1"> Document Retrieval retrieves the top N articles or paragraphs from a large-scale corpus.</Paragraph>
<Paragraph position="2"> QBTE creates input data by combining the question features and document features, evaluates the input data, and outputs the top M answers. Since this paper focuses on QBTE, it uses a simple idf method for document retrieval.</Paragraph>
<Paragraph position="3"> Let wi be words and w1, w2, ..., wm be a document. Question Answering in QBTE Model 1 involves directly classifying each word wi in the document as an answer word or a non-answer word. That is, given input x(i) for wi, its class label is selected from {I, O, B} as follows: I if the word is in the middle of an answer word sequence; O if the word is not in an answer word sequence; B if the word is the first word of an answer word sequence.</Paragraph>
<Paragraph position="4"> The class labeling scheme used in our experiment is IOB2 (Sang, 2000), a variation of IOB (Ramshaw and Marcus, 1995).</Paragraph>
<Paragraph position="5"> Input x(i) of each word is defined as described below.</Paragraph>
<Section position="1" start_page="216" end_page="217" type="sub_section">
<SectionTitle> 3.1 Feature Extraction </SectionTitle>
<Paragraph position="0"> This paper employs three groups of features as features of input data. A Question Feature Set (QF) is a set of features extracted only from a question sentence; this feature set belongs to the question sentence as a whole. The following are elements of a Question Feature Set: qw: an enumeration of the word n-grams (1 &lt;= n &lt;= N), e.g., given the question "What is CNN?", the features are {qw:What, qw:is, qw:CNN, qw:What-is, qw:is-CNN} if N = 2; qq: interrogative words (e.g., who, where, what); qm1, ..., qm4: POS1-POS4 of the question words, obtained with the morphological analyzer ChaSen. For example, Tokyo is analyzed as POS1 = noun, POS2 = proper noun, POS3 = location, and POS4 = general. This paper used up to 4-grams for qw.</Paragraph>
<Paragraph position="1"> A Document Feature Set (DF) is a feature set extracted only from a document. Using only DF corresponds to unbiased Term Extraction (TE). For each word wi, the following features are extracted: dw-k, ..., dw+0, ..., dw+k: the k preceding and following words of the word wi, e.g., {dw-1:wi-1, dw+0:wi, dw+1:wi+1} if k = 1; dm1-k, ..., dm1+0, ..., dm1+k: POS1 of the k preceding and following words of wi; dm2-k, ..., dm2+0, ..., dm2+k: POS2 of the k preceding and following words of wi; dm3-k, ..., dm3+0, ..., dm3+k: POS3 of the k preceding and following words of wi; dm4-k, ..., dm4+0, ..., dm4+k: POS4 of the k preceding and following words of wi. In this paper, k is set to 3, so the window size is 7.</Paragraph>
<Paragraph position="2"> A Combined Feature Set (CF) contains features created by combining question features and document features. QBTE Model 1 employs CF. For each word wi, the following features are created: cw-k, ..., cw+0, ..., cw+k: matching results (true/false) between each of the dw-k, ..., dw+k features and any qw feature, e.g., cw-1:true if dw-1:President and qw:President; cm1-k, ..., cm1+0, ..., cm1+k: matching results (true/false) between each of the dm1-k, ..., dm1+k features and any POS1 in the qm1 features; cm2-k, ..., cm2+0, ..., cm2+k: matching results (true/false) between each of the dm2-k, ..., dm2+k features and any POS2 in the qm2 features; cm3-k, ..., cm3+0, ..., cm3+k: matching results (true/false) between each of the dm3-k, ..., dm3+k features and any POS3 in the qm3 features; cm4-k, ..., cm4+0, ..., cm4+k: matching results (true/false) between each of the dm4-k, ..., dm4+k features and any POS4 in the qm4 features; cq-k, ..., cq+0, ..., cq+k: combinations of each of the dw-k, ..., dw+k features and the qw features, e.g., cq-1:President&Who is a combination of dw-1:President and qw:Who.</Paragraph>
</Section>
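To make the feature definitions above concrete, here is a minimal sketch of how QF, DF, and CF features could be extracted for one document word. It assumes morphological analysis has already produced (word, POS1, POS2, POS3, POS4) tuples; the function names, the interrogative list, and the restriction of cq to unigram qw features are illustrative simplifications, not the authors' implementation.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str, str, str, str]  # (word, POS1, POS2, POS3, POS4)

def question_features(q_tokens: List[Token], n_max: int = 4) -> Dict[str, bool]:
    """QF: word n-grams (qw), interrogatives (qq), and question-word POS (qm1-qm4)."""
    feats: Dict[str, bool] = {}
    words = [t[0] for t in q_tokens]
    for n in range(1, n_max + 1):                      # qw: word 1..N-grams
        for i in range(len(words) - n + 1):
            feats["qw:" + "-".join(words[i:i + n])] = True
    for w in words:                                    # qq: interrogative words
        if w.lower() in {"who", "where", "what", "when", "why", "how"}:
            feats["qq:" + w.lower()] = True
    for t in q_tokens:                                 # qm1-qm4: POS1-POS4 of question words
        for j in range(1, 5):
            feats["qm%d:%s" % (j, t[j])] = True
    return feats

def word_features(d_tokens: List[Token], i: int, qf: Dict[str, bool],
                  k: int = 3) -> Dict[str, object]:
    """DF (dw, dm1-dm4) and CF (cw, cm1-cm4, cq) features for document word i."""
    feats: Dict[str, object] = dict(qf)                # QF is shared by every word of the document
    q_words = {f[3:] for f in qf if f.startswith("qw:") and "-" not in f[3:]}
    q_pos = {j: {f[4:] for f in qf if f.startswith("qm%d:" % j)} for j in range(1, 5)}
    for off in range(-k, k + 1):                       # 7-word window when k = 3
        pos = i + off
        if not 0 <= pos < len(d_tokens):
            continue
        word = d_tokens[pos]
        sign = "+%d" % off if off >= 0 else str(off)
        feats["dw" + sign] = word[0]                   # DF: surrounding words
        for j in range(1, 5):
            feats["dm%d%s" % (j, sign)] = word[j]      # DF: surrounding POS1-POS4
        feats["cw" + sign] = word[0] in q_words        # CF: word matches a question word
        for j in range(1, 5):
            feats["cm%d%s" % (j, sign)] = word[j] in q_pos[j]  # CF: POS matches a question POS
        for qw in sorted(q_words):                     # CF: word/question-word combinations
            feats["cq%s:%s&%s" % (sign, word[0], qw)] = True   # unigram qw only, for brevity
    return feats
```

For the question "What is CNN?" with N = 2, question_features would produce qw:What, qw:is, qw:CNN, qw:What-is, and qw:is-CNN, plus the qq and qm1-qm4 entries, mirroring the example above.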
<Section position="2" start_page="217" end_page="217" type="sub_section">
<SectionTitle> 3.2 Training and Execution </SectionTitle>
<Paragraph position="0"> The training phase estimates a probabilistic model from training data (x(1),y(1)), ..., (x(n),y(n)) generated from the CRL QA Data. The execution phase evaluates the probability of y'(i) given input x'(i) using the probabilistic model.</Paragraph>
</Section>
<Section position="3" start_page="217" end_page="218" type="sub_section">
<SectionTitle> Training Phase </SectionTitle>
<Paragraph position="0"> 1. Given question q, correct answer a, and document d.</Paragraph>
<Paragraph position="1"> 2. Annotate <A> and </A> right before and after answer a in d.</Paragraph>
<Paragraph position="2"> 3. Morphologically analyze d.</Paragraph>
<Paragraph position="3"> 4. For d = w1, ..., <A>, wj, ..., wk, </A>, ..., wm, extract features as x(1), ..., x(m).</Paragraph>
<Paragraph position="4"> 5. Set the class label y(i) = B if wi immediately follows <A>, y(i) = I if wi is elsewhere between <A> and </A>, and y(i) = O otherwise.</Paragraph>
<Paragraph position="5"> Execution Phase: 1. Given question q and paragraph d. 2. Morphologically analyze d.</Paragraph>
<Paragraph position="6"> 3. For each wi of d = w1, ..., wm, create input data x'(i) by extracting features.</Paragraph>
<Paragraph position="7"> 4. For each y'(j) in Y, compute p(y'(j)|x'(i)), the probability of y'(j) given x'(i). 5. For each x'(i), the y'(j) with the highest probability is selected as the label of wi.</Paragraph>
<Paragraph position="8"> 6. Extract, from the labeled word sequence of d, the word sequences that start with a word labeled B and are followed by words labeled I.</Paragraph>
<Paragraph position="9"> 7. Rank the top M answers according to the probability of the first word.</Paragraph>
<Paragraph position="10"> This approach is designed to extract only the most highly probable answers. However, pinpointing only answers is not an easy task. To select the top five answers, it is necessary to loosen the condition for extracting answers. Therefore, in the execution phase, we only give label O to a word if its probability exceeds 99%; otherwise we give the second most probable label.</Paragraph>
<Paragraph position="11"> As a further relaxation, word sequences that include B inside the sequence are also extracted as answers. This is because our preliminary experiments indicated that it is very rare for two answer candidates to be adjacent in Question-Biased Term Extraction, unlike in an ordinary Term Extraction task.</Paragraph>
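As a rough sketch of the execution-phase decoding and the two relaxations just described, the following shows how per-word label probabilities could be turned into ranked answers. The per-word probability dictionaries, the 0.99 threshold as an explicit parameter, and the function names are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

def decode_labels(probs: List[Dict[str, float]], o_threshold: float = 0.99) -> List[str]:
    """Assign an IOB2 label per word; keep O only when it is almost certain."""
    labels = []
    for p in probs:
        best = max(p, key=p.get)
        if best == "O" and p["O"] <= o_threshold:
            # relaxation: fall back to the most probable non-O label
            best = max((label for label in p if label != "O"), key=p.get)
        labels.append(best)
    return labels

def extract_answers(words: List[str], probs: List[Dict[str, float]],
                    m: int = 5) -> List[Tuple[str, float]]:
    """Extract candidate spans and rank the top M by the probability of their first word."""
    labels = decode_labels(probs)
    answers = []
    i = 0
    while i < len(words):
        if labels[i] == "B":
            j = i + 1
            # further relaxation: a B inside the span does not terminate it
            while j < len(words) and labels[j] in ("I", "B"):
                j += 1
            answers.append((" ".join(words[i:j]), probs[i]["B"]))
            i = j
        else:
            i += 1
    return sorted(answers, key=lambda a: a[1], reverse=True)[:m]
```

Each extracted span inherits the probability of its first word, which is then used to rank the top M answers.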
</Section> </Section>
<Section position="5" start_page="218" end_page="219" type="metho">
<SectionTitle> 4 Experimental Results </SectionTitle>
<Paragraph position="0"> We conducted 10-fold cross validation using the CRL QA Data. The output is evaluated using the Top5 score and MRR.</Paragraph>
<Paragraph position="1"> The Top5 score is the rate at which at least one correct answer is included in the top five answers. MRR (Mean Reciprocal Rank) is the average reciprocal rank (1/n) of the highest rank n at which a correct answer appears for each question.</Paragraph>
<Paragraph position="2"> Judgment of whether an answer is correct is done by both automatic and manual evaluation. Automatic evaluation consists of exact matching and partial matching. Partial matching is useful for absorbing variation in the extraction range. A partial match is judged correct if a system's answer completely includes the correct answer or the correct answer completely includes a system's answer. Table 2 presents the experimental results. The results show that a QA system can be built by using our QBTE approach. The manually evaluated performance scored MRR = 0.36 and Top5 = 0.47. However, manual evaluation is costly and time-consuming, so we use the automatic evaluation results, i.e., the exact matching and partial matching results, as pseudo lower and upper bounds on the performance. Interestingly, the manual evaluation results for MRR and Top5 are nearly equal to the average of the exact and partial evaluation results.</Paragraph>
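The evaluation measures above are straightforward to compute; the following is a minimal sketch of Top5, MRR, and the partial-match criterion, assuming a single gold answer per question and a rank-ordered list of at most five system answers per question (both simplifications of the actual CRL QA Data setup).

```python
from typing import List, Tuple

def is_partial_match(system_answer: str, correct_answer: str) -> bool:
    """Partial match: one answer string completely includes the other."""
    return correct_answer in system_answer or system_answer in correct_answer

def top5_and_mrr(ranked_answers: List[List[str]], gold: List[str],
                 partial: bool = False) -> Tuple[float, float]:
    """Top5 score and MRR over all questions, with exact or partial matching."""
    match = is_partial_match if partial else (lambda s, g: s == g)
    hits, rr_sum = 0, 0.0
    for answers, g in zip(ranked_answers, gold):
        for rank, a in enumerate(answers[:5], start=1):
            if match(a, g):
                hits += 1                # at least one correct answer in the top five
                rr_sum += 1.0 / rank     # reciprocal rank of the highest-ranked correct answer
                break
    n = len(gold)
    return hits / n, rr_sum / n
```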
<Paragraph position="3"> To confirm that QBTE ranks likely answers higher, we varied the number of paragraphs retrieved from the large corpus over N = 1, 3, 5, and 10. Table 3 shows the results. Whereas the performance of Term Extraction (TE) and Term Extraction with question features (TE+QF) significantly degraded, the performance of QBTE (CF) did not severely degrade with the larger number of retrieved paragraphs.</Paragraph>
</Section>
<Section position="6" start_page="219" end_page="220" type="metho">
<SectionTitle> 5 Discussion </SectionTitle>
<Paragraph position="0"> Our approach needs no question type system, and it still achieved 0.36 in MRR and 0.47 in Top5. This performance is comparable to the results of SAIQA-II (Sasaki et al., 2004) (MRR = 0.4, Top5 = 0.55), whose question analysis, answer candidate extraction, and answer selection modules were separately built from a QA dataset and an NE dataset limited to eight named entity classes, such as PERSON and LOCATION. Since that QA dataset is not publicly available, it is not possible to compare the experimental results directly; however, we believe that the performance of QBTE Model 1 is comparable to that of the conventional approaches, even though it does not depend on question types, named entities, or class names.</Paragraph>
<Paragraph position="1"> Most of the partial answers were judged correct in manual evaluation. For example, for "How many times bigger ...?", "two times" is a correct answer, but "two" was also judged correct. Suppose that "John Kerry" is a prepared correct answer in the CRL QA Data; in this case, "Senator John Kerry" would also be correct. Such additions and omissions occur because our approach is not restricted to particular extraction units, such as named entities or class names. The performance of QBTE was affected little by the larger number of retrieved paragraphs, whereas the performance of TE and TE+QF significantly degraded. This indicates that QBTE Model 1 is not mere Term Extraction combined with document retrieval but Term Extraction appropriately biased by questions.</Paragraph>
<Paragraph position="2"> Our experiments used no information about the question types given in the CRL QA Data because we are seeking a universal method that can be used for any QA dataset. Beyond this main goal, as a reference, the Appendix shows our experimental results classified by question type without using the types in the training phase. The automatic evaluation results are given as Top5 (T5) and MRR for exact matching and as Top5 (T5') and MRR' for partial matching.</Paragraph>
<Paragraph position="3"> It is interesting that minor question types were correctly answered, e.g., SEA and WEAPON, for which there was only one training question.</Paragraph>
<Paragraph position="4"> We also conducted an additional experiment, as a reference, in which the question types defined in the CRL QA Data were included in the training data; the question type of each question was added as a qw feature. The performance of QBTE from the first-ranked paragraph showed no difference from that of the experiments shown in Table 2.</Paragraph>
</Section>
<Section position="7" start_page="220" end_page="220" type="metho">
<SectionTitle> 6 Related Work </SectionTitle>
<Paragraph position="0"> There are two previous studies on integrating QA components into one using machine learning / statistical NLP techniques. Echihabi et al. (2003) used noisy-channel models to construct a QA system. In their approach, the range of Term Extraction is not trained from a data set but selected from answer candidates, e.g., named entities and noun phrases, generated by a decoder. Lita and Carbonell (2004) share our motivation to build a QA system only from question-answer pairs without depending on question types. Their method finds clusters of questions and defines how to answer the questions in each cluster. However, their approach finds snippets, i.e., short passages including answers, not exact answers extracted by Term Extraction.</Paragraph>
</Section>
</Paper>