<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1208">
  <Title>Question Classification using HDAG Kernel</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Question Classification
</SectionTitle>
    <Paragraph position="0"> Question classification is defined as a task that maps a given question to more than one of a3 question types (classes).</Paragraph>
    <Paragraph position="1"> In the general concept of QA systems, the result of question classification is used in a downstream process, answer selection, to select a correct answer from among the large number of answer candidates that are extracted from the source documents. The result of the question classification, that is, the labels of the question types, can reduce the number of answer candidates. Therefore, we no longer have to evaluate every noun phrase in the source documents to see whether it provides a correct answer to a given question. Evaluating only answer candidates that match the results of question classification is an efficient method of obtaining correct answers. Thus, question classification is an important process of a QA system. Better performance in question classification will lead to better total performance of the QA system.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Question Types: Classes of Questions
</SectionTitle>
      <Paragraph position="0"> Numerous question taxonomies have been defined, but unfortunately, no standard exists.</Paragraph>
      <Paragraph position="1"> In the case of the TREC QA-Track, most systems have their own question taxonomy, and these are reconstructed year by year. For example, (Ittycheriah et al., 2001) defined 31 original question types in two levels of hierarchical structure. (Harabagiu et al., 2000) also defined a large hierarchical question taxonomy, and (Hovy et al., 2001) defined 141 question types of a hierarchical question taxonomy.</Paragraph>
      <Paragraph position="2"> Within all of these taxonomies, question types are defined from the viewpoint of the target intention of the given questions, and they have hierarchical structures, even though these question taxonomies are defined by different researchers. This because the purpose of question classification is to reduce the large number of answer candidates by restricting the target intention via question types. Moreover, it is very useful to handle question taxonomy constructed in a hierarchical structure in the downstream processes.</Paragraph>
      <Paragraph position="3"> Thus, question types should be the target intention and constructed in a hierarchical structure.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Properties
</SectionTitle>
      <Paragraph position="0"> Question classification is quite similar to Text Categorization, which is one of the major tasks in Natural Language Processing (NLP). These tasks require classification of the given text to certain defined classes. In general, in the case of text categorization, the given text is one document, such as a newspaper article, and the classes are the topics of the articles. In the case of question classification, a given text is one short question sentence, and the classes are the target answers corresponding to the intention of the given question.</Paragraph>
      <Paragraph position="1"> However, question classification requires much more complicated features than text categorization, as shown by (Li and Roth, 2002). They proved that question classification needs richer information than simple key terms (bag-of-words), which usually give us high performance in text classification. Moreover, the previous work of (Suzuki et al., 2002a) showed that the sequential patterns constructed by different levels of attributes, such as words, part-of-speech (POS) and semantical information, improve the performance of question classification. The experiments in these previous works indicated that the structural and semantical features inside questions have the potential to improve the performance of question classification. In other words, high-performance question classification requires us to extract the structural and semantical features from the given question.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Learning and Classification Task
</SectionTitle>
      <Paragraph position="0"> This paper focuses on the machine learning approach to question classification. The machine learning approach has several advantages over manual methods.</Paragraph>
      <Paragraph position="1"> First, the construction of a manual classifier for questions is a tedious task that requires the analysis of a large number of questions. Moreover, mapping questions into question types requires the use of lexical items and, therefore, an explicit representation of the mapping may be very large. On the other hand, machine learning approaches only need to define features. Finally, the classifier can be more flexibly reconstructed than a manual one because it can be trained on a new taxonomy in a very short time.</Paragraph>
      <Paragraph position="2"> As the machine learning algorithm, we chose the Support Vector Machines (SVMs) (Cortes and Vapnik, 1995) because the work of (Joachims, 1998; Taira and Haruno, 1999) reported state-of-the-art performance in text categorization as long as question classification is a similar process to text categorization. null</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 HDAG Kernel
</SectionTitle>
    <Paragraph position="0"> Recently, the design of kernel functions has become a hot topic in the research field of machine learning.</Paragraph>
    <Paragraph position="1"> A specific kernel can drastically increase the performance of specific tasks. Moreover, a specific kernel can handle new feature spaces that are difficult to manage directly with conventional methods.</Paragraph>
    <Paragraph position="2"> The HDAG Kernel is a new kernel function that is designed to easily handle structured natural language data. According to the discussion in the previous section, richer information such as structural and semantical information is required for high-performance question classification.</Paragraph>
    <Paragraph position="3"> We think that the HDAG Kernel is suitable for improving the performance of question classification: The HDAG Kernel can handle various linguistic structures within texts, such as chunks and their relations, as the features of the text without converting such structures to numerical feature vectors explicitly. null</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Feature Space
</SectionTitle>
      <Paragraph position="0"> Figure 1 shows examples of the structures within questions that are handled by the HDAG kernel.</Paragraph>
      <Paragraph position="1"> As shown in Figure 1, the HDAG kernel accepts several levels of chunks and their relations inside the text. The nodes represent several levels of chunks including words, and directed links represent their relations. Suppose a4a6a5a8a7a10a9a5a11a7a13a12a15a14a17a16 and a4a19a18a20a7a6a9a18a21a7a13a12a23a22a24a16 represent each node. Some nodes have a graph inside themselves, which are called &amp;quot;non-terminal nodes&amp;quot;.</Paragraph>
      <Paragraph position="2"> Each node can have more than one attribute, such as words, part-of-speech tags, semantic information like WordNet (Fellbaum, 1998), and class names of the named entity. Moreover, nodes are allowed to not have any attribute, in other words, we do not have to assign attributes to all nodes.</Paragraph>
      <Paragraph position="3"> The &amp;quot;attribute sequence&amp;quot; is a sequence of attributes extracted from the node in sub-paths of HDAGs. One type of attribute sequence becomes one element in the feature vector. The framework of the HDAG Kernel allows node skips during the extraction of attribute sequences, and its cost is based the decay factor a25a27a26a21a28a17a29a30a25a32a31a30a33a19a34 , since HDAG Kernel deals with not only the exact matching of the sub-structures between HDAGs but also the approximate structure matching of them.</Paragraph>
      <Paragraph position="4"> Explicit representation of feature vectors in the HDAG kernel can be written as a35a36a26a21a37a24a34a39a38 a26a21a35a41a40a42a26a21a37a24a34a21a43a45a44a46a44a45a44a47a43a46a35a49a48a50a26a21a37a24a34a21a34 , where a35 represents the explicit feature mapping from the HDAG to the feature vector and a2 represents the number of all possible types of attribute sequences extracted to the HDAGs.</Paragraph>
      <Paragraph position="5"> The value of a35 a7a26a20a37a50a34 is the number of occurrences of thea51 'th attribute sequence in the HDAGa37 , weighted according to the node skip.</Paragraph>
      <Paragraph position="6"> Table 1 shows a example of attribute sequences that are extracted from the example question in Figure 1. The symbol a52 in the sub-path column shows that more than one node skip occurred there. The parentheses &amp;quot;( )&amp;quot; in the attribute sequence column represents the boundaries of a node. For example, attribute sequence &amp;quot;purchased-(NNP-Bush)&amp;quot; is ex-</Paragraph>
      <Paragraph position="8"> Question: George Bush purchased a small interest in which baseball team ? How far is it from Denver to Aspen</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
WRB RB VBZ PRP IN NNP TO NNP
LOCATION LOCATIONADVP
</SectionTitle>
    <Paragraph position="0"> Question: How far is it from Denver to Aspen ?  of feature vectors, extracted from the example question in Figure 1 sub-path attribute sequence: element value</Paragraph>
    <Paragraph position="2"> tracted from sub-path &amp;quot;a5a66a65 -a5a27a40 &amp;quot;, and &amp;quot;NNP-Bush&amp;quot; is in the nodea5a67a40 .</Paragraph>
    <Paragraph position="3"> The return value of the HDAG Kernel can be defined as:</Paragraph>
    <Paragraph position="5"> where input objects a95 and a96 are the objects represented in HDAG a37 a40 and a37a98a97 , respectively. According to this formula, the HDAG Kernel calculates the inner product of the common attribute sequences weighted according to their node skips and the occurrence between the two HDAGs, a37a32a40 and a37 a97 .</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Efficient Calculation Method
</SectionTitle>
      <Paragraph position="0"> In general, the dimension of the feature space a2 in equation (1) becomes very high or even infinity. It might thus be computationally infeasible to generate feature vector a35a36a26a20a37a50a34 explicitly.</Paragraph>
      <Paragraph position="1"> To solve this problem, we focus on the framework of the kernel functions defined for a discrete structure, Convolution Kernels (Haussler, 1999). One of the most remarkable properties of this kernel methodology is that it can calculate kernel functions by the &amp;quot;inner products between pairs of objects&amp;quot; while it retains the original representation of objects.</Paragraph>
      <Paragraph position="2"> This means that we do not have to map input objects to the numerical feature vectors by explicitly representing them, as long as an efficient calculation for the inner products between a pair of texts is defined.</Paragraph>
      <Paragraph position="3"> However, Convolution Kernels are abstract concepts. The Tree Kernel (Collins and Duffy, 2001) and String Subsequence Kernel (SSK) (Lodhi et al., 2002) are examples of instances in the Convolution Kernels developed in the NLP field.</Paragraph>
      <Paragraph position="4"> The HDAG Kernel also use this framework: we can learn and classify without creating explicit numerical feature vectors like equation (1). The efficient calculation of inner products between HDAGs, the return value of HDAG Kernel, was defined in a recursive formulation (Suzuki et al., 2003). This recursive formulation for HDAG Kernel can be rewritten as &amp;quot;for loops&amp;quot; by using the dynamic programming technique. Finally, the HDAG Kernel can be calculated ina99a50a26a100a9a14a32a9a101a9a22a102a9a34 time.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 ORGANIZATION 733
3 COMPANY 119
3 *COMPANY GROUP 0
3 *MILITARY 4
3 INSTITUTE 26
3 *MARKET 0
3 POLITICAL ORGANIZATION 103
4 GOVERNMENT 38
4 POLITICAL PARTY 43
4 PUBLIC INSTITUTION 19
3 GROUP 96
4 !SPORTS TEAM 20
3 *ETHNIC GROUP 4
3 *NATIONALITY 4
2 LOCATION 752
3 GPE 265
4 CITY 77
4 *COUNTY 1
4 PROVINCE 47
4 COUNTRY 116
3 REGION 23
3 GEOLOGICAL REGION 22
4 *LANDFORM 9
4 *WATER FORM 7
4 *SEA 3
3 *ASTRAL BODY 5
4 *STAR 2
4 *PLANET 2
3 ADDRESS 59
4 POSTAL ADDRESS 24
4 PHONE NUMBER 22
4 *EMAIL 4
4 *URL 8
2 FACILITY 147
3 GOE 99
4 SCHOOL 27
4 *MUSEUM 3
3 LINE 24
4 *RAILROAD 3
4 !ROAD 11
4 *WATERWAY 0
4 *TUNNEL 1
4 *BRIDGE 1
3 *PARK 2
3 *MONUMENT 3
2 PRODUCT 468
3 VEHICLE 37
4 *CAR 8
4 *TRAIN 2
4 *AIRCRAFT 5
4 *SPACESHIP 8
4 !SHIP 12
3 DRUG 15
3 *WEAPON 4
3 *STOCK 0
3 *CURRENCY 8
3 AWARD 11
3 *THEORY 1
3 RULE 66
3 *SERVICE 2
3 *CHARCTER 4
3 METHOD SYSTEM 33
3 ACTION MOVEMENT 21
3 *PLAN 1
3 *ACADEMIC 5
3 *CATEGORY 0
3 SPORTS 11
3 OFFENCE 10
3 ART 78
4 *PICTURE 2
4 *BROADCAST PROGRAM 6
4 MOVIE 15
4 *SHOW 4
4 MUSIC 13
3 PRINTING 31
4 !BOOK 10
4 *NEWSPAPER 7
4 *MAGAZINE 4
2 DISEASE 44
2 EVENT 99
3 *GAMES 8
3 !CONFERENCE 17
3 *PHENOMENA 6
3 *WAR 3
3 *NATURAL DISASTER 5
3 *CRIME 6
2 TITLE 97
</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Data Set
</SectionTitle>
      <Paragraph position="0"> We used three different QA data sets together to evaluate the performance of our proposed method.</Paragraph>
      <Paragraph position="1"> One is the 1011 questions of NTCIR-QAC11, which were gathered from 'dry-run', 'formal-run' and 'additional-run.' The second is the 2000 questions described in (Suzuki et al., 2002b). The last one is the 2000 questions of CRL-QA data2. These three QA data sets are written in Japanese.</Paragraph>
      <Paragraph position="2"> These data were labeled with the 150 question types that are defined in the CRL-QA data, along with one additional question type, &amp;quot;OTHER&amp;quot;. Table 2 shows all of the question types we used in this experiment, where a104 represents the depth of the hi- null each question type, including the number of questions in &amp;quot;child question types&amp;quot;.</Paragraph>
      <Paragraph position="3"> While considering question classification as a learning and classification problem, we decided not to use question types that do not have enough questions (more than ten questions), indicated by an asterisk (*) in front of the name of the question type, because classifier learning is very difficult with very few data. In addition, after the above operations, if only one question type belongs to one parent question type, we also deleted it, which is indicated by an exclamation mark (!). Ultimately, we evaluated 68 question types.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Comparison Methods
</SectionTitle>
      <Paragraph position="0"> We compared the HDAG Kernel (HDAG) to a base-line method that is sometimes referred to as the bag-of-words kernel, a bag-of-words (BOW) with a polynomial kernel (d1: first degree polynomial kernel, d2: second degree polynomial kernel).</Paragraph>
      <Paragraph position="1"> HDAG and BOW differ in how they consider the structures of a given question. BOW only considers attributes independently (d1) or combinatorially (d2) in a given question. On the other hand, HDAG can consider the structures (relations) of the attributes in a given question.</Paragraph>
      <Paragraph position="2"> We selected SVM for the learning and classification algorithm. Additionally, we evaluated the performance using SNoW3 to compare our method to indirectly the SNoW-based question classifier (Li and Roth, 2002). Note that BOW was used as features for SNoW.</Paragraph>
      <Paragraph position="3"> Finally, we compared the performances of HDAG-SVM, BOW(d2)-SVM, BOW(d1)-SVM, and BOW-SNoW. The parameters of each comparison method were set as follows: The decay factor a25 was 0.5 for HDAG, and the soft-margin a105 of all SVM was set to 1. For SNoW, we used a106 a38a107a33a77a44a101a108a77a109a19a43a57a110a30a38a111a28a19a44a75a112 , and a113a114a38a111a108 . These parameters were selected based on preliminary experiments.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Decision Model
</SectionTitle>
      <Paragraph position="0"> Since the SVM is a two-class classification method, we have to make a decision model to determine the question type of a given question that is adapted for question classification, which is a multi-class hierarchical classification problem.</Paragraph>
      <Paragraph position="1"> Figure 2 shows how we constructed the final decision model for question classification.</Paragraph>
      <Paragraph position="2"> First, we made 68 SVM classifiers for each question type, and then we constructed &amp;quot;one-vs-rest models&amp;quot; for each node in the hierarchical question taxonomy. One of the one-vs-rest models was constructed by some of the SVM classifiers, which were the child question types of the focused node. For example, the one-vs-rest model at the node &amp;quot;TOP&amp;quot; was constructed by five SVM classifiers: &amp;quot;NAME&amp;quot;, &amp;quot;NATURAL OBJECT&amp;quot;, &amp;quot;COLOR&amp;quot;, &amp;quot;TIME TOP&amp;quot; and &amp;quot;NUMEX&amp;quot;. The total number of one-vs-rest models was 17.</Paragraph>
      <Paragraph position="3"> Finally, the decision model was constructed by setting one-vs-rest models in the hierarchical question taxonomy to determine the most plausible ques- null SVM classifiers tion type of a given question.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.4 Features
</SectionTitle>
      <Paragraph position="0"> We set four feature sets for each comparison method.</Paragraph>
      <Paragraph position="1">  1. words only (W) 2. words and named entities (W+N) 3. words and semantic information (W+S) 4. words, named entities and semantic information (W+N+S)  The words were analyzed in basic form, and the semantic information was obtained from the &amp;quot;Goitaikei&amp;quot; (Ikehara et al., 1997), which is similar to WordNet in English. Words, chunks and their relations in the texts were analyzed by CaboCha (Kudo and Matsumoto, 2002), and named entities were analyzed by the SVM-based NE tagger (Isozaki and Kazawa, 2002).</Paragraph>
      <Paragraph position="2"> Note that even when using the same feature sets, method of how to construct feature spaces are entirely different between HDAG and BOW.</Paragraph>
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.5 Evaluation Method
</SectionTitle>
      <Paragraph position="0"> We evaluated the 5011 questions by using five-fold cross-validation and used the following two approaches to evaluate the performance.</Paragraph>
      <Paragraph position="1">  1. Average accuracy of each one-vs-rest model (Macc)  This measure evaluates the performance of each one-vs-rest model independently. If a one-vs-rest model classifies a given question correctly, it scores a 1, otherwise, it scores a 0.  2. Average accuracy of each given question (Qacc)  This measure evaluates the total performance of the decision model, the question classifier. If each given question is classified in a correct question type, it scores a 1, otherwise, it scores a 0.</Paragraph>
      <Paragraph position="2"> In Qacc, classifying with a correct question type implies that all of the one-vs-rest models from the top of the hierarchy of the question taxonomy to the given question type must classify correctly.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>