<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1906"> <Title>BRUJA: Question Classification for Spanish. Using Machine Translation and an English Classifier.</Title> <Section position="3" start_page="0" end_page="40" type="metho"> <SectionTitle> 2 Question Classification </SectionTitle> <Paragraph position="0"> Question Classification is the task that, given a question, classifies it in one of k semantic classes.</Paragraph> <Paragraph position="1"> Some QC systems are based on regular expressions and manual grammatical rules (Van Durme et al., 2003).</Paragraph> <Paragraph position="2"> EACL 2006 Workshop on Multilingual Question Answering - MLQA06 Recent works in QC have studied different machine learning methods. (Zhang and Lee, 2003) propose a QC system that uses Support Vector Machine (SVM) as the best machine learning algorithm. They compare the obtained results with other algorithms, such as Nearest Neighbors, Naive Bayes, Decision Tree or Sparse Network of Winnows (SNoW).</Paragraph> <Paragraph position="3"> (Li and Roth, 2002) propose a system based on SNoW. They used five main classes and fifty fined classes. Other systems have used SVM and modified kernels.</Paragraph> <Paragraph position="4"> QC systems have some restrictions (Hacioglu and Ward, 2003), such as: + Traditional question classification uses a set of rules, for instance &quot;questions that start with Who ask about a person&quot;. These are manual rules that have to be revised to improve the results.</Paragraph> <Paragraph position="5"> + These rules are very weak, because when new questions arise, the system has to be updated to classify them.</Paragraph> <Paragraph position="6"> Most of the QC systems use English as the main language, and some of the best and standard resources are developed for English.</Paragraph> <Paragraph position="7"> It would be possible to build a question classifier for every language based on machine learning, using a good training corpus for each language, but is something expensive to produce. For this reason we have used Machine Translation Systems. null Machine Translation (MT) systems are very appreciated in CLIR (McNamee et al., 2000). Last years these systems have improved the results, but there are not translators for each language pair and the quality of the result depends on this pair. The reason of using MT and not a Spanish classifier is simple: we have developed a multilingual QA system that works in this moment with three languages: English, Spanish and French. Because it is too complex for us to work with resources into these three languages and also to manage the information into three languages, our kernel system works into English, and we use MT to translate information when it is necessary.</Paragraph> <Paragraph position="8"> We have developed a QC system that covers three tasks: + It uses machine learning algorithms. We have tested methods based on Support Vector Machine, for instance SVMLight or LibSVM, and TiMBL. TiMBL 1 is a program that implements several Memory-Based Learning techniques. It stores a representation of the training set explicitly in memory, and classifies new cases by extrapolation from the most similar stored cases.</Paragraph> <Paragraph position="9"> + To classify Spanish questions we have checked two online machine translators. 
<Paragraph position="11"> Our QC system has three independent modules, so each one can easily be replaced by another to improve the final results. They are shown in Figure 1.</Paragraph>
<Paragraph position="12"> The first module translates the question from other languages, Spanish in this case, into English. We have used two machine translation systems that work well for the Spanish-English language pair: Epals and Prompt. This module could work with other machine translation systems and other languages, provided a good translator exists for the language pair used.</Paragraph>
<Paragraph position="13"> The second module extracts the relevant features (see the next section) from the original or translated English questions. Some of these features (lexical, syntactic and semantic) are used by the machine learning module, and the others are used later in the answer extraction phase. Note that the second module also extracts important features such as the context of the question, the focus and the keywords that we use in later steps of the Question Answering system.</Paragraph>
<Paragraph position="14"> The final module applies the machine learning algorithm and returns the question category or class. In our first experiments we used the Library for Support Vector Machines (LibSVM) and Bayesian Logistic Regression (BBR), but for the present work we have used the Tilburg Memory-Based Learner (TiMBL).</Paragraph>
<Paragraph position="15"> TiMBL (Daelemans et al., 2004) implements several Memory-Based Learning techniques, a classic k-NN classification kernel and several metrics. It implements the Stanfill Modified Value Difference Metric (MVDM), Jeffrey Divergence, and class voting in the k-NN kernel according to the distance of the nearest neighbors. It can also classify using heuristic approximations, such as the IGTREE decision-tree algorithm and the TRIBL and TRIBL2 hybrids, and it includes optimizations for fast classification.</Paragraph>
<Section position="1" start_page="40" end_page="40" type="sub_section"> <SectionTitle> 2.1 Features in Question Classification </SectionTitle>
<Paragraph position="0"> We have analyzed each question in order to extract the following features:
+ Lexical Features
- The first two words of the question
- All the words of the question in lowercase
- The stemmed words
- Bigrams of the question
- Each word with its position in the question
- The interrogative pronoun of the question
- The headwords of the nouns and verbs
+ Syntactic Features
- The interrogative pronoun and the Part-of-Speech (POS) tags of the rest of the words
- The headword (a word to which an independent meaning can be assigned) of the first noun phrase
- All POS tags
- Chunks
- The first verb chunk
- The length of the question
+ Semantic Features
- The question focus (a noun phrase that is likely to be present in the answer)
- POS tags together with the recognized named entities
- The type of the entity, if the focus is one of them
- WordNet hypernyms for the nouns and WordNet synonyms for the verbs
We have used some English resources, such as the POS tagger TreeTagger (Schmid, 1994), LingPipe for Named Entity Recognition, and the Porter stemmer (Porter, 1980). We have also used WordNet to expand the queries.</Paragraph> </Section> </Section>
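As a concrete illustration of the lexical level (our own simplified sketch, not the system's actual extractor), the following Python function derives a few of the lexical features listed above; the whitespace tokenization and the small wh-word list are assumptions made for brevity.

```python
def lexical_features(question):
    # Simplified extractor for some lexical features of Section 2.1:
    # first two words, lowercased words, bigrams, word positions, wh-word.
    wh_words = {"who", "what", "when", "where", "why", "which", "whom", "whose", "how"}
    tokens = question.rstrip("?").lower().split()
    return {
        "first_two_words": tokens[:2],
        "lowercase_words": tokens,
        "bigrams": list(zip(tokens, tokens[1:])),
        "word_positions": list(enumerate(tokens)),
        "wh_word": next((w for w in tokens if w in wh_words), None),
    }

print(lexical_features("When was George Bush born?"))
```

The remaining features (stems, headwords, POS tags, chunks and the WordNet-based semantic features) would come from the external resources named above: TreeTagger, LingPipe, the Porter stemmer and WordNet.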
<Section position="4" start_page="40" end_page="41" type="metho"> <SectionTitle> 3 Experiments and Results </SectionTitle> <Paragraph position="0"/>
<Section position="1" start_page="40" end_page="41" type="sub_section"> <SectionTitle> 3.1 Experimental Method </SectionTitle>
<Paragraph position="0"> The experiments are carried out using public datasets made available by USC (Hovy et al., 1999), UIUC and TREC as training and test collections. These datasets have been labeled manually by the UIUC group with the following general and detailed categories:
ABBR: abbreviation, expansion.
DESC: definition, description, manner, reason.
ENTY: animal, body, color, creation, currency, disease/medical, event, food, instrument, language, letter, other, plant, product, religion, sport, substance, symbol, technique, term, vehicle, word.
HUM: description, group, individual, title.
LOC: city, country, mountain, other, state.
NUM: code, count, date, distance, money, order, other, percent, period, speed, temperature, size, weight.</Paragraph>
<Paragraph position="6"> For instance, the question "What does NATO mean?" belongs to the ABBR (abbreviation) category, "What is a receptionist?" to the DESC (definition) category, and "When was George Bush born?" to the NUM (numeric) category.</Paragraph>
<Paragraph position="7"> The training data are a set of 5500 questions and the test data are a set of 500 questions. All the questions were labelled for the 10th TREC conference. The same dataset has been used in other studies, such as (Li and Roth, 2002). The distribution of these 5500 training questions with respect to their interrogative pronoun or initial word is shown in Table 1.</Paragraph>
<Paragraph position="8"> Likewise, the distribution of categories over these 5500 training questions is shown in Table 2.</Paragraph>
<Paragraph position="9"> The distribution of the 500 test questions with respect to their interrogative pronoun or initial word is shown in Table 3, and the distribution of categories over these 500 test questions is shown in Table 4. In our experiments we try to identify the general category; our plan is to attempt the detailed classification later.</Paragraph>
<Paragraph position="10"> We have used Accuracy as a general measure and the Precision of each category as a detailed measure:
Accuracy = #correctly classified questions / #questions (1)
Precision(c) = #questions correctly classified into category c / #questions classified into category c (2)</Paragraph>
<Paragraph position="12"> Another measure used is the F-score, defined as the harmonic mean of precision and recall (Van Rijsbergen, 1979). It is a commonly used metric that summarizes precision and recall in a single measure:
F-score = (2 · precision · recall) / (precision + recall) (3)</Paragraph> </Section>
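To make these measures concrete, here is a small Python sketch (ours, not from the paper) that computes Accuracy, per-category Precision and Recall, and the F-score as defined in equations (1)-(3); the gold and predicted labels are toy data.

```python
def evaluate(gold, predicted):
    # Accuracy (Eq. 1): fraction of questions whose category was predicted correctly.
    accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)

    # Per-category precision (Eq. 2), recall, and their harmonic mean (Eq. 3).
    per_category = {}
    for c in sorted(set(gold)):
        tp = sum(1 for g, p in zip(gold, predicted) if g == c and p == c)
        n_pred = sum(1 for p in predicted if p == c)
        n_gold = sum(1 for g in gold if g == c)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_gold if n_gold else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        per_category[c] = (precision, recall, f_score)
    return accuracy, per_category

gold = ["HUM", "LOC", "NUM", "HUM", "DESC"]
pred = ["HUM", "LOC", "HUM", "HUM", "DESC"]
print(evaluate(gold, pred))  # accuracy 0.8; HUM precision 2/3, recall 1.0
```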
<Section position="2" start_page="41" end_page="41" type="sub_section"> <SectionTitle> 3.2 Results </SectionTitle>
<Paragraph position="0"> We have run several experiments, varying the machine translation system:
+ 5500 training questions and 500 test questions, all in English. This is the baseline case.
+ 5500 training questions in English and 500 test questions translated from Spanish using the Epals MT system.
+ 5500 training questions in English and 500 test questions translated from Spanish using the Prompt MT system.</Paragraph>
<Paragraph position="2"> The MT resources are available at the following URLs:</Paragraph> </Section> </Section> </Paper>