<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1201">
  <Title>A Language Independent Method for Question Classification</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Learning Question Classifiers
</SectionTitle>
    <Paragraph position="0"> Question classification is very similar to text classification. One thing they have in common is that in both cases we need to assign a class, from a finite set of possible classes, to a natural language text. Another similarity is attribute information; what has been used as attributes for text classification can also be extracted and used in question classification. Finally, in both cases we have high dimensional attributes: if we want to use the bag-of-words approach, we will face the problem of having very large attribute sets.</Paragraph>
    <Paragraph position="1"> An important difference is that question classification introduces the problem of dealing with short sentences, compared with text documents, and thus we have less information available on each question instance. This is the reason why question classification approaches are trying to use other information (e.g. chunks and named entities) besides the words within the questions.</Paragraph>
    <Paragraph position="2"> However, the main disadvantage of relying on semantic analyzers, named entity taggers and the like, is that for some languages these tools are not yet well developed. Plus, most of them are very sensitive to changes in the domain of the corpus; and even if these tools are accurate, in some cases acquiring one for a particular language may be a difficult task. This is our prime motivation for searching for different, more easier to gather, information to solve the question classification problem. Our learning scenario considers as attribute information prefixes of words in combination with attributes whose values are obtained from the Internet.</Paragraph>
    <Paragraph position="3"> These Internet based attributes are targeted to extract evidence of the possible semantic class of the question.</Paragraph>
    <Paragraph position="4"> The next subsection will explain how the Internet is used to extract attributes for our question classification problem. In subsection 3.2 we present a brief description of Support Vector Machines, the learning algorithm used on our experiments.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Using Internet
</SectionTitle>
      <Paragraph position="0"> As Kilgarriff and Grefenstette wrote, the Internet is a fabulous linguists' playground (Kilgarriff and Grefenstette, 2003). It has become the greatest information source available worldwide, and although English is the dominant language represented on the Internet it is very likely that one can find information in almost any desired language. Considering this, and the fact that the texts are written in natural language, we believe that new methods that take advantage of this large corpus must be devised.</Paragraph>
      <Paragraph position="1"> In this work we propose using the Internet in order to acquire information that can be used as attributes in our classification problem. This attribute information can be extracted automatically from the web and the goal is to provide an estimate about the possible semantic class of the question.</Paragraph>
      <Paragraph position="2"> The procedure for gathering this information from the web is as follows: we use a set of heuristics to extract from the question a word w, or set of words, that will complement the queries submitted for the search. We then go to a search engine, in this case Google, and submit queries using the word w in combination with all the possible semantic classes for our purpose. For instance, for the question Who is the President of the French Republic? we extract the word President using our heuristics, and run 5 queries in the search engine, one for each possible class.</Paragraph>
      <Paragraph position="3"> These queries take the following form:  * &amp;quot;President is a person&amp;quot; * &amp;quot;President is a place&amp;quot; * &amp;quot;President is a date&amp;quot; * &amp;quot;President is a measure&amp;quot; * &amp;quot;President is an organization&amp;quot;  We count the number of results returned by Google for each query and normalize them by their sum. The resultant numbers are the values for the attributes used by the learning algorithm. As can be seen, it is a very straightforward approach, but as the experimental results will show, this information gathered from the Internet is quite useful. In Table 1 we present the figures obtained from Google for the question presented above, column Results show the number of hits returned by the search engine and in column Normalized we present the number of hits normalized by the total of all results returned for the different queries.</Paragraph>
      <Paragraph position="4"> An additional advantage of using the Internet is that by approximating the values of attributes in this way, we take into account words or entities belonging to more than one class (polysemy). null Now that we have introduced the use of the Internet in this work, we continue describing the set of heuristics that we use in order to perform the web search.</Paragraph>
      <Paragraph position="5">  We begin by eliminating from the questions all words that appear in our stop list. This stop list contains the usual items: articles, prepositions and conjunctions plus all the interrogative adverbs and all lexical forms of the verb &amp;quot;to be&amp;quot;. The remaining words are sent to the search engine in combination with the possible semantic classes, as described above. If no results are returned for any of the semantic classes we then start eliminating words from right to left until the search engine returns results for at least one of the semantic categories. As an example consider the question posed previously: Who is the President of the French Republic? we eliminate the words from the stop list and then formulate queries for the remaining words. These queries are of the following form: &amp;quot;President French Republic is a si&amp;quot; where s [?] {Person,Organization,Place,Date,Measure}.</Paragraph>
      <Paragraph position="6"> The search engine did not return any results for this query, so we start eliminating words from right to left. The query is now like this: &amp;quot;President French is a si&amp;quot; and given that again we have no results returned we finally formulate the last possible query: &amp;quot;President is a si&amp;quot; which returns results for all the semantic classes except for Date.</Paragraph>
      <Paragraph position="7"> Being heuristics, we are aware that in some cases they do not work well. Nevertheless, for the vast majority of the cases they presented surprisingly good results, in the three  languages, as shown in the experimental evaluation. null</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Support Vector Machines
</SectionTitle>
      <Paragraph position="0"> Given that Support Vector Machines have proven to perform well over high dimensionality data they have been successfully used in many natural language related applications, such as text classification (Joachims, 1999; Joachims, 2002; Tong and Koller, 2001) and named entity recognition (Mitsumori et al., 2004; Solorio and L'opez, 2004). This technique uses geometrical properties in order to compute the hyperplane that best separates a set of training examples (Stitson et al., 1996). When the input space is not linearly separable SVM can map, by using a kernel function, the original input space to a high-dimensional feature space where the optimal separable hyperplane can be easily calculated. This is a very powerful feature, because it allows SVM to overcome the limitations of linear boundaries. They also can avoid the over-fitting problems of neural networks as they are based on the structural risk minimization principle. The foundations of these machines were developed by Vapnik, for more information about this algorithm we refer the reader to (Vapnik, 1995; Sch&amp;quot;olkopf and Smola, 2002).</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>