<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1026">
  <Title>Organizing Encyclopedic Knowledge based on the Web and its Application to Question Answering</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Reflecting the growing use of the World Wide Web, a number of Web-based language processing methods have been proposed within the natural language processing (NLP), information retrieval (IR) and artificial intelligence (AI) communities. These include, for example, methods to extract linguistic resources (Fujii and Ishikawa, 2000; Resnik, 1999; Soderland, 1997), to retrieve useful information in response to user queries (Etzioni, 1997; McCallum et al., 1999), and to mine or discover knowledge latent in the Web (Inokuchi et al., 1999).</Paragraph>
    <Paragraph position="1"> In this paper, mainly from an NLP point of view, we explore a method to produce linguistic resources.</Paragraph>
    <Paragraph position="2"> Specifically, we enhance the method proposed by Fujii and Ishikawa (2000), which extracts encyclopedic knowledge (i.e., term descriptions) from the Web.</Paragraph>
    <Paragraph position="3"> In brief, their method searches the Web for pages containing the term in question, and uses linguistic expressions and HTML layouts to extract fragments describing the term. They also use a language model to discard non-linguistic fragments. In addition, a clustering method is used to divide the descriptions into a specified number of groups.</Paragraph>
    <Paragraph position="4"> On the one hand, their method is expected to enhance existing encyclopedias, whose vocabulary size is relatively limited, and therefore the quantity problem has been resolved.</Paragraph>
    <Paragraph position="5"> On the other hand, encyclopedias extracted from the Web are not comparable with existing ones in terms of quality. In hand-crafted encyclopedias, term descriptions are carefully organized by domain and word sense, an organization that is especially effective for human use. However, the output of Fujii's method is simply a set of unorganized term descriptions. Although clustering can optionally be performed, the resulting clusters do not necessarily correspond to explicit criteria such as word senses and domains.</Paragraph>
    <Paragraph position="6"> In short, we believe that by combining extraction and organization methods, we can enhance both the quantity and the quality of Web-based encyclopedias.</Paragraph>
    <Paragraph position="7"> Motivated by this background, we introduce an organization model into Fujii's method and reformalize the whole framework. In other words, our proposed method performs not only extraction but also generation of encyclopedic knowledge.</Paragraph>
    <Paragraph position="8"> Section 2 explains the overall design of our encyclopedia generation system, and Section 3 elaborates on our organization model. Section 4 then explores a method for applying the resulting encyclopedia to NLP research, specifically question answering. Section 5 presents a number of experiments evaluating our methods.</Paragraph>
  </Section>
</Paper>