File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/03/p03-2020_intro.xml

Size: 1,499 bytes

Last Modified: 2025-10-06 14:01:51

<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-2020">
  <Title>Automatic Collection of Related Terms from the Web</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 System
</SectionTitle>
    <Paragraph position="0"> Figure 1 shows the configuration of the system. The system consists of three steps: compiling a corpus, automatic term recognition (ATR), and filtering. The system is implemented for the Japanese language.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Compiling corpus
</SectionTitle>
      <Paragraph position="0"> The first step, compiling a corpus, produces a corpus</Paragraph>
      <Paragraph position="2"> for a seed term s. In general, compiling a corpus means selecting the appropriate passages from a document set. We use the Web as the document set and select the passages that describe s for the corpus. The actual procedure for compiling the corpus is as follows. 1. Web page collection: For a given seed term s, the system first makes four queries: &quot;s toha&quot;, &quot;s toiu&quot;, &quot;s ha&quot;, and &quot;s&quot;, where toha, toiu, and ha are Japanese function words that are often used for defining or explaining a term. Then, the system uses a search engine to collect up to the top K (= 100) pages for each query. If a collected page has a link whose anchor string is s, the system collects the linked page too.</Paragraph>
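The Web page collection step above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names build_queries and collect_pages, and the search and fetch_links callables, are hypothetical stand-ins for a search engine client and a link extractor. The sketch also assumes that anchor links are followed only one hop from a collected page and that duplicate URLs are collected once; the text does not specify either point. The function words are written in romanized form as in the text, whereas a real system would presumably issue them in Japanese script.

```python
from typing import Callable, List, Tuple

K = 100  # maximum number of pages per query, as stated in the text

# Query templates from the text (romanized function words).
QUERY_PATTERNS = ["{s} toha", "{s} toiu", "{s} ha", "{s}"]


def build_queries(seed: str) -> List[str]:
    """Return the four queries for a seed term s."""
    return [pattern.format(s=seed) for pattern in QUERY_PATTERNS]


def collect_pages(
    seed: str,
    search: Callable[[str], List[str]],
    fetch_links: Callable[[str], List[Tuple[str, str]]],
    k: int = K,
) -> List[str]:
    """Collect page URLs for a seed term.

    search(query) is a hypothetical callable returning ranked URLs.
    fetch_links(url) is a hypothetical callable returning
    (anchor_text, target_url) pairs found on the page at url.
    """
    collected: List[str] = []
    seen = set()

    def add(url: str) -> bool:
        # Assumption: each URL is collected at most once.
        if url in seen:
            return False
        seen.add(url)
        collected.append(url)
        return True

    for query in build_queries(seed):
        for url in search(query)[:k]:  # at most the top k pages per query
            if not add(url):
                continue
            # Assumption: follow links whose anchor string equals the
            # seed term, one hop only.
            for anchor, target in fetch_links(url):
                if anchor == seed:
                    add(target)
    return collected
```

With real data, search would wrap whichever search engine is available and fetch_links would download and parse each page; both are injected as arguments so that the collection logic can be exercised without network access.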
    </Section>
  </Section>
</Paper>