<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1120">
  <Title>Cross-Language Information Retrieval Based on Category Matching Between Language Versions of a Web Directory</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"> We conducted experiments with the proposed method to detect the categories relevant to a query. In these experiments, we used the same subsets described in Section 4.1. We merged the categories three levels below the category &quot;Computers and Internet&quot; into their parent. After merging, there are 342 categories in English and 265 in Japanese.</Paragraph>
    <Paragraph position="1"> First, we ran the experiment using the following formula, which uses only the inner product, before applying the calculation described in Section 3.2.3.</Paragraph>
    <Paragraph position="2"> $\mathrm{rel}_{\mathrm{inner}}(q, c) = \vec{q} \cdot \vec{c}$ In this experiment, the query has three terms: &quot;encryption&quot; ($= q_1$), &quot;security&quot; ($= q_2$), and &quot;system&quot; ($= q_3$).</Paragraph>
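A minimal Python sketch of the inner-product relevance above, assuming the query and each category are represented as term-weight dictionaries. The function name, category names, and weights are hypothetical, invented for illustration, and are not taken from the experiment.

```python
def rel_inner(query, category):
    """Inner product of two sparse term-weight vectors stored as dicts."""
    return sum(weight * category.get(term, 0.0) for term, weight in query.items())


# The three query terms from the text, each given weight 1 (an assumption).
query = {"encryption": 1.0, "security": 1.0, "system": 1.0}

# Hypothetical categories and weights, invented for this sketch.
categories = {
    "Security and Encryption": {"encryption": 2.0, "security": 3.0, "system": 1.0},
    "Operating Systems": {"system": 4.0},
    "Games": {"game": 5.0},
}

# Rank categories from most to least relevant by inner product.
ranking = sorted(
    categories,
    key=lambda name: rel_inner(query, categories[name]),
    reverse=True,
)
print(ranking)
```

Terms that are absent from a category contribute zero, so only the query terms a category actually contains affect its score.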
    <Paragraph position="3"> Table 1 lists the top 10 relevant categories in the first experiment. Almost all the categories in Table 1 are relevant to the query, so relevance calculation by inner product alone can be regarded as effective. However, this method has the following problem: a category that contains few of the query terms may still be given high relevance when it contains a single query term whose weight in the category is extremely high.</Paragraph>
    <Paragraph position="4"> To reduce this effect, we propose the improved method described in Section 3.2.3. The method is revised to take into account the angle between $\vec{q}$ and $\vec{c}$. Ideally, the most relevant category has a vector that is long and whose components are flat. The length is captured by the inner product, whereas the flatness is captured by the angle between $\vec{q}$ and $\vec{c}$.</Paragraph>
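The role of the angle can be illustrated with a small Python sketch. It does not reproduce the formula of Section 3.2.3, which is not restated in this section; it only compares the inner product with the cosine of the angle for two hypothetical category vectors restricted to the three query terms. All weights are invented for illustration.

```python
import math


def inner(q, c):
    """Inner product of two equal-length weight vectors."""
    return sum(a * b for a, b in zip(q, c))


def cosine(q, c):
    """Cosine of the angle between q and c (assumes neither is all zeros)."""
    return inner(q, c) / (math.sqrt(inner(q, q)) * math.sqrt(inner(c, c)))


q = [1.0, 1.0, 1.0]      # encryption, security, system (equal weights assumed)
spiky = [9.0, 0.0, 0.0]  # hypothetical: one query term with a very high weight
flat = [3.0, 3.0, 3.0]   # hypothetical: all three query terms, evenly weighted

print(inner(q, spiky), inner(q, flat))    # both inner products are 9.0
print(cosine(q, spiky), cosine(q, flat))  # the flat vector has the smaller angle
```

The two vectors tie on inner product, but the cosine separates them, which is the kind of distinction an angle-aware score is meant to capture.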
    <Paragraph position="5"> Table 2 lists the top 10 relevant categories in the second experiment, which uses the revised method. Although no noticeable improvement appears, categories that match few query terms are ranked lower than in the first experiment.</Paragraph>
  </Section>
</Paper>