<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1315">
  <Title>Empirical Term Weighting and Expansion Frequency</Title>
  <Section position="7" start_page="121" end_page="122" type="evalu">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"> Two measures of performance are reported: (1) 11-point average precision and (2) R, the precision after retrieving Nrd documents, where Nrd is the number of relevant documents. We used the &quot;short query&quot; condition of the NACSIS NTCIR-1 Test Collection (Kando et al., 1999), which consists of about 300,000 documents in Japanese, plus about 30 queries with labeled relevance judgements for training and 53 queries with relevance judgements for testing. The results for the &quot;short query&quot; condition are reported on page 25 of (Kando et al., 1999), which shows that this condition is hard for statistical methods. Two previously published systems are included in the tables below: JCB1 and BKJJBIDS. JCB1, submitted by Just System, a company with a commercially successful product for Japanese word processing, produced the best results using sophisticated (and proprietary) natural language processing techniques (Fujita, 1999). BKJJBIDS used Berkeley's logistic regression methods (with about half a dozen variables) to fit term weights to the labeled training material.</Paragraph>
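    <Paragraph> As an illustrative sketch (not the evaluation code used in these experiments), the two measures can be computed as follows. The function names, the list-of-document-ids input format, and the use of interpolated precision at the recall levels 0.0, 0.1, ..., 1.0 are assumptions; the text names the measures but does not spell out the interpolation rule.

```python
def r_precision(ranking, relevant):
    """Precision after retrieving Nrd documents (Nrd = number of relevant documents)."""
    relevant = set(relevant)
    nrd = len(relevant)
    if nrd == 0:
        return 0.0
    hits = sum(1 for doc in ranking[:nrd] if doc in relevant)
    return hits / nrd


def eleven_point_average_precision(ranking, relevant):
    """Mean interpolated precision at recall 0.0, 0.1, ..., 1.0 (assumed convention)."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    points = []  # (recall, precision) measured at each relevant document in the ranking
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    total = 0.0
    for step in range(11):
        level = step / 10
        # Interpolated precision: the best precision at any recall >= level.
        # The small tolerance guards against floating-point rounding of recall.
        candidates = [p for r, p in points if r >= level - 1e-12]
        total += max(candidates, default=0.0)
    return total / 11
```

    Both functions take a ranked list of document identifiers and the set of identifiers judged relevant for one query; per-query scores would then be averaged over the test queries.</Paragraph>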
    <Paragraph position="1"> Table 8 shows that training often helps. The methods above the line (with the possible exception of JCB1) use training; the methods below the line do not. Fit-E has very respectable performance, nearly up to the level of JCB1, which is not bad for a purely statistical method.</Paragraph>
    <Paragraph position="2"> The performance of fit-B is close to that of BKJJBIDS. For comparison's sake, fit-B is shown both with and without the K filter. The K filter restricts terms to sequences of Katakana and Kanji characters. BKJJBIDS uses a similar heuristic to eliminate Japanese function words. Although the K filter does not change performance very much, the use of this filter changes the relative order of fit-B and BKJJBIDS. These results suggest that the K filter is slightly unhelpful.</Paragraph>
    <Paragraph position="3"> Filter legend. 2: restrict terms to bigrams explicitly mentioned in the query (where = D). 2+: restrict terms to bigrams, but include where = E as well as where = D.</Paragraph>
    <Paragraph position="4"> Table columns: filter; trained on; sys. Rows: filters 2+, E1; 2+, E2; and 2+, E4, each trained on tf, where, ef, with system fit-E.</Paragraph>
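    <Paragraph> A minimal sketch of a filter in the spirit of the K filter, which keeps only terms made up entirely of Katakana and Kanji characters. The Unicode ranges below (Katakana block U+30A0 to U+30FF, CJK Unified Ideographs U+4E00 to U+9FFF) and the names K_TERM, passes_k_filter, and k_filter are assumptions for illustration; the text does not say how the character classes were defined.

```python
import re

# Assumed character ranges (not given in the source):
# Katakana block U+30A0-U+30FF and CJK Unified Ideographs U+4E00-U+9FFF.
K_TERM = re.compile(r"[\u30A0-\u30FF\u4E00-\u9FFF]+")


def passes_k_filter(term):
    """True if the whole term is a sequence of Katakana and/or Kanji characters."""
    return K_TERM.fullmatch(term) is not None


def k_filter(terms):
    """Keep only the terms that pass the filter, preserving their order."""
    return [t for t in terms if passes_k_filter(t)]
```

    Under these assumptions, Hiragana-only terms, which include many Japanese function words, are dropped, while Katakana and Kanji terms are kept.</Paragraph>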
    <Paragraph position="6"> A number of filters have been considered (Table 9). Results vary somewhat depending on these choices, though not too much, which is fortunate, since we don't understand stop lists very well. To the extent that there is a pattern, we suspect that words are slightly better than bigrams, and that the E filter is slightly better than the B filter, which is slightly better than the K filter. Table 10 shows that the best filters (Ek) improve the performance of the best method (fit-E) to nearly the level of JCB1.</Paragraph>
    <Paragraph position="7"> Table 11 legend: UL = upper limit (λ ≤ idf); LL = lower limit (0 ≤ λ). The final experiment (Table 11) shows that restricting λ to 0 ≤ λ ≤ idf improves performance slightly. The combination of both the upper limit and the lower limit is slightly better than just one limit, which is better than none. We view limits as a robustness device. Hopefully, they won't have to do much, but every once in a while they prevent the system from wandering far astray.</Paragraph>
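    <Paragraph> The limits can be sketched as a simple clamp on a fitted term weight (written λ here). The function name clamp_weight and the two switches are hypothetical; they only mirror the four conditions compared in the experiment (both limits, upper only, lower only, none) and are not taken from the original system.

```python
def clamp_weight(lam, idf, use_lower=True, use_upper=True):
    """Restrict a fitted term weight to the interval [0, idf].

    use_lower applies the lower limit (LL: weight at least 0).
    use_upper applies the upper limit (UL: weight at most idf).
    """
    if use_lower:
        lam = max(lam, 0.0)
    if use_upper:
        lam = min(lam, idf)
    return lam
```

    A weight that already lies between 0 and idf passes through unchanged, which matches the view of the limits as a robustness device that rarely has to act.</Paragraph>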
  </Section>
</Paper>