<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1161">
  <Title>Lexical Query Paraphrasing for Document Retrieval</Title>
  <Section position="7" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
6 Evaluation
</SectionTitle>
    <Paragraph position="0"> For our evaluation, we performed two retrieval tasks on the TREC LA Times collection, using TREC judgments to identify the queries that had relevant documents in this collection. Our main evaluation was performed for the TREC-9 question-answering task, since our ultimate goal is to answer questions posed to an Internet resource. From a total of 131,896 documents in the collection, 1211 documents contained the correct answer for 404 of the 693 TREC-9 queries. An additional evaluation was performed for the TREC-6 ad-hoc retrieval task, where 1105 documents were judged relevant to 48 of the 50 TREC-6 keyword-based queries.</Paragraph>
    <Paragraph position="1"> Our results show that query paraphrasing improves overall retrieval performance. For the ad-hoc task, when 20 retrieved documents were retained for each query, 22 correct documents in total were retrieved without paraphrasing, while a maximum of 20 paraphrases per query yielded 35 correct documents (only 18 of the 48 queries were paraphrased). For the question answering task, under the same retrieval conditions, recall improved from 294 correct documents without paraphrasing to 337 with a maximum of 20 paraphrases per query. Specifically, the number of queries for which correct documents were retrieved improved from 169 to 182.</Paragraph>
    <Paragraph position="2"> In addition, we tested the effect of the following factors on retrieval performance.</Paragraph>
    <Paragraph position="3"> a0 WordNet co-locations - three usages of word co-locations (none, for scoring only, for scoring and paraphrase generation).</Paragraph>
    <Paragraph position="4"> a0 Tagging accuracy - manually-corrected tagging versus automatic PoS tagging (Brill, 1992), which tagged correctly 84% of the queries.</Paragraph>
    <Paragraph position="6"> should take into account the word order in a query (strict consideration, ignore word order, intermediate).</Paragraph>
    <Paragraph position="7"> a0 Absent adjacent-pair divisor (AbsAdjDiv) - how much we should penalize lemma-pairs that are adjacent in the query but absent from the corpus (same penalty as non-adjacent absent lemmapairs, a little higher, a lot higher).</Paragraph>
    <Paragraph position="8"> a0 Query length - how the number of words in the query affects retrieval performance.</Paragraph>
    <Paragraph position="9"> For each run, we submitted to the retrieval engine increasing sets of paraphrases as follows: first the lemmatized query alone (Set 0), next the query plus 1 paraphrase (Set 1), then the query plus 2 paraphrases (Set 2), and so on, up to a maximum of 19 paraphrases (Set 19). For each submission, we varied the number of documents returned by the retrieval engine from 1 to 20 documents.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
6.1 WordNet Co-locations
</SectionTitle>
      <Paragraph position="0"> As indicated above, we considered three usages of WordNet with respect to word co-locations: Col, 0 5 10 15 20290  NoCol and ColScore. Under the Col setting, our mechanism checked whether a lemma-pair in the input query corresponds to a WordNet co-location, and if so, generated synonyms for the pair, instead of the individual lemmas. For instance, given the lemma-pair &amp;quot;folic acid&amp;quot;, the Col setting yielded synonyms such as &amp;quot;folate&amp;quot; and &amp;quot;vitamin m&amp;quot; for the lemma-pair. During paraphrase scoring, these co-locations were assigned a high frequency score, corresponding to the 999th percentile of pair frequencies in the corpus. In contrast, the NoCol setting did not take into account WordNet co-locations at all. For instance, one of the paraphrases generated by this method for &amp;quot;folic acid&amp;quot; was &amp;quot;folic lsd&amp;quot;. ColScore is a hybrid setting, where WordNet was used for scoring lemma-pairs in the proposed paraphrases, but not for generating them.</Paragraph>
      <Paragraph position="1"> Figure 1 depicts the total number of correct documents retrieved (for 20 retrieved documents per query), for each of the three co-location settings, as a function of the number of paraphrases in a set (from 0 to 19). The values for the other factors were: a62 order=1, AbsAdjDiv=2, and manually-corrected tagging. 294 correct documents were retrieved when only the lemmatized query was submitted for retrieval (0 paraphrases). This number increases dramatically for the first few paraphrases, and eventually levels out for about 12 paraphrases.</Paragraph>
      <Paragraph position="2"> In order to compare queries that had different numbers of paraphrases, when the maximum number of paraphrases for a query was less than 19, the results obtained for this maximum number were replicated for the paraphrase sets of higher cardinality. For instance, if only 6 paraphrases were generated for a query, the number of correct documents retrieved  for the 6 paraphrases was replicated for Sets 7 to 19. Figure 2 depicts the total number of correct documents retrieved (for 19 paraphrases or maximum paraphrases), for each of the three co-location settings, as a function of the number of documents retrieved per query (from 1 to 20). As for Figure 1, paraphrasing improves retrieval performance. In addition, as expected, recall performance improves as more documents are retrieved.</Paragraph>
      <Paragraph position="3"> The Col setting generally yielded fewer and more felicitous paraphrases than those generated without considering co-locations (for the 118 queries where co-locations were identified). Surprisingly however, this effect did not transfer to the retrieval process, as the NoCol setting yielded a marginally better performance. This difference in performance may be attributed to whether a lemma or lemma-pair that was important for retrieval was retained in enough paraphrases. This happened in 9 instances of the NoCol setting and 2 instances of the Col setting, yielding a slightly better performance for the NoCol setting overall. For example, the identification of &amp;quot;folic acid&amp;quot; as a co-location led to synonyms such as &amp;quot;vitamin m&amp;quot; and &amp;quot;vitamin bc&amp;quot;, which appeared in most of the paraphrases. As a result, the effect of the lemma-pair &amp;quot;folic acid&amp;quot;, which was actually responsible for retrieving the correct document, was obscured. In contrast, the recognition of &amp;quot;major league&amp;quot; as a co-location (which was paraphrased to &amp;quot;big league&amp;quot; in only 3 of the 19 paraphrases) enabled the retrieval of the correct document. Since the performance under the ColScore condition was consistently worse than the performance under the other two conditions, we do not consider it in the rest of our evaluation.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
6.2 Tagging accuracy
</SectionTitle>
      <Paragraph position="0"> The PoS-tagger incorrectly tagged 64 of the 404 queries in our corpus (usually, one word was mis-tagged in each of these queries). The instances of mis-tagging which had the largest impact on the quality of the generated paraphrases occurred when nouns were mis-tagged as verbs and vice versa (18 cases). In addition, proper nouns were mis-tagged as other PoS and vice versa in 24 cases, and the verb &amp;quot;name&amp;quot; (e.g., &amp;quot;Name the highest mountain&amp;quot;) was mis-tagged as a noun in 17 instances. Surprisingly, retrieval performance was affected only in 5 instances both for the Col and the NoCol settings: 3 of these instances had a mis-tagged &amp;quot;name&amp;quot;, and 2 had a noun mis-tagged as another PoS.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
6.3 Out-of-order weight
</SectionTitle>
      <Paragraph position="0"> We considered three settings for the out-of-order weight, a62 order (Equation 3): 1, 0 and 0.5. The first setting ignores word order. For instance, given the query &amp;quot;how many dogs pull a sled in the Iditarod?&amp;quot; the frequency of the lemma-pair &amp;quot;dog-pull&amp;quot; is added to that of the pair &amp;quot;pull-dog&amp;quot;. The second setting enforces a strict word order, e.g., only &amp;quot;dog-pull&amp;quot; is considered. The third setting considers out-of-order lemma-pairs, but gives their frequency half the weight of the ordered pairs.</Paragraph>
      <Paragraph position="1"> Interestingly, this factor had no effect on retrieval performance. This may be explained by the observation that the lemma order in the queries reflects their order in the corpus. Thus, when an ordered lemma-pair in a query matches a dictionary entry, the additional frequency count contributed by the reverse lemma order is often insufficient to affect significantly the relative score of the paraphrases.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
6.4 Penalty for absent adjacent lemma-pairs
</SectionTitle>
      <Paragraph position="0"> We considered four settings for the penalty assigned to lemma-pairs that are adjacent in a paraphrase but absent from the dictionary. These settings are represented by the values 1, 2, 10 and 20 for the divisor AbsAdjDiv. For instance, a value of 10 means that the score for an absent adjacent lemma-pair is 1/10 of the score of an absent non-adjacent lemma-pair.</Paragraph>
      <Paragraph position="1"> That is, the score of a paraphrase is divided by 100 for each absent adjacent lemma-pair.</Paragraph>
      <Paragraph position="2"> This factor had only a marginal effect on retrieval performance, with the best performance being obtained for AbsAdjDiv = 10.</Paragraph>
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
6.5 Query Length
</SectionTitle>
      <Paragraph position="0"> Our investigation of the effect of query length on retrieval performance indicates that better performance is obtained for shorter queries. Figure 3 shows the percentage of queries where at least one correct document was retrieved, as a function of  query length in words (20 documents were retrieved using 19 or maximum paraphrases). These results were obtained for the settings Col, a62 order a2 a16 and AbsAdjDiv=10, with manually-corrected tagging. As seen in Figure 3, there is a drop in retrieval performance for queries with more than 5 words.</Paragraph>
      <Paragraph position="1"> These results generally concur with the observations in (Sanderson, 1994; Gonzalo et al., 1998). Nonetheless, on average we returned a correct document for 42% of the queries which had 6 to 11 words.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>