File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/02/c02-1161_intro.xml
Size: 2,493 bytes
Last Modified: 2025-10-06 14:01:25
<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1161"> <Title>Lexical Query Paraphrasing for Document Retrieval</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Related Research </SectionTitle> <Paragraph position="0"> The vocabulary mismatch between user queries and indexed documents is often addressed through query expansion. Two common techniques for query expansion are blind relevance feedback (Buckley et al., 1995; Mitra et al., 1998) and word sense disambiguation (WSD) (Mihalcea and Moldovan, 1999; Lytinen et al., 2000; Schütze and Pedersen, 1995; Lin, 1998). Blind relevance feedback consists of retrieving a small number of documents using a query given by a user, and then constructing an expanded query that includes content words that appear frequently in these documents.</Paragraph> <Paragraph position="1"> This expanded query is used to retrieve a new set of documents. WSD often precedes query expansion to avoid retrieving irrelevant information. Mihalcea and Moldovan (1999) and Lytinen et al. (2000) used a machine-readable thesaurus, specifically WordNet (Miller et al., 1990), to obtain the sense of a word, while Schütze and Pedersen (1995) and Lin (1998) used automatically constructed thesauri.</Paragraph> <Paragraph position="2"> The improvements in retrieval performance reported in (Mitra et al., 1998) are comparable to those reported here (note that these researchers consider precision, while we consider recall). The results obtained by Schütze and Pedersen (1995) and by Lytinen et al. (2000) are encouraging. 
However, experimental results reported in (Sanderson, 1994; Gonzalo et al., 1998) indicate that the improvement in IR performance due to WSD is restricted to short queries, and that IR performance is very sensitive to disambiguation errors.</Paragraph> <Paragraph position="3"> Our approach to document retrieval differs from the above approaches in that the expansion of a query takes the form of alternative lexical paraphrases. Like Harabagiu et al. (2001), we use WordNet to propose synonyms for the words in a query. However, they apply heuristics to select which words to paraphrase. In contrast, we use corpus-based information in the context of the entire query to calculate the score of a paraphrase and select which paraphrases to retain, and then use the paraphrase scores to influence the document retrieval process.</Paragraph> </Section> </Paper>
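
The blind relevance feedback procedure described in this section (retrieve a few documents with the user's query, then add content words that occur frequently in them) can be sketched as follows. This is only a minimal illustration, not the method of Buckley et al. (1995) or Mitra et al. (1998): the term-count ranking, the stopword list, the sample documents, and the names `retrieve`, `expand_query`, `k`, and `n_terms` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = frozenset({"the", "a", "an", "of", "in", "and", "to", "is", "for", "on"})

# Hypothetical toy collection, used only to illustrate the procedure.
DOCS = [
    "car engine repair and car engine maintenance",
    "automobile engine repair manual",
    "cooking pasta recipes",
    "engine repair for car owners",
]


def tokenize(text):
    """Lowercase, split on whitespace, and keep alphabetic tokens only."""
    return [w for w in text.lower().split() if w.isalpha()]


def retrieve(query_terms, docs, k):
    """Return the indices of the top-k documents, ranked by how often
    the query terms occur in them (an assumed, very simple ranking)."""
    scored = []
    for i, doc in enumerate(docs):
        counts = Counter(tokenize(doc))
        scored.append((sum(counts[t] for t in query_terms), i))
    # Highest score first; ties are broken by document order.
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [i for score, i in ranked[:k] if score > 0]


def expand_query(query, docs, k=2, n_terms=2):
    """Blind relevance feedback: add the n_terms most frequent content
    words from the top-k retrieved documents to the original query."""
    query_terms = tokenize(query)
    counts = Counter()
    for i in retrieve(query_terms, docs, k):
        counts.update(
            w for w in tokenize(docs[i])
            if w not in STOPWORDS and w not in query_terms
        )
    # Most frequent first; alphabetical order makes ties deterministic.
    best = sorted(counts.items(), key=lambda c: (-c[1], c[0]))[:n_terms]
    return query_terms + [w for w, _ in best]


if __name__ == "__main__":
    print(expand_query("car repair", DOCS))
```

The expanded query would then be submitted to the same retrieval function to obtain the new set of documents. Because the feedback documents are assumed to be relevant without being checked, any off-topic document in the top `k` can add misleading terms to the query.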