<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0412">
  <Title>Non-Contiguous Word Sequences for Information Retrieval</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> The growing amount of textual information available electronically has increased the need for high-performance retrieval. The use of phrases has long been seen as a natural way to improve retrieval performance over common document models, which ignore the sequential aspect of word occurrences in documents and treat them as &quot;bags of words&quot;. However, both statistical and syntactic phrases have shown disappointing results on large document collections. In this paper, we present a recent type of multi-word expression: Maximal Frequent Sequences (Ahonen-Myka, 1999).</Paragraph>
    <Paragraph position="1"> Maximal Frequent Sequences are mined phrases rather than statistical or syntactic phrases. Their main strengths are that they form a very compact index and that they account for the sequentiality and adjacency of meaningful word co-occurrences while allowing for a gap between words.</Paragraph>
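The definition behind Maximal Frequent Sequences can be illustrated with a short sketch. As we understand Ahonen-Myka's definition, a word sequence is frequent if it occurs in at least sigma text fragments, with its words in the same order and at most a fixed number of other words between consecutive words. It is maximal if no other frequent sequence contains it as a subsequence. The function names, the choice of fragments (for example sentences or documents), and the parameters `sigma` and `max_gap` are hypothetical. The sketch only checks these properties for a supplied list of candidate sequences, so maximality is relative to that list. It is not the mining algorithm used in the paper.

```python
def occurs_with_gap(pattern, words, max_gap):
    """True if the words of `pattern` appear in `words` in the same order,
    with at most `max_gap` other words between consecutive pattern words."""

    def search(p_idx, prev_pos):
        if p_idx == len(pattern):
            return True  # every pattern word has been matched
        if p_idx == 0:
            window = range(len(words))  # the first word may start anywhere
        else:
            # next word must come after prev_pos, skipping at most max_gap words
            window = range(prev_pos + 1, min(len(words), prev_pos + max_gap + 2))
        # backtrack: a later match of an earlier word may allow a valid continuation
        return any(words[pos] == pattern[p_idx] and search(p_idx + 1, pos)
                   for pos in window)

    return search(0, -1)


def support(pattern, fragments, max_gap):
    """Number of fragments in which `pattern` occurs under the gap constraint."""
    return sum(1 for words in fragments if occurs_with_gap(pattern, words, max_gap))


def is_subsequence(short, seq):
    """Plain subsequence test: items of `short` appear in `seq` in order."""
    it = iter(seq)
    return all(word in it for word in short)


def maximal_frequent(candidates, fragments, sigma, max_gap):
    """Keep candidates with support of at least `sigma` that are not contained
    in a longer frequent candidate (maximality relative to `candidates`)."""
    frequent = [c for c in candidates if support(c, fragments, max_gap) >= sigma]
    return [c for c in frequent
            if not any(len(o) > len(c) and is_subsequence(c, o) for o in frequent)]


fragments = [
    "the president of the united states".split(),
    "the president visited the united states".split(),
    "president of france".split(),
]
candidates = [
    ("president",),
    ("united", "states"),
    ("president", "united", "states"),
    ("the", "president"),
]
print(maximal_frequent(candidates, fragments, sigma=2, max_gap=2))
# [('president', 'united', 'states'), ('the', 'president')]
```

With `max_gap=0`, the three-word candidate no longer occurs in any fragment, because other words separate "president" from "united". The result then falls back to `[('united', 'states'), ('the', 'president')]`. The gap therefore lets non-contiguous sequences count as frequent.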
    <Paragraph position="2"> We introduce a method for using these phrases in information retrieval and present our experiments, which show a clear improvement over the well-known technique of extracting frequent word pairs.</Paragraph>
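The abstract does not say how the frequent-word-pair baseline was built. This sketch assumes one common variant: take pairs of adjacent non-stopwords and keep those that occur in at least a minimum number of documents. The stopword list, the `min_df` threshold, and the function name `frequent_adjacent_pairs` are hypothetical, and the paper's actual baseline may differ.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "of", "a", "an", "in"}


def frequent_adjacent_pairs(documents, min_df):
    """Return adjacent word pairs, formed after stopword removal, that occur
    in at least `min_df` documents (document frequency, not raw counts)."""
    df = Counter()
    for text in documents:
        words = [w for w in text.lower().split() if w not in STOPWORDS]
        # set(): each pair is counted at most once per document
        df.update(set(zip(words, words[1:])))
    return sorted(pair for pair, n in df.items() if n >= min_df)


docs = [
    "The president of the United States",
    "The president visited the United States",
    "President of France",
]
print(frequent_adjacent_pairs(docs, min_df=2))
# [('united', 'states')]
```

By construction, every index term in this baseline has exactly two words. A maximal frequent sequence can be longer and can span gaps.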
  </Section>
</Paper>