<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2104"> <Title>Experiments in Automated Lexicon Building for Text Searching</Title> <Section position="3" start_page="719" end_page="719" type="relat"> <SectionTitle> 2 Related Work </SectionTitle> <Paragraph position="0"> There has been a large body of work in the collection of co-occurrence data from a broad spectrum of perspectives, from information retrieval to the development of statistical methods for investigating word similarity and classification. Our efforts fall somewhere in the middle.</Paragraph> <Paragraph position="1"> Compared with document retrieval tasks, we are more closely focused on the words themselves and on specific concepts than on document &quot;aboutness.&quot; Jing and Croft (1994) examined words and phrases in paragraph units, and found that the association data improves retrieval performance. Callan (1994) compared paragraph units and fixed windows of text in examining passage-level retrieval.</Paragraph> <Paragraph position="2"> In the question-answering context, Morton (1999) collected document co-occurrence statistics to uncover part-whole and synonymy relationships to use in a question-answering system. The key difference here was that co-occurrence was considered on a whole-document basis. Harabagiu and Maiorano (1999) argued that indexing in question answering should be based on paragraphs.</Paragraph> <Paragraph position="3"> One recent approach to automatic lexicon building has used seed words to build up larger sets of semantically similar words in one or more categories (Riloff and Shepherd, 1997).
In addition, Strzalkowski and Wang (1996) used a bootstrapping technique to identify types of references, and Riloff and Jones (1999) adapted bootstrapping techniques to lexicon building targeted to information extraction.</Paragraph> <Paragraph position="4"> In the same vein, researchers at Brown University (Caraballo and Charniak, 1999), (Berland and Charniak, 1999), (Caraballo, 1999) and (Roark and Charniak, 1998) focused on target constructions, in particular complex noun phrases, and searched for information not only on identifying classes of nouns, but also on hypernyms, noun specificity, and meronymy.</Paragraph> <Paragraph position="5"> We have a different perspective from these lines of inquiry. They specified various semantic relationships and sought ways to collect similar pairs. We have a less restrictive focus and rely on surface syntactic information about clauses.</Paragraph> <Paragraph position="6"> For more than a decade, a variety of statistical techniques have been developed and refined. The focus of much of this work was to develop the methods themselves. Church and Hanks (1989) explored the use of mutual information statistics in ranking co-occurrences within five-word windows.</Paragraph> <Paragraph position="7"> Smadja (1992) gathered co-occurrences within five-word windows to find collocations, particularly in specific domains. Hindle (1990) classified nouns on the basis of co-occurring patterns of subject-verb and verb-object pairs. Hatzivassiloglou and McKeown (1993) clustered adjectives into semantic classes, and Pereira et al. (1993) clustered nouns on their appearance in verb-object pairs.
We are trying to be less restrictive, learning multiple salient relationships between words rather than seeking a particular relationship.</Paragraph> <Paragraph position="8"> In a way, our idea is the mirror image of Barzilay and Elhadad (1997), who used WordNet to identify lexical chains that would coincide with cohesive text segments. We assumed that documents are cohesive and that co-occurrence patterns can uncover word relationships.</Paragraph> </Section> </Paper>