<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2075">
  <Title>Integrating Pattern-based and Distributional Similarity Methods for Lexical Entailment Acquisition</Title>
  <Section position="4" start_page="91904" end_page="91904" type="intro">
    <SectionTitle>
2 Background
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="91904" end_page="91904" type="sub_section">
      <SectionTitle>
2.1 Distributional Similarity and
Lexical Entailment
</SectionTitle>
      <Paragraph position="0"> The general idea behind distributional similarity is that words which occur within similar contexts are semantically similar (Harris, 1968). In a computational framework, words are represented by feature vectors, where features are context words weighted by a function of their statistical association with the target word. The degree of similarity between two target words is then determined by a vector comparison function.</Paragraph>
      <Paragraph position="1"> Amongst the many proposals for distributional similarity measures, (Lin, 1998) is maybe the most widely used one, while (Weeds et al., 2004) provides a typical example for recent research.</Paragraph>
      <Paragraph position="2"> Distributional similarity measures are typically computed through exhaustive processing of a corpus, and are therefore applicable to corpora of bounded size.</Paragraph>
      <Paragraph position="3"> It was noted recently by Geffet and Dagan (2004, 2005) that distributional similarity captures a quite loose notion of semantic similarity, as exemplified by the pair country - party (identified by Lin's similarity measure). Consequently, they proposed a definition for the lexical entailment relation, which conforms to the general framework of applied textual entailment (Dagan et al., 2005). Generally speaking, a word w lexically entails another word v if w can substitute v in some contexts while implying v's original meaning. It was suggested that lexical entailment captures major application needs in modeling lexical variability, generalized over several types of known ontological relationships. For example, in Question Answering (QA), the word company in a question can be substituted in the text by firm (synonym), automaker (hyponym) or subsidiary (meronym), all of which entail company.</Paragraph>
      <Paragraph position="4"> Typically, hyponyms entail their hypernyms and synonyms entail each other, while entailment holds for meronymy only in certain cases.</Paragraph>
      <Paragraph position="5"> In this paper we investigate automatic acquisition of the lexical entailment relation. For the distributional similarity component we employ the similarity scheme of (Geffet and Dagan, 2004), which was shown to yield improved predictions of (non-directional) lexical entailment pairs. This scheme utilizes the symmetric similarity measure of (Lin, 1998) to induce improved feature weights via bootstrapping. These weights identify the most characteristic features of each word, yielding cleaner feature vector representations and better similarity assessments.</Paragraph>
    </Section>
    <Section position="2" start_page="91904" end_page="91904" type="sub_section">
      <SectionTitle>
2.2 Pattern-based Approaches
</SectionTitle>
      <Paragraph position="0"> Hearst (1992) pioneered the use of lexical-syntactic patterns for automatic extraction of lexical semantic relationships. She acquired hyponymy relations based on a small predefined set of highly indicative patterns, such as &amp;quot;X, . . . , Y and/or other Z&amp;quot;, and &amp;quot;Z such as X, . . . and/or Y&amp;quot;, where X and Y are extracted as hyponyms of Z.</Paragraph>
      <Paragraph position="1"> Similar techniques were further applied to predict hyponymy and meronymy relationships using lexical or lexico-syntactic patterns (Berland and Charniak, 1999; Sundblad, 2002), and web page structure was exploited to extract hyponymy relationships by Shinzato and Torisawa (2004). Chklovski and Pantel (2004) used patterns to extract a set of relations between verbs, such as similarity, strength and antonymy. Synonyms, on the other hand, are rarely found in such patterns. In addition to their use for learning lexical semantic relations, patterns were commonly used to learn instances of concrete semantic relations for Information Extraction (IE) and QA, as in (Riloff and Shepherd, 1997; Ravichandran and Hovy, 2002; Yangarber et al., 2000).</Paragraph>
      <Paragraph position="2"> Patterns identify rather specific and informative structures within particular co-occurrences of the related words. Consequently, they are relatively reliable and tend to be more accurate than distributional evidence. On the other hand, they are susceptive to data sparseness in a limited size corpus. To obtain sufficient coverage, recent works such as (Chklovski and Pantel, 2004) applied pattern-based approaches to the web. These methods form search engine queries that match likely pattern instances, which may be verified by post-processing the retrieved texts.</Paragraph>
      <Paragraph position="3"> Another extension of the approach was automatic enrichment of the pattern set through bootstrapping. Initially, some instances of the sought  relation are found based on a set of manually defined patterns. Then, additional co-occurrences of the related terms are retrieved, from which new patterns are extracted (Riloff and Jones, 1999; Pantel et al., 2004). Eventually, the list of effective patterns found for ontological relations has pretty much converged in the literature. Amongst these, Table 1 lists the patterns that were utilized in our work.</Paragraph>
      <Paragraph position="4"> Finally, the selection of candidate pairs for a target relation was usually based on some function over the statistics of matched patterns. To perform more systematic selection Etzioni et al.</Paragraph>
      <Paragraph position="5"> (2004) applied a supervised Machine Learning algorithm (Naive Bayes), using pattern statistics as features. Their work was done within the IE framework, aiming to extract semantic relation instances for proper nouns, which occur quite frequently in indicative patterns. In our work we incorporate and extend the supervised learning step for the more difficult task of acquiring general language relationships between common nouns.</Paragraph>
    </Section>
    <Section position="3" start_page="91904" end_page="91904" type="sub_section">
      <SectionTitle>
2.3 Combined Approaches
</SectionTitle>
      <Paragraph position="0"> It can be noticed that the pattern-based and distributional approaches have certain complementary properties. The pattern-based method tends to be more precise, and also indicates the direction of the relationship between the candidate terms. The distributional similarity approach is more exhaustive and suitable to detect symmetric synonymy relations. Few recent attempts on related (though different) tasks were made to classify (Lin et al., 2003) and label (Pantel and Ravichandran, 2004) distributional similarity output using lexical-syntactic patterns, in a pipeline architecture. We aim to achieve tighter integration of the two approaches, as described next.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML