<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3806">
  <Title>Similarity between Pairs of Co-indexed Trees for Textual Entailment Recognition</Title>
  <Section position="3" start_page="33" end_page="34" type="metho">
    <SectionTitle>
2 Learning Textual Entailment from
examples
</SectionTitle>
    <Paragraph position="0"> To carry out automatic learning from examples, we need to define a cross-pair similarity K((Tprime,Hprime),(Tprimeprime,Hprimeprime)). This function should consider pairs similar when: (1) texts and hypotheses are structurally and lexically similar (structural similarity); (2) the relations between the sentences in the pair (Tprime,Hprime) are compatible with the relations in (Tprimeprime,Hprimeprime) (intra-pair word movement compatibility). We argue that such requirements could be met by augmenting syntactic trees with placeholders that co-index related words within pairs. We will then define a cross-pair similarity over these pairs of co-indexed trees.</Paragraph>
    <Section position="1" start_page="33" end_page="33" type="sub_section">
      <SectionTitle>
2.1 Training examples as pairs of co-indexed
trees
</SectionTitle>
      <Paragraph position="0"> Sentence pairs selected as possible sentences in entailment are naturally co-indexed. Many words (or expressions) wh in H have a referent wt in T. These pairs (wt,wh) are called anchors. Possibly, it is more important that the two words in an anchor are related than the actual two words. The entailment could hold even if the two words are substitued with two other related words. To indicate this we co-index words associating placeholders with anchors.</Paragraph>
      <Paragraph position="1"> For example, in Fig. 1, 2&amp;quot; indicates the (companies,companies) anchor between T1 and H1. These placeholders are then used to augment tree nodes. To better take into account argument movements, placeholders are propagated in the syntactic trees following constituent heads (see Fig. 1).</Paragraph>
      <Paragraph position="2"> In line with many other researches (e.g., (Corley and Mihalcea, 2005)), we determine these anchors using different similarity or relatedness dectors: the exact matching between tokens or lemmas, a similarity between tokens based on their edit distance, the derivationally related form relation and the verb entailment relation in WordNet, and, finally, a WordNet-based similarity (Jiang and Conrath, 1997). Each of these detectors gives a different weight to the anchor: the actual computed similarity for the last and 1 for all the others. These weights will be used in the final kernel.</Paragraph>
    </Section>
    <Section position="2" start_page="33" end_page="34" type="sub_section">
      <SectionTitle>
2.2 Similarity between pairs of co-indexed
trees
</SectionTitle>
      <Paragraph position="0"> Pairs of syntactic trees whose nodes are co-indexed with placeholders allow the design of a cross-pair similarity that considers both the structural similarity and the intra-pair word movement compatibility.</Paragraph>
      <Paragraph position="1"> Syntactic trees of texts and hypotheses permit to verify the structural similarity between pairs of sentences. Texts should have similar structures as well as hypotheses. In Fig. 1, the overlapping subtrees are in bold. For example, T1 and T3 share the sub-tree starting with S - NP VP. Although the lexicals in T3 and H3 are quite different from those T1 and H1, their bold subtrees are more similar to those of T1 and H1 than to T1 and H2, respectively. H1 and H3 share the production NP - DT JJ NN NNS while H2 and H3 do not. To decide on the entailment for (T3,H3), we can use the value of (T1,H1).</Paragraph>
      <Paragraph position="2"> Anchors and placeholders are useful to verify if two pairs can be aligned as showing compatible intra-pair word movement. For example, (T1,H1) and (T3,H3) show compatible constituent movements given that the dashed lines connecting placeholders of the two pairs indicates structurally equivalent nodes both in the texts and the hypotheses. The dashed line between 3 and b links the main verbs both in the texts T1 and T3 and in the hypotheses H1 and H3. After substituting 3 to b and 2 to a , T1 and T3 share the subtree S - NP 2 VP 3 . The same subtree is shared between H1 and H3. This implies that words in the pair (T1,H1) are correlated like words in (T3,H3). Any different mapping between the two anchor sets would not have this property.</Paragraph>
      <Paragraph position="3"> Using the structural similarity, the placeholders, and the connection between placeholders, the over-all similarity is then defined as follows. Let Aprime and Aprimeprime be the placeholders of (Tprime,Hprime) and (Tprimeprime,Hprimeprime), respectively. The similarity between two co-indexed syntactic tree pairs Ks((Tprime,Hprime),(Tprimeprime,Hprimeprime)) is defined using a classical similarity between two trees KT(t1,t2) when the best alignment between the Aprime and Aprimeprime is given. Let C be the set of all bijective</Paragraph>
      <Paragraph position="5"> where (1) t(S,c) returns the syntactic tree of the hypothesis (text) S with placeholders replaced by means of the substitution c, (2) i is the identity substitution and (3) KT(t1,t2) is a function that measures the similarity between the two trees t1 and t2.</Paragraph>
    </Section>
    <Section position="3" start_page="34" end_page="34" type="sub_section">
      <SectionTitle>
2.3 Enhancing cross-pair syntactic similarity
</SectionTitle>
      <Paragraph position="0"> As the computation cost of the similarity measure depends on the number of the possible sets of correspondences C and this depends on the size of the anchor sets, we reduce the number of placeholders used to represent the anchors. Placeholders will have the same name if these are in the same chunk both in the text and the hypothesis, e.g., the placeholders 2' and 2&amp;quot; are collapsed to 2 .</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="34" end_page="35" type="metho">
    <SectionTitle>
3 Experimental investigation
</SectionTitle>
    <Paragraph position="0"> The aim of the experiments is twofold: we show that (a) entailments can be learned from examples and (b) our kernel function over syntactic structures is effective to derive syntactic properties. The above goals can be achieved by comparing our cross-pair similarity kernel against (and in combination with) other methods.</Paragraph>
    <Section position="1" start_page="34" end_page="35" type="sub_section">
      <SectionTitle>
3.1 Experimented kernels
</SectionTitle>
      <Paragraph position="0"> We compared three different kernels: (1) the kernel Kl((Tprime,Hprime),(Tprimeprime,Hprimeprime)) based on the intra-pair  lexical similarity siml(T,H) as defined in (Corley and Mihalcea, 2005). This kernel is defined as Kl((Tprime,Hprime),(Tprimeprime,Hprimeprime)) = siml(Tprime,Hprime)x siml(Tprimeprime,Hprimeprime). (2) the kernel Kl+Ks that combines our kernel with the lexical-similarity-based kernel; (3) the kernel Kl + Kt that combines the lexical-similarity-based kernel with a basic tree kernel.</Paragraph>
      <Paragraph position="1"> This latter is defined as Kt((Tprime,Hprime),(Tprimeprime,Hprimeprime)) = KT(Tprime,Tprimeprime)+KT(Hprime,Hprimeprime). We implemented these kernels within SVM-light (Joachims, 1999).</Paragraph>
    </Section>
    <Section position="2" start_page="35" end_page="35" type="sub_section">
      <SectionTitle>
3.2 Experimental settings
</SectionTitle>
      <Paragraph position="0"> For the experiments, we used the Recognizing Textual Entailment (RTE) Challenge data sets, which we name as D1, T1 and D2, T2, are the development and the test sets of the first and second RTE challenges, respectively. D1 contains 567 examples whereas T1, D2 and T2 have all the same size, i.e.</Paragraph>
      <Paragraph position="1"> 800 instances. The positive examples are the 50% of the data. We produced also a random split of D2.</Paragraph>
      <Paragraph position="2"> The two folds are D2(50%)prime and D2(50%)primeprime.</Paragraph>
      <Paragraph position="3"> We also used the following resources: the Charniak parser (Charniak, 2000) to carry out the syntactic analysis; the wn::similaritypackage (Pedersen et al., 2004) to compute the Jiang&amp;Conrath (J&amp;C) distance (Jiang and Conrath, 1997) needed to implement the lexical similarity siml(T,H) as defined in (Corley and Mihalcea, 2005); SVM-light-TK (Moschitti, 2004) to encode the basic tree kernel function, KT , in SVM-light (Joachims, 1999).</Paragraph>
    </Section>
    <Section position="3" start_page="35" end_page="35" type="sub_section">
      <SectionTitle>
3.3 Results and analysis
</SectionTitle>
      <Paragraph position="0"> Table 1 reports the accuracy of different similarity kernels on the different training and test split described in the previous section. The table shows some important result.</Paragraph>
      <Paragraph position="1"> First, as observed in (Corley and Mihalcea, 2005) the lexical-based distance kernel Kl shows an accuracy significantly higher than the random baseline, i.e. 50%. This accuracy (second line) is comparable with the best systems in the first RTE challenge (Dagan et al., 2005). The accuracy reported for the best systems, i.e. 58.6% (Glickman et al., 2005; Bayer et al., 2005), is not significantly far from the result obtained with Kl, i.e. 58.88%.</Paragraph>
      <Paragraph position="2"> Second, our approach (last column) is significantly better than all the other methods as it provides the best result for each combination of training and test sets. On the &amp;quot;Train:D1-Test:T1&amp;quot; testbed, it exceeds the accuracy of the current state-of-the-art models (Glickman et al., 2005; Bayer et al., 2005) by about 4.4 absolute percent points (63% vs.</Paragraph>
      <Paragraph position="3"> 58.6%) and 4% over our best lexical similarity measure. By comparing the average on all datasets, our system improves on all the methods by at least 3 absolute percent points.</Paragraph>
      <Paragraph position="4"> Finally, the accuracy produced by our kernel based on co-indexed trees Kl + Ks is higher than the one obtained with the plain syntactic tree kernel Kl + Kt. Thus, the use of placeholders and co-indexing is fundamental to automatically learn entailments from examples.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>