<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1203">
  <Title>Measuring the Semantic Similarity of Texts</Title>
  <Section position="6" start_page="16" end_page="17" type="concl">
    <SectionTitle>
5 Discussion and Conclusions
</SectionTitle>
    <Paragraph position="0"> For the task of paraphrase recognition, incorporating semantic information into the text similarity measure increases the likelihood of recognition significantly over the random baseline and over the lexical matching baseline. In the unsupervised setting, the best performance is achieved using a method that combines several similarity metrics into one, for an overall accuracy of 68.8%. When learning is used to find the optimal combination of metrics and optimal threshold, the highest accuracy of 71.5% is obtained by combining the similarity metrics with the lexical matching baseline.</Paragraph>
    <Paragraph position="1"> For the entailment data set, although we do not explicitly check for entailment, the directional similarity computed for textual entailment recognition does improve over the random and lexical matching baselines. Once again, the combination of similarity metrics gives the highest accuracy, measured at 58.3%, with a slight improvement observed in the supervised setting, where the highest accuracy was measured at 58.9%. Both these figures are competitive with the best results achieved during the PASCAL entailment evaluation (Dagan et al., 2005).</Paragraph>
    <Paragraph position="2"> Although our method relies on a bag-of-words approach, as it turns out, the use of measures of semantic similarity improves significantly over the traditional lexical matching metrics.4 We are nonetheless aware that a bag-of-words approach ignores many important relationships in sentence structure, such as dependencies between words, or roles played by the various arguments in the sentence. Future work will consider the investigation of more sophisticated representations of sentence structure, such as first order predicate logic or semantic parse trees, which should allow for the implementation of more effective measures of text semantic similarity.</Paragraph>
    <Paragraph position="3"> 4 The improvement of the combined semantic similarity metric over the simpler lexical matching measure was found to be statistically significant in all experiments, using a paired t-test (p &lt; 0.001).</Paragraph>
  </Section>
</Paper>