File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/05/w05-1202_concl.xml
Size: 2,655 bytes
Last Modified: 2025-10-06 13:55:01
<?xml version="1.0" standalone="yes"?> <Paper uid="W05-1202"> <Title>The Distributional Similarity of Sub-Parses</Title> <Section position="6" start_page="10" end_page="11" type="concl"> <SectionTitle> 5 Conclusions and Further Work </SectionTitle> <Paragraph position="0"> In conclusion, it is clear that the components of phrases do not need to be semantically similar for the encompassing phrases to be semantically similar. Thus, it is necessary to develop techniques which estimate the semantic similarity of two phrases directly rather than combining similarity scores calculated for pairs of words.</Paragraph> <Paragraph position="1"> Our approach is to find the distributional similarity of the sub-parses associated with phrases by extending general techniques for finding lexical distributional similarity. We have illustrated this method with examples, showing how data sparseness can be overcome using the web.</Paragraph> <Paragraph position="2"> We have shown that finding the distributional similarity between phrases, as outlined here, may have potential in identifying paraphrases. In our examples, the distributional similarity of paraphrases was higher than that of non-paraphrases. However, more extensive evaluation of the technique is required before more definite conclusions can be drawn.</Paragraph> <Paragraph position="3"> In this respect, we are currently developing a gold standard set of similar phrases from the Pascal Textual Entailment Challenge dataset. This task is not trivial since, even though pairs of sentences are already identified as potential paraphrases, it is still necessary to extract pairs of phrases which convey roughly the same meaning. This is because 1) some pairs of sentences are almost identical in word content and 2) some pairs of sentences are quite distant in meaning. 
Further, it is also desirable to classify extracted pairs of paraphrases as to whether they are lexical, syntactic, semantic or inferential in nature. Whilst lexical paraphrases (e.g. &quot;to gather&quot; is similar to &quot;to collect&quot;) and syntactic paraphrases (e.g. &quot;Cambodian sweatshop&quot; is equivalent to &quot;sweatshop in Cambodia&quot;) are of interest, our aim is to extend lexical techniques to the semantic level (e.g. &quot;X won presidential election&quot; is similar to &quot;X became president&quot;). Once our analysis is complete, the data will be used to evaluate variations on the technique proposed herein and also to compare it empirically to other techniques such as that of Lin and Pantel (2001).</Paragraph> </Section> </Paper>