<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1112">
  <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics A Structural Similarity Measure</Title>
  <Section position="7" start_page="97" end_page="98" type="concl">
    <SectionTitle>
6 Conclusions
</SectionTitle>
    <Paragraph position="0"> This paper outlines a simple language similarity measure based on surface syntactic dependency trees. In our opinion, such a measure expresses the similarity of languages more adequately than the simple string-based measures used for text similarity. The measure is defined on pairs of trees from a parallel corpus. In its current form it does not account for differences in the morphosyntactic labels of corresponding nodes or edges, although these differences are an important parameter of language similarity. Properly combining our basic structural similarity measure with a measure that reflects label differences opens a wide range of options for future research. An equally important task is gathering syntactically annotated parallel corpora of reasonable size.</Paragraph>
    <Paragraph position="1"> The only corpus of this kind at our disposal, the Prague Czech-English Dependency Treebank (Cuřín et al., 2004), relies on imperfect automatic annotation, which might distort the results. The human annotation of the PCEDT is just starting, so there is a good chance that the measure will soon bring reliable results at least for those two languages.</Paragraph>
  </Section>
</Paper>