File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/05/w05-0207_abstr.xml
Size: 1,377 bytes
Last Modified: 2025-10-06 13:44:30
<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0207"> <Title>Using Syntactic Information to Identify Plagiarism</Title> <Section position="2" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> Using keyword overlaps to identify plagiarism can result in many false negatives and positives: substitution of synonyms for each other reduces the similarity between works, making it difficult to recognize plagiarism; overlap in ambiguous keywords can falsely inflate the similarity of works that are in fact different in content. Plagiarism detection based on verbatim similarity of works can be rendered ineffective when works are paraphrased even in superficial and immaterial ways. Considering linguistic information related to creative aspects of writing can improve identification of plagiarism by adding a crucial dimension to evaluation of similarity: documents that share linguistic elements in addition to content are more likely to be copied from each other. In this paper, we present a set of low-level syntactic structures that capture creative aspects of writing and show that information about linguistic similarities of works improves recognition of plagiarism (over tfidf-weighted keywords alone) when combined with similarity measurements based on tfidf-weighted keywords.</Paragraph> </Section> </Paper>