<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-2026">
  <Title>Lexical Chains and Sliding Locality Windows in Content-based Text Similarity Detection</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle> Abstract </SectionTitle>
    <Paragraph position="0"> We present a system that determines the content similarity of documents.</Paragraph>
    <Paragraph position="1"> Our goal is to identify pairs of book chapters that are translations of the same original chapter. Achieving this goal requires identifying not only the different topics in the documents but also the particular flow of these topics.</Paragraph>
    <Paragraph position="2"> Our approach to content similarity evaluation employs n-grams of lexical chains and measures similarity as the cosine between document vectors of three kinds: n-grams of lexical chains, tf*idf-weighted keywords, and unweighted lexical chains (unigrams of lexical chains). Our results show that n-grams of unordered lexical chains of length four or more are particularly useful for recognizing content similarity.</Paragraph>
  </Section>
</Paper>
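<!--
The cosine comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each document is assumed to be already reduced to a sequence of lexical-chain IDs in text order (chain construction is not shown), "unordered" is interpreted here as sorting the chain IDs inside each sliding n-gram window, and the tf*idf weighting uses the common tf * log(N / df) variant. The function names chain_ngrams, cosine, and tfidf_vectors are hypothetical.

```python
import math
from collections import Counter


def chain_ngrams(chain_ids, n, unordered=True):
    """Count the n-grams in a sequence of lexical-chain IDs.

    Assumption (illustrative, not from the paper): "unordered" means the
    chain IDs inside each sliding window of length n are sorted, so the
    n-gram ignores their relative order.
    """
    grams = Counter()
    for i in range(len(chain_ids) - n + 1):
        window = chain_ids[i:i + n]
        grams[tuple(sorted(window)) if unordered else tuple(window)] += 1
    return grams


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight}."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def tfidf_vectors(docs):
    """Weight each keyword by tf * log(N / df); one dict per document.

    This idf variant is an assumption; the paper's exact weighting is not given here.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


if __name__ == "__main__":
    # Toy chain-ID sequences (made-up data): b reorders chains locally.
    a = [3, 1, 2, 3, 1, 2, 5]
    b = [1, 3, 2, 1, 3, 2, 5]
    print(cosine(chain_ngrams(a, 4), chain_ngrams(b, 4)))  # unordered: 1.0
    print(cosine(chain_ngrams(a, 4, unordered=False),
                 chain_ngrams(b, 4, unordered=False)))  # ordered: 0.0
```

With n=1, chain_ngrams yields plain unigram count vectors of lexical chains, roughly the third kind of vector the abstract lists. In the toy run the two sequences differ only by local reordering, so their unordered 4-gram vectors coincide (cosine 1.0) while their ordered 4-gram vectors share no feature (cosine 0.0).
-->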