File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/05/w05-0207_metho.xml

<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0207">
  <Title>Using Syntactic Information to Identify Plagiarism</Title>
  <Section position="5" start_page="37" end_page="39" type="metho">
    <SectionTitle>
3 Identifying Creative Aspects of Writing
</SectionTitle>
    <Paragraph position="0"> In this paper, we first identify linguistic elements of expression and then study patterns in the use of these elements to recognize a work even when it is paraphrased. Translated literary works provide examples of linguistic elements that differ in expression but convey similar content. These works provide insight into the linguistic elements that capture expression. For example, consider the following semantically equivalent excerpts from three different translations of Madame Bovary by Gustave Flaubert.</Paragraph>
    <Paragraph position="1"> Excerpt 1: Now Emma would often take it into her head to write him during the day. Through her window she would signal to Justin, and he would whip off his apron and fly to la huchette. And when Rodolphe arrived in response to her summons, it was to hear that she was miserable, that her husband was odious, that her life was a torment. (Translated by Unknown1.)
Excerpt 2: Often, even in the middle of the day, Emma suddenly wrote to him, then from the window made a sign to Justin, who, taking his apron off, quickly ran to la huchette. Rodolphe would come; she had sent for him to tell him that she was bored, that her husband was odious, her life frightful. (Translated by Aveling.)
Excerpt 3: Often, in the middle of the day, Emma would take up a pen and write to him. Then she would beckon across to Justin, who would off with his apron in an instant and fly away with the letter to la huchette. And Rodolphe would come. She wanted to tell him that life was a burden to her, that she could not endure her husband and that things were unbearable. (Translated by Unknown2.)
Inspired by syntactic differences displayed in such parallel translations, we identified a novel set of syntactic features that relate to how people convey content.</Paragraph>
    <Section position="1" start_page="38" end_page="39" type="sub_section">
      <SectionTitle>
3.1 Syntactic Elements of Expression
</SectionTitle>
      <Paragraph position="0"> We hypothesize that given particular content, authors choose from a set of semantically equivalent syntactic constructs to express this content. To paraphrase a work without changing content, people try to interchange semantically equivalent syntactic constructs; patterns in the use of various syntactic constructs can be sufficient to indicate copying.</Paragraph>
      <Paragraph position="1"> Our observations of the particular expressive choices of authors in a corpus of parallel translations led us to define syntactic elements of expression in terms of sentence-initial and -final phrase structures, semantic classes and argument structures of verb phrases, and syntactic classes of verb phrases.</Paragraph>
      <Paragraph position="2">Sentence-initial and -final phrase structures. The order of phrases in a sentence can shift the emphasis of the sentence, draw attention to particular pieces of information, and serve as an expressive tool, as the following examples show.</Paragraph>
      <Paragraph position="3">
1 (a) Martha can finally put some money in the bank.
(b) Martha can put some money in the bank, finally.
(c) Finally, Martha can put some money in the bank.
2 (a) Martha put some money in the bank on Friday.
(b) On Friday, Martha put some money in the bank.
(c) Some money is what Martha put in the bank on Friday.
(d) In the bank is where Martha put some money on Friday.</Paragraph>
      <Paragraph position="4"> Such expressive changes affect the distributions of phrase types in sentence-initial and -final positions; studying these distributions can help us capture some elements of expression. Although this approach cannot detect structural changes that leave the sentence-initial and -final phrase types unchanged, it captures some of the phrase-level expressive differences between semantically equivalent content; it also captures different sentential structures, including question constructs, imperatives, and coordinating and subordinating conjuncts.</Paragraph>
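A minimal sketch of the position statistics described above, assuming each sentence has already been reduced to a flat sequence of phrase (chunk) labels by some parser or chunker, which is not shown. The labels below were assigned by hand for illustration, and the function name position_distributions is hypothetical. The sketch counts which phrase type opens and closes each sentence and normalizes by the number of sentences.

```python
from collections import Counter

# Hypothetical input format: one list of phrase labels per sentence.
Sentence = list[str]


def position_distributions(
    sentences: list[Sentence],
) -> tuple[dict[str, float], dict[str, float]]:
    """Relative frequency of phrase types in sentence-initial and sentence-final position."""
    initial = Counter(s[0] for s in sentences if s)
    final = Counter(s[-1] for s in sentences if s)
    n = sum(initial.values())  # number of non-empty sentences
    if n == 0:
        return {}, {}
    return (
        {label: count / n for label, count in initial.items()},
        {label: count / n for label, count in final.items()},
    )


# Hand-assigned chunk labels for four of the example sentences.
doc = [
    ["NP", "VP", "NP", "PP", "PP"],    # 2(a) Martha put some money in the bank on Friday.
    ["PP", "NP", "VP", "NP", "PP"],    # 2(b) On Friday, Martha put some money in the bank.
    ["ADVP", "NP", "VP", "NP", "PP"],  # 1(c) Finally, Martha can put some money in the bank.
    ["NP", "VP", "NP", "PP", "ADVP"],  # 1(b) Martha can put some money in the bank, finally.
]
init_dist, final_dist = position_distributions(doc)
print(init_dist)   # {'NP': 0.5, 'PP': 0.25, 'ADVP': 0.25}
print(final_dist)  # {'PP': 0.75, 'ADVP': 0.25}
```

Moving a single adverbial or prepositional phrase to the front or back of a sentence shifts mass between labels in these two distributions, which is the kind of signal the approach relies on.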
      <Paragraph position="5"> Levin (1993) observed that verbs that exhibit similar syntactic behavior are also related semantically. Based on this observation, she sorted 3024 verbs into 49 high-level semantic classes. For example, verbs such as convey, deliver, move, roll, bring, carry, shuttle, and wire are collected under the "Verbs of Sending and Carrying" class, which can be further broken down into five semantically coherent lower-level classes: "drive verbs", "carry verbs", "bring and take verbs", "slide verbs", and "send verbs". Each of these lower-level classes represents a group of verbs that are similar both in semantics and in syntactic behavior, i.e., they can grammatically undergo similar syntactic alternations. For example, send verbs can be seen in the following alternations (Levin, 1993):
1. Base Form
* Nora sent the book to Peter.</Paragraph>
      <Paragraph position="6"> * NP + V + NP + PP.</Paragraph>
      <Paragraph position="7"> 2. Dative Alternation
* Nora sent Peter the book.
* NP + V + NP + NP.</Paragraph>
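The two alternations can be encoded as argument-structure patterns. The sketch below is hypothetical: it assumes clauses arrive as chunk sequences (the section extracts argument structures with context-free grammars over part-of-speech-tagged text, which is not reproduced here), writes them in the NP + V + NP + PP notation, and looks them up in a small hand-made table. The dative pattern NP + V + NP + NP is read off the example "Nora sent Peter the book."; the names ALTERNATIONS, argument_structure, and classify_alternation are illustrative.

```python
# Hypothetical table of the two send-verb alternations listed above.
ALTERNATIONS = {
    ("NP", "V", "NP", "PP"): "base form",
    ("NP", "V", "NP", "NP"): "dative alternation",
}


def argument_structure(chunks: list[str]) -> str:
    """Render a chunk sequence in the NP + V + NP + PP notation used in the text."""
    return " + ".join(chunks)


def classify_alternation(chunks: list[str]) -> str:
    """Name the alternation a clause instantiates, or 'other' if it matches neither."""
    return ALTERNATIONS.get(tuple(chunks), "other")


# Chunk labels assigned by hand; a real system would derive them from tagged text.
clauses = {
    "Nora sent the book to Peter.": ["NP", "V", "NP", "PP"],
    "Nora sent Peter the book.": ["NP", "V", "NP", "NP"],
}
for sentence, chunks in clauses.items():
    print(sentence, "|", argument_structure(chunks), "|", classify_alternation(chunks))
```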
      <Paragraph position="9"> Semantics of verbs in general, and Levin's verb classes in particular, have previously been used for evaluating content and genre similarity (Hatzivassiloglou et al., 1999). In addition, similar semantic classes of verbs were used in natural language processing applications: START was the first natural language question answering system to use such verb classes (Katz and Levin, 1988). We use Levin's semantic verb classes to describe the expression of an author in a particular work. We assume that semantically similar verbs are often used in semantically similar syntactic alternations; we describe part of an author's expression in a particular work in terms of the semantic classes of verbs she uses and the particular argument structures, e.g., NP + V + NP + PP, she prefers for them. As many verbs belong to multiple semantic classes, to capture the dominant semantic verb classes in each document we credit all semantic classes of all observed verbs. We extract the argument structures from part-of-speech-tagged text, using context-free grammars (Uzuner, 2005).</Paragraph>
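To illustrate crediting every semantic class of every observed verb, the sketch below uses a tiny hypothetical verb lexicon (VERB_CLASSES; the memberships are illustrative and should be checked against Levin (1993)) and counts both classes and (class, argument structure) pairs for one document. The function name credit_classes is likewise hypothetical. An ambiguous verb such as roll adds one count to each of its classes, so the dominant classes of a document emerge from the totals rather than from per-token disambiguation.

```python
from collections import Counter

# Illustrative verb-to-class lexicon; not a faithful excerpt of Levin (1993).
VERB_CLASSES = {
    "send": ["send verbs"],
    "wire": ["send verbs"],
    "carry": ["carry verbs"],
    "roll": ["slide verbs", "roll verbs"],
}


def credit_classes(observations: list[tuple[str, str]]) -> tuple[Counter, Counter]:
    """observations: (verb lemma, argument structure) pairs from one document."""
    class_counts: Counter = Counter()
    class_frames: Counter = Counter()
    for verb, frame in observations:
        # Every class of an ambiguous verb receives full credit.
        for cls in VERB_CLASSES.get(verb, []):
            class_counts[cls] += 1
            class_frames[(cls, frame)] += 1
    return class_counts, class_frames


obs = [
    ("send", "NP + V + NP + PP"),
    ("wire", "NP + V + NP + NP"),
    ("roll", "NP + V + NP + PP"),
    ("send", "NP + V + NP + NP"),
]
counts, frames = credit_classes(obs)
print(counts.most_common(1))  # [('send verbs', 3)]
print(frames[("send verbs", "NP + V + NP + NP")])  # 2
```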
      <Paragraph position="10"> Levin's verb classes contain only nonembedding verbs, i.e., verbs that do not take clausal arguments, and need to be supplemented by classes of embedding verbs that do take such arguments. Alexander and Kunz (1964) identified syntactic classes of embedding verbs, collected a comprehensive set of verbs for each class, and described the identified verb classes with formulae written in terms of phrasal and clausal elements, such as verb phrase heads (Vh), participial phrases (Partcp.), infinitive phrases (Inf.), indicative clauses (IS), and subjunctives (Subjunct.). We used 29 of the more frequent embedding verb classes and identified their distributions in different works. Examples of these verb classes are shown in Table 1. Further examples can be found in (Uzuner, 2005; Uzuner and Katz, 2005).</Paragraph>
      <Paragraph position="11">Table 1: Examples of embedding verb classes.</Paragraph>
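The distribution of embedding verb classes in a work can be sketched as follows, assuming each embedding verb has already been paired with the formula of its observed embedded argument. The class table EMBEDDING_CLASSES and its verb lists are hypothetical placeholders built from the element labels named above, not the Alexander and Kunz (1964) inventory, and class_distribution is an illustrative name.

```python
from collections import Counter
from typing import Optional

# Hypothetical classes keyed by a formula over the elements named above
# (Vh = verb phrase head, Inf. = infinitive phrase, IS = indicative clause,
# Partcp. = participial phrase). Verb lists are illustrative placeholders.
EMBEDDING_CLASSES = {
    "Vh + Inf.": {"want", "try", "hope"},
    "Vh + IS": {"say", "think", "know"},
    "Vh + Partcp.": {"keep", "avoid", "enjoy"},
}


def embedding_class(verb: str, formula: str) -> Optional[str]:
    """Return the class formula if this verb is listed for the observed structure."""
    return formula if verb in EMBEDDING_CLASSES.get(formula, set()) else None


def class_distribution(observed: list[tuple[str, str]]) -> dict[str, float]:
    """Relative frequency of embedding verb classes among (verb, formula) observations."""
    counts: Counter = Counter()
    for verb, formula in observed:
        label = embedding_class(verb, formula)
        if label is not None:
            counts[label] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


observed = [
    ("want", "Vh + Inf."),  # e.g. "She wanted to tell him ..."
    ("say", "Vh + IS"),
    ("hope", "Vh + Inf."),
    ("run", "Vh + Inf."),   # not listed in any class here, so ignored
]
print(class_distribution(observed))
```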
      <Paragraph position="12"> We study the syntax of embedding verbs by identifying their syntactic class and the structure of their observed embedded arguments. After identifying syntactic and semantic characteristics of verb phrases, we combine these features to create further elements of expression, e.g., syntactic classes of embedding verbs paired with the semantic classes of the nonembedding verbs they co-occur with.</Paragraph>
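A sketch of one such combined feature, assuming "co-occur" means that the nonembedding verb heads the embedded argument of the embedding verb. The two lookup tables and the function name combined_features are hypothetical stand-ins for the embedding-verb classes and Levin's semantic classes; the sketch simply counts (embedding class, semantic class) pairs and skips pairs where either verb is unknown.

```python
from collections import Counter

# Hypothetical lookups: embedding verb to syntactic class formula, and
# nonembedding verb to a Levin-style semantic class. Both are illustrative.
EMBEDDING_CLASS = {"want": "Vh + Inf.", "try": "Vh + Inf.", "say": "Vh + IS"}
SEMANTIC_CLASS = {"send": "send verbs", "carry": "carry verbs", "bring": "bring and take verbs"}


def combined_features(pairs: list[tuple[str, str]]) -> Counter:
    """Count (embedding class, semantic class) pairs.

    Each input pair is (embedding verb, nonembedding verb heading its embedded
    argument); pairs where either verb is missing from the lookups are skipped.
    """
    features: Counter = Counter()
    for embedding_verb, embedded_verb in pairs:
        emb = EMBEDDING_CLASS.get(embedding_verb)
        sem = SEMANTIC_CLASS.get(embedded_verb)
        if emb is not None and sem is not None:
            features[(emb, sem)] += 1
    return features


pairs = [
    ("want", "send"),   # e.g. "she wanted to send the letter"
    ("try", "carry"),
    ("want", "send"),
    ("say", "sleep"),   # "sleep" has no class in this toy lookup, so skipped
]
for (emb, sem), n in combined_features(pairs).most_common():
    print(f"{emb:12} {sem:14} {n}")
```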
    </Section>
  </Section>
</Paper>