<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1024">
  <Title>Japanese Zero Pronoun Resolution based on Ranking Rules and Machine Learning</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
CPOS: The POS tags of the last word of the candidate.
</SectionTitle>
    <Paragraph position="0"> Siblings: When CP is wa or mo, it is not clear whether the candidate is a subject. However, a verb rarely takes the same entity in two or more cases. Therefore, if the candidate modifies a verb that already has a subject, the candidate is not a subject.</Paragraph>
    <Paragraph position="1"> In the next example, hon is an object of katta.</Paragraph>
    <Paragraph position="2">  (As for that book, Tom bought it.) To learn such regularities, we use as features of the candidate the case markers of its siblings, that is, of the phrases that modify the same verb as the candidate. In addition to ZP, we also use the following features of the zero.</Paragraph>
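    <Paragraph> As a minimal sketch of the sibling feature, the following Python function collects the case markers of the other phrases that modify the same verb as a candidate. The dictionary layout (`word`, `marker`, `head`), the function name, and the romanized tokens are hypothetical and are assumed here only for illustration; they are not the data format used in the experiments.

```python
def sibling_case_markers(phrases, candidate_index):
    """Return the sorted case markers of the phrases that modify the
    same verb as the candidate, excluding the candidate itself."""
    head = phrases[candidate_index]["head"]
    markers = set()
    for index, phrase in enumerate(phrases):
        if index != candidate_index and phrase["head"] == head:
            markers.add(phrase["marker"])
    return sorted(markers)


# Hypothetical analysis of a sentence glossed "As for that book, Tom bought it."
# Both phrases are assumed to modify the verb katta.
phrases = [
    {"word": "hon", "marker": "wa", "head": "katta"},
    {"word": "Tom", "marker": "ga", "head": "katta"},
]

print(sibling_case_markers(phrases, 0))
```

 In this toy input, the sibling marker ga of hon indicates that the verb already has a subject, which is the kind of evidence that suggests hon is not a subject.</Paragraph>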
    <Paragraph position="3"> Conjunct: The latest conjunctive postposition in the sentence and its classification (Okumura and Tamura, 1996; Yoshimoto, 1986).</Paragraph>
    <Paragraph position="4"> ZSem: Semantic categories of the verb that the zero modifies. We use them only when the verb is sahen meishi + 'suru.' Sahen meishi is a kind of noun that can be an object of the verb 'suru' (do) (e.g., 'shopping' in 'do the shopping').</Paragraph>
    <Paragraph position="5"> In addition to Ag, Vi, and Di, we also use the following relations between the zero and the candidate.</Paragraph>
    <Paragraph position="6"> Relative: Whether the candidate is in a relative clause.</Paragraph>
    <Paragraph position="7"> Unfinished: Whether the relative clause is unfinished at the zero.</Paragraph>
    <Paragraph position="8"> Intra (for intrasentential coreference): Whether the candidate explicitly appears in the zero's sentence.</Paragraph>
    <Paragraph position="9"> Sometimes it is difficult to distinguish cataphora from anaphora. Even if an antecedent appears in a preceding sentence, it is sometimes easier to find a candidate after the zero, as illustrated by the case of 'his' in the next English example.</Paragraph>
    <Paragraph position="10"> Bob and John separately drove to Charlie's house. . . . Since his car broke down, John made a phone call.</Paragraph>
    <Paragraph position="11"> Even if Di = 0 holds, Intra does not necessarily hold because we introduce resolved zeros as new candidates.</Paragraph>
    <Paragraph position="12"> Parallel: Whether the candidate appears in a clause parallel to a clause in which a zero appears. This is useful for resolving a zero in the same way as 'it' in the next English sentence.</Paragraph>
    <Paragraph position="13"> He turned on the TV set and she turned it off.</Paragraph>
    <Paragraph position="14"> Immediate: Whether the candidate's bunsetsu appears immediately before the zero's. In the following sentence, a candidate ryoushin is located immediately before the zero.  (His parents believe that (zero) is still alive.) Here, we represent each of the above features by a boolean value: 0 or 1. Semantic categories can be represented by a 0/1 vector whose $k$-th component corresponds to the $k$-th semantic category. Similarly, POS tags can be represented by a 0/1 vector whose $k$-th component corresponds to the $k$-th POS tag. Di, on the other hand, has a non-negative integer value. We also encode the distance by a 0/1 vector whose $k$-th component is 1 when the distance is $k$. The distance has an upper bound maxDi. In this way, we can represent a candidate by a boolean feature vector. The feature vector of the $i$-th candidate is denoted $x_i$. If a boolean feature appears only once in the given data, we remove the feature from the feature vectors.</Paragraph>
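    <Paragraph> The encoding described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the argument layout, and the clipping of distances above maxDi to maxDi are hypothetical choices that the text does not specify, and the pruning step also drops features that never occur.

```python
from collections import Counter


def one_hot(index, size):
    """Return a 0/1 list of length `size` with a single 1 at `index`."""
    vector = [0] * size
    vector[index] = 1
    return vector


def encode_distance(distance, max_di):
    """Encode a non-negative distance as a 0/1 vector of max_di + 1 components.
    Clipping larger distances to max_di is an assumption of this sketch."""
    return one_hot(min(distance, max_di), max_di + 1)


def encode_candidate(flags, pos_index, n_pos, distance, max_di):
    """Concatenate boolean flags, a one-hot POS tag, and a one-hot distance."""
    return list(flags) + one_hot(pos_index, n_pos) + encode_distance(distance, max_di)


def prune_rare_features(vectors):
    """Keep only the features (columns) that are 1 in at least two vectors."""
    counts = Counter()
    for vector in vectors:
        for k, value in enumerate(vector):
            if value:
                counts[k] += 1
    keep = [k for k in range(len(vectors[0])) if counts[k] > 1]
    return [[vector[k] for k in keep] for vector in vectors]


# Two boolean flags, POS tag 1 of 3, distance 0 with maxDi = 2.
print(encode_candidate([1, 0], 1, 3, 0, 2))
```

 Keeping every feature as a 0/1 component means that a candidate is described by a single boolean vector, which is the form a kernel-based classifier expects.</Paragraph>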
    <Paragraph position="15"> The training data comprise the set of pairs $(x_i, y_i)$, where $y_i$ is $+1$ if the $i$-th candidate is a correct antecedent of a zero. Otherwise, $y_i$ is $-1$.</Paragraph>
    <Paragraph position="17"> By using the training data, SVM finds a decision function $g(x) = \sum_i w_i K(x, z_i) + b$.</Paragraph>
    <Paragraph position="19"> Here, $x$ is the feature vector of a candidate and the $z_i$ are support vectors selected from the training data. $w_i$ and $b$ are constants. $K(\cdot, \cdot)$ is called a kernel function. If $g(x) > 0$ holds, $x$ is classified as a correct antecedent.</Paragraph>
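    <Paragraph> For illustration, the following Python sketch evaluates a decision function of the standard SVM form $g(x) = \sum_i w_i K(x, z_i) + b$ and applies the rule that a positive value marks a correct antecedent. The polynomial kernel, its degree, the function names, and the toy support vectors and weights are assumptions made for this example; they are not values or settings reported here.

```python
def polynomial_kernel(x, z, degree=2):
    """K(x, z) = (1 + x . z) ** degree. The kernel choice is an assumption."""
    return (1 + sum(a * b for a, b in zip(x, z))) ** degree


def decision_value(x, support_vectors, weights, bias, kernel=polynomial_kernel):
    """g(x) = sum_i w_i * K(x, z_i) + b."""
    return sum(w * kernel(x, z) for w, z in zip(weights, support_vectors)) + bias


def is_correct_antecedent(x, support_vectors, weights, bias):
    """A candidate is accepted when g(x) is strictly positive."""
    return decision_value(x, support_vectors, weights, bias) > 0


# Toy model with two hypothetical support vectors.
support_vectors = [[1, 0, 1], [0, 1, 0]]
weights = [0.5, -0.5]
bias = 0.0

print(decision_value([1, 0, 1], support_vectors, weights, bias))
print(is_correct_antecedent([0, 1, 0], support_vectors, weights, bias))
```

 In practice the support vectors, weights, and bias come from training; only the evaluation step is shown.</Paragraph>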
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.4 Combinations
</SectionTitle>
      <Paragraph position="0"> Here, we use the following method to combine the ordering and SVM.</Paragraph>
      <Paragraph position="1"> 1. Sort candidates by using the lexicographical order.
 2. Classify each candidate by using SVM in this order.</Paragraph>
      <Paragraph position="2"> 3. If $g(x_i)$ is positive, stop there and sort the evaluated candidates by $g(x_i)$ in decreasing order.
 4. If no candidate satisfies $g(x_i) > 0$, return the best candidate in terms of $g(x_i)$.</Paragraph>
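      <Paragraph> The four steps above can be sketched in Python as follows. The function name `resolve` and the `score` callable, which stands in for the SVM decision function, are hypothetical; the candidates are assumed to be already sorted by the lexicographical order of step 1.

```python
def resolve(candidates, score):
    """Combine a preference ordering with a classifier score.

    `candidates` must already be sorted by the ordering (step 1).
    `score` is any callable returning a float for a candidate.
    Returns the evaluated candidates ranked by decreasing score,
    so the first element is the proposed antecedent."""
    evaluated = []
    for candidate in candidates:          # step 2: classify in order
        value = score(candidate)
        evaluated.append((value, candidate))
        if value > 0:                     # step 3: stop at the first positive
            break
    # Steps 3 and 4: rank what was evaluated by decreasing score.
    # sorted() is stable, so ties keep the original ordering.
    ranked = sorted(evaluated, key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in ranked]


# Toy scores for three candidates given in preference order.
scores = {"a": -0.5, "b": 0.3, "c": 0.9}
print(resolve(["a", "b", "c"], scores.get))
```

 In this toy run, candidate c is never evaluated because b is the first candidate with a positive score, even though c would score higher. This shows how the ordering takes priority over the classifier once a positive candidate is found.</Paragraph>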
    </Section>
  </Section>
</Paper>