File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/p06-1105_intro.xml

Size: 3,110 bytes

Last Modified: 2025-10-06 14:03:37

<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1105">
  <Title>Japanese Dependency Parsing Using Co-occurrence Information and a Combination of Case Elements</Title>
  <Section position="3" start_page="0" end_page="833" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Dependency parsing is a basic technology for processing Japanese and has been the subject of much research. The Japanese dependency structure is usually represented by the relationship between phrasal units called bunsetsu, each of which consists of one or more content words that may be followed by any number of function words. The dependency between two bunsetsus is directed from a dependent to its head.</Paragraph>
    <Paragraph position="1"> Manually written rules have usually been used to determine which bunsetsu another bunsetsu tends to modify, but this method poses problems in terms of the coverage and consistency of the rules. The recent availability of larger-scale corpora annotated with dependency information has thus resulted in more work on statistical dependency analysis technologies that use machine learning algorithms (Kudo and Matsumoto, 2002; Sassano, 2004; Uchimoto et al., 1999; Uchimoto et al., 2000).</Paragraph>
    <Paragraph position="2"> Work on statistical Japanese dependency analysis has usually assumed that all the dependency relations in a sentence are independent of each other, and has considered the bunsetsus in a sentence independently when judging whether or not a pair of bunsetsus is in a dependency relation. In judging which bunsetsu a bunsetsu modifies, this type of work has used as features information about the two bunsetsus, such as their head words and the morphemes at their ends (Uchimoto et al., 1999). It is necessary, however, to also consider features for the contextual information of the two bunsetsus. One such feature is the constraint that two case elements with the same case do not modify the same verb.</Paragraph>
    <Paragraph position="3"> Statistical Japanese dependency analysis takes into account syntactic information but tends not to take into account lexical information, such as co-occurrence between a case element and a verb.</Paragraph>
    <Paragraph position="4"> The recent availability of more corpora has enabled much information about dependency relations to be obtained by using a Japanese dependency analyzer such as KNP (Kurohashi and Nagao, 1994) or CaboCha (Kudo and Matsumoto, 2002). Although this information is less accurate than manually annotated information, these automatic analyzers provide a large amount of co-occurrence information as well as information about combinations of multiple cases that tend to modify a verb.</Paragraph>
    <Paragraph position="5"> In this paper, we present a method for improving the accuracy of Japanese dependency analysis by representing the lexical information of co-occurrence and dependency relations of multiple cases as statistical models. We also show the results of experiments demonstrating the effectiveness of our method.</Paragraph>
  </Section>
</Paper>
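
The introduction refers to two kinds of information: co-occurrence between a case element and a verb, and the constraint that two case elements with the same case do not modify the same verb. The following is a minimal sketch of both ideas, not the paper's actual statistical models. It assumes that (noun, case, verb) triples have already been extracted from automatically parsed text, and it uses a plain relative-frequency estimate without smoothing. The triples, the function names, and the estimate itself are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical (noun, case particle, verb) triples, as might be extracted
# from automatically parsed text. The data are invented for illustration.
TRIPLES = [
    ("hon", "wo", "yomu"),
    ("hon", "wo", "yomu"),
    ("shinbun", "wo", "yomu"),
    ("kare", "ga", "yomu"),
    ("pan", "wo", "taberu"),
]


def cooccurrence_model(triples):
    """Return a function estimating P(noun | case, verb) by relative frequency."""
    triple_counts = Counter(triples)
    slot_counts = Counter((case, verb) for _, case, verb in triples)

    def prob(noun, case, verb):
        total = slot_counts[(case, verb)]
        if total == 0:
            return 0.0  # unseen (case, verb) slot; this sketch does no smoothing
        return triple_counts[(noun, case, verb)] / total

    return prob


def violates_same_case_constraint(attachments):
    """Return True if any verb receives two case elements with the same case.

    `attachments` is a list of (case_element, case, verb_index) tuples
    describing one candidate analysis of a sentence.
    """
    seen = set()
    for _, case, verb_index in attachments:
        key = (case, verb_index)
        if key in seen:
            return True
        seen.add(key)
    return False


if __name__ == "__main__":
    prob = cooccurrence_model(TRIPLES)
    print(prob("hon", "wo", "yomu"))
    print(violates_same_case_constraint([("hon", "wo", 3), ("shinbun", "wo", 3)]))
```

A parser could use such a check to reject or penalize candidate analyses in which one verb takes two elements with the same case, and use the co-occurrence estimate as one score among others. How the paper combines these signals is not stated in this section.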