<?xml version="1.0" standalone="yes"?> <Paper uid="I05-4002"> <Title>Evaluation of a Japanese CFG Derived from a Syntactically Annotated Corpus with Respect to Dependency Measures</Title> <Section position="3" start_page="0" end_page="9" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Parsing is one of the important processes in natural language processing and, in general, a large-scale CFG is used to parse a wide variety of sentences. Although it is difficult to build a large-scale CFG manually, a CFG can be derived from a large-scale syntactically annotated corpus. For many languages, large-scale syntactically annotated corpora have been built (e.g. the Penn Treebank (Marcus et al., 1993)), and many parsing algorithms using CFGs have been proposed.</Paragraph> <Paragraph position="1"> However, such a syntactically annotated corpus has not yet been built for Japanese. Dependency analysis, which identifies dependency relations between Japanese phrasal units called bunsetsu, is preferred for analyzing Japanese sentences (Kurohashi and Nagao, 1998; Uchimoto et al., 2000; Kudo and Matsumoto, 2002), and only a few studies on Japanese CFGs have been conducted.</Paragraph> <Paragraph position="2"> Since many efficient parsing algorithms for CFGs have been proposed, a Japanese CFG is necessary to apply these algorithms to Japanese.</Paragraph> <Paragraph position="3"> We have been building a large-scale Japanese syntactically annotated corpus to derive a Japanese CFG for syntactic parsing (Noro et al., 2004a; Noro et al., 2004b). According to the results, a CFG derived from the corpus can parse sentences with high accuracy and coverage. However, as mentioned previously, dependency analysis is usually adopted in Japanese NLP, and it is difficult to compare our results with those of other dependency analyzers, since we evaluated our CFG with respect to phrase-structure-based measures.
Although we also evaluated it with respect to a dependency measure in a preliminary experiment to enable such a comparison, the scale was quite small (only 100 sentences), and the comparison was unfair since we did not use the same evaluation data.</Paragraph> <Paragraph position="4"> In this paper, we present evaluation results for a CFG derived from our corpus and compare them with the results of other Japanese dependency analyzers.</Paragraph> <Paragraph position="5"> We used the Kyoto corpus (Kurohashi and Nagao, 1997) as evaluation data, and chose KNP (Kurohashi and Nagao, 1998) and CaboCha (Kudo and Matsumoto, 2002) for comparison.</Paragraph> </Section> </Paper>