<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2042">
  <Title>Detection of Quotations and Inserted Clauses and its Application to Dependency Structure Analysis in Spontaneous Japanese Ryoji Hamabe</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The &quot;Spontaneous Speech: Corpus and Processing Technology&quot; project sponsored the construction of the Corpus of Spontaneous Japanese (CSJ) (Maekawa et al., 2000). The CSJ is the largest spontaneous speech corpus in the world, consisting of roughly 7M words with a total speech length of 700 hours, and is a collection of monologues such as academic presentations and simulated public speeches. It includes transcriptions of the speeches as well as their audio recordings. Approximately one tenth of the speeches in the CSJ were manually annotated with various kinds of information such as morphemes, sentence boundaries, dependency structures, and discourse structures.</Paragraph>
    <Paragraph position="1"> In Japanese sentences, word order is rather free, and subjects or objects are often omitted.</Paragraph>
    <Paragraph position="2"> Therefore, the syntactic structure of a Japanese sentence is generally represented by the relationships between phrasal units, called bunsetsus in Japanese, based on a dependency grammar, as in the Kyoto University text corpus (Kurohashi and Nagao, 1997). In the same way, the CSJ represents the syntactic structure of a sentence by dependency relationships between bunsetsus. For example, the sentence &quot;彼はゆっくり歩いている&quot; (He is walking slowly) can be divided into three bunsetsus: &quot;彼は, kare-wa&quot; (he), &quot;ゆっくり, yukkuri&quot; (slowly), and &quot;歩いている, arui-te-iru&quot; (is walking). In this sentence, the first and second bunsetsus depend on the third one. The dependency structure is described as follows.</Paragraph>
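    <Paragraph> As an illustration only, and not the authors' own notation, the bunsetsu dependencies in the example sentence can be sketched in Python. The class name Bunsetsu, its fields, and the two helper functions are hypothetical names introduced here; the Japanese surface forms are an assumption reconstructed from the romanizations kare-wa, yukkuri, and arui-te-iru. Each bunsetsu stores the index of the bunsetsu it depends on, and the final bunsetsu, which depends on nothing, stores None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bunsetsu:
    """One phrasal unit (hypothetical structure, for illustration only)."""
    surface: str          # Japanese surface form (assumed from the romanization)
    romaji: str           # romanized form as given in the text
    gloss: str            # English gloss as given in the text
    head: Optional[int]   # index of the bunsetsu this one depends on, or None


# "He is walking slowly": the first and second bunsetsus depend on the third.
sentence = [
    Bunsetsu("彼は", "kare-wa", "he", 2),
    Bunsetsu("ゆっくり", "yukkuri", "slowly", 2),
    Bunsetsu("歩いている", "arui-te-iru", "is walking", None),
]


def dependency_pairs(bunsetsus):
    """Return (dependent, head) romaji pairs for every bunsetsu that has a head."""
    return [
        (b.romaji, bunsetsus[b.head].romaji)
        for b in bunsetsus
        if b.head is not None
    ]


def dependents_of(bunsetsus, index):
    """Return the indices of all bunsetsus that depend on the one at index."""
    return [i for i, b in enumerate(bunsetsus) if b.head == index]


for dependent, head in dependency_pairs(sentence):
    print(f"{dependent} depends on {head}")
```

    Storing a single head index per bunsetsu is one simple way to hold this kind of structure; it is a design choice of this sketch, not a description of the CSJ annotation format.</Paragraph>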
    <Paragraph position="4"> In this paper, we first describe the problems with dependency structure analysis of spontaneous speech. We focus on ambiguous clause boundaries as the biggest problem and present a solution.</Paragraph>
  </Section>
</Paper>