<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1049">
  <Title>An Efficient Statistical Speech Act Type Tagging System for Speech Translation Systems</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> This paper describes a statistical speech act type tagging system that uses linguistic, acoustic and situational features. This work can be viewed as a study on automatic &quot;Discourse Tagging&quot;, whose objective is to assign tags to discourse units in texts or dialogues. Discourse tagging has been studied mainly from two viewpoints: linguistic and engineering. The work described here belongs to the latter group. More specifically, we are interested in automatically recognizing the speech act types of utterances and in applying them to speech translation systems.</Paragraph>
    <Paragraph position="1"> Several studies on discourse tagging to date have been motivated by engineering applications. The early studies by Nagata and Morimoto (1994) and Reithinger and Maier (1995) showed the possibility of predicting dialogue act tags for upcoming utterances with statistical methods. These studies, however, presupposed properly segmented utterances, which is not a realistic assumption. Automatic utterance segmentation (or discourse segmentation) is therefore needed here.</Paragraph>
    <Paragraph position="2"> Discourse segmentation in linguistics, whether manual or automatic, has also received keen attention because such segmentation provides the foundation of higher discourse structures (Grosz and Sidner, 1986).</Paragraph>
    <Paragraph position="3"> Discourse segmentation has also received keen attention from the engineering side because the natural language processing systems that follow the speech recognition system are designed to accept linguistically meaningful units (Stolcke and Shriberg, 1996). Much research has followed this line, including Stolcke and Shriberg (1996) and Cettolo and Falavigna (1998), to mention only a few.</Paragraph>
    <Paragraph position="4"> Such segmentation methods could be used as a preprocessing step for tagging. In this paper, however, we propose a statistical tagging system that optimally performs segmentation and tagging at the same time.</Paragraph>
    <Paragraph position="5"> Previous studies such as Litman and Passonneau (1995) have pointed out that the use of multiple information sources can contribute to better segmentation and tagging, and so our statistical model integrates linguistic, acoustic and situational information.</Paragraph>
    <Paragraph position="6"> The problem can be formalized as a search problem on a word graph, which can be efficiently handled by an extended dynamic programming algorithm. In fact, the optimal solution can be found efficiently without limiting the search space at all.</Paragraph>
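The search described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain word sequence instead of a word graph, and the scoring functions `unit_logprob` and `trans_logprob`, the tag names, and the toy scores are all hypothetical stand-ins for a trained model that would combine linguistic, acoustic and situational features. The dynamic program keeps, for every word position and tag, the best-scoring analysis ending there, so segmentation and tagging are decided jointly and no candidate is pruned.

```python
import math


def joint_segment_and_tag(words, tags, unit_logprob, trans_logprob):
    """Jointly segment `words` into units and tag each unit.

    Returns (total_log_score, [(unit_words, tag), ...]) for the best
    analysis, or None if no analysis covers the whole input.
    `unit_logprob(unit, tag)` and `trans_logprob(prev_tag, tag)` are
    hypothetical scoring functions supplied by the caller.
    """
    n = len(words)
    # best[end][tag] = (score, (start, prev_tag)) for the best analysis of
    # words[:end] whose last unit is words[start:end] with this tag.
    best = [dict() for _ in range(n + 1)]
    best[0][None] = (0.0, None)  # empty prefix, no previous tag

    for end in range(1, n + 1):
        for start in range(end):  # every possible last unit words[start:end]
            unit = tuple(words[start:end])
            for tag in tags:
                emit = unit_logprob(unit, tag)
                if emit == -math.inf:
                    continue  # this unit cannot carry this tag
                for prev_tag, (prev_score, _) in best[start].items():
                    score = prev_score + trans_logprob(prev_tag, tag) + emit
                    if tag not in best[end] or score > best[end][tag][0]:
                        best[end][tag] = (score, (start, prev_tag))

    if not best[n]:
        return None

    # Pick the best final tag, then follow back-pointers to recover units.
    tag = max(best[n], key=lambda t: best[n][t][0])
    total = best[n][tag][0]
    units, end = [], n
    while end > 0:
        _, (start, prev_tag) = best[end][tag]
        units.append((words[start:end], tag))
        end, tag = start, prev_tag
    units.reverse()
    return total, units


# Toy scores, invented for illustration only.
TOY_SCORES = {
    (("yes",), "ACCEPT"): -0.5,
    (("i", "can"), "INFORM"): -1.0,
    (("yes", "i", "can"), "ACCEPT"): -2.0,
    (("thank", "you"), "THANK"): -0.2,
}


def toy_unit_logprob(unit, tag):
    return TOY_SCORES.get((unit, tag), -math.inf)


def toy_trans_logprob(prev_tag, tag):
    return 0.0  # assumption: tag transitions are ignored in this toy model


total, units = joint_segment_and_tag(
    ["yes", "i", "can", "thank", "you"],
    ["ACCEPT", "INFORM", "THANK"],
    toy_unit_logprob,
    toy_trans_logprob,
)
print(total, units)
```

In this sketch every (start, end, tag, previous tag) combination is scored, so the work grows quadratically with the number of words and with the number of tags. Extending it to a word graph would mean iterating over graph nodes and edges in place of word positions.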
    <Paragraph position="7"> The results of our tagging experiments on both Japanese and English corpora showed high performance for Japanese but considerably lower performance for English. This paper also reports on the use of speech act type tags for translating Japanese and English positive response expressions. Positive responses appear quite often in task-oriented dialogues like those in our tasks.</Paragraph>
    <Paragraph position="8"> Positive response expressions are often highly ambiguous and problematic in speech translation. We will show that these expressions can be effectively translated with the help of dialogue information, which we call speech act type tags.</Paragraph>
  </Section>
</Paper>