File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/01/h01-1001_intro.xml
Size: 7,038 bytes
Last Modified: 2025-10-06 14:01:04
<?xml version="1.0" standalone="yes"?> <Paper uid="H01-1001"> <Title>Activity detection for information access to oral communication</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> Information access to oral communication is becoming an interesting research area now that recording, storing and transmitting large amounts of audio (and video) data is feasible. While written information is often available electronically (especially since it is typically entered on computers), oral communication is usually documented only by constructing a new document in written form, such as a transcript (court proceedings) or minutes (meetings). Oral communications are therefore a large untapped resource, especially if no corresponding written documents are available and the cost of documentation using traditional techniques is considered high: tutorial introductions by a senior staff member might be worthwhile for many newcomers to attend, office meetings may contain information relevant for others and should be reproducible, and informal and formal group meetings may be interesting but not fully documented. In essence, the written form is already a reinterpretation of the original rejoinder. Such reinterpretations are used to extract and condense information, add or delete information, change the meaning, cite the rejoinder, and relate rejoinders to each other. Reinterpretation is a time-consuming, expensive and optional step, and written documentation combines the reinterpretation and documentation steps in one (see footnote 1). If, however, reinterpretation is unnecessary or unwanted, a system that produces audiovisual records is superior. If reinterpretation is wanted or needed, a system using audiovisual records may be used to improve the reinterpretation by adding all audiovisual data and the option to go back to the unaltered original. 
Whether or not reinterpretation is done, it is crucial to be able to navigate effectively within an audiovisual document and to find a specific document.</Paragraph> <Paragraph position="1"> Footnote 1: The most important exception is the literal courtroom transcript; however, one could argue that even transcripts are reinterpretations, since they omit some of the information present in the audio channel such as emotions, hesitations, the use of slang and certain types of heteroglossia, accents and so forth. This is specifically true if transcription machines are used which restrict the transcriber to standard orthography.</Paragraph> <Paragraph position="2"> [Figure labels: Willie is sick; Need new coding scheme; Personal stuff; Setting up the new hard drive; Data collection, last Wednesday; Dialogue detection, with Hans; Language modeling tutorial for Tim] [...] communications take place in very different formats, and the first step in the search is to determine the database (or sub-database) of the rejoinder. The next step is to find the specific rejoinder. Since rejoinders can be very long, the rejoinder has to be segmented and a segment has to be selected.</Paragraph> <Paragraph position="3"> While keywords are commonly used in information access to written information, the use of other indices such as style is still uncommon (but see Kessler et al. (1997); van Bretan et al. (1998)). Oral communication is richer than written communication since it is an interactive real-time accomplishment between participants, may involve speech gestures such as the display of emotion, and is situated in space and time. Bahktin (1986) characterizes a conversation by topic, situation and style. Information access to oral communication can therefore make use of indices that pertain to the oral nature of the discourse (Fig. 2). 
Indices other than topic (represented by keywords) increase in importance since browsing audio documents is cumbersome, which makes the common interactive retrieval strategy &quot;query, browse, reformulate&quot; less effective. Finally, the topic may not be known at all or may not be that relevant for the query formulation, for example if one just wants to be reminded of what was discussed the last time a person was met. Activities are suggested as an alternative index; they describe the type of interaction. It is common to use &quot;action-verbs&quot; such as story-telling, discussing, planning, informing, etc. to describe activities (see footnote 2). Items similar to activities have been shown to be directly retrievable from autobiographic memory (Herrmann, 1993) and are therefore indices that are available to participants of the conversation. Other indices may be very effective but not available to participants: the frequency of the word &quot;I&quot; in the conversation, the histogram of word lengths, or the histogram of pitch per participant.</Paragraph> <Paragraph position="4"> Fig. 1 introduces the information access hierarchy, which allows the problem of information access to oral communication to be understood at different levels. Footnote 2: The definition of activities such as planning may vary vastly across general dialogue genres; compare, for example, a military combat situation with a mother-child interaction.</Paragraph> <Paragraph position="5"> However, it is often possible to develop activities and dialogue typologies for a specific dialogue genre. The related problem of general typologies of dialogues is still far from being settled and action-verbs are just one potential catego- Bahktin (1986) describes a discourse along the three major properties: style, situation and topic. 
Current information retrieval systems focus on the topical aspect, which might be crucial in written documents.</Paragraph> <Paragraph position="6"> Furthermore, since thorough text analysis is still a hard problem, information retrieval has mostly used keywords to characterize topic. Many features that could be extracted are therefore ignored in a traditional keyword-based approach.</Paragraph> <Paragraph position="7"> In Ries (1999) we have shown that the detection of general dialogue genre (database level in Fig. 1) can be done with high accuracy if a number of different example types have been annotated; in Ries et al. (2000) we have shown that it is hard but not impossible to distinguish activities in personal phone calls (segment level in Fig. 1). In this paper we address activities in meetings and other types of dialogues and show that these activities can be distinguished using certain features and a neural-network-based classifier (Sec. 2, segment level in Fig. 1). The concept of information retrieval assessment using information-theoretic measures is applied to this task (Sec. 3). Additionally, we introduce a level somewhat below the database level in Fig. 1 that we call &quot;sub-genre&quot;, and we have collected a large database of TV shows that are automatically classified by show type (Sec. 4). We also explore whether other indices similar to activities could be used, and we present results on emotions in meetings (Sec. 5).</Paragraph> </Section> </Paper>