<?xml version="1.0" standalone="yes"?> <Paper uid="C94-2159"> <Title>PAUSE AS A PHRASE DEMARCATOR FOR SPEECH AND LANGUAGE PROCESSING</Title> <Section position="4" start_page="987" end_page="988" type="metho"> <SectionTitle> 2 ANALYSIS OF SPONTANEOUS DIALOGUES </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="987" end_page="987" type="sub_section"> <SectionTitle> 2.1 Spontaneous Dialogue Data </SectionTitle> <Paragraph position="0"> As sources of spontaneous data, we use four Japanese dialogues concerning directions from Kyoto station to either a conference center or a hotel, collected in the Environment for Multi-Modal Interaction [10].</Paragraph> <Paragraph position="1"> Speaker A is trained in advance to give the directions, mentioning possible means of transportation, locations and so forth. Two subjects seeking directions, Speaker B and Speaker C, are given some keywords, such as the name and the date of the conference. They communicate either over a telephone connection only or through a multimodal setup that adds on-screen graphics and video. Table 1 shows how many words are used in the dialogues studied: the corpus consists of 3541 words in total and contains 440 different words. It has 403 turn-takings, and thus roughly 403 sentences.</Paragraph> <Paragraph position="2"> In the multimodal setup, speakers use deictic expressions such as koko and kore, meaning &quot;here&quot; and &quot;this,&quot; respectively. These dialogues also lasted longer than those in the telephone-only setup. However, we did not find any further distinct differences between the two setups, so we analyse all of the dialogues in the same way.</Paragraph> <Paragraph position="3"> For our study, transcripts of the spontaneous dialogues have been prepared; these contain morphological tags and turn-taking information. 
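The counts reported for Table 1 (3541 running words, 440 distinct words, 403 turns) are ordinary token, type and turn counts. The Python sketch below is only an illustration: it assumes a hypothetical transcript format in which each turn is a list of word tokens, and the helper names (CorpusStats, corpus_stats) are ours. Applied to the reported figures, it gives roughly 8.8 words per turn and a type/token ratio of about 0.12.

```python
from dataclasses import dataclass


@dataclass
class CorpusStats:
    tokens: int  # running words
    types: int   # distinct word forms
    turns: int   # turn-takings, used here as a proxy for sentences

    @property
    def words_per_turn(self) -> float:
        return self.tokens / self.turns

    @property
    def type_token_ratio(self) -> float:
        return self.types / self.tokens


def corpus_stats(turns: list[list[str]]) -> CorpusStats:
    """Count tokens, types and turns; each turn is a list of word tokens (hypothetical format)."""
    tokens = [word for turn in turns for word in turn]
    return CorpusStats(tokens=len(tokens), types=len(set(tokens)), turns=len(turns))


# Figures reported for the four dialogues
reported = CorpusStats(tokens=3541, types=440, turns=403)
print(round(reported.words_per_turn, 2))    # 8.79
print(round(reported.type_token_ratio, 3))  # 0.124

# Toy transcript with three turns (romanized tokens, for illustration only)
toy = corpus_stats([["koko", "desu"], ["hai"], ["koko", "kara"]])
print(toy)  # CorpusStats(tokens=5, types=4, turns=3)
```

Treating each turn as one sentence, as the text does, makes words_per_turn a rough average sentence length.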
Pause information within turns, i.e., breaths or silences longer than 400 milliseconds, is provided as well.</Paragraph> </Section> <Section position="2" start_page="987" end_page="988" type="sub_section"> <SectionTitle> 2.2 Pause as a Phrase Demarcator </SectionTitle> <Paragraph position="0"> In Table 2 we illustrate the adequacy of the interpausal phrase as a processing unit with a series of directions to Kyoto station's Karasumachou exit. The entire explanation consists of three turns separated by short response syllables, such as hai, that do not overlap the explanation; that is, the speaker paused during these responses. We marked the end of each turn with TURN. As primary demarcators we used pauses and turns, so either PAUSE or TURN appears in the second column. Further demarcator candidates, namely the filled pauses anoo and eeto, the emphasis marker desune, and the response syllable hai when it overlaps the explanation, appear in the third column as FILLED PAUSE, DESUNE and RESPONSE, respectively.</Paragraph> <Paragraph position="1"> A rough translation follows each interpausal phrase (demarcator | further candidate | translation):
PAUSE | FILLED PAUSE | if it is from here
PAUSE | - | this side
PAUSE | RESPONSE | you go up the stairs
TURN | - | you cross here all the way
PAUSE | - | and
- | RESPONSE | -
PAUSE | - | when you see the next stairs, this one, turn left, first
PAUSE | DESUNE | at this place like a crossroad which appears
TURN | - | turn right
PAUSE | - | and you turn right
PAUSE | RESPONSE | and then if you go down the stairs here you come out of the Karasumachou exit
The length of the processing unit plays an important role in speech recognition. Table 2 shows that alternative demarcator candidates such as FILLED PAUSE and RESPONSE usually cooccur with pauses. 
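The PAUSE/TURN scheme and the co-occurrence check can be sketched as follows. This is our illustration, not the authors' tooling: it assumes a hypothetical word record carrying the silence that follows each word (in milliseconds) and a turn-end flag, cuts a phrase at every silence longer than 400 ms and at every turn end, and tallies which further candidates coincide with such a boundary. The rows in TABLE2_ROWS are our reading of the demarcator columns of Table 2 and omit the Japanese text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

PAUSE_MS = 400  # silences longer than this are marked PAUSE


@dataclass
class Word:
    text: str
    silence_after_ms: int = 0  # hypothetical: silence following the word, in ms
    turn_end: bool = False     # True for the last word of a turn


def interpausal_phrases(words: List[Word]) -> List[Tuple[List[str], Optional[str]]]:
    """Split words into phrases at turn ends (TURN) and pauses (PAUSE)."""
    phrases, current = [], []
    for w in words:
        current.append(w.text)
        if w.turn_end:
            phrases.append((current, "TURN"))
            current = []
        elif w.silence_after_ms > PAUSE_MS:
            phrases.append((current, "PAUSE"))
            current = []
    if current:  # trailing words with no closing boundary
        phrases.append((current, None))
    return phrases


# (primary demarcator, further candidate) for each row of Table 2, as we read it;
# None marks an empty cell.
TABLE2_ROWS = [
    ("PAUSE", "FILLED PAUSE"), ("PAUSE", None), ("PAUSE", "RESPONSE"),
    ("TURN", None), ("PAUSE", None), (None, "RESPONSE"), ("PAUSE", None),
    ("PAUSE", "DESUNE"), ("TURN", None), ("PAUSE", None), ("PAUSE", "RESPONSE"),
]


def cooccurrence(rows):
    """Count each candidate separately for rows with and without a PAUSE/TURN."""
    with_boundary, without_boundary = Counter(), Counter()
    for primary, candidate in rows:
        if candidate is None:
            continue
        target = with_boundary if primary is not None else without_boundary
        target[candidate] += 1
    return with_boundary, without_boundary


words = [
    Word("koko"), Word("kara", 500),
    Word("kaidan", 400), Word("o"), Word("agatte", turn_end=True),
]
print(interpausal_phrases(words))
# [(['koko', 'kara'], 'PAUSE'), (['kaidan', 'o', 'agatte'], 'TURN')]

with_b, without_b = cooccurrence(TABLE2_ROWS)
print(dict(with_b))     # {'FILLED PAUSE': 1, 'RESPONSE': 2, 'DESUNE': 1}
print(dict(without_b))  # {'RESPONSE': 1}
```

A silence of exactly 400 ms does not split a phrase, following the "longer than 400 milliseconds" criterion; on these rows the tally finds a single RESPONSE without an accompanying pause.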
In Table 2, for example, we find only one case where RESPONSE does not cooccur with a pause. Consequently, the segments within turns bounded by these alternative markers would not differ much from those bounded by pauses; in particular, they would not be much shorter or longer. Thus, at least where length is concerned, the combination of PAUSE and TURN seems appropriate and sufficient to mark out phrases. With respect to language processing, Table 2 shows that interpausal phrases are often adequate as translation units, which suggests that such phrases often function as meaning units.</Paragraph> <Paragraph position="2"> Interpausal phrases typically end with a conjunctive postposition, such as ga or keredomo; a postpositional phrase; an interjection, such as hai or moshimoshi; the genitive postposition no for adnominals; an adnominal conjugation form; a coordinate conjugation form; auxiliaries with a sentence-final conjugation form; or a sentence-final particle, such as ka.</Paragraph> </Section> <Section position="3" start_page="988" end_page="988" type="sub_section"> <SectionTitle> 2.3 Features of Spontaneous Dialogues </SectionTitle> <Paragraph position="0"> We studied ten features of spontaneous dialogues that are not considered in grammars for well-formed sentences [6][11]. Table 3 shows the features and their frequencies. In Ex. 2 Speaker B did not finish what he wanted to say, but Speaker A understood his intention and interrupted his utterance, which is therefore fragmentary. He continued but, before he could finish ... We expected a very high frequency of the filled pauses anoo and eeto functioning as discourse managers [12]. However, Table 3 shows only a modest frequency. Phonological variants of anoo and eeto, such as ano, were not counted. 
This may be why the frequency of these expressions is unexpectedly low.</Paragraph> <Paragraph position="1"> Some features shown in Table 3 are discussed in the example sets below. Features in focus are in bold type: Ex. 1 sochira no desune noriba kara basu ga desune demasu, &quot;there is a bus from that bus stop.&quot; The person giving directions often uses the expression desune. The use of desune emphasizes the preceding utterance, typically the immediately preceding minimal phrase. In Ex. 1 the first use emphasizes sochira no and the second stresses basu ga.</Paragraph> <Paragraph position="2"> We denote the person giving the directions as Speaker A and the person seeking the information as Speaker B in Examples 2 and 3.</Paragraph> <Paragraph position="3"> ... completed his utterance, Speaker B interrupted with the station name. Speaker A did not continue his first utterance and agreed with Speaker B. Speaker A's first utterance is a nominal phrase, which is never completed.</Paragraph> <Paragraph position="4"> 3 APPLICATION OF THE</Paragraph> </Section> </Section> </Paper>