File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/00/c00-2140_intro.xml
Size: 4,460 bytes
Last Modified: 2025-10-06 14:00:52
<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2140"> <Title>DIASUMM: Flexible Summarization of Spontaneous Dialogues in Unrestricted Domains</Title> <Section position="4" start_page="0" end_page="968" type="intro"> <SectionTitle> 2 Issues and Approaches: Overview </SectionTitle> <Paragraph position="0"> In this section, we give an overview of the main issues that any summarization system for spoken dialogues has to address, and we indicate the approach we take to each of them in DIASUMM.</Paragraph> <Paragraph position="1"> In general, written texts offer plenty of information that can be used for summarization, such as capitalization, punctuation marks, titles, passage headers, paragraph boundaries, or other markup. Unfortunately, none of this is available for speech data, which arrives as a stream of word tokens from a recognizer and is cut into &quot;utterances&quot; by a silence heuristic. 2.1 Lack of clause boundaries. One of the most serious issues is the lack of sentence or clause boundaries in spoken dialogues. This is particularly problematic because sentences, clauses, or paragraphs are considered the &quot;minimal units&quot; in virtually all existing summarization systems. When humans speak, they sometimes pause in the middle of a clause and not always at its end. As a result, the output of a recognizer (which usually cuts segments using silence heuristics) frequently does not match logical sentence or clause boundaries. Looking at five English CALLHOME dialogues with an average of 320 utterances each, we find on average 30 such &quot;continuations&quot; of logical clauses across automatically determined acoustic segment boundaries. In a summary, this can cause a
reduction in coherence and readability of the output.</Paragraph> <Paragraph position="2"> We address this issue by linking adjacent turns of the same speaker if the silence between them is less than a given constant (section d).</Paragraph> <Section position="1" start_page="968" end_page="968" type="sub_section"> <SectionTitle> 2.2 Distributed information </SectionTitle> <Paragraph position="0"> Since we deal with multi-party conversations rather than monological texts, the crucial information is sometimes found in a question-answer pair. In other words, it involves more than one speaker, and extracting only the question or only the answer would be meaningless in many cases. In five examined English CALLHOME dialogues, we found that on average about 10% of the speaker turns belong to such question-answer pairs. Often, either the question or the answer is very short and does not contain any words with high relevance. So that these short turns are not &quot;lost&quot; at a later stage, when only the most relevant turns are extracted, we link them to the matching question or answer ahead of time, using two different methods to detect questions and their answers (section 4).</Paragraph> </Section> <Section position="2" start_page="968" end_page="968" type="sub_section"> <SectionTitle> 2.3 Disfluent speech </SectionTitle> <Paragraph position="0"> Speech disfluencies in spontaneous conversations, such as fillers, repetitions, repairs, or unfinished clauses, can make transcripts (and summary extracts) quite hard to read. They also introduce an unwanted bias into relevance computations. For example, word repetitions inflate the counts of the repeated content words, and words in unfinished clauses would be included in the word count. To alleviate this problem, we employ a clean-up filter pipeline, which eliminates filler words and repetitions and segments the turns into short clauses (section 5).
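The Python snippet below is a rough illustration of the filler and repetition removal described above. It shows a toy clean-up filter only, not DIASUMM's actual filter. The filler inventory (FILLERS, MULTIWORD_FILLERS), the maximum repeated n-gram length max_n, and the helper names remove_fillers, remove_repetitions, and clean_turn are all hypothetical, and the snippet does not attempt clause segmentation.

```python
# Toy clean-up filter: a sketch, not DIASUMM's implementation.
# ASSUMPTION: the filler lists below are hypothetical examples.
FILLERS = {"uh", "um", "hm", "uh-huh"}
MULTIWORD_FILLERS = {("you", "know"), ("i", "mean")}


def remove_fillers(tokens):
    """Drop single-word and two-word fillers (case-insensitive)."""
    out = []
    i = 0
    while len(tokens) > i:
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if pair in MULTIWORD_FILLERS:
            i += 2  # skip both words of the filler
        elif tokens[i].lower() in FILLERS:
            i += 1
        else:
            out.append(tokens[i])
            i += 1
    return out


def remove_repetitions(tokens, max_n=3):
    """Collapse immediately repeated n-grams (n = max_n down to 1)."""
    out = []
    i = 0
    while len(tokens) > i:
        for n in range(max_n, 0, -1):
            first = [t.lower() for t in tokens[i:i + n]]
            second = [t.lower() for t in tokens[i + n:i + 2 * n]]
            if len(first) == n and first == second:
                i += n  # drop the first copy and re-check from the second copy
                break
        else:
            # no repetition starts here: keep the token
            out.append(tokens[i])
            i += 1
    return out


def clean_turn(text):
    """Whitespace-tokenize a turn, then remove fillers and repetitions."""
    return " ".join(remove_repetitions(remove_fillers(text.split())))


if __name__ == "__main__":
    print(clean_turn("uh we we went to the the store you know"))
    # prints: we went to the store
```

Collapsing adjacent repeated n-grams this naively will also merge legitimate repetitions such as "that that" in "I know that that is true". Treating "you know" as a filler will likewise misfire on genuine questions. A real filter would need more context than this sketch uses.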
We also remove incomplete clauses, typically sentence-initial repairs, at this stage of our system. This &quot;cleaning-up&quot; serves two main purposes: (i) it increases the readability of the finally extracted segments; and (ii) it makes the text more tractable for subsequent modules.</Paragraph> <Paragraph position="1"> The following example compares a turn before and after the clean-up component: before: I MEAN WE LOSE WE LOSE I CAN'T I</Paragraph> </Section> </Section></Paper>