<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2140">
  <Title>DIASUMM: Flexible Summarization of Spontaneous Dialogues in Unrestricted Domains</Title>
  <Section position="4" start_page="0" end_page="968" type="intro">
    <SectionTitle>
2 Issues and Approaches: Overview
</SectionTitle>
    <Paragraph position="0"> In this section, we give an overview of the main issues that any summarization system for spoken dialogues has to address and indicate the approach we are taking for each of these in DIASUMM.</Paragraph>
    <Paragraph position="1"> In a general sense, when dealing with written texts, usually there is plenty of information available which can be used for the purpose of summarization, such as capitalization, punctuation marks, titles, passage headers, paragraph boundaries, or other mark-ups. Unfortunately, however, none of this holds for speech data which arrives as a stream of word tokens from a recognizer, cut into &quot;utterances&quot; by using a silence heuristic. 2.1 Lack of clause boundaries. One of the most serious issues is the lack of sentence or clause boundaries in spoken dialogues, which is particularly problematic since sentences, clauses, or paragraphs are considered the &quot;minimal units&quot; in virtually all existing summarization systems. When humans speak, they sometimes pause during a clause, and not always at the end of a clause, which means that the output of a recognizer (which usually uses some silence heuristics to cut the segments) frequently does not match logical sentence or clause boundaries. Looking at five English CALLHOME dialogues with an average number of 320 utterances each, we find on average 30 such &quot;continuations&quot; of logical clauses over automatically determined acoustic segment boundaries. In a summary, this can cause a reduction in coherence and readability of the output.</Paragraph>
    <Paragraph position="2"> We address this issue by linking adjacent turns of the same speaker together if the silence between them is less than a given constant (section 4).</Paragraph>
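    <Paragraph position="3"> The turn-linking step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DIASUMM implementation: the `Turn` record, the `link_turns` function, and the default `max_gap` of 1.0 seconds are hypothetical; the text only states that a given constant is used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """Hypothetical record for one recognizer segment."""
    speaker: str
    start: float  # start time in seconds
    end: float    # end time in seconds
    words: List[str]


def link_turns(turns: List[Turn], max_gap: float = 1.0) -> List[Turn]:
    """Merge adjacent same-speaker turns separated by a short silence.

    max_gap is an assumed threshold in seconds, chosen for illustration.
    """
    linked: List[Turn] = []
    for turn in turns:
        prev = linked[-1] if linked else None
        if (
            prev is not None
            and prev.speaker == turn.speaker
            and max_gap > turn.start - prev.end
        ):
            # Short pause, same speaker: treat as one continued turn.
            linked[-1] = Turn(prev.speaker, prev.start, turn.end,
                              prev.words + turn.words)
        else:
            linked.append(Turn(turn.speaker, turn.start, turn.end,
                               list(turn.words)))
    return linked
```

    With a threshold of 0.5 seconds, two consecutive turns by the same speaker separated by a 0.3-second pause would be merged into one, while an intervening turn by another speaker keeps them apart.</Paragraph>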
    <Section position="1" start_page="968" end_page="968" type="sub_section">
      <SectionTitle>
2.2 Distributed information
</SectionTitle>
      <Paragraph position="0"> Since we have multi-party conversations as opposed to monological texts, sometimes the crucial information is found in a question-answer pair, i.e., it involves more than one speaker; extracting only the question or only the answer would be meaningless in many cases. We found that on average about 10% of the speaker turns belong to such question-answer pairs in five examined English CALLHOME dialogues. Often, either the question or the answer is very short and does not contain any words with high relevance. In order not to &quot;lose&quot; these short turns at a later stage, when only the most relevant turns are extracted, we link them to the matching question/answer ahead of time, using two different methods to detect questions and their answers (section 4).</Paragraph>
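      <Paragraph position="1"> A minimal sketch of how such links could protect short turns during extraction, assuming the question-answer pairs have already been detected. The function `extract_with_links`, the per-turn relevance `scores`, and the `qa_links` mapping (question index to answer index) are hypothetical names introduced for illustration; the detection methods themselves are not shown.

```python
from typing import Dict, List


def extract_with_links(scores: List[float],
                       qa_links: Dict[int, int],
                       top_k: int) -> List[int]:
    """Pick the top_k most relevant turns, then add linked Q/A partners.

    scores[i] is an assumed relevance score for turn i; qa_links maps the
    index of a question turn to the index of its answer turn.
    Returns the selected turn indices in dialogue order.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:top_k])
    # If either half of a pair was selected, keep both halves.
    for question, answer in qa_links.items():
        if question in selected or answer in selected:
            selected.update((question, answer))
    return sorted(selected)
```

      In this sketch a short, low-scoring answer is kept whenever its question is extracted, and vice versa, so the extract can contain more than `top_k` turns.</Paragraph>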
    </Section>
    <Section position="2" start_page="968" end_page="968" type="sub_section">
      <SectionTitle>
2.3 Disfluent speech
</SectionTitle>
      <Paragraph position="0"> Speech disfluencies in spontaneous conversations, such as fillers, repetitions, repairs, or unfinished clauses, can make transcripts (and summary extracts) quite hard to read and also introduce an unwanted bias to relevance computations (e.g., word repetitions would cause a higher word count for the repeated content words; words in unfinished clauses would be included in the word count). To alleviate this problem, we employ a clean-up filter pipeline, which eliminates filler words and repetitions, and segments the turns into short clauses (section 5). We also remove incomplete clauses, typically sentence-initial repairs, at this stage of our system. This &quot;cleaning-up&quot; serves two main purposes: (i) it increases the readability (for the finally extracted segments); and (ii) it makes the text more tractable by subsequent modules.</Paragraph>
      <Paragraph position="1"> The following example compares a turn before and after the clean-up component: before: I MEAN WE LOSE WE LOSE I CAN'T I</Paragraph>
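      <Paragraph position="2"> The filler and repetition removal steps of such a clean-up pipeline might look roughly like the following sketch. The filler list, the function names, and the `max_len` limit on repeated sequences are assumptions made for illustration; clause segmentation and the removal of incomplete clauses are not covered.

```python
from typing import List, Sequence, Tuple

# Hypothetical filler inventory; the actual list used by the filter is not given.
FILLERS: Tuple[Tuple[str, ...], ...] = (
    ("i", "mean"), ("you", "know"), ("uh",), ("um",),
)


def remove_fillers(words: Sequence[str]) -> List[str]:
    """Drop every occurrence of a filler word or filler phrase."""
    out: List[str] = []
    i = 0
    while len(words) > i:
        for filler in FILLERS:
            window = tuple(w.lower() for w in words[i:i + len(filler)])
            if window == filler:
                i += len(filler)  # skip the whole filler phrase
                break
        else:
            # No filler starts at position i: keep the word.
            out.append(words[i])
            i += 1
    return out


def remove_repetitions(words: Sequence[str], max_len: int = 3) -> List[str]:
    """Collapse immediately repeated sequences of up to max_len words."""
    out: List[str] = []
    for word in words:
        out.append(word)
        for n in range(max_len, 0, -1):
            # Does the last n-word sequence repeat the n words before it?
            if len(out) >= 2 * n and out[-n:] == out[-2 * n:-n]:
                del out[-n:]
                break
    return out


def clean_up(turn: str) -> str:
    """Apply filler removal, then repetition removal, to one turn."""
    return " ".join(remove_repetitions(remove_fillers(turn.split())))
```

      Running the two steps in this order means a repetition interrupted only by a filler is also collapsed; a real filter would need a more careful treatment of repairs.</Paragraph>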
    </Section>
  </Section>
</Paper>