<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2050">
  <Title>Comparing the roles of textual, acoustic and spoken-language features on spontaneous-conversation summarization</Title>
  <Section position="5" start_page="197" end_page="197" type="metho">
    <SectionTitle>
3 Classification-based utterance extraction
</SectionTitle>
    <Paragraph position="0"> Spontaneous conversations contain more information than textual features alone can capture. To utilize these additional features, we reformulate the utterance selection task as a binary classification problem: each utterance is labeled either &amp;quot;1&amp;quot; (in-summary) or &amp;quot;0&amp;quot; (not-in-summary). Two state-of-the-art classifiers are used: the support vector machine (SVM) and logistic regression (LR). SVM seeks the optimal separating hyperplane, i.e., the one with maximal margin; in our experiments, we use the OSU-SVM package. LR is a softmax linear regression, which models the posterior probabilities of the class labels with the softmax of linear functions of the feature vectors. For the binary classification required in our experiments, the model takes a simple form.</Paragraph>
    <Section position="1" start_page="197" end_page="197" type="sub_section">
      <SectionTitle>
3.1 Features
</SectionTitle>
      <Paragraph position="0"> The features explored in this paper include:  (1) MMR score: the score calculated with MMR (Zechner, 2001) for each utterance. (2) Lexicon features: number of named entities, and utterance length (number of words). The number of named entities includes: person-name number, location-name number, organization-name number, and the total number. Named entities are annotated automatically with a dictionary.</Paragraph>
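The lexicon features above can be sketched as a dictionary lookup; `NE_DICT` here is a hypothetical stand-in for the paper's named-entity dictionary:

```python
# Hypothetical named-entity dictionary (the paper's actual dictionary
# is not specified); maps lowercased word -> entity type.
NE_DICT = {
    "alice": "person",
    "boston": "location",
    "acme": "organization",
}

def lexicon_features(utterance):
    """Utterance length plus per-type and total named-entity counts."""
    words = utterance.lower().split()
    counts = {"person": 0, "location": 0, "organization": 0}
    for w in words:
        tag = NE_DICT.get(w)
        if tag:
            counts[tag] += 1
    return {
        "length": len(words),
        "person": counts["person"],
        "location": counts["location"],
        "organization": counts["organization"],
        "ne_total": sum(counts.values()),
    }

feats = lexicon_features("Alice flew to Boston for Acme")
```

The MMR score of feature (1) would be computed separately, as in Zechner (2001), and concatenated with these counts into the utterance's feature vector.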
      <Paragraph position="1"> (3) Structural features: a value is assigned to indicate whether a given utterance is in the first, middle, or last one-third of the conversation. Another Boolean value is assigned to indicate whether this utterance is adjacent to a speaker turn or not.</Paragraph>
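Both structural features follow from the utterance's index and the speaker sequence; a sketch (speaker labels are illustrative):

```python
def structural_features(index, n_utterances, speakers):
    """speakers[i] is the speaker of utterance i.

    Returns which third of the conversation the utterance is in
    (0 = first, 1 = middle, 2 = last) and whether it borders a
    speaker turn on either side.
    """
    third = (3 * index) // n_utterances
    prev_turn = index > 0 and speakers[index] != speakers[index - 1]
    next_turn = index + 1 != n_utterances and speakers[index] != speakers[index + 1]
    return {"third": third, "turn_adjacent": prev_turn or next_turn}

speakers = ["A", "A", "B", "B", "B", "A"]
f = structural_features(1, len(speakers), speakers)
```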
      <Paragraph position="2">  (4) Prosodic features: we use basic prosody: the maximum, minimum, average and range of energy, as well as those of fundamental frequency, normalized by speakers. All these features are automatically extracted.</Paragraph>
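A sketch of the prosodic statistics, z-normalized by per-speaker mean and standard deviation (the exact normalization scheme is an assumption; the paper only says the values are normalized by speaker):

```python
import statistics

def prosodic_features(values, speaker_mean, speaker_std):
    """values: frame-level energy or F0 samples for one utterance,
    normalized against the given per-speaker statistics."""
    norm = [(v - speaker_mean) / speaker_std for v in values]
    return {
        "max": max(norm),
        "min": min(norm),
        "mean": statistics.mean(norm),
        "range": max(norm) - min(norm),
    }

# Toy F0 samples for one utterance, with assumed speaker statistics.
f0_feats = prosodic_features([110.0, 120.0, 130.0],
                             speaker_mean=120.0, speaker_std=10.0)
```

The same four statistics are computed twice per utterance, once over energy and once over fundamental frequency, giving eight prosodic features in total.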
      <Paragraph position="3"> (5) Spoken-language features: the spoken-language features include the number of repetitions, the number of filled pauses, and their total.</Paragraph>
      <Paragraph position="4"> Disfluencies adjacent to a speaker turn are not counted, because they are normally used to coordinate interaction among speakers.</Paragraph>
      <Paragraph position="5"> Repetitions and pauses are detected in the same way as described in Zechner (2001).</Paragraph>
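A simplified sketch of these counts; the filler list and the immediate-repetition test are illustrative stand-ins for the detectors of Zechner (2001), and turn-adjacency is handled at the utterance level here rather than per disfluency:

```python
# Hypothetical filled-pause inventory (a simplification; Zechner (2001)
# uses trained detectors rather than a fixed word list).
FILLERS = {"uh", "um", "eh"}

def spoken_language_features(words, turn_adjacent=False):
    """Count immediate word repetitions and filled pauses.

    Disfluencies at a speaker turn coordinate interaction between
    speakers, so turn-adjacent utterances contribute no counts.
    """
    if turn_adjacent:
        return {"repetitions": 0, "filled_pauses": 0, "total": 0}
    pauses = sum(1 for w in words if w in FILLERS)
    reps = sum(1 for a, b in zip(words, words[1:])
               if a == b and a not in FILLERS)
    return {"repetitions": reps, "filled_pauses": pauses,
            "total": reps + pauses}

f = spoken_language_features("i i think um it works".split())
```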
    </Section>
  </Section>
</Paper>