<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1071">
<Title>Discourse Segmentation of Multi-Party Conversation</Title>
<Section position="3" start_page="0" end_page="0" type="relat">
<SectionTitle>
2 Related Work
</SectionTitle>
<Paragraph position="0"> Existing approaches to text segmentation fall broadly into two categories. On the one hand, many algorithms exploit the fact that topic segments tend to be lexically cohesive. Embodiments of this idea include semantic similarity (Morris and Hirst, 1991; Kozima, 1993), cosine similarity in word vector space (Hearst, 1994), inter-sentence similarity matrices (Reynar, 1994; Choi, 2000), entity repetition (Kan et al., 1998), word frequency models (Reynar, 1999), and adaptive language models (Beeferman et al., 1999). On the other hand, some algorithms exploit a variety of linguistic features that may mark topic boundaries, such as referential noun phrases (Passonneau and Litman, 1997).</Paragraph>
<Paragraph position="1"> In work on segmentation of spoken documents, intonational, prosodic, and acoustic indicators are used to detect topic boundaries (Grosz and Hirschberg, 1992; Nakatani et al., 1995; Hirschberg and Nakatani, 1996; Passonneau and Litman, 1997; Hirschberg and Nakatani, 1998; Beeferman et al., 1999; Tür et al., 2001). Such indicators include long pauses, shifts in speaking rate, a greater range in F0 and intensity, and higher maximum accent peaks.</Paragraph>
<Paragraph position="2"> These approaches use different learning mechanisms to combine features, including decision trees (Grosz and Hirschberg, 1992; Passonneau and Litman, 1997; Tür et al., 2001), exponential models (Beeferman et al., 1999), and other probabilistic models (Hajime et al., 1998; Reynar, 1999).</Paragraph>
</Section>
</Paper>