<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0304">
<Title>Text Segmentation Using Exponential Models*</Title>
<Section position="11" start_page="45" end_page="45" type="concl">
<SectionTitle> 8 Conclusions </SectionTitle>
<Paragraph position="0"> We have presented and evaluated a new statistical model for segmenting unpartitioned text into coherent fragments. The model draws on both long- and short-range language models, as well as on automatic feature induction techniques. In this work we rely exclusively on simple lexical features: a topicality measure called relevance, and a set of vocabulary features induced automatically from a large space of candidate features.</Paragraph>
<Paragraph position="1"> We have also proposed a new, probabilistically motivated error metric for assessing segmentation algorithms. Both qualitative assessment and evaluation with this new metric demonstrate the effectiveness of our algorithm in two very different domains: Wall Street Journal articles and broadcast news transcripts.</Paragraph>
<Paragraph position="2"> The immediate use of this model will be in Informedia, a video-on-demand application (Christel et al., 1995). We intend to combine simple audio and video features, such as statistics on pauses, black frames, and color histograms, with our lexical features in order to segment news broadcasts into their component stories. Other applications not explored in this paper include automatic inference of subtopic structure for information retrieval, document summarization, and improved language modeling.</Paragraph>
</Section>
</Paper>