File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/06/p06-1003_concl.xml
Size: 1,913 bytes
Last Modified: 2025-10-06 13:55:13
<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1003"> <Title>Unsupervised Topic Modelling for Multi-Party Spoken Discourse</Title> <Section position="7" start_page="22" end_page="23" type="concl"> <SectionTitle> 5 Summary and Future Work </SectionTitle> <Paragraph position="0"> We have presented an unsupervised generative model which allows topic segmentation and identification from unlabelled data. Performance on the ICSI corpus of multi-party meetings is comparable with the previous unsupervised segmentation results, and the extracted topics are rated well by human judges. Segmentation accuracy is robust in the face of noise, both in the form of off-topic discussion and speech recognition hypotheses.</Paragraph> <Paragraph position="1"> Future Work: Spoken discourse exhibits several features not derived from the words themselves but which seem intuitively useful for segmentation, e.g. speaker changes, speaker identities and roles, silences, overlaps, prosody and so on. As shown by Galley et al. (2003), some of these features can be combined with lexical information to improve segmentation performance (although in a supervised manner), and Maskey and Hirschberg (2003) show some success in broadcast news segmentation using only these kinds of non-lexical features. We are currently investigating the addition of non-lexical features as observed outputs in our unsupervised generative model.</Paragraph> <Paragraph position="2"> We are also investigating improvements to the lexical model as presented here, firstly via simple techniques such as word stemming and replacement of named entities by generic class tokens (Barzilay and Lee, 2004), but also via the use of multiple ASR hypotheses by incorporating word confusion networks into our model. We expect that this will allow improved segmentation and identification performance with ASR data.</Paragraph> </Section> </Paper>