<?xml version="1.0" standalone="yes"?>
<Paper uid="A97-1003">
<Title>High Performance Segmentation of Spontaneous Speech Using Part of Speech and Trigger Word Information</Title>
<Section position="3" start_page="0" end_page="12" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> In the area of machine translation, one important interface is the one between the speech recognizer and the parser. In human-to-human dialogues, the speech recognizer's output is a sequence of turns (a turn being a contiguous segment of a single speaker's utterance), each of which may consist of several clauses.</Paragraph>
<Paragraph position="1"> Lavie et al. (1996) point out that using smaller units rather than whole turns can greatly facilitate the parser's task, since it reduces the complexity of the parser's input.</Paragraph>
<Paragraph position="2"> The problem is thus how to correctly segment an utterance into clauses.</Paragraph>
<Paragraph position="3"> The segmentation procedure described in Lavie et al. (1996) combines acoustic information, statistically estimated boundary trigrams, a set of highly indicative keywords, and heuristics from the parser itself.</Paragraph>
<Paragraph position="4"> Stolcke and Shriberg (1996) studied the relevance of several word-level features to segmentation performance on the Switchboard corpus (Godfrey et al., 1992). Their best results were achieved with part-of-speech n-grams augmented by a few trigger words and biases.</Paragraph>
<Paragraph position="5"> A different, more acoustics-based approach to turn segmentation is reported by Takagi and Itahashi (1996).</Paragraph>
<Paragraph position="6"> Palmer and Hearst (1994) used a neural network to find sentence boundaries in running text, i.e., to determine whether a period marks the end of a sentence or the end of an abbreviation.
The input to their network is a window of words centered on a period, in which each word is encoded as a vector of 20 real values: 18 values giving the word's probabilistic membership in each of 18 classes, and 2 values indicating whether the word is capitalized and whether it follows a punctuation mark. Their best result, 98.5% accuracy, was achieved with a context of 6 words and 2 hidden units.</Paragraph>
<Paragraph position="7"> In this paper, we carry their idea over to speech and investigate how well a neural network performs turn segmentation when it hypothesizes segment boundaries from parts of speech, indicative keywords, or both.</Paragraph>
</Section>
</Paper>
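The input encoding attributed above to Palmer and Hearst (1994) can be illustrated with a minimal Python sketch: each context word becomes 18 class-membership probabilities plus a capitalization flag and a follows-punctuation flag, and the window's vectors are concatenated and fed to a feedforward network with 2 hidden units. This is not their implementation. The lexicon (left empty here, so every word falls back to a uniform 18-class distribution), the punctuation set, the split of the 6-word context into 3 words before and 3 after the period, zero-padding at text edges, and the untrained random weights are all hypothetical choices made for illustration. Training is omitted.

```python
import math
import random

NUM_CLASSES = 18                          # class-membership values per word
PUNCT = {".", ",", "?", "!", ";", ":"}    # hypothetical punctuation set

def encode_word(word, prev_token, lexicon):
    """Encode one word as 20 reals: 18 class probabilities + 2 binary flags."""
    # Hypothetical lexicon lookup; unknown words get a uniform distribution.
    probs = lexicon.get(word.lower(), [1.0 / NUM_CLASSES] * NUM_CLASSES)
    capitalized = 1.0 if word[:1].isupper() else 0.0
    follows_punct = 1.0 if prev_token in PUNCT else 0.0
    return list(probs) + [capitalized, follows_punct]

def encode_window(tokens, period_idx, lexicon, k=6):
    """Concatenate encodings of k/2 words before and k/2 words after a period."""
    half = k // 2
    before = range(period_idx - half, period_idx)
    after = range(period_idx + 1, period_idx + 1 + half)
    features = []
    for i in list(before) + list(after):
        if 0 <= i < len(tokens):
            prev = tokens[i - 1] if i > 0 else None
            features.extend(encode_word(tokens[i], prev, lexicon))
        else:
            # Assumption: positions outside the text are zero-padded.
            features.extend([0.0] * (NUM_CLASSES + 2))
    return features

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer; the output is read as P(period ends a sentence)."""
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(wo * h for wo, h in zip(w_out, hidden)) + b_out)

# Usage with an untrained network (random weights, for shape only).
lexicon = {}  # hypothetical: would map lowercase words to 18 probabilities
tokens = ["He", "met", "Dr", ".", "Smith", "in", "Boston"]
x = encode_window(tokens, tokens.index("."), lexicon, k=6)  # 6 words x 20 = 120 inputs

rng = random.Random(0)
n_hidden = 2
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in x] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
p = forward(x, w_hidden, b_hidden, w_out, 0.0)
```

With a real lexicon and trained weights, `p` would be thresholded (for example at 0.5) to decide whether the period in "Dr ." ends a sentence; here it only demonstrates the input size and data flow.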