<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2097">
<Title>Unsupervised Topic Identification by Integrating Linguistic and Visual Information Based on Hidden Markov Models</Title>
<Section position="4" start_page="755" end_page="756" type="relat">
<SectionTitle> 2 Related Work </SectionTitle>
<Paragraph position="0"> In Natural Language Processing, text segmentation tasks have been actively studied for information retrieval and summarization. Hearst proposed a technique called TextTiling for subdividing texts into sub-topics (Hearst, 1997). This method is based on lexical co-occurrence. Galley et al. presented a domain-independent topic segmentation algorithm for multi-party speech (Galley et al., 2003). This segmentation algorithm uses automatically induced decision rules to combine linguistic features (lexical cohesion and cue phrases) and speech features (silences, overlaps, and speaker change). These studies aim only at segmenting a given text, not at identifying the topics of the resulting segments.</Paragraph>
<Paragraph position="1"> Marcu performed rhetorical parsing in the framework of Rhetorical Structure Theory (RST) based on a discourse-annotated corpus (Marcu, 2000). Although this model is suitable for analyzing local modification within a text, it has difficulty capturing the structure of topic transitions across the whole text.</Paragraph>
<Paragraph position="2"> In contrast, Barzilay and Lee modeled the content structure of texts within specific domains, such as earthquake and finance (Barzilay and Lee, 2004). They used HMMs wherein each state corresponds to a distinct topic (e.g., in the earthquake domain, earthquake magnitude or previous earthquake occurrences) and generates sentences relevant to that topic according to a state-specific language model. Their method first creates clusters via complete-link clustering, measuring sentence similarity by the cosine metric with word bigrams as features. From these clusters they compute initial estimates of the state-specific language model p_{s_i}(w'|w) for each state s_i and of the state-transition probabilities p(s_j|s_i). They then re-estimate the HMM parameters with the Viterbi algorithm until the clustering stabilizes. They applied the constructed content model to two tasks: information ordering and summarization. We differ from this study in that we utilize multimodal features and domain-independent discourse features to achieve robust topic identification.</Paragraph>
[Figure: example closed-caption utterances from a cooking video (e.g., "Cut a Chinese cabbage.", "Cut off its root and wash it.", "A Japanese radish would taste delicious.", "Divide it into three equal parts.", "Now, we'll saute.", "Just a little more and go for it!"), each labeled with its discourse purpose (individual action, substitution, action declaration, small talk); the word surrounded by a rectangle marks an extracted utterance referring to an action, and the bold word marks the case frame assigned to the verb.]
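As a rough illustration of the content-model procedure summarized above, the sketch below clusters sentences with complete-link clustering over word-bigram features, treats each cluster as an HMM state, estimates state-specific emission and state-transition probabilities, and then alternates Viterbi decoding with re-estimation until the state assignments stabilize. It is a minimal sketch under our own assumptions, not Barzilay and Lee's implementation: all function names are ours, the emission model is simplified to a smoothed unigram model rather than their state-specific bigram language model, SciPy's agglomerative clustering stands in for their clustering step, and the input is assumed to contain at least k sentences of at least two tokens each.

```python
# Illustrative sketch only: a simplified Barzilay-and-Lee-style content model.
# Assumptions (ours, not the authors'): smoothed unigram emissions instead of
# state-specific bigram language models, SciPy complete-link clustering, a
# uniform initial state distribution, and sentences with at least two tokens.
import math
from collections import Counter

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def bigram_features(sentence):
    """Word-bigram counts used as clustering features."""
    words = sentence.lower().split()
    return Counter(zip(words, words[1:]))


def initial_clusters(sentences, k):
    """Complete-link clustering with cosine distance over bigram features."""
    vocab = sorted({bg for s in sentences for bg in bigram_features(s)})
    index = {bg: j for j, bg in enumerate(vocab)}
    X = np.zeros((len(sentences), len(vocab)))
    for i, s in enumerate(sentences):
        for bg, c in bigram_features(s).items():
            X[i, index[bg]] = c
    Z = linkage(pdist(X, metric="cosine"), method="complete")
    return [int(c) - 1 for c in fcluster(Z, t=k, criterion="maxclust")]


def estimate_hmm(states, sentences, k, smoothing=0.1):
    """Smoothed state-specific emission and state-transition probabilities."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    word_id = {w: i for i, w in enumerate(vocab)}
    emit = np.full((k, len(vocab)), smoothing)
    trans = np.full((k, k), smoothing)
    for st, s in zip(states, sentences):
        for w in s.lower().split():
            emit[st, word_id[w]] += 1
    for a, b in zip(states, states[1:]):
        trans[a, b] += 1
    emit /= emit.sum(axis=1, keepdims=True)
    trans /= trans.sum(axis=1, keepdims=True)
    return emit, trans, word_id


def viterbi(sentences, emit, trans, word_id, k):
    """Most likely state sequence for the sentences of one document."""
    def log_emit(st, sent):
        return sum(math.log(emit[st, word_id[w]]) for w in sent.lower().split())

    n = len(sentences)
    delta = np.full((n, k), -np.inf)
    back = np.zeros((n, k), dtype=int)
    for st in range(k):
        # Uniform initial state distribution for simplicity.
        delta[0, st] = log_emit(st, sentences[0]) - math.log(k)
    for t in range(1, n):
        for st in range(k):
            scores = delta[t - 1] + np.log(trans[:, st])
            back[t, st] = int(scores.argmax())
            delta[t, st] = scores.max() + log_emit(st, sentences[t])
    path = [int(delta[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return list(reversed(path))


def content_model(sentences, k=3, max_iter=20):
    """Cluster, estimate HMM parameters, and re-decode until states stabilize."""
    states = initial_clusters(sentences, k)
    for _ in range(max_iter):
        emit, trans, word_id = estimate_hmm(states, sentences, k)
        new_states = viterbi(sentences, emit, trans, word_id, k)
        if new_states == states:
            break
        states = new_states
    return states
```

The hard reassignment via Viterbi decoding mirrors the "re-estimate until the clustering stabilizes" loop described above; the resulting per-sentence state labels are what a content model would then feed into tasks such as information ordering or summarization.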
<Paragraph position="10"> In the field of video analysis, there have been a number of studies on shot analysis with HMMs.</Paragraph>
<Paragraph position="11"> Chang et al. described a method for classifying shots into several classes for highlight extraction in baseball games (Chang et al., 2002). Nguyen et al. proposed a robust statistical framework to extract highlights from a baseball video (Nguyen et al., 2005). They applied multi-stream HMMs to control the weights among different features, such as principal component features capturing color information and frame-difference features for moving objects. Phung et al. proposed a probabilistic framework that exploits hierarchical structure for topic-transition detection in educational videos (Phung et al., 2005).</Paragraph>
<Paragraph position="12"> Some studies have attempted to utilize linguistic information in shot analysis (Jasinschi et al., 2001; Babaguchi and Nitta, 2003). For example, Babaguchi and Nitta segmented closed-caption text into meaningful units and linked them to video streams in sports video. However, the linguistic information they utilized consisted only of keywords.</Paragraph>
</Section>
</Paper>