<?xml version="1.0" standalone="yes"?> <Paper uid="W98-0205"> <Title>(CMU), and the Battelle Pacific Northwest National Laboratories</Title> <Section position="2" start_page="33" end_page="33" type="metho"> <SectionTitle> AR&amp;T Multimedia Group began support of the CMU Informedia project and obtained a </SectionTitle> <Paragraph position="0"> demonstration system for searching indexed digital video information. At the heart of Informedia is a subsystem that creates text metadata describing digital video content. This suggested a way to integrate the two systems.</Paragraph> </Section> <Section position="3" start_page="33" end_page="33" type="metho"> <SectionTitle> 1 Informedia </SectionTitle> <Paragraph position="0"> The Informedia Project has established a large on-line digital video library, incorporating video assets from WQED/Pittsburgh. The project is creating intelligent, automatic mechanisms for populating the library and allowing for its full-content and knowledge-based search and segment retrieval. Our approach applies several techniques for content-based searching and video-sequence retrieval. Content is conveyed in both the narrative (speech and language) and the image.</Paragraph> <Paragraph position="1"> Only by the collaborative interaction of image, speech, and natural-language understanding technology can we successfully populate, segment, index, and search diverse video collections with satisfactory recall and precision.</Paragraph> <Paragraph position="2"> The Informedia Project uses the Sphinx-II speech recognition system to transcribe narratives and dialogues automatically. The resulting transcript is then processed with methods of natural language understanding to extract subjective descriptions and mark potential segment boundaries where significant semantic changes occur. 
Comparative difference measures are used in processing the video to mark potential segment boundaries.</Paragraph> <Paragraph position="3"> Images with small histogram disparity are considered to be relatively equivalent. By detecting significant changes in the weighted histogram of each successive frame, a sequence of images can be grouped into a segment. This simple and robust method for segmentation is fast and can detect 90% of the scene changes in video.</Paragraph> <Paragraph position="4"> Segment breaks produced by image processing are examined along with the boundaries identified by the natural language processing of the transcript, and an improved set of segment boundaries is heuristically derived to partition the video library into sets of segments, or &quot;video paragraphs.&quot; Figure 1 illustrates the video searching facilities of Informedia, which include: * Filmstrip (lower left of Figure 1) - Select thumbnail images to view a video paragraph.</Paragraph> <Paragraph position="5"> * Selective play (upper right of Figure 1) - Prev/next paragraph, prev/next term hit.</Paragraph> <Paragraph position="6"> * Cursor browse - Abstracts and search terms available in filmstrip and play window.</Paragraph> <Paragraph position="7"> * Text query (upper left of Figure 1) - Terms parsed from natural language query.</Paragraph> <Paragraph position="8"> * Skim (not shown) - View video in 10% of normal time.</Paragraph> <Paragraph position="9"> Figure 1 - Informedia Search Screen The reader can find more in-depth discussions of the Informedia project and technologies in</Paragraph> </Section> <Section position="4" start_page="33" end_page="34" type="metho"> <SectionTitle> References 1-4. 
</SectionTitle> <Paragraph position="0"/> </Section> <Section position="5" start_page="34" end_page="36" type="metho"> <SectionTitle> 2 Starlight </SectionTitle> <Paragraph position="0"> Starlight was originally developed as an interactive information visualization environment for the US Army Intelligence and Security Command (INSCOM). It is designed to integrate several types of data (unstructured and structured text documents, geographic information, and digital imagery) into a single analysis space for rapid comparison of content and interrelationships (see reference 5). In this section, we will concentrate on the Starlight text processing and indexing functions.</Paragraph> <Paragraph position="1"> A major problem with incorporating free-text documents into a visualization environment is that each document must be coded so that it can be clustered with other documents. The Boeing TPT is a prototype software engine that supports automatic coding and categorization of documents, concept-based querying, and visualization over large text document databases. The TPT combines techniques from statistics, linear algebra, and computational linguistics in order to take account of the total context in which words occur in a given document or query; it statistically compares a document context with similar contexts from other documents in the database. Through this technique, document sets can be represented in a way that supports visualization and analysis by the presentation components of Starlight.</Paragraph> <Paragraph position="2"> The TPT performs two functions: First, it provides a powerful and flexible mechanism for concept-based searching over large text databases; second, it automatically assigns individual text units to coordinates in a user-configurable 3D semantic space. 
Both of these functions derive from the TPT core technique of representing large numbers of text units as points in a higher dimensional space and performing similarity calculations in this space. Conceptually, the flow of data through the TPT is diagrammed in Figure 2. Prior to TPT indexing, a text collection must be preprocessed in three stages involving manual intervention: First, the text is divided into units with topical granularity that best correlates with the expected query patterns. The units can be titles, subject lines, abstracts, individual paragraphs or an entire document. A unit can also be a caption or a piece of transcribed text from a video. Next, the text is &quot;tokenized&quot; into individual words and phrases. Finally, a list of &quot;stopwords,&quot; or ignored terms, is chosen. These include determiners (e.g., a, the), conjunctions (e.g., and, or), relatives (e.g., what, which), and certain domain-specific terms.</Paragraph> <Paragraph position="3"> After preprocessing, operation of the TPT indexing system is automatic (see Figure 2). The software builds a document/term matrix, performs several transformation and dimension reduction calculations, and stores output matrices in an object-oriented database for use by the Starlight visualization component.</Paragraph> <Paragraph position="4"> Users of Starlight can visually explore the topical structure of a large text database by navigating through a 3D topic space where each item is represented as a point in a scatterplot. Items with close visual proximity have similar content. The TPT provides a selection of dimensions (with associated topic words), any three of which can potentially be selected for axes of the scatterplot.</Paragraph> <Paragraph position="5"> Figure 3 shows a Starlight visualization screen containing a 3D scatterplot of the 322-item video metadata extracted from Informedia; each axis is labeled with the dominant topics measured by the TPT. 
At the right of Figure 3 is an example query with results.</Paragraph> <Paragraph position="6"> Figure 3 - Starlight Visualization Screen</Paragraph> </Section> <Section position="6" start_page="36" end_page="37" type="metho"> <SectionTitle> 3 Integration </SectionTitle> <Paragraph position="0"> Integration of the two systems required the insertion of two processing steps, each involving new software development. Figure 4 shows the flow of data in and between the two systems; new integration elements are shown as shaded boxes. The two key elements are: (1) Extract video paragraph text metadata from Informedia, and (2) Display selected video paragraphs.</Paragraph> <Paragraph position="2"> Extraction of metadata is accomplished by a C program that reads ASCII text and control parameters from several files in the Informedia system and writes a collection of items compatible with TPT processing. Each item represents one video paragraph and includes the following fields: Display of video paragraphs occurs in the context of a web page containing a video viewer and the text transcript; Figure 5 shows an example. When the user selects a particular document within Starlight, a browser displays the HTML page. The browser was coded in Java and the MPEG viewer is a Microsoft ActiveMovie control. Each digital video file was kept intact; that is, the video paragraphs were not partitioned into separate files. Rather, paragraphs are viewed by playing from the specified &quot;In&quot; frame to the &quot;Out&quot; frame of the appropriate MPEG file.</Paragraph> <Paragraph position="3"> advantages. Neither system was designed for narrow, precise data querying; rather, Starlight features all-inclusive views and Informedia facilitates iterative, progressive narrowing of focus. 
Motivation for integrating the systems was twofold: to give Starlight access to multimedia data, and to investigate possible advantages of the global overview that Starlight can bring to the Informedia database.</Paragraph> <Paragraph position="4"> We collected the following list of video titles from diverse sources at Boeing: Altogether, this material comprises about eight hours of viewing. The digitized MPEG files occupy approximately 4.5GB of disk space and the metadata (transcripts, thumbnail images, and cross-reference files) is about 25MB in size. Figure 5 - Video Paragraph Display</Paragraph> </Section> </Paper>