<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2707"> <Title>Annotating text using the Linguistic Description Scheme of MPEG-7: The DIRECT-INFO Scenario</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> DFKI GmbH </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> We describe the way we adapted a text analysis tool for annotating with the Linguistic Description Scheme of MPEG-7 text related to and extracted from multi-media content. Practically applied in the</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> DIRECT-INFO EC R&D project we </SectionTitle> <Paragraph position="0"> show how such linguistic annotation contributes to semantic annotation of multi-modal analysis systems, demonstrating also the use of the XML schema of MPEG-7 for supporting cross-media semantic content annotation.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In the R&D project DIRECT-INFO the concrete business case of sponsorship tracking was targeted. The scenario investigated within the project was that sponsors want to know how often their brands are mentioned in connection with the sponsored company. The visual detection of a brand (e.g. in videos) is not sufficient to meet the requirements of this business case. Multimodal analysis and fusion - as implemented within DIRECT-INFO - is needed in order to fulfill these requirements (Rehatschek, 2004).</Paragraph> <Paragraph position="1"> Within this context text analysis has been applied to documents reporting on entities, like football teams, that have close relations to large sponsoring companies. 
In the text analysis component of the system we had to detect whether an entity was mentioned positively, negatively or neutrally. Besides the processing and annotation issues related to positive or negative mentions, we had to make our results available to a global MPEG-7 document, which encodes the annotation results of the various analyses of the modalities involved (logo detection, speech recognition, text analysis, etc.). This global MPEG-7 document was the input for a fusion component.</Paragraph> <Paragraph position="2"> In the next sections we describe the Text</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Analysis (TA) component of DIRECT-INFO. </SectionTitle> <Paragraph position="0"> We then briefly describe the linguistic description scheme (LDS) of MPEG-7 and show the annotation generated by the TA. Finally we briefly discuss the role that the LDS, and more generally MPEG-7, can play in supporting an interoperable cross-media annotation strategy. It seems to us that the LDS offers a good means for adding semantic metadata to image/video, but not for a real semantic integration of text and media content annotation, which in the case of DIRECT-INFO was performed by an additional fusion component.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="53" type="metho"> <SectionTitle> 2 The detection of positive/negative mentioning </SectionTitle> <Paragraph position="0"> Our work in DIRECT-INFO has been dedicated to enhancing an already existing tool for linguistic annotation. This tool, called SCHUG (Shallow and CHunk-based Unification Grammar tool), annotates texts, considering both linguistic constituency and dependency structures (T. Declerck, M. 
Vela 2005).</Paragraph> <Paragraph position="1"> A first development step was dedicated to creating specialized lexicons for various types of lexical categories (like nouns, adjectives and verbs) that can bear the property of being intrinsically positive or negative in a specific domain, as can be seen just below in the case of soccer:</Paragraph> <Paragraph position="3"> Considering a sentence like &quot;ManU takes the command in the game against the weak Spanish team&quot;, the head noun of the direct object (linguistically speaking), &quot;the command&quot;, receives, through lookup in the specialized DIRECT-INFO lexicon, an &quot;INTERPRETATION&quot; tag with value &quot;positive&quot;, whereas the adjective &quot;weak&quot; in the PP-adjunct &quot;in the game against the weak Spanish team&quot; gets an &quot;INTERPRETATION&quot; tag with value &quot;negative&quot;.</Paragraph> <Paragraph position="4"> Once the words in the sentence have been lexically tagged with respect to their interpretation, the computation of the positive/negative interpretation at the level of linguistic fragments, and then at the level of the sentence, can start. For this we have defined heuristics along the lines of the dependency structures delivered by the linguistic analysis. So in the case of the NP &quot;the weak Spanish team&quot;, the head noun &quot;team&quot;, in itself a neutral expression, gets the &quot;INTERPRETATION&quot; tag with the value &quot;negative&quot;, since it is modified by a &quot;negative&quot; adjective. 
If the reference resolution algorithm of the linguistic tools has been able to determine that the &quot;Spanish team&quot; is in fact &quot;Real Madrid&quot;, this entity gets a negative &quot;INTERPRETATION&quot; tag.</Paragraph> <Paragraph position="5"> The head noun of the NP realizing the subject of the sentence, &quot;ManU&quot;, gets a positive mention tag, since it is the subject of a positive verb and direct object combination (the NP &quot;the command&quot; having a positive reading, whereas the verb &quot;takes&quot; has a neutral reading).</Paragraph> <Paragraph position="6"> A last aspect to be mentioned here concerns the treatment of the so-called polarity items.</Paragraph> <Paragraph position="7"> Specific words in natural language intrinsically carry a negation or position force (or scope). So the words &quot;not&quot;, &quot;none&quot; or &quot;no&quot; have an intrinsic negation force and negate the words and fragments in the context in which they occur. The context that is negated by such words can also be called the &quot;scope&quot; (or the range) of the negation. Consider for example the sentence: &quot;I would definitely pay £15 million to get Owen, not even a decent striker, instead...&quot; Our tools are able to detect that the NP &quot;decent striker&quot; is negated, and therefore the positive reading of &quot;decent striker&quot; is ruled out.</Paragraph> </Section> <Section position="6" start_page="53" end_page="55" type="metho"> <SectionTitle> 3 Metadata Description </SectionTitle> <Paragraph position="0"> The different content analysis modules of the DIRECT-INFO system extract different types of metadata, ranging from low-level audiovisual feature descriptions to semantic metadata. 
The global metadata description must be rich and has to clearly interrelate the various analysis results, as it is the input of the fusion component.</Paragraph> <Section position="1" start_page="53" end_page="55" type="sub_section"> <SectionTitle> 4.1 Using MPEG-7 for Detailed Description of Audiovisual Content </SectionTitle> <Paragraph position="0"> In DIRECT-INFO the MPEG-7 standard is used for metadata description. It is an excellent choice for describing audiovisual content because of its comprehensiveness and flexibility. The comprehensiveness results from the fact that the standard has been designed for a broad range of applications and thus employs very general and widely applicable concepts. The standard contains a large set of tools for diverse types of annotations on different semantic levels. The flexibility of MPEG-7, which is provided by a high level of generality, makes it usable for a broad application area without imposing strict constraints on the metadata models of these applications. This flexibility rests largely on the structuring tools, which allow the description to be modular and to operate on different levels of abstraction.</Paragraph> <Paragraph position="1"> MPEG-7 supports fine-grained description, and it is possible to attach descriptors to arbitrary segments on any level of detail of the description.</Paragraph> <Paragraph position="2"> Among the descriptive tools developed within the MPEG-7 framework, one is concerned with the use of natural language for adding metadata to the content description of image and video: the tion that can be attached as metadata to some audio-video content. The natural language expression used here is &quot;Spain scores a goal against Sweden. 
The scoring player is Morientes&quot;.</Paragraph> <Paragraph position="3"> Free Text Annotation: Here only tags are put around the text: On the basis of the linguistic analysis of our dependency parser, we generate the &quot;structured annotation&quot; of the MPEG-7 Linguistic Description Scheme. We think that this kind of annotation is the most practical one offered by the LDS for adding semantics to multimedia content, since it is probably more intuitive for the media expert than the underlying linguistic dependency structure. At the same time it also seems straightforward to start with an (internal) dependency analysis, since it is then relatively easy to automatically map dependency units to the &quot;Who&quot;, &quot;WhatAction&quot; and other tags of the LDS.</Paragraph> <Paragraph position="4"> The MPEG-7 output of the TA module of DIRECT-INFO [...] Without going into too much detail here, it is enough to stress that the first part of the annotation ensures the link to the general multimedia and multimodal repository. We have to deal with a PDF document that should be processed by a Text Analysis tool. The &quot;essence&quot; ID gives information about the location where the application-relevant data is stored and where the results of the Text Analysis should be stored. All this metadata ensures that the analysis results of the various modalities dealing with one application-relevant dataset can be combined (for example the combination of the logo detection of a brand and the related positive or negative mentioning of a team sponsored by this brand). For reasons of space, we cannot show and comment on the complete (and multimodal) MPEG-7 annotation here, but details are given in (G. Kienast, 2005).</Paragraph> <Paragraph position="5"> The second part of the annotation gives the results of the combined linguistic and &quot;structured&quot; analysis we are dealing with. 
As mentioned above, in the case of DIRECT-INFO, the results of text analysis are accessed via the structured annotation of the Linguistic Description Scheme of MPEG-7.</Paragraph> </Section> </Section> </Paper>