<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-1407">
  <Title>Scene Direction Based Reference In Drama Scenes</Title>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 EXI CON EXI EXI conversing
3 EXI EXI CON EXI conversing
4 EXI EXI ACT EXI going out
5 EXI EXI ABS EXI
</SectionTitle>
    <Paragraph position="0"> This example of an existence/action map is interpreted as follows. At sub-scene 0, John and Bill are in the scene. At sub-scene 1, Alice and Betty come into the scene. At sub-scenes 2 and 3, all four persons are there, and Betty and John speak one after another. At sub-scene 4, John goes out of the scene; therefore, at sub-scene 5 he is no longer in it. As described later, this kind of map is used to retrieve the image data of a sub-scene.</Paragraph>
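To make the structure concrete, here is a minimal Python sketch of one possible in-memory form of an existence/action map: a dictionary from sub-scene index to each player's state. The rows encode sub-scenes 2 to 5 as read from the table and the prose above. Assigning the two columns that are EXI throughout to Bill and Alice is our assumption (swapping them changes nothing), and `EXISTENCE_ACTION_MAP` and `present_players` are hypothetical names, not part of the authors' system.

```python
# Hypothetical in-memory form of the map: sub-scene index -> {player: state}.
# ABS = absent, EXI = exists, CON = conversing, ACT = acting.
# Which all-EXI column is Bill and which is Alice is an assumption.
EXISTENCE_ACTION_MAP = {
    2: {"Bill": "EXI", "Betty": "CON", "John": "EXI", "Alice": "EXI"},
    3: {"Bill": "EXI", "Betty": "EXI", "John": "CON", "Alice": "EXI"},
    4: {"Bill": "EXI", "Betty": "EXI", "John": "ACT", "Alice": "EXI"},
    5: {"Bill": "EXI", "Betty": "EXI", "John": "ABS", "Alice": "EXI"},
}
ACTIONS = {2: "conversing", 3: "conversing", 4: "going out", 5: None}


def present_players(sub_scene: int) -> list[str]:
    """Players whose state in the given sub-scene is anything but ABS."""
    row = EXISTENCE_ACTION_MAP[sub_scene]
    return sorted(player for player, state in row.items() if state != "ABS")


if __name__ == "__main__":
    for sub_scene in sorted(EXISTENCE_ACTION_MAP):
        print(sub_scene, present_players(sub_scene), ACTIONS[sub_scene])
```

Treating CON and ACT as presence is what makes such a map usable for retrieval: a player appears in a sub-scene whenever their state is anything other than ABS.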
  </Section>
  <Section position="6" start_page="0" end_page="11" type="metho">
    <SectionTitle>
3 Scene Directions Analysis
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Sentence Patterns of Scene Directions
</SectionTitle>
      <Paragraph position="0"> In this section, we describe how to extract information from scene directions in order to build an existence/action map of the scene. For this purpose, we first characterize scene directions, which are in fact restricted Japanese sentences. Simple sentences used as scene directions are classified into the following six patterns. We also give an example sentence for each pattern.
1. subject, verb phrase
(1) Taroo to Hanako ga kaette -kuru.
    Taroo and Hanako NOM come back -kuru
    'Taroo and Hanako come back.'</Paragraph>
      <Paragraph position="1"> In this type of sentence, "ga" (subject marker), "wa" (topic marker), or "mo" (topic marker + 'too') is used as a nominative particle. Moreover, "ga" and "wa" are sometimes replaced with a comma ",".</Paragraph>
      <Paragraph position="2"> 2. verb phrase, subject
(2) Soto o miru Taroo.
    outside ACC see Taroo
    'Taroo, who looks outside.'</Paragraph>
      <Paragraph position="3"> 3. verb phrase
(3) Odoroite iru.
    surprised being
    'φ is surprised.' (φ: the omitted, or zero, subject)</Paragraph>
      <Paragraph position="4"> 4. subject, noun phrase
(4) Taroo ga hitori.
    Taroo NOM alone
    'Only Taroo is there.'</Paragraph>
      <Paragraph position="5"> 5. noun phrase, copula
(5) Denwa dearu.
    phone call COPULA
    'A phone call arrives.'</Paragraph>
      <Paragraph position="6"> 6. noun phrase
(6) Sibaraku-no tinmoku.
    for a while silence
    'No one speaks for a while.'</Paragraph>
      <Paragraph position="7"> We employ a very simple pattern-matching-based information extraction system, described below, for two reasons: 1) the structures of simple sentences used as scene directions are limited to these six patterns, and 2) to build an existence/action map, we need to extract only two pieces of information from a scene direction, namely who the subject is, and what action the referent of the subject performs or what state it is in. Therefore it is enough to extract the subject and the verb (or the verb + the auxiliary verb).</Paragraph>
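Because the six patterns differ only in whether a verb, a marked subject, or a copula is present, a few checks on a tagged sentence are enough to tell them apart. The following is a minimal sketch, not the authors' implementation: it assumes each sentence is already segmented into (surface, tag) pairs using a hypothetical coarse tag set (N, P, V, AUX, COP, PUNC) rather than a real analyzer's tags, and `classify` and `EXAMPLES` are illustrative names.

```python
# Hypothetical coarse tags: N (noun or name), P (particle), V (verb),
# AUX (auxiliary verb), COP (copula), PUNC (punctuation).
SUBJECT_MARKERS = {"ga", "wa", "mo", ","}


def classify(tokens: list[tuple[str, str]]) -> int:
    """Return the scene-direction pattern number (1-6) of a simple sentence."""
    tags = [tag for _, tag in tokens]
    # A noun directly followed by ga/wa/mo or a comma is an explicit subject.
    has_marked_subject = any(
        tags[i] == "N" and tokens[i + 1][0] in SUBJECT_MARKERS
        for i in range(len(tokens) - 1)
    )
    if "V" in tags:
        if has_marked_subject:
            return 1                      # subject, verb phrase
        last_verb = max(i for i, tag in enumerate(tags) if tag == "V")
        if "N" in tags[last_verb + 1:]:
            return 2                      # verb phrase, subject
        return 3                          # verb phrase only (zero subject)
    if "COP" in tags:
        return 5                          # noun phrase, copula
    if has_marked_subject:
        return 4                          # subject, noun phrase
    return 6                              # noun phrase only


# Examples (1)-(6), tagged by hand with the hypothetical tag set.
EXAMPLES = {
    1: [("Taroo", "N"), ("to", "P"), ("Hanako", "N"), ("ga", "P"),
        ("kaette", "V"), ("-kuru", "AUX")],
    2: [("Soto", "N"), ("o", "P"), ("miru", "V"), ("Taroo", "N")],
    3: [("Odoroite", "V"), ("iru", "AUX")],
    4: [("Taroo", "N"), ("ga", "P"), ("hitori", "N")],
    5: [("Denwa", "N"), ("dearu", "COP")],
    6: [("Sibaraku", "N"), ("no", "P"), ("tinmoku", "N")],
}

if __name__ == "__main__":
    for expected, tokens in EXAMPLES.items():
        print(expected, classify(tokens))
```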
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Subject Extraction
</SectionTitle>
      <Paragraph position="0"> Subjects are extracted by matching the patterns S generated by the following rules.</Paragraph>
      <Paragraph position="1"> rule 1 P is a proper name or a common noun.</Paragraph>
      <Paragraph position="2"> rule 2 P ← P "," P
rule 3 P ← P "to" P, where "to" means 'and' in English.</Paragraph>
      <Paragraph position="3"> rule 4 S ← P "," | P "ga" | P "wa" | P "mo"
Here the subject corresponds to P. This rule is for patterns 1 and 4.</Paragraph>
      <Paragraph position="4"> rule 5 S ← P ""
This rule is for pattern 2.</Paragraph>
      <Paragraph position="5"> rule 6 S ← P "dearu" | P "da", where "dearu" and "da" are copulas in Japanese.</Paragraph>
      <Paragraph position="6"> This rule is for pattern 5.</Paragraph>
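Rules 1 to 6 can be run as regular expressions once each token is replaced by a one-character symbol. This is a minimal sketch under several assumptions: sentences arrive as (surface, tag) pairs with a hypothetical coarse tag set in which N marks proper names and common nouns and V marks verbs; the marker of rule 5, which is missing in this copy, is read as "a P at sentence end following a verb phrase"; and the rules are tried in the order 4, 6, 5. `encode` and `extract_subject` are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical coarse tags: N = proper name or common noun (rule 1),
# V = verb; every other token is mapped by its surface form.
SYMBOLS = {"ga": "g", "wa": "g", "mo": "g", ",": ",", "to": "t",
           "dearu": "c", "da": "c"}

P = r"N(?:[,t]N)*"                     # rules 1-3: nouns joined by "," or "to"
RULES = [
    re.compile("(" + P + ")[g,]"),     # rule 4: P + "," / "ga" / "wa" / "mo"
    re.compile("(" + P + ")c"),        # rule 6: P + "dearu" / "da"
    re.compile("v(" + P + ")$"),       # rule 5 (assumed): verb phrase + P at the end
]


def encode(tokens: list[tuple[str, str]]) -> str:
    """Map each token to one symbol so match offsets are token indices."""
    symbols = []
    for word, tag in tokens:
        if tag == "N":
            symbols.append("N")
        elif tag == "V":
            symbols.append("v")
        else:
            symbols.append(SYMBOLS.get(word, "x"))
    return "".join(symbols)


def extract_subject(tokens: list[tuple[str, str]]) -> Optional[list[str]]:
    """Return the subject noun(s), or None if no rule matches."""
    symbols = encode(tokens)
    for rule in RULES:
        match = rule.search(symbols)
        if match:
            span = tokens[match.start(1):match.end(1)]
            return [word for word, tag in span if tag == "N"]
    return None  # pattern 3 (zero subject) or pattern 6 (no subject needed)
```

Returning None for patterns 3 and 6 leaves the zero-subject case to a separate resolution step.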
      <Paragraph position="7"> 54 H. Nakagawa, Y. Yaginuma and M. Sakauchi It is not necessary to extract a subject from a sentence of pattern 6, because a sentence of this pattern usually describes the atmosphere of the scene. For pattern 3, we have to infer the referent of the omitted subject, namely the zero subject. There are many theories for this purpose, including centering theory (Brennan et al., 1987; Kameyama, 1988; Walker et al., 1994). Here, however, we employ the following very simple rule.</Paragraph>
      <Paragraph position="8"> rule 7 The referent of a zero subject is the same as the referent of the subject of the previous sentence.
This rule is a small subset of centering theory, but, as we show later, it works well for extracting subjects from scene directions. We also apply this rule to a complex sentence in which the subject of the main clause is omitted: the omitted subject is deemed to corefer with the explicit subject of the subordinate clause. The reasons are that 1) a scene direction describes a sequence of actions, and 2) in a complex sentence of a scene direction, the subordinate clause describes an action or state that precedes the action or state described by the main clause. In other words, the subordinate clause and the main clause of such a complex sentence can be regarded as two consecutive simple sentences.</Paragraph>
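Rule 7 amounts to carrying the last explicit subject forward. A minimal sketch, assuming subjects have already been extracted sentence by sentence (None standing for a zero subject), that a complex sentence is passed in as two entries with the subordinate clause first, and that pattern-6 sentences, which need no subject, have been filtered out beforehand; `resolve_zero_subjects` is a hypothetical name.

```python
from typing import Optional


def resolve_zero_subjects(
    subjects: list[Optional[str]],
) -> list[Optional[str]]:
    """Rule 7: a zero subject (None) takes the previous sentence's subject.

    A complex sentence is passed as two entries, subordinate clause first,
    so an omitted main-clause subject corefers with the subordinate one.
    """
    resolved: list[Optional[str]] = []
    previous: Optional[str] = None
    for subject in subjects:
        if subject is None:
            subject = previous      # zero subject corefers with the last subject
        resolved.append(subject)
        previous = subject
    return resolved


if __name__ == "__main__":
    # Hypothetical sequence: two explicit subjects, then a zero subject.
    print(resolve_zero_subjects(["Taroo", "Hanako", None]))
```

A zero subject at the very start stays None, since there is no previous sentence to borrow from.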
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Predicate Extraction
</SectionTitle>
      <Paragraph position="0"> For patterns 1, 2, and 3, we can extract a predicate simply by extracting the verb, or the verb + the auxiliary verb, from the sentence. For pattern 4, we cannot identify the exact action or state from the sentence.</Paragraph>
      <Paragraph position="1"> But at least we know that the person the subject refers to exists in the scene. Therefore the predicate of a sentence of pattern 4 is regarded as "exist" by default. We could not find any reasonable predicate for sentences of patterns 5 and 6, so we also use "exist" for these patterns, as we do for pattern 4. In sum, we use the following rule to extract a predicate from a scene direction.</Paragraph>
      <Paragraph position="2"> rule 8 The predicate of a sentence is either the verb (+ the auxiliary verb) in a sentence of pattern 1, 2, or 3, or "exist" for a sentence of pattern 4, 5, or 6.</Paragraph>
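Rule 8 reduces to a lookup on the pattern number. A minimal sketch, assuming the pattern (1 to 6) is already known and tokens come as (surface, tag) pairs with hypothetical coarse tags V and AUX for verbs and auxiliary verbs; `extract_predicate` is an illustrative name.

```python
VERBAL_TAGS = ("V", "AUX")   # hypothetical coarse tags for verbs and auxiliaries


def extract_predicate(tokens: list[tuple[str, str]], pattern: int) -> tuple[str, ...]:
    """Rule 8: the verb (+ auxiliaries) for patterns 1-3, "exist" for 4-6."""
    if pattern in (4, 5, 6):
        return ("exist",)
    return tuple(word for word, tag in tokens if tag in VERBAL_TAGS)


if __name__ == "__main__":
    sentence = [("Taroo", "N"), ("to", "P"), ("Hanako", "N"), ("ga", "P"),
                ("kaette", "V"), ("-kuru", "AUX")]
    print(extract_predicate(sentence, 1))
```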
    </Section>
    <Section position="4" start_page="0" end_page="11" type="sub_section">
      <SectionTitle>
3.4 Building An Existence/Action Map
</SectionTitle>
      <Paragraph position="0"> In this system, semantic interpretation is limited to building an existence/action map from the subjects and predicates extracted as described in sections 3.2 and 3.3. For this purpose, the key element of the predicate is the so-called directional auxiliary verb (Kuno, 1978). There are several directional auxiliary verbs in Japanese; the most essential ones for our purpose are "-tekuru" and "-teiku". The directions indicated by these auxiliary verbs are defined relative to the position of the camera. If "-tekuru" is part of the predicate of a sentence, the referent of the subject comes into the camera angle and/or approaches the camera. If "-teiku" is part of the predicate, the referent of the subject goes away from the camera and is probably out of the camera angle. The situation is depicted in figure 1.</Paragraph>
      <Paragraph position="2"> In addition, the basic verbs "kuru" ('come') and "iku" ('go') express the same sense of direction as "-tekuru" and "-teiku", respectively. From these considerations, we derive the following two default rules to infer the existence or absence of the referent of the subject in the scene.</Paragraph>
      <Paragraph position="3"> rule 9 If the predicate of a sentence includes the auxiliary verb "-tekuru" or the verb "kuru", the referent of the subject was not in the scene beforehand and has just come into the scene.</Paragraph>
      <Paragraph position="4"> rule 10 If the predicate of a sentence includes the auxiliary verb "-teiku" or the verb "iku", the referent of the subject will not be in the scene afterward.</Paragraph>
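Rules 9 and 10 can be encoded as a small table from directional elements to an existence change. A minimal sketch, assuming predicates arrive as tuples of dictionary-form tokens such as ("kaet", "-tekuru"); the token spellings (including the "-kuru" variant seen in example (1)) and the name `default_transition` are our assumptions, and inflected forms would need normalizing before this lookup.

```python
from typing import Optional

# Directional elements, assuming dictionary-form tokens.
INCOMING = {"kuru", "-kuru", "-tekuru"}   # rule 9
OUTGOING = {"iku", "-iku", "-teiku"}      # rule 10


def default_transition(predicate: tuple[str, ...]) -> Optional[tuple[str, str]]:
    """Default (before, after) existence implied by a predicate, if any."""
    words = set(predicate)
    if not words.isdisjoint(INCOMING):
        return ("ABS", "EXI")   # rule 9: was absent, has just come in
    if not words.isdisjoint(OUTGOING):
        return ("EXI", "ABS")   # rule 10: will not be in the scene afterward
    return None                 # no directional element: rules 9-10 are silent


if __name__ == "__main__":
    print(default_transition(("kaet", "-tekuru")))
```

Because these are defaults, a caller still has to let explicit descriptions of the scene override them.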
      <Paragraph position="5"> Of course, these two rules are default rules, and there are exceptional cases. For instance, if a sentence explicitly states that Taroo was already in the scene beforehand, then even when we encounter the sentence
(7) tikayot -tekuru.
    Taroo SUB approach
    'Taroo approaches this way.'
we infer that Taroo has already been in the scene.</Paragraph>
      <Paragraph position="6"> Similarly, in the "-teiku" case, if the sentence: 'Taroo is hit by Hanako.', then we infer that Taroo is still in the scene after the action described by (9).</Paragraph>
      <Paragraph position="7"> Another exceptional case arises when the verb is either stative or a state change without action. For instance, the sentence
(10) kaoiro-ga-warukunat -teiku.
     Taroo SUB become pale
     'Taroo turns pale.'
indicates that Taroo is still in the scene when he looks pale. We can identify these kinds of verbs, stative verbs and state changes without action, with a dictionary such as the IPAL-Basic Verb Dictionary For Computer (IPA, 1990).</Paragraph>
      <Paragraph position="8"> Although these rules and the treatment of exceptional cases are important, in general, if a sentence of scene direction describes that the referent of the subject performs an action or is in a certain state, the referent surely exists in the scene. Another group of expressions that are frequently used and important for building an existence/action map are the stative verb "iru" ('exist') and its negation "inai" ('not exist'). They explicitly show the existence or non-existence of the referent of the subject in the scene. We therefore have the following two rules.</Paragraph>
      <Paragraph position="9"> rule 11 If a sentence describes an action or state, the referent of the subject is in the scene.</Paragraph>
      <Paragraph position="10"> rule 12 If "iru" is used as a predicate, the referent of the subject is in the scene.</Paragraph>
      <Paragraph position="11"> If "inai" is used as a predicate, the referent of the subject is not in the scene.</Paragraph>
      <Paragraph position="12"> These two rules can override the results inferred by rule 9 or rule 10, because rules 9 and 10 are default rules, whereas rules 11 and 12 reflect explicit descriptions of the scene. One remaining question is how to draw inferences when we encounter a negative predicate other than "inai". In practice, however, we find no negative predicates other than "inai" in scene directions, because scene directions describe what players should do in the scene and almost never describe what they should not do. Things not to be done in a scene are usually directed by the human director of the drama.</Paragraph>
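The interplay of the default rules 9 and 10 with the explicit rules 11 and 12, and with the exceptions of examples (7) and (10), can be sketched as one function. This is an illustration under stated assumptions, not the authors' code: predicates are tuples of dictionary-form tokens, the two exception flags are supplied by the caller (for example from earlier sentences or a verb dictionary), a stated existence is assumed to persist into the next moment, and `infer_states` is a hypothetical name.

```python
from typing import Optional

COME = {"kuru", "-kuru", "-tekuru"}   # rule 9 triggers (assumed token forms)
GO = {"iku", "-iku", "-teiku"}        # rule 10 triggers

State = Optional[str]


def infer_states(predicate: tuple[str, ...],
                 already_present: bool = False,
                 state_change: bool = False) -> tuple[State, State, State]:
    """Return the subject referent's (before, during, after) states.

    already_present: the directions have explicitly put the referent in the
        scene earlier (the exception illustrated by example (7)).
    state_change: the verb is stative or a state change without action,
        e.g. according to a verb dictionary (the exception of example (10)).
    None means the rules say nothing about that moment.
    """
    words = set(predicate)
    if "inai" in words:                      # rule 12: explicitly absent
        return (None, "ABS", "ABS")
    if "iru" in words:                       # rule 12: explicitly present
        return (None, "EXI", "EXI")
    before: State = "EXI" if already_present else None
    if not words.isdisjoint(COME) and not already_present:
        before = "ABS"                       # rule 9 (default)
    during = "EXI"                           # rule 11: an action or state
    after = "EXI"
    if not words.isdisjoint(GO) and not state_change:
        after = "ABS"                        # rule 10 (default)
    return (before, during, after)
```

Checking the explicit rules first is what lets them override the defaults, mirroring the precedence argued for above.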
      <Paragraph position="13"> We now show an example of an existence/action map built from the following scene directions.</Paragraph>
      <Paragraph position="14"> (11) Taroo to Jiro ga kaet -tekuru.
     Taroo and Jiro SUB come back -tekuru
     'Taroo and Jiro come back.'</Paragraph>
      <Paragraph position="15"> (12) Taroo "tadaima"
     Taroo SUB "I'm home"
     'Taroo says "I'm home."'
(13) Hanako "gokurousan"
     Hanako SUB "you did well"
     'Hanako says "You did well."'</Paragraph>
      <Paragraph position="16"> (14) to nagusameru.
     and comfort
     'and comforts the two of them.'
We build an existence/action map by the following procedure.</Paragraph>
      <Paragraph position="17"> step 0 Steps 1 through 3 are applied to the scene directions sentence by sentence.</Paragraph>
      <Paragraph position="18"> step 1 Each sentence of the scene directions is analyzed with the Japanese morphological analyzer JUMAN (Matsumoto et al., 1992), which segments it into a sequence of words with part-of-speech tags.</Paragraph>
      <Paragraph position="19"> step 2 The subject and predicate of the sentence are extracted using rules 1 through 8.</Paragraph>
      <Paragraph position="20"> step 3 For each player, the value for the sub-scene, namely ABS, EXI, CON, or ACT, is inferred with rules 9 through 12.</Paragraph>
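Step 3 can be sketched end to end once steps 1 and 2 have produced a (subjects, predicate) pair per sentence, since a morphological analyzer cannot be called here. The sketch assumes one map row per sentence, marks the speaker of a dialog line as CON and any other actor as ACT, and uses hypothetical names (`build_map`, `carried`, `SENTENCES`); it is one plausible reading of rules 7 and 9 to 12, not the authors' implementation.

```python
from typing import Optional

COME = {"kuru", "-kuru", "-tekuru"}   # rule 9 triggers (assumed token forms)
GO = {"iku", "-iku", "-teiku"}        # rule 10 triggers
PRESENCE_ONLY = {"iru", "exist"}      # rule 12 / rule 8 default: no action

Row = dict[str, Optional[str]]


def carried(state: Optional[str], leaving: bool) -> Optional[str]:
    """State carried into the next sub-scene; ACT and CON collapse to EXI."""
    if state is None:
        return None
    if state == "ABS" or leaving:
        return "ABS"
    return "EXI"


def build_map(sentences, players: list[str]) -> list[Row]:
    """Step 3 over already-analysed sentences, one row (sub-scene) each.

    Each sentence is (subjects, predicate, is_dialog), where subjects is
    None for a zero subject. None in a row means "not yet known".
    """
    rows: list[Row] = []
    previous_subjects = None
    leaving: set[str] = set()
    for subjects, predicate, is_dialog in sentences:
        if subjects is None:
            subjects = previous_subjects          # rule 7
        previous_subjects = subjects
        last = rows[-1] if rows else {p: None for p in players}
        row = {p: carried(s, p in leaving) for p, s in last.items()}
        leaving = set()
        words = set(predicate)
        for p in subjects or []:
            if "inai" in words:
                row[p] = "ABS"                    # rule 12, negative
                continue
            if not words.isdisjoint(COME):
                for earlier in rows:              # rule 9: absent beforehand
                    if earlier[p] is None:
                        earlier[p] = "ABS"
            if not words.isdisjoint(GO):
                leaving.add(p)                    # rule 10: gone afterward
            if is_dialog:
                row[p] = "CON"
            elif words.isdisjoint(PRESENCE_ONLY):
                row[p] = "ACT"                    # rule 11: acting, so present
            else:
                row[p] = "EXI"                    # rule 12, positive
        rows.append(row)
    return rows


# Examples (11)-(14) after steps 1 and 2 (hand-made, hypothetical tokens).
SENTENCES = [
    (["Taroo", "Jiro"], ("kaet", "-tekuru"), False),   # (11)
    (["Taroo"], (), True),                             # (12) "tadaima"
    (["Hanako"], (), True),                            # (13) "gokurousan"
    (None, ("nagusameru",), False),                    # (14) zero subject
]

if __name__ == "__main__":
    for row in build_map(SENTENCES, ["Taroo", "Jiro", "Hanako"]):
        print(row)
```

Run on examples (11) to (14), this sketch marks Taroo and Jiro as ACT when they come back, each speaker as CON in their line, and Hanako as ACT in (14) through rule 7; Hanako's state before she speaks stays None, because no rule says anything about it.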
      <Paragraph position="21"> We built existence/action maps for the scene directions of five Japanese dramas: a suspense drama, a home drama, a love story, a school life drama, and a comedy. Each drama lasts one hour (including commercial time). In total, we analyzed 1272 sentences. The first results, shown in table 2, are the rates at which steps 1 and 2 correctly extract subjects and predicates. We use not a parser based on phrase structure rules but simple pattern matching based on rules 1 through 8. Nevertheless, these results indicate that our rules for extracting subjects and predicates work quite well.</Paragraph>
      <Paragraph position="22"> The main cause of failure in subject extraction is failure to infer the referent of a zero subject. Almost all failures in predicate extraction are caused by errors of the morphological analyzer.</Paragraph>
      <Paragraph position="23"> The next results we show concern the accuracy of our existence/action map. The key factor for scene retrieval is whether a specific player appears in the scene or not. Therefore we focus on how accurately existences and absences, namely EXIs and ABSs, are inferred. We estimate this with recall and precision rates, defined as recall = #CI / (number of cases to be correctly inferred) and precision = #CI / (number of all cases inferred by our rules), where #CI is the number of cases in the map correctly inferred by our rules.</Paragraph>
      <Paragraph position="24"> The results are shown in table 3. Our rules, derived from the semantics of "-tekuru" and "-teiku", work correctly in almost all cases.</Paragraph>
      <Paragraph position="25"> The remarkable point in these results is that even though our natural language analysis system employs a shallow understanding mechanism that is easily implemented with today's NLP technologies, it works very well for scene directions. This is a very limited domain, but one that is useful for scene retrieval systems, a promising application of multimedia information retrieval.</Paragraph>
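The recall and precision rates for EXI and ABS can be computed directly from two maps. A minimal sketch, assuming both the inferred and the hand-made (gold) maps are dictionaries keyed by (sub-scene, player) and that only cells labeled EXI or ABS are counted; cells the rules leave undecided simply do not appear in the inferred map. `recall_precision`, `GOLD`, and `INFERRED` are hypothetical, and the toy data is invented for illustration.

```python
Cell = tuple[int, str]   # (sub-scene index, player name)


def recall_precision(inferred: dict[Cell, str], gold: dict[Cell, str],
                     labels: tuple[str, ...] = ("EXI", "ABS")) -> tuple[float, float]:
    """Recall and precision over map cells carrying one of the labels.

    recall    = #CI / number of cases to be correctly inferred (gold cells)
    precision = #CI / number of all cases inferred by the rules
    """
    to_infer = {cell for cell, state in gold.items() if state in labels}
    inferred_cells = {cell for cell, state in inferred.items() if state in labels}
    correct = sum(1 for cell in inferred_cells if gold.get(cell) == inferred[cell])  # #CI
    recall = correct / len(to_infer) if to_infer else 0.0
    precision = correct / len(inferred_cells) if inferred_cells else 0.0
    return recall, precision


# Hypothetical toy maps for two sub-scenes and two players.
GOLD = {(0, "Taroo"): "EXI", (0, "Hanako"): "ABS",
        (1, "Taroo"): "EXI", (1, "Hanako"): "EXI"}
INFERRED = {(0, "Taroo"): "EXI", (0, "Hanako"): "EXI",
            (1, "Taroo"): "EXI"}          # (1, "Hanako") left undecided

if __name__ == "__main__":
    print(recall_precision(INFERRED, GOLD))
```

On the toy data, two of the four gold cells are inferred correctly and three cells are inferred in all, so recall is 2/4 and precision is 2/3. Passing a single label, for example `labels=("ABS",)`, gives the figures for absences alone.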
    </Section>
  </Section>
  <Section position="7" start_page="11" end_page="11" type="metho">
    <SectionTitle>
4 Scene Retrieval System
</SectionTitle>
    <Paragraph position="0"> We have developed a scene retrieval system for drama scenes based on the existence/action map. The unit of retrieval is the sub-scene because, as an existence/action map shows, the set of players in the scene changes from sub-scene to sub-scene. Therefore we have to find the correspondence between each sub-scene and a set of actual image frames. Our multimedia data consist of 1) a sequence of image frames, 2) audio track data, and 3) the script of the drama, including scene directions. The temporal correspondence, in other words the synchronization, among these three types of media data is computed with a DP matching technique we developed previously (Yaginuma, 1993).</Paragraph>
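The DP matching technique of Yaginuma (1993) is not described here. Purely as an illustration of DP matching in general, the sketch below swaps in standard dynamic time warping, which finds the minimum-cost monotonic alignment between two sequences. Using one-dimensional numeric features and an absolute-difference cost is our assumption, and `dtw_align` is a hypothetical name.

```python
def dtw_align(a: list[float], b: list[float]) -> tuple[list[tuple[int, int]], float]:
    """Dynamic time warping: minimum-cost monotonic alignment of a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: best cost of aligning the first i items of a with the first j of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end to recover which elements were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = {(i - 1, j - 1): cost[i - 1][j - 1],
                      (i - 1, j): cost[i - 1][j],
                      (i, j - 1): cost[i][j - 1]}
        i, j = min(candidates, key=candidates.get)
    return path[::-1], cost[n][m]


if __name__ == "__main__":
    path, total = dtw_align([0, 1, 2, 3], [0, 0, 1, 2, 2, 3])
    print(path, total)
```

The returned path pairs each element of one stream with one or more elements of the other. Index pairs like these are the kind of correspondence a synchronization step would hand on to the sub-scene computation.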
    <Paragraph position="1"> Using these correspondences, we identify the set of frames that corresponds to each part of the dialog.</Paragraph>
    <Paragraph position="2"> Then we regard a set of sequential frames between two adjacent lines of dialog as the sub-scene corresponding to the scene direction.</Paragraph>
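Given the frame range of each dialog line from the synchronization step, the sub-scenes are the gaps between adjacent lines. A minimal sketch, assuming sorted (start, end) frame ranges with exclusive ends; `sub_scene_spans` is a hypothetical name, and frames before the first line or after the last line are ignored here.

```python
def sub_scene_spans(dialog_spans: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Frame ranges lying between two adjacent lines of dialog.

    dialog_spans: (first_frame, end_frame) per line of dialog, sorted by
    time, with end_frame exclusive.
    """
    spans = []
    for (_, prev_end), (next_start, _) in zip(dialog_spans, dialog_spans[1:]):
        if next_start > prev_end:          # skip lines that touch or overlap
            spans.append((prev_end, next_start))
    return spans


if __name__ == "__main__":
    # Hypothetical frame ranges of four lines of dialog.
    print(sub_scene_spans([(0, 120), (150, 300), (300, 420), (500, 610)]))
```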
    <Paragraph position="3"> Based on these correspondences between image frames and sub-scenes, we can retrieve the image frames that correspond to a query by the following procedure.</Paragraph>
    <Paragraph position="4">  1. Input a query that consists of the time, location, player's name and her/his action.</Paragraph>
    <Paragraph position="5"> 2. Search for the sub-scenes that match the conditions stated in the input query, using the existence/action map.</Paragraph>
    <Paragraph position="6"> 3. Extract the set of frames corresponding to each retrieved sub-scene, and display them on the image screen of the user's GUI.</Paragraph>
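Steps 2 and 3 amount to filtering sub-scenes by every condition given in the query. A minimal sketch, assuming each sub-scene is a hypothetical record holding its frame range, location, time, per-player states, and per-player verbs; the records, field names, and `retrieve` are illustrative only. A player counts as present whenever their state is not ABS, and keyword conditions are omitted.

```python
from typing import Optional

# Hypothetical sub-scene records linking the map to frame ranges.
SUB_SCENES = [
    {"frames": (120, 150), "location": "office", "time": "day",
     "states": {"Yasuura": "ACT", "Taroo": "EXI"}, "verbs": {"Yasuura": "kuru"}},
    {"frames": (420, 500), "location": "office", "time": "night",
     "states": {"Yasuura": "ABS", "Taroo": "CON"}, "verbs": {}},
    {"frames": (900, 960), "location": "street", "time": "night",
     "states": {"Yasuura": "ACT"}, "verbs": {"Yasuura": "kuru"}},
]


def retrieve(sub_scenes: list[dict], player: Optional[str] = None,
             action: Optional[str] = None, location: Optional[str] = None,
             time: Optional[str] = None) -> list[tuple[int, int]]:
    """Return the frame ranges of sub-scenes meeting every given condition."""
    hits = []
    for scene in sub_scenes:
        if location is not None and scene["location"] != location:
            continue
        if time is not None and scene["time"] != time:
            continue
        # A player missing from the map is treated as absent (ABS).
        if player is not None and scene["states"].get(player, "ABS") == "ABS":
            continue
        if action is not None:
            actors = [player] if player is not None else list(scene["verbs"])
            if all(scene["verbs"].get(actor) != action for actor in actors):
                continue
        hits.append(scene["frames"])
    return hits


if __name__ == "__main__":
    # A query shaped like the figure 4 example: player "Yasuura", action "kuru".
    print(retrieve(SUB_SCENES, player="Yasuura", action="kuru"))
```

Conditions that are left unspecified (None) place no constraint, which matches a query where the location and time are not given.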
    <Paragraph position="7"> Our system uses Netscape Navigator as a GUI.</Paragraph>
    <Paragraph position="8"> The retrieval system is implemented as Java applets that work as a CGI. The following figures are screen images of the GUI of our scene retrieval system. Figure 2 is the introductory screen of the GUI, in which introductory scenes of the drama videos are displayed in every track.</Paragraph>
    <Paragraph position="9"> Figure 3 is a screen in which several input forms are shown. The Dousanushi ('agent') input form shows a list of the players' names, from which we select one. The system then searches for scenes in which the selected player exists. The Basho ('location') and Jikoku ('time') input forms show the list of locations and the list of times, from each of which we select one value as a query term.</Paragraph>
    <Paragraph position="10"> The Dousa ('action') input form shows the list of verbs. If we select one of them, it becomes a term of the whole query and specifies a verb appearing in the existence/action map.</Paragraph>
    <Section position="1" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
</SectionTitle>
      <Paragraph position="0"> In the Keyword1 and Keyword2 input forms, we can write other keywords as retrieval conditions. All these inputs are combined into one query, and the retrieval system then seeks scenes that meet all the conditions in the query by consulting the existence/action map.</Paragraph>
      <Paragraph position="1"> Figure 4 shows the result of retrieval. In the upper area, the contents of the query are shown; in the middle area, the retrieved scenes are displayed; and in the bottom area, the track numbers corresponding to the retrieved drama scenes are shown. In this example, the query is as follows: the player's name is "Yasuura", the action is "kuru" ('come'), the location and time are not specified, and no keywords are given. The player whose role name is Yasuura appears in all the retrieved scenes, and he is indeed approaching the camera in all of them. Namely, the system successfully retrieves the scenes that meet the query.</Paragraph>
    </Section>
  </Section>
</Paper>