<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2099">
  <Title>Compiling a Lexicon of Cooking Actions for Animation Generation</Title>
  <Section position="3" start_page="0" end_page="774" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The ability to visualize procedures or instructions is important for understanding documents that guide or instruct us, such as computer manuals or cooking recipes. Seeing corresponding figures or animations makes such documents easier to understand. Several researchers have studied the visualization of documents (Coyne and Sproat, 2001), including the generation of animation (Andre and Rist, 1996; Towns et al., 1998). Such animation systems help people understand the instructions in documents. Among the various types of documents, this research focuses on the visualization of cooking recipes.</Paragraph>
    <Paragraph position="1"> Many studies related to the analysis or generation of cooking recipes have been done (Adachi, 1997; Webber and Eugenio, 1990; Hayashi et al., 2003; Shibata et al., 2003). In particular, several researchers have proposed animation generation systems in the cooking domain. Karlin, for example, developed SEAFACT (Semantic Analysis For the Animation of Cooking Tasks), which analyzed verbal modifiers to determine several features of an action, such as the aspectual category of an event, the number of repetitions, duration, and speed (Karlin, 1988). Uematsu et al. developed "Captain Cook," which generated animations from cooking recipes written in Japanese (Uematsu et al., 2001). However, these previous works did not address the scalability of their systems. The cooking domain contains many linguistic expressions, and it is unclear to what extent these systems can convert them into animations.</Paragraph>
    <Paragraph position="2"> This paper also aims to develop a system that generates animations from cooking recipes written in Japanese. In particular, we focus on increasing the variety of recipes the system can accept. Subsections 2.1 and 2.2 give an overview of the proposed system, and Subsection 2.3 describes the more concrete goals of this paper.</Paragraph>
    <Section position="1" start_page="0" end_page="771" type="sub_section">
      <SectionTitle>
2 Proposed System
2.1 Overview
</SectionTitle>
      <Paragraph position="0"> Our animation generation system works as follows. The system displays a cooking recipe in a browser. As in a typical recipe, the cooking instructions are displayed step by step, and the sentences or phrases representing cooking actions are highlighted. When a user does not understand a certain cooking action, he/she can click the highlighted sentence or phrase, and the system then shows the corresponding animation to help the user understand the cooking instruction.</Paragraph>
      <Paragraph position="1"> Note that the system does not play all the procedures in a recipe like a movie; instead, it generates an animation of a single action on demand. Furthermore, we do not aim to reproduce recipe sentences in detail. In particular, we will not prepare object data for many different kinds of ingredients.</Paragraph>
      <Paragraph position="2"> For example, suppose that the system has object data for a mackerel but not for a sardine. When a user clicks the sentence "fillet a sardine" to see the animation, the system will show how to fillet a "mackerel" instead of a "sardine", with a note indicating that the ingredient is different. We believe that the user will be more interested in "how to fillet" than in the specific ingredient being filleted. In other words, the animation of the action will be equally helpful as long as the ingredients are similar. Thus we will not make a great effort to prepare animations for many kinds of ingredients. Instead, we will focus on covering many kinds of cooking actions, to help users understand the cooking instructions in recipes.</Paragraph>
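The ingredient fallback described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a hypothetical set `AVAILABLE_OBJECTS` of ingredients with object data and a hypothetical table `SIMILAR_INGREDIENTS` mapping an ingredient to a similar one; the text does not say how ingredient similarity is actually decided.

```python
# Hypothetical data: ingredients for which 3D object data has been prepared.
AVAILABLE_OBJECTS = {"mackerel", "cabbage", "carrot"}

# Hypothetical similarity table: ingredient -> similar ingredient with object data.
SIMILAR_INGREDIENTS = {"sardine": "mackerel", "horse mackerel": "mackerel"}


def choose_object(ingredient):
    """Return (object_to_animate, note) for the requested ingredient.

    If no object data exists for the ingredient, fall back to a similar
    ingredient and attach a note telling the user about the substitution.
    """
    if ingredient in AVAILABLE_OBJECTS:
        return ingredient, None
    substitute = SIMILAR_INGREDIENTS.get(ingredient)
    if substitute is None or substitute not in AVAILABLE_OBJECTS:
        return None, "No animation is available for " + ingredient + "."
    note = "Shown with " + substitute + " instead of " + ingredient + "."
    return substitute, note


print(choose_object("sardine"))
```

Keeping the note separate from the chosen object lets the interface display the substitution warning next to the animation rather than inside it.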
    </Section>
    <Section position="2" start_page="771" end_page="771" type="sub_section">
      <SectionTitle>
2.2 System Architecture
</SectionTitle>
      <Paragraph position="0"> Figure 1 illustrates the architecture of the proposed system. First, we prepare the lexicon of cooking actions, a collection of cooking actions such as "fry" and "chop finely". The lexicon contains enough knowledge to generate an animation for each cooking action. Figure 2 shows an example of an entry in the lexicon. In the figure, "expression" is a linguistic expression for the action, and "action plan" is a sequence of action primitives, which are the minimum action units for animation generation. Roughly speaking, the action plan in Figure 2 represents a series of primitive actions, such as cutting and rotating an ingredient, for the basic action "chop finely". The system generates an animation according to the action plan in the lexicon. The other features, "ingredient examples" and "ingredient requirement", will be explained later.</Paragraph>
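A lexicon entry like the one in Figure 2 might be represented as a small record type. The sketch below is an assumption for illustration only: the field names mirror the features named in the text, but the primitive names (`cut`, `rotate`), their parameters, the example ingredient, and the set-of-categories encoding of the ingredient requirement are all hypothetical, since the actual primitive inventory is not given here.

```python
from dataclasses import dataclass, field


@dataclass
class Primitive:
    """One action primitive, the minimum unit for animation generation."""
    name: str                      # e.g. "cut" or "rotate" (hypothetical names)
    params: dict = field(default_factory=dict)


@dataclass
class LexiconEntry:
    """A basic cooking action together with the knowledge needed to animate it."""
    expression: str                # linguistic expression, e.g. "chop finely"
    action_plan: list              # ordered list of Primitive objects
    ingredient_examples: list      # ingredients shown in the textbook pictures
    ingredient_requirement: set    # acceptable ingredient types (hypothetical encoding)


# A hypothetical entry: cut, rotate the ingredient, then cut again.
chop_finely = LexiconEntry(
    expression="chop finely",
    action_plan=[
        Primitive("cut", {"direction": "vertical", "repeat": 8}),
        Primitive("rotate", {"degrees": 90}),
        Primitive("cut", {"direction": "vertical", "repeat": 8}),
    ],
    ingredient_examples=["onion"],
    ingredient_requirement={"vegetable"},
)
print([p.name for p in chop_finely.action_plan])
```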
      <Paragraph position="1"> The process of generating an animation is as follows. First, as shown in Figure 1, the system compares an input sentence with the expressions of the entries in the lexicon of cooking actions and finds the appropriate cooking action. This is done by the "Action Matcher" module. Then, the system extracts the action plan from the lexicon and passes it to the "Animation Generator" module. Finally, the Animation Generator interprets the action plan and produces the animation.</Paragraph>
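The matching-then-generation flow can be sketched as two functions wired together. This is a minimal sketch under stated assumptions: the Action Matcher here does plain substring matching against each entry's expression (trying longer expressions first), the Animation Generator is a stand-in that emits one frame description per primitive, and the names `action_matcher`, `animation_generator`, and `animate` are hypothetical.

```python
# Hypothetical lexicon: expression -> action plan (list of primitive names).
LEXICON = {
    "chop finely": ["cut", "rotate", "cut"],
    "fry": ["place_in_pan", "stir"],
}


def action_matcher(sentence):
    """Find the lexicon entry whose expression appears in the sentence.

    Longer expressions are tried first so that a more specific action
    wins over a shorter one it may contain.
    """
    for expression in sorted(LEXICON, key=len, reverse=True):
        if expression in sentence:
            return expression
    return None


def animation_generator(action_plan):
    """Stand-in renderer: emit one frame description per primitive."""
    return ["frame: " + primitive for primitive in action_plan]


def animate(sentence):
    """Match the sentence, look up its action plan, and render it."""
    expression = action_matcher(sentence)
    if expression is None:
        return []
    return animation_generator(LEXICON[expression])


print(animate("fry the onion"))
```

Because the generator only sees an action plan, it can be swapped out without touching the matcher or the lexicon's expressions.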
    </Section>
    <Section position="3" start_page="771" end_page="772" type="sub_section">
      <SectionTitle>
2.3 Goal
</SectionTitle>
      <Paragraph position="0"> The major goals of this paper are summarized as follows. G1. Construct a large-scale lexicon of cooking actions. In order to generate animations for various kinds of cooking actions, we must prepare a lexicon containing many basic actions. G2. Handle a variety of linguistic expressions. Various linguistic expressions for cooking actions may occur in recipes, and it is not realistic to include all possible expressions in the lexicon. Therefore, when a linguistic expression in an input sentence is not included in the lexicon, the system should calculate the similarity between it and the basic actions in the lexicon, and find an equivalent or nearly equivalent action.</Paragraph>
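Goal G2 calls for a similarity measure between an unseen expression and the basic actions, but the text does not specify one. As a stand-in only, the sketch below uses the Dice coefficient over character bigrams (a common surface measure that also works on unsegmented Japanese text) with a hypothetical acceptance threshold of 0.5; it illustrates the lookup, not the paper's method.

```python
def bigrams(text):
    """Set of overlapping character bigrams in a string."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def dice(a, b):
    """Dice coefficient between the bigram sets of two strings."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x.intersection(y)) / (len(x) + len(y))


def most_similar_action(expression, basic_actions, threshold=0.5):
    """Return the most similar basic action, or None if none is close enough.

    threshold is a hypothetical cut-off; the paper gives no value.
    """
    if not basic_actions:
        return None
    best = max(basic_actions, key=lambda action: dice(expression, action))
    return best if dice(expression, best) >= threshold else None


print(most_similar_action("chop very finely", ["chop finely", "fry", "cut into strips"]))
```

A purely surface-based measure like this cannot relate synonyms that share no characters, so a thesaurus- or embedding-based score may be preferable in practice.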
      <Paragraph position="1"> G3. Include information about acceptable ingredients in the lexicon. Even when the linguistic expressions are the same, the cooking action may differ depending on the ingredient it is applied to. For example, "cut into fine strips" may stand for several different cooking actions: the action of "cut cucumber into fine strips" may differ from that of "cut cabbage into fine strips", because cucumbers and cabbages have quite different shapes. Therefore, each entry in the lexicon should include information about what kinds of ingredients are acceptable for that cooking action.</Paragraph>
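The role of an ingredient requirement can be sketched as a filter that picks, among entries sharing one expression, the entry whose requirement accepts the given ingredient. The shape categories, the two entries, and their action plans below are hypothetical; the paper's actual encoding of ingredient requirements is described later and is not reproduced here.

```python
# Hypothetical ingredient categories based on shape.
INGREDIENT_SHAPE = {"cucumber": "long", "carrot": "long", "cabbage": "leafy"}

# Two hypothetical entries share the expression "cut into fine strips"
# but accept different kinds of ingredients.
ENTRIES = [
    {"expression": "cut into fine strips", "requirement": {"long"},
     "action_plan": ["slice_diagonally", "stack", "cut_lengthwise"]},
    {"expression": "cut into fine strips", "requirement": {"leafy"},
     "action_plan": ["roll_leaves", "cut_crosswise"]},
]


def select_entry(expression, ingredient):
    """Pick the entry whose ingredient requirement accepts the ingredient."""
    shape = INGREDIENT_SHAPE.get(ingredient)
    for entry in ENTRIES:
        if entry["expression"] == expression and shape in entry["requirement"]:
            return entry
    return None  # no entry accepts this ingredient


print(select_entry("cut into fine strips", "cabbage")["action_plan"])
```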
      <Paragraph position="2"> As mentioned earlier, the main goal of this research is to increase the scalability of the system, i.e., to develop an animation generation system that can handle various cooking actions. We hope that this can be accomplished through goals G1 and G2.</Paragraph>
      <Paragraph position="3"> In the rest of this paper, Section 3 describes how to define the set of actions to be compiled into the lexicon of cooking actions; this concerns goal G1. Section 4 explains two major features of the lexicon, "action plan" and "ingredient requirement". The feature "ingredient requirement" is [...] Section 6 concludes the paper.</Paragraph>
      <Paragraph position="4"> 3 Defining the Set of Basic Actions
In this and the following sections, we explain how to construct the lexicon of cooking actions. The first step is to define the set of basic actions. As mentioned earlier (goal G1 in Subsection 2.3), our system requires a large-scale lexicon. Therefore, the set of basic actions should include many kinds of cooking actions.</Paragraph>
    </Section>
    <Section position="4" start_page="772" end_page="773" type="sub_section">
      <SectionTitle>
3.1 Procedure
</SectionTitle>
      <Paragraph position="0"> We referred to three Japanese cooking textbooks or manuals (Atsuta, 2004; Fujino, 2003; Takashiro and Kenmizaki, 2004) to define the set of basic actions. These books use pictures to explain fundamental cooking operations, e.g., how to cut, roast, or remove the skin or seeds of various kinds of ingredients. We extracted the cooking operations explained in these three textbooks and defined them as the basic actions for the lexicon; in other words, we defined the basic actions according to the cooking textbooks. The reasons why we used the cooking manuals as the standard for the basic actions are summarized as follows: 1. The aim of the cooking manuals used here is to explain basic cooking operations comprehensively. Therefore, we expect to be able to collect an exhaustive set of basic actions in the cooking domain.</Paragraph>
      <Paragraph position="1"> 2. Cooking manuals are written for beginners. The aim of our animation generation system is to help people, especially novices, understand cooking actions in recipes. A lexicon of cooking actions based on cooking textbooks includes many cooking operations that novices may not know well.</Paragraph>
      <Paragraph position="2"> 3. The definition of basic actions does not depend on the module Animation Generator.</Paragraph>
      <Paragraph position="3"> One possible standard for defining basic actions is the set of animations generated by the system. That is, we could define basic cooking actions so that each cooking action corresponds to a unique animation. This approach seems reasonable for an animation generation system; however, it depends on the Animation Generator module in Figure 1. Many kinds of rendering engines are now available for generating animations, so the Animation Generator can be implemented in various ways. If the rendering engine used in the Animation Generator were changed, the lexicon of cooking actions would also have to be changed. We therefore decided that it would not be desirable to define the set of basic actions according to their corresponding animations.</Paragraph>
      <Paragraph position="4"> In our framework, the definition of basic actions in the lexicon does not depend on the Animation Generator, which allows any kind of rendering engine to be used to produce animations. For example, if we use a less capable engine and want the system to generate the same animation for two or more basic actions, we simply describe the same action plan for these actions.</Paragraph>
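The decoupling described above can be sketched by keeping the set of basic actions fixed while letting each action plan target whatever engine is in use. In this sketch the `RenderingEngine` protocol, `SimpleEngine`, the action names "cut into rounds" and "cut into half-moons", and their plans are all hypothetical; it only shows how a limited engine can reuse one action plan for two basic actions.

```python
from typing import Protocol


class RenderingEngine(Protocol):
    """Any engine that can turn an action plan into an animation."""

    def render(self, action_plan: list) -> str: ...


class SimpleEngine:
    """A deliberately limited engine: it only reports the primitives it plays."""

    def render(self, action_plan: list) -> str:
        return " -> ".join(action_plan)


# The basic actions stay fixed; only their action plans are engine-specific.
# With a limited engine, two basic actions can share the same plan.
SHARED_CUT_PLAN = ["cut", "cut", "cut"]
ACTION_PLANS = {
    "cut into rounds": SHARED_CUT_PLAN,
    "cut into half-moons": SHARED_CUT_PLAN,
    "chop finely": ["cut", "rotate", "cut"],
}


def animate(basic_action: str, engine: RenderingEngine) -> str:
    """Render the action plan stored for a basic action."""
    return engine.render(ACTION_PLANS[basic_action])


print(animate("cut into rounds", SimpleEngine()))
```

Switching to a richer engine would mean giving the two cutting actions distinct plans, while the list of basic actions itself stays unchanged.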
      <Paragraph position="5"> We manually extracted 267 basic actions from the three cooking textbooks. Although it is just a collection of basic actions, we refer to it as the initial lexicon of cooking actions. Table 1 shows several examples of basic actions in the initial lexicon. In the cooking manuals, every cooking operation is illustrated with pictures; "ingredient examples" lists the ingredients shown in the pictures used to explain each cooking action.</Paragraph>
    </Section>
    <Section position="5" start_page="773" end_page="773" type="sub_section">
      <SectionTitle>
3.2 Preliminary Evaluation
</SectionTitle>
      <Paragraph position="0"> A preliminary experiment was conducted to evaluate the scalability of our initial lexicon of basic actions. The aim of this experiment was to check how many cooking actions appearing in real recipes are included in the initial lexicon.</Paragraph>
      <Paragraph position="1"> First, we collected 200 recipes available on the web. We refer to this recipe corpus as R_a hereafter. Next, we analyzed the sentences in R_a and automatically extracted verbal phrases representing cooking actions. We used JUMAN for word segmentation and part-of-speech tagging, and KNP for syntactic analysis. Finally, we manually checked whether each extracted verbal phrase could be matched to one of the basic actions in the initial lexicon.</Paragraph>
      <Paragraph position="2"> Table 2 (A) shows the result of our survey. The number of basic actions was 267 (a). Among these actions, 145 (54.3%) occurred in R_a (a1).</Paragraph>
      <Paragraph position="3"> Nearly half of the actions in the initial lexicon did not occur in the recipe corpus. We suspect that this is because the recipe corpus was not very large.</Paragraph>
      <Paragraph position="4"> The number of verbal phrases in R_a was 3977 (b). We classified them into the following five cases: (b1) the verbal phrase corresponded to one of the basic actions in the initial lexicon, and its linguistic expression was the same as the one in the lexicon; (b2) the verbal phrase corresponded to a basic action, but its linguistic expression differed from the one in the lexicon; (b3) no corresponding basic action was found in the initial lexicon; (b4) the extracted phrase was not a verbal phrase, owing to an analysis error; (b5) the verbal phrase did not stand for a cooking action. Note that the verbal phrases that should be converted to animations are those in cases (b1), (b2), and (b3). In Table 2, numbers in parentheses (...) indicate the ratio of each case to the total number of verbal phrases, while numbers in square brackets [...] indicate the ratio of each case to the total number of (b1), (b2), and (b3). We expected that the verbal phrases in (b1) and (b2) could be handled by our animation generation system, because the initial lexicon contained the corresponding basic actions. On the other hand, our system cannot generate animations for the verbal phrases in (b3), which accounted for 42.3% of the verbal phrases our system should handle. Thus the coverage of the initial lexicon was poor.</Paragraph>
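The two kinds of ratios described above, and the resulting coverage of (b1) plus (b2) over the target phrases, can be computed as in the sketch below. The helper names `case_ratios` and `coverage` are hypothetical, and the counts in the usage line are invented placeholders, not the values in Table 2.

```python
def case_ratios(counts):
    """Compute both ratios reported for the five cases.

    counts maps "b1".."b5" to numbers of verbal phrases. Returns
    (ratio_to_all, ratio_to_target) in percent; ratio_to_target covers
    only b1-b3, the phrases that should be converted to animations.
    """
    total = sum(counts.values())
    target = counts["b1"] + counts["b2"] + counts["b3"]
    ratio_to_all = {k: 100 * v / total for k, v in counts.items()}
    ratio_to_target = {k: 100 * counts[k] / target for k in ("b1", "b2", "b3")}
    return ratio_to_all, ratio_to_target


def coverage(ratio_to_target):
    """Share of target phrases the lexicon can handle: b1 plus b2."""
    return ratio_to_target["b1"] + ratio_to_target["b2"]


# Placeholder counts for illustration only.
all_ratios, target_ratios = case_ratios({"b1": 40, "b2": 20, "b3": 40, "b4": 5, "b5": 15})
print(target_ratios, coverage(target_ratios))
```

With the figures reported for R_a, (b3) is 42.3% of the target phrases, so the coverage is 100 - 42.3 = 57.7%.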
    </Section>
    <Section position="6" start_page="773" end_page="774" type="sub_section">
      <SectionTitle>
3.3 Adding Basic Actions from Recipe Corpus
</SectionTitle>
      <Paragraph position="0"> We examined what kinds of verbal phrases were in (b3) and found many general verbs, such as "add", "put in", "heat", "attach", and "put on". Such general actions were not included in the initial lexicon, because we constructed it by extracting basic actions from cooking textbooks, which do not explain such general actions.</Paragraph>
      <Paragraph position="1"> In order to increase the scalability of the lexicon of cooking actions, we selected verbs satisfying the following conditions: (1) no corresponding basic action was found in the lexicon for the verb; (2) the verb occurred more than 10 times in R_a. In all, 31 verbs were found and added to the lexicon as new basic actions. Defining basic actions in this way is undesirable, because the lexicon may then depend on a particular recipe corpus. However, we believe that the new basic actions are very general and can be regarded as almost independent of the corpus from which they were extracted.</Paragraph>
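The two selection conditions can be sketched as a frequency filter over the head verbs extracted from the corpus. The function name `select_new_basic_actions` and the verb lists in the usage line are hypothetical placeholders; only the threshold (more than 10 occurrences) comes from the text.

```python
from collections import Counter


def select_new_basic_actions(corpus_verbs, covered_verbs, threshold=10):
    """Select verbs to add to the lexicon as new basic actions.

    corpus_verbs lists the head verb of every verbal phrase in the corpus.
    A verb is selected if (1) no basic action covers it and (2) it occurs
    more than `threshold` times.
    """
    counts = Counter(corpus_verbs)
    return sorted(
        verb for verb, n in counts.items()
        if verb not in covered_verbs and n > threshold
    )


# Placeholder corpus: "put in" occurs exactly 10 times and is therefore excluded.
verbs = ["add"] * 12 + ["heat"] * 11 + ["put in"] * 10 + ["fry"] * 20
print(select_new_basic_actions(verbs, covered_verbs={"fry"}))
```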
      <Paragraph position="2"> To evaluate the new lexicon, we prepared another 50 cooking recipes (R_b hereafter).</Paragraph>
      <Paragraph position="3"> Then we classified the verbal phrases in R_b in the same way as in Subsection 3.2. The results are shown in Table 2 (B).</Paragraph>
      <Paragraph position="5"> Notice that the ratio of the number of verbal phrases contained in the lexicon to the total number of target verbal phrases was 94.5% ((b1) 62.2% + (b2) 31.3%). This is much greater than the ratio in Table 2 (A) (57.7%).</Paragraph>
      <Paragraph position="6"> Therefore, although the test corpus is small, we hope that our lexicon is large enough to generate animations for most of the verbal phrases in cooking recipes.</Paragraph>
    </Section>
  </Section>
</Paper>