<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0907">
  <Title>Story understanding through multi-representation model construction</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Story understanding is a fundamental unsolved problem in artificial intelligence and computational linguistics. For a computer program to understand a story text, it must be able to make inferences about states and events not explicitly described in the text. To do this, it must have knowledge about the world and an ability to reason using this knowledge. In short, it must be able to perform commonsense reasoning, itself a fundamental unsolved problem.</Paragraph>
    <Paragraph position="1"> Story understanding has largely been ignored of late.</Paragraph>
    <Paragraph position="2"> We seek to remedy this situation by applying current research on commonsense reasoning to the story understanding problem. In this paper we present an implemented model of commonsense reasoning for story understanding that has been applied to the understanding of a children's story.</Paragraph>
    <Paragraph position="3">  We propose that understanding a story consists of building multi-representation models of the states and events described in the story. The representations are concerned with multiple realms such as space, time, needs, and feelings. There may be several representations for a single realm. Space, for example, may be represented at different levels of the spatial semantic hierarchy (Kuipers, 2000) such as topological space and metric space as well as at different levels of granularity such as room-scale and object-scale space. We further propose that models are efficiently constructed using a powerful engine, in particular a satisfiability solver, that operates in conjunction with multiple, rich representations of the commonsense world.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.2 Scope and methodology
</SectionTitle>
      <Paragraph position="0"> We are concerned with in-depth understanding in contrast to information extraction. Since research on commonsense reasoning to date has focused on small benchmark problems, it would be difficult to launch into the problem of in-depth understanding of adult-level stories right away. Instead, we and others have proposed to start by handling children's stories (Hirschman et al., 1999; McCarthy et al., 2002). We have formed a corpus of 15 early reader stories for pre-school and kindergarten students, drawn from the Random House Step into Reading® series. In this paper, we treat one of the stories in this corpus. The representations we develop for this story will, we hope, be applicable to the understanding of the remaining 14 stories as well as other early reader stories, though the representations will certainly require elaboration. Since our primary research focus is on in-depth understanding, we make the simplifying assumption that the narrative text has already been parsed into event calculus formulas (Shanahan, 1997). We manually annotate the narrative text with event calculus formulas, which are similar to the predicate-argument structures produced by semantic parsers (Alshawi, 1992; Beale et al., 1995; Gildea and Jurafsky, 2002). In a complete story understanding program, a semantic parser would feed its surface-level understanding of a story to our program, which would in turn produce a more detailed understanding.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.3 Brief history of story understanding
</SectionTitle>
      <Paragraph position="0"> Starting in the 1960s, a number of programs have been written that are able to read and understand a handful of stories (see footnote 2). Several programs built in the 1970s were based on the knowledge structures of scripts, plans, and goals (Schank and Abelson, 1977). The BORIS in-depth story understanding program (Dyer, 1983) integrated scripts, plans, and goals with other knowledge structures including emotions, interpersonal relations, spatiotemporal maps, and story themes.</Paragraph>
      <Paragraph position="1"> Starting in the late 1980s, many story understanding researchers, frustrated by the lack of robustness of story understanding programs, shifted their focus from narrow-coverage deep understanding to broad-coverage shallow understanding or information extraction. It is currently unknown how to produce a deep understanding program with broad coverage. Two routes are apparent: (1) start with a broad-coverage shallow understanding program and make it progressively deeper (Riloff (1999) argues for this approach), or (2) start with a narrow-coverage deep understanding program and make its coverage progressively broader. In this paper we take the second route.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.4 Model-based story understanding
</SectionTitle>
      <Paragraph position="0"> Cognitive psychologists have argued that the reader of a narrative creates a situation or mental model of the narrative including the goals and personalities of the characters and the physical setting (Bower, 1989). Our earlier story understanding program, ThoughtTreasure (Mueller, 1998), built models of a story consisting of a sequence of time slices, where each time slice is a snapshot of (a) the physical world and (b) the mental world of each story character. The physical world was represented using spatial occupancy arrays and mental states were represented using finite automata.</Paragraph>
      <Paragraph position="1"> In this paper we use the term model in the sense of Tarskian semantics. A model or interpretation of a language maps constant symbols of the language to elements of a domain $D$, $n$-ary function symbols to functions from $D^n$ to $D$, and $n$-ary predicate symbols to subsets of $D^n$.</Paragraph>
      <Paragraph position="2"> We confine our attention to finite domains. Time is represented by the integers 0 through a maximum time.</Paragraph>
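      <Paragraph> As an illustration of this definition, the following is a minimal sketch of a finite interpretation written in Python. The domain elements, the constant names, and the predicate At are hypothetical and chosen only for illustration; they are not taken from the axiomatization described in this paper.

```python
from itertools import product

# Hypothetical finite domain D; the element names are illustrative only.
DOMAIN = {"james", "snowman", "yard", "house"}

# Constant symbols map to elements of D.
CONSTANTS = {
    "James": "james",
    "Snowman1": "snowman",
    "Yard": "yard",
    "House": "house",
}

# A binary predicate symbol maps to a subset of D x D.
PREDICATES = {"At": {("james", "yard"), ("snowman", "yard")}}

MAX_TIME = 3
TIMEPOINTS = range(MAX_TIME + 1)  # time is the integers 0..MAX_TIME


def holds(predicate, *constant_names):
    """Evaluate an atomic formula by looking up its arguments in the model."""
    args = tuple(CONSTANTS[name] for name in constant_names)
    return args in PREDICATES[predicate]


def is_well_formed():
    """Check that every (binary) predicate extension is a subset of D x D."""
    all_pairs = set(product(DOMAIN, repeat=2))
    return all(ext.issubset(all_pairs) for ext in PREDICATES.values())
```

Under these assumptions, holds("At", "James", "Yard") evaluates an atomic formula directly against the model, which is the sense in which an answer can be read off a model.</Paragraph>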
      <Paragraph position="3"> Footnote 2: Mueller (2002) provides a more detailed history of story understanding programs.</Paragraph>
      <Paragraph position="4"> A debate over model-based versus proof-based reasoning rages in the fields of artificial intelligence (Levesque, 1986; Davis, 1991) and psychology (Johnson-Laird, 1993; Rips, 1994). The degree to which readers generate inferences and construct mental models during reading is also debated (McKoon and Ratcliff, 1992; Graesser et al., 1994). For the purposes of building and debugging a working story understanding program, the model-based approach has several advantages. First, with a model-based program the consequences of a set of formulas are immediately apparent by inspecting the models. This makes debugging faster than with a proof-based program in which facts are individually considered and proved. Second, model construction may be performed automatically, whereas proof construction often requires human guidance. Third, the process of answering a question about a story is simplified since the program may read the answer directly off the model without having to perform complex reasoning.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.5 Multi-representation story understanding
</SectionTitle>
      <Paragraph position="0"> The view that understanding stories involves multiple representations has been argued by Minsky (1986), who points out that understanding requires knowledge and skills from many realms such as the physical, social, conversational, procedural, sensory, motor, tactile, temporal, economic, and reflective realms. Several previous story understanding programs have used multiple representations. BORIS used 17 types of representation and ThoughtTreasure used five.</Paragraph>
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.6 Reasoning through satisfiability
</SectionTitle>
      <Paragraph position="0"> Satisfiability solvers take as input a set of boolean variables and a propositional formula over those variables and produce as output zero or more models of the formula, that is, truth assignments for the variables such that the formula is satisfied. Satisfiability solvers may be used to perform a variety of forms of reasoning useful in understanding and answering questions about a story.</Paragraph>
      <Paragraph position="1"> Deduction may be performed in the satisfiability framework by checking that one formula is true in every model of another formula.</Paragraph>
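      <Paragraph> The idea of deduction as truth in every model can be sketched with an exhaustive model enumerator. This is only an illustration under stated assumptions: the function names models and entails and the two propositions are hypothetical, and a real satisfiability solver would search far more efficiently than the brute-force loop used here.

```python
from itertools import product


def models(variables, formula):
    """Yield every truth assignment (a dict) that satisfies `formula`.

    `formula` is any function from an assignment dict to bool. This is
    exhaustive enumeration, so it only scales to a handful of variables.
    """
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            yield assignment


def entails(variables, premise, conclusion):
    """Deduction: `conclusion` is true in every model of `premise`."""
    return all(conclusion(m) for m in models(variables, premise))


# Hypothetical propositions: it is snowing, and there is snow on the ground.
VARIABLES = ["snowing", "snow_on_ground"]


def premise(m):
    # snowing, and (snowing implies snow_on_ground)
    return m["snowing"] and ((not m["snowing"]) or m["snow_on_ground"])


def conclusion(m):
    return m["snow_on_ground"]
```

Calling entails(VARIABLES, premise, conclusion) checks the conclusion against each model of the premise in turn; with no premise at all, the same conclusion is not entailed.</Paragraph>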
      <Paragraph position="2"> Story understanding has been viewed as an abductive task (Charniak and McDermott, 1985; Hobbs et al., 1993). A satisfiability solver may be used to perform abduction for story understanding by providing the stated information as input to the solver and allowing the solver to find models that include the stated information as well as the unstated information.</Paragraph>
      <Paragraph position="3"> Story understanding tasks such as predicting next events (McKoon and Ratcliff, 1986) require projection.</Paragraph>
      <Paragraph position="4"> A satisfiability solver may be used to perform projection by asserting the initial states and events and allowing the solver to find models of the ensuing states and events.</Paragraph>
      <Paragraph position="5"> Planning consists of taking an initial state and a goal state, and producing a sequence of events such that the goal state results from those events. Kautz and Selman (1996) have demonstrated the efficiency of planning using satisfiability.</Paragraph>
    </Section>
    <Section position="6" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.7 Satisfiability versus multi-agent systems for model construction
</SectionTitle>
      <Paragraph position="0"> Several previous story understanding programs have used multi-agent systems to build representations. Charniak's early story understanding program (Charniak, 1972) used agents called demons to generate inferences. BORIS used demons to build representations as it parsed a story from left to right.</Paragraph>
      <Paragraph position="1"> Our previous story understanding program ThoughtTreasure used a multi-agent system in which different understanding agents were responsible for maintaining different components of the model while processing a story. The understanding agents interacted with each other in order to decide on a suitable update to the model. Because of the many potential interactions, the understanding agents proved difficult for the programmer to write, maintain, and extend.</Paragraph>
      <Paragraph position="2"> In the present work, instead of attempting to hand code a collection of agents to build models, we use a powerful solution engine to build models automatically given representations of commonsense knowledge.</Paragraph>
    </Section>
    <Section position="7" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.8 The event calculus
</SectionTitle>
      <Paragraph position="0"> We have chosen to express our representations for story understanding in the version of Shanahan's circumscriptive event calculus that uses forced separation (Shanahan, 1997). This language is an extension of many-sorted first-order predicate calculus with explicit time and can be used to express diverse representations. The event calculus predicates important for this paper are as follows: Happens(e, t) represents that an event e happens at time t.</Paragraph>
      <Paragraph position="1"> HoldsAt(f, t) represents that a fluent f holds at time t.</Paragraph>
      <Paragraph position="2"> Initiates(e, f, t) represents that if event e occurs at t then fluent f starts holding after t.</Paragraph>
      <Paragraph position="3"> Terminates(e, f, t) represents that if event e occurs at t then fluent f stops holding after t.</Paragraph>
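      <Paragraph> The intended reading of these four predicates over discrete time can be sketched as a small forward simulation. This is a simplified illustration, not the reasoning method of this paper: it assumes effects do not depend on the timepoint, it assumes every fluent is subject to inertia, and the event and fluent names (PickUp, LetGo, Holding) are hypothetical.

```python
def project(initial_fluents, happens, initiates, terminates, max_time):
    """Return a dict mapping each timepoint to the set of fluents that hold.

    happens:    set of (event, time) pairs, read as Happens(e, t)
    initiates:  set of (event, fluent) pairs, read as Initiates(e, f, t) for all t
    terminates: set of (event, fluent) pairs, read as Terminates(e, f, t) for all t
    Fluents not affected by an event persist (commonsense law of inertia).
    If one timepoint both initiates and terminates a fluent, this sketch
    lets the initiation win; a real axiomatization would rule that out.
    """
    holds_at = {0: set(initial_fluents)}
    for t in range(max_time):
        events_now = {e for (e, when) in happens if when == t}
        started = {f for (e, f) in initiates if e in events_now}
        stopped = {f for (e, f) in terminates if e in events_now}
        # Effects of events at time t become visible at time t + 1.
        holds_at[t + 1] = (holds_at[t] - stopped) | started
    return holds_at


# Hypothetical narrative: a pick-up at time 1 and a let-go at time 3.
HAPPENS = {("PickUp", 1), ("LetGo", 3)}
INITIATES = {("PickUp", "Holding")}
TERMINATES = {("LetGo", "Holding")}

TIMELINE = project(set(), HAPPENS, INITIATES, TERMINATES, max_time=4)
```

In this sketch the fluent Holding starts holding at time 2, persists through time 3 by inertia, and no longer holds at time 4.</Paragraph>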
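      <Paragraph position="4"> Reasoning using the event calculus is carried out as follows: If $\Delta_1$ and $\Delta_2$ are conjunctions of Happens and temporal ordering formulas, $\Sigma$ is a conjunction of Initiates, Terminates, and Releases axioms, $EC$ is the conjunction of the event calculus axioms ECF1 to ECF5 (Shanahan, 1997), $\Psi$ is a conjunction of state constraints, $\Pi$ is a conjunction of trajectory axioms, $\Omega$ is a conjunction of uniqueness-of-names axioms, and $\Gamma$ is a conjunction of HoldsAt formulas, then we are interested in the following:</Paragraph>
      <Paragraph position="5"> $\mathrm{CIRC}[\Sigma;\ \mathit{Initiates}, \mathit{Terminates}, \mathit{Releases}] \wedge \mathrm{CIRC}[\Delta_1 \wedge \Delta_2;\ \mathit{Happens}] \wedge \Psi \wedge \Pi \wedge \Omega \wedge EC \models \Gamma$</Paragraph>
      <Paragraph position="6"> Deduction and projection are performed by taking $\Delta_1$, $\Delta_2$, $\Sigma$, $EC$, $\Psi$, $\Pi$, and $\Omega$ as input, and producing $\Gamma$ as output. Abduction and planning are performed by taking $\Delta_1$, $\Sigma$, $EC$, $\Psi$, $\Pi$, $\Omega$, and $\Gamma$ as input, and producing $\Delta_2$ as output.</Paragraph>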
    </Section>
    <Section position="8" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.9 The story understanding program
</SectionTitle>
      <Paragraph position="0"> Our story understanding program operates as follows: The main program takes the event calculus narrative and axiomatization as input, formulates deductive or abductive reasoning problems, and sends them to the satisfiability encoder. The satisfiability encoder sends encoded problems back to the main program. The main program sends encoded problems to the satisfiability solver. The satisfiability solver sends solutions to problems back to the main program, which produces models as output. The main program consists of 6332 lines of Python and Java code. The satisfiability encoder consists of 3658 lines of C code. The program uses off-the-shelf satisfiability solvers.</Paragraph>
      <Paragraph position="1"> More specifically, the event calculus narrative provided as input consists of: annotation of the story sentences as Happens and HoldsAt formulas, the structure of room-scale topological space, and (optionally, to reduce the number of models) initial and intermediate events and fluents, represented by Happens and HoldsAt formulas.</Paragraph>
      <Paragraph position="2"> Coreference annotation must be performed on the story sentences, so that unique story entities such as actors and physical objects are represented by unique constants in the above formulas.</Paragraph>
      <Paragraph position="3"> The story handled by our program is an adaptation for early readers of the children's story "The Snowman" by Raymond Briggs.</Paragraph>
      <Paragraph position="4"> It is not yet possible to process the entire Snowman story as a single satisfiability problem: the problem does not fit in memory. We therefore break the story into several segments, where each segment contains one or more time points and each segment follows the previous segment in story time. The following shows how we have divided the Snowman story into segments SNOWMAN1 through SNOWMAN8, along with the manual event calculus annotation of the sentences:</Paragraph>
      <Paragraph position="5"> SNOWMAN1: This segment models the falling of individual snowflakes.</Paragraph>
      <Paragraph position="6"> He makes a pile of snow.
Happens(HoldSome(James, Snowball1, Snow1), 12)
He makes it bigger and bigger.
Happens(RollAlong(James, Snowball1, Snow1), 13)
He puts a big snowball on top.
Happens(PlaceOn(James, Snowball2, Snowball1), 17)</Paragraph>
      <Paragraph position="7"> SNOWMAN3: This segment models James going into the house to get a scarf, hat, and orange.</Paragraph>
      <Paragraph position="8"> SNOWMAN4: He adds a scarf and a hat.</Paragraph>
      <Paragraph position="9"> But the snowman has gone.</Paragraph>
      <Paragraph position="10"> 1.11 Remainder of the paper
In Section 2, we discuss our method for transforming event calculus reasoning problems into satisfiability problems. In Section 3, we discuss our multi-representation axiomatization of the commonsense knowledge needed to understand the Snowman story. In Section 4, we discuss the processing of the Snowman story by our program using the axiomatization. We conclude with future work.
2 A satisfiability encoding of the event calculus
We have implemented a method for encoding event calculus problems in propositional conjunctive normal form, which enables them to be solved using an off-the-shelf satisfiability solver.</Paragraph>
      <Paragraph position="11"> Solving event calculus problems using satisfiability solvers has several advantages over solving those problems using other methods. First, satisfiability solvers are faster at solving event calculus planning problems than planners based on abductive logic programming (Shanahan, 2000; Shanahan and Witkowski, 2002). Second, solving event calculus problems using theorem proving requires computation of circumscription. The rules for computing circumscription are complicated in general (Lifschitz, 1994). One rule is given by Proposition 2 of Lifschitz, which reduces circumscription to predicate completion: If $F(x)$ does not contain $P$, then the circumscription $\mathrm{CIRC}[\forall x\, F(x) \Rightarrow P(x);\ P]$ is equivalent to $\forall x\, F(x) \Leftrightarrow P(x)$. Many cases of circumscription in the event calculus reduce directly to simple predicate completion using Proposition 2, but some do not. Notably, the circumscription of Happens ($= P$) in a disjunctive event axiom or compound event axiom ($= \forall x\, F(x) \Rightarrow P(x)$) cannot be achieved using Proposition 2 because $F(x)$ does contain Happens in those axioms.</Paragraph>
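      <Paragraph> The reduction stated in Proposition 2 can be checked by hand on a small finite domain. The sketch below is an illustration under stated assumptions, not part of the encoding method: the three-element domain and the extension of F are hypothetical, F is held fixed, and minimality of P is taken with respect to set inclusion.

```python
from itertools import combinations

DOMAIN = ["a", "b", "c"]              # hypothetical finite domain
F_EXTENSION = frozenset({"a", "b"})   # elements x for which F(x) is true


def subsets(items):
    """All subsets of `items`, as frozensets."""
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def satisfies_axiom(p_ext):
    """forall x: F(x) implies P(x)."""
    return F_EXTENSION.issubset(p_ext)


def circumscribe():
    """Extensions of P that satisfy the axiom and are minimal under inclusion."""
    candidates = [p for p in subsets(DOMAIN) if satisfies_axiom(p)]
    return [p for p in candidates
            if not any(q != p and q.issubset(p) for q in candidates)]


def completion():
    """Extensions of P that satisfy forall x: F(x) if and only if P(x)."""
    return [p for p in subsets(DOMAIN) if p == F_EXTENSION]
```

For this example both functions return the single extension in which P holds of exactly the elements satisfying F, which is what predicate completion asserts.</Paragraph>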
      <Paragraph position="12"> Our encoding method handles a larger subset of the event calculus than the method previously proposed (Shanahan and Witkowski, 2002). The method of Shanahan and Witkowski separately maps into conjunctive normal form each type of event calculus axiom, such as effect axioms and precondition axioms. Our encoding method maps arbitrary axioms to conjunctive normal form by applying syntactic transformations. The generality of our method enables it to handle a larger subset of the event calculus. Table 1 provides a comparison of the coverage of the two encodings. Both methods use explanation closure frame axioms (Haas, 1987) to cope with the frame problem instead of circumscription. In our method the frame axioms are extended to allow fluents to be released from the commonsense law of inertia. Neither method handles continuous time; both support discrete time. Due to space limitations, the complete encoding method cannot be presented here.</Paragraph>
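      <Paragraph> Since the complete encoding is not presented here, the following is only a minimal sketch of the general idea of combining effect axioms with explanation closure frame axioms in conjunctive normal form. It should not be read as the encoding used by our program: the clause layout, the helper names, the single fluent Holding, and the events PickUp and LetGo are all hypothetical, released fluents are not handled, and a brute-force enumerator stands in for an off-the-shelf satisfiability solver.

```python
from itertools import product

EVENTS = ["PickUp", "LetGo"]   # hypothetical events
FLUENT = "Holding"             # hypothetical fluent
MAX_TIME = 2


def happens(event, t):
    """Atom for Happens(event, t)."""
    return ("Happens", event, t)


def holds_at(t):
    """Atom for HoldsAt(FLUENT, t)."""
    return ("HoldsAt", FLUENT, t)


def encode():
    """Return CNF clauses; a literal is a pair (atom, polarity)."""
    clauses = []
    for t in range(MAX_TIME):
        # Effect axioms: PickUp initiates the fluent, LetGo terminates it.
        clauses.append([(happens("PickUp", t), False), (holds_at(t + 1), True)])
        clauses.append([(happens("LetGo", t), False), (holds_at(t + 1), False)])
        # Explanation closure: the fluent changes only if a relevant event occurs.
        clauses.append([(holds_at(t), True), (holds_at(t + 1), False),
                        (happens("PickUp", t), True)])
        clauses.append([(holds_at(t), False), (holds_at(t + 1), True),
                        (happens("LetGo", t), True)])
    return clauses


def solve(clauses):
    """Enumerate all models by brute force (a stand-in for a real SAT solver)."""
    atoms = sorted({atom for clause in clauses for (atom, _) in clause})
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(any(model[a] == pol for (a, pol) in c) for c in clauses):
            yield model


def unit(atom, polarity):
    """A one-literal clause that fixes the truth value of `atom`."""
    return [(atom, polarity)]
```

Projection in this sketch fixes every Happens atom and the initial HoldsAt atom with unit clauses and reads the remaining HoldsAt values off the models; abduction instead fixes HoldsAt atoms and lets the enumerator propose Happens atoms that explain them.</Paragraph>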
    </Section>
  </Section>
</Paper>