Temporal Discourse Models for Narrative Structure 
Inderjeet MANI 
Department of Linguistics 
Georgetown University 
ICC 452 
Washington, DC 20057 
im5@georgetown.edu 
James PUSTEJOVSKY 
Department of Computer Science 
Brandeis University 
Volen 258 
Waltham, Massachusetts 02254 
jamesp@cs.brandeis.edu 
 
Abstract 
Getting a machine to understand human 
narratives has been a classic challenge for 
NLP and AI. This paper proposes a new 
representation for the temporal structure of 
narratives. The representation is parsimonious, 
using temporal relations as surrogates for 
discourse relations. The narrative models, 
called Temporal Discourse Models, are tree-
structured, where nodes include abstract 
events interpreted as pairs of time points and 
where the dominance relation is expressed by 
temporal inclusion. Annotation examples and 
challenges are discussed, along with a report 
on progress to date in creating annotated 
corpora. 
1 Introduction 
Getting a machine to understand human narratives 
has been a classic challenge for NLP and AI. 
Central to all narratives is the notion of time and 
the unfolding of events. When we understand a 
story, in addition to understanding other aspects 
such as plot, characters, goals, etc., we are able to 
understand the order of happening of events. A 
given text may have multiple stories; when we 
understand such a text, we are able to tease apart 
these distinct stories. Thus, understanding the story 
from a text involves building a global model of the 
sequences of events in the text, as well as the 
structure of nested stories. We refer to such models 
as Temporal Discourse Models (TDMs).  
 Currently, while we have informal 
descriptions of the structure of narratives, e.g., 
(Bell 1999), we lack a precise understanding of 
this aspect of discourse. What sorts of structural 
configurations are observed? What formal 
characteristics do they have? For syntactic 
processing of natural languages, we have, 
arguably, answers to similar questions. However, 
for discourse, we have hardly begun to ask the 
questions. 
 One of the problems here is that most of 
the information about narrative structure is implicit  
in the text. Thus, while linguistic information in 
the form of tense, aspect, temporal adverbials and 
discourse markers is often present, people use 
commonsense knowledge to fill in information. 
Consider a simple discourse: Yesterday Holly was 
running a marathon when she twisted her ankle. 
David had pushed her. Here, aspectual information 
indicates that the twisting occurred during the 
running, while tense suggests that the pushing 
occurs before the twisting. Commonsense 
knowledge also suggests that the pushing caused 
the twisting.  
 We can see that even for interpreting such 
relatively simple discourses, a system might 
require a variety of sources of linguistic 
knowledge, including knowledge of tense, aspect, 
temporal adverbials, discourse relations, as well as 
background knowledge. Of course, other 
inferences are clearly possible, e.g., that the 
running stopped after the twisting, but when 
viewed as defaults, these latter inferences seem to 
be more easily violated. The need for 
commonsense inferences has motivated 
computational approaches that are domain-
specific, using hand-coded knowledge (e.g., Asher 
and Lascarides 2003, Hitzeman et al. 1995).  
 A number of theories have postulated the 
existence of various discourse relations that relate 
elements in the text to produce a global model of 
discourse, e.g., (Mann and Thompson 1988), 
(Hobbs 1985), (Hovy 1990) and others. In RST 
(Mann and Thompson 1988), (Marcu 2000), these 
relations are ultimately between semantic elements 
corresponding to discourse units that can be simple 
sentences or clauses as well as entire discourses. In 
SDRT (Asher and Lascarides 2003), these relations 
are between representations of propositional 
content, called Discourse Representation 
Structures (Kamp and Reyle, 1993).  
 Despite a considerable amount of very 
productive research, annotating such discourse 
relations has proved problematic. This is due to the 
fact that discourse markers may be absent (i.e., 
implicit) or ambiguous; but more importantly, 
because in many cases the precise nature of these 
discourse relations is unclear. Although (Marcu et al. 1999) and (Carlson et al. 2001) reported relatively
high levels of inter-annotator agreement, this was 
based on an annotation procedure where the 
annotators were allowed to iteratively revise the 
instructions based on joint discussion.  
 While we appreciate the importance of 
representing rhetorical relations in order to carry 
out temporal inferences about event ordering, we 
believe that there are substantial advantages in 
isolating the temporal aspects and modeling them 
separately as TDMs. This greatly simplifies the 
representation, which we discuss next.   
2 Temporal Discourse Models 
A TDM is a tree-structured syntactic model of 
global discourse structure, where temporal 
relations are used as surrogates for discourse 
relations, and where abstract events corresponding 
to entire discourses are introduced as nodes in the 
tree.  
 We begin by illustrating the basic intuition. Consider discourse (1), from (Webber 1988):

(1) a. John went into the florist shop. b. He had promised Mary some flowers. c. She said she wouldn’t forgive him if he forgot. d. So he picked out three red roses.

 The discourse structure of (1) can be represented by the tree, T1, shown below.

T1 =            E0
             /   |   \
           Ea    E1    Ed
                /  \
              Eb    Ec

 Here E0 has children Ea, E1, and Ed, and 
E1 has children Eb and Ec. The nodes with 
alphabetic subscripts are events mentioned in the 
text, whereas nodes with numeric subscripts are 
abstract events, i.e., events that represent abstract 
discourse objects. A node X is a child of node Y iff 
X is temporally included in Y. In our scheme, 
events are represented as pairs of time points. So, 
E0 is an abstract node representing a top-level 
story, and E1 is an abstract node representing an 
embedded story. Note that the mentioned events 
are ordered left to right in text order for notational 
convenience, but no temporal ordering is directly 
represented in the tree. Since the nodes in this 
representation are at a semantic level, the tree 
structure is not necessarily isomorphic to a 
representation at the text level, although T1 
happens to be isomorphic.  
 In addition to T1, we also have the temporal ordering constraints C1: {Eb < Ec, Ec < Ea, Ea < Ed}. These are represented separately from the tree. A TDM is thus a pairing of tree structures and temporal constraints. More precisely, a Temporal Discourse Model for a text is a pair <T, C>, where T is a rooted, unordered, directed tree with nodes N = E ∪ A, where E is the set of events mentioned in the text and A is a set of abstract events, and a parent-child ordering relation, ⊆ (temporal inclusion). A non-leaf node can be textually mentioned or abstract. Nodes also have a set of atomic-valued features. Note that the tree is temporally unordered left to right. C is a set of temporal ordering constraints using the ordering relation, < (temporal precedence), as well as (for states, clarified below) ‘minimal restrictions’ on the above temporal inclusion relation (expressed as ⊆min).
 In (1) the embedding nodes E0 and E1 were abstract, but textually mentioned events can also create embeddings, as in (2) (example from (Spejewski 1994)):

(2) a. Edmond made his own Christmas presents this year. b. First he dried a bunch of tomatoes in his oven. c. Then he made a booklet of recipes that use dried tomatoes. d. He scanned in the recipes from his gourmet magazines. e. He gave these gifts to his family.

T2 =            E0
              /    \
            Ea      Ee
           /  \
         Eb    Ec
                |
               Ed

C2 = {Ea < Ee, Eb < Ec}

 Note that the partial ordering C can be extended using T and temporal closure axioms (Setzer and Gaizauskas 2001), (Verhagen 2004), so that in the case of <T2, C2>, we can infer, for example, that Eb < Ed, Ed < Ee, and so forth.
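 To make the <T, C> pairing concrete, the following Python sketch encodes T2 and C2 as a small data structure. The class and attribute names (TDMNode, TDM, add_precedence) are our own, purely illustrative choices, not part of any existing annotation tool.

# A minimal, illustrative encoding of a TDM as a pair <T, C>.

class TDMNode:
    """A node in the TDM tree: a mentioned or an abstract event."""
    def __init__(self, label, abstract=False):
        self.label = label          # e.g. "Ea" (mentioned) or "E0" (abstract)
        self.abstract = abstract
        self.children = []          # children are temporally included in this node

    def add_child(self, node):
        self.children.append(node)
        return node

class TDM:
    """A Temporal Discourse Model: a rooted inclusion tree T plus constraints C."""
    def __init__(self, root):
        self.root = root
        self.constraints = set()    # pairs (x, y) meaning x < y (temporal precedence)

    def add_precedence(self, x, y):
        self.constraints.add((x.label, y.label))

# T2 for discourse (2): E0 dominates Ea and Ee; Ea dominates Eb and Ec; Ec dominates Ed.
E0 = TDMNode("E0", abstract=True)
Ea = E0.add_child(TDMNode("Ea"))
Ee = E0.add_child(TDMNode("Ee"))
Eb = Ea.add_child(TDMNode("Eb"))
Ec = Ea.add_child(TDMNode("Ec"))
Ed = Ec.add_child(TDMNode("Ed"))

tdm2 = TDM(E0)
tdm2.add_precedence(Ea, Ee)   # C2 = {Ea < Ee, Eb < Ec}
tdm2.add_precedence(Eb, Ec)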
 In representing states, we take a 
conservative approach to the problems of 
ramification and change (McCarthy and Hayes 
1969). This is the classic problem of recognizing 
when states (the effects of actions) change as a 
result of actions. Any tensed stative predicate will 
be represented as a node in the tree (progressives 
are here treated as stative). Consider an example 
like John walked home. He was feeling great. 
Here we represent the state of feeling great as 
being minimally a part of the event of walking, 
without committing to whether it extends before or 
after the event. While this is interpreted as an 
overloaded temporal inclusion in the TDM tree, a 
constraint is added to C indicating that this 
inclusion is minimal.  
 This conservative approach results in 
logical incompleteness, however. For example, 
given the discourse Max entered the room. He was 
wearing a black shirt, the system will not know 
whether the shirt was worn after he entered the 
room. States are represented as bounded intervals, 
and participate in ordering relations with events in 
the tree. It is clear that in many cases, a state 
should persist throughout the interval spanning 
subsequent events. This is not captured by the 
current tree representation. Opposition structures of predicates and gating operations over properties can be expressed as constraints introduced by events, but at this stage of development we have been interested in capturing a coarser temporal ordering representation very robustly.
We believe, however, that annotation using the 
minimal inclusion relation will allow us to reason 
about persistence heuristically in the future.  
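 Reusing the TDMNode and TDM classes from the illustrative sketch above, the treatment of John walked home. He was feeling great might be recorded as follows; the min_inclusions attribute is again an ad hoc name of ours, standing in for the ⊆min constraints in C.

# Illustrative only: the state node sits under the walking event in T,
# and C separately records that this inclusion is merely minimal.
root = TDMNode("E0", abstract=True)
e_walk = root.add_child(TDMNode("E_walk"))
e_feel = e_walk.add_child(TDMNode("E_feel"))   # tree edge: E_feel ⊆ E_walk

tdm = TDM(root)
tdm.min_inclusions = {("E_feel", "E_walk")}    # ⊆min: no commitment beyond minimal overlap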
3 Prerequisites 
Prior work on temporal information extraction has 
been fairly extensive and is covered in (Mani et al. 
2004). Recent research has developed the TimeML 
annotation scheme (Pustejovsky et al. 2002) 
(Pustejovsky et al. 2004), as well as a corpus of 
TimeML-annotated news stories (TimeBank 2004) 
and annotation tools that go along with it, such as 
the TANGO tool (Pustejovsky et al. 2003). 
TimeML flags tensed verbs, adjectives, and 
nominals that correspond to events and states, 
tagging instances of them with standard TimeML 
attributes, including the class of event (perception, 
reporting, aspectual, state, etc.), tense (past, 
present, future), grammatical aspect (perfective, 
progressive, or both), whether it is negated, any 
modal operators which govern it, and its 
cardinality if the event occurs more than once. 
Likewise, time expressions are flagged, and their 
values normalized, so that Thursday in He left on 
Thursday would get a resolved ISO time value 
depending on context  (TIMEX2 2004). Finally, 
temporal relations between events and time 
expressions (e.g., that the leaving occurs during 
Thursday) are recorded by means of temporal links 
(TLINKs) that express Allen-style interval 
relations (Allen 1984).  
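 By way of illustration only, the attributes just listed for He left on Thursday could be gathered into plain Python records as below. This paraphrases the information TimeML captures rather than reproducing its actual XML markup; the field names and the resolved date are our own hypothetical choices.

# Illustrative only: the kinds of information TimeML records, written as plain dicts.
event = {
    "text": "left",
    "class": "occurrence",    # event class (perception, reporting, aspectual, state, ...)
    "tense": "past",
    "aspect": None,           # grammatical aspect (perfective, progressive, or both)
    "negated": False,
    "modality": None,
    "cardinality": 1,
}

timex = {
    "text": "Thursday",
    "value": "2004-07-22",    # a context-resolved ISO value; this date is made up
}

tlink = {"event": "left", "time": "Thursday", "relation": "is_included"}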
 Several automatic tools have been 
developed in conjunction with TimeML, including 
event taggers (Pustejovsky et al. 2003), time 
expression taggers (Mani and Wilson 2000), and 
an exploratory link extractor (Mani et al. 2003). 
Temporal reasoning algorithms have also been developed that apply transitivity axioms to expand
the links using temporal closure algorithms (Setzer 
and Gaizauskas 2001), (Pustejovsky et al. 2003).  
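 The closure step can be pictured as repeatedly applying transitivity until no new orderings emerge. The naive fixed-point sketch below is ours, not the cited algorithms (which also reason over inclusion and other Allen relations), but the transitivity core looks roughly like this.

def close_precedence(pairs):
    """Expand a set of (x, y) pairs meaning x < y under transitivity."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# C1 from discourse (1): Eb < Ec, Ec < Ea, Ea < Ed; closure adds Eb < Ea, Eb < Ed, Ec < Ed.
print(close_precedence({("Eb", "Ec"), ("Ec", "Ea"), ("Ea", "Ed")}))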
 However, TimeML is inadequate as a 
temporal model of discourse: it constructs no 
global representation of the narrative structure, 
instead annotating a complex graph that links 
primitive events and times. 
4 Related Frameworks 
Since the relations in TDMs involve temporal 
inclusion and temporal ordering, the mentioned 
events can naturally be mapped to other discourse 
representations used in computational linguistics. 
A TDM tree can be converted to a first-order 
temporal logic representation (where temporal 
ordering and inclusion operators are added) by 
expanding the properties of the nodes. These 
properties include any additional predications 
made explicitly about the event, e.g., information 
from thematic arguments and adjuncts. In other 
words, a full predicate argument representation, 
e.g., as might be found in the PropBank 
(Kingsbury and Palmer 2002), can be associated 
with each node.  
 TDMs can also be mapped to Discourse 
Representation Structures (DRS) (which in turn 
can be mapped to a logical form). Since TDMs 
represent events as pairs of time points (which can 
be viewed as intervals), and DRT represents events 
as primitives, we can reintroduce time intervals 
based on the standard DRT approach (e ⊆ t for 
events,  e O t for states, except for present tense 
states, where t ⊆ e).  
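 The interval-reintroduction rule just stated can be written as a one-step lookup; the sketch below is purely illustrative, with a function name and string output of our own devising.

def drs_time_condition(event_id, time_id, is_state, tense):
    """Return the DRT-style condition relating an eventuality to its time interval."""
    if not is_state:
        return f"{event_id} ⊆ {time_id}"     # events: e ⊆ t
    if tense == "present":
        return f"{time_id} ⊆ {event_id}"     # present-tense states: t ⊆ e
    return f"{event_id} O {time_id}"         # other states: e O t (overlap)

print(drs_time_condition("Eb", "t2", is_state=True, tense="past"))   # Eb O t2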
 Consider an example from the Discourse 
Representation Theory (DRT) literature (from 
Kamp and Reyle 1993): 
(3) a. A man entered the White Hart. b. He 
was wearing a black jacket. c. Bill served 
him a beer.  
  The TDM is <T3, C3> below, with 
internal properties of the nodes as shown: 
 
T3 =            E0
              /    \
            Ea      Ec
             |
            Eb
C3 = {Ea < Ec} 
node.properties(Ea): enter(Ea, x, y), 
man(x), y= theWhiteHart, Ea < n 
node.properties(Eb): PROG(wear(Eb, x1, y1)), black-jacket(y1), x1=x, Eb < n
node.properties(Ec): serve(Ec, x2, y2, z), 
beer(z), x2=Bill, y2=x, Ec < n 
From T3: Eb ⊆ Ea 
From C3: Ea < Ec 
 The DRT representation is shown below (here we have created variables for the reference times):

[ Ea, x, y, Eb, x1, y1, Ec, x2, y2, z, t1, t2, t3 |
  enter(Ea, x, y), man(x), y = theWhiteHart,
  PROG(wear(Eb, x1, y1)), black-jacket(y1), x1 = x,
  serve(Ec, x2, y2, z), beer(z), x2 = Bill, y2 = x,
  t1 < n, Ea ⊆ t1, t2 < n, Eb O t2, Eb ⊆ Ea, t3 < n, Ec ⊆ t3, Ea < Ec ]

 Note that we are by no means claiming 
that DRSs and TDMs are equivalent. TDMs are 
tree-structured and DRSs are not, and the inclusion 
relations involving our abstract events, i.e., Ea ⊆ 
E0 and Ec ⊆ E0, are not usually represented in 
DRT. Nevertheless, there are many similarities 
between TDMs and DRT which are worth 
examining for semantic and computational 
properties. Furthermore, SDRT (Asher and 
Lascarides 2003) extends DRT to include 
discourse relations. SDRT and RST both differ 
fundamentally from TDMs, since we dispense with 
rhetorical relations.  
 It should be pointed out, nevertheless, that 
TDMs, as modeled so far, do not represent 
modality and intensional contexts in the tree 
structure. (However, information about modality 
and negation is stored in the nodes based on 
TimeML preprocessing).  One way of addressing 
this issue is to handle lexically derived modal 
subordination (such as believe and want) by 
introducing embedded events, linked to the modal 
predicate by subordinating relations. For example, 
in the sentence John believed that Mary graduated 
from Harvard, the complement event is 
represented as a subtree linked by a lexical 
relation. 
DLTAG (Webber et al. 2003) is a model of discourse structure in which explicit or implicit discourse markers relate only primitive discourse units. Unlike TDMs, where the nodes in the tree can contain embedded structures, DLTAG is a local model of discourse structure; it thus provides a set of binary relations rather than a tree. Like
TDMs, however, DLTAG models discourse 
structure without postulating the existence of 
rhetorical relations in the discourse tree. Instead, 
the rhetorical relations appear as predicates in the 
semantic forms for discourse markers. In this 
respect, they differ from TDMs, which do not 
commit to specific rhetorical relations.   
 Spejewski (1994) developed a tree-based 
model of the temporal structure of a sequence of 
sentences. Her approach is based on relations of 
temporal coordination and subordination, and is 
thus a major motivation for our own approach. 
However, her approach mixes both reference times 
and events in the same representation, so that the 
parent-child relation sometimes represents 
temporal anchoring, and at other times 
coordination. In the above example of John walked 
home. He was feeling great, her approach would 
represent the “reference time” of the state (of 
feeling great) as being part of the event of walking 
as well as part of the state, resulting in a graph 
rather than a strict tree. Our approach, by contrast, uses the minimal-inclusion relation for states and so keeps the representation a strict tree.
 (Hitzeman et al. 1995) developed a 
computational approach to distinguish various 
temporal threads in discourse. The idea here, based 
on the notion of temporal centering, is that there is 
one ‘thread’ that the discourse is currently 
following. Thus, in (1) above, each utterance is 
associated with exactly one of two threads: (i) 
going into the florist’s shop and (ii) interacting 
with Mary. Hitzeman et al. prefer an utterance to 
continue a current thread which has the same tense 
or is semantically related to it, so that in (1) above, 
utterance d would continue the thread (i) above 
based on tense. In place of world knowledge, 
however, semantic distance between utterances is 
used, presumably based on lexical relationships. 
Whether such semantic similarity is effective is a 
matter for evaluation, which is not discussed in 
their paper. For example, it isn’t clear what would 
rule out (1c) as continuing thread (i). 
 While TDMs do not commit to rhetorical 
relations, our expectation is that they can be used 
as an intermediate representation for rhetorical 
parsing. Thus, when event A in a TDM temporally 
precedes its right sibling B, the rhetorical relation 
of Narration will typically be inferred. When B 
precedes its left sibling A, then Explanation will typically be inferred. When A temporally includes a child node B, then Elaboration is typically inferred, and so on. TDMs are thus a shallow representation that can serve as a useful first step in deriving rhetorical relations; indeed, rhetorical
relations may be implicit in the human annotation 
of such relations, e.g., when explicit discourse 
markers like “because” indicate a particular 
temporal order. 
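 The default mapping just described can be stated compactly as below; this is our own sketch of the defaults, not a rhetorical parser, and the configuration labels are illustrative.

def infer_relation(config, a_before_b=None):
    """config: 'siblings' (A is the left sibling of B) or 'parent-child' (B is a child of A)."""
    if config == "parent-child":
        return "Elaboration"            # A temporally includes child B
    if config == "siblings":
        # Narration when A precedes its right sibling B; Explanation when B precedes A.
        return "Narration" if a_before_b else "Explanation"
    return None

print(infer_relation("siblings", a_before_b=True))    # Narration
print(infer_relation("siblings", a_before_b=False))   # Explanation
print(infer_relation("parent-child"))                 # Elaboration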
5 Annotation Scheme  
The annotation scheme involves taking each 
document that has been preprocessed with time 
expressions and event tags (complying with 
TimeML) and then representing TDM parse trees 
and temporal ordering constraints (the latter also compliant with TimeML TLINKs).
 Each discourse begins with a root abstract 
node. As an annotation convention, (A1) in the 
absence of any overt or covert discourse markers 
or temporal adverbials, a tense shift will license the 
creation of an abstract node, with the event with 
the shifted tense being the leftmost daughter of the 
abstract node. The abstract node will then be 
inserted as the child of the immediately preceding 
text node. In addition, convention (A2) states that 
in the absence of temporal adverbials and overt or 
covert discourse markers, a stative event will 
always be placed as a child of the immediately 
preceding text event when the latter is non-stative. 
Further, convention (A3) states that when the 
previous event is stative, in the absence of 
temporal adverbials and explicit or implicit 
discourse markers, the stative event is a sibling of 
the previous stative (as in a scene-setting fragment 
of discourse). 
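 The attachment decisions made by conventions (A1)-(A3) can be sketched as a simple case analysis, assuming that no overt or covert discourse markers and no temporal adverbials apply; the function, its inputs, and the final default are illustrative only, not an implemented annotator.

def attach(new_is_stative, prev_is_stative, tense_shift):
    """Decide where the current event attaches relative to the preceding text event."""
    if tense_shift:
        # (A1): a tense shift licenses a new abstract node, inserted as the child of the
        # preceding text node, with the shifted-tense event as its leftmost daughter.
        return "leftmost daughter of a new abstract node under the previous event"
    if new_is_stative and not prev_is_stative:
        # (A2): a stative event attaches as a child of the preceding non-stative event.
        return "child of the previous event"
    if new_is_stative and prev_is_stative:
        # (A3): consecutive statives are siblings (scene-setting discourse).
        return "sibling of the previous state"
    # Assumed default for plain narrative sequence; not stated as a convention above.
    return "sibling of the previous event"

print(attach(new_is_stative=True, prev_is_stative=False, tense_shift=False))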
 We expect that inter-annotator reliability 
on TDM trees will be quite high, given the 
transparent nature of the tree structure along with 
clear annotation conventions. The Appendices 
provide examples of annotation, to illustrate the 
simplicity of the scheme as well as potential 
problems. 
6  Corpora  
We have begun annotating three corpora with 
Temporal Discourse Model information. The first 
is the Remedia corpus (remedia.com). There are 
115 documents in total, grouped into four reading 
levels, all of which have been tagged by a human 
for time expressions in a separate project by Lisa 
Ferro at MITRE. Each document is short, about 
237 words on average, and has a small number of 
questions after it for reading comprehension.   
 The Brandeis Reading Corpus is a 
collection of 100 K-8 Reading Comprehension 
articles, mined from the web and categorized by 
level of comprehension difficulty. Articles range 
from 50-350 words in length. Complexity of the 
reading task is defined in terms of five basic 
classes of reading difficulty. 
 The last is a corpus drawn from the Canadian Broadcasting Corporation's CBC4Kids site (cbc4kids.ca). The materials are
current-event stories aimed at an audience of 8-
year-old to 13-year-old students. The stories are 
short (average length around 450 words). More 
than a thousand articles are available. The 
CBC4Kids corpus is already annotated with POS 
and parse tree markup.  
7 Conclusion  
Our assumption so far has been that the temporal 
structure of narratives is tree-structured and 
context-free. Whether the context-free property is 
violated or not remains to be seen.  
 Once the annotation effort is completed, 
we plan to use the annotated corpora in statistical 
parsing algorithms to construct TDMs. This should 
allow features from the corpus to be leveraged 
together to make inferences about narrative 
structure. While such knowledge source 
combination is not by any means guaranteed to 
substitute for commonsense knowledge, it at least 
allows for the introduction of generic, machine 
learning methods for extracting narrative structure 
from stories in any domain. Earlier work in a non-corpus-based (Hitzeman et al. 1995) as well as a
corpus-based setting (Mani et al. 2003) attests to 
the usefulness of combining knowledge sources for 
inferring temporal relations. We expect to leverage 
similar methods in TDM parsing. 
 We believe that the temporal aspect of 
discourse provides a handle for investigating 
discourse structure, thereby simplifying the 
problem of discourse structure annotation. It is 
therefore of considerable theoretical interest. 
Further, being able to understand the structure of 
narratives will in turn allow us to summarize them 
and answer temporal questions about narrative 
structure. 

References 
J. F. Allen. 1984. Towards a General Theory of 
Action and Time.  Artificial Intelligence 23: 
123-154. 
N. Asher and A. Lascarides. 2003. Logics of 
Conversation. Cambridge University Press. 
A. Bell. 1999. News Stories as Narratives. In A. 
Jaworski and N. Coupland, The Discourse 
Reader, Routledge, London and New York, 236-
251. 
L. Carlson, D. Marcu and M. E. Okurowski. 2001. 
Building a discourse-tagged corpus in the 
framework of rhetorical structure theory. In 
Proceedings of the 2nd SIGDIAL Workshop on 
Discourse and Dialogue, Eurospeech 2001, 
Aalborg, Denmark. 
B. Grosz, A. Joshi and S. Weinstein. 1995. 
Centering: A Framework for Modeling the 
Local Coherence of Discourse. Computational Linguistics 21(2): 203-225.
J. Hitzeman, M. Moens and C. Grover. 1995. 
Algorithms for Analyzing the Temporal 
Structure of Discourse. In Proceedings of the 
Annual Meeting of the European Chapter of the 
Association for Computational Linguistics, 
Utrecht, Netherlands, 1995, 253-260. 
J. Hobbs. 1985. On the Coherence and Structure of 
Discourse. Report No. CSLI-85-37. Stanford, 
California: Center for the Study of Language 
and Information, Stanford University. 
E. Hovy. 1990. Parsimonious and Profligate 
Approaches to the Question of Discourse 
Structure Relations. In Proceedings of the Fifth 
International Workshop on Natural Language 
Generation.  
H. Kamp and U. Reyle. 1993. Tense and Aspect. 
Part 2, Chapter 5 of From Discourse to Logic, 
483-546. 
P. Kingsbury and M. Palmer. 2002. From 
Treebank to PropBank. In Proceedings of the 
3rd International Conference on Language 
Resources and Evaluation (LREC-2002), Las 
Palmas, Spain.  
I. Mani, B. Schiffman and J. Zhang. 2003.
Inferring Temporal Ordering of Events in News. 
Proceedings of the Human Language 
Technology Conference, HLT’03. 
I. Mani and G. Wilson. 2000. Robust Temporal Processing of News.
Proceedings of the 38th Annual Meeting of the 
Association for Computational Linguistics 
(ACL'2000), 69-76.   
I. Mani, J. Pustejovsky and R. Gaizauskas. 2004. 
The Language of Time: A Reader. Oxford 
University Press, to appear. 
W. Mann and S. Thompson. 1988. Rhetorical 
structure theory: Toward a functional theory of 
text organization. Text, 8(3): 243-281.  
D. Marcu. 2000. The Theory and Practice of 
Discourse Parsing and Summarization. The 
MIT Press. 
D. Marcu, E. Amorrortu and M. Romera. 1999. 
Experiments in constructing a corpus of 
discourse trees. In Proceedings of the ACL 
Workshop on Standards and Tools for Discourse 
Tagging, College Park, MD, 48-57.  
J. McCarthy and P. Hayes. 1969. Some 
philosophical problems from the standpoint of 
artificial intelligence. In B. Meltzer and D.
Michie, Eds. Machine Intelligence 4.  
J. Pustejovsky, B. Ingria, R. Sauri, J. Castano, J. 
Littman, R. Gaizauskas, A. Setzer, G. Katz and 
I. Mani. 2004. The Specification Language 
TimeML. In I. Mani, J. Pustejovsky and R. 
Gaizauskas. The Language of Time: A Reader. 
Oxford University Press, to appear. 
A. Setzer and R. Gaizauskas.  2001. A Pilot Study 
on Annotating Temporal Relations in Text. ACL 
2001, Workshop on Temporal and Spatial 
Information Processing.
B. Spejewski. 1994. Temporal Subordination in 
Discourse. Ph.D. Thesis, University of
Rochester. 
J. Pustejovsky, I. Mani, L. Belanger, B. Boguraev, 
B. Knippen, J. Littman, A. Rumshisky, A. See, 
S. Symonenko, J. Van Guilder, L. Van Guilder, 
M. Verhagen, R. Ingria. 2003. TANGO Final 
Report. timeml.org. 
 J. Pustejovsky, L. Belanger, J. Castano, R. 
Gaizauskas, P. Hanks, R. Ingria, G. Katz, D. 
Radev, A. Rumshisky, A. Sanfilippo, R. Sauri, 
B. Sundheim, M. Verhagen. 2002. TERQAS 
Final Report. timeml.org. 
TIMEBANK. 2004. timeml.org.  
TIMEX2. 2004. timex2.mitre.org. 
B. Webber. 1988. Tense as Discourse Anaphor.
Computational Linguistics 14(2): 61-73. 
B. Webber, M. Stone, A. Joshi and A. Knott. 2003. Anaphora and Discourse Structure. Computational Linguistics 29(4): 545-588.
