Proceedings of the Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, pages 45–52,
Ann Arbor, June 2005. ©2005 Association for Computational Linguistics
A Framework for Annotating Information Structure in Discourse
Sasha Calhoun†, Malvina Nissim†, Mark Steedman† and Jason Brenier‡
† Institute for Communicating and Collaborative Systems, University of Edinburgh, UK
Sasha.Calhoun@ed.ac.uk, {steedman,mnissim}@inf.ed.ac.uk
‡ Department of Linguistics, University of Colorado at Boulder
jbrenier@colorado.edu
Abstract
We present a framework for the integrated
analysis of the textual and prosodic char-
acteristics of information structure in the
Switchboard corpus of conversational En-
glish. Information structure describes the
availability, organisation and salience of
entities in a discourse model. We present
standards for the annotation of informa-
tion status (old, mediated and new), and
give guidelines for annotating informa-
tion structure, i.e. theme/rheme and back-
ground/kontrast. We show that informa-
tion structure in English can only be anal-
ysed concurrently with prosodic promi-
nence and phrasing. This annotation, us-
ing stand-off XML in NXT, can help es-
tablish standards for the annotation of in-
formation structure in discourse.
1 Introduction
We present a framework for the integrated analysis of the textual and
prosodic characteristics of information structure in a corpus of
conversational English. Section 2 introduces the corpus as well as the
tools we employ in the annotation process. We propose two complementary
annotation efforts within this framework. The first, information status
(old, mediated, new), expresses the availability of entities in
discourse (Section 3). The second scheme will firstly annotate
theme/rheme, i.e. how each intonation phrase is organised in the
discourse model, and secondly kontrast: how salient the speaker wishes
to make each entity, property or relation (Section 4). We will
demonstrate that the perception of both of these is intimately affected
by prosodic structure. In particular, the theme/rheme division affects
prosodic phrasing; and information status and kontrast affect relative
prosodic prominence. Therefore we also propose to annotate a subset of
the corpus for this prosodic information (Section 5). In conjunction
with existing annotations of the corpus, our integrated framework using
NXT will be unique in the field of conversational speech in terms of
size and richness of annotation.
2 Corpus and Tools
The Switchboard Corpus (Godfrey et al., 1992) consists of 2430
spontaneous phone conversations (averaging six minutes each) between
speakers of American English, totalling three million words. The corpus
is distributed as stereo speech signals with an orthographic
transcription per channel, time-stamped at the word level. A third of
this is syntactically parsed as part of the Penn Treebank (Marcus et
al., 1993) and has dialog act annotation (Shriberg et al., 1998). We
used a subset of this. In adherence with current standards, we
converted all the existing annotations, and are producing the new
discourse annotations, in a coherent multi-layered XML-conformant
schema, using NXT technology (Carletta et al., 2004).1 This allows us
to search over and integrate information from the many layers of
annotation, including the sound files. NXT tools can be easily
customised to accommodate different layers of annotation users want to
add, including data sets that have low-level annotations time-stamped
against a set of synchronized signals, multiple, crossing tree
structures, and connection to external corpus resources such as gesture
ontologies and lexicons (Carletta et al., 2004).
1Beside the NXT tools, we also used the TIGER Switchboard filter
(Mengel and Lezius, 2000) for the XML conversion. Using existing markup
we automatically selected and filtered NPs to be annotated, excluding
locative, directional, and adverbial NPs and disfluencies, and adding
possessive pronouns. See (Nissim et al., 2004) for technical details.
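The idea of stand-off markup, where an annotation layer points at a time-stamped word layer instead of wrapping the words, can be sketched as follows. This is only an illustration: the element names, attribute names and time stamps are hypothetical and are not NXT's actual schema or data.

```python
import xml.etree.ElementTree as ET

# Base layer: one element per word, time-stamped against the signal.
# Element names, attribute names and times are hypothetical, not NXT's schema.
words = ET.Element("words")
tokens = [("my", 0.00, 0.12), ("dog", 0.12, 0.40), ("greets", 0.40, 0.78),
          ("me", 0.78, 0.90), ("at", 0.90, 1.00), ("the", 1.00, 1.08),
          ("door", 1.08, 1.45)]
for i, (text, start, end) in enumerate(tokens, start=1):
    word = ET.SubElement(words, "word", id=f"w{i}",
                         start=f"{start:.2f}", end=f"{end:.2f}")
    word.text = text

# Stand-off layer: a markable refers to word ids rather than containing words,
# so several layers can cover the same words with crossing boundaries.
status = ET.Element("infostatus")
ET.SubElement(status, "markable", id="m1", first="w6", last="w7",
              type="mediated", subtype="part")


def resolve(markable, word_layer):
    """Return the word strings that a stand-off markable points at."""
    ids = [w.get("id") for w in word_layer]
    lo = ids.index(markable.get("first"))
    hi = ids.index(markable.get("last"))
    return [w.text for w in list(word_layer)[lo:hi + 1]]


print(resolve(status[0], words))
```

Because the markable only stores references, a prosodic layer could later segment the same words differently without touching this one.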
3 Information Status
Information Status describes how available an entity is in the
discourse. We define this in terms of the speaker’s assumptions about
the hearer’s knowledge/beliefs, and we express it by the well-known
old/new distinction.2
3.1 Annotation Scheme
Our annotation scheme for the discourse layer mainly builds on (Prince,
1992) and (Eckert and Strube, 2001), as well as on related work on
annotation of anaphoric links (Passonneau, 1996; Hirschman and
Chinchor, 1997; Davies et al., 1998; Poesio, 2000). Prince defines
“old” and “new” with respect to the discourse model as well as the
hearer’s point of view. Considering the interaction of both these
aspects, we define as new an entity which has not been previously
referred to and is yet unknown to the hearer, and as mediated an entity
that is newly mentioned in the dialogue but that the hearer can infer
from the prior context.3 This is mainly the case of generally known
entities (such as “the sun”, or “the Pope” (Löbner, 1985)), and
bridging (Clark, 1975), where an entity is related to a previously
introduced one. An entity that is neither new nor mediated is
considered old.
Because finer-grained distinctions (e.g. (Prince, 1981; Lambrecht,
1994)) have proved hard to distinguish reliably in practice, we
organise our scheme hierarchically: we use the three main classes
described above as top level categories for which more specific
subtypes can be assigned. This approach preserves a high-level, more
reliable distinction while allowing a finer-grained classification that
can be exploited for specific tasks.
Besides the main categories, we introduce two more classes. A category
non-applicable is used for wrongly extracted markables (such as
“course” in “of course”), for idiomatic occurrences, and for expletive
uses of “it”. Traces are automatically extracted as markables, but are
left unannotated. In the rare event that the annotators find some
fragments too difficult to understand, a category not-understood can be
assigned. Entities marked as non-applicable or not-understood are
excluded from any further annotation. For all other markables, the
annotators must choose between old, mediated, and new. For the first
two, subtypes can also be specified: subtype assignment is encouraged
but not compulsory.
2We follow Prince in using “old” rather than “given” to refer to
“not-new” information, but regard the two as identical.
3This type corresponds to Prince’s (1981; 1992) inferrables.
New The category new is assigned to entities that have not yet been
introduced in the dialogue and that the hearer cannot infer from
previously mentioned entities. No subtypes are specified for this
category.
Mediated Mediated entities are inferrable from previously mentioned
ones, or generally known to the hearer. We specify nine subtypes:
general, bound, part, situation, event, set, poss, func-value,
aggregation.4 Generally known entities such as “the moon” or “Italy”
are assigned a subtype general. Most proper nouns fall into this
subclass, but the annotator could opt for a different tag, depending on
the context. Also mediated are bound pronouns, such as “them” in (1),
which are assigned a subtype bound.5
(1) [. . . ] it’s hard to raise one child without them
thinking they’re the pivot point of the universe.
A subtype poss is used to mark all kinds of intra-phrasal possessive
relations (pre- and postnominal). Four subtypes (part, situation,
event, and set) are used to mark instances of bridging. The subtype
part is used to mark part-whole relations for physical objects, both as
intra- and inter-phrasal relations. (This category is to be preferred
to poss whenever applicable.) The occurrence of “the door” in (2), for
instance, is annotated as mediated/part.
(2) When I come home in the evenings my dog
greets me at the door.
For similar relations that do not involve physical objects, i.e. if an
entity is part of a situation set up by a previously introduced entity,
we use the subtype situation,6 as for the NP “the specifications” in
(3).
(3) I guess I don’t really have a problem with capital punishment. I’m
not really sure what the exact specifications are for Texas.
4Some of the subtypes are inspired by categories developed for bridging
markup (Passonneau, 1996; Davies et al., 1998).
5All examples in this paper are from the Switchboard Corpus. The
markable in question is typed in boldface; antecedents or trigger
entities, where present, are in italics. For the sake of space we do
not provide examples for each category (see (Nissim, 2003)).
The subtype event is applied whenever an entity is related to a
previously mentioned verb phrase (VP). In (4), e.g., “the bus” is
triggered by “travelling around Yucatan”.
(4) We were travelling around Yucatan, and the
bus was really full.
Whenever an entity referred to is a subset of, a super-
set of, or a member of the same set as a previously
mentioned entity, the subtype set is applied.
Rarely, an entity refers to a value of a previously mentioned function,
as “zero” and “ten” in (5). In such cases a subtype func-value is
assigned.
(5) I had kind of gotten used to centigrade temper-
ature [. . . ] if it’s between zero and ten it’s cold.
Lastly, a subtype aggregation is used to classify coordinated NPs. Two
old or med entities, for instance, do not give rise to an old
coordinated NP, unless it has been previously introduced as such. A
mediated/aggregation tag is assigned instead.
Old An entity is old when it is neither new nor mediated. This is
usually the case if an entity is coreferential with an already
introduced entity, if it is a generic pronoun, or if it is a personal
pronoun referring to the dialogue participants. Six different subtypes
are available for old entities: identity, event, general, generic,
ident generic, relative. In (6), for instance, “us” would be marked as
old because it corefers with “we”, and a subtype identity would also be
assigned.
(6) [. . . ] we camped in a tent, and uh there were
two other couples with us.
In addition, a coreference link is marked up between anaphor and
antecedent, thus creating anaphoric chains (see also (Carletta et al.,
2004)). The subtype event applies whenever the antecedent is a VP. In
(7), “it” is old/event, as its antecedent is the VP “educate three”. As
we do not extract VPs as markables, no link can be marked up.
(7) I most certainly couldn’t educate three. I don’t
know how my parents did it.
6This includes elements of the thematic grid of an already introduced
entity. It subsumes Passonneau’s (1996) class “arg”.
Also classified as old are personal pronouns referring to the dialogue
participants as well as generic pronouns. In the first case, a subtype
general is specified, whereas the subtype for the second is generic. An
instance of old/generic is “you” in (8).
(8) up here you got to wait until Aug- August until
the water warms up.
In a chain of generic references, the subtype
ident generic is assigned, and a coreference link is
marked up. Coreference is also marked up for rel-
ative pronouns: they receive a subtype relative and
are linked back to their head.
The guidelines contain a decision tree the annota-
tors use to establish priority in case more than one
class is appropriate for a given entity. For example,
if a mediated/general entity is also old/identity the latter
is to be preferred to the former. Similar precedence
relations hold among subtypes.
To provide more robust and reliable clues in an-
notating bridging types (e.g. for distinguishing
between poss and part), we provided replacement
tests and referred to relations encoded in knowledge
bases such as WordNet (Fellbaum, 1998) (for part)
and FrameNet (Baker et al., 1998) (for situation).
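The hierarchical scheme of this section can be written down as a small data structure. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and only the two precedence relations stated explicitly in the text (old/identity over mediated/general, and part over poss) are encoded, since the full decision tree from the guidelines is not reproduced here.

```python
# Subtype inventories as listed in Section 3.1; "new" has no subtypes.
SUBTYPES = {
    "old": {"identity", "event", "general", "generic",
            "ident generic", "relative"},
    "mediated": {"general", "bound", "part", "situation", "event", "set",
                 "poss", "func-value", "aggregation"},
    "new": set(),
}
# Markables with these tags are excluded from further annotation.
EXCLUDED = {"non-applicable", "not-understood"}

# Only the precedence relations the text states explicitly (winner, loser).
PREFERRED_OVER = {
    ("old/identity", "mediated/general"),
    ("mediated/part", "mediated/poss"),
}


def validate(category, subtype=None):
    """Return a 'category' or 'category/subtype' tag, or raise ValueError."""
    if category in EXCLUDED:
        if subtype is not None:
            raise ValueError(f"{category} takes no subtype")
        return category
    if category not in SUBTYPES:
        raise ValueError(f"unknown category: {category}")
    if subtype is None:  # subtype assignment is encouraged, not compulsory
        return category
    if subtype not in SUBTYPES[category]:
        raise ValueError(f"{subtype} is not a subtype of {category}")
    return f"{category}/{subtype}"


def choose(tag_a, tag_b):
    """Pick the preferred of two applicable tags, or None if not stated."""
    if (tag_a, tag_b) in PREFERRED_OVER:
        return tag_a
    if (tag_b, tag_a) in PREFERRED_OVER:
        return tag_b
    return None


print(validate("mediated", "part"))
print(choose("mediated/general", "old/identity"))
```

Keeping the top-level category separable from the subtype mirrors the design choice described above: agreement can always be computed on the coarse classes alone.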
3.2 Validation of the Scheme
Three Switchboard dialogues (for a total of 1738 markables) were marked
up by two different annotators to assess the validity of the scheme. We
evaluated annotation reliability by using the Kappa statistic
(Carletta, 1996). Good quality annotation of discourse phenomena
normally yields a kappa (κ) of about .80. We assessed the validity of
the scheme on the four-way classification into the three main
categories (old, mediated and new) and the non-applicable category. We
also evaluated the annotation including the subtypes. All cases where
at least one annotator assigned a not-understood tag were excluded from
the agreement evaluation (14 markables). Also excluded were all traces
(222 markables), which the annotators left unmarked. The total number
of markables considered for evaluation over the three dialogues was
therefore 1502.
The annotation of the three dialogues yielded κ = .845 for the
high-level categories, and κ = .788 when including subtypes (N = 1502;
k = 2).7
7N stands for the number of instances annotated and k for the number of
annotators. Unless otherwise specified, N = 1502 and k = 2 hold for all
κ scores reported in Section 3.
These results show that overall the annotation is reliable and that
therefore the scheme has good reproducibility. When including subtypes
agreement decreases, but backing-off to the high-level categories is
always possible, thus showing the virtues of a hierarchically organised
scheme. Reliability tests for single categories showed that mediated
and new are more difficult to apply than old, for which agreement was
measured at κ = .902, although still quite reliable (κ = .800 and
κ = .794, respectively).
Agreement for non-applicable was κ = .84[?].
The annotators found the decision tree very useful when having to
choose between more than one applicable subtype, and we believe it has
a significant impact on the reliability of the scheme.
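For readers unfamiliar with the statistic, the following is a minimal sketch of a two-annotator kappa, with chance agreement estimated from the pooled label distribution. Whether this is exactly the variant used for the figures above is an assumption, and the toy label sequences are invented for illustration; only the markable counts come from the text.

```python
from collections import Counter

# Markables entering the evaluation, from the counts given in the text:
# all markables, minus not-understood cases, minus traces.
N_EVAL = 1738 - 14 - 222


def kappa(labels_a, labels_b):
    """Two-annotator kappa with chance agreement from pooled proportions."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    pooled = Counter(labels_a) + Counter(labels_b)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one label is used")
    return (observed - expected) / (1 - expected)


# Invented toy data, not taken from the corpus.
annotator_1 = ["old", "old", "mediated", "new", "old", "mediated"]
annotator_2 = ["old", "old", "mediated", "mediated", "old", "new"]
print(N_EVAL, round(kappa(annotator_1, annotator_2), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why a category that dominates the data does not automatically yield a high score.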
The scheme was then applied for the annotation of
a total of 147 Switchboard dialogues. This amounts
to 43358 sentences with 69004 annotated markables,
35299 of which are old, 23816 mediated and 9889
new (8127 were excluded as non-applicable, and 160
were not understood), and 16324 coreference links.
In Section 6 we use this scheme to annotate the
Pie-in-the-Sky text.
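The corpus totals reported above can be checked with simple arithmetic. The sketch below only restates figures from the text; the variable names are illustrative.

```python
# Figures as reported in Section 3.2.
counts = {"old": 35299, "mediated": 23816, "new": 9889}
dialogues = 147
coreference_links = 16324

# The three categories should account for all annotated markables.
total = sum(counts.values())
shares = {category: n / total for category, n in counts.items()}

for category, share in shares.items():
    print(f"{category}: {share:.1%}")
print(f"markables per dialogue: {total / dialogues:.0f}")
print(f"coreference links per dialogue: {coreference_links / dialogues:.0f}")
```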
3.3 Related Work
To our knowledge, (Eckert and Strube, 2001) is the only other work that
explicitly refers to IS annotation. They also use an old/med/new
distinction based on Prince (1992) for annotating Switchboard
dialogues. However, their IS annotation is specifically designed for
salience ranking of candidate antecedents for anaphora resolution, and
is not described in detail. They do not report figures on
inter-annotator agreement, so a proper comparison with our experiment
is not feasible. Among the schemes that deal with annotation of
anaphoric NPs, our scheme is especially comparable with DRAMA
(Passonneau, 1996) and MATE (Davies et al., 1998). Both schemes have a
hierarchical structure. In DRAMA, types of inferrables can be
specified, within a division into conceptual (pragmatically determined)
vs. linguistic (based on argument structure) inference. However, no
annotation experiment with inter-annotator agreement figures is
reported. MATE provides subtypes for bridging relations, but they were
not applied in any annotation exercise, so that reliability and
distribution of categories are only based on the “core scheme” (true
coreference). For a detailed comparison of our approach with related
efforts on the annotation of anaphoric relations, see (Nissim et al.,
2004).
4 Information Structure
We have seen that information status describes how available an entity
is in a discourse. Generally old entities are available, and new
entities are not. In prosody we find that newness is highly correlated
with pitch accenting, and oldness with deaccenting (Cutler et al.,
1997). However, this is only one aspect of information structure. We
also need to describe how speakers signal the organisation and salience
of elements in discourse. Building on the work of (Vallduví & Vilkuna,
1998), as developed by (Steedman, 2000), we define two notions,
theme/rheme structure and background/kontrast. Theme/rheme structure
guides how an element fits into the discourse model: if it relates back
it is thematic; if it advances the discourse it is rhematic. Steedman
claims that intonational phrases can mark information units (theme and
rheme, though not all boundaries are realised and a unit may contain
more than one phrase). The pitch contour associated with nuclear
accents in themes is distinct from that in rhemes (which he identifies
as L+H*LH% and H*LH% in ToBI terms (Beckman and Elam, 1997)), so that,
where present, such boundaries disambiguate information structure. (See
(9)).8
(9) (Q) Personally, I love hyacinths.
What kind of bulbs grow well in your area?
(A)
(In MY AREA)
Bkgd Kont. Bkgd (Theme)
(it is the DAFFODIL)
Bkgd Kont. (Rheme)
The second dimension, kontrast, relates to salience.9 We expect new
entities to be salient and old entities not. Therefore, if an old
element is salient, or a new one especially salient, an extra meaning
is implied. These extra meanings are largely subsumed by kontrast, i.e.
distinguishing an element from alternatives made available by the
context (see (9)).
8Annotation is as in Section 3. Words in SMALL CAPS are accented,
parentheses indicate intonation phrases, including boundary tones if
present. See website to hear some examples from this section.
9We use kontrast to distinguish it from the everyday use of contrast
and the sometimes conflicting uses of contrast in the literature.
Annotators, however, will not be given this term.
4.1 Annotation Scheme
As we have seen, in English, information structure
is primarily conveyed by intonation. We therefore
think it is vital for annotators to listen to the speech
while annotating this structure.
4.1.1 Theme/Rheme
We have claimed that prosodic phrasing can divide utterances into
information units. However, often theme material is entirely
background, i.e., mutually known and without contrasting alternatives.
Therefore, for both model theoretic and practical purposes, it is the
same as background of the rheme. Accordingly, we work with a test for
themehood, defining the rheme as any prosodic phrase that is not
identifiable as a theme.
Annotators will mark each prosodic phrase as a theme if it only
contains information which links the utterance to the preceding
context, i.e. setting up what the speaker is saying in relation to what
has been said before. In the annotators’ opinion, even if this is not
the tune the speaker used, the phrase must sound appropriate if they
say it with a highly marked tune, such as L+H* LH%. For example, in
(10), the phrase “where I lived” links “was a town called Newmarket” to
the statement that the speaker lived in England (accenting not shown).
It would be appropriate to utter it with an L+H* accent on “Where”
and/or “lived”, and a final LH%. So it is a theme. The same accent on
“town” and/or “Newmarket” sounds inappropriate, and the phrase advances
the discussion, so it is a rheme.
(10) I lived over in England for four years
(Where I lived) (Theme)
(was a town called Newmarket) (Rheme)
4.1.2 Background/Kontrast
Although there is a clear link between prosodic prominence and
kontrast, there are a number of disagreements about how this works
which this annotation effort seeks to resolve. Some, including
(Steedman, 2000), have claimed that kontrast within theme and kontrast
within rheme are marked by categorically distinct pitch accents.
Another view is that kontrast, also called contrastive focus or topic,
only applies to themes that are contrastive; since the head of a rheme
phrase always attracts a pitch accent, it is redundant to call one part
kontrastive. Further, some consider that kontrast within a rheme phrase
only occurs when there is a clear alternative set, i.e. the distinction
between broad and narrow focus, as in (9) where “daffodil” contrasts
with other bulbs the speaker might grow. Again, there is controversy on
whether there is an intonational difference between broad and narrow
focus (Calhoun, 2004a). If these distinctions are marked prosodically,
it is disputed whether this is with different pitch accents (Steedman),
or by the relative height of different accents in a phrase (Rump and
Collier, 1996; Calhoun, 2004b).
Rather than using the abstract notion of kontrast directly, annotators
will identify discourse scenarios which commonly invoke kontrast
(drawing on functions of emphatic accents from (Brenier et al.,
2005)).10 This addresses the disagreements above, while making our
annotation more constrained and robust. In each case, using the full
discourse context including the speech, annotators mark each content
word (noun, verb, adjective, adverb and demonstrative pronoun) for the
first category that applies. If none apply, they mark it as background.
correction The speaker’s intent is to correct or clarify another word
just used by them or the other speaker. In (11), e.g., the speaker
wishes to clarify whether her interlocutor really meant “hyacinths”.
(11) (now are you sure they’re HYACINTHS) (be-
cause that is a BULB)
contrastive The speaker intends to contrast the word with a previous
one which was (a) a current topic; and (b) semantically related to the
contrastive word, such that they belong to a natural set. In (12), B
contrasts recycling in her town, “San Antonio”, with A’s town,
“Garland”, from the set of places where the speakers live.
(12) (A) I live in Garland, and we’re just beginning
to build a real big recycling center...
(B) (YEAH there’s been) (NO emphasis on recy-
cling at ALL) (in San ANTONIO)
10Emphasis can occur for two major reasons, both identified by Brenier:
emphasis of a particular word or phrase, i.e. kontrast, or emphasis
over a larger span of speech, conveying affective connotations such as
excitement, which is not included here (Ladd, 1996).
subset The speaker highlights one member of a more general set that has
been mentioned and is a current topic. In (13), the speaker introduces
“three day cares”, and then gives a fact about each.
(13) (THIS woman owns THREE day cares) (TWO in
Lewisville) (and ONE in Irving) (and she had to
open the SECOND one up) (because her WAIT-
ING list was) (a YEAR long)
adverbial The speaker uses a focus-sensitive adverb, i.e. only, even,
always or especially, to highlight that word, and not another in the
natural set. The adverb and/or the word can be marked. In (14), B
didn’t even like the “previews” of ‘The Hard Way’, let alone the movie.
(14) (A) I like Michael J Fox, though I thought he
was crummy in ‘The Hard Way’.
(B) (I didn’t even like) (the PREVIEWS )
answer The word (or its syntactic phrase, e.g. an NP), and no other,
fills an open proposition set up in the context. It must make sense if
the speaker had only said that word or phrase. In (15), A sets up the
“blooms” she can’t identify, and B answers “lily”.
(15) (A) We have these blooms, I’m not sure what
they are but they come in all different colours
yellow, purple, white...
(B) (I BET you) (that that’s a LILY)
Again, in Section 6 we apply the scheme to the
Pie-in-the-Sky text.
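The labelling rule described in Section 4.1.2, where each content word receives the first kontrast category that applies and background otherwise, can be sketched as below. Two points are assumptions: that the order in which the categories are presented above is the priority order, and that the annotator's judgments arrive as a set of category names. The function name is hypothetical.

```python
# Categories in the order they are presented in Section 4.1.2; treating this
# order as the priority order is an assumption.
KONTRAST_ORDER = ["correction", "contrastive", "subset", "adverbial", "answer"]
CONTENT_POS = {"noun", "verb", "adjective", "adverb", "demonstrative pronoun"}


def kontrast_label(pos, judgments):
    """Return the kontrast label for one word.

    pos: part of speech of the word.
    judgments: set of category names the annotator judged applicable,
        using the full discourse context including the speech.
    """
    if pos not in CONTENT_POS:
        # Only content words are marked; unmarked words count as background.
        return "background"
    for category in KONTRAST_ORDER:
        if category in judgments:
            return category
    return "background"


# "LILY" in example (15) fills the open proposition set up by speaker A.
print(kontrast_label("noun", {"answer"}))
# A word judged both contrastive and an answer gets the earlier category.
print(kontrast_label("noun", {"answer", "contrastive"}))
# A function word is left unmarked.
print(kontrast_label("determiner", set()))
```

Forcing a single label per word keeps the annotation constrained, at the cost of discarding the fact that several scenarios may hold at once.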
4.2 Related Work
Annotator agreement for pitch accents and prosodic boundaries in ToBI
is about 80% and 90% respectively (Pitrelli et al., 1994). Automatic
performance, using acoustic and textual features, is now above 85%
accuracy (Shriberg et al., 2000). However, this does not distinguish
prosodic events which occur for structural or rhythmical reasons from
those which mark information structure (Ladd, 1996). (Heldner et al.,
1999) try to predict focal accents. They define this minimally as the
most prominent in a three-word phrase. (Hirschberg, 1993) got 80-98%
accuracy using only text-based features. However, her definition of
contrast was not as thorough as ours. (Hedberg and Sosa, 2001) looked
at marking of ratified, unratified (old and new) and contrastive topics
and foci (theme and rheme) with ToBI pitch accents. (Baumann et al.,
2004) annotated a simpler information structure and prosodic events in
a small German corpus.
5 Information Structure and Prosodic
Structure
Much previous work, not corpus-based, draws a direct correspondence
between information structure, prosodic phrasing and pitch accent type.
However, in real speech there are many non-semantic influences on
prosody, including phrase length, speaking rate and rhythm. Information
structure is rather a strong constraint on the realisation of prosodic
structure (Calhoun, 2004a). Contrary to the assumption of ToBI, this
structure is metrical, highly structured and linguistically relevant
both within and across prosodic phrases (Ladd, 1996; Truckenbrodt,
2002).
One of our main aims is to test how such evidence can be reconciled
with theories presented earlier about the relationship between
information structure and prosody. Local prominence levels have been
shown to aid in the disambiguation of focal adverbs, anaphoric links,
and global discourse structures marked as elaboration, continuation,
and contrast (Dogil et al., 1997). Global measures of prominence level
have been linked to topic structure, corrections, and turn-taking cues
(Ayers, 1994). (Brenier et al., 2005) found that emphatic accents
realised special discourse functions such as assessment, clarification,
contrast, negation and protest in child-directed speech. Most of these
functions can be seen as conversational implicatures of kontrast, i.e.
if an element is unexpectedly highlighted, this implies an added
meaning. Brenier found that while pitch accents can be detected using
both acoustic and textual cues, textual features are not useful in
detecting emphatic pitch accents, showing there is added meaning not
available from the text.
As noted in Section 4.2, inter-annotator agreement for the
identification of prosodic phrase boundaries with ToBI is reasonably
good. We will therefore label ToBI break indices 3 and 4 (conflated)
(Beckman and Elam, 1997). Annotators will also mark the perceived level
of prosodic prominence on each word using a defined scale. We are
currently running a pilot experiment to identify a reasonable number of
gradations of prosodic prominence, from completely unstressed and/or
reduced to highly emphatic, to use for the final annotation.
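Conflating break indices 3 and 4 amounts to treating any break of strength 3 or more as a prosodic phrase boundary. The sketch below shows how a word sequence would be segmented under that rule; the function name and the break index values given for example (10) are hypothetical illustrations, not annotations from the corpus.

```python
def prosodic_phrases(words, break_indices, boundary_min=3):
    """Group words into phrases, closing a phrase after any word whose
    following break index is at least boundary_min (3 and 4 conflated)."""
    if len(words) != len(break_indices):
        raise ValueError("need one break index per word")
    phrases, current = [], []
    for word, break_index in zip(words, break_indices):
        current.append(word)
        if break_index >= boundary_min:
            phrases.append(current)
            current = []
    if current:  # trailing words with no closing boundary
        phrases.append(current)
    return phrases


# Example (10), with invented break indices after each word.
words = ["Where", "I", "lived", "was", "a", "town", "called", "Newmarket"]
breaks = [1, 1, 3, 1, 1, 1, 1, 4]
for phrase in prosodic_phrases(words, breaks):
    print(" ".join(phrase))
```

The resulting phrases are the units that the theme/rheme annotation of Section 4.1.1 would then classify.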
[But [[[Yemen’s]_med/general president]_med/poss ]]_Contrastive says]_Theme
[[the FBI]_old/identity has told [him]_old/identity ]_Theme
[ [the explosive material]_med/set could only have come from [[[the U.S.]_med/general, [israel]_med/general, or [[two arab countries]_med/set ]_med/aggregation .]_Adverbial ]_Rheme
[And to [[a former federal bomb investigator]_new ,]_Contrastive ]_Theme
[[that description]_old/event suggests]_Theme
[[a powerful military-style plastic explosive C-4]_med/set ]_Answer [[that]_old/relative can be cut or molded into [different shapes]_new . ]_Rheme
Figure 1: Annotation of Pie-in-the-Sky sentences with Information Structure
6 Pie-in-the-Sky annotation
“Pie in the Sky” is a joint effort to annotate two sentences with as
much semantic/pragmatic information as possible (see
http://nlp.cs.nyu.edu/meyers/pie-in-the-sky.html). Information
structure is one of the desired annotation layers and, as standards are
not yet established, our proposal contributes to defining annotation
guidelines for this structure. Figure 1 reports the Pie-in-the-Sky
sentences enriched with our annotation. The context prior to these
sentences is as follows:
“a 12-year-old boy reports seeing a man launch a rubber boat from a car
parked at the harbor. fbi officials find what they believe may be
explosives in the car. yemeni police trace the car to a nearby house.
the fbi finds traces of explosives on clothes found neighbors say they
saw two men who they describe as “arab-looking” living there for
several weeks. police also find a second house where authorities
believe two others may have assembled the bomb, possibly doing some
welding. passports found in one of the houses identify the men as from
a privilege convenience province noted for lawless tribes. but the
documents turn out to be fakes. meantime, analysts at the fbi crime lab
try to discover what the bomb was made from. no conclusions yet, u.s.
officials say. but a working theory, plastic explosive.”
We identified 14 NPs markable for information status (see Figure 1).11
Most annotations were straightforward. Some comments though: “Yemen” is
annotated as med/general, although it could also be med/sit as “Yemeni”
was previously mentioned. Our decision tree was used for such cases.
“The explosive material” is med/set not old/identity since it refers to
the kind of explosive used rather than to a specific entity previously
mentioned.
In the absence of any prosodic annotation in the transcript, these
sentences are slightly ambiguous as to information structure. The most
likely interpretation is given in Figure 1.12 For example, “Yemen’s
President” contrasts with “US officials”, in the set of people talking
about what the bomb is made of. Since both words are contrastive,
either or both could have L+H* accents, whereas “say” could not. The
inclusion of the latter in the theme is consistent with the possibility
of a rising boundary LH% after it. “The FBI has told him” is thematic
because it links the opinion of “Yemen’s president” to the previous
discourse. It also would sound appropriate with an L+H*LH% tune. As can
be seen, although theme/rheme and prosodic phrase boundaries align, in
both cases the VP is split between information/intonation phrases. The
independence of information structure and intonation structure from
traditional surface structure is a major reason behind our use of
‘stand-off’ markup.
11Square brackets are used to mark annotation boundaries.
12Kontrast is marked with the relevant category, unmarked words are
background.
7 Applications and Future Work
Once completed, the annotations we have presented, along with those
existing for syntax, disfluencies and dialog-acts on the same portion
of Switchboard, will create a corpus of conversational speech unique in
terms of size and richness of annotation. In conjunction with the NXT
tools, this resource would optimally lend itself to detailed and rich
analysis of diverse linguistic phenomena, the ultimate goal of the Pie
in the Sky project. It will be useful for a large range of NLP
applications, including paraphrase analysis and generation, topic
detection, information extraction and speech synthesis in dialogue
systems.
Website Example sound files available at
http://homepages.inf.ed.ac.uk/s0199920/pieinsky.html.
Acknowledgements Part of this work was funded by Scottish
Enterprise (The Edinburgh-Stanford Link Paraphrase Analysis
for Improved Generation and Sounds of Discourse). We would
like to thank David Beaver, Jean Carletta, Shipra Dingare, Flo-
rian Jaeger, Dan Jurafsky, Vasilis Karaiskos and Bob Ladd for
valuable help and discussion.
References
G. M. Ayers. 1994. Discourse functions of pitch range in spon-
taneous and read speech. In J. Venditti, editor, OSU Working
Papers in Linguistics, volume 44, pages 1-49.
C. F. Baker, C. J. Fillmore, and J. B. Lowe. 1998. The Berkeley
FrameNet project. In C. Boitet and P. Whitelock, editors,
Proc. COLING-ACL, pages 86-90.
E. Bard, D. Robertson, and A. Sorace. 1996. Magnitude
estimation of linguistic acceptability. Language, 72(1):32-68.
S. Baumann, C. Brinckmann, S. Hansen-Schirra, G-J. Kruijff,
I. Kruijff-Korbayová, S. Neumann, and E. Teich. 2004.
Multi-dimensional annotation of linguistic corpora for investigating
information structure. In Proc. NAACL/HLT “Frontiers
in Corpus Annotation”, Boston, MA.
M. Beckman and G. Elam. 1997. Guidelines for ToBI
Labelling. The OSU Research Foundation, v.3.0.
P. Boersma and D. Weenink. 2003. Praat: doing phonetics by
computer. http://www.praat.org.
J. M. Brenier, D. M. Cer, and D. Jurafsky. 2005. Emphasis
detection in speech using acoustic and lexical features. In
LSA Annual Meeting, Oakland, CA.
S. Calhoun. 2004a. Overloaded ToBI and what to do about
it: An argument for function-based phonological intonation
categories. In Univ. of Edinburgh Ling. Postgrad. Conf.
S. Calhoun. 2004b. Phonetic dimensions of intonational cate-
gories - L+H* and H*. In Prosody 2004, Nara, Japan.
J. Carletta, S. Dingare, M. Nissim, and T. Nikitina. 2004. Us-
ing the NITE XML Toolkit on the Switchboard Corpus to
study syntactic choice: a case study. In Proc. of LREC2004,
Lisbon.
J. Carletta. 1996. Assessing agreement on classification tasks:
the kappa statistic. Comp. Ling., 22(2):249-254.
H. H. Clark. 1975. Bridging. In R. Schank and B. Nash-
Webber, eds, Theoretical Issues in NLP. MIT Press, Cam-
bridge, MA.
A. Cutler, D. Dahan, and W. van Donselaar. 1997. Prosody in
the comprehension of spoken language: A literature review.
Lang. and Sp., 40(2):141-201.
S. Davies, M. Poesio, F. Bruneseaux, and L. Romary. 1998.
Annotating coreference in dialogues: Proposal for a scheme
for MATE. http://www.hcrc.ed.ac.uk/~poesio/anno_manual.html.
G. Dogil, J. Kuhn, J. Mayer, G. Möhler, and S. Rapp. 1997.
Prosody and discourse structure: Issues and experiments. In
Proc. of the ESCA Workshop on Intonation: Theory, Models
and Applications, pages 99-102, Athens, Greece.
M. Eckert and M. Strube. 2001. Dialogue acts, synchronising
units and anaphora resolution. J. of Semantics, 17(1):51-89.
C. Fellbaum, editor. 1998. WordNet: An Electronic Lexical
Database. MIT Press, Cambridge, MA.
J. Godfrey, E. Holliman, and J. McDaniel. 1992. SWITCHBOARD:
Telephone speech corpus for research and development.
In Proc. ICASSP-92, pages 517-520.
N. Hedberg and JM. Sosa. 2001. The prosodic structure of
topic and focus in spontaneous English dialogue. In LSA
Workshop on Topic and Focus, Santa Barbara.
M. Heldner, E. Strangert, and T. Deschamps. 1999. A focus de-
tector using overall intensity and high frequency emphasis.
In Proc. ICPhS-99, vol. 2, 1491-1493, San Francisco.
J. Hirschberg. 1993. Pitch accent in context: Predicting
intonational prominence from text. AI, 63:305-340.
L. Hirschman and N. Chinchor. 1997. MUC-7 coreference task
definition. In Proc. of 7th Conf. on Message Understanding.
D. R. Ladd. 1996. Intonational Phonology. CUP, UK.
K. Lambrecht. 1994. Information structure and sentence form.
Topic, focus, and the mental representation of discourse ref-
erents. Camb. U. Press, UK.
S. Löbner. 1985. Definites. J. of Semantics, 4:279-326.
M. Marcus, B. Santorini, and MA. Marcinkiewicz. 1993.
Building a large annotated corpus of English: The Penn
Treebank. Comp. Ling., 19:313-330.
A. Mengel and W. Lezius. 2000. An XML-based encod-
ing format for syntactically annotated corpora. In Proc.
LREC2000, 121-126.
M. Nissim. 2003. Annotation scheme for information status in
dialogue. HCRC, University of Edinburgh. Unpub. ms.
M. Nissim, S. Dingare, J. Carletta, and M. Steedman. 2004.
An annotation scheme for information status in dialogue. In
Proc. LREC2004, Lisbon.
R. Passonneau. 1996. Instructions for applying discourse
reference annotation for multiple applications (DRAMA).
Unpub. ms.
J. Pitrelli, M. Beckman, and J. Hirschberg. 1994. Evaluation of
prosodic transcription labelling reliability in the ToBI framework.
In Proc. of the 3rd Intl. Conf. on Spoken Lge. Proc.,
vol. 2, pages 123-126.
M. Poesio. 2000. The GNOME annotation scheme manual
(v.4). http://www.hcrc.ed.ac.uk/~gnome/anno_manual.html.
E. F. Prince. 1981. Toward a taxonomy of given-new infor-
mation. In P. Cole, ed., Radical Pragmatics. Acad. Press,
NY.
E. Prince. 1992. The ZPG letter: subjects, definiteness, and
information-status. In S. Thompson and W. Mann, eds.,
Discourse description: diverse analyses of a fund raising text,
pages 295-325. John Benjamins, Philadelphia/Amsterdam.
H.H. Rump and R. Collier. 1996. Focus conditions and the
prominence of pitch-accented syllables. Lang. and Sp.,
39:1-17.
E. Shriberg, P. Taylor, R. Bates, A. Stolcke, K. Ries, D. Jurafsky,
N. Coccaro, R. Martin, M. Meteer, and C.V. Ess-Dykema.
1998. Can prosody aid the automatic classification
of dialog acts in conversational speech? Lang. and Sp.,
41(3-4):439-487.
E. Shriberg, A. Stolcke, D. Hakkani-Tür, and G. Tür. 2000.
Prosody-based automatic segmentation of speech into sentences
and topics. Sp. Comm., 32(2):127-154.
M. Steedman. 2000. Information Structure and the Syntax-
Phonology Interface. LI, 31(4):649-689.
H. Truckenbrodt. 2002. Upstep and embedded register levels.
Phonology, 19:77-120.
E. Vallduví and M. Vilkuna. 1998. On Rheme and Kontrast.
Syntax and Semantics, 29:79-108.
