Proceedings of the Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, pages 1–4,
Ann Arbor, June 2005. ©2005 Association for Computational Linguistics
Introduction to
Frontiers in Corpus Annotation II
Pie in the Sky
Adam Meyers
New York University
meyers@cs.nyu.edu
1 Introduction
The Frontiers in Corpus Annotation workshops are op-
portunities to discuss the state of the art of corpus annota-
tion in computational linguistics. Corpus annotation has
pushed the entire field in new directions by providing new
task definitions and new standards of analysis. At the first
Frontiers in Corpus Annotation workshop at HLT-NAACL
2004 we compared assumptions underlying different an-
notation projects in light of both multilingual applications
and the pursuit of merged representations that incorporate
the result of various annotation projects.
Since September 2004, several researchers have
been collaborating to produce detailed semantic
annotation of two difficult sentences. The effort aimed to
produce a single unified representation that goes beyond
what may currently be feasible to annotate consistently
or to generate automatically. Rather, this "pie in the sky"
annotation effort was an attempt at defining a future goal
for semantic analysis. We decided to use the "Pie in the
Sky" annotation effort
(http://nlp.cs.nyu.edu/meyers/pie-in-the-sky.html)
as a theme for this year’s workshop.
Consequently, this theme has been brought out in many
of the papers contained in this volume.
The first four papers (Pustejovsky et al., 2005;
Hinrichs et al., 2005; Bies et al., 2005; Dinesh et al.,
2005) all discuss some aspect of merging annotation.
(Pustejovsky et al., 2005) describes issues that arise for
merging argument structures for verbs, nouns and
discourse connectives, as well as time and anaphora
representations. (Hinrichs et al., 2005) focuses on the
merging of syntactic, morphological, semantic and
referential annotation. It also points out that the "Pie in
the Sky" representation lacks syntactic features. This
brings to light an important point of discussion: should
linguistic analyses be divided into separate "levels"
corresponding to syntax, morphology, discourse, etc., or
should (or can) a single representation cover all such
"levels"? As currently conceived, "Pie in the Sky" is
intended to be as "language neutral" as possible; this may
make adding a real syntactic level difficult. However,
arguably, surface relations can be added on as features to
Pie in the Sky, even if we delete or ignore those features
for some (e.g., language-neutral) purposes. Still, other
papers present further difficulties for maintaining a
single representation that covers multiple modes of analysis.
(Bies et al., 2005) discusses possible conflicts between
named entity analyses and syntactic structure, and
(Dinesh et al., 2005) discusses a conflict between discourse
structure and syntactic structure. I think it is reasonable to
assume that some such conflicts will be resolvable; for
example, I believe that the named entity conflicts point to
shortcomings of the original Penn Treebank analysis.
However, the discourse structure/syntactic structure conflicts
may be harder to resolve. In fact, some annotation projects,
e.g., the Prague Dependency Treebank (Hajičová and
Ceplová, 2000), assume that multiple analyses or "levels"
are necessary to describe the full range of phenomena.
The fifth through seventh papers (Inui and Okumura, 2005;
Calhoun et al., 2005; Wilson and Wiebe, 2005) investigate
some additional types of annotation that were not
part of the distributed version of Pie in the Sky, but which
could be added in principle. In fact, with help from
the authors of (Calhoun et al., 2005), I did incorporate
their analysis into the latest version (number 6) of the Pie
in the Sky annotation. Furthermore, it turns out that
some units of Information Structure cross the boundaries
of the syntactic/semantic constituents, thus raising the
sort of difficulties discussed in the previous paragraph.
Specifically, information structure divides sentences into
themes and rhemes. For the two sample sentences, the
rheme boundaries do correspond to syntactic units, but
the theme boundaries cross syntactic boundaries, forming
units made up of parts of multiple syntactic constituents.
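The notion of a unit "crossing" constituent boundaries can be made concrete with a small check. The following is only a minimal sketch under stated assumptions: spans are half-open token offsets, and the sentence, the constituent spans, and the theme and rheme spans are all hypothetical values chosen for illustration, not taken from the actual Pie in the Sky sentences.

```python
from typing import List, Tuple

# A span is a half-open range of token offsets: [start, end).
Span = Tuple[int, int]


def crosses(a: Span, b: Span) -> bool:
    """Return True if the spans overlap but neither contains the other."""
    (a0, a1), (b0, b1) = a, b
    overlap = a0 < b1 and b0 < a1
    nested = (a0 <= b0 and b1 <= a1) or (b0 <= a0 and a1 <= b1)
    return overlap and not nested


def crossing_constituents(unit: Span, constituents: List[Span]) -> List[Span]:
    """List the constituents whose boundaries the given unit crosses."""
    return [c for c in constituents if crosses(unit, c)]


# Hypothetical 6-token sentence: S = (0, 6), subject = (0, 2),
# predicate = (2, 6), object inside the predicate = (3, 6).
constituents = [(0, 6), (0, 2), (2, 6), (3, 6)]

rheme = (3, 6)  # hypothetical rheme that coincides with a constituent
theme = (1, 4)  # hypothetical theme built from parts of several constituents

print(crossing_constituents(rheme, constituents))  # []
print(crossing_constituents(theme, constituents))  # [(0, 2), (2, 6), (3, 6)]
```

A unit with an empty result can be attached to the existing constituent structure directly; a unit with a non-empty result needs either a separate level or a feature that spans several nodes.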
(Palmer et al., 2005; Xue, 2005) (the eighth and
eleventh papers) compare annotated phenomena
across English and Chinese. It should be pointed
out that seven of the papers at this workshop are
predominantly about the annotation of English, one is about
German annotation and one is about Japanese annotation.
The papers by Palmer et al. and Xue are the only ones at
the workshop that explicitly discuss attempts to apply the
same annotation scheme across two languages.
(McShane et al., 2005; Poesio and Artstein, 2005) (the
ninth and tenth papers) both address ways of improving
the annotation process. (Poesio and Artstein,
2005) discusses better ways of assessing inter-annotator
agreement, particularly when there is a gray
area between correct and incorrect annotation. (McShane
et al., 2005) discusses human-aided annotation
(human correction of a machine-generated analysis)
as it pertains to a single integrated annotation scheme,
similar in many ways to "Pie in the Sky", although it has
been in existence for considerably longer.
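For readers unfamiliar with agreement coefficients, the conventional baseline can be sketched as follows. This is Cohen's kappa for two annotators, shown only as a familiar reference point; it is not presented here as the measure that (Poesio and Artstein, 2005) propose. The function name and the label sequences are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Expected agreement: chance that both pick the same label, estimated
    # from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical coreference decisions for four markables.
annotator_a = ["coref", "coref", "none", "none"]
annotator_b = ["coref", "none", "none", "none"]

print(cohen_kappa(annotator_a, annotator_b))  # 0.5
```

Note that this coefficient counts every disagreement as equally wrong, which is exactly the limitation that matters when there is a gray area between correct and incorrect annotation.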
2 Issues for Discussion
These papers raise a number of important issues for dis-
cussion, some of which I have already touched on.
Question 1: Should the community annotate lots of
individual phenomena independently of one another or
should we assume an underlying framework and per-
form all annotation tasks so they are compatible with that
framework?
Some of the work presented describes the annotation
of fairly narrow linguistic phenomena. Pie in the Sky can
be viewed as a framework for unifying these annotation
schemata into a single representation (a Unified Linguistic
Annotation framework in the sense of (Pustejovsky et
al., 2005)). Other work presented assumes that the
integrated framework is the object of the annotation rather
than the result of merging annotations (Hinrichs et al.,
2005; McShane et al., 2005). There are pros and cons to
both approaches.
When researchers decide to annotate one small piece
of linguistic analysis (verb argument structure, noun
argument structure, coreference, discourse structure, etc.),
this has the following potential advantages: (1) exploring
one phenomenon in depth may provide a better
characterization of that phenomenon, and if individual
phenomena are examined with this level of care, perhaps we
will end up with a better overall analysis; (2) a very focused
task definition for the annotator may improve
inter-annotator agreement; and (3) it is sometimes easier to
analyze a phenomenon in isolation, especially if there is not
a large literature of previous work about it; indeed, trying
to integrate this new phenomenon before adequately
understanding it may unduly bias one’s research. However,
by ignoring a more complete theory, these annotation
projects run the risk of task-based biases, e.g.,
classifying predication as coreference or coreference as
argument-hood. While an underlying all-inclusive theory
could be a useful roadmap, unifying the results of
several annotation efforts (and resolving inconsistencies)
may yield the same result (as suggested in (Pustejovsky
et al., 2005)) while maintaining the advantages of
investigating the phenomena separately. On the other hand, as
this merging process has not yet come to completion, the
jury is still out.
Let’s say that, for the sake of argument, the reader
accepts the research program where individual annotation
efforts are slowly merged into one "Pie in the Sky"-type
system. There is still another obvious question that arises:
Question 2: Why make up a brand new system like
"Pie in the Sky" when there are so many existing
frameworks around? For example, Head-Driven Phrase
Structure Grammar (Pollard and Sag, 1994) assumes a fairly
large feature structure that would seem to accommodate
every possible level of linguistic analysis (although in
practice most authors in that framework only work on the
syntactic and semantic portion of that feature structure).
Our initial motivation for starting fresh is that we
wanted the framework to use the minimal features
necessary to represent the input annotation systems and to
extend them as much as possible. In addition, part of the
experiment was an aim to keep features in a somewhat
language-neutral form, and it is not clear that there are
existing frameworks that both share this bias and are
sufficiently expressive for our purposes. However, ultimately
it might be beneficial to convert "Pie in the Sky" to one
or more pre-existing frameworks.
So far, we have limited the scope of "Pie in the Sky"
to semantic and (recently) some discourse information.
However, there are some cases where we found it
necessary to include syntactic information, e.g., although
heads are semantic arguments of adjective modifiers, the
surface relation between the head of the noun phrase and
its constituents is important for determining other parts
of meaning. For example, although "explosive" would bear
the same argument relation to "powerful" in both (a) "The
explosive is powerful" and (b) "the powerful explosive", the
interpretation of (b) requires that "powerful" be part of the
same unit as "explosive", e.g., for the proper interpretation
of "He bought a powerful explosive". Thus it may seem
like a good idea to ultimately fill out "Pie in the Sky" into
a larger framework. However, we would still want to be
able to pick out the language-neutral components of the
analysis from the language-specific ones.
Question 3: D. Farwell, a member of the workshop
committee, has pointed out that there are levels within
semantics. The question is how these multiple levels should
be handled. The annotated examples did not include
phenomena such as metaphor, metonymy or idiomaticity that
may have multiple interpretations: literal and intended.
For example, an adequate interpretation of "I love
listening to Mozart" would require "Mozart" to be decomposed
into "music by Mozart" (although arguably the
representation of some of the complex discourse references was
of this flavor).
3 What’s in the Latest Pie in the Sky Analysis
As of this writing, the latest "Pie in the Sky" analysis
includes: (1) argument structure of all parts of speech
(verbs, nouns, adjectives, determiners, conjunctions, etc.)
using the PropBank/NomBank/Discourse Treebank
argument labels (ARG0, ARG1, ARG2, ...), reminiscent of
Relational Grammar of the 1970s and 1980s (Perlmutter,
1984); (2) some more specifically labeled FrameNet
(Baker et al., 1998) roles for these same constituents;
(3) morphological and part of speech features; (4) pointers
to gazetteers, both real and hypothetical (thanks to
B. Sundheim); (5) Veracity/According-To features based
on NYU’s proposed FactBank annotation scheme; (6)
various coreference features including some based on a
proposed extension to NomBank; (7) temporal features
based on Timex2 (Ferro et al., 2002) and TimeML
(Pustejovsky et al., 2004); and (8) Information Structure
features based on (Calhoun et al., 2005). For more
detail, please see:
http://nlp.cs.nyu.edu/meyers/pie-in-the-sky.html
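One way to picture such a merged analysis is as a graph of nodes that carry both labeled arguments and a bag of features, with language-specific features kept separable from language-neutral ones. The following is a rough sketch under stated assumptions, not the actual Pie in the Sky format: the class name, the feature names (pos, surface_relation, coref_id), the choice of which features count as language-specific, and the ARG1 label on the example are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Node:
    """One word or phrase in a merged annotation (hypothetical layout)."""
    word: str
    features: Dict[str, str] = field(default_factory=dict)
    args: Dict[str, "Node"] = field(default_factory=dict)  # e.g. "ARG0", "ARG1"


# Assumption: these feature names are treated as language-specific.
LANGUAGE_SPECIFIC: Set[str] = {"pos", "surface_relation"}


def language_neutral(node: Node) -> Node:
    """Copy the analysis, dropping the features marked as language-specific."""
    return Node(
        word=node.word,
        features={k: v for k, v in node.features.items()
                  if k not in LANGUAGE_SPECIFIC},
        args={label: language_neutral(child)
              for label, child in node.args.items()},
    )


# "the powerful explosive": the noun is an argument of the adjective, and
# a surface feature records that both belong to the same noun phrase.
explosive = Node("explosive", {"pos": "NN", "coref_id": "e1"})
powerful = Node(
    "powerful",
    {"pos": "JJ", "surface_relation": "modifier-of-head"},
    {"ARG1": explosive},  # illustrative label only
)

neutral = language_neutral(powerful)
print(neutral.features)                # {}
print(neutral.args["ARG1"].features)   # {'coref_id': 'e1'}
```

Keeping surface relations as ordinary features means they can be added without changing the argument structure and ignored again for language-neutral purposes.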
4 The Future of "Pie in the Sky"
After this workshop, we plan to retire the current two
"Pie in the Sky" sentences and start again with some new
text. I observed the following obstacles during this
experiment: (1) annotation projects were somewhat
hesitant to volunteer their time (so we are extremely grateful
to all projects that did so); (2) the target material was not
long enough for some annotation approaches to be able
to really make their mark, e.g., two sentences are not so
interesting for discourse purposes; and (3) partially due
to its length, some interesting phenomena were not well
represented (idioms, metonymy, etc.).
The lack of volunteers may, in part, be related to the
scale of the project. We built the project up slowly and
invited people to join in, rather than posting a request for
annotations to an international list. Initially, this was nec-
essary just to make the project possible to manage. Addi-
tionally, inadequacies of the data were probably barriers
for projects that focused on discourse phenomena or
phenomena that were not well represented by our data.
Nevertheless, using more data may place too heavy a burden
on annotation projects and this could make projects hesi-
tant to participate.
With these issues in mind, I note that several sites
annotated two longer documents for the recent U.S.
Government-sponsored Semantic Annotation Planning Meeting
at the University of Maryland. This success was, in part,
due to the chance for annotation sites to attract
government interest in funding their projects. While we will not
attempt to duplicate this workshop, I believe that there
is an underlying issue that is very important. The field
really needs a single test corpus for all new annotation
projects.
This test corpus would meet a number of important
needs of the annotation community: (1) it would
provide a testbed for new annotation schemata; (2) it would
provide a large corpus that is annotated in a fairly
complete framework, so that focused annotation projects
may be able to more easily write specifications in light
of where their particular set of phenomena fit into some
larger framework; and (3) it would provide a steady flow
of input annotation in order to produce a single unified
annotation framework.
To make this idea a reality, we need to obtain a
consensus on what people would like to annotate.
Additionally, we need volunteers to translate this same
corpus into other languages, as we would inevitably choose
an English corpus. Of course, if we could find a suitable
text that was already translated into multiple languages, this
would save time. The perfect text would be article length
(loosely defined); include difficult-to-handle phenomena
(idioms, metonymy, etc.); include a wide range of
annotatable linguistic phenomena; and not have copyright
restrictions that would hamper the project. It would,
of course, be helpful if the annotation community would
provide input on which text to choose; this would avoid
a situation where one could not annotate the test text
because the target phenomenon is not represented there.
In summary, I have used this introduction to
summarize how the papers of this workshop fit together, to
propose some unifying themes for discussion, and to
propose an agenda for how to proceed after the workshop is
over. We hope to see some of these ideas come to fruition
before "Frontiers in Corpus Annotation III."

References
Collin F. Baker, Charles J. Fillmore, and John B. Lowe.
1998. The Berkeley FrameNet Project. In Proceedings
of Coling-ACL98: The 17th International Conference
on Computational Linguistics and the 36th Meeting of
the Association for Computational Linguistics, pages
86–90.
A. Bies, S. Kulick, and M. Mandel. 2005. Parallel En-
tity and Treebank Annotation. In ACL 2005 Workshop:
Frontiers in Corpus Annotation II: Pie in the Sky.
S. Calhoun, M. Nissim, M. Steedman, and J. Brenier.
2005. A Framework for Annotating Information Struc-
ture in Discourse. In ACL 2005 Workshop: Frontiers
in Corpus Annotation II: Pie in the Sky.
N. Dinesh, A. Lee, E. Miltsakaki, R. Prasad, A. Joshi,
and B. Webber. 2005. Attribution and the (Non-
)Alignment of Syntactic and Discourse Arguments of
Connectives. In ACL 2005 Workshop: Frontiers in
Corpus Annotation II: Pie in the Sky.
E. W. Hinrichs, S. Kübler, and K. Naumann. 2005. A
Unified Representation for Morphological, Syntactic,
Semantic, and Referential Annotations. In ACL
2005 Workshop: Frontiers in Corpus Annotation II:
Pie in the Sky.
L. Ferro, L. Gerber, I. Mani, B. Sundheim, and G. Wil-
son. 2002. Instruction Manual for the Annotation of
Temporal Expressions. MITRE Washington C3 Cen-
ter, McLean, Virginia.
Eva Hajičová and Markéta Ceplová. 2000. Deletions
and Their Reconstruction in Tectogrammatical Syntac-
tic Tagging of Very Large Corpora. In Proceedings of
Coling 2000: The 18th International Conference on
Computational Linguistics, pages 278–284.
T. Inui and M. Okumura. 2005. Investigating the Char-
acteristics of Causal Relations in Japanese Text. In
ACL 2005 Workshop: Frontiers in Corpus Annotation
II: Pie in the Sky.
M. McShane, S. Nirenburg, S. Beale, and T. O’Hara.
2005. Semantically Rich Human-Aided Machine An-
notation. In ACL 2005 Workshop: Frontiers in Corpus
Annotation II: Pie in the Sky.
M. Palmer, N. Xue, O. Babko-Malaya, J. Chen, and
B. Snyder. 2005. A Parallel Proposition Bank II for
Chinese and English. In ACL 2005 Workshop: Fron-
tiers in Corpus Annotation II: Pie in the Sky.
David M. Perlmutter. 1984. Studies in Relational
Grammar 1. The University of Chicago Press, Chicago.
M. Poesio and R. Artstein. 2005. The Reliability of
Anaphoric Annotation, Reconsidered: Taking Ambi-
guity into Account. In ACL 2005 Workshop: Frontiers
in Corpus Annotation II: Pie in the Sky.
Carl Pollard and Ivan A. Sag. 1994. Head-Driven Phrase
Structure Grammar. University of Chicago Press and
CSLI Publications, Chicago and Stanford.
J. Pustejovsky, B. Ingria, R. Sauri, J. Castano, J. Littman,
R. Gaizauskas, A. Setzer, G. Katz, and I. Mani. 2004.
The Specification Language TimeML. In I. Mani,
J. Pustejovsky, and R. Gaizauskas, editors, The Lan-
guage of Time: A Reader. Oxford University Press,
Oxford.
J. Pustejovsky, A. Meyers, M. Palmer, and M. Poe-
sio. 2005. Merging PropBank, NomBank, TimeBank,
Penn Discourse Treebank and Coreference. In ACL
2005 Workshop: Frontiers in Corpus Annotation II:
Pie in the Sky.
T. Wilson and J. Wiebe. 2005. Annotating Attributions
and Private States. In ACL 2005 Workshop: Frontiers
in Corpus Annotation II: Pie in the Sky.
N. Xue. 2005. Annotating Discourse Connectives in the
Chinese Treebank. In ACL 2005 Workshop: Frontiers
in Corpus Annotation II: Pie in the Sky.
