Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language
Processing (HLT/EMNLP), pages 859–866, Vancouver, October 2005. ©2005 Association for Computational Linguistics
Cross-linguistic Projection of Role-Semantic Information
Sebastian Padó
Computational Linguistics
Saarland University
Saarbrücken, Germany
pado@coli.uni-sb.de
Mirella Lapata
School of Informatics
University of Edinburgh
Edinburgh, UK
mlap@inf.ed.ac.uk
Abstract
This paper considers the problem of auto-
matically inducing role-semantic annota-
tions in the FrameNet paradigm for new
languages. We introduce a general framework for semantic projection which exploits parallel texts, is relatively inexpensive, and can potentially reduce the amount of effort involved in creating semantic resources. We propose projection models
that exploit lexical and syntactic informa-
tion. Experimental results on an English-
German parallel corpus demonstrate the
advantages of this approach.
1 Introduction
Shallow semantic parsing, the task of automatically
identifying the semantic roles conveyed by senten-
tial constituents, has recently attracted much atten-
tion, partly because of its increasing importance for
potential applications. For instance, information ex-
traction (Surdeanu et al., 2003), question answer-
ing (Narayanan and Harabagiu, 2004) and machine
translation (Boas, 2002) could stand to benefit from
broad coverage semantic processing.
The FrameNet project (Fillmore et al., 2003)
has played a central role in this endeavour by
providing a large lexical resource based on se-
mantic roles. In FrameNet, meaning is represented
by frames, schematic representations of situations.
Semantic roles are frame-specific, and are called
frame elements. The database associates frames with
lemmas (verbs, nouns, adjectives) that can evoke
them (called frame-evoking elements or FEEs), lists
the possible syntactic realisations of their seman-
tic roles, and provides annotated examples from the
British National Corpus (Burnard, 1995). The avail-
ability of rich annotations for the surface realisation
of semantic roles has triggered interest in semantic
parsing and enabled the development of data-driven
models (e.g., Gildea and Jurafsky, 2002).
Frame: DEPARTING

Frame Elements:
  THEME   The officer left the house.
          The plane leaves at seven.
          His departure was delayed.
  SOURCE  We departed from New York.
          He retreated from his opponent.
          The woman left the house.

FEEs: abandon.v, desert.v, depart.v, departure.n, emerge.v, emigrate.v,
      emigration.n, escape.v, escape.n, leave.v, quit.v, retreat.v,
      retreat.n, split.v, withdraw.v, withdrawal.n

Table 1: Example of FrameNet frame
Table 1 illustrates an example from the FrameNet
database, the DEPARTING frame. It has two roles, a
THEME which is the moving object and a SOURCE
expressing the initial position of the THEME. The
frame elements are realised by different syntactic expressions. For instance, the THEME is typically an
NP, whereas the SOURCE is often expressed by a
prepositional phrase (see the expressions in boldface
in Table 1). The DEPARTING frame can be evoked
by abandon, desert, depart, and several other verbs
as well as nouns (see the list of FEEs in Table 1).
Although recent advances in semantic parsing1
have greatly benefited from the availability of the
English FrameNet, unfortunately such resources are
largely absent for other languages. The English
FrameNet (Version 1.1) contains 513 frames cov-
ering 7,125 lexical items and has been under de-
velopment for approximately six years. Although
FrameNets are currently under construction for Ger-
man, Spanish, and Japanese, these resources are still
in their infancy and of limited value for modelling
purposes. Methods for acquiring FrameNets from
corpora automatically would greatly reduce the hu-
man effort involved and facilitate their development
for new languages.
In this paper, we propose a method which em-
ploys parallel corpora for acquiring frame elements
1Approaches to modelling semantic parsing are too numer-
ous to list; see Carreras and Màrquez (2005) for an overview.
and their syntactic realisations (see the upper half of
Table 1) for new languages. Our method leverages
the existing English FrameNet to overcome the re-
source shortage in other languages by exploiting the
translational and structural equivalences present in
aligned data. The idea underlying our approach can
be summarised as follows: (1) given a pair of sen-
tences E (English) and L (new language) that are
translations of each other, annotate E with seman-
tic roles; and then (2) project these roles onto L. In
this manner, we induce semantic structure on the L
side of the parallel text, which can then serve as data
for training a statistical semantic parser for L that is
independent of the parallel corpus.
We first assess if the main assumption of semantic
projection is warranted (Section 3), namely whether
frames and semantic roles exhibit a high degree of
parallelism across languages. Then we propose two
broad classes of projection models that utilise lexi-
cal and syntactic information (Section 4), and show
experimentally that roles can be projected from En-
glish onto German with high accuracy (Section 5).
We conclude the paper by discussing the implica-
tions of our results and future work (Section 6).
2 Related work
A number of recent studies exploit parallel cor-
pora for cross-linguistic knowledge induction. In
this paradigm, annotations for resource-rich lan-
guages like English are projected onto another lan-
guage through aligned parallel texts. Yarowsky et
al. (2001) propose several projection algorithms for
deriving monolingual tools (ranging from part-of-
speech taggers, to chunkers and morphological anal-
ysers) without additional annotation cost. Hwa et
al. (2002) assess the degree of syntactic parallelism
in dependency relations between English and Chi-
nese. Their results show that, although assuming direct correspondence is often too restrictive, syntactic projection yields good enough annotations to train
a dependency parser. Smith and Smith (2004) ex-
plore syntactic projection further by proposing an
English-Korean bilingual parser integrated with a
word translation model.
Previous work has primarily focused on the pro-
jection of morphological and grammatico-syntactic
information. Inducing semantic resources from low
density languages still poses a significant challenge
to data-driven methods. The challenge is recognised
by Fung and Chen (2004) who construct a Chinese
FrameNet by mapping English FrameNet entries to
concepts listed in HowNet2, an on-line ontology for
Chinese, however without exploiting parallel texts.
The present work extends previous approaches on
annotation projection by inducing FrameNet seman-
tic roles from parallel corpora. Analogously to Hwa et al. (2002), we investigate whether there are indeed
semantic correspondences between two languages,
since there is little hope for projecting meaningful
annotations in nonparallel semantic structures. Sim-
ilarly to Fung and Chen (2004) we automatically induce semantic role annotations for a target language.
In contrast to them, we resort to parallel corpora as a
source of semantic equivalence. Thus, we avoid the
need for a target concept dictionary in addition to the English FrameNet. We propose a general framework for semantic projection that can incorporate different
knowledge sources. To our knowledge, the frame-
work and its application to semantic role projection
are novel.
3 Creation of a Gold Standard Corpus
Sample Selection. To evaluate the output of our
projection algorithms, we created a gold standard
corpus of English-German sentence pairs with man-
ual FrameNet frame and role annotations. The sen-
tences were sampled from Europarl (Koehn, 2002),
a corpus of professionally translated proceedings of
the European Parliament. Europarl is available in
11 languages with up to 20 million words per lan-
guage aligned at the document and sentence level.
Recall that frame projection is only meaningful if
the same frame is appropriate for both sentences in
a projection pair. This constrains sample selection
for two reasons: first, FrameNet is as yet incom-
plete with respect to its coverage. So, a randomly
selected sentence pair may evoke novel frames or
novel senses of already existing frames (e.g., the
“greeting” sense of hail which is currently not listed
in FrameNet). Second, due to translational variance,
there is no a priori guarantee that words which are
mutual translations evoke the same frame. For ex-
ample, the English verb finish is often translated
in German by the adverb abschließend, which ar-
guably cannot have a role set identical to finish. Re-
lying solely on the English FrameNet database for
sampling would yield many sentence pairs which
are either inappropriate for the present study (be-
cause they do not evoke the same frames) or simply
problematic for annotation since they are outside the
2See http://www.keenage.com/zhiwang/e_zhiwang.html.
present coverage of the database.
For the above reasons, our sample selection pro-
cedure was informed by two existing resources,
the English FrameNet and SALSA, a FrameNet-
compatible database for German currently under de-
velopment (Erk et al., 2003). We first used the pub-
licly available GIZA++ (Och and Ney, 2003) soft-
ware to induce English-German word alignments.
Next, we gathered all German-English sentences
in the corpus that had at least one pair of aligned
words (we,wg), which were listed in FrameNet and
SALSA, respectively, and had at least one frame
in common. These sentences exemplify 83 frame
types, 696 lemma pairs, and 265 unique English and
178 unique German lemmas. Sentence pairs were
grouped into three bands according to their frame
frequency (High, Medium, Low). We randomly se-
lected 380 pairs from each band. The total sample
consisted of 1,140 sentence pairs.
This procedure produces a realistic corpus sample
for the role projection task; similar samples can be
drawn for new language pairs using either existing
bilingual dictionaries (Fung and Chen, 2004) or automatically constructed semantic lexicons (Padó and Lapata, 2005).
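As an illustration, the selection filter described above can be sketched as follows. The lexicon dictionaries, corpus layout, and function names are our own illustrative assumptions, not the procedure's actual implementation:

```python
# Hypothetical sketch of the sample-selection filter: keep a sentence pair
# if some aligned word pair (w_e, w_g) is listed in FrameNet and SALSA
# respectively, and the two lexicon entries share at least one frame.

def shared_frames(alignment, en_lexicon, de_lexicon):
    """Return the frames shared by any aligned (English, German) word pair."""
    frames = set()
    for we, wg in alignment:  # word-level alignment links (lemma pairs)
        frames |= en_lexicon.get(we, set()) & de_lexicon.get(wg, set())
    return frames

def select_pairs(corpus, en_lexicon, de_lexicon):
    """Keep sentence pairs that evoke at least one frame in both lexicons."""
    sample = []
    for en_sent, de_sent, alignment in corpus:
        frames = shared_frames(alignment, en_lexicon, de_lexicon)
        if frames:
            sample.append((en_sent, de_sent, frames))
    return sample
```

The retained frame sets can then be used to bin the pairs into frequency bands before sampling.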
Annotation. Two annotators, with native-level
proficiency in German and English, manually labelled the parallel corpus with semantic information.
Their task was to identify the frame for a given pred-
icate in a sentence, and assign the corresponding
roles. They were provided with detailed guidelines
that explained the task using multiple examples.
During annotation, they had access to parsed ver-
sions of the sentences in question (see Section 5 for
details), and to the English FrameNet and SALSA.
The annotation proceeded in three phases: a train-
ing phase (40 sentences), a calibration phase (100
sentences), and a production mode phase (1000 sen-
tences). In the calibration phase, sentences were
doubly annotated to assess inter-annotator agree-
ment. In production mode, sentences were split into
two distinct sets, each of which was annotated by a
single coder. We ensured that no annotator saw both
parts of any sentence pair to guarantee independent
annotation of the bilingual data. Each coder anno-
tated approximately the same amount of data in En-
glish and German.
Table 2 shows the results of our inter-annotator
agreement study. In addition to the widely used
Kappa statistic, we computed a number of different
agreement measures: the ratio of frames common
Measure      English  German  All
Frame Match  0.90     0.87    0.88
Role Match   0.95     0.95    0.95
Span Match   0.85     0.83    0.84
Kappa        0.86     0.90    0.87
Table 2: Monolingual inter-annotation agreement on
the calibration set
Measure      Precision  Recall  F-score
Frame Match  0.72       0.72    0.72
Role Match   0.91       0.92    0.91
Table 3: Cross-lingual semantic parallelism between
English and German
between two sentences (Frame Match), the ratio of
common roles (Role Match), and the ratio of roles
with identical spans (Span Match). As can be seen,
annotators tend to agree in frame assignment; dis-
agreements are mainly due to fuzzy distinctions be-
tween frames (e.g., between AWARENESS and CERTAINTY). As can be seen from Table 2, annotators
agree in what roles to assign (Role Match is 0.95 for
both English and German); agreeing on their exact
spans is a harder problem.
Semantic Parallelism. Since we obtained par-
allel FrameNet annotations for English and German,
we were able to investigate the degree of semantic
parallelism between the two languages. More specifically, we treated the German annotation as a gold standard against which we compared the English an-
notations. To facilitate comparisons with the output
of our automatic projection methods (see Section 4),
we measured parallelism using precision and recall.
Frames and frame roles were counted as matching if
they were annotated in a sentence, regardless of their
spans. The results are shown in Table 3.
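Since the span-insensitive matching just described reduces each annotation to a set of labels per sentence, the precision/recall computation is a simple set comparison. A minimal sketch, with hypothetical inputs of our own:

```python
# Set-based evaluation sketch: frames/roles count as matching if annotated
# anywhere in the sentence, so annotations reduce to label sets.

def precision_recall_f(predicted, gold):
    """Precision, recall and F-score over two sets of annotation labels."""
    if not predicted or not gold:
        return 0.0, 0.0, 0.0
    tp = len(predicted & gold)          # labels present in both annotations
    p = tp / len(predicted)
    r = tp / len(gold)
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f
```

For example, comparing a predicted role set {THEME, SOURCE} against a gold set {THEME} yields precision 0.5 and recall 1.0.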
The cross-lingual data exhibit more than twice as many frame differences as the monolingual data
(compare Tables 2 and 3). This indicates that frame
disambiguation methods must be employed in auto-
matic role projection to ensure that two aligned to-
kens evoke the same frame. However, frame disam-
biguation is outside the scope of the present paper.
On the positive side, role agreement is rela-
tively high (0.91 F-score). This indicates that in
cases where frames match across languages, seman-
tic roles could be accurately transferred (provided
that these languages diverge little in their argument
structure). This observation offers support for the
projection approach put forward in this paper. Note,
however, that a practical projection system could at-
tain this level of performance only if it could employ
an oracle to recover annotators’ decisions about the
span of roles. We can obtain a more realistic upper
bound for an automatic system from the monolingual Span Match agreement figure (F-score 0.84). The
latter represents a ceiling for the agreement we can
expect from sentences annotated by different anno-
tators.
4 Projection of Semantic Information
In this section, we formalise the semantic projection
task and give the details of our modelling approach.
All models discussed here project semantic annota-
tions from a source language to a target language.
As explained earlier, our present study is only con-
cerned with the projection of roles between match-
ing frames.
4.1 Problem Formulation
We assume that we are provided with source and tar-
get sentences represented as sets of entities e_s ∈ E_s and e_t ∈ E_t. These entities can be words, con-
stituents, phrases, or other groupings. In addition,
we are given the semantic annotation of the source
sentences from which we can directly read off the
source semantic role assignment a_s : R → 2^Es, where
R is the set of semantic roles. The goal of the pro-
jection is to specify the target semantic role assign-
ments a_t : R → 2^Et, which are unknown.3
Clearly, effecting the projection requires estab-
lishing some form of match between the source and
target entities. We therefore formalise projection as
a function which maps the source role assignment
and a set of matches M ⊆ E_s × E_t onto a new target
role assignment:
proj : (A_s × M) → (R → 2^Et)    (1)
By way of currying, we can state the new target role
assignment as a function which directly computes a
set of target entities, given the source role assign-
ment, a set of entity matches, and a role:
a_t : (A_s × M × R) → 2^Et    (2)
According to this formalisation, the crucial part of
semantic projection is to identify a correct and ex-
haustive set of entity matches. Obviously, this raises
3Without loss of generality, we limit ourselves to one frame
per sentence, as does FrameNet.
r ∈ R                          Semantic role
t_s ∈ T_s, t_t ∈ T_t           Source, target tokens
al ∈ Al : T_s → 2^Tt           Word alignment
a_s ∈ A_s : R → 2^Ts           Source role assignment
a_t : (A_s × Al × R) → 2^Tt    Projected target role assignment
Table 4: Notation and signature summary for word-
based projection
the question of what linguistic information is appropriate for establishing M. Unfortunately, any attempt
to compute a match based on categorical data de-
rived from linguistic analyses (e.g., parts of speech,
phrase types or grammatical relations), needs to em-
pirically derive cross-linguistic similarities between
categories, a task which must be repeated for every
new language pair, and requires additional data.
Rather than postulating an ad hoc similarity func-
tion, we use word alignments to derive informa-
tion about semantic roles in the target language. Our
first model family (Section 4.2) relies exclusively
on this knowledge source. Although potentially use-
ful as a proxy for semantic equivalence, automati-
cally induced alignments are often noisy, thus lead-
ing to errors in annotation projection (Yarowsky et
al., 2001). For example, function words commonly
diverge across languages and are systematically misaligned; furthermore, alignments are restricted to
single words rather than word combinations. This
observation motivates a second model family with a
bias towards linguistically meaningful entities (Sec-
tion 4.3). Such entities can be constituents derived
from the output of a parser or non-recursive syntac-
tic structures (i.e., chunks).
In this paper we compare simple word alignment models against more resource-intensive models
that utilise constituent-based information and exam-
ine whether syntactic knowledge significantly con-
tributes to semantic projection.
4.2 Word-based Projection Model
The first model family uses source and target word
tokens as entities for projection. In this framework,
projection models can be defined by deriving the set
of matches M directly from word alignments. The
resulting signatures are shown in Table 4.
Our first projection model assigns to each role
r with source span s(r) the set of all target tokens
which are aligned to a token in the source span:
a_w(a_s, al, r) = ⋃_{t_s ∈ a_s(r)} al(t_s)    (3)
[Figure: the sentence pair "John and Mary left" / "Johann und Maria gingen", both predicates evoking DEPARTING, with word alignment links between the tokens]

Figure 1: Word alignment-based semantic projection of role THEME (shadowed), frame DEPARTING
The main shortcoming of this model is that it cannot
capture an important linguistic property of semantic
roles, namely that they almost always cover contigu-
ous stretches of text. We can repair non-contiguous projections by applying a "convex complementing" heuristic to the output of (3), which fills all holes in a sequence of tokens, without explicit recourse to syntactic information. We define the convex complementing heuristic as:

a_cw(a_s, al, r) = {t_t | min(i(a_w)) ≤ i(t_t) ≤ max(i(a_w))}    (4)

where a_w abbreviates a_w(a_s, al, r), and i returns the index of a token t.
The two models just described are illustrated in
Figure 1. The frame DEPARTING is introduced by
left and gingen in English and German, respectively.
For simplicity, we only show the edges correspond-
ing to the THEME role. In English, the THEME is re-
alised by the words John and Mary. The dotted lines
show the available word alignments. The projection
of the THEME role according to (3) consists only
of the tokens {Johann, Maria} (shown by the plain
black lines); the convex complementing heuristic in
model (4) adds the token und, resulting in the (cor-
rect) convex set {Johann, und, Maria}.
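The two word-based models can be sketched as follows, under the assumption (ours, chosen for illustration) that tokens are integer positions and the word alignment is a mapping from source positions to sets of target positions:

```python
# Sketch of the word-based projection models (3) and (4).

def project_words(source_span, al):
    """Model (3): union of target tokens aligned to tokens in the span."""
    target = set()
    for ts in source_span:
        target |= al.get(ts, set())
    return target

def convex_complement(target_tokens):
    """Model (4): fill all holes so the projected role is contiguous."""
    if not target_tokens:
        return set()
    return set(range(min(target_tokens), max(target_tokens) + 1))
```

On the Figure 1 example (THEME span {John, Mary} with und unaligned), model (3) projects the two aligned tokens and the convex complement adds the intervening und.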
4.3 Constituent-based Projection Model
Our second model family attempts to make up for
errors in the word alignment by projecting from and
to constituents. In this study, our constituents are ob-
tained from full parse trees (see Section 5 for de-
tails). Models which use non-recursive structures are also possible; however, we leave this to future work.
The main difference from word-based projection
models is the introduction of constituent information
as an intermediate level; we thus construct a con-
stituent alignment for which only a subset of word
alignments has to be accurate. The appropriate sig-
natures and notation for constituent-based projection
are summarised in Table 5.
In order to keep the model as flexible as pos-
sible, and to explore the influence of different de-
sign decisions, we model constituent-based projec-
tion as two independently parameterisable subtasks:
first we compute a real-valued similarity function
between source and target constituents; then, we employ the similarity function to align relevant con-
stituents and project the role information.
Similarity functions. In principle, any function
which matches the signature in Table 5 could be
used. In practice, the use of linguistic knowledge
runs into the problem of defining similarity between
category-based representations discussed above. For
this reason, we limit ourselves to two simple similarity functions based on word overlap: Given source
and target constituents cs and ct, we define the word
overlap ow of cs with ct as the proportion of tokens
within ct aligned to tokens within cs. Let yield(c)
denote the set of tokens in the yield of a constituent
c, then:
o_w(c_s, c_t) = |(⋃_{t_s ∈ yield(c_s)} al(t_s)) ∩ yield(c_t)| / |yield(c_t)|    (5)
Since the asymmetry of this overlap measure leads
to high overlap scores for small target constituents,
we define word overlap similarity as the product of the two constituents' mutual overlap:

sim(c_s, c_t) = o(c_s, c_t) · o(c_t, c_s)    (6)
Simple word-based overlap has one undesired char-
acteristic: larger constituents tend to be less similar
because of missing alignments (e.g., between func-
tion words). Since content words are arguably more
important for the role projection task, we define a
second overlap measure, content word overlap owc,
which takes only nouns, verbs and adjectives into
account. Let yieldc(c) denote the set of tokens in the
yield of c that are content words, then:
o_wc(c_s, c_t) = |(⋃_{t_s ∈ yield_c(c_s)} al(t_s)) ∩ yield_c(c_t)| / |yield_c(c_t)|    (7)
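Under the same token-position encoding as before, the overlap measure and the similarity function (6) might be sketched as below. The reversed alignment `al_inv` is our own addition, needed to compute overlap in the opposite direction; restricting both yields to content-word positions before calling `overlap` gives the content-word variant (7):

```python
# Sketch of the overlap-based similarity measures. Constituent yields are
# sets of token positions; `al` maps source positions to aligned target
# positions, and `al_inv` is the reversed mapping (our assumption).

def overlap(cs_yield, ct_yield, al):
    """Proportion of tokens in ct's yield aligned to tokens in cs's yield."""
    if not ct_yield:
        return 0.0
    aligned = set()
    for ts in cs_yield:
        aligned |= al.get(ts, set())
    return len(aligned & ct_yield) / len(ct_yield)

def similarity(cs_yield, ct_yield, al, al_inv):
    """Word overlap similarity (6): product of the two mutual overlaps."""
    return overlap(cs_yield, ct_yield, al) * overlap(ct_yield, cs_yield, al_inv)
```

Taking the product penalises the asymmetry noted above: a small target constituent fully covered by alignments still scores low unless it also covers the source constituent.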
Constituent alignment. Considerable latitude
is available in interpreting a similarity function to
derive a constituent alignment. Due to space limita-
tions, we demonstrate two basic models.
Our first forward constituent alignment model
(afc), aligns source constituents that form the span
r ∈ R                          Semantic role
c_s ∈ C_s, c_t ∈ C_t           Source and target constituents
yield : C → T                  Yield of a constituent
yield_c : C → T                Content word yield of a constituent
al ∈ Al : T_s → 2^Tt           Word alignment
a_s ∈ A_s : R → 2^Cs           Source role assignment
sim : C_s × C_t → R+           Constituent similarity
a_t : A_s × Sim × R → 2^Ct     Projected target role assignment
Table 5: Notation and signature summary for
constituent-based projection
of a role to a single target constituent. We compute
the similarity of a target constituent ct to a set of
source constituents c_s ∈ a_s(r) by taking the product similarity for each source and target constituent pair:

a_fc(a_s, sim, r) = argmax_{c_t ∈ C_t} ∏_{c_s ∈ a_s(r)} sim(c_s, c_t)    (8)
This projection model forces the target role assign-
ment to be a function, i.e., it makes the somewhat
simplifying assumption that each role corresponds
to a single target constituent.
Our second backward constituent alignment model (a_bc) proceeds in the opposite direction: it iterates over target constituents and attempts to determine the most similar source constituent for each c_t. If the aligned source constituent is labelled with a role, it is projected onto c_t:

a_bc(a_s, sim, r) = {c_t | (argmax_{c_s ∈ C_s} sim(c_s, c_t)) ∈ a_s(r)}    (9)
In general, a_bc allows for more flexible role projection: it will sometimes decide not to project a
role at all (if the source constituents are dissimilar
to any target constituents), or it can assign a role
to more than one target constituent; however, this
means that there is less control over what is pro-
jected, and wrong alignments can lead to wrong re-
sults more easily.
Finally, if no word alignments are found for complete source or target constituents, the maximal similarity rating in a_bc or a_fc will be zero.
This is often the case for semantically weak single-
word constituents such as demonstrative pronouns
(e.g., [That] is right./ [Das] ist richtig.). When we
observe this phenomenon, we heuristically skip un-
aligned constituents (zero skipping).
Figure 2 contrasts the two constituent-based pro-
jection models using the frame QUESTIONING as
[Figure: the sentence pair "He asked all of them" / "Er fragte alle von ihnen", with source constituents NP1, PP2, NP3 and target constituents NP4, PP5, NP6; both predicates evoke QUESTIONING]

        NP1   PP2   NP3
NP4     0.33  0.5   1
PP5     0.67  1     0.5
NP6     0.33  0     0

Figure 2: Constituent-based semantic projection of role ADDRESSEE (shadowed), frame QUESTIONING. Below: Constituent similarity matrix.
an example. Again, we only show one role, AD-
DRESSEE, indicated by the shadowed box in Fig-
ure 2. Note that the object NP in German was mis-
parsed as an NP and a PP, a relatively frequent er-
ror. The difference between the two decision proce-
dures can be explained straightforwardly by look-
ing at the table below the graph, which shows the
similarity matrix for the constituents according to
equation (6). In this table, the source constituents
(indices 1–3) correspond to columns, and the tar-
get constituents (indices 4–6) to rows. The align-
ment model in (8) iterates over labelled source con-
stituents (here only NP1) and chooses the row with
the highest value as the target constituent for a can-
didate role. In our case, this is the PP5 (cell in bold-
face). In contrast, model (9) iterates over all target
constituents (i.e., rows) and checks if the most sim-
ilar source constituent bears a role label. Since NP1
is the most similar constituent for NP6 (underlined
cell), (9) assigns the ADDRESSEE role to NP6.
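Assuming a precomputed similarity matrix, the two alignment strategies (8) and (9) can be sketched as below; running the sketch on the Figure 2 similarity values reproduces the decisions just described (PP5 for forward alignment, NP6 for backward alignment). Function names and the data layout are our own:

```python
# Sketch of the two constituent alignment strategies, given a similarity
# matrix sim[(cs, ct)] and the set of source constituents spanning the role.
from math import prod

def forward_align(role_constituents, targets, sim):
    """Model (8): the single target maximising the product similarity."""
    return max(targets,
               key=lambda ct: prod(sim[(cs, ct)] for cs in role_constituents))

def backward_align(role_constituents, sources, targets, sim):
    """Model (9): every target whose most similar source bears the role."""
    projected = set()
    for ct in targets:
        best_source = max(sources, key=lambda cs: sim[(cs, ct)])
        if best_source in role_constituents:
            projected.add(ct)
    return projected
```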
5 Experiments
Evaluation Framework. We implemented the
models described in the previous section and used
them to project semantic information from En-
glish onto German. For the constituent-based mod-
els, constituent information was obtained from the
output of Collins’ parser (1997) for English and
Dubey’s parser (2004) for German. Words were
Model        Precision  Recall  F-score
a_w          0.41       0.40    0.41
a_cw         0.46       0.45    0.46
Upper bound  0.85       0.84    0.84
Table 6: Results for word-based projection models
aligned using the default setting4 of GIZA++ (Och
and Ney, 2003), a publicly available implementa-
tion of the IBM models and HMM word alignment
models. We evaluated the projected roles against the
“gold standard” roles obtained from the manual an-
notation (see Section 3). We also compared our re-
sults to the upper bound given by the inter-annotator
agreement on the calibration data set.
Results. Table 6 shows our results for the word-
based projection models. The simplest word-based
model (a_w) obtains an F-score of 0.41. This is a
good result considering that the model does not ex-
ploit any linguistic information (e.g., parts of speech
or syntactic structure). It also supports our hypothe-
sis that word alignments are useful for the role pro-
jection task. The convex complementing heuristic
(a_cw) delivers an F-score increase of five points over
the “words only” model, simply by making up for
holes in the word alignment.
We evaluated eight instantiations of the
constituent-based projection models; the results are
shown in Table 7. The best model (in boldface) uses
forward constituent alignment, content word-based
overlap similarity, and zero skipping. We observe
that backward constituent alignment-based models
(1–4) perform similarly to word-based projection
models (the F-score ranges between 0.40 and 0.45).
However, they obtain considerably higher precision
(albeit lower recall) than the word-based models.
This may be an advantage if the projected data
is destined for training target-language semantic
parsers. This precision/recall pattern appears to be
a direct result of a_bc, which only projects a role from c_s to c_t if c_s "wins" against all other source constituents, thus resulting in reliable, but overly cautious projections, which cannot be further improved by zero skipping.
The forward constituent alignment models (5–8)
show consistently higher performance than word-
based models and models 1–4, indicating that the
stronger assumptions made by forward alignment
4The training scheme involved five iterations of Model 1,
five iterations of the HMM model, five iterations of Model 3,
and five iterations of Model 4.
Model        al  o   0-skip  Precision  Recall  F-score
1            bc  w   no      0.70       0.33    0.45
2            bc  w   yes     0.70       0.33    0.45
3            bc  wc  no      0.65       0.32    0.42
4            bc  wc  yes     0.65       0.32    0.42
5            fc  w   no      0.61       0.60    0.60
6            fc  w   yes     0.66       0.60    0.63
7            fc  wc  no      0.62       0.60    0.61
8            fc  wc  yes     0.70       0.60    0.65
Upper bound                  0.85       0.84    0.84
Table 7: Results for constituent-based projection
models (al: constituent alignment model; o: overlap
measure; 0-skip: zero skipping)
are justified in the data. In addition, we also find
that we can increase precision by concentrating on
reliable alignments. This is achieved by using the
zero skipping heuristic (compare the odd vs. even-
numbered models in Table 7) and by computing
overlap on content words (compare Models 6 vs. 8,
and 5 vs. 7).
We used the χ2 test to examine whether the dif-
ferences observed between the two classes of mod-
els are statistically significant. The best constituent-
based model significantly outperforms the best
word-based model both in terms of precision
(χ2 = 114.47, p < 0.001) and recall (χ2 = 400.40,
p < 0.001). Both projection models perform signifi-
cantly worse than humans (p < 0.001).
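For a 2×2 contingency table of correct versus incorrect decisions from two models, the χ2 statistic can be computed directly from the closed-form formula; the counts used in the test below are invented for illustration, not the paper's data:

```python
# Chi-square statistic for a 2x2 contingency table [[a, b], [c, d]],
# e.g. rows = models, columns = correct/incorrect decision counts.

def chi_square_2x2(a, b, c, d):
    """Closed-form chi-square statistic (no continuity correction)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator
```

The resulting statistic is compared against the χ2 distribution with one degree of freedom (critical value 10.83 at p < 0.001).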
Discussion. Our results confirm that constituent
information is important for the semantic projection
task. Our best model adopts a conservative strategy which enforces a one-to-one correspondence between roles and target constituents. This strategy
leads to high precision, however recall lags behind
(see Model 8 in Table 7). Manual inspection of the
projection output revealed that an important source
of missing roles is word alignment gaps. Such
gaps are not only due to noisy alignments, but also
reflect genuine structural differences between trans-
lated sentences. Consider the following (simplified)
example for the STATEMENT frame (introduced by
say) and its semantic role SPEAKER (realised by we):
(10)  We   claim      X  and  we  say    Y
      Wir  behaupten  X  und  —   sagen  Y
The word alignment correctly aligns the German
pronoun wir with the first English we and leaves
the second occurrence unaligned. Since there is no
corresponding German word for the second we, pro-
jection of the SPEAKER role fails. In future work,
this problem could be handled with explicit identi-
fication of empty categories (see Dienes and Dubey,
2003).
6 Conclusions
In this paper, we argue that parallel corpora show
promise in relieving the lexical acquisition bottle-
neck for low density languages. We proposed se-
mantic projection as a means of obtaining FrameNet
annotations automatically without additional human
effort. We examined semantic parallelism, a prereq-
uisite for accurate projection, and showed that se-
mantic roles can be successfully projected for pred-
icate pairs with matching frame assignments. Sim-
ilarly to previous work (Hwa et al., 2002), we find
that some mileage can be gained by assuming di-
rect correspondence between two languages. How-
ever, linguistic knowledge is key in obtaining mean-
ingful projections. Our experiments show that the
use of constituent information yields substantial im-
provements over relying on word alignment alone.
Nevertheless, the word-based models offer a good
starting point for low-density languages for which
parsers are not available. Their output could be fur-
ther post-processed manually or automatically using
bootstrapping techniques (Riloff and Jones, 1999).
We have presented a general, flexible framework
for semantic projection which can be easily applied
to other languages. An important direction for fu-
ture work lies in the assessment of more shallow
syntactic information (i.e., chunks) which can be obtained more easily for new languages, and generally
in the integration of more linguistic knowledge to
guide projection. Finally, we will incorporate into
our projection approach automatic semantic role an-
notations for the source language and investigate the
potential of the projected annotations for training semantic parsers for the target language.
Acknowledgements. The authors acknowledge
the support of DFG (Padó; grant PI-154/9-2) and
EPSRC (Lapata; grant GR/T04540/01). Thanks to
B. Kouchnir and P. Kreischer for their annotation.
References
H. C. Boas. 2002. Bilingual framenet dictionaries for
machine translation. In Proceedings of LREC 2002,
1364–1371, Las Palmas, Canary Islands.
L. Burnard. 1995. The Users Reference Guide for the British National Corpus. British National Corpus Consortium, Oxford University Computing Service.
X. Carreras, L. Màrquez, eds. 2005. Proceedings of the CoNLL shared task: Semantic role labelling.
M. Collins. 1997. Three generative, lexicalised models
for statistical parsing. In Proceedings of ACL/EACL
1997, 16–23, Madrid, Spain.
P. Dienes, A. Dubey. 2003. Antecedent recovery: Exper-
iments with a trace tagger. In Proceedings of EMNLP
2003, 33–40, Sapporo, Japan.
A. Dubey. 2004. Statistical parsing for German: Mod-
elling syntactic properties and annotation differences.
Ph.D. thesis, Saarland University.
K. Erk, A. Kowalski, S. Padó, M. Pinkal. 2003. Towards
a resource for lexical semantics: A large German cor-
pus with extensive semantic annotation. In Proceed-
ings of ACL 2003, 537–544, Sapporo, Japan.
C. J. Fillmore, C. R. Johnson, M. R. Petruck. 2003.
Background to FrameNet. International Journal of
Lexicography, 16:235–250.
P. Fung, B. Chen. 2004. BiFrameNet: Bilingual frame
semantics resources construction by cross-lingual in-
duction. In Proceedings of COLING 2004, 931–935,
Geneva, Switzerland.
D. Gildea, D. Jurafsky. 2002. Automatic labeling of se-
mantic roles. Computational Linguistics, 28(3):245–
288.
R. Hwa, P. Resnik, A. Weinberg, O. Kolak. 2002. Evaluating translational correspondence using annotation projection. In Proceedings of ACL 2002, 392–399, Philadelphia, PA.
P. Koehn. 2002. Europarl: A multilingual corpus for
evaluation of machine translation. Draft.
S. Narayanan, S. Harabagiu. 2004. Question answering
based on semantic structures. In Proceedings of COL-
ING 2004, 693–701, Geneva, Switzerland.
F. J. Och, H. Ney. 2003. A systematic comparison of
various statistical alignment models. Computational
Linguistics, 29(1):19–52.
S. Padó, M. Lapata. 2005. Cross-lingual bootstrapping
for semantic lexicons. In Proceedings of AAAI 2005,
Pittsburgh, PA.
E. Riloff, R. Jones. 1999. Learning dictionaries for in-
formation extraction by multi-level bootstrapping. In
Proceedings of AAAI 1999, Orlando, FL.
D. A. Smith, N. A. Smith. 2004. Bilingual parsing with
factored estimation: Using English to parse Korean.
In Proceedings of EMNLP 2004, 49–56, Barcelona,
Spain.
M. Surdeanu, S. Harabagiu, J. Williams, P. Aarseth.
2003. Using predicate-argument structures for infor-
mation extraction. In Proceedings of ACL 2003, 8–15,
Sapporo, Japan.
D. Yarowsky, G. Ngai, R. Wicentowski. 2001. Inducing
multilingual text analysis tools via robust projection
across aligned corpora. In Proceedings of HLT 2001,
161–168.
