Corpus-based Induction of an LFG Syntax-Semantics Interface
for Frame Semantic Processing
Anette Frank
Language Technology Lab
DFKI GmbH
Stuhlsatzenhausweg 3
66123 Saarbr¨ucken, Germany
Anette.Frank@dfki.de
Jiˇr´ı Semeck´y
Institute of Formal and Applied Linguistics
Charles University
Malostransk´e n´amˇest´ı 25
11800 Prague, Czech Republic
semecky@ufal.ms.mff.cuni.cz
Abstract
We present a method for corpus-based induc-
tion of an LFG syntax-semantics interface for
frame semantic processing in a computational
LFG parsing architecture. We show how to
model frame semantic annotations in an LFG
projection architecture, including special phe-
nomena that involve non-isomorphic mappings
between levels. Frame semantic annotations
are ported from a manually annotated corpus
to a “parallel” LFG corpus. We extract func-
tional descriptions from the frame-annotated
LFG corpus, to derive general frame assign-
ment rules that can be applied to new sentences.
We evaluate the results by applying the induced
frame assignment rules to LFG parser output.1
1 Introduction
There is a growing insight that high-quality NLP
applications for information access are in need of
deeper, in particular, semantic analysis. A bottle-
neck for semantic processing is the lack of large
domain-independent lexical semantic resources.
There are now efforts for the creation of large lex-
ical semantic resources that provide information on
predicate-argument structure. FrameNet (Baker et
al., 1998), building on Fillmore’s theory of frame
semantics, provides definitions of frames and their
semantic roles, a lexical database and a manually
annotated corpus of example sentences. A strictly
corpus-based approach is carried out with ‘Prop-
Bank’ – a manual predicate-argument annotation on
top of the Penn treebank (Kingsbury et al., 2002).
First approaches for learning stochastic models
for semantic role assignment from annotated cor-
pora have emerged with Gildea and Jurafsky (2002)
and Fleischman et al. (2003). While current com-
petitions explore the potential of shallow parsing
1The research reported here was conducted in a coopera-
tion project of the German Research Center for Artificial Intel-
ligence, DFKI Saarbr¨ucken with the Computational Linguistics
Department of the University of the Saarland at Saarbr¨ucken.
for role labelling, Gildea and Palmer (2002) empha-
sise the role of deeper syntactic analysis for seman-
tic role labelling. We follow this line and explore
the potential of deep syntactic analysis for role la-
belling, choosing Lexical Functional Grammar as
underlying syntactic framework. We aim at a com-
putational interface for frame semantics processing
that can be used to (semi-)automatically extend the
size of current training corpora for learning stochas-
tic models for role labelling, and – ultimately – as a
basis for automatic frame assignment in NLP tasks,
based on the acquired stochastic models.
We discuss advantages of semantic role assign-
ment on the basis of functional syntactic analy-
ses as provided by LFG parsing, and present an
LFG syntax-semantics interface for frame seman-
tics, building on a first study in Frank and Erk
(2004). In the present paper we focus on the corpus-
based induction of a computational LFG interface
for frame semantics from a semantically annotated
corpus. We describe the methods used to derive an
LFG-based frame semantic lexicon, and discuss the
treatment of special (since non-isomorphic) map-
pings in the syntax-semantics interface. Finally, we
apply the acquired frame assignment rules in a com-
putational LFG parsing architecture.
The paper is structured as follows. Section 2
gives some background on the semantically anno-
tated corpus we are using, and the LFG resources
that provide the basis for automatic frame assign-
ment. In Section 3 we discuss advantages of deeper
syntactic analysis for a principle-based syntax-
semantics interface for semantic role labelling. We
present an LFG interface for frame semantics which
we realise in a modular description-by-analysis ar-
chitecture. Section 4 describes the method we apply
to derive frame assignment rules from corpus anno-
tations: we port the frame annotations to a “paral-
lel” LFG corpus and induce general LFG frame as-
signment rules, by extracting syntactic descriptions
for the frame constituting elements. We use LFG’s
functional representations to distinguish local and
non-local role assignments. The derived frame as-
SPD requests that coalition talk about reform
Figure 1: SALSA/TIGER frame annotation
signment rules are reapplied to the original syntac-
tic LFG corpus to control the results. In Section 5
we apply and evaluate the frame projection rules in
an LFG parsing architecture. In Section 6 we sum-
marise our results and discuss future directions.
2 Corpus and Grammar Resources
Frame Semantic Corpus Annotations The basis
for our work is a corpus of manual frame annota-
tions, the SALSA/TIGER corpus (Erk et al., 2003).2
The annotation follows the FrameNet definitions of
frames and their semantic roles.3 Underlying this
corpus is a syntactically annotated corpus of Ger-
man newspaper text, the TIGER treebank (Brants
et al., 2002). TIGER syntactic annotations consist
of relatively flat constituent graph representations,
with edge labels that indicate functional informa-
tion, such as head (HD), subject (SB), cf. Figure 1.
The SALSA frame annotations are flat graphs
connected to syntactic constituents. Figure 1 dis-
plays frame annotations where the REQUEST frame
is triggered by the (discontinuous) frame evoking el-
ement (FEE) fordert ... auf (requests). The seman-
tic roles (or frame elements, FEs) are represented as
labelled edges that point to syntactic constituents in
the TIGER syntactic annotation: the noun SPD for
the SPEAKER, Koalition for the ADDRESSEE, and
the PP zu Gespr¨ach ¨uber Reform for the MESSAGE.
LFG Grammar Resources We aim at a computa-
tional syntax-semantics interface for frame seman-
tics, to be used for (semi-)automatic corpus annota-
tion for training of stochastic role assignment mod-
els, and ultimately as a basis for automatic frame as-
signment. As a grammar resource we chose a wide-
coverage computational LFG grammar for German
(developed at IMS, University of Stuttgart). This
German LFG grammar has already been used for
semi-automatic syntactic annotation of the TIGER
corpus, with reported coverage of 50%, and 70%
2http://www.coli.uni-sb.de/lexicon
3See http://www.icsi.berkeley.edu/~framenet
precision (Brants et al., 2002). The grammar runs
on the XLE grammar processing platform, which
provides stochastic training and online disambigua-
tion packages. Currently, the grammar is further ex-
tended, and will be enhanced with stochastic disam-
biguation, along the lines of (Riezler et al., 2002).
LFG Corpus Resource Next to the German LFG
grammar, (Forst, 2003) has derived a ‘parallel’ LFG
f-structure corpus from the TIGER treebank, by ap-
plying methods for treebank conversion. We make
use of the parallel treebank to induce LFG frame an-
notation rules from the SALSA/TIGER annotations.
3 LFG for Frame Semantics
Lexical Functional Grammar (Bresnan, 2001)
assumes multiple levels of representation. Most
prominent are the syntactic representations of
c(onstituent)- and f(unctional)-structure. The corre-
spondence between c- and f-structure is defined by
functional annotations of rules and lexical entries.
This architecture can be extended to semantics pro-
jection (Halvorsen and Kaplan, 1995).
LFG f-structure representations abstract away
from surface-syntactic properties, by localising ar-
guments in mid- and long-distance constructions,
and therefore allow for uniform reference to syntac-
tic dependents in diverse syntactic configurations.
This is important for the task of frame annotation,
as it abstracts away from aspects of syntax that are
irrelevant to frame (element) assignment.
In (1), e.g., the SELLER role can be uniformly as-
sociated with the local SUBJect of sell, even though
it is realized as (a.) a relative pronoun of come that
controls the SUBJect of sell, (b.) an implicit second
person SUBJ, (c.) a non-overt SUBJ controlled by
the OBLique object of hard, and (d.) a SUBJ (we) in
VP coordination.
(1) a. The woman who had come in to sell flowers
overheard their conversation.
b. Don’t sell the factory to another company.
c. It would be hard for him to sell newmont shares.
d. .. we decided to sink some of our capital, buy a
car, and sell it again before leaving.
LFG Semantics Projection for Frames As in a
standard LFG projection architecture, we define a
frame semantics projection  f from the level of f-
structure. We define the  f –projection to introduce
elementary frame structures, with attributes FRAME,
FEE (frame-evoking element), and frame-specific
role attributes. Figure 2 displays the  f–projection
for the sentence in Figure 1.4
4The MESSAGE role is coindexed with a lower frame, the
frame projection introduced by the noun Gespr¨ach.
2
66
66
66
66
4
PRED ‘AUFFORDERNh(SUBJ)(OBJ)(OBL)i’
SUBJ
 
PRED ‘SPD’ 
OBJ
 
PRED ‘KOALITION’ 
OBL
2
66
4
PRED ‘ZUh(OBJ)i’
OBJ
2
4
PRED ‘GESPR ¨ACH’
ADJ
 
PRED ‘ ¨UBERh(OBJ)i’
OBJ
 
PRED ‘REFORM’ 
 
3
5
3
77
5
3
77
77
77
77
5
 f
2
66
64
FRAME REQUEST
FEE AUFFORDERN
SPEAKER [ ]
ADDRESSEE [ ]
MESSAGE [ ]
3
77
75
2
64
FRAME CONVERSATION
FEE GESPR ¨ACH
INTERLOCUTOR 1 [ ]
TOPIC [ ]
3
75
Figure 2: LFG projection architecture for Frame Semantics
auffordern V,
("PRED)=‘AUFFORDERNh("SUBJ)("OBJ)("OBL)i’
...( 
f (") FRAME) = REQUEST
( f (") FEE) = (" PRED FN)
( f (") SPEAKER) =  f (" SUBJ)
( f (") ADDRESSEE) =  f (" OBJ)
( f (") MESSAGE) =  f(" OBL OBJ)
Figure 3: Frame projection by co-description
Figure 3 states the lexical entry for the REQUEST
frame.  f is a function of f-structure. The verb
auffordern introduces a node  f (") in the semantics
projection of ", its local f-structure, and defines its
attributes FRAME and FEE. The frame elements are
defined as  f–projections of the verb’s SUBJ, OBJ
and OBL OBJ functions. E.g. the SPEAKER role,
referred to as ( f(") SPEAKER), the SPEAKER at-
tribute in the projection  f (") of ", is defined as
identical to the  f–projection of the verb’s SUBJ,
 f (" SUBJ).
Frames in Context The projection of frames in
context can yield connected frame structures. In
Figure 2, Gespr¨ach fills the MESSAGE role of
REQUEST, but it also introduces a frame of its
own, CONVERSATION. Thus, the CONVERSATION
frame, by coindexation, is an instantiation, in con-
text, of the MESSAGE of REQUEST.
Co-description vs. description-by-analysis In
the co-description architecture we just presented
f- and s-structure equations jointly determine the
valid analyses of a sentence. Analyses that do
not satisfy both f- and s-structure constraints are
inconsistent and ruled out.
An alternative to co-description is semantics
construction via description-by-analysis (DBA)
(Halvorsen and Kaplan, 1995). Here, semantics
is built on top of fully resolved f-structures. F-
structures that are consistent with semantic mapping
constraints are semantically enriched – remaining
analyses are left untouched.
Both models are equally powerful – yet while co-
pred(X,auffordern),
subj(X,A), obj(X,B), obl(X,C), obj(C,D)
==>
+’sf ::’(X,SemX), +frame(SemX,request),
+fee(X,auffordern),
+’sf ::’(A,SemA), +speaker(SemX,SemA),
+’sf ::’(B,SemB), +addressee(SemX,SemB),
+’sf ::’(D,SemD), +message(SemX,SemD).
Figure 4: Frame projection by DBA (via transfer)
description integrates the semantics projection into
the grammar and parsing process, DBA keeps it as a
separate module. Thus, with DBA, semantics does
not interfere with grammar design and can be de-
veloped separately. The DBA approach also facili-
tates the integration of external semantic knowledge
sources (such as word senses or named entity types).
DBA by transfer We realise the DBA approach
by way of a term-rewriting transfer system that is
part of the XLE grammar processing platform. The
system represents f-structures as sets of predicates
which take as arguments variables for f-structure
nodes or atomic values. Transfer is defined as a
sequence of ordered rules. If a rule applies to an
input set of predicates, it defines a new output set.
This output set is input to the next rule in the cas-
cade. A rule applies if all terms on its left-hand side
match some term in the input set. The terms on the
right hand side (prefixed ’+’) are added to the in-
put set. There are obligatory (==>) and optional
(?=>) rules. Optional rules introduce two output
sets: one results from application of the rule, the
other is equal to the input set.
Figure 4 displays a transfer rule that corresponds
to the co-description lexical entry of Figure 3. For
matched f-structure nodes (pred, subject, object,
oblique object) it defines a  f–projection (by pred-
icate ’s::f ’) with new s-structure nodes. For these,
we define the frame information (FRAME, FEE) and
the linking of semantic roles (e.g., the  f–projection
SemA of the SUBJ is defined as the SPEAKER role
of the head’s semantic projection SemX).
Frame FeeID Role(s) FeID(s)
Request 2 (from f2, 8g) Speaker 1Addressee 3
Message 501
Figure 5: Core frame information for Fig. 1
% projecting frame information for FEE
project fee(FeeID, Frame) ::
ti-id(X,FeeID), pred(X,Pred) ==>
+’s::’(X,S X), +frame(S X,Frame), +fee(S X,Pred).
% semantic projection for (each) FE of FEE
project fe of fee(FeeID, Frame, FeID, Role) ::
ti-id(X,FeeID), ’s::’(X,S X), frame(S X,Frame),
ti-id(Y,FeID), pred(Y,Pred) ==>
+’s::’(Y,S Y), +Role(S X,S Y), +rel(S Y,Pred).
Figure 6: SALSA-2-LFG-TIGER transfer
4 Corpus-based induction of an LFG
frame semantics interface
4.1 Porting SALSA annotations to LFG
A challenge for corpus-based induction of a syntax-
semantics interface for frame assignment is the
transposition of the corpus annotations from a given
syntactic annotation scheme to the target syntactic
framework. The basis for our work are annotations
of the SALSA/TIGER corpus (Erk et al., 2003), en-
coded in an XML annotation scheme that extends
the syntactic TIGER XML annotation scheme.
The TIGER treebank has been converted to a par-
allel LFG f-structure corpus (Forst, 2003). The
SALSA/TIGER and LFG-TIGER corpora could be
used to learn corresponding syntactic paths in the
respective structures. Thus, we could establish
the paths of frame constituting elements in the
SALSA/TIGER corpus, and port the annotations to
the corresponding path in the LFG-TIGER corpus.
However, we could apply a more precise method,
by exploiting the fact that the LFG-TIGER cor-
pus preserves the original TIGER constituent iden-
tifiers, as f-structure features TI-ID (see Fig. 7). We
use these ’anchors’ to port the SALSA annotations
to the parallel LFG-TIGER treebank. Thus, in a
first step we extend the latter to an LFG corpus with
frame semantics projection. From the extended cor-
pus we induce general LFG frame assignment rules.
This will be described in more detail in Section 4.2.
Porting annotations by transfer For each sen-
tence we extract the constituent identifiers of frame
constituting elements in the SALSA XML annota-
tions (cf. Figure 5). This information is coded into
transfer rules, where we refer to the corresponding
TI-ID features in the f-structure as anchors to project
the frame information for a given frame annotation
Figure 7: LFG-TIGER f-structure (w/ TI-ID)
Figure 8: Frame projection from f-str of Fig. 7
instance. The first transfer rule (template) in Figure
6 defines the semantic projection of the FEE, where
the correct f-structure location is referenced by the
feature TI-ID. Subsequent rules – one for each role
to be assigned – define the given semantic role as an
argument of the FEE’s semantic projection, again
using the TI-IDs of the FEE and FE as anchors.
We generate these frame projection rules for each
sentence in the SALSA/TIGER corpus, and apply
them to the corresponding f-structure in the LFG-
TIGER corpus. The result is an LFG corpus with
frame semantic anntations (cf. Figures 7 and 8).
The basic structure of frame-inducing rules in
Figure 6 was refined to account for special cases:
Coordination For frame elements that corre-
spond to coordinated constituents, as in Figure 9, we
project a semantic role that records a set of semantic
predicates (REL), one for each of the conjuncts.
Beamten, Politikern und Gesch¨aftsleuten wird
Schmiergeld bezahlt – Clerks, politicians and
businessmen are payed bribes
Figure 9: Frame with coordinated RECVR role
Underspecification The SALSA annotation
scheme allows for underspecification, to represent
unresolved word sense ambiguities or optionality
(Erk et al., 2003). In a given context, a predicate
may evoke alternative frames (i.e. word senses),
where it is impossible to decide between them.
E.g. the verb verlangen (demand) may convey
the meaning of REQUEST, but also COMMERCIAL
TRANSACTION. Such cases are annotated with
Figure 10: Underspecification as disjunction
4 Artikel gingen ¨uber die Ladentheke–4 items were sold
Figure 11: Multiword expressions
alternative frames, which are marked as elements of
an ‘underspecification group’. Underspecification
may also affect frame elements of a single frame.
A motion (Antrag), e.g., may be both MEDIUM and
SPEAKER of a REQUEST. Finally, a constituent
may or may not be interpreted as a frame element
of a given frame. It is then represented as a single
element of an underspecification group.
We model underspecification as disjunction,
which is encoded by optional transfer rules that cre-
ate alternative (disjunctive) contexts. Optionality is
modeled by a single optional rule. Figure 10 dis-
plays the result of underspecified frame element as-
signment in an f-structure chart (Maxwell and Ka-
plan, 1989). Context c1 displays the reading where
Antrag is assigned the SPEAKER role, alternatively,
in context c2, it is assigned the role MEDIUM.
In a symbolic account disjunction doesn’t cor-
rectly model the intended meaning of underspecifi-
cation. Yet, a stochastic model for frame assignment
should render the vagueness involved in underspec-
ification by close stochastic weights. Thus, under-
specified annotation instances provide alternative
frames in the training data and can be used for fine-
grained evaluation of frame assignment models.
Multiword Expressions The treatment of mul-
tiword expressions (idioms, support constructions)
requires special care. For idioms, the constituting
elements are annotated as multiple frame evoking
elements (cf. Figure 11 for ¨uber die Ladentheke
gehen – go over the counter (being sold)). We de-
fine semantic projections for the individual compo-
nents: the main frame evoking predicate (FEE) and
the idiom-constituting words, which are recorded in
a set-valued feature FEE-MWE. Otherwise, idioms
are treated like ordinary main verbs. E.g., like sell,
the expression triggers a COMMERCE SELL frame
with the appropriate semantic roles, here GOODS.
Asymmetric Embedding Another type of non-
isomorphism between syntactic and semantic rep-
Figure 12: Asymmetric embedding (example (2))
resentation occurs in cases where distinct syntactic
constituents are annotated as instantiation of a sin-
gle semantic role. In (2), PP and NP are annotated
as the MESSAGE of a STATEMENT, since they jointly
convey its content. Projecting distinct constituents
to a single semantic node can, however, lead to in-
consistencies, especially if both constituents inde-
pendently project semantic frames.
(2) Der Geschaeftsfuehrer gab [PP MO als Grund
fuer die Absage] [NP OBJ Terminnoete] an.
The director mentioned [time conflicts] [as a
reason for cancelling the appointment]
In the SALSA annotations asymmetric embedding
at the semantic level is the typical pattern for such
double-constituent annotations. I.e., for (2), we
assume a target frame structure where the MES-
SAGE of STATEMENT points to the PP – which it-
self projects a frame REASON with semantic roles
CAUSE for Terminn¨ote, and EFFECT for Absage.
Such multiple-constituent annotations arise in
cases where frame annotations are partial: since
corpus annotation proceeds frame-wise, in (2) the
REASON frame may not have been treated yet.
Moreover, annotators are in general not shown com-
plete(d) sentence annotations.
We account for these cases by a simulation of
functional uncertainty equations, which accommo-
date for a potential embedded frame within either
one of the otherwise re-entrant constituents. We ap-
ply a transfer rule set that embeds one (or the other)
of the two constituent projections as an embedded
role of an unknown frame, to be evoked by the re-
spective ’dominating’ node. We introduce an ’un-
known’ role ROLE for the embedded constituent,
which is to be interpreted as a functional uncertainty
path over variable semantic roles.
Figure 12 displays the alternative (hypothetical)
frame structures for (2), where the second one –
with FRAME instantiated to REASON and ROLE to
CAUSE – corresponds to the actual reading.
Overview of data Our current data set comprises
12436 frame annotations for 11934 sentences. Ta-
ble 1 gives frequency figures for the special phe-
coord usp mwe asym >dbl all
abs 467 395 1287 421 97 12436
in % 3.76 3.18 10.34 3.39 0.78 100
Table 1: Overview of special annotation types
nomena: coordination, underspecification, multi-
word expressions and double constituents (asym).5
We successfully ported 11713 frame annotations
to the LFG-TIGER corpus, turning it into an LFG
corpus with frame annotations.
4.2 Inducing frame projection rules
From the enriched corpus we extract lexical frame
assignment rules that – instead of node identifiers –
use f-structure descriptions to identify constituents
and map them to frame semantic roles. These rules
can then be applied to the f-structure output of free
LFG parsing, i.e. to novel sentences.
We designed an algorithm for extracting f-struc-
ture paths between pairs of f-structure nodes that
correspond to the s-structure of the frame evoking
element and one of its semantic roles, respectively.
Table 2 gives an example for the frame projection
in Figure 13. Starting from the absolute f-structure
path (f-path) for (the f-structure projecting to) the
FEE MITTEILEN we extract relative f-paths leading
to the roles MESSAGE and SPEAKER. The f-path for
the MESSAGE ("OBJ) is local to the f-structure that
projects to the FEE. For the SPEAKER we identify
two paths: one local, the other non-local. The local
f-path ("SUBJ) leads to the local SUBJ of mitteilen
in Figure 13. By co-indexation with the SUBJ of
versprechen we find an alternative non-local path,
which we render as an inside-out functional equa-
tion ((XCOMP") SUBJ).
Since f-structures are directed acyclic graphs, we
use graph accessibility to distinguish local from
non-local f-paths. In case of alternative local and
non-local paths, we choose the local one. From al-
ternative non-local paths, we chose the one(s) with
shortest inside-out subexpression.
Generating frame assignment rules We ex-
tracted f-path descriptions for frame assignment
from the enriched LFG-TIGER corpus. We com-
piled 9707 lexicalised frame assignment rules in the
format of Figure 4. The average number of distinct
assignment rules per FEE is 8.38. Abstracting over
the FEEs, we obtain 7317 FRAME-specific rules,
with an average of 41.34 distinct rules per frame.
Due to the surface-oriented TIGER annotation
format, the original annotations contain a high num-
ber of non-local frame element assignments that
5Role assignment to more than two constituents (>dbl) con-
stitute a rather disparate set of data we do not try to cover.
2
4
FRAME COMMUNICATION
FEE MITTEILEN
SPEAKER [ ]
MESSAGE [ ]
3
52
66
66
66
4
PRED VERSPRECHEN
SUBJ
 
PRED SPD
 
OBJ2
 
PRED W ¨AHLER
 
XCOMP
2
4
PRED MITTEILEN
SUBJ [ ]
OBJ
 
PRED BESCHLUSS
 
3
5
3
77
77
77
5
SPD verspricht W¨ahlern, Beschl¨usse mitzuteilen
SPD promises voters to report decisions
Figure 13: Local and non-local frame elements
absolute f-path relative f-path
FEE XCOMP PRED "
MSG XCOMP OBJ "OBJ local
SPKR SUBJ (XCOMP")SUBJ nonlocal
XCOMP SUBJ "SUBJ local
Table 2: Local and nonlocal path equations
are localised in LFG f-structures. The f-paths ex-
tracted from the enriched LFG corpus yield 12.82%
non-local (inside-out) vs. 87.18% local (outside-in)
frame element assignment rules.
As an alternative rule format, we split frame as-
signment into separate rules for projection of the
FEE and the individual FEs. This allows assignment
rules to apply in cases where the f-structure does not
satisfy the functional constraints for some FE. This
yields improved robustness, and accounts for syn-
tactic variability when applied to new data. For this
rule format, we obtain 960 FEE assignment rules,
and 8261 FEE-specific FE assignment rules. Ab-
stracting over the FEE, this reduces to 4804 rules.6
4.3 Reapplying frame assignment rules
We reapplied the induced frame assignment rules to
the original syntactic LFG-TIGER corpus, to con-
trol the results. The results are evaluated against the
frame-enriched LFG-TIGER corpus that was cre-
ated by explicit node anchoring (Sec. 4.1). We ap-
plied ’full frame rules’ that introduce FEE and all
FEs in a single rule, as well as separated FEE and
FE rules. We applied all rules for a given frame to
any sentences that had received the same frame in
the corpus. We obtained 93.98% recall with 25.95%
precision (full frame rules), and 94.98% recall with
45.52% precision (split rules), cf. Table 3.a. The
low precision is due to overgeneration of the more
general abstracted rules, which are not yet con-
trolled by statistical selection. We measured an am-
biguity of 8.46/7.83 frames per annotation instance.
6In the future we will experiment with assignment rules that
are not conditioned to FEEs, but to frame-specific syntactic de-
scriptions, to assign frames to ’unknown’ lexical items.
full frame rules FEE and FE rules
rec prec amb rec prec amb
(a) 93.98 25.95 8.46 94.98 45.52 7.83
(b) 52.21 6.93 13.35 76.41 18.32 9.00
Table 3: Evaluation of annotation results:
(a) on TIGER corpus, (b) on LFG parses
5 Applying frame assignment rules in an
LFG parsing architecture
We finally apply the frame assignment rules to orig-
inal LFG parses of the German LFG grammar. The
grammar produces f-structures that are compatible
with the LFG-TIGER corpus, thus the syntactic
constraints can match the parser’s f-structure output.
In contrast to the LFG-TIGER corpus, the grammar
delivers f-structures for alternative syntactic analy-
ses. We don’t expect frame projections for all syn-
tactic readings, but where they apply, they will cre-
ate ambiguity in the semantics projection.
We applied the rules to the parses of 6032 corpus
sentences. Compared to the LFG-TIGER corpus we
obtain lower recall and precision (Table 3.b) and a
higher ambiguity rate per sentence. Drop in preci-
sion and higher ambiguity are due to the higher am-
biguity in the syntactic input. Moreover, we now ap-
ply the complete rule set to any given sentence. The
rules can thus apply to new annotation instances,
and create more ambiguity. The drop in recall is
mainly due to overgenerations by automatic lem-
matisation and functional assignments to PPs in the
TIGER-LFG corpus, which are not matched by the
LFG parser output. These mismatches will be cor-
rected by refinements of the TIGER-LFG treebank.
6 Summary and Future Directions
We presented a method for corpus-based induction
of an LFG syntax-semantics interface for frame se-
mantic processing. We port frame annotations from
a manually annotated corpus to an LFG parsing ar-
chitecture that can be used to process unparsed text.
We model frame semantic annotations in an LFG
projection architecture, including phenomena that
involve non-isomorphic mappings between levels.
In future work we will train stochastic mod-
els for disambiguation of the assigned frame se-
mantic structures. We are especially interested in
exploring the potential of deeper, functional syn-
tactic analyses for frame assignment, in conjunc-
tion with additional semantic knowledge (e.g. word
senses, named entities). We will set up a bootstrap-
ping cycle for learning increasingly refined stochas-
tic models from growing training corpora, using
semi-supervised learning methods. We will explore
multi-lingual aspects of frame assignment, using
English FrameNet data and an English LFG gram-
mar with comparable f-structure output. Finally, we
will investigate how similar methods can be applied
to syntactic frameworks such as HPSG, which al-
ready embody a level of semantic representation.
Acknowledgements We thank the IMS Stuttgart for
allowing us to use the German LFG grammar. Spe-
cial thanks go to Martin Forst who provided us with
the TIGER-LFG corpus and added special features
to support our work. Finally, thanks go to Dick
Crouch who greatly enhanced the transfer system.

References
C. F. Baker, C. J. Fillmore, and J. B. Lowe. 1998.
The Berkeley FrameNet project. In Proceedings of
COLING-ACL 1998, Montr´eal, Canada.
S.Brants, S.Dipper, S.Hansen, W.Lezius, G.Smith. 2002.
The TIGER Treebank. In Proc. of the Workshop on
Treebanks and Linguistic Theories, Sozopol, Bulgaria.
J. Bresnan. 2001. Lexical-Functional Syntax. Blackwell
Publishers, Oxford.
K. Erk, A. Kowalski, S. Pad´o, and M. Pinkal. 2003.
Towards a Resource for Lexical Semantics: A Large
German Corpus with Extensive Semantic Annotation.
In Proceedings of the ACL 2003, Sapporo, Japan.
M. Fleischman, N. Kwon, and E. Hovy. 2003. Maxi-
mum entropy models for FrameNet classification. In
Proceedings of EMNLP’03, Sapporo, Japan.
M. Forst. 2003. Treebank Conversion – Establishing a
testsuite for a broad-coverage LFG from the TIGER
treebank. In A. Abeill´e, S. Hansen, and H. Uszkoreit
(eds), Proceedings of the 4th International Workshop
on Linguistically Interpreted Corpora, Budapest.
A. Frank and K. Erk. 2004. Towards an LFG Syntax–
Semantics Interface for Frame Semantics Annotation.
In A. Gelbukh (ed), Computational Linguistics and In-
telligent Text Processing, Springer, Heidelberg.
D. Gildea and D. Jurafsky. 2002. Automatic labeling of
semantic roles. Computational Linguistics, 28(3).
D. Gildea and M. Palmer. 2002. The Necessity of Pars-
ing for Predicate Argument Recognition. In Proceed-
ings of ACL’02, Philadelphia, PA.
P.-K. Halvorsen and R.M. Kaplan. 1995. Projec-
tions and Semantic Description in Lexical-Functional
Grammar. In M. Dalrymple, R.M. Kaplan, J.T.
Maxwell, A. Zaenen (eds), Formal Issues in Lexical-
Functional Grammar, CSLI Lecture Notes, Stanford.
P. Kingsbury, M. Palmer, and M. Marcus. 2002. Adding
semantic annotation to the Penn TreeBank. In Pro-
ceedings of the HLT Conference, San Diego.
J. T. III Maxwell and R. M. Kaplan. 1989. An overview
of disjunctive constraint satisfaction. In Proceedings
of IWPT, pages 18–27.
S. Riezler, T. H. King, R. M. Kaplan, R. Crouch, J. T. III
Maxwell, and M. Johnson. 2002. Parsing the Wall
Street Journal using a Lexical-Functional Grammar
and Discriminative Estimation Techniques. In Pro-
ceedings of the ACL’02, Philadelphia, PA.
