Robust Semantic Construction 
Michael Schiehlen* 
Institute tbr Comlmtational Linguistics, University of Stuttgart, 
Azenbergstr. 12, 70174 Stuttgart 
mike@adler, ims. uni-stuttgart, de 
1 Introduction 
Recent years have seen a surge ill interest f~r 
robust fiat analysis, i.e. NLP systems with fairly 
limited supl)ly of linguistic knowledge but with 
vast coverage. The paper describes a module 
that serves as a back-end to such fiat analysis 
methods and transforms their output into full 
semantic representations as constructed by deep 
analysis methods. In particular, the module has 
been designed so as to process input fl'om 
• tree banks 
• a statistic context-free parser trained on 
these tree banks 
• a finite-state parser 
• a traditional feature-structure parser 
The semantic representations which the mod- 
ule constructs m'e so-called Verbmobil Inter- 
~&ce \[\[~Ol'lllS (\\[Irl~s) (BOS et al., 1998), (l)uilding 
on Reyle's Underspecified Discourse R,epresen- 
tation Structures (1993), see an example, in Fig- 
ure 1). Although in principle othe, r representa- 
tions could be constructed as well, VITs seem 
to be a particularly good choice: They Call 1oe 
implemented as sets of coustraints so that se- 
mantic construction (SC) reduces to collecting 
the constraints and unifying some variables in 
these constraints. Furthermore VITs are sup- 
ported by ml abstract data type (Dorlla, 2000). 
Several daunting prol)lems had to be t'aced in 
tile design of the module. 
* This work was fimded by the German Federal Min- 
istry of Education, Science, Research and Technology 
(BMBF) in the ti'amework of the Verbmobil Project un- 
der Grant 01 IV 1(}1 U. Many thanks are duc to M. Emele 
and the colleagues in Verbmobil. 
week( y ) 
next( y ) 
decl(~7\] ) \ 
~1 ~ maybe(\[~ ) / 
// should(el,~\]) 1 
into( e, y ) 
move(e, x ) pron(x) 
Figure 1: VIT for So maybe, u~'e should move, 
into idle next u,e, ek. 
Context-Free Input. The tree banks provid- 
ing tile input structures (which have been built 
in the Verbmobil project) only encode context- 
free trees to facilitate the training of a statisti- 
cal parser. This means that non-local depen- 
dencies are either left out (e.g. topicalization 
in English) or treated by flattening out sub- 
trees into rules (e.g. head-movenmnt ill Ger- 
man). The latter strategy can create a vast 
amount of rules: Sin(:e Gelunan head-nmvement 
connects a clause-initial and a clause-final posi- 
tion, every clause frame gives rise to a new rule. 
To thee this challenge some adjustments had to 
be made: 
(1) Predicate-argument structure is indispens- 
able for SC but presupposes reconstruction of 
long distance dependencies ("movement"). If 
syntax cmmot supl)ly it, SC has to retrieve it; 
on its own (see Section 5.2). 
(2) The sheer bulk of rules prohibits manual tag: 
ging of syntactic rules with semantic rules. In- 
1101 
stead, syntax has to provide pertinent informa- 
tion in its rules so that SC can determiim the 
semantic operations required. 
Robustness. Since the tree banks have been 
constructed by hand, errors are prone to crop 
Ul). Likewise, flat analysis methods cannot be 
expected to deliver input of the same quality 
as deep traditional parsers. Finally, grammars 
and semantic formalism will often difl'er in their 
subcategorization assmnptions: The verb move 
e.g. subcategorizes for hito in the tree bank (see 
Figure 2) but not in tile VIT tbnnalism (see Fig- 
ure 3). 
To handle this problem, the syntax-semantics- 
interface should be dismantled as far as possible: 
Only the most indispensable information should 
be taken over fl'om syntax. By neglecting all tile 
rest the system stands a good chance of skipping 
syntax errors. Furthermore in many cases deci- 
sions made ill syntax need to be overturned in 
semantics (e.g. the complement/adjunct specifi- 
cations), hnportant semantic information is of- 
ten deterlnined only in SC or in subsequent dis- 
ambiguation lnodules that have access to larger 
stores of context. This approach eases the bur- 
den on syntactic analysis and potentially yields 
more reliable results. 
Diverse Input. A SC module should be able 
to handle input from a variety of grammars and 
convert it into an independellt tbrmat of seman- 
tic representation. Thus, a common syntax- 
semantics-interface (or inore precisely an inter- 
face between syntax and SC) must be defined 
onto which every type of' input is mapped. 
2 Design Principles 
To cope with tile problems mentioned, tradi- 
tional SC techniques (Montague, 1973) (Pereira 
and Shieber, 1987) (Bos el; al., 1996) cannot 
be used. Instead, the fbllowing ideas were ex- 
ploited. 
Modularity and Underspeeification. A 
major problem in SC is the treatment of mn- 
biguity. Often the local rule context available in 
SC does not give enough ilffonnation to resolve 
such ambiguities. In these cases, underspecifi- 
cation should be used to defer the resolution of 
choices. Thus, the described module builds a 
lexically and scopally underspecified represen- 
tation. Subsequently the lexical ambiguities are 
resolved by disambiguation modules. 
COMIZ_ .... ~ItD 
AIkJ/~D- 
/ADJ/~"J tD / ---~ ~Bj/ ~HD ~. 
/ / ~7 ¢D2""C~ OMP / / / '7 Ime'---<OMP \ 
/ / / / -7 m)~C-OMI_'__\ 
SPI,,/~ D 
RB RB PP MD VB IN DT JJ NN 
So maybe we should move into the next week 
Figure 2: Example for an application tree. 
Modularity and Syntax-Semantics- 
Interface. To facilitate modularity a syntax- 
semantics-interface is explicitly defined. The 
input of every parser is mapped onto an 
interface structure called application twe, see 
Figure 2. In this way input Dora various sources 
can be processed with a minimum of effort. 
Semantic Database. Great emphasis is laid 
on an external database of semantic predicates 
(Heinecke and Worm, 1996). This database as- 
sociates lemmas with predicate nmnes, seman- 
tic classes and subcategorization frames (see the 
entry in Figure 3). 
3 System Overview 
The process of SC can be split into two phases 
(see Figure 4). In the first, phase an application 
I;ree is traversed and simultaneously an under- 
specified semantic representt~tion is lmilt (com- 
positional semantic construction, see section 5). 
In the second phase the semantic representation 
is partially (lisambiguated (see section 6). The 
two phases are preceded by a step which me.- 
diates between the actual output of the syntax 
and the syntax-semantics-interface. 
4 Syntax-Semantics-Interface 
Traditionally, the content of the syntax- 
semantics-interface is somewhat contentious. 
While syntax-oriented approaches try to inte- 
grate a good part of SC already into the pars- 
ing process (cf. the construction of f-structure 
in LFG), other al)proaches put tile main focus 
on semantics (e.g. Montague Grammar). To 
achieve a high degree of flexibility, a modular 
SC system has to settle for the lowest, common 
denonfinator of all input sources. The follow- 
ing information seems to be minimally required 
from the syntax. 
1102 
Lemma PredName 
move move 
SemClass SyntFrame Sort 
vi3 argl:subj,arg3:obj move sit 
Figure 3: Entryinthe semantic database. 
ArgSorts 
agentive,entity 
contezt-frec trec 
; 
\[ preprocessing step \] 
; 
application tree 
4 
compositional \] 
semantic construction <- semantic lexicon 
semantic represcv, tation 
; 
noncolnpositional \] 
semantic COllstruction <-- idiom lexicon 
4 
sc',nantic re.l)r(',~em, tation 
Figure 4: System overview. 
(l) The parser should d(~livcr a tree tbr the 
parsed string whi(:h the SC system then ('an con- 
vert into a hierarchical stmmian'e of senmnti(: o\])- 
(;rations (an applicatiml, tree). 
(2) Every word in the input string should 1)e 
syntactically classified, i.e. assigned a .~!lntac- 
tic cateflo'r!/ or \])art of Sl)(;e.(:h tag. \¥e will as- 
Sllllle that the parser assigns every word exactly 
one category. (Lexical underslmcification could 
conceivatfy b(; used to deal with multiple cate- 
gories.) Then morphological analysis (either in 
synt;ax or SC) maps l;he word cate~rory pair to a 
morphological lemma and a set of morphologi- 
e::fl features. SC records the feal;llres in the VIT 
while it uses l;he lemma as a key to the seman- 
tic lexicon. In (:ase the lemma is unknown in 
the semantic lexicon, the, syst, eln uses the syn~ 
tacti(: category to automati(:ally asso(:iate ~ new 
1)redicate and semantic (:lass with the lemma. 
(3) Every rule used in the tree shouhl speci~y for 
each of the categories on its right-hand side ex- 
actly one grammatical role (GR). It7 the grmmnar 
does not do this, ORs must be deternfined in the 
prel)roeessing step (e.g. determiners in NPs are 
specifiers). Gll.s are llsed to COlttro\]. the choice of 
s,.;nmntic el)orations. The set of CHs emlfloyed 
is; inspired by HPSG (Pollard and Sag, 1994:): 
Head, Complement, Adjunct, Om~:j'anct, Spcci- 
- Z--_Z Z-_Z Z--_Z S--Z --_7_Z ----_Z ----_- - 
Y _ _ _/ff'- 
\[ houk,(o, ) 7 " 
i ! 
Figure 5: ()peration of a(1.immtion. 
tier, Part of a Multi-Word Lezcme. The corm- 
spending semm~tic operations are Complemcn- 
ration, Adjunction, Coordination, Spec~i/ication, 
and Predicate Form, ation. Except for Coordina- 
tion and Predicate Formation all operations are 
l)ilmry. A nile without a head is considered el- 
lil)ti('al and an abstract 1)redicate for the missing 
head is inserted in semantics. 
5 Compositional Semantic 
Construction Process 
Conll)ositional SC follows the at)plieation tree 
(the context-free backbone) and detc,:nfincs the 
in'edicate-argunmnt structure (th(', subcatego- 
rization paths). 
5.1 Senmntic Construction on the 
ConsGtuent Structure 
Figure 5 shows two adjun(:tion operations: In 
the firs/: one, the inl:erseetive a(kiunct into the 
next week is adjoined to move. In the second 
one, maybe is adjoined to the clause. The pic- 
ture makes clem' what the data structure for 
a partial result should look like: a set of con- 
straints and some pointers to variables in these 
constraints (e.g. the partial result for maybe 
would be { maybe(l~,lq),12 < hl,ll C- la }~ 
and (12, la)). Since only finitely many pointers 
J ln a VIT, every predicate is referenced over a base 
label (e.g. lj for maybe). The constraint 12 < ha says 
that; the box 12 is subordinated to box lq, while l: C la 
sl;a|;es t, hat; predicate l: is in box la. 
1103 
are involved, they can be collected in a record. 
All partial results are classified into six seman- 
tic types according to the pointers they allow 
for: nhead (nominal head, ibr nouns), vhead 
(verbal head, for verbs), adj (adjuncts ~, for ad- 
verbs, adjectives, subclanses, PPs, also preposi- 
tions and subordinating co~\junctions), ncomp 
(nonfinal complements, for pronouns, NPs, also 
determiners), vcomp (verbal complements, for 
sentences and complement clauses, also sentence 
moods and complementizers), cnj (tbr coordi- 
nating conjunctions). 
Semantic operations expect arguments of spe- 
cific semantic types: Complementation com- 
bines heads with complements, Adjunction com- 
bines heads with adjuncts. Specification con- 
verts an incoml)lete ncomp (i.e. a deternfiner) 
and a nhead into a complete ncomp. Coordi- 
nation combines a cnj with a series of partial 
results of equal type. 
If a type clash occurs, a "type raising" opera- 
tion is invoked. Such operations usually insert 
specific abstract predicates that represent pho- 
netically empty words or elided materiM still to 
be retrieved by ellipsis resolution in a later step. 
Figure 6 gives a concise description of these op- 
erations a. Consider some type-raising exmnples: 
(1) I will be here Monday. 
udef 
unspcc rood 
(nhead -+ ncomp) 
(ncomp -~ adj) 
(2) I will come if necessary. 
star (adj --+ vhead) 
(3) Afternoon might work or early morniT~,g. 
abstr_ tel (ncomp -+ vhead) 
5.2 Semantic Construction on the 
Predicate-Argument Structure 
While the application tree (Figure 2) states that 
the pronoun we is the subject of should, in the 
2VITs provide a lexical underspecification class for 
intersective (e.g. into the vext week and scopal adjuncts 
(e.g. maybe). Thus, SC has to handle intersecLive and 
seopal adjunction in parallel. 
3In Figure 6 the following names are used for newly 
inserted predicates: udef (mill determiner), unspcc_ rood 
(mill preposition), stat (auxiliary verb be), abstr_nom 
(nominal ellipsis), abstr_ tel (verbal ellipsis), dccl (declar- 
ative sentence mood), poss (relation expressed by ge.ni- 
tive), de.f (definite quantifier). 
comp(abstrrcl,) a,..-¢@ 
I1~\\% -. oo %.~'% ,,; _//,o., , 
~ . . _ 
Figure 6: Type-raising operations. 
semantic representation (Figure 1) we is the sub- 
ject of move. So in this case head and seman- 
tic subject m:e not in the same local rule. To 
retrace such non-local dependencies, a slash de- 
vice is used to store the pertinent information 
(the argument variable and the box label of the 
head) and propagate it through the application 
tree in search for a licenser. If a subcategorized 
element occurs without a subcategorizing head 
(as occurs often in fl'agmentary input), an ellip- 
tical element is assumed: 
(4) I mean if you --~ absh'_ 'tel with subject you 
6 Noncompositional Semantic 
Construction 
In noncompositional SC idioms are recognized 
m~d a higher level of abstraction is achieved. 
Technically, noncompositional SC is about 
transforming VITs. Thus, for implementation 
the VIT transfer package of Dorna and Emele 
((1996)) is used. Linguistically, the component 
performs the following tasks: 
• recognition of multi-word lexemes that 
arc not designated as such by syntax 
(e.g. greeting expressions good night, com- 
paratives more comfortable) 
• recognition and comput~tion of clock times 
(e.g. a qm~rter to ten) and (late expressions 
• recognition of titles (e.g. Frau Miiller) 
• partial dismnbiguation of sentence mood 
(e.g. who did it; is recognized as a question) 
• distribution of conjoined material, if re- 
quired by the level of abstraction aimed for 
1104 
(e.g. clock tilnes between a. quarter to and 
half l)aSt to, n, (late expressions Monday the 
third and tenth) 
• eoml)ositional morphology for German 
(e.g. Sl;it'tmuseum -: museum with the 
ualne Stilt) 
7 Summary 
The paper has presented a module d capable 
o:\[' tlandling inlmt from tlat, analysis lnetho(ls 
and transforming them into full-fledged seman- 
tic representations. The module works rolmstly 
and currently has a throughput of about 98% 
on Verbmobil tree bank input (i.e. it gener- 
al:es 21,222 English and 26,789 German VITx.) 
The remaining 2% are due to errors in the SC 
module, errors in the tree bank, or coordination 
1)tel)ictus 1)etween SC and tree 1)ank. 
Evaluation of the module is COnll)lieated 1) 3, the 
effort involved in mmmally constructing a siz- 
able set of input structures and correspondillg 
semantic rel)resentations, l?urthermore, t;t1(: VIT 
formalism has been in constant flux over the 
last; years with the correct outlmt representa- 
tions changing almost monthly. It is, however, 
envisaged to perform an evaluation on('e dust 
has settled. 
The approach described adds in two respects to 
the rot)ustnexs of the overall system. First, l;he 
flat analysis lmrxerx used are very rolmst as con- 
terns low-level inconsistencies such as agreement 
failures or missing function words (prepositions, 
determiners, COmlflementizers, etc.). Second, 
the data analysed in the Verbmobil tree banks 
m:e exclusively xpoken . Hence, the tree 
banks encode analyxes tbr phenomena such as 
fl'agmentary inlmt , truncated or elliptical sell- 
tencex, etc. The described module, gives seman- 
tic analyses for all of these constructions. (Usu- 
ally an abstract predicate is incorl)orated which 
given a hint to subsequent modules that aim to 
piece together partial utterances.) 
Another perspective of this work is that it pro- 
vides a first step towards a real corlms seman- 
tics by converting large sets of data into seman- 
tic representations. Due to the abstraction they 
embody, semantic representations are a valuable 
too\] for content queries to the. processed corpora. 
More immediately, the semmltic representations 
4More inibrmation can be found in Schiehlen (1999). 
generated by the described module have been 
used as text and training data tbr applications 
requiring abstract input, such as transfer in ma- 
chine translation and generation. 

References 

3ohan Bos, Bjgrn Gambit(k, Cln'istiml Lieske, 
Yoshiki Mori, Manii'ed Pinkal, and Km'sten L. 
Worm. 1996. Coml)ositional Semantics in Verb- 
mobil. In Proceedings of the 16th, Interna- 
tional Cm~:ferencc. on ComFutational Linguistics 
(COLING '96'), Copenhagen, l)emnark. 

Johan Box, Bianka Buschbeck-Wolf, Michael 
Dorna, and C.3. I{upp. 1998. Managing intbr- 
ination at linguistic interfaces. In Proceedings 
of th, e IT(h, Intcrnatio'nal CoT~ference on Com,- 
p'u, tational Linguistics (COLING '98), Montreal, 
Ca.lmda. 

Michael l)orna and Martin C. Einele. 1996. 
Senmntic-Based Transfer. In Proceedings of 
the. 16th, I'lder'nationaI Co'n:fi'.rence on Computa- 
tional Linguistic,~ (COLING '96), Cot)enhagen , 
Denmark. 

Michael Dorna. 2000. A Library Package for 
the Verbmobil Interface Term. Verbmobil E,e- 
port 238, Institut fiir maschinelle Sprachverar- 
l)eitm~g. 

.\]ohannes Heinecke and Karsten Worm. 1996. 
A Lexical Semantic Database for Verbmol)il. \[51 
Proceedings d the/tth, internatio'na, l Cov:fere',,ce 
on Comp'u, tational Linguistics (COMPLEX '96), 
Bud~l)est , I{ungary. 

Eichard Montague. 1973. The, Proper Treat- 
ment; of Quantification. In Jaako Hintikka, 
Julius Moravcsik, and Patrick Suppes, editors, 
Approaches to Natural Lang,uage , pages 221 
242. Reidel, l)ordreeht. 

Ferlmndo C.N. Pereira and Stuart M. Shieber. 
1987. Prolog and Natural-Language Analysis. 
CSLI Lecture Notes. Center for the Study of 
Language and hfformation, Stanford, Calitbr- 
nia. 

Carl Pollard and Ivan Sag. 1994. Itead- 
Driven Phrase Structure Grammar. University 
of Chicago Press, Chicago. 

Uwe Reyle. 1993. Dealing with Ambiguities 
by Underspeeification: Construction, I{epresen- 
tation and Deduction. Journal of Semantics, 
10(2):123-179. 

Michael Sdfiehlen. 1999. Semantikkonstruktion. 
Ph.D. thesis, Universitgt Stuttgart. 
