Proceedings of the 9th Conference on Computational Natural Language Learning (CoNLL),
pages 9–16, Ann Arbor, June 2005. ©2005 Association for Computational Linguistics
A Statistical Semantic Parser that Integrates Syntax and Semantics
Ruifang Ge Raymond J. Mooney
Department of Computer Sciences
University of Texas, Austin
TX 78712, USA
{grf,mooney}@cs.utexas.edu
Abstract
We introduce a learning semantic parser,
SCISSOR, that maps natural-language sen-
tences to a detailed, formal, meaning-
representation language. It first uses
an integrated statistical parser to pro-
duce a semantically augmented parse tree,
in which each non-terminal node has
both a syntactic and a semantic label.
A compositional-semantics procedure is
then used to map the augmented parse
tree into a final meaning representation.
We evaluate the system in two domains,
a natural-language database interface and
an interpreter for coaching instructions in
robotic soccer. We present experimental
results demonstrating that SCISSOR pro-
duces more accurate semantic representa-
tions than several previous approaches.
1 Introduction
Most recent work in learning for semantic parsing
has focused on “shallow” analysis such as seman-
tic role labeling (Gildea and Jurafsky, 2002). In this
paper, we address the more ambitious task of learn-
ing to map sentences to a complete formal meaning-
representation language (MRL). We consider two
MRL’s that can be directly used to perform useful,
complex tasks. The first is a Prolog-based language
used in a previously-developed corpus of queries to
a database on U.S. geography (Zelle and Mooney,
1996). The second MRL is a coaching language for
robotic soccer developed for the RoboCup Coach
Competition, in which AI researchers compete to
provide effective instructions to a coachable team of
agents in a simulated soccer domain (Chen et al., 2003).
We present an approach based on a statisti-
cal parser that generates a semantically augmented
parse tree (SAPT), in which each internal node in-
cludes both a syntactic and semantic label. We aug-
ment Collins’ head-driven model 2 (Collins, 1997)
to incorporate a semantic label on each internal
node. By integrating syntactic and semantic inter-
pretation into a single statistical model and finding
the globally most likely parse, an accurate combined
syntactic/semantic analysis can be obtained. Once a
SAPT is generated, an additional step is required to
translate it into a final formal meaning representa-
tion (MR).
Our approach is implemented in a system called
SCISSOR (Semantic Composition that Integrates
Syntax and Semantics to get Optimal Representa-
tions). Training the system requires sentences an-
notated with both gold-standard SAPT’s and MR’s.
We present experimental results on corpora for both
geography-database querying and Robocup coach-
ing demonstrating that SCISSOR produces more ac-
curate semantic representations than several previ-
ous approaches based on symbolic learning (Tang
and Mooney, 2001; Kate et al., 2005).
2 Target MRL’s
We used two MRLs in our experiments: CLANG and
GEOQUERY. They capture the meaning of linguistic
utterances in their domain in a formal language.
2.1 CLANG: the RoboCup Coach Language
RoboCup (www.robocup.org) is an interna-
tional AI research initiative using robotic soccer
as its primary domain. In the Coach Competition,
teams of agents compete on a simulated soccer field
and receive advice from a team coach in a formal
language called CLANG. In CLANG, tactics and
behaviors are expressed in terms of if-then rules.
As described in (Chen et al., 2003), its grammar consists
of 37 non-terminal symbols and 133 productions.
Below is a sample rule with its English gloss:
((bpos (penalty-area our))
(do (player-except our {4})
(pos (half our))))
“If the ball is in our penalty area, all our players
except player 4 should stay in our half.”
2.2 GEOQUERY: a DB Query Language
GEOQUERY is a logical query language for a small
database of U.S. geography containing about 800
facts. This domain was originally chosen to test
corpus-based semantic parsing due to the avail-
ability of a hand-built natural-language interface,
GEOBASE, supplied with Turbo Prolog 2.0 (Borland
International, 1988). The GEOQUERY language
consists of Prolog queries augmented with several
meta-predicates (Zelle and Mooney, 1996). Below
is a sample query with its English gloss:
answer(A,count(B,(city(B),loc(B,C),
const(C,countryid(usa))),A))
“How many cities are there in the US?”
3 Semantic Parsing Framework
This section describes our basic framework for se-
mantic parsing, which is based on a fairly stan-
dard approach to compositional semantics (Juraf-
sky and Martin, 2000). First, a statistical parser
is used to construct a SAPT that captures the se-
mantic interpretation of individual words and the
basic predicate-argument structure of the sentence.
Next, a recursive procedure is used to composition-
ally construct an MR for each node in the SAPT
from the semantic label of the node and the MR’s of its children.
Syntactic structure provides information about how the parts should
be composed. Ambiguities arise in both syntactic structure and the
semantic interpretation of words and phrases. By integrating syntax
and semantics in a single statistical parser that produces an SAPT,
we can use both semantic information to resolve syntactic
ambiguities and syntactic information to resolve semantic
ambiguities.

(S-bowner
  (NP-player (PRP$-team our) (NN-player player) (CD-unum 2))
  (VP-bowner (VB-bowner has)
             (NP-null (DT-null the) (NN-null ball))))
Figure 1: An SAPT for a simple CLANG sentence.

Function: BUILDMR(N, K)
Input: The root node N of a SAPT;
       predicate-argument knowledge, K, for the MRL.
Notation: X_MR is the MR of node X.
Output: N_MR
  C_i := the i-th child node of N, 1 ≤ i ≤ n
  C_h = GETSEMANTICHEAD(N)            // see Section 3
  C_h_MR = BUILDMR(C_h, K)
  for each other child C_i where i ≠ h
      C_i_MR = BUILDMR(C_i, K)
      COMPOSEMR(C_h_MR, C_i_MR, K)    // see Section 3
  N_MR = C_h_MR
Figure 2: Computing an MR from a SAPT.
In a SAPT, each internal node in the parse tree
is annotated with a semantic label. Figure 1 shows
the SAPT for a simple sentence in the CLANG domain.
The semantic labels which are shown after
dashes are concepts in the domain. Some type concepts
do not take arguments, like team and unum
(uniform number). Some concepts, which we refer
to as predicates, take an ordered list of arguments,
like player and bowner (ball owner). The
predicate-argument knowledge, K, specifies, for each
predicate, the semantic constraints on its arguments.
Constraints are specified in terms of the concepts that
can fill each argument, such as player(team, unum)
and bowner(player). A special semantic label, null,
is used for nodes that do not correspond to any
concept in the domain.
Figure 2 shows the basic algorithm for building an MR from an SAPT.
Figure 3 illustrates the construction of the MR for the SAPT in
Figure 1. Nodes are numbered in the order in which the construction
of their MR’s is completed. The first step, GETSEMANTICHEAD,
determines which of a node’s children is its semantic head based on
having a matching semantic label. In the example, node N3 is
determined to be the semantic head of the sentence, since its
semantic label, bowner, matches N8’s semantic label. Next, the MR of
the semantic head is constructed recursively. The semantic head of
N3 is clearly N1. Since N1 is a part-of-speech (POS) node, its
semantic label directly determines its MR, which becomes bowner(_).
Once the MR for the head is constructed, the MR’s of all other
(non-head) children are computed recursively, and COMPOSEMR assigns
their MR’s to fill the arguments in the head’s MR to construct the
complete MR for the node. Argument constraints are used to determine
the appropriate filler for each argument. Since N2 has a null label,
the MR of N3 also becomes bowner(_). When computing the MR for N7,
N4 is determined to be the head with the MR player(_,_). COMPOSEMR
then assigns N5’s MR to fill the team argument and N6’s MR to fill
the unum argument to construct N7’s complete MR: player(our,2). This
MR in turn is composed with the MR for N3 to yield the final MR for
the sentence: bowner(player(our,2)).

(N8-bowner(player(our,2))
  (N7-player(our,2) (N5-team our) (N4-player(_,_) player) (N6-unum 2))
  (N3-bowner(_) (N1-bowner(_) has)
                (N2-null (null the) (null ball))))
Figure 3: MR’s constructed for each SAPT Node.
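The construction traced above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the SCISSOR implementation: the `Node` and `MR` classes, the dictionary form of the predicate-argument knowledge `K`, and the rule that COMPOSEMR fills the first empty slot of a matching type are all hypothetical simplifications, and the special cases mentioned later (coordination, anaphora, non-compositional exceptions) are ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A SAPT node: syntactic label, semantic label, children or a word."""
    syn: str
    sem: str                      # "null" when no domain concept applies
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None    # set only on POS (pre-terminal) nodes


@dataclass
class MR:
    """A meaning representation: a concept plus its argument slots."""
    concept: str
    args: List[Optional["MR"]] = field(default_factory=list)
    value: Optional[str] = None   # surface value for type concepts

    def __str__(self) -> str:
        if not self.args:
            return self.value if self.value is not None else self.concept
        inner = ",".join("_" if a is None else str(a) for a in self.args)
        return f"{self.concept}({inner})"


# Hypothetical encoding of K: predicate -> ordered argument types.
K: Dict[str, List[str]] = {"player": ["team", "unum"], "bowner": ["player"]}


def get_semantic_head(node: Node) -> Node:
    """Pick the first child whose semantic label matches the parent's."""
    for child in node.children:
        if child.sem == node.sem:
            return child
    raise ValueError(f"no semantic head under {node.syn}-{node.sem}")


def compose_mr(head_mr: MR, child_mr: Optional[MR], k) -> None:
    """Fill the first empty slot of head_mr whose type matches child_mr."""
    if child_mr is None:          # null-labeled children contribute nothing
        return
    for i, arg_type in enumerate(k.get(head_mr.concept, [])):
        if head_mr.args[i] is None and arg_type == child_mr.concept:
            head_mr.args[i] = child_mr
            return


def build_mr(node: Node, k) -> Optional[MR]:
    if node.sem == "null":
        return None
    if node.word is not None:     # POS node: the label determines the MR
        if node.sem in k:
            return MR(node.sem, [None] * len(k[node.sem]))
        return MR(node.sem, value=node.word)
    head = get_semantic_head(node)
    head_mr = build_mr(head, k)
    for child in node.children:
        if child is not head:
            compose_mr(head_mr, build_mr(child, k), k)
    return head_mr


# The SAPT of Figure 1 for "our player 2 has the ball".
sapt = Node("S", "bowner", [
    Node("NP", "player", [
        Node("PRP$", "team", word="our"),
        Node("NN", "player", word="player"),
        Node("CD", "unum", word="2"),
    ]),
    Node("VP", "bowner", [
        Node("VB", "bowner", word="has"),
        Node("NP", "null", [
            Node("DT", "null", word="the"),
            Node("NN", "null", word="ball"),
        ]),
    ]),
])
print(build_mr(sapt, K))  # bowner(player(our,2))
```

Matching slots by type alone only works when each argument type is unique within a predicate, which is the situation the annotation scheme of Section 4 is designed to guarantee.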
For MRL’s, such as CLANG, whose syntax does
not strictly follow a nested set of predicates and ar-
guments, some final minor syntactic adjustment of
the final MR may be needed. In the example, the
final MR is (bowner (player our {2})). In the following
discussion, we ignore the difference between
these two.
There are a few complications left which re-
quire special handling when generating MR’s,
like coordination, anaphora resolution and non-
compositionality exceptions. Due to space limita-
tions, we do not present the straightforward tech-
niques we used to handle them.
4 Corpus Annotation
This section discusses how sentences for training
SCISSOR were manually annotated with SAPT’s.
Sentences were parsed by Collins’ head-driven
model 2 (Bikel, 2004) (trained on sections 02-21
of the WSJ Penn Treebank) to generate an initial
syntactic parse tree. The trees were then manually
corrected and each node augmented with a semantic
label.
First, semantic labels for individual words, called
semantic tags, are added to the POS nodes in the
tree. The tag null is used for words that have no cor-
responding concept. Some concepts are conveyed
by phrases, like “has the ball” for bowner in the pre-
vious example. Only one word is labeled with the
concept; the syntactic head word (Collins, 1997) is
preferred. During parsing, the other words in the
phrase will provide context for determining the se-
mantic label of the head word.
Labels are added to the remaining nodes in a
bottom-up manner. For each node, one of its chil-
dren is chosen as the semantic head, from which it
will inherit its label. The semantic head is chosen
as the child whose semantic label can take the MR’s
of the other children as arguments. This step was
done mostly automatically, but required some man-
ual corrections to account for unusual cases.
In order for COMPOSEMR to be able to construct
the MR for a node, the argument constraints for
its semantic head must identify a unique concept
to fill each argument. However, some predicates
take multiple arguments of the same type, such as
point.num(num,num), which is a kind of point that
represents a field coordinate in CLANG.
In this case, extra nodes are inserted in the tree
with new type concepts that are unique for each ar-
gument. An example is shown in Figure 4 in which
the additional type concepts num1 and num2 are in-
troduced. Again, during parsing, context will be
used to determine the correct type for a given word.
The point label of the root node of Figure 4 is the
concept that includes all kinds of points in CLANG.
Once a predicate has all of its arguments filled, we use the most
general CLANG label for its concept (e.g. point instead of
point.num). This generality avoids sparse data problems during
training.

[Figure 4: Adding new types to disambiguate arguments. The SAPT for
the token sequence "-LRB- 0.1 , 0.5 -RRB-" has root PRN-point; the
opening bracket is labeled -LRB--point.num, the closing bracket
-RRB--null, and the CD-num nodes for 0.1 and 0.5 sit under the
inserted nodes CD-num1 and CD-num2, respectively.]
5 Integrated Parsing Model
5.1 Collins Head-Driven Model 2
Collins’ head-driven model 2 is a generative, lexi-
calized model of statistical parsing. In the following
section, we follow the notation in (Collins, 1997).
Each non-terminal $X$ in the tree is a syntactic label,
which is lexicalized by annotating it with a word,
$w$, and a POS tag, $t_{syn}$. Thus, we write a
non-terminal as $X(x)$, where $X$ is a syntactic label and
$x = \langle w, t_{syn} \rangle$. $X(x)$ is then what is generated by
the generative model.
Each production $LHS \Rightarrow RHS$ in the PCFG is
in the form:
$$P(h) \rightarrow L_n(l_n) \ldots L_1(l_1)\, H(h)\, R_1(r_1) \ldots R_m(r_m)$$
where $H$ is the head-child of the phrase, which
inherits the head-word $h$ from its parent $P$.
$L_1 \ldots L_n$ and $R_1 \ldots R_m$ are left and right modifiers of $H$.
Sparse data makes the direct estimation of
$P(RHS \mid LHS)$ infeasible. Therefore, it is decomposed
into several steps: first generating the head,
then the right modifiers from the head outward,
then the left modifiers in the same way. Syntactic
subcategorization frames, LC and RC, for the left
and right modifiers respectively, are generated be-
fore the generation of the modifiers. Subcat frames
represent knowledge about subcategorization prefer-
ences. The final probability of a production is com-
posed from the following probabilities:
1. The probability of choosing a head constituent
label H: $P_h(H \mid P, h)$.
2. The probabilities of choosing the left and right
subcat frames LC and RC: $P_{lc}(LC \mid P, H, h)$
and $P_{rc}(RC \mid P, H, h)$.
3. The probabilities of generating the left and right modifiers:
$$\prod_{i=1}^{m+1} P_r(R_i(r_i) \mid H, P, h, \Delta_{i-1}, RC) \times \prod_{i=1}^{n+1} P_l(L_i(l_i) \mid H, P, h, \Delta_{i-1}, LC),$$
where $\Delta$ is the measure of the distance from
the head word to the edge of the constituent,
and $L_{n+1}(l_{n+1})$ and $R_{m+1}(r_{m+1})$ are STOP.
The model stops generating more modifiers
when STOP is generated.

(S-bowner(has)
  (NP-player(player) (PRP$-team our) (NN-player player) (CD-unum 2))
  (VP-bowner(has) (VB-bowner has)
                  (NP-null(ball) (DT-null the) (NN-null ball))))
Figure 5: A lexicalized SAPT.
5.2 Integrating Semantics into the Model
We extend Collins’ model to include the generation of semantic
labels in the derivation tree. Unless otherwise stated, notation has
the same meaning as in Section 5.1. The subscript $syn$ refers to
the syntactic part, and $sem$ refers to the semantic part. We
redefine $X$ and $x$ to include semantics: each non-terminal $X$ is
now a pair of a syntactic label $X_{syn}$ and a semantic label
$X_{sem}$. Besides being annotated with the word, $w$, and the POS
tag, $t_{syn}$, $X$ is also annotated with the semantic tag,
$t_{sem}$, of the head child. Thus, $X(x)$ now consists of
$X = \langle X_{syn}, X_{sem} \rangle$ and
$x = \langle w, t_{syn}, t_{sem} \rangle$. Figure 5 shows a
lexicalized SAPT (but omitting $t_{syn}$ and $t_{sem}$).
Similar to the syntactic subcat frames, we also condition the
generation of modifiers on semantic subcat frames. Semantic subcat
frames give semantic subcategorization preferences; for example,
player takes a team and a unum. Thus $LC$ and $RC$ are now
$\langle LC_{syn}, LC_{sem} \rangle$ and
$\langle RC_{syn}, RC_{sem} \rangle$. $X(x)$ is generated as in
Section 5.1, but using the new definitions of $X(x)$, $LC$ and $RC$.
The implementation of semantic subcat frames is similar to that of
syntactic subcat frames. They are multisets specifying the semantic
labels which the head requires in its left or right modifiers.
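As an example, the probability of generating the
phrase “our player 2” using NP-[player](player) →
PRP$-[team](our) NN-[player](player) CD-[unum](2)
is (omitting only the distance measure):
$$
\begin{aligned}
& P_h(\text{NN-[player]} \mid \text{NP-[player]}, \text{player}) \times {} \\
& P_{lc}(\langle \{\}, \{\text{team}\} \rangle \mid \text{NP-[player]}, \text{player}) \times {} \\
& P_{rc}(\langle \{\}, \{\text{unum}\} \rangle \mid \text{NP-[player]}, \text{player}) \times {} \\
& P_l(\text{PRP\$-[team](our)} \mid \text{NP-[player]}, \text{player}, \langle \{\}, \{\text{team}\} \rangle) \times {} \\
& P_r(\text{CD-[unum](2)} \mid \text{NP-[player]}, \text{player}, \langle \{\}, \{\text{unum}\} \rangle) \times {} \\
& P_l(\text{STOP} \mid \text{NP-[player]}, \text{player}, \langle \{\}, \{\} \rangle) \times {} \\
& P_r(\text{STOP} \mid \text{NP-[player]}, \text{player}, \langle \{\}, \{\} \rangle)
\end{aligned}
$$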
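The seven factors of this example can be enumerated mechanically. The Python sketch below is an illustration under assumptions rather than the paper's implementation: the tuple encoding of subcat frames, the function names, and the rule that a generated modifier removes its semantic label from the frame are hypothetical, chosen so that the output matches the worked example (the distance measure is omitted, as above). The probability table holds placeholder values, not estimates from any corpus.

```python
from math import prod
from typing import List, Tuple

# A subcat frame is a pair (syntactic multiset, semantic multiset),
# with each multiset stored as a tuple of labels.
Frame = Tuple[Tuple[str, ...], Tuple[str, ...]]
Event = Tuple[str, object, tuple]   # (parameter name, outcome, context)


def remove_once(items: Tuple[str, ...], label: str) -> Tuple[str, ...]:
    """Drop one occurrence of label from a multiset stored as a tuple."""
    if label not in items:
        return items
    i = items.index(label)
    return items[:i] + items[i + 1:]


def side_events(name: str, modifiers: List[Tuple[str, str]],
                frame: Frame, context: tuple) -> List[Event]:
    """Events for one side: each modifier in order, then STOP."""
    events: List[Event] = []
    for label, sem in modifiers:
        events.append((name, label, context + (frame,)))
        # Assumption: a generated modifier discharges its semantic
        # label from the frame, as the worked example suggests.
        frame = (frame[0], remove_once(frame[1], sem))
    events.append((name, "STOP", context + (frame,)))
    return events


def production_events(parent, head, left, lc, right, rc) -> List[Event]:
    """Head choice, both subcat frames, then left and right modifiers."""
    events: List[Event] = [("P_h", head, parent),
                           ("P_lc", lc, parent),
                           ("P_rc", rc, parent)]
    events += side_events("P_l", left, lc, parent)
    events += side_events("P_r", right, rc, parent)
    return events


parent = ("NP-[player]", "player")
events = production_events(
    parent, "NN-[player]",
    left=[("PRP$-[team](our)", "team")], lc=((), ("team",)),
    right=[("CD-[unum](2)", "unum")], rc=((), ("unum",)),
)
for e in events:
    print(e)

# Hypothetical table giving every event probability 0.5.
table = {e: 0.5 for e in events}
probability = prod(table[e] for e in events)
print(probability)  # 0.0078125
```

The events are listed side by side (all left events, then all right events) rather than in the interleaved order printed above; since the factors are multiplied, the order does not affect the result.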
5.3 Smoothing
Since the left and right modifiers are independently
generated in the same way, we only discuss smooth-
ing for the left side. Each probability estimation in
the above generation steps is called a parameter. To
reduce the risk of sparse data problems, the parame-
ters are decomposed as follows:
$$P_h(H \mid C) = P_{h_{syn}}(H_{syn} \mid C) \times P_{h_{sem}}(H_{sem} \mid C, H_{syn})$$
$$P_{lc}(LC \mid C) = P_{lc_{syn}}(LC_{syn} \mid C) \times P_{lc_{sem}}(LC_{sem} \mid C, LC_{syn})$$
$$P_l(L_i(l_i) \mid C) = P_{l_{syn}}(L_{i_{syn}}(lt_{i_{syn}}, lw_i) \mid C) \times P_{l_{sem}}(L_{i_{sem}}(lt_{i_{sem}}, lw_i) \mid C, L_{i_{syn}}(lt_{i_{syn}}))$$
For brevity, $C$ is used to represent the context on
which each parameter is conditioned; $lw_i$, $lt_{i_{syn}}$, and
$lt_{i_{sem}}$ are the word, POS tag and semantic tag generated
for the non-terminal $L_i$. The word is generated
separately in the syntactic and semantic outputs.
We make the independence assumption that the
syntactic output is only conditioned on syntactic fea-
tures, and semantic output on semantic ones. Note
that the syntactic and semantic parameters are still
integrated in the model to find the globally most
likely parse. The syntactic parameters are the same
as in Section 5.1 and are smoothed as in (Collins,
1997). We also tried different ways of conditioning
syntactic output on semantic features and vice
versa, but they did not help. Our explanation is that the
integrated syntactic and semantic parameters have
already captured the benefit of this integrated approach
in our experimental domains.
Since the semantic parameters do not depend on
any syntactic features, we omit the $sem$ subscripts
in the following discussion. As in (Collins, 1997),
the parameter $P_l(L_i(lt_i, lw_i) \mid P, H, w, t, \Delta, LC)$ is
further smoothed as follows:
$$
\begin{aligned}
& P_{l1}(L_i \mid P, H, w, t, \Delta, LC) \times {} \\
& P_{l2}(lt_i \mid P, H, w, t, \Delta, LC, L_i) \times {} \\
& P_{l3}(lw_i \mid P, H, w, t, \Delta, LC, L_i(lt_i))
\end{aligned}
$$
Note this smoothing is different from the syntactic
counterpart. This is due to the difference between
POS tags and semantic tags; namely, semantic tags
are generally more specific.
Table 1 shows the various levels of back-off for
each semantic parameter. The probabilities from
these back-off levels are interpolated using the techniques
in (Collins, 1997). All words occurring fewer
than 3 times in the training data, and words in test
data that were not seen in training, are treated as unknown
words and are replaced with the “UNKNOWN” token.
Note this threshold is smaller than the one used
in (Collins, 1997) since the corpora used in our experiments
are smaller.
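The unknown-word treatment described here is simple enough to sketch directly. The Python fragment below is a minimal illustration, assuming sentences are already tokenized into lists of strings; the function names and the toy sentences are hypothetical, while the threshold of 3 and the UNKNOWN token come from the text.

```python
from collections import Counter
from typing import Iterable, List, Set

UNKNOWN = "UNKNOWN"
MIN_COUNT = 3   # words seen fewer than 3 times in training are unknown


def build_vocabulary(train_sentences: Iterable[List[str]]) -> Set[str]:
    """Return the words that occur at least MIN_COUNT times in training."""
    counts = Counter(w for sentence in train_sentences for w in sentence)
    return {w for w, c in counts.items() if c >= MIN_COUNT}


def replace_unknown(sentence: List[str], vocab: Set[str]) -> List[str]:
    """Map rare training words and unseen test words to UNKNOWN."""
    return [w if w in vocab else UNKNOWN for w in sentence]


# Toy training data (hypothetical, not from the CLANG corpus).
train = [
    ["our", "player", "has", "the", "ball"],
    ["our", "player", "2", "has", "the", "ball"],
    ["our", "player", "4", "takes", "the", "ball"],
]
vocab = build_vocabulary(train)
print(replace_unknown(["our", "goalie", "has", "the", "ball"], vocab))
# ['our', 'UNKNOWN', 'UNKNOWN', 'the', 'ball']
```

In the toy data "has" occurs only twice, so it falls below the threshold and is mapped to UNKNOWN together with the unseen word "goalie".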
5.4 POS Tagging and Semantic Tagging
For unknown words, the POS tags allowed are lim-
ited to those seen with any unknown words during
training. Otherwise they are generated along with
the words using the same approach as in (Collins,
1997). When parsing, semantic tags for each known
word are limited to those seen with that word during
training. The semantic tags allowed for an
unknown word are limited to those seen with its as-
sociated POS tags during training.
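These tagging restrictions amount to a few lookup tables collected from the training trees. The sketch below shows one way they could be built; it is an assumption-laden illustration, not the authors' code. In particular, the input format (lists of word, POS tag, semantic tag triples), the function names, and the choice to collect the semantic tags for a POS tag over all training words (rather than over unknown words only) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Token = Tuple[str, str, str]   # (word, POS tag, semantic tag)


def collect_tag_dictionaries(tagged_sentences: Iterable[List[Token]],
                             vocab: Set[str]):
    """Collect the tag restrictions used at parse time."""
    sem_by_word: Dict[str, Set[str]] = defaultdict(set)
    sem_by_pos: Dict[str, Set[str]] = defaultdict(set)
    unknown_pos: Set[str] = set()
    for sentence in tagged_sentences:
        for word, pos, sem in sentence:
            sem_by_pos[pos].add(sem)
            if word in vocab:
                sem_by_word[word].add(sem)
            else:
                unknown_pos.add(pos)   # POS tags seen with unknown words
    return sem_by_word, sem_by_pos, unknown_pos


def allowed_semantic_tags(word: str, pos: str,
                          sem_by_word: Dict[str, Set[str]],
                          sem_by_pos: Dict[str, Set[str]]) -> Set[str]:
    """Known words use their own tags; unknown words fall back to the POS tag."""
    if word in sem_by_word:
        return sem_by_word[word]
    return sem_by_pos.get(pos, set())


# Toy data (hypothetical): "2" and "goalie" are outside the vocabulary.
data = [
    [("our", "PRP$", "team"), ("player", "NN", "player"), ("2", "CD", "unum")],
    [("our", "PRP$", "team"), ("goalie", "NN", "player")],
]
sem_by_word, sem_by_pos, unknown_pos = collect_tag_dictionaries(
    data, vocab={"our", "player"})
print(allowed_semantic_tags("our", "PRP$", sem_by_word, sem_by_pos))  # {'team'}
print(allowed_semantic_tags("7", "CD", sem_by_word, sem_by_pos))      # {'unum'}
```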
6 Experimental Evaluation
6.1 Methodology
Two corpora of NL sentences paired with MR’s
were used to evaluate SCISSOR. For CLANG, 300
pieces of coaching advice were randomly selected
from the log files of the 2003 RoboCup Coach Com-
petition. Each formal instruction was translated
into English by one of four annotators (Kate et al.,
2005). The average length of an NL sentence in
this corpus is 22.52 words. For GEOQUERY, 250
questions were collected by asking undergraduate
students to generate English queries for the given
database. Queries were then manually translated into logical form
(Zelle and Mooney, 1996). The average length of an NL sentence in
this corpus is 6.87 words. The queries in this corpus are more
complex than those in the ATIS database-query corpus used in the
speech community (Zue and Glass, 2000), which makes the GEOQUERY
problem harder, as also shown by the results in (Popescu et al.,
2004). The average number of possible semantic tags for each word
which can represent meanings in CLANG is 1.59 and that in GEOQUERY
is 1.46.

BACK-OFF LEVEL   $P_h(H \mid \ldots)$   $P_{LC}(LC \mid \ldots)$   $P_{L1}(L_i \mid \ldots)$   $P_{L2}(lt_i \mid \ldots)$     $P_{L3}(lw_i \mid \ldots)$
1                $P,w,t$                $P,H,w,t$                  $P,H,w,t,\Delta,LC$         $P,H,w,t,\Delta,LC,L_i$        $P,H,w,t,\Delta,LC,L_i,lt_i$
2                $P,t$                  $P,H,t$                    $P,H,t,\Delta,LC$           $P,H,t,\Delta,LC,L_i$          $P,H,t,\Delta,LC,L_i,lt_i$
3                $P$                    $P,H$                      $P,H,\Delta,LC$             $P,H,\Delta,LC,L_i$            $L_i,lt_i$
4                –                      –                          –                           $L_i$                          $lt_i$
Table 1: Conditioning variables for each back-off level for semantic parameters ($sem$ subscripts omitted).
SCISSOR was evaluated using standard 10-fold
cross validation. NL test sentences were first parsed
to generate their SAPT’s, then their MR’s were built
from the trees. We measured the number of test sentences
that produced complete MR’s, and the number
of these MR’s that were correct. For CLANG,
an MR is correct if it exactly matches the correct
representation, up to reordering of the arguments of
commutative operators like and. For GEOQUERY,
an MR is correct if the resulting query retrieved
the same answer as the correct representation when
submitted to the database. The performance of the
parser was then measured in terms of precision (the
percentage of completed MR’s that were correct)
and recall (the percentage of all sentences whose
MR’s were correctly generated).
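The two measures can be stated compactly in code. The following Python sketch assumes each test sentence has been reduced to one of three outcomes (no complete MR, a correct MR, or an incorrect MR); this encoding and the function name are hypothetical, and the correctness check itself (exact match for CLANG, answer comparison for GEOQUERY) is taken as given.

```python
from typing import List, Optional, Tuple


def precision_recall(outcomes: List[Optional[bool]]) -> Tuple[float, float]:
    """Compute precision and recall in percent.

    Each entry describes one test sentence: None if no complete MR was
    produced, True if the MR was correct, False if it was incorrect.
    """
    total = len(outcomes)
    completed = sum(o is not None for o in outcomes)
    correct = sum(o is True for o in outcomes)
    # Precision: percentage of completed MR's that were correct.
    precision = 100.0 * correct / completed if completed else 0.0
    # Recall: percentage of all sentences whose MR's were correct.
    recall = 100.0 * correct / total if total else 0.0
    return precision, recall


# Five test sentences: three correct, one incorrect, one without an MR.
print(precision_recall([True, True, True, False, None]))  # (75.0, 60.0)
```

A sentence for which no complete MR is produced lowers recall but leaves precision unchanged, which is why a system can trade one measure against the other.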
We compared SCISSOR’s performance to several
previous systems that learn semantic parsers that can
map sentences into formal MRL’s. CHILL (Zelle and
Mooney, 1996) is a system based on Inductive Logic
Programming (ILP). We compare to the version
of CHILL presented in (Tang and Mooney, 2001),
which uses the improved COCKTAIL ILP system and
produces more accurate parsers than the original ver-
sion presented in (Zelle and Mooney, 1996). SILT is
a system that learns symbolic, pattern-based, trans-
formation rules for mapping NL sentences to formal
languages (Kate et al., 2005). It comes in two versions:
SILT-string, which maps NL strings directly to an MRL, and
SILT-tree, which maps syntactic parse trees (generated by the
Collins parser) to an MRL. In the GEOQUERY domain, we also compare
to the original hand-built parser GEOBASE.

[Figure 6: Precision learning curves for GEOQUERY. Precision (%)
against number of training sentences (0 to 250) for SCISSOR,
SILT-string, SILT-tree, CHILL and GEOBASE.]

[Figure 7: Recall learning curves for GEOQUERY. Recall (%) against
number of training sentences (0 to 250) for SCISSOR, SILT-string,
SILT-tree, CHILL and GEOBASE.]
6.2 Results
Figures 6 and 7 show the precision and recall learn-
ing curves for GEOQUERY, and Figures 8 and 9 for
CLANG. Since CHILL is very memory intensive,
it could not be run with larger training sets of the
CLANG corpus.
Overall, SCISSOR gives the best precision and re-
call results in both domains. The only exception
is with recall for GEOQUERY, for which CHILL is
slightly higher. However, SCISSOR has significantly
higher precision (see discussion in Section 7).

[Figure 8: Precision learning curves for CLANG. Precision (%)
against number of training sentences (0 to 300) for SCISSOR,
SILT-string, SILT-tree and CHILL.]

[Figure 9: Recall learning curves for CLANG. Recall (%) against
number of training sentences (0 to 300) for SCISSOR, SILT-string,
SILT-tree and CHILL.]

Results on a larger GEOQUERY corpus with 880
queries have been reported for PRECISE (Popescu et
al., 2003): 100% precision and 77.5% recall. On
the same corpus, SCISSOR obtains 91.5% precision
and 72.3% recall. However, the figures are not com-
parable. PRECISE can return multiple distinct SQL
queries when it judges a question to be ambigu-
ous and it is considered correct when any of these
SQL queries is correct. Our measure only considers
the top result. Due to space limitations, we do not
present complete learning curves for this corpus.
7 Related Work
We first discuss the systems introduced in Section
6. CHILL uses computationally-complex ILP meth-
ods, which are slow and memory intensive. The
string-based version of SILT uses no syntactic in-
formation while the tree-based version generates a
syntactic parse first and then transforms it into an
MR. In contrast, SCISSOR integrates syntactic and
semantic processing, allowing each to constrain and
inform the other. It uses a successful approach to sta-
tistical parsing that attempts to find the SAPT with
maximum likelihood, which improves robustness
compared to purely rule-based approaches. How-
ever, SCISSOR requires an extra training input, gold-
standard SAPT’s, not required by these other sys-
tems. Further automating the construction of train-
ing SAPT’s from sentences paired with MR’s is a
subject of on-going research.
PRECISE is designed to work only for the spe-
cific task of NL database interfaces. By comparison,
SCISSOR is more general and can work with other
MRL’s as well (e.g. CLANG). Also, PRECISE is not
a learning system and can fail to parse a query it con-
siders ambiguous, even though it may not be consid-
ered ambiguous by a human and could potentially be
resolved by learning regularities in the training data.
In (Lev et al., 2004), a syntax-driven approach
is used to map logic puzzles described in NL to
an MRL. The syntactic structures are paired with
hand-written rules. A statistical parser is used to
generate syntactic parse trees, and then MR’s are
built using compositional semantics. The meaning
of open-category words (with only a few exceptions)
is considered irrelevant to solving the puzzle and
their meanings are not resolved. Further steps would
be needed to generate MR’s in other domains like
CLANG and GEOQUERY. No empirical results are
reported for their approach.
Several machine translation systems also attempt
to generate MR’s for sentences. In (Gao et al., 2002),
an English-Chinese speech translation system for
limited domains is described. They train a statisti-
cal parser on trees with only semantic labels on the
nodes; however, they do not integrate syntactic and
semantic parsing.
History-based models of parsing were first in-
troduced in (Black et al., 1993). Their original
model also included semantic labels on parse-tree
nodes, but they were not used to generate a formal
MR. Also, their parsing model is impoverished com-
pared to the history included in Collins’ more recent
model. SCISSOR explores incorporating semantic
labels into Collins’ model in order to produce a com-
plete SAPT which is then used to generate a formal
MR.
The systems introduced in (Miller et al., 1996;
Miller et al., 2000) also integrate semantic labels
into parsing; however, their SAPT’s are used to pro-
duce a much simpler MR, i.e., a single semantic
frame. A sample frame is AIRTRANSPORTATION
which has three slots – the arrival time, origin and
destination. Only one frame needs to be extracted
from each sentence, which is an easier task than
our problem in which multiple nested frames (pred-
icates) must be extracted. The syntactic model in
(Miller et al., 2000) is similar to Collins’, but does
not use features like subcat frames and distance measures.
Also, the non-terminal label $X$ is not further
decomposed into separately-generated semantic and
syntactic components. Since it used much more spe-
cific labels (the cross-product of the syntactic and
semantic labels), its parameter estimates are poten-
tially subject to much greater sparse-data problems.
8 Conclusion
SCISSOR learns statistical parsers that integrate syn-
tax and semantics in order to produce a semanti-
cally augmented parse tree that is then used to com-
positionally generate a formal meaning representa-
tion. Experimental results in two domains, a natural-
language database interface and an interpreter for
coaching instructions in robotic soccer, have demon-
strated that SCISSOR generally produces more accu-
rate semantic representations than several previous
approaches. By augmenting a state-of-the-art statis-
tical parsing model to include semantic information,
it is able to integrate syntactic and semantic clues
to produce a robust interpretation that supports the
generation of complete formal meaning representa-
tions.
9 Acknowledgements
We would like to thank Rohit J. Kate, Yuk Wah
Wong and Gregory Kuhlmann for their help in an-
notating the CLANG corpus and providing the eval-
uation tools. This research was supported by De-
fense Advanced Research Projects Agency under
grant HR0011-04-1-0007.

References
Daniel M. Bikel. 2004. Intricacies of Collins’ parsing model.
Computational Linguistics, 30(4):479–511.
Ezra Black, Frederick Jelinek, John Lafferty, David M. Magerman,
Robert L. Mercer, and Salim Roukos. 1993. Towards
history-based grammars: Using richer models for probabilis-
tic parsing. In Proc. of ACL-93, pages 31–37, Columbus,
Ohio.
Borland International. 1988. Turbo Prolog 2.0 Reference
Guide. Borland International, Scotts Valley, CA.
Mao Chen et al. 2003. Users manual: RoboCup
soccer server manual for soccer server version 7.07
and later. Available at http://sourceforge.net/
projects/sserver/.
Michael J. Collins. 1997. Three generative, lexicalised mod-
els for statistical parsing. In Proc. of ACL-97, pages 16–23,
Madrid, Spain.
Yuqing Gao et al. 2002. Mars: A statistical semantic parsing
and generation-based multilingual automatic translation sys-
tem. Machine Translation, 17:185–212.
Daniel Gildea and Daniel Jurafsky. 2002. Automated labeling
of semantic roles. Computational Linguistics, 28(3):245–
288.
Daniel Jurafsky and James H. Martin. 2000. Speech and Lan-
guage Processing: An Introduction to Natural Language
Processing, Computational Linguistics, and Speech Recog-
nition. Prentice Hall, Upper Saddle River, NJ.
Rohit J. Kate, Yuk Wah Wong, and Raymond J. Mooney. 2005.
Learning to transform natural to formal languages. To ap-
pear in Proc. of AAAI-05, Pittsburgh, PA.
Iddo Lev, Bill MacCartney, Christopher D. Manning, and Roger
Levy. 2004. Solving logic puzzles: From robust process-
ing to precise semantics. In Proc. of 2nd Workshop on Text
Meaning and Interpretation, ACL-04, Barcelona, Spain.
Scott Miller, David Stallard, Robert Bobrow, and Richard
Schwartz. 1996. A fully statistical approach to natural lan-
guage interfaces. In ACL-96, pages 55–61, Santa Cruz, CA.
Scott Miller, Heidi Fox, Lance A. Ramshaw, and Ralph M.
Weischedel. 2000. A novel use of statistical parsing to ex-
tract information from text. In Proc. of NAACL-00, pages
226–233, Seattle, Washington.
Ana-Maria Popescu, Oren Etzioni, and Henry Kautz. 2003. To-
wards a theory of natural language interfaces to databases. In
Proc. of IUI-2003, pages 149–157, Miami, FL. ACM.
Ana-Maria Popescu, Alex Armanasu, Oren Etzioni, David Ko,
and Alexander Yates. 2004. Modern natural language in-
terfaces to databases: Composing statistical parsing with se-
mantic tractability. In COLING-04, Geneva, Switzerland.
Lappoon R. Tang and Raymond J. Mooney. 2001. Using multi-
ple clause constructors in inductive logic programming for
semantic parsing. In Proc. of ECML-01, pages 466–477,
Freiburg, Germany.
John M. Zelle and Raymond J. Mooney. 1996. Learning to
parse database queries using inductive logic programming.
In Proc. of AAAI-96, pages 1050–1055, Portland, OR.
Victor W. Zue and James R. Glass. 2000. Conversational in-
terfaces: Advances and challenges. In Proc. of the IEEE,
volume 88(8), pages 1166–1180.
