PROCESSING COMPLEX NOUN PHRASES IN A NATURAL 
LANGUAGE INTERFACE TO A STATISTICAL DATABASE 
Fred POPOWICH, Paul MCFETRIDGE, Dan FASS, Gary HALL 
School of Computing Science / Centre for Systems Science 
Simon Fraser University, Burnaby, B.C., Canada V5A 1S6 
Abstract 
Analysis of a corpus of queries to a statistical database 
has shown considerable variation in the location and 
order of modifiers in complex noun phrases. Never- 
theless, restrictions can be defined on nominal mod- 
ification because of certain correspondences between 
nominal modifiers and the role they fulfill in a statisti- 
cal database, notably that the names of database tables 
and columns, and values of columns, are all determined 
by the modifiers. These restrictions are described. In- 
corporating these restrictions into Head-Driven Phrase 
Structure Grammar (HPSG) has caused us to examine 
the treatment of nominal modification in HPSG. A new 
treatment is proposed and an implementation within an 
HPSG-based natural language front-end to a statistical 
database is described. 
1 Introduction 
A prototype natural language front-end to statistical 
databases is being developed as part of an Execu- 
tive Information System for Rogers Cablesystems, a 
Canadian cable television company. The initial target 
database is the Rogers Technical Operations Database, 
a relational database containing statistical data describ- 
ing aspects of the company's business related to cus- 
tomer service. 
The front-end employs an HPSG chart parser. There 
are numerous variations of HPSG; we have chosen 
\[PS87\] since it is the most familiar and widely pub- 
lished. Our results can be extended to other variations. 
In the spirit of HPSG, we have avoided a proliferation 
of grammar rules and kept them highly schematic. 
In developing the grammar for the queries in our 
corpus, we encountered a selection of interesting noun 
phrase constructions which caused us to examine the 
treatment of adjunct modification of nominals within 
HPSG. This has resulted in a proposal which should 
be of interest to other researchers developing natural 
language interfaces. 
2 Complex NPs in Queries 
We began the project by collecting a corpus of 68 En- 
glish language queries from three senior executives at 
Rogers. Our corpus contains constructions paradig- 
matic of a wide selection of natural language queries 
that the executives would like to pose to their database. 
A selection of these queries is shown in (1)-(6). 
(1) Give me the western region outage log summary. 
(2) Give me the system reliability performance. 
(3) Compare the basic service problem statistics per 
thousand customers. 
(4) Compare the terminal equipment problems. 
The sentences contain complex NP constructions 
and there is a large amount of variation with respect 
to the location and ordering of the modifiers. For 
example, most pre-nominal modifiers may also appear 
as post-nominal modifiers. 
(5) Vancouver system reliability performance 
(6) system reliability performance for Vancouver 
Prepositional phrases like for Vancouver can be viewed 
as abbreviated forms of prepositional phrases like for 
the Vancouver division. 
The NPs within these sentences contain a great deal 
of syntactic ambiguity. Consider the complex NP in 
(1). The adjective western can either modify region or 
outage or log or summary. Similarly, region could 
modify any of the nominals appearing to its right. 
However, much of this syntactic ambiguity does not 
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992    PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 
have a semantic interpretation in the database seman- 
tics. For example, (1) has only a single interpretation 
although there are numerous syntactic analyses. 
We have gone into detail about the corpus to show 
the rich structure of noun phrases and to motivate the 
reasons for the design choices in our semantics and 
grammar. 
3 Complex NPs in HPSG 
3.1 Overview of HPSG 
HPSG is one of the best known unification-based grammar 
formalisms. It employs attribute value matrices 
(called signs) to represent lexical entries, grammar 
rules and principles. HPSG borrows freely from other 
formalisms. For example, the treatment of syntactic 
categories, syntactic features, and some of the 
principles are from generalized phrase structure grammar 
(GPSG) \[GKPS85\]. The main syntactic categories in 
HPSG are heads (the head constituents of phrases), 
adjuncts (traditionally called modifiers) and comple- 
ments (traditionally called arguments). The principles 
of HPSG include the Constituent Order Principle, Sub- 
categorization Principle, Head Feature Principle, and 
Semantics Principle. 
HPSG contains three grammar rules for combining 
heads with complements. 
(7) \[SUBCAT ⟨\[ \]⟩\] → H\[LEX +, INV -\], C* 
(8) \[SUBCAT ⟨ ⟩\] → H\[LEX -\], C 
(9) \[SUBCAT ⟨ ⟩\] → H\[LEX +, INV +\], C* 
One rule (7) combines a lexical head with everything 
but its final complement. This rule can also be used 
to convert a lexical head requiring only a single 
complement into a non-lexical constituent still requiring 
a single complement. Another rule (8) combines a 
non-lexical head with its final complement. Yet another 
rule (9) works for inverted constructions: those 
involving a lexical head that is marked for inversion. 
As in GPSG, generalizations about the relative order 
of sister constituents are factored out of the phrase 
structure rules and expressed in independent linear 
precedence (LP) constraints. The LP constraints are used 
by the Constituent Order Principle. HPSG rules are 
immediate dominance (ID) rules. Consequently, a single 
ID rule of the form X → H, A could describe a 
head constituent H either preceded or followed by an 
adjunct A; the relative ordering of H and A is 
determined by the LP constraints. 
3.2 Issues in the Treatment of Adjuncts 
Nominal modification is treated in HPSG by having 
heads that contain a set-valued feature called ADJUNCTS 
\[PS87\]. Each element of this set is a sign 
which describes a potential adjunct. For instance, the 
ADJUNCTS feature for a noun will contain an entry 
for adjectives, one for nouns, one for prepositional 
phrases and one for verb phrases. 
An alternative, which was also discussed in \[PS87\] 
and has been adopted in other grammar formalisms 
(e.g., \[Usz86, CKZ88\]) and some variations of HPSG 
\[Coo90, Pol91\], is to allow adjuncts to select their 
heads.1 The head feature called HEADS contains a 
set of descriptions, one for each construction that can 
be modified by the adjunct. For example, the HEADS 
feature for an adjective will contain a sign for a noun. 
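The idea of an adjunct selecting its head through HEADS can be illustrated with a small Python sketch. This is not the representation used in the system described here: the class names Pattern and Sign, the function selects, and the reduction of a sign to three features are all assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Pattern:
    """Partial description of a constituent that an adjunct can modify."""
    maj: str                    # major category: "N", "V", "A" or "P"
    lex: Optional[bool] = None  # None leaves the LEX value unconstrained


@dataclass(frozen=True)
class Sign:
    """Toy stand-in for an HPSG sign; only the features needed here."""
    form: str
    maj: str
    lex: bool
    heads: FrozenSet[Pattern] = frozenset()  # the HEADS feature


def selects(adjunct: Sign, head: Sign) -> bool:
    """True if some description in the adjunct's HEADS set matches the head."""
    return any(
        p.maj == head.maj and (p.lex is None or p.lex == head.lex)
        for p in adjunct.heads
    )


# Hypothetical lexical entries: adjectives and nouns both select nouns.
western = Sign("western", "A", True, frozenset({Pattern("N")}))
region = Sign("region", "N", True, frozenset({Pattern("N")}))
give = Sign("give", "V", True)  # empty HEADS: never acts as an adjunct

print(selects(western, region))  # the adjective selects a noun head
print(selects(western, give))    # but not a verb
```

Because the selection information lives on the adjunct, adding a new class of modifier in this sketch means adding one entry with its own HEADS set, without touching the entries of the heads it modifies.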
In our corpus, a head has more possible classes 
of modifiers than modifiers have classes of possible 
heads. For example, the set of modifiers for NPs and 
Ns (i.e., NPs lacking determiners) includes adjectives, 
nominals, PPs and even VPs (relative clauses). In §3.4 
we shall see that each of these modifiers can have only 
one or two possible heads. Furthermore, the task of 
reducing the size of the HEADS or ADJUNCTS set, 
by discovering common semantic features for which 
a constituent can select, meets with greater success if 
modifiers select their heads. That is, one is more likely 
to find commonality among the constituents which an 
adjunct can modify than among the modifiers which a 
head can take. Selection of heads by adjuncts permits 
a greater range of subcategorization to be specified 
through default inheritance rather than explicit 
specification. 
Some aspects of adjunct semantics are impossible 
if adjuncts are selected by heads rather than heads 
selected by adjuncts. Predicates, both adjectives and 
verbs, have argument structure which coerces their 
arguments into thematic roles. For example, the adjective 
modern imposes on its argument the thematic role 
of Theme.2 It is not obvious how the nominal argument 
of the adjective receives its thematic role unless 
it is the adjective which selects the nominal, parallel to 
the assignment of thematic roles by verbs to their NP 
arguments. If modern selects its head, then the thematic 
role of the head may be specified in the HEADS 
attribute and inherited by the head when it unifies with 
the HEADS attribute. If instead, heads subcategorize 
for their adjuncts, this information must be inherited in 
some other fashion, perhaps through structure sharing 
from the adjuncts list. 
1 Cooper \[Coo90, Ch.3, §6\] looks in some detail at the arguments 
in favour of adjuncts selecting their heads. 
2 In \[Pol91, §1.3\], Pollard and Sag introduce semantic features 
like AGENT, GOAL and THEME within the feature structure 
containing the semantic CONTENT. 
The problem and its solution are evident when 
derivational morphology is considered. The verb 
read imposes the thematic role of Agent (Ag) on its 
subject and the thematic role of Theme (Th) on its ob- 
ject. When this verb is coerced into an adjective by 
the derivational suffix -able, the resulting adjective as- 
signs the thematic role of Theme to its argument. If 
adjectives select their heads, then the derivational rule 
is evident. 
(10) V\[SUBCAT ⟨NP_Th, NP_Ag⟩\] ⇒ 
Adj + "able" \[HEADS {N_Th}\] 
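The derivational rule in (10) can be pictured as a function from a verb entry to an adjective entry. The Python sketch below is illustrative only: the dictionary layout and the function name able_rule are assumptions, not part of the grammar described here, and it assumes, following (10), that the object is the first element of the verb's SUBCAT list.

```python
def able_rule(verb: dict) -> dict:
    """Derive an '-able' adjective from a transitive verb entry.

    The thematic role the verb assigns to its object becomes the role
    that the adjective imposes on the nominal it selects via HEADS.
    """
    (_, object_role), *_ = verb["subcat"]  # first SUBCAT element = object
    return {
        "form": verb["form"] + "able",
        "maj": "A",
        "subcat": [],                      # the adjective is saturated
        "heads": [("N", object_role)],     # HEADS {N_Th} in rule (10)
    }


# Hypothetical entry for 'read': Theme object, Agent subject.
read = {"form": "read", "maj": "V", "subcat": [("NP", "Th"), ("NP", "Ag")]}
readable = able_rule(read)
print(readable["form"], readable["heads"])
```

If heads instead listed their adjuncts, the role assigned by the derived adjective would have to be recorded on every noun entry, which is the difficulty raised above.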
Given that adjuncts will select their heads, a grammar 
rule for adjuncts can be stated most concisely 
if we combine a head with a single adjunct at a 
time. Thus, our constituent structures will contain an 
ADJUNCT-DTR feature which will take the adjunct as 
its value, rather than a list-valued ADJUNCT-DTRS 
feature which would take a list of adjuncts as its value. 
A head that is modified by more than one adjunct will 
require more than one application of the grammar rule. 
One disadvantage of this approach is that a complex 
nominal like system reliability for Vancouver will 
have two analyses: one where the PP for Vancouver 
modifies the head noun reliability and another where 
it modifies the head nominal system reliability. If the 
adjuncts rule combined a head with all of its adjuncts 
at the same time, there would be only one analysis. 
However, one could argue that there should be two 
interpretations for the phrase and that both should be 
reflected in the grammar. Pollard and Sag note that 
"there is evidence that noun-noun and adjective-noun 
structures share some syntactic properties with lexical 
nouns as opposed to typical common noun phrases, 
e.g. they can occur themselves as modifiers in noun- 
noun structures" \[PS87, p.73\]. They propose ana- 
lyzing noun-noun and adjective-noun constructions as 
\[LEX +\] even though they have internal structure. By 
adopting this treatment of complex noun phrases, we 
can prevent analyses for ungrammatical constructions 
like system for Vancouver reliability, plus we can pre- 
vent ambiguity in the analysis of phrases like system 
reliability for Vancouver. In our grammar we introduce 
two rules for adjuncts, which are designed to give wide 
coverage and to avoid spurious ambiguities. 
3.3 Two Rules for Adjuncts 
One adjunct grammar rule is required for combining 
saturated lexical adjuncts with their heads. That is, for 
lexical adjuncts which have empty subcategorization 
lists, like adjectives, proper nouns (specifically, the 
proper nouns corresponding to months and cities) and 
adverbs. The rule will be restricted so that it will 
apply to phrases with unsaturated heads. Heads that 
fall into this category are Ns, PPs,3 VPs, and APs. The 
specific pairing of adjuncts to heads is determined by 
the HEADS feature of the adjunct (§3.4). Additionally, 
if the head modified by the adjunct is marked \[LEX +\] 
then the resulting constituent will also be \[LEX +\], thus 
implementing the analysis of adj-noun and noun-noun 
constructions discussed in the previous section. Using 
the schematic notation for grammar rules introduced 
in \[PS87\], we can present the rule as shown in (11). 
(11) \[SUBCAT ⟨\[ \]⟩, LEX \[1\]\] → H\[LEX \[1\]\], 
A\[SUBCAT ⟨ ⟩, LEX +, HEADS {...H...}\] 
Note that the two appearances of the tag \[1\] in (11) indicate 
that the head and the resulting constituent share 
the same value for their LEX features. The Subcate- 
gorization Principle will ensure that the head and the 
resulting constituent will have the same value for their 
SUBCAT features. Since the grammar rule is an ID 
rule, it does not place any restriction on the linear or- 
dering of the head (H) and adjunct (A). This rule is 
designed so that it applies before a head is combined 
with its final complement (8). It can be viewed as 
the HPSG counterpart to the adjunct rule from X-bar 
theory \[Cho82\] shown below, where the ADJUNCT is 
required to be lexical and not subcategorize for any 
arguments. 
(12) X → X ADJUNCT 
In order for heads to be modified by unsaturated 
adjuncts, we propose a second grammar rule. 
(13) \[SUBCAT ⟨\[ \]⟩, LEX \[1\]\] → H\[LEX \[1\]\], 
A\[SUBCAT ⟨\[ \]⟩, LEX \[1\], HEADS {...H...}\] 
3 Like \[PS87, p.70\], we propose that prepositions have two 
elements on their subcategorization list, the first being the 
prepositional object and the second its subject. A PP is obtained by 
combining a preposition with its object NP. We do not propose 
lexical entries for prepositions having only the object NP on their 
SUBCAT list since this would complicate the LP rules (§3.5) and 
grammar rules (7) and (8). 
Rule (13) requires the adjunct to have a single element 
in its SUBCAT list, thus allowing PP, VP and N 
modifiers to modify PPs, VPs and Ns. Of course, 
the contents of the HEADS feature will restrict the 
applicability of this rule (§3.4). Unlike rule (11), which 
allowed a lexical adjunct to modify either a lexical or 
non-lexical head, rule (13) requires the head, adjunct 
and resulting constituent to possess the same values 
for their LEX features, as reflected by the coindexing 
with \[1\]. With this rule, a "lexical" compound noun 
can modify a lexical noun to yield a "lexical" compound 
noun (e.g., N → N, N), or a (non-lexical) PP 
can modify a non-lexical nominal to yield a non-lexical 
nominal (N → N, PP). 
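The conditions imposed by rules (11) and (13) can be checked mechanically. The following Python sketch is a simplification under stated assumptions: a sign is reduced to MAJ, LEX, the length of its SUBCAT list and a HEADS set of (MAJ, LEX) pairs, unification is replaced by equality tests, and the names Sign and combine are hypothetical. The conversion of a lexical nominal to a non-lexical one by rule (7) is simulated by hand.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Sign:
    form: str
    maj: str                                  # "N", "A", "P" or "V"
    lex: bool                                 # LEX + / LEX -
    subcat: int                               # length of the SUBCAT list
    heads: Tuple[Tuple[str, bool], ...] = ()  # HEADS: (MAJ, LEX) descriptions


def combine(head: Sign, adjunct: Sign) -> Optional[Sign]:
    """Apply rule (11) or rule (13); return None if neither applies."""
    # Both rules need a head with one unsaturated complement that the
    # adjunct selects through its HEADS feature.
    if head.subcat != 1 or (head.maj, head.lex) not in adjunct.heads:
        return None
    rule_11 = adjunct.subcat == 0 and adjunct.lex              # saturated, lexical
    rule_13 = adjunct.subcat == 1 and adjunct.lex == head.lex  # shared LEX value
    if not (rule_11 or rule_13):
        return None
    # The mother shares LEX, SUBCAT and head features (HEADS included) with H.
    # The form records the daughters only; linear order is left to LP rules.
    return Sign(f"({head.form} <- {adjunct.form})", head.maj, head.lex,
                head.subcat, head.heads)


noun_heads = (("N", True),)                   # nouns and adjectives select N[LEX +]
system = Sign("system", "N", True, 1, noun_heads)
reliability = Sign("reliability", "N", True, 1, noun_heads)
western = Sign("western", "A", True, 0, noun_heads)
for_vancouver = Sign("for Vancouver", "P", False, 1, (("N", False),))
bare_for = Sign("for", "P", True, 2, (("N", False),))

compound = combine(reliability, system)       # rule (13), result stays LEX +
# Stand-in for rule (7): the same nominal, now marked LEX -.
nominal = Sign(compound.form, "N", False, 1, compound.heads)

print(compound)
print(combine(compound, for_vancouver))       # None: the PP selects N[LEX -]
print(combine(nominal, for_vancouver))        # rule (13), result is LEX -
print(combine(nominal, bare_for))             # None: two SUBCAT elements
```

In this sketch the PP can attach only after the compound has been built and made non-lexical, which mirrors how the two rules avoid the double analysis of system reliability for Vancouver.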
Direct consequences of our two adjunct rules are 
that prepositions and verbs are not allowed to modify 
anything (these have two or more elements in their 
SUBCAT lists), sentences or complex noun phrases 
cannot appear as adjuncts, and NPs, Ss, adjectives, 
verbs and prepositions cannot be modified by anything. 
Our grammar does not prevent nouns from being modified, 
since rule (7) can be applied to a lexical noun to 
yield a non-lexical nominal (essentially, N → N). If 
we allowed full NPs or Ss to be modified, the result 
would be a syntactic ambiguity which would not have 
any semantic relevance. 
3.4 The HEADS Feature 
The applicability of the two adjunct grammar rules 
is restricted by the value of the HEADS feature of 
the adjunct. For prepositions (lexical entries with 
SYN|LOC|HEAD|MAJ = P), the value of the HEADS 
feature will be a set containing a sign for N constituents 
(N\[SUBCAT ⟨\[ \]⟩, LEX -\]) and a sign for 
VP constituents.4 Lexical entries for nouns and adjectives 
will have a single element in their HEADS set. 
It will contain a sign for lexical nouns, which includes 
compound nouns (N\[SUBCAT ⟨\[ \]⟩, LEX +\]). We are 
proposing that pre-nominal modifiers, like adjectives 
and (compound) nouns, will be combined with their 
head nouns before post-nominal modifiers, like PPs. 
We adopted this decision because applying modifiers 
in different orders does not result in any difference 
in the resulting semantic interpretation. Specifically, 
the semantic representation associated with 
\[the \[\[system reliability\] for Vancouver\]\] is the same as that 
for \[\[the \[system reliability\]\] for Vancouver\] and 
\[the \[system \[reliability for Vancouver\]\]\]. With our proposal, 
we obtain only one analysis for the phrase discussed 
above. Finally, in order to allow relative clauses 
(MAJ=V), we need only propose that they contain a 
sign for N in their HEADS set. Thus, we effectively 
treat relative clauses like restrictive relative clauses. 
As was the case with PP adjuncts, the same semantic 
representation is obtained regardless of whether the 
relative clause modifies an N (restrictive relative) or 
an NP (non-restrictive relative). 
4 In our corpus PPs do not appear to modify any VPs, so we can 
actually simplify the HEADS feature so that it contains only the N 
entry. 
3.5 Linear Precedence 
We adopt the same LP constraints for heads and complement 
daughters as proposed in \[PS87\]. Lexical 
heads are required to precede their complement(s), 
while non-lexical heads follow their complement(s). 
Sister complements appear in the reverse order of their 
appearance in the SUBCAT list of their head. The LP 
constraints for adjuncts require signs with MAJ=A or 
MAJ=N (+N categories in terms of the classification 
presented in \[Cho82\]) to precede their heads, while adjuncts 
with MAJ=V or MAJ=P (-N categories) are 
required to follow their heads. Thus adjectives and 
nominal modifiers will precede the nouns they modify, 
while PPs and relative clauses will follow the constituents 
they modify. 
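The LP constraints for adjuncts amount to a lookup on the adjunct's MAJ value. A minimal Python sketch follows; the function name linearize and the representation of constituents as word lists are assumptions made for the illustration.

```python
PRECEDES_HEAD = {"A", "N"}  # +N categories: adjectives and nominal modifiers
FOLLOWS_HEAD = {"V", "P"}   # -N categories: relative clauses and PPs


def linearize(head_words, adjunct_words, adjunct_maj):
    """Order the daughters of a head-adjunct ID rule using the adjunct's MAJ."""
    if adjunct_maj in PRECEDES_HEAD:
        return adjunct_words + head_words
    if adjunct_maj in FOLLOWS_HEAD:
        return head_words + adjunct_words
    raise ValueError(f"unknown MAJ value: {adjunct_maj!r}")


compound = linearize(["reliability"], ["system"], "N")
phrase = linearize(compound, ["for", "Vancouver"], "P")
print(" ".join(phrase))
```

Keeping the ordering outside the ID rules is what lets a single adjunct rule cover both the pre-nominal and the post-nominal cases.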
3.6 Semantics 
Due to the close relationship between syntax and semantics 
in HPSG, we can avoid syntactic ambiguities 
which do not correspond to distinct semantic analyses. 
Semantic information, consisting of TYPE and content 
(CONT), can be used to prevent certain analyses. The 
TYPE of a complex constituent will be the same as that 
of its head. The Semantics Principle is responsible for 
creating the CONT of a complex constituent from that 
of its daughters (subconstituents) \[PS87\]. We adopt a 
version of this principle for building up semantic 
information for database structures, which we call the 
Database (DB) Semantics Principle \[McF91\]. 
We incorporate selectional restrictions based on a 
semantic type hierarchy which incorporates aspects of 
the database design. The Rogers Technical Operations 
Database is a statistical database; that is, each 
table in the database contains one or more category 
attributes (columns) whose values define sets of entities 
of a single type, and one or more statistic attributes 
(columns) whose values summarize these sets. The 
complex noun phrases used in natural language queries 
to this database consist of nominals, or nominal modifiers, 
which belong to five general classes: statistical 
type (stype), statistical set (sset), entity set (eset), modifier 
(mod) and pre-modifier (pmod). Each of these 
classes may be divided into subclasses using information 
from the conceptual database design. These five 
classes are arranged in a semantic type hierarchy as 
shown in Figure 1. Using this hierarchy, we can incorporate 
selectional restrictions into the HEADS feature 
of modifiers. Nouns like summary, sum, and ratio are 
used to refer to particular (sets of) statistics. Members 
of the sset class (e.g., log, performance, activity) may 
be used to modify stypes. Nouns from the sset class 
may be semantically vacuous, that is, we assume that 
all requests are for some set of statistics and these nouns 
may not carry any information that can help identify 
the particular statistics sought by a user. We allow 
(compound) nouns within the eset class (e.g., problem, 
outage, call, reliability) to modify (compound) nouns 
of type stat (i.e., sset or stype). Adjuncts of type mod 
may modify subclasses of eset. For example, a user 
can request either system reliability statistics or service 
calls. The type pmod may modify other modifiers and 
selected types of eset. 
Figure 1: Semantic Type Hierarchy (node labels recoverable from the figure: stype, sset, time, loc, ...) 
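The selectional restrictions stated above can be written down as a small table over the type hierarchy. The Python sketch below is an approximation under stated assumptions: only the classes named in the text are included, the artificial root "top" is an addition, the subclasses of eset and mod are ignored (so "subclasses of eset" and "selected types of eset" are both treated as plain eset), and the names PARENT, MAY_MODIFY, is_a and may_modify are hypothetical.

```python
# Child -> parent links; "stat" covers sset and stype, as stated in the text.
PARENT = {
    "stype": "stat",
    "sset": "stat",
    "stat": "top",
    "eset": "top",
    "mod": "top",
    "pmod": "top",
}

# Which semantic types an adjunct of a given type may modify.
MAY_MODIFY = {
    "sset": ["stype"],
    "eset": ["stat"],
    "mod": ["eset"],
    "pmod": ["mod", "eset"],
}


def is_a(sem_type, ancestor):
    """True if sem_type equals ancestor or lies below it in the hierarchy."""
    while sem_type is not None:
        if sem_type == ancestor:
            return True
        sem_type = PARENT.get(sem_type)
    return False


def may_modify(adjunct_type, head_type):
    """Check the selectional restriction carried by the adjunct's HEADS."""
    return any(is_a(head_type, target)
               for target in MAY_MODIFY.get(adjunct_type, []))


print(may_modify("eset", "sset"))   # e.g. outage modifying log
print(may_modify("sset", "stype"))  # e.g. log modifying summary
print(may_modify("stype", "eset"))  # not licensed by any restriction above
```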
The selectional restrictions distilled from our type 
hierarchy are by themselves not powerful enough to 
eliminate all of the "spurious" ambiguities. Just as 
we can use the TYPE feature from the semantics of 
the sign, we can also use the CONT to restrict possible 
analyses. To do this, we have modified the DB Seman- 
tics Principle with an Adjunct Contribution Constraint 
so that an adjunct is required to contribute semantic 
information to a head-adjunct constituent -- in partic- 
ular, adjuncts must contribute references to database 
constructs -- hence the constraint disallows semanti- 
cally vacuous adjuncts from combining with a head. 
A complex constituent like outage log summary, in 
which outage has semantic content but log makes no 
contribution of database information, would have only 
one analysis. The noun log would not be allowed to 
modify summary, but outage could modify log, and 
then outage log could modify summary. 
Sent   Parse (s)   Total (s)   Edges 
(1)    14 (33)     19 (43)     99 (153) 
(2)     5 (6)       7 (8)      58 (65) 
(3)    12 (21)     16 (27)     96 (125) 
(4)     5 (5)       8 (8)      60 (60) 
Table 1: Parsing Performance 
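The effect of the Adjunct Contribution Constraint on outage log summary can be reproduced with a short enumeration. The Python sketch below is illustrative only: it enumerates binary bracketings in which the left constituent modifies the right one, it omits the type restrictions, and the class Nominal, the functions modify and analyses, and the reference strings standing in for database constructs are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Nominal:
    text: str
    refs: FrozenSet[str]  # database constructs referenced by the CONT


def modify(adjunct: Nominal, head: Nominal,
           use_constraint: bool = True) -> Optional[Nominal]:
    """Combine adjunct and head; vacuous adjuncts are rejected if constrained."""
    if use_constraint and not adjunct.refs:
        return None
    return Nominal(f"[{adjunct.text} {head.text}]", adjunct.refs | head.refs)


def analyses(words: List[Nominal], use_constraint: bool = True) -> List[Nominal]:
    """All bracketings of a noun sequence that survive the constraint."""
    if len(words) == 1:
        return [words[0]]
    results = []
    for split in range(1, len(words)):
        for left in analyses(words[:split], use_constraint):
            for right in analyses(words[split:], use_constraint):
                combined = modify(left, right, use_constraint)
                if combined is not None:
                    results.append(combined)
    return results


# Placeholder references; 'log' contributes no database information.
outage = Nominal("outage", frozenset({"<outage construct>"}))
log = Nominal("log", frozenset())
summary = Nominal("summary", frozenset({"<summary construct>"}))
words = [outage, log, summary]

constrained = analyses(words)
unconstrained = analyses(words, use_constraint=False)
print([n.text for n in constrained])
print([n.text for n in unconstrained])
```

With the constraint switched on, the bracketing in which log alone modifies summary is discarded, leaving the single analysis described in the text.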
4 Implementation 
Our treatment of complex NPs has been incorporated 
into the SX natural language interface \[MC90\]. The 
SX system uses grammar developed within the HPSG- 
PL grammar development system \[PV91a\]. The se- 
mantic representations built up by an HPSG parser are 
directed to a module which converts them into an SQL 
query. The query can then be directed to an Oracle 
database to obtain the requested information. 
SX makes use of chart parsing implementations of 
HPSG developed in LISP by McFetridge \[MC90\] and 
in Prolog by Popowich and Vogel \[PV91b\]. Chart parsing 
is a type of parsing in which all syntactic structures 
which are built are placed on a single graph structure 
called a chart. Nodes in the chart correspond to 
positions in an input sentence, with edges between the 
nodes describing analyses of substrings of the input. A 
successful parse corresponds to an edge that spans the 
entire input sentence. The performance of the Prolog 
parser on sentences (1)-(4) is summarized in Table 
1. For each sentence, the table shows the time in CPU 
seconds for obtaining the first parse (Parse) and for 
searching for all possible interpretations (Total). The 
table also contains the number of edges created by 
the chart parser while searching for these interpreta- 
tions. To illustrate the effect of the Adjunct Contribu- 
tion Constraint discussed in §3.6, Table 1 also shows 
(in brackets) the number of edges and CPU times when 
this constraint is not used. The tests were performed 
on a SUN SPARCstation 1 running Quintus Prolog 3.0. 
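The edge counts in Table 1 can be turned into relative savings with a few lines of Python. The figures are copied from the table; the names EDGES and reduction are assumptions made for this sketch.

```python
# Sentence number -> (edges with the constraint, edges without it), from Table 1.
EDGES = {1: (99, 153), 2: (58, 65), 3: (96, 125), 4: (60, 60)}


def reduction(with_constraint: int, without_constraint: int) -> float:
    """Percentage of chart edges avoided by the Adjunct Contribution Constraint."""
    return 100 * (without_constraint - with_constraint) / without_constraint


for sentence, (constrained, unconstrained) in EDGES.items():
    saved = reduction(constrained, unconstrained)
    print(f"({sentence}) {saved:.1f}% fewer edges")
```

The saving is largest for sentence (1), whose noun phrase contains the longest sequence of nominal modifiers, and is zero for sentence (4).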
5 Discussion 
Natural language interfaces to statistical databases are 
still rare but, with the growing interest in Executive In- 
formation Systems and increasing needs of executives 
to have immediate access to summary (i.e., statistical) 

References

Noam Chomsky. Lectures on Government and Binding, the Pisa Lectures, 2nd Edition. Foris Publications, Dordrecht, Holland, 1982.

Jo Calder, Ewan Klein, and Henk Zeevat. Unification categorial grammar: A concise, extendable grammar for natural language processing. COLING 1988.

Richard Cooper. Classification-based Phrase Structure Grammar: An Extended Revised Version of HPSG.  PhD Thesis.

Gerald Gazdar, Ewan Klein, Geoffrey Pullum, and Ivan Sag.  Generalized Phrase Structure Grammar.  Basil Blackwell, 1985.

Diana Hwang. IBS to unveil EasyTalk software.  Digital News, April 17th, 1989.

Paul McFetridge and Nick Cercone. The evolution of a natural language interface: Replacing a parser. Proc. Computational Intelligence. 1990.

Paul McFetridge. Processing English database queries with head-driven phrase structure grammar. Proc. of 2nd Japan-Australia Joint Symposium on NLP. 1991.

Carl Pollard. Topics in Constraint-Based Syntactic Theory. Third European Summer School in Language, Logic, and Information. 1991.

Carl Pollard and Ivan Sag. Information-Based Syntax and Semantics, Volume 1: Fundamentals. Centre for the Study of Language and Information, 1987.

Fred Popowich and Carl Vogel. The HPSG-PL system. Tech Report CSS-IS-TR-91-08.

Fred Popowich and Carl Vogel. A logic based implementation of head-driven phrase structure grammar. In C.G. Brown and G. Koch, eds. Natural Language Understanding and Logic Programming III. 1991.

Hans Uszkoreit. Categorial unification grammars. COLING 1986.
