Multimodal Database Query 
Nicholas J. Haddock 
Hewlett-Packard Laboratories
Filton Road, Stoke Gifford,
Bristol BS12 6QZ, U.K.
njh@hpl.hp.co.uk
Abstract 
The paper proposes a multimodal interface for a real 
sales database application. We show how natural lan- 
guage processing may be integrated with a visual, di- 
rect manipulation method of database query, to pro- 
duce a user interface which supports a flexible form of 
query specification, provides implicit guidance about 
the coverage of the linguistic component, and allows 
more focused discourse reference. 
Introduction 
Recently there has been a burgeoning of interest in 
the combination of natural language processing with 
visual and gestural forms of communication. The 
range of research includes the interpretation of com- 
bined linguistic and diagrammatic input (e.g. Klein 
and Pineda, 1990), the generation of multimedia ex- 
planations (e.g. Wahlster et al., 1991), the integra- 
tion of NLP with hypertext (Stock, 1991), and the 
combination of natural language input with pointing
(e.g. Kobsa et al., 1986), menus (Tennant et al.,
1983), and forms (Cohen et al., 1989). The rationale 
for most of this work is that since the different modes 
are best suited to expressing different kinds of infor- 
mation, the expressiveness of the communication can 
be increased by employing a combination of modes 
rather than one in isolation. At one end of the spec- 
trum, this rationale has led to applications in which 
there is a clear dividing line between the function of 
the different modes; for example, in GRAFLOG (Klein 
and Pineda, 1990), the function of linguistic utter- 
ances like This line is a wall is to provide real-world 
interpretations for the parts of a line drawing. Our 
own work lies at the other end of the spectrum. We 
are interested in exploring the power of combining 
natural language and direct manipulation when the 
function and expressive power of the two modes are 
similar. 
The present paper describes this dual approach in 
the context of a specific, real database query appli- 
cation. We have integrated NLP with a visual, di- 
rect manipulation method of database query, in such 
a way that both query modes can express approx- 
imately the same range of queries. Our objective 
in this work is to explore the ways in which the co-presence
of a direct manipulation interface improves
the usability and flexibility of the natural language
interface. This paper is concerned with the nature
of our computational proposals, rather than with empirical
evidence about their utility. So far we have performed
a user trial on the direct manipulation interface
alone (Frohlich, 1991), and we intend to perform
similar experiments with the combined interface and 
a stand-alone version of the NL interface. 
We start with an overview of our application area, 
and then summarise the pertinent features of our di- 
rect manipulation interface. The body of the pa- 
per illustrates how we have integrated NLP with this 
interface, with respect to the available vocabulary,
the portrayal of user queries, and reference to past 
queries. 
Application Domain 
The target of our application is a relational database 
used by a large UK company to summarise the value 
of their product sales. The main users of the data are
sales professionals and their secretarial assistants. To
date, these users have had two routes of access to the
data: they can retrieve fixed-format financial tables 
using a menu-based query system, or they can use 
a rather brittle natural language interface (supplied 
by a third-party vendor). The design of our system 
stems partly from users' experience of these existing 
interfaces. Our user interface is geared to supporting 
the following user requirements: it should be simple
to use; it should support flexible-format views or "re- 
ports" on the data; and it should allow new queries 
to be composed with reference to past queries. 
Although our user interface is a prototype, and 
is not yet in use, we have done nothing to change 
the structure of the underlying relational database. 
The database summarises sales along three dimen- 
sions, or parameters: the product sold, the pur- 
chaser (in this case, retail outlets), and the time of 
sale. In conceptual terms, each dimension forms a 
hierarchy. For example, the leaves of the purchaser
or "customer" hierarchy identify trading points (nu- 
meric identifiers corresponding to distinct physical re- 
tail stores); at the next level, these are grouped into 
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 1274 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992
trading concerns (corresponding to the familiar high- 
street names of retail chains); and at the top level the 
trading concerns are grouped into corporate concerns 
(corresponding to the public limited company which 
owns the chain). Similarly, the time hierarchy rep- 
resents financial time periods from leaf "bookweeks" 
up to the financial year, and the product hierarchy 
represents specific product box sizes up to a general 
classification of market areas. 
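The three dimensions share the same shape: each value has a parent one level up. The following minimal Python sketch models the customer hierarchy as a child-to-parent map, purely for illustration. Trading points 181, 367 and 2717 are the Andrews outlets used as examples later in the paper; trading point 502 and the corporate concerns corp_a and corp_b are invented placeholders, and nothing here reflects how the real relational tables are laid out.

```python
# Hypothetical child -> parent links for the customer dimension:
# trading point -> trading concern -> corporate concern.
# "502", "corp_a" and "corp_b" are invented placeholders.
CUSTOMER_PARENT = {
    "181": "andrews",
    "367": "andrews",
    "2717": "andrews",
    "502": "walkers",
    "andrews": "corp_a",
    "walkers": "corp_b",
}


def ancestors(value, parent=CUSTOMER_PARENT):
    """Return the values above `value` in the hierarchy, nearest first."""
    chain = []
    while value in parent:
        value = parent[value]
        chain.append(value)
    return chain


def leaves_under(node, parent=CUSTOMER_PARENT):
    """Return the leaf values (trading points) that roll up to `node`,
    sorted as strings."""
    inner = set(parent.values())  # anything that is a parent is not a leaf
    return sorted(v for v in parent if v not in inner and node in ancestors(v, parent))


print(ancestors("181"))         # ['andrews', 'corp_a']
print(leaves_under("andrews"))  # ['181', '2717', '367']
```

Here leaves_under yields the set of trading points below a concern, the kind of extension that becomes relevant when an intensional description has to be shown as concrete labels.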
Direct Manipulation Interface 
The direct manipulation interface represents each
sales dimension by a visual domain model. For in- 
stance, the temporal domain model (see Figure 1) 
is depicted as a scrollable timeline, demarcated into 
hierarchical time periods. The model presents both 
the concepts (e.g. "year", "quarter") and values (e.g. 
"1990", "Q2-90") which are distinguished in the tem- 
poral dimension. The product and customer domain 
models are similarly displayed as hierarchical "pick-lists".
Figure 1: Temporal domain model 
Users pose queries by constructing the format, or
appearance, of the report they want to see, using a
technique we have dubbed "Query by Format". The 
present system supports three types of query in this 
fashion, providing: numeric summaries of sales with 
respect to specific parameters (e.g. the value of sales 
of Krunchy in June), lists of values related to the 
sales parameters (e.g. all trading points which bought 
Krunchy), and details on specific values (e.g. the telephone
number of trading point 647). To simplify the
exposition, the remainder of this paper will discuss 
only the first kind of query, requesting summaries of 
sales. 
Figure 2 shows a sales summary table which the
user has created and then evaluated as a query
against the database. It represents the sales of
Krunchy for Oct-90 and for Nov-90, for the stores
Andrews and Walkers. This table is created by first
selecting "Create a sales table" from a menu, which
produces a skeletal table structure without row or column
headings. The user then specifies the headings
by gesturally selecting value elements (at any level)
from the domain models and dropping them into appropriate
positions on the table. Once the user is satisfied
with the format of the table, it is submitted to
the system, where it is interpreted as a query against
the database, and the derived results are filled in. The
table therefore has a standard "intersective" semantics,
in which each cell represents the summed total
of sales with respect to its corresponding row and
column constraints; for example, the top left-hand
cell in Figure 2 represents the total value of sales of
Krunchy in Oct-90 to Andrews.
Figure 2: Tabular sales query
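The intersective semantics can be stated as a small computation: a cell's value is the sum over all sales that satisfy the cell's row heading, its column heading, and any table-wide constraint. The sketch below assumes stores as rows and months as columns, treats Krunchy as a table-wide constraint, and matches headings at a single level only (the real interface lets headings come from any level of a hierarchy). The Sale record, its field names, the product "crispo" and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    """One hypothetical sales fact; field names are illustrative only."""
    product: str
    month: str
    concern: str
    value: int


def evaluate_table(sales, rows, cols, fixed):
    """Fill each cell with the summed value of the sales that satisfy the
    row heading, the column heading and every table-wide constraint.

    rows, cols: lists of (attribute, value) headings.
    fixed: dict of attribute -> value constraints shared by all cells.
    """
    def satisfies(sale, constraints):
        return all(getattr(sale, attr) == val for attr, val in constraints)

    table = {}
    for row in rows:
        for col in cols:
            constraints = [row, col, *fixed.items()]
            table[(row[1], col[1])] = sum(
                s.value for s in sales if satisfies(s, constraints)
            )
    return table


SALES = [
    Sale("krunchy", "oct-90", "andrews", 100),
    Sale("krunchy", "oct-90", "andrews", 50),
    Sale("krunchy", "nov-90", "walkers", 70),
    Sale("crispo", "oct-90", "andrews", 999),  # other product: excluded
]
table = evaluate_table(
    SALES,
    rows=[("concern", "andrews"), ("concern", "walkers")],
    cols=[("month", "oct-90"), ("month", "nov-90")],
    fixed={"product": "krunchy"},
)
print(table[("andrews", "oct-90")])  # 150
```

Cells with no matching sales come out as 0 rather than blank, which is a presentation choice of this sketch.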
Each such report created by the user forms a dis- 
tinct window on the screen, and is therefore subject 
to standard window management functions such as 
iconisation. Importantly, the user can return to an 
existing report at any stage and refine it to create a 
new view by adding, expanding, deleting, or replacing
report headings.
Integrated Natural Language Processing
Natural language processing is based on the Core
Language Engine (CLE; Alshawi et al. 1989),
which performs application-independent processing
from string segmentation and morphological analysis
through to quantifier scoping and discourse reference,
and produces semantic logical forms as output. The
CLE contains "hooks" for application-specific modules,
and we have used these to augment the CLE
with an application-specific lexicon, a set of rules
for reference resolution in our domain, and a module
for evaluating the logical forms against the relational
database. The present coverage of the NL system is
similar to that of the gestural interface, in terms of the
subject of the query and the conceptual parameters of
the query. So we can ask questions about the value
of sales with respect to any of the three sales parameters
(e.g. Show the sales of Krunchy 250g for Q3-90)
and questions about the sales parameters themselves 
(e.g. What Andrews trading points are there?). It 
goes beyond the graphics in supporting certain forms 
of request which yield a simple textual response (such 
as yes/no questions). At present, natural language 
queries and direct manipulation queries have separate 
routes of access to the database, and we achieve the 
integration described below by translating between 
the two worlds at certain points in processing. 
The following sections present three ways in which 
we have explored the integration of natural language 
with the direct manipulation interface. In each case 
we argue that the integrated interface is superior to 
stand-alone, teletype-style natural language interac- 
tion. 
Vocabulary 
A well-known problem with natural language inter- 
faces to databases is that the user may be uncer- 
tain about the conceptual scope of the database and 
the supported linguistic coverage (Hendrix, 1982). 
The graphical environment of our NL interface of- 
fers a partial solution to this problem, since the dis- 
played domain models remind the user of the linguis- 
tically available parameters of a sale. In particular, 
each domain model communicates the range of sales- 
parameter concepts and values that can be referred 
to, and in doing so shows one way of expressing the related
content nominals in linguistic input. For example,
the temporal domain model indicates that the concept
name bweek and the value 13-19 May-90 in Get Krunchy
sales for bweek 13-19 May-90 are available as lexical
expressions. (We also
allow for synonyms in the NL, given the tendency to
refer to "bweek", say, as "week".)
Note that the domain models are an appropri- 
ate site for lexical reference, since they abstract 
away from the internal structure and content of the 
database in order to provide a user-oriented "view" 
of the data. For example, the temporal model depicts 
information from relational tables as a single hierar- 
chy, and combines distinct database fields to form sin- 
gle values within this hierarchy which are meaningful 
to the end-user. 
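One way to picture the domain models serving as a source of lexical entries is sketched below: every concept name and every displayed value becomes a surface form, and synonyms point at an existing entry. This is an illustrative assumption, not the CLE's lexicon format; week_label is a hypothetical stand-in for the step that combines distinct database fields into a single user-facing value, and multi-word tokenisation is ignored.

```python
def week_label(first_day, last_day, month):
    """Combine separate (hypothetical) database fields into one user-facing
    bookweek value, e.g. 13, 19, "May-90" -> "13-19 May-90"."""
    return f"{first_day}-{last_day} {month}"


def build_lexicon(domain_model, synonyms):
    """Map lower-cased surface forms to (concept, value) entries.

    domain_model: concept -> list of displayed values.
    synonyms: alternative form -> canonical form already in the lexicon.
    A concept name maps to (concept, None); a value maps to (concept, value).
    """
    lexicon = {}
    for concept, values in domain_model.items():
        lexicon[concept.lower()] = (concept, None)
        for value in values:
            lexicon[value.lower()] = (concept, value)
    for alias, canonical in synonyms.items():
        lexicon[alias.lower()] = lexicon[canonical.lower()]
    return lexicon


temporal_model = {
    "bweek": [week_label(13, 19, "May-90")],
    "bmonth": ["Nov-90"],
}
lexicon = build_lexicon(temporal_model, synonyms={"week": "bweek"})
print(lexicon["week"])          # ('bweek', None)
print(lexicon["13-19 may-90"])  # ('bweek', '13-19 May-90')
```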
Representation of Queries 
The user's natural language dialogue with the system 
is displayed in a separate window in teletype form. 
Yes/no and "how many" questions elicit a simple tex- 
tual response, whereas the response to other queries 
is a textual pointer (e.g. See table 13) to a report 
displayed elsewhere on the screen. The decision on 
presentation style is made by a set of rules indexed 
on the sentence form (e.g. WH-question, imperative)
and the requested class of data. 
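A rule set indexed on sentence form and data class can be sketched as an ordered list of patterns, with the first match deciding the presentation. The form and class names and the report style strings are hypothetical labels chosen to mirror the query types described earlier.

```python
# Hypothetical presentation rules, tried in order; None means "any".
RULES = [
    ({"yes_no", "how_many"}, None, "textual answer"),
    (None, {"sales_figures"}, "sales table"),
    (None, {"parameter_values"}, "value list"),
    (None, {"value_details"}, "details report"),
]


def presentation_style(sentence_form, data_class):
    """Return the style of the first rule whose form and data class match."""
    for forms, classes, style in RULES:
        form_ok = forms is None or sentence_form in forms
        class_ok = classes is None or data_class in classes
        if form_ok and class_ok:
            return style
    raise ValueError(f"no presentation rule for {sentence_form!r}/{data_class!r}")


print(presentation_style("how_many", "parameter_values"))  # textual answer
print(presentation_style("imperative", "sales_figures"))   # sales table
```

Ordering matters: the yes/no and "how many" rule comes first, so such questions get a textual answer whatever data they ask about.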
Here we will consider those queries which request
sales figures. Figure 3 shows a numeric summary table
that has been produced in response to the input
Show all sales in Nov-90 to Andrews trading points.
This linguistically created report is a "live" graphical
object which has exactly the same tabular semantics
as if it had been constructed by direct manipulation.
In fact, we can think of the table as a representation
of the natural language query in the language
of tabular queries. The table readily expresses the
linguistic constraint in Nov-90 with the tabular label
"Nov-90". The intensional expression Andrews
trading points cannot be represented directly, since
our tabular restrictions must be extensional values
from the domain models and not intensional concepts.
However, we can represent this intensional expression
indirectly in terms of its semantically equivalent
extension, i.e. the set of all trading points related to
Andrews.
Figure 3: Tabular representation of Show all sales in Nov-90 to Andrews trading points
To display a natural language query and database
response in this form, we must therefore not only retrieve
the correct values from the database, but also
generate row and/or column labels which correctly
define the values in the table.¹ In general, this is a
non-trivial task which ultimately requires sensitivity
to the given/new information structure of the input;
our approach, however, uses only information derived from the
logical form of the query and the corresponding values
from the database. In the current version of our system,
table labels are generated by the module which
evaluates CLE logical forms against the database, as
it searches for sales values in accordance with the constraints
of the query. For example, to find all sales in
Nov-90 to Andrews trading points, it searches for all
sales values such that the following constraints hold:²
bmonth = nov-90
trading_concern = andrews
trading_point = t
Here t is a free variable that will match any trading
point value. Each time we find a matching sale value,
we record, with this value, the corresponding values
of the attributes bmonth, trading_concern, and trading_point.
This results in a set of tuples of the form
<sale_value, bmonth, trading_concern, trading_point>,
such as:
<333, nov-90, andrews, 181>
<703, nov-90, andrews, 367>
<885, nov-90, andrews, 2717>
if 181, 367, and 2717 are the only Andrews trading
points to have bought goods in November 1990. Such
a set of tuples can then be transformed into a tree
structure which removes the repetition of values apparent
in the set of tuples. This tree structure corresponds
to a correct labelling of the table, where each
node represents a label.
¹ An alternative approach is simply to generate the tabular
constraints, and rely on tabular query processing to produce
the sales totals. We have not explored this approach.
² Note that the notation here has been simplified for the
purposes of exposition.
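The search-and-label procedure described above can be sketched in a few lines of Python. The three Andrews tuples reproduce the worked example; the Oct-90 and Walkers rows (and trading point 502) are invented distractors that the constraints should reject. FREE plays the role of the free variable t. tuples_to_tree nests labels in attribute order and sums sale values that share all labels; it is a simple illustration and does not attempt the node minimisation mentioned in the acknowledgements.

```python
FREE = None  # marks an unbound variable such as t


def find_tuples(sales, constraints, attrs):
    """Collect (sale_value, attr1, attr2, ...) for every sale satisfying the
    constraints; an attribute bound to FREE matches any value."""
    return [
        (s["value"], *(s[a] for a in attrs))
        for s in sales
        if all(v is FREE or s[a] == v for a, v in constraints.items())
    ]


def tuples_to_tree(tuples):
    """Nest the label values so that repeated values share one node.
    Leaves hold the summed sale values for that combination of labels."""
    tree = {}
    for value, *labels in tuples:
        node = tree
        for label in labels[:-1]:
            node = node.setdefault(label, {})
        node[labels[-1]] = node.get(labels[-1], 0) + value
    return tree


ATTRS = ["bmonth", "trading_concern", "trading_point"]
SALES = [
    dict(value=333, bmonth="nov-90", trading_concern="andrews", trading_point=181),
    dict(value=703, bmonth="nov-90", trading_concern="andrews", trading_point=367),
    dict(value=885, bmonth="nov-90", trading_concern="andrews", trading_point=2717),
    dict(value=120, bmonth="oct-90", trading_concern="andrews", trading_point=181),
    dict(value=410, bmonth="nov-90", trading_concern="walkers", trading_point=502),
]
tuples = find_tuples(
    SALES,
    {"bmonth": "nov-90", "trading_concern": "andrews", "trading_point": FREE},
    ATTRS,
)
tree = tuples_to_tree(tuples)
print(tree)  # {'nov-90': {'andrews': {181: 333, 367: 703, 2717: 885}}}
```

In the resulting tree, "nov-90" and "andrews" each appear once, so they can serve as single spanning labels over three trading-point cells.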
This approach extends to the representation of a
class of more complex expressions involving negation,
coordination and quantification. For example, under
the wide-scope reading of sales to all trading concerns
except Andrews and Walkers, we find all trading concerns
such that there is a sale with the following constraints:
trading_concern = t
t ≠ andrews
t ≠ walkers
Here we generate a set of tuples of the form
<sale_value, trading_concern>, where trading_concern
will vary over all concerns except the excluded stores.
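Negated constraints such as t ≠ andrews fit the same scheme if each constraint is a predicate rather than a fixed value. In this hypothetical sketch the concerns hollands and parkers and all sale values are invented.

```python
def find_tuples_where(sales, predicates, attrs):
    """Collect (sale_value, attr1, ...) for every sale whose attributes pass
    the given predicates, so constraints such as t != andrews can be used."""
    return [
        (s["value"], *(s[a] for a in attrs))
        for s in sales
        if all(test(s[a]) for a, test in predicates.items())
    ]


EXCLUDED = {"andrews", "walkers"}
SALES = [
    {"value": 333, "trading_concern": "andrews"},
    {"value": 410, "trading_concern": "walkers"},
    {"value": 95, "trading_concern": "hollands"},
    {"value": 60, "trading_concern": "parkers"},
]
tuples = find_tuples_where(
    SALES,
    {"trading_concern": lambda t: t not in EXCLUDED},  # t != andrews, t != walkers
    ["trading_concern"],
)
print(tuples)  # [(95, 'hollands'), (60, 'parkers')]
```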
As an example of a reading which we cannot represent
in a tabular query, consider Total the sales for
Jan-90 and Feb-90. Here we can express the reading
in which two totals are required, mapping to a
table with one cell for each month. But we cannot
express the reading where the user would like to see a
single-cell table, corresponding to the summed sales
of January and February, because such a query cannot
be specified in the tabular language. This
reading is blocked because the rules which transform
the set of tuples {<sale_value, bmonth>} into a tree
structure do not allow distinct values (i.e. jan-90 and
feb-90) to be represented by a single node in the label
tree.
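The blocking condition can be phrased as a check on the tuples: a single summed cell would need one label node covering every value at some position, which the labelling rule forbids whenever those values differ. The sketch below is an illustrative assumption about how such a rule might be tested, with invented sales figures.

```python
def single_node_per_position(tuples):
    """True if, at every label position, all tuples share one value.

    Distinct values (e.g. jan-90 and feb-90) can never share a label node,
    so a reading that needs one cell covering several values is blocked."""
    if not tuples:
        return True
    width = len(tuples[0])
    return all(len({t[i] for t in tuples}) == 1 for i in range(1, width))


def tabular_readings(tuples):
    """Return the readings of 'Total the sales for ...' a table can show."""
    readings = ["one total per value"]  # one cell for each distinct label
    if single_node_per_position(tuples):
        readings.append("single summed total")
    return readings


jan_feb = [(120, "jan-90"), (135, "feb-90")]
print(tabular_readings(jan_feb))  # ['one total per value']
```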
Hence it is not possible to represent all natural 
language queries in our simple intersective, tabular 
language, because the former can be much more ex- 
pressive. However, our interface approach alleviates 
this problem to a large extent, since tabular sales
summaries are just one of a variety of gestural query 
devices at our disposal (i.e. those mentioned earlier 
in the paper) to express the communicative content 
of a natural language query, and we can add others as 
the need arises. None of our gestural query methods
has the expressive power of a relational query language
such as, say, QBE (Zloof, 1975); rather, we have
created a set of graphical access methods tailored to 
our target users' needs, which strike a balance be- 
tween expressive power and ease of use. We decided 
on the range of our query devices by analysing the 
transcripts of the users' real sessions with the exist- 
ing natural language front-end, in addition to other 
forms of analysis, such as interviews with target users. 
Given that natural language and direct manipula- 
tion both yield the same tabular output, what is the 
advantage of supporting two modes of query, rather 
than one? First, the user can build up a table using
whichever mode or combination of modes is most
productive. By combining the presentation of gestu- 
ral and linguistic queries, a linguistically generated 
report can be refined and extended by the direct ma- 
nipulation operations described earlier. In the follow- 
ing section we will see how natural language query 
can extend an existing table, completing the circle 
of mixed-mode dialogue. Many of the distinguishing
and productive features of natural language documented
by Walker (1989), such as coordination, negation
and quantification, can be beneficially applied
to the present task. So a user may start a query
table with the request Find sales for the first week
in every month, exploiting the rich quantificational
structure of English to swiftly generate a set of labels
that would take a good many point-and-click actions.
The user may then wish to further parameterise this
query with certain products which are best selected 
visually (perhaps because their spelling is difficult to 
remember). 
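The quantified request above shows how one phrase can expand into many labels. Assuming the time hierarchy lists each month's bookweeks in calendar order, "first week in every month" reduces to taking the first child of each month node. The month-to-week table and the week labels are invented for illustration.

```python
# Hypothetical fragment of the time hierarchy: each month lists its
# bookweeks in calendar order. All labels are invented.
MONTH_WEEKS = {
    "Oct-90": ["1-7 Oct-90", "8-14 Oct-90"],
    "Nov-90": ["5-11 Nov-90", "12-18 Nov-90"],
    "Dec-90": ["3-9 Dec-90", "10-16 Dec-90"],
}


def first_week_in_every_month(month_weeks):
    """Expand the quantified phrase into one bookweek label per month."""
    return [weeks[0] for weeks in month_weeks.values() if weeks]


labels = first_week_in_every_month(MONTH_WEEKS)
print(labels)  # ['1-7 Oct-90', '5-11 Nov-90', '3-9 Dec-90']
```

Over a full financial year the same phrase would yield one label per month, each of which would otherwise need its own drag-and-drop action.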
Second, our user studies in this and other domains 
have shown that there are differences in the prefer- 
ences of individual users. Some users simply prefer 
the feel of, say, natural language interaction, for reasons
which are difficult to explicate with theoretical
factors such as those above. Here the user may issue
even gesturally simple commands like Open from the
linguistic command line.
Reference to Past Queries 
Given the exposition so far, we can construct a new
table with either natural language or gestures, but
can only modify an existing table using direct manipulation.
To complete this picture of multimodal dialogue,
we must also allow linguistic queries to refer to
past tables and from them specify modified versions.
In our application domain, we expect users to have
several reports open on their screen at any one time,
and identifying an individual table solely in terms of
the content of a referring expression can be arduous,
if not impossible. One option is to name the report
using its unique identifier. In common with many
other systems (e.g. Kobsa et al., 1986), we also allow
the user to refer to an object by pointing to it. Our
present system only allows one such deictic reference
per sentence, and behaves as follows. At any point,
the user can click on a button on a table to make it the
"context" for the next linguistic query. If this action
occurs, an anaphoric referring expression, like these
Andrews sales in Show these Andrews sales as a bar
chart, is then taken to refer directly to the contextual
table, assuming that the content of the expression 
(in this case, Andrews sales) does not contradict the 
content of the table. However, if the context button is 
pressed and then followed by the use of a definite noun 
phrase, as in Show the sales for Jan-90, then the table
is seen as providing the universe of reference (i.e. the
set of sales specified by the table) for the sales for
Jan-90, rather than the referent itself. In this case,
then, the query will yield a new table which combines 
the constraints of the contextual table with the Jan- 
90 constraint provided by the definite NP. 
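The two behaviours of the context button can be sketched as a single resolution function. The representation of a table as attribute-to-value-set constraints, the contradiction test (disjoint value sets), the fallback for a contradictory anaphor, and the choice to let a definite NP's constraint replace one on the same attribute are all assumptions made for this illustration; the paper does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """A report reduced to its constraints: attribute -> set of allowed values."""
    ident: int
    constraints: dict


def contradicts(table, np_constraints):
    """Assumed test: the NP contradicts the table if, for some shared
    attribute, none of its values appear among the table's values."""
    return any(
        attr in table.constraints and not (vals & table.constraints[attr])
        for attr, vals in np_constraints.items()
    )


def resolve_with_context(context, np_kind, np_constraints, new_ident):
    """np_kind is 'anaphoric' (e.g. these Andrews sales) or 'definite'
    (e.g. the sales for Jan-90)."""
    if np_kind == "anaphoric":
        if contradicts(context, np_constraints):
            return None  # assumption: fall back to ordinary reference resolution
        return context  # the contextual table itself is the referent
    if np_kind == "definite":
        # The table supplies the universe of reference; the NP's constraints
        # are added to it, yielding a new table (override rule is assumed).
        return Table(new_ident, {**context.constraints, **np_constraints})
    raise ValueError(f"unknown NP kind: {np_kind!r}")


ctx = Table(2, {"product": {"krunchy"}, "trading_concern": {"andrews", "walkers"}})
print(resolve_with_context(ctx, "anaphoric", {"trading_concern": {"andrews"}}, 3) is ctx)  # True
```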
For completeness, the primary objects arising in 
the discourse are tracked using the CLE's discourse 
model, which is based on a salience-ranking view of 
discourse reference. We track all tables that the user 
has created and evaluated, whether through direct 
manipulation or natural language, rather than just 
those arising from the linguistic dialogue. If a sales 
table is uniquely salient (because, say, it was the 
most recently created) in the discourse model, then 
an anaphoric expression such as these sales will be 
taken as referring to this table (without the need for 
pointing), and the query Which of these sales is for 
Walkers? will accordingly produce a new table extended
with the label "Walkers".³
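A recency-only approximation of this behaviour is sketched below. The real system uses the CLE's salience-ranked discourse model; here the most recently recorded table, whether built by gesture or by language, is simply taken as uniquely salient. SalesTable, its fields, and which_of_these are hypothetical names. Labels are only ever appended, reflecting the restriction noted in the footnote that table labels can be added but not altered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesTable:
    ident: int
    labels: tuple  # headings currently on the table
    origin: str    # "gesture" or "language": both kinds are tracked


class DiscourseModel:
    """Toy recency-only salience: the most recently recorded table is
    treated as the uniquely salient one."""

    def __init__(self):
        self._tables = []

    def record(self, table):
        self._tables.append(table)

    def most_salient(self):
        return self._tables[-1] if self._tables else None


def which_of_these(model, label, new_ident):
    """Resolve 'these sales' to the salient table, then build a new table
    extended with one more label (labels are added, never replaced)."""
    antecedent = model.most_salient()
    if antecedent is None:
        raise LookupError("no salient sales table for 'these sales'")
    extended = SalesTable(new_ident, antecedent.labels + (label,), "language")
    model.record(extended)
    return extended


model = DiscourseModel()
model.record(SalesTable(1, ("Oct-90", "Andrews"), "gesture"))
model.record(SalesTable(2, ("Krunchy", "Nov-90"), "language"))
print(which_of_these(model, "Walkers", 3).labels)  # ('Krunchy', 'Nov-90', 'Walkers')
```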
In the future we intend to experiment with schemes 
where more general gestural actions affect the salience 
of objects in the discourse model. For example, "low- 
ering" or iconising table windows on the screen could 
reduce their salience rating, whereas "raising" them 
would increase it. Although this is an over-simplified 
view of how to track the user's focus of attention, 
such schemes would give the user the potential for 
more explicit control over the working set of objects 
available for reference.⁴ This scheme, and our implemented
treatment of multimodal reference, therefore
support a flexible form of discourse reference which is
difficult to attain in teletype-style linguistic
dialogue.
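The proposed, not yet implemented, scheme could be prototyped as a score per table that window events nudge up or down. The event names and weights below are invented; when two tables tie for the top score, no table is uniquely salient, and the user would need to point instead.

```python
# Hypothetical salience adjustments for window-management events.
EVENT_DELTA = {"raise": 2, "lower": -1, "iconise": -2}


class SalienceModel:
    """Sketch of a scheme in which gestural window actions change a
    table's salience rating."""

    def __init__(self):
        self.scores = {}

    def created(self, table_id):
        # A new table starts just above the current maximum.
        self.scores[table_id] = max(self.scores.values(), default=0) + 1

    def window_event(self, table_id, event):
        self.scores[table_id] += EVENT_DELTA[event]

    def uniquely_salient(self):
        """Return the top-ranked table, or None if there is a tie."""
        if not self.scores:
            return None
        best = max(self.scores.values())
        top = [t for t, s in self.scores.items() if s == best]
        return top[0] if len(top) == 1 else None


m = SalienceModel()
m.created("table 1")                  # table 1 -> 1
m.created("table 2")                  # table 2 -> 2
m.window_event("table 2", "iconise")  # table 2 -> 0
print(m.uniquely_salient())           # table 1
```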
Conclusion 
The previous sections have explored the consequences 
of a thorough integration of a gestural method of 
query and natural language query in the context of 
a specific database application. We considered the 
case where the coverage of the two styles is similar, 
in terms of the range of expressible queries, and have 
demonstrated that several benefits accrue from this
integration. First, by translating a user's NL query into
a graphical query, we support a flexible approach to
the specification of the query, in which the user can
employ whichever combination of modes is best suited
to the "micro" tasks involved in query specification.
Second, the visual interface gives implicit guidance
to the user as to the coverage of the natural language
interface. Third, the use of direct manipulation to
focus discourse reference offers a more flexible dialogue
structure than is found in a pure NL interface. In
the future it would be profitable to assess these
implemented proposals empirically, and to investigate
further computational issues, such as the generation
of linguistic descriptions of gestural actions.
³ Currently, natural language queries can add but not alter table labels, as
would be required for the elliptical reading in which Walkers
replaces an existing store name.
⁴ Cohen et al. (1989) advocate a similar technique in which
the user can directly manipulate the visually displayed tree structure
of the discourse. However, in our proposal the discourse
structure is inherent in the visual layout of the screen, without
the presentation of meta-level information about the dialogue.
Acknowledgements 
Many of the ideas discussed in this paper were devel- 
oped through conversations with Andrew Nelson, who 
also implemented the algorithm for mapping relations into 
minimal-node trees. The direct manipulation interface 
described here was implemented by Andrew Nelson and 
Steve Loughran. Thanks also to David Adger, Phil Stenton,
David Frohlich and Lyn Walker.
References 
Alshawi, H., Carter, D. M., van Eijck, J., Moore, R. C.,
Moran, D. B., Pereira, F. C. N., Pulman, S. G. and
Smith, A. G. [1989] Research Programme in Natural
Language Processing: Final Report. SRI Project
No. 2989, SRI International Cambridge Computer
Science Research Centre, Cambridge, U.K.
Cohen, P. R., Dalrymple, M., Moran, D. B., Pereira, F.
C. N., Sullivan, J. W., Gargan, R. A., Schlossberg, J.
L. and Tyler, S. W. [1989] Synergistic Use of Direct
Manipulation and Natural Language. In Procs. of
CHI-89, Austin, Texas, May 1989.
Frohlich, D. [1991] Evaluation of the SAMI System Prototype.
Technical Report No. HPL-92-17, Hewlett-Packard
Laboratories, Bristol, U.K.
Hendrix, G. [1982] Natural Language Interface. Computational
Linguistics, Vol. 8, No. 2.
Klein, E. and Pineda, L. A. [1990] Semantics and Graphical
Information. In Procs. of INTERACT-90.
Kobsa, A., Allgayer, J., Reddig, C., Reithinger, N.,
Schmauks, D., Harbusch, K., and Wahlster, W.
[1986] Combining Deictic Gestures and Natural Language
for Referent Identification. In Procs. of
COLING-86, Bonn, Germany.
Stock, O. [1991] Natural Language and the Exploration
of an Information Space: the ALFresco Interactive
System. In Procs. of IJCAI-91, Sydney, Australia.
Tennant, H., Ross, K., Saenz, R., Thompson, C., and
Miller, J. [1983] Menu-based Natural Language Understanding.
In Procs. of 21st ACL, Cambridge,
Mass., June 1983.
Wahlster, W., Andre, E., Graf, W., and Rist, T. [1991]
Designing Illustrated Texts. In Procs. of 5th EACL,
Berlin, Germany, April 1991.
Walker, L. [1989] Natural Language in a Desktop Environment.
In Procs. of HCI International '89,
Boston, Mass.
Zloof, M. [1975] Query by Example. In Procs. of AFIPS
44 NCC, 1975.
