FIELD TESTING THE TRANSFORMATIONAL
QUESTION ANSWERING (TQA) SYSTEM

S. R. Petrick

IBM T. J. Watson Research Center
P.O. Box 218, Yorktown Heights, New York 10598
The Transformational Question Answering (TQA) system
was developed over a period beginning in the early
part of the last decade and continuing to the present.
Its syntactic component is a transformational grammar
parser [1, 2, 3], and its semantic component is a
Knuth attribute grammar [4, 5]. The combination of
these components provides sufficient generality,
convenience, and efficiency to implement a broad range
of linguistic models; in addition to a wide spectrum of
transformational grammars, Gazdar-type phrase
structure grammar [6] and lexical functional grammar
[7] systems appear to be cases in point. The
particular grammar which was, in fact, developed,
however, was closest to those of the generative
semantics variety of transformational grammar; both the
underlying structures assigned to sentences and the
transformations employed to effect that assignment
traced their origins to the generative semantics model.
The system works by finding the underlying structures
corresponding to English queries through the use of the
transformational parsing facility. Those underlying
structures are then translated to logical forms in a
domain relational calculus by the Knuth attribute
grammar component. Evaluation of logical forms with
respect to a given data base completes the
question-answering process. Our first logical form
evaluator was a toy implementation of a relational
data base system in LISP. We soon replaced the low
level tuple retrieval facilities of this
implementation with the RSS (Relational Storage System)
portion of the IBM System R [8]. This version of logical
form evaluation was the one employed in the field
testing to be described. In a more recent version of the
system, however, it has been replaced by a translation
of logical forms, first to equivalent logical forms in
a set domain relational calculus and then to
appropriate expressions in the SQL language, System R's
high level query language.
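As an illustration only (the relation, field names, and logical form representation below are invented, not those of TQA), the final stage of the pipeline can be sketched as a selection predicate plus a projection, evaluated tuple-at-a-time over a stored relation:

```python
# Toy "parcel" relation; all names are hypothetical.
PARCELS = [
    {"parcel": "001-17", "zone": "R-1", "acreage": 0.25},
    {"parcel": "002-03", "zone": "C-2", "acreage": 1.50},
    {"parcel": "002-09", "zone": "R-1", "acreage": 0.50},
]

# A query such as "Which parcels are zoned R-1?" would, after parsing and
# semantic interpretation, reduce to a logical form; here it is modeled
# simply as a selection condition plus an answer attribute.
lf = {
    "predicate": lambda t: t["zone"] == "R-1",   # selection condition
    "project": lambda t: t["parcel"],            # answer attribute
}

def evaluate(logical_form, relation):
    """Tuple-at-a-time evaluation, in the spirit of low level
    tuple retrieval (not actual RSS calls)."""
    return [logical_form["project"](t)
            for t in relation
            if logical_form["predicate"](t)]

print(evaluate(lf, PARCELS))  # -> ['001-17', '002-09']
```

The real system, of course, produced its logical forms automatically from English via the attribute grammar rules rather than by hand.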
The first data base to which the system was applied was
one concerning business statistics such as the sales,
earnings, number of employees, etc. of 60 large
companies over a five-year period. This was a toy data
base, to be sure, but it was useful to us in developing
our system. A later data base contained the basic land
identification records of about 10,000 parcels of land
in a city near our research center. It was developed for
use by members of the city planning department and
(less frequently) other departments to answer questions
concerning the information in that file. Our purpose in
making the system available to those city employees
was, of course, to provide access to a data base of real
interest to a group of users and to field test our
system by evaluating their use of it. Accordingly, the
TQA system was tailored to the land use file
application and installed at City Hall at the end of
1977. It remained there during 1978 and 1979, during
which time it was used intermittently as the need arose
for ad hoc query to supplement the report generation
programs that were already available for the extraction
of information.
Total usage of the system was less than we had expected
would be the case when we made the decision to proceed
with this application. This resulted from a number of
factors, including a change in mission for the planning
department, a reduction in the number of people in that
department, a decision to rebuild the office space
during the period of usage, and a degree of
obsolescence of the data due to the length of time
between updates (which were to have been supplied by
the planning department). During 1978 a total of 788
queries were addressed to the system, and during 1979
the total was 210. Damerau [9] gives the distribution
of these queries by month, and he also breaks them down
by month into a number of different categories.
Damerau's report of the gross performance statistics
for the year 1978, and a similar, as yet unpublished
report of his for 1979, contain a wealth of data that I
will not attempt to include in this brief note. Even
though his reports contain a large quantity of
statistical performance data, however, there are a lot
of important observations which can only be made from a
detailed analysis of the day-by-day transcript of
system usage. An analysis of sequences of related
questions is a case in point, as is an analysis of the
attempts of users to phrase new queries in response to
failure of the system to process certain sentences. A
paper in preparation by Plath is concerned with
treating these and similar issues with the care and
detail which they warrant. Time and space
considerations limit my contribution in this note to
just highlighting some of the major findings of Damerau
and Plath.
Consider first a summary of the 1978 statistics:

Total Queries                              788

Termination Conditions:                  Count      %
  Completed (Answer reached)               513   65.1
  Aborted (System crash, etc.)              53    6.7
  User Cancelled                            21    2.7
  Program Error                             39    4.9
  Parsing Failure                          147   18.7
  Unknown                                   15    1.9

Other Relevant Events:
  User Comment                              96   12.2
  Operator Message                          45    5.7
  User Message                              11    1.4
  Word not in Lexicon                      119   15.1
  Lexical Choice Resolved by User          119   15.1
  "Nothing in Data Base" Answer             61    7.7
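The percentage column is simply count/788, rounded to one decimal place, and the termination-condition counts partition the query total. A quick consistency check of the figures as reported:

```python
# Verify the 1978 termination-condition rows: each percentage equals
# count / 788 rounded to one decimal, and the counts sum to 788.
total = 788
termination = {
    "Completed": (513, 65.1),
    "Aborted": (53, 6.7),
    "User Cancelled": (21, 2.7),
    "Program Error": (39, 4.9),
    "Parsing Failure": (147, 18.7),
    "Unknown": (15, 1.9),
}
assert sum(count for count, _ in termination.values()) == total
for name, (count, reported_pct) in termination.items():
    assert round(100 * count / total, 1) == reported_pct
print("all rows consistent")
```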
The percentage of successfully processed sentences is
consistent with, but slightly smaller than, that of such
other investigators as Woods [10], Ballard and Biermann
[11], and Hershman et al. [12]. Extreme care should be
exercised in interpreting any such overall numbers,
however, and even more care must be exercised in
comparing numbers from different studies. Let me just
mention a few considerations that must be kept in mind
in interpreting the TQA results above.
First of all, our users' purposes varied tremendously
from day to day and even from question to question. On
one occasion, for example, a session might be devoted
to a serious attempt to extract data needed for a
federal grant proposal, and either the query complexity
might be relatively limited so as to minimize the
chance of error, or else the questions might be
essentially repetitions of the same query, with minor
variations to select different data. On another
occasion, however, the session might be a
demonstration, or a serious attempt to determine the
limits of the system's understanding capability, or
even a frivolous query to satisfy the user's curiosity
as to the computer's response to a question outside its
area of expertise. (One of our failures was the
sentence, "Who killed Cock Robin?".)
Our users varied widely in terms of their familiarity
with the contents of the data base. None knew anything
about the internal organization of information (e.g.
how the data was arranged into relations), but some had
good knowledge of just what kind of data was stored,
some had limited knowledge, and some had no knowledge
and even false expectations as to what knowledge was
included in the data base. In addition, they varied
widely with respect to the amount of prior experience
they had with the system. Initially we provided no
formal training in the use of the system, but some users
acquired significant knowledge of the system through
its sustained use over a period of time. Something over
half of the total usage was made by the individual from
the planning department who was responsible for
starting the system up and shutting it down each day.
Usage was also made by other members of the planning
department, by members of other departments, and by
summer interns.
It should also be noted that the TQA system itself did
not stay constant over the two-year period of testing.
As problems were encountered, modifications were made
to many components of the system. In particular, the
lexicon, grammar, semantic interpretation rules
(attribute grammar rules), and logical form evaluation
functions all evolved over the period in question
(continuously, but at a decreasing rate). The parser
and the semantic interpreter changed little, if any. A
rerun of all sentences, using the version of the
grammar that existed at the conclusion of the field
test program, showed that 50% of the sentences which
previously failed were processed correctly. This is
impressive when it is observed that a large percentage
of the remaining 50% constitute sentences which are
either ungrammatical (sometimes sufficiently so to
preclude human comprehension) or else contain
references to semantic concepts outside our universe of
(land use) discourse.
On the whole, our users indicated they were satisfied
with the performance of the system. In a conference
with them at one point during the field test, they
indicated they would prefer us to spend our time
bringing more of their files on line (e.g., the zoning
board of appeals file) rather than to spend more time
providing additional syntactic and associated semantic
capability. Those instances where an unsuccessful
query was followed up by attempts to rephrase the query
so as to permit its processing showed few instances
where success was not achieved within three attempts.
This data is obscured somewhat by the fact that users
called us on a few occasions to get advice as to how to
reword a query. On other occasions the terminal message
facility was invoked for the purpose of obtaining
advice, and this left a record in our automatic logging
facility. That facility preserved a record of all
traffic between the user's terminal, the computer, and
our own monitoring terminal (which was not always
turned on or attended), and it included a time stamp for
every line displayed on the user's terminal.
A word is in order on the real time performance of the
system and on the amount of CPU time required. Damerau
[9] includes a chart which shows how many queries
required a given number of minutes of real time for
complete processing. The total elapsed time for a
query was typically around three minutes (58% of the
sentences were processed in four minutes or less).
Elapsed time depended primarily on machine load and
user behavior at the terminal. The computer on which
the system operated was an IBM System 370/168 with an
attached processor, 8 megabytes of memory, and extensive
peripheral storage, operating under the VM/370
operating system. There were typically in excess of 200
users competing for resources on the system at the
times when the TQA system was running during the
1978-1979 field tests. Besides queuing for the CPU and
memory, this system developed queues for the IBM 3850
Mass Storage System, on which the TQA data base was
stored.
Users had no complaints about real time response, but
this may have been due to their procedure for handling
ad hoc queries prior to the installation of the TQA
system. That procedure called for ad hoc queries to be
coded in RPG by members of the data processing
department, and the turnaround time was a matter of
days rather than minutes. It is likely that the real
time performance of the system caused users sometimes
to look up data about a specific parcel in a hard copy
printout rather than giving it to the system. Queries
were most often of the type requiring statistical
processing of a set of parcels or of the type requiring
a search for the parcel or parcels that satisfied given
search criteria.
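The two dominant query types can be characterized concretely; the following sketch uses an invented toy relation (the field names and data are hypothetical, not drawn from the land use file):

```python
# Toy parcel relation; all names and values are invented for illustration.
parcels = [
    {"parcel": "001-17", "zone": "R-1", "acreage": 0.25},
    {"parcel": "002-03", "zone": "C-2", "acreage": 1.50},
    {"parcel": "002-09", "zone": "R-1", "acreage": 0.50},
]

# Type 1: statistical processing over a set of parcels,
# e.g. "What is the total acreage zoned R-1?"
total_r1_acreage = sum(p["acreage"] for p in parcels if p["zone"] == "R-1")

# Type 2: search for the parcel or parcels satisfying given criteria,
# e.g. "Which parcels are larger than one acre?"
over_one_acre = [p["parcel"] for p in parcels if p["acreage"] > 1.0]

print(total_r1_acreage, over_one_acre)  # -> 0.75 ['002-03']
```

A lookup of a single known parcel, by contrast, is exactly the kind of request a hard copy printout already serves well, which is consistent with the usage pattern observed.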
The CPU requirements of the system, broken down into a
number of categories, are also plotted by Damerau [9].
The typical time to process a sentence was ten seconds,
but sentences with large data base retrieval demands
took up to a minute. System hardware improvements made
subsequent to the 1978-1979 field tests have cut this
processing time approximately in half. Throughout our
development of the TQA system, considerations of speed
have been secondary. We have identified many areas in
which recoding should produce a dramatic increase in
speed, but this has been assigned a lesser priority
than basic enhancement of the system and the coverage
of English provided through its transformational
grammar.
Our experiment has shown that field testing of question
answering systems provides certain information that is
not otherwise available. The day to day usage of the
system is different in many respects from usage that
results from controlled, but inevitably somewhat
artificial, experiments. We did not influence our users
by the wording of problems posed to them because we gave
them no problems; their requests for information were
solely for their own purposes. Our sample queries that
we initially exhibited to city employees to indicate
the system was ready to be tested were invariably
greeted with mirth, due to the improbability that
anyone would want to know the information requested.
(They asked for reassurance that the system would also
answer "real" questions.) We also obtained valuable
information on such matters as how long users persist
in rephrasing queries when they encounter difficulties
of various kinds, how successful they are in correcting
errors, and what new errors are likely to be made while
correcting initial errors. I hope to discuss these and
other matters in more detail in the oral version of this
paper.
Valuable as our field tests are, they cannot provide
certain information that must be obtained from
controlled experiments. Accordingly, we hope to conduct
a comparison of TQA with several formal query languages
in the near future, using the latest enhanced version
of the system and carefully controlling such factors as
user training and problem statement. After teaching a
course in data base management systems at Queens
College and the Pratt Institute, and after running
informal experiments there comparing students' relative
success in using TQA, ALPHA, relational algebra, QBE,
and SEQUEL, I am convinced that even for educated,
programming-oriented users with a fair amount of
experience in learning a formal query language, the TQA
system offers significant advantages over formal query
languages in retrieving data quickly and correctly.
This remains to be proved (or disproved) by conducting
appropriate formal experiments.
[1] Plath, W. J., Transformational Grammar and
Transformational Parsing in the REQUEST System,
IBM Research Report RC 4396, Thomas J. Watson
Research Center, Yorktown Heights, N.Y., 1973.
[2] Plath, W. J., String Transformations in the
REQUEST System, American Journal of Computational
Linguistics, Microfiche 8, 1974.
[3] Petrick, S. R., Transformational Analysis, Natural
Language Processing (R. Rustin, ed.), Algorithmics
Press, 1973.
[4] Knuth, D. E., Semantics of Context-Free Languages,
Mathematical Systems Theory, Vol. 2, No. 2, June 1968,
pp. 127-145.
[5] Petrick, S. R., Semantic Interpretation in the
REQUEST System, in Computational and Mathematical
Linguistics, Proceedings of the International
Conference on Computational Linguistics, Pisa,
27/VIII-1/IX 1973, pp. 585-610.
[6] Gazdar, G. J. M., Phrase Structure Grammar, to
appear in The Nature of Syntactic Representation
(P. Jacobson and G. K. Pullum, eds.), 1979.
[7] Bresnan, J. W. and Kaplan, R. M.,
Lexical-Functional Grammar: A Formal System for
Grammatical Representation, to appear in The
Mental Representation of Grammatical Relations (J.
W. Bresnan, ed.), Cambridge: MIT Press.
[8] Astrahan, M. M.; Blasgen, M. W.; Chamberlin, D. D.;
Eswaran, K. P.; Gray, J. N.; Griffiths, P. P.; King,
W. F.; Lorie, R. A.; McJones, J.; Mehl, J. W.;
Putzolu, G. R.; Traiger, I. L.; Wade, B. W.; and
Watson, V., System R: Relational Approach to
Database Management, ACM Transactions on Database
Systems, Vol. 1, No. 2, June 1976, pp. 97-137.
[9] Damerau, F. J., The Transformational Question
Answering (TQA) System Operational Statistics -
1978, to appear in AJCL, June 1981.
[10] Woods, W. A., Transition Network Grammars, Natural
Language Processing (R. Rustin, ed.), Algorithmics
Press, 1973.
[11] Biermann, A. W. and Ballard, B. W., Toward Natural
Language Computation, AJCL, Vol. 6, No. 2,
April-June 1980, pp. 71-86.
[12] Hershman, R. L., Kelley, R. T., and Miller, H. C.,
User Performance with a Natural Language Query
System for Command Control, NPRDC TR 79-7, Navy
Personnel Research and Development Center, San
Diego, Cal. 92152, January 1979.