EUFID: A FRIENDLY AND FLEXIBLE FRONT-END FOR DATA MANAGEMENT SYSTEMS 
Marjorie Templeton 
System Development Corporation, Santa Monica, CA. 
EUFID is a natural language frontend for data management 
systems. It is modular and table driven so that it can 
be interfaced to different applications and data manage- 
ment systems. It allows a user to query his data base 
in natural English, including sloppy syntax and mis- 
spellings. The tables contain a data management system 
view of the data base, a semantic/syntactic view of the 
application, and a mapping from the second to the first. 
We are entering a new era in data base access. Computers 
and terminals have come down in price while salaries 
have risen. We can no longer make users spend a week in 
class to learn how to get at their data in a data base. 
Access to the data base must be easy, but also secure. 
In some aspects, ease and security go together because, 
when we move the user away from the physical character- 
istics of the data base, we also make it easier to 
screen access. 
EUFID is a system that makes data base access easy for 
an untrained user, by accepting questions in natural
English. It can be used by anyone after a few minutes 
of coaching. If the user gets stuck, he can ask EUFID 
for help. EUFID is a friendly but firm interface which 
includes security features. If the user goes too far 
in his questions and asks about areas outside of his 
authorized data base, EUFID will politely misunderstand 
the question and quietly log the security violation. 
One beauty of EUFID is its flexibility. It is written 
in FORTRAN for a PDP-11/70. With minor modifications
it could run on other mini-computers or on a large
computer. It is completely table driven so that it can
handle different data bases, different views of the same 
data base, or the same view of a restructured data base. 
It can be interfaced with various data management 
systems--currently it can access a relational data base 
via INGRES or a network data base via WWDMS. 
EUFID is an outgrowth of the SDC work on a conceptual 
processor which was started in 1973 [1]. It is now
demonstrable with a wide range of sentences questioning two
data bases. It is still a growing system with new 
power being added. 
In the following sections we will explore the features 
that make EUFID so flexible and easy to use. The main 
features are: 
• natural English 
• help 
• semantic tables 
• data base tables 
• mapping tables 
• intermediate language 
• security 
1. NATURAL ENGLISH 
EUFID has a dictionary containing the words that the 
users may use when querying the data base. The 
dictionary describes how words relate to each other and 
to the data base. Unlike some other natural language 
systems, EUFID has the words in the sentence related to 
fields in the data base by the time the sentence is 
"understood." More will be said about this process in 
the section on semantic tables. 
EUFID is forgiving of spelling and grammar errors. If 
it does not have a word in the dictionary, but has a 
word that is close in spelling, it will ask the user if 
a substitution can be made. It also can "understand" 
a sentence even when not all words are present or some 
words are not grammatically correct. For example, any 
of these queries is acceptable: 
"What companies ship goods?" 
"Companies?" (list all companies) 
"What company shop goods?" 
("shop" will be corrected to "ship". The plural 
"companies" will be assumed) 
Users are free to structure their input in any way that 
is natural to them as long as the subject matter covers 
what is in the data base. EUFID would interpret these 
questions in the same way: 
"Center shipped heavy freight to what warehouses in 
1976?" 
"What warehouses did Center ship heavy freight to 
in 1976?" 
Each user may define personal synonyms if the vocabulary 
in the dictionary is not rich enough for him. For 
example, for efficiency a user might prefer to use "wh" 
for "warehouse" and "co" for "company". Another user of 
the same data base might define "co" for "count". 
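The word-level processing described in this section, personal synonyms followed by a spelling check against the dictionary, can be sketched in Python. This is only an illustration under stated assumptions: the word list, the function names, and the use of `difflib` with a 0.7 similarity cutoff are hypothetical and are not taken from EUFID, which was written in FORTRAN.

```python
import difflib

# Hypothetical mini-dictionary; the real EUFID dictionary is table driven.
DICTIONARY = ["what", "company", "companies", "ship", "goods", "warehouse", "count"]

def expand_synonyms(words, synonyms):
    """Replace each word that this user has defined as a personal synonym."""
    return [synonyms.get(w.lower(), w.lower()) for w in words]

def suggest(word, dictionary=DICTIONARY, cutoff=0.7):
    """Return the word if it is known, else the closest dictionary word, else None.

    The caller is expected to ask the user before substituting a suggestion.
    """
    w = word.lower()
    if w in dictionary:
        return w
    matches = difflib.get_close_matches(w, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Two users of the same data base with different meanings for "co".
user_a = {"wh": "warehouse", "co": "company"}
user_b = {"co": "count"}
```

Keeping the synonym table per user is what lets "co" mean "company" for one user and "count" for another without touching the shared dictionary.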
2. HELP 
Basically, EUFID has only four commands. These are 
"help", "synonym" (to define a synonym), "comment" (to 
criticize EUFID), and "quit". The help module describes 
these four commands as well as the general 
guidelines for questions. 
If the user hits an error while using EUFID, he will 
receive a sentence or two at his terminal which describes 
the problem. In some cases he will be asked for clari- 
fication or a new question as shown in these exchanges. 
User: "What are the names of female secretaries' 
children?" 
EUFID: "Do you mean 
(1) female secretaries or 
(2) female children?" 
User: "2" 
or 
User: "What is the salary of the accounting 
department?" 
EUFID: "We are unable to understand your question 
because 'salary of department' is not 
meaningful. Please restate your question." 
If the description is not enough to clarify the problem, 
the user can ask for help. First, HELP will give a 
deeper description of the problem. If that is not 
enough, the user can ask for additional information which 
may include a list of valid questions. 
3. TABLES 
EUFID is application and data base independent. This 
independence is achieved by having three sets of tables-- 
the semantic dictionary tables, the data base tables, 
and the mapping tables which map from the semantic view 
to the data base. Conceivably, a single semantic view 
could map to two data bases that contain the same data 
but are accessed by different data management systems. 
3.1 SEMANTIC TABLES 
The semantic view is defined by an application expert 
working with a EUFID expert. Together they determine the 
ways that a user might want to talk about the data. From 
this, a list of words is developed and the basic sentence 
structures are defined. Words are classed as: 
entities (e.g., company) 
events (e.g., send) 
functions (after 1975) 
parts of a phrase or idiom (map coordinates) 
connectors (to) 
system words (the) 
anaphores (it) 
two or more of the above (ship an entity plus 
ship an event) 
An entity corresponds approximately to a noun and an 
event to a verb. Connectors are prepositions which are 
dropped after the sentence is parsed. System words are 
conjunctions, auxiliaries, and determiners which 
participate in determining meaning but do not relate to data 
base fields. Anaphores are words that refer to previous 
words and are replaced by them while parsing. Basically 
then, the only words that relate to the items in the 
data base are entities, events, and functions. 
Entities and events are defined using a case structure 
representation which combines syntactic and semantic 
information. Lexical items which may co-occur with an 
entity to form noun phrases, or with a verb to form 
verb phrases, fill cases on the entity or event. Cases 
are distinguished by the set of possible fillers, the 
possible connectors, and the syntactic position of the 
case relative to the entity or event. A case may be 
specified as optional or obligatory. 
A sense of an entity or event is defined by the set of 
cases which form a distinct noun phrase or verb phrase 
type. Three senses of the word "ship" are illustrated 
in Figure 1. 
Figure 1. [Diagram of the case structures for the three senses of "ship".] 
The first sense of "ship" accounts for active voice 
verb phrases with the pattern "Companies ship goods 
to companies in year." 
Examples are: 
What companies ship to Ajax? 
In 1976, who shipped light freight to Colonial? 
This sense of "ship" has two obligatory cases, A and C, 
and two optional cases B and H. The fact that the 
"year" case can be moved optionally within the phrase 
is not represented within the case structure, but is 
recognized by the Analyzer, which assigns a structure 
to the phrase. 
The second sense of "ship" accounts for the passive 
construction of the type "Goods are shipped to company by 
company." 
Examples are: 
Was light freight shipped to Ajax in 1978? 
What goods were shipped to Ajax by Colonial? 
By what companies in 1975 was heavy freight 
shipped to Colonial? 
Case D has the same filler as case B, but precedes 
"ship" and is obligatory. Case E has the same filler 
as case A, but follows "ship", has a different 
connector, and is optional. That is, sense 1 of "ship" 
is defined as the association of "ship" with cases 
A,B,C. Sense 2 is the association of "ship" with cases 
C,D,E. Sense 3 of "ship" describes the nominalized 
form "shipment" and explicitly captures the information 
that shipments involve goods and reflect transactions 
between companies. 
An example is: 
"What is the transaction number for the shipment 
of bolts from Colonial to Ajax?" 
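The way a sense is selected from the cases found around a word can be sketched in Python. This is a minimal illustration, not EUFID's Analyzer: the filler kinds, connectors and positions assigned to cases A through E below are one reading of the prose (the figure is not legible here), the optional "year" case is left out, and all class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Case:
    filler: str       # kind of entity that may fill the case
    connector: str    # connector introducing the filler ("" if none)
    position: str     # "before" or "after" the word "ship"
    obligatory: bool

# Assumed case definitions for "ship", inferred from the text.
CASES = {
    "A": Case("company", "", "before", True),    # shipping company, active voice
    "B": Case("goods", "", "after", False),      # goods shipped, active voice
    "C": Case("company", "to", "after", True),   # receiving company
    "D": Case("goods", "", "before", True),      # goods shipped, passive voice
    "E": Case("company", "by", "after", False),  # shipping company, passive voice
}
SENSES = {1: ("A", "B", "C"), 2: ("C", "D", "E")}

def matching_senses(found):
    """Return the senses whose obligatory cases are all present in `found`
    and which allow every (filler, connector, position) triple in `found`."""
    result = []
    for sense, names in SENSES.items():
        cases = [CASES[n] for n in names]
        allowed = {(c.filler, c.connector, c.position) for c in cases}
        required = {(c.filler, c.connector, c.position) for c in cases if c.obligatory}
        if required <= found <= allowed:
            result.append(sense)
    return result

# "What companies ship to Ajax?"
active = {("company", "", "before"), ("company", "to", "after")}
# "What goods were shipped to Ajax by Colonial?"
passive = {("goods", "", "before"), ("company", "to", "after"), ("company", "by", "after")}
```

Under these assumptions the first question selects sense 1 and the second selects sense 2, because only the passive sense accepts goods before the verb.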
3.2 DATA BASE TABLES 
The data base tables describe the data base as viewed 
by the data management system. Since all data management 
systems deal with data items organized into groups 
that are related through links, it is possible to have 
a common table format for any data management system. 
The data base tables actually consist of two tables. 
The CAN table contains information about groups and 
data items. A group (also called entity or record in 
other systems) is identified by the group name. A 
data item in the CAN table consists of the data item 
name, the group to which it belongs, a unit code, an 
output identifier, and some field type information. 
Notably missing is anything about the byte within the 
record or the number of bytes. EUFID accesses the data 
base through a data management system. Therefore, the 
data can be reorganized without changing the EUFID 
tables as long as the data items retain their names and 
their groupings. 
The second data base table is the REL table which contains 
an entry for each group with its links to other groups. 
For network data bases, the link is the chain name for 
the primary chain that connects master and detail 
records. For relational data bases, every data item 
pair in the two groups that can have the same value is 
a potential link. 
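The two data base tables can be pictured as the following Python structures. This is a sketch only: the field names follow the description above, the group, item and chain names are read from the WWDMS example in Section 4, and the unit codes, output identifiers and field types are invented placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataItem:
    name: str         # data item name
    group: str        # group the item belongs to
    unit_code: str    # placeholder values; not given in the paper
    output_id: str    # label used when printing the item
    field_type: str

# Item table: no byte offsets or lengths are stored.
CAN = [
    DataItem("DNAME", "DIV", "", "DIVISION NAME", "char"),
    DataItem("JNAME", "JOB", "", "JOB NAME", "char"),
    DataItem("CLASS", "JOB", "", "JOB CLASS", "char"),
    DataItem("EMPLOYEE", "JOB", "", "EMPLOYEE NAME", "char"),
    DataItem("ADDRESS", "JOB", "", "ADDRESS", "char"),
]

# Link table: for a network data base the link is a chain name.
LINKS = {frozenset(("DIV", "JOB")): "DIV_JOB_CH"}

def items_in(group):
    """Names of all data items that belong to a group."""
    return [item.name for item in CAN if item.group == group]

def link_between(group_a, group_b):
    """Chain name linking two groups, or None if they are not linked."""
    return LINKS.get(frozenset((group_a, group_b)))
```

Because nothing in these structures records physical layout, reorganizing the stored records leaves them unchanged as long as item names and groupings are kept.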
3.3 MAPPING TABLES 
The mapping tables tell the program how to get from the 
semantic node, as found in the semantic dictionary, to 
the data base field names. Each entry in the mapping 
table has a node name followed by two parts. The 
first part describes the pattern of cases and their 
fillers for that node name. The second part is called 
a production and it gives the mapping for each case 
filler. A node may map to a node higher in the sentence 
tree before it maps to a data base item. For example, 
"company name" in the question "What companies are 
located in Los Angeles?" may map to a group containing 
general company information. However, "company name" 
in the question "What companies ship to Los Angeles?" 
may map to a group containing shipping company information. 
Therefore, it is necessary to first map "company name" 
up to a higher node that determines the meaning. At the 
point where a unique node is determined, the mapping is 
made to a data item name via the CAN table. This data 
item name is used in the generation of the query to the 
data management system. 
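The idea of mapping a node upward until its meaning is unique can be sketched in Python. The table layout, the function name, and the group and item names (`COMPANY.NAME`, `SHIPPER.NAME`) are hypothetical; the paper does not give the actual mapping table format at this level of detail.

```python
# Hypothetical productions: (node, higher node) -> data item name.
MAPPING = {
    ("company name", "locate"): "COMPANY.NAME",  # general company information
    ("company name", "ship"): "SHIPPER.NAME",    # shipping company information
}

def map_node(node, ancestors):
    """Walk up the sentence tree (nearest ancestor first) until some
    higher node determines which data item `node` refers to."""
    for higher in ancestors:
        item = MAPPING.get((node, higher))
        if item is not None:
            return item
    return None  # no unique meaning could be determined
```

With these entries, "company name" under a "locate" node resolves to the general company group, while the same words under a "ship" node resolve to the shipping company group.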
4. INTERMEDIATE LANGUAGE 
EUFID is adaptable to most data management systems with- 
out changes to the central modules. This is accomplished 
by using an intermediate language (IL). The main parts 
of EUFID analyze the question, map it to data items, and 
then express the query in a standard language (IL). A 
translator is written for each data management system in 
order to rephrase the IL query into the language of the 
data management system. This is an extra step, but it 
greatly enhances EUFID's flexibility and portability. 
The intermediate language looks like a relational 
retrieval language. Translating it into QUEL is 
straightforward, but translating it to a procedural language 
such as WWDMS is very difficult. The example below shows 
a question with its QUEL and WWDMS equivalents. 
QUESTION: WHAT ARE THE NAMES AND ADDRESSES OF THE 
EXECUTIVE SECRETARIES IN R&D? 
INGRES IL: 
RETRIEVE [JOB.EMPLOYEE,JOB.ADDRESS] 
WHERE (DIV.NAME = "R&D") 
AND (DIV.JOB = JOB.NAME) 
AND (JOB.NAME = "SECRETARY") 
AND (JOB.CLASS = "EXECUTIVE") 
QUEL: 
range of div is div 
range of job is job 
retrieve (job.employee,job.address) 
where div.name = "R&D" 
and div.job = job.name 
and job.name = "secretary" 
and job.class = "executive" 
WWDMS IL: 
RETRIEVE [JOB.EMPLOYEE,JOB.ADDRESS] 
WHERE (DIV.DNAME = "R&D") 
AND (DIV.DIV_JOB_CH = JOB.DIV_JOB_CH) 
AND (JOB.JNAME = "SECRETARY") 
AND (JOB.CLASS = "EXECUTIVE") 
WWDMS QUERY: 
INVOKE 'WWDMS/PERSONNEL/ADF' 
REPORT EUFID-1 ON FILE 'USER/PASSWD/EUFID' 
FOR TTY 
Q1. LINE "EMPLOYEE NAME =",EMPLOYEE 
Q2. LINE "ADDRESS =",ADDRESS 
R1. RETRIEVE E-DIV 
WHERE DNAME = "R&D" 
WHEN R1. 
R2. RETRIEVE E-JOB 
WHERE JNAME = "SECRETARY" 
AND CLASS = "EXECUTIVE" 
WHEN R2 
PRINT Q1 
PRINT Q2 
END 
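To show why the relational case is the easy one, here is a small Python sketch of an IL-to-QUEL translator. The in-memory form of the IL query (tuples for group.item references, plain strings for literals) and the function name `to_quel` are assumptions made for this example; they are not EUFID's internal representation.

```python
# The INGRES IL example: targets, then equality conditions.
IL_TARGETS = [("JOB", "EMPLOYEE"), ("JOB", "ADDRESS")]
IL_CONDITIONS = [
    (("DIV", "NAME"), "R&D"),
    (("DIV", "JOB"), ("JOB", "NAME")),
    (("JOB", "NAME"), "SECRETARY"),
    (("JOB", "CLASS"), "EXECUTIVE"),
]

def to_quel(targets, conditions):
    """Rephrase an IL query as QUEL text, one statement per line."""
    def term(t):
        # A tuple is a group.item reference; anything else is a string literal.
        return f"{t[0].lower()}.{t[1].lower()}" if isinstance(t, tuple) else f'"{t}"'

    terms = list(targets) + [side for cond in conditions for side in cond]
    groups = sorted({t[0].lower() for t in terms if isinstance(t, tuple)})
    lines = [f"range of {g} is {g}" for g in groups]
    lines.append("retrieve (" + ", ".join(term(t) for t in targets) + ")")
    for i, (lhs, rhs) in enumerate(conditions):
        keyword = "where" if i == 0 else "and"
        lines.append(f"{keyword} {term(lhs)} = {term(rhs)}")
    return "\n".join(lines)
```

Each IL clause maps to exactly one QUEL clause. A WWDMS translator has no such one-to-one correspondence, since it must also order the retrievals and generate the report layout.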
5. SECURITY 
EUFID protects the data base by removing the user from 
direct access to the data management system and data 
base. At the most general level, EUFID will only allow 
users to ask questions within the semantics that are 
defined and stored in the dictionary. Some data items 
or views of the data could be omitted from the dictionary. 
At a more specific level, EUFID controls access through 
a user profile table. Before a user can use EUFID, a 
system person must define the user profile. This table 
states which applications or subsets of applications are 
available to the user. One user may be allowed to query 
everything that is covered by the semantic dictionary. 
Another user may be restricted in his access. 
The profile table is built by a concept graph editor. 
When a new login id is established for EUFID, the system 
person gives the application name of each application 
that the user may access. Associated with an application 
name is a set of file names of the tables for the appli- 
cation. If access is to be restricted, a copy of the 
CAN and mapping function tables is made. The copies are 
changed to delete the data items which the user is not 
to know about. The names of the restricted tables are 
then stored in the user's profile record. EUFID will 
still be able to find the words that are used to talk 
about the data item, but when EUFID maps the word to a 
removed data item it responds to the user as though the 
sentence could not be understood. 
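The restricted-copy mechanism can be sketched in Python as follows. Everything here is illustrative: the function names, the reply text, the log format, and the `JOB.SALARY` item are hypothetical, and the sketch models only the item table, not the mapping tables or the profile record.

```python
NOT_UNDERSTOOD = "We are unable to understand your question. Please restate your question."

def restricted_copy(can_items, hidden):
    """Copy of the data item table with the hidden items deleted."""
    return [item for item in can_items if item not in hidden]

def check_access(user, requested, user_can, log):
    """Return None if every requested item is visible to the user.
    Otherwise log the attempt and return the 'not understood' reply."""
    blocked = [item for item in requested if item not in user_can]
    if blocked:
        log.append((user, blocked))  # quietly record the violation
        return NOT_UNDERSTOOD
    return None
```

Returning the ordinary "not understood" reply, instead of an explicit refusal, means the user learns nothing about which data items exist outside his authorized view.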
6. CONCLUSION 
EUFID is a system that makes data base access easy and 
direct for an end user so that he does not need to go 
through a specialist or learn a language to query his own 
data base. It is modular and table driven so that it can 
be interfaced with different data management systems and 
different applications. It is written in high-level 
transportable languages to run on a small computer for 
maximum transportability. The case grammar that it uses 
allows flexibility in sentence syntax, ungrammatical 
syntax, and fast, accurate parsing. 
If the reader wants more detail he is referred to refer- 
ences 2-4. 
7. REFERENCES 
1. Burger, J., Leal, A., and Shoshani, A. "Semantic 
Based Parsing and a Natural-Language Interface for 
Interactive Data Management," AJCL Microfiche 32, 
1975, 58-71. 
2. Burger, John F. "Data Base Semantics in the EUFID 
System," presented at the Second Berkeley Workshop 
on Distributed Data Management and Computer Networks, 
May 25-27, 1977, Berkeley, CA. 
3. Weiner, J. L. "Deriving Data Base Specifications 
from User Queries," presented at the Second Berkeley 
Workshop on Distributed Data Management and Computer 
Networks, May 25-27, 1977, Berkeley, CA. 
4. Kameny, I., Weiner, J., Crilley, M., Burger, J., 
Gates, R., and Brill, D. "EUFID: The End User 
Friendly Interface to Data Management Systems," SDC, 
September 1978. 

