Research and Development in Natural Language Processing 
at BBN Laboratories 
in the Strategic Computing Program 
BBN Laboratories, Inc. 
Cambridge, MA 02238 
STAFF: Ralph Weischedel (Principal Investigator), Remko Scha, Edward Walker, Damaris 
Ayuso, Andrew Haas, Erhard Hinrichs, Robert Ingria, Lance Ramshaw, Varda Shaked, 
David Stallard 
1 Background 
BBN's responsibility is to conduct research and development in natural language 
interface technology. This responsibility has three aspects: 
o to demonstrate state-of-the-art technology in a Strategic Computing 
application, collecting data regarding the effectiveness of the demonstrated 
heuristics, 
o to conduct research in natural language interface technology, as itemized in 
the description of JANUS later in this note, and 
o to integrate technology from other natural language interface contractors, 
including USC/Information Sciences Institute, the University of Pennsylvania, 
and the University of Massachusetts. 
Of the three initial applications described in the overview, the Fleet Command 
Center Battle Management Program (FCCBMP) has been the application providing the 
domain in which our work is being carried out. The FCCBMP encompasses the 
development of expert system capabilities at the Pacific Fleet Command Center in 
Hawaii, and the development of an integrated natural language interface to these new 
capabilities as well as to the existing data bases and graphic display facilities. BBN is 
developing a series of increasingly sophisticated natural language understanding 
systems which will serve as an integrated interface to several facilities at the Pacific 
Fleet Command Center: the Integrated Data Base (IDB), which contains information 
about ships, their readiness states, their capabilities, etc.; the Operations Support 
Group Prototype (OSGP), a graphics system which can display locations and itineraries 
of ships on maps; and the Force Requirements Expert System (FRESH) which is being 
built by Texas Instruments. 
The target users for this application are naval officers involved in decision 
making at the Pacific Fleet Command Center; these are executives whose effort is 
better spent on navy problems and decision making than on the details of which 
software system offers a given information capability, how a problem should be divided 
to make use of the various systems, or how to synthesize the results from several 
sources into the desired answer. Currently they do not access the data base or OSGP 
application programs themselves; instead, on a round-the-clock basis, two operators 
act as intermediaries between the Navy staff and the computers. The utility of a 
natural language interface in such an environment is clear. 
The starting point for development of the natural language interface system at 
the Pacific Fleet Command Center was the IRUS system, which has been under 
development at BBN for a number of years. A new version of this system, IRUS-86, 
has been installed in the FCCBMP testbed area at the Pacific Fleet Command Center for 
demonstration. Further basic research on the problems of natural language 
interfacing is continuing, and the results of this and future research will be 
incorporated into a next generation natural language interface system called JANUS, to 
be delivered to the Pacific Fleet Command Center at a later date. JANUS will share 
most of its domain-dependent data with IRUS-86, and it will share other modules as 
well; IRUS-86 will therefore be able to evolve gradually into the final version of JANUS. 
2 IRUS-86: The Initial Test Bed System 
The architecture of IRUS [Bates 83] is a cascade consisting of a sequence of 
translation modules: 
o An ATN parser which produces a syntactic tree. 
o A semantic interpreter which produces a formula of the meaning 
representation language MRL. 
o A postprocessor for resolving anaphora and ellipsis. 
o A translation module which produces a formula of the relational data base 
language ERL ("Extended Relational Language"). 
o A translation module which produces a sequence of commands for the 
underlying data base access system. 
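
The cascade can be pictured as plain function composition. The following is a minimal sketch, not the IRUS implementation: the five stage functions are hypothetical stand-ins that only tag their input, so that the order in which the modules apply is visible in the result.

```python
from functools import reduce
from typing import Any, Callable, List

Stage = Callable[[Any], Any]


def cascade(stages: List[Stage]) -> Stage:
    """Compose stages left to right: each stage consumes the previous output."""
    return lambda x: reduce(lambda acc, stage: stage(acc), stages, x)


# Hypothetical stand-ins for the five modules; each one only wraps its input
# in a tag naming the representation it would produce.
def parse(sentence):        # ATN parser -> syntactic tree
    return ("TREE", sentence)

def interpret(tree):        # semantic interpreter -> MRL formula
    return ("MRL", tree)

def resolve(mrl):           # postprocessor for anaphora and ellipsis
    return ("MRL-RESOLVED", mrl)

def to_erl(mrl):            # MRL -> ERL translation
    return ("ERL", mrl)

def to_dbms(erl):           # ERL -> data base access commands
    return ("DB-COMMANDS", erl)


irus = cascade([parse, interpret, resolve, to_erl, to_dbms])
result = irus("list the ships")
```

Because each stage sees only the output of its predecessor, a module can be replaced without touching the others, which is the property that lets IRUS-86 evolve module by module.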
IRUS-86, the version of IRUS now installed at the Pacific Fleet Command 
Center, extends IRUS in several ways. Two of these 
extensions are especially worth mentioning: 
o IRUS-86 uses the NIKL system [Moser 83] to represent its domain model, i.e., 
the relationships between the predicates and relations of the meaning 
representation language MRL. The NIKL domain model supports the system's 
treatment of semantic anomaly, anaphora, and nominal compounds. 
o IRUS-86 contains a new module which exploits this NIKL domain model to 
simplify MRL expressions; this makes it possible to translate complex MRL- 
expressions into ERL constants, thus allowing for significant divergences 
between the input English and the structure of the underlying data base 
[Stallard 86]. 
In addition to accessing the NIKL domain model, the parser, semantic interpreter 
and MRL-to-ERL translator access other knowledge sources which contain domain- 
dependent information: 
o the lexicon, 
o the semantic interpretation rules for individual concepts, 
o the MRL-to-ERL mapping rules for individual MRL constants, which introduce 
the details of underlying system structure, such as file and field names. 
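
As an illustration of the third knowledge source, an MRL-to-ERL mapping rule can be thought of as a table entry that attaches file and field names to an MRL constant. The sketch below assumes this table view; the constants, file names, field names, and the dotted output notation are all hypothetical and are not taken from the Integrated Data Base or from ERL itself.

```python
from typing import Dict, NamedTuple


class FieldRef(NamedTuple):
    """Location of a value in the underlying data base."""
    file: str
    field: str


# Hypothetical rules: MRL constant -> data base file and field.
MRL_TO_ERL: Dict[str, FieldRef] = {
    "SHIP-NAME": FieldRef("SHIPFILE", "NAME"),
    "READINESS-OF": FieldRef("SHIPFILE", "RDY"),
}


def translate_constant(mrl_constant: str) -> str:
    """Return a dotted file.field reference for one MRL constant."""
    ref = MRL_TO_ERL[mrl_constant]
    return f"{ref.file}.{ref.field}"
```

Keeping file and field names in this one table is what confines data base details to a single domain-dependent knowledge source.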
To port IRUS to the navy domain, the relevant domain-dependent data had to be 
supplied to the system. This task is being accomplished by personnel at the Naval 
Ocean Systems Center (NOSC). In August, 1985, BBN provided NOSC with an initial 
prototype system containing small example sets of lexical entries, semantic 
interpretation rules, and MRL-to-ERL rules; using acquisition tools provided by BBN, 
NOSC personnel have been entering the rest of the data. 
IRUS-86 was delivered to the FRESH developers at Texas Instruments in January 
1986, was installed in a test bed area of the Pacific Fleet Command Center in April 
1986, and will be demonstrated in June 1986. Currently, the lexicon and the domain- 
dependent rules of the system only cover a relatively small part of the OSGP 
capabilities and the files and attributes of the Integrated Data Base. Once enough 
data have been entered so that the system covers a sufficiently large part of the data 
base, it will be tried out in actual use by Navy personnel. This will enable us to 
gather data about the way the system performs in a real environment, and to fine- 
tune the system in various respects. For instance, IRUS-86 makes use of shallow 
heuristic methods to address some aspects of natural language understanding such as 
anaphora and ellipsis for which general solutions are still research issues. The 
FCCBMP application provides a test bed in which such heuristic methods can be 
evaluated, and enhancements to them developed and tested, as part of the 
evolutionary technological growth intended to continue throughout the Natural 
Language Technology effort of the Strategic Computing Program. 
3 Functional Goals for JANUS 
The IRUS-86 system is distinguished by its clean, modular structure, its broad 
syntactic/semantic coverage, its sophisticated domain model, and its systematic 
treatment of discrepancies between the English lexicon and the data base structure. 
We thus expect that it will demonstrate considerable utility as an interface component 
in the FCCBMP application. Nevertheless, IRUS-86 shares with other current systems 
several limitations which should be overcome if natural language interfaces are to 
become truly "natural". In developing JANUS, the successor of IRUS-86, we shall 
attempt to overcome some of those limitations. The areas of increased functionality 
we are considering are: semantics and knowledge representation, ill-formedness, 
discourse, cooperativeness, multiple underlying systems, and knowledge acquisition. 
3.1 Semantics and Knowledge Representation 
IRUS-86, like most other current systems, represents sentence meanings as 
formulas of a logical language which is a slight extension of first-order logic. As a 
consequence, many important phenomena in English have no equivalent in the meaning 
representation language, and cannot be dealt with correctly, e.g., modalities, 
propositional attitudes, generics, collective quantification, and context-dependence. 
Thus, one foregoes one of the most important potential assets of a natural language 
interface: the capacity of expressing complex semantic structures in a succinct and 
comfortable way. 
In JANUS, we will therefore adopt a new meaning representation language which 
combines features from PHLIQA1's enriched lambda-calculus [Scha 76] with ideas 
underlying Montague's Intensional Logic [Montague 70], and possibly a distributed 
quote-operator [Haas 86]. It will have sufficient expressive power to incorporate a 
version of Carlson's treatment of generics [Carlson 79], a version of Scha's treatment 
of quantification [Scha 81], Montague's treatment of modality, and various possible 
approaches to propositional attitudes and context-dependence. 
In adopting a higher order logic as proposed, one confronts problems of formula 
simplification and the need to apply meaning postulates to reduce the semantic 
representation of an input sentence to an expression appropriate to the underlying 
system, e.g., a relational algebra expression in the case that the underlying system is 
a data base. To do this, we will investigate the limited inference mechanisms of KL- 
TWO [Moser 83, Vilain 85], following up on our previous work [Stallard 86]. The 
advantage of these inference mechanisms is their tractability; discovering their power 
and limitations in this complex problem domain should be an interesting result. 
3.2 Discourse 
The meaning of a sentence depends in many ways on the context which has been 
set up by the preceding discourse. IRUS and other systems, however, currently ignore 
most of these dependencies, and employ a rather shallow model of discourse structure. 
To allow the user to exploit the full expressive potential of a natural language 
interaction, the system must track topics, reference times, possible antecedents for 
anaphora, etc.; it must be able to recognize the constituent units of a discourse and 
the subordination or coordination relations obtaining between them. A substantial 
amount of work has been done already on several of these issues, much of it by BBN 
researchers [Sidner 85, Hinrichs 81, Polanyi 84, Grosz 86]. Research in this area 
continues under a separate DARPA-funded contract. We expect to be able to integrate 
some of the results of that research in the JANUS system. 
3.3 Ill-formedness 
A natural interface system should be forgiving of a user's deviations from its 
expectations, be they misspellings, typographical errors, unknown words, poor syntax, 
incorrect presuppositions, fragmentary forms, or violated selection restrictions. 
Empirical studies show that as much as 25% of the input to data base query systems is 
ill-formed. 
IRUS currently handles some classes of ill-formedness by using a combination of 
shallow heuristics and user interaction. It can correct for typographical misspellings, 
for omitted determiners or prepositions, and for some ungrammaticalities, like 
determiner-noun and subject-verb disagreement. The JANUS system will employ a 
more general approach to ill-formedness that will handle a larger class of 
ungrammatical constructions and a larger class of word selection problems, and that 
will also explore correcting several types of semantic ill-formedness. 
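
To make the simplest of these cases concrete, the sketch below corrects a misspelled word against a lexicon. It is an assumption-laden illustration, not the IRUS heuristic, which is not described here: the lexicon is hypothetical, and the similarity measure is the one provided by Python's standard difflib module.

```python
import difflib
from typing import List, Optional

# Hypothetical lexicon; the real one is entered with the acquisition tools.
LEXICON = ["carrier", "display", "readiness", "ship"]


def correct_word(word: str,
                 lexicon: List[str] = LEXICON,
                 cutoff: float = 0.8) -> Optional[str]:
    """Return the word itself if known, else its closest lexicon entry.

    Returns None when no entry is similar enough, so that the caller can
    fall back on user interaction.
    """
    if word in lexicon:
        return word
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Returning None instead of guessing mirrors the combination of heuristics and user interaction described above: when no candidate is close enough, the user is asked.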
These capabilities have major implications for the control of the understanding 
process, since considering such possibilities can exponentially expand the search 
space. Maintaining control will require care in integrating the ill-formedness 
capability into the rest of the system, and also making maximal use of the guidance 
that can be derived from a model of the discourse and user's goals to constrain the 
search. 
3.4 Cooperativeness 
A truly helpful system should not react to the literal meaning of a sentence, but 
to its perceived intent. If in the context of a given application it is possible to 
characterize the goals that a user may be expected to be pursuing through his 
interaction with the system, the system should try to infer from the user-input what 
the underlying goal could be. A system can do this by accessing a goal-subgoal 
hierarchy which links the speech acts expressed by individual utterances to the global 
goals that the user may have. This strategy has been applied successfully to rather 
small domains [Allen 83, Sidner 85]. We wish to investigate whether it carries over to 
the FCCBMP applications. 
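
A goal-subgoal hierarchy of this kind can be sketched as a tree in which each node points to its parent goal. The example below is only an illustration under that assumption: the goal names are hypothetical and do not come from the FCCBMP domain, and real plan recognition must also choose among several candidate parents.

```python
from typing import Dict, List, Optional

# Hypothetical hierarchy: each goal maps to its parent (None = global goal).
PARENT: Dict[str, Optional[str]] = {
    "ask-ship-location": "find-available-ship",
    "ask-ship-readiness": "find-available-ship",
    "find-available-ship": "plan-deployment",
    "plan-deployment": None,
}


def goal_chain(speech_act: str) -> List[str]:
    """Walk upward from an utterance's speech act to the global goal."""
    chain: List[str] = []
    node: Optional[str] = speech_act
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain
```

The chain returned for an utterance is what a cooperative response would be planned against, rather than the literal question alone.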
3.5 Modelling the Capabilities of Multiple Systems 
The way in which IRUS-86 decides whether an input sentence translates into an 
IDB query or an OSGP command may be refined. There is a need for work on what 
kind of knowledge would be necessary to interface smoothly and intelligently to 
multiple underlying systems. A reasoning component is needed that can determine 
which underlying system or systems can best fulfill a user's request. Such a 
reasoning component would have to combine a model of the capabilities of the 
underlying systems with a model of the user goals and current intentions in the 
discourse context in order to choose the correct system(s). Such a model would also 
be useful for providing supporting information to the user. 
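
The capability half of such a reasoning component might look like the following sketch. The three system names come from the FCCBMP description above, but the operation labels assigned to them are hypothetical, and the sketch leaves out the model of user goals and discourse context that a real component would also consult.

```python
from typing import Dict, FrozenSet, List

# Hypothetical capability model: operations each underlying system offers.
CAPABILITIES: Dict[str, FrozenSet[str]] = {
    "IDB": frozenset({"retrieve-attribute", "count"}),
    "OSGP": frozenset({"display-map", "display-itinerary"}),
    "FRESH": frozenset({"assess-readiness"}),
}


def systems_for(request_ops: List[str]) -> Dict[str, List[str]]:
    """Assign each requested operation to the first system that offers it."""
    plan: Dict[str, List[str]] = {}
    for op in request_ops:
        for system, ops in CAPABILITIES.items():
            if op in ops:
                plan.setdefault(system, []).append(op)
                break
        else:
            # No underlying system can carry out this part of the request.
            raise ValueError(f"no underlying system offers {op!r}")
    return plan
```

A request that needs both a data base lookup and a map display then yields a plan spanning two systems, which is the case a single-target translation cannot express.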
3.6 Knowledge Acquisition 
Further research is also called for to expand the power of the knowledge 
acquisition tools that are used in adding to the lexicon, the set of case frame rules, 
the model of domain predicates, and the set of transformation rules between the 
Meaning Representation Language and the languages of the underlying systems. The 
acquisition tools available in IRUS, unlike those in some other systems, are not tied to 
the specific fields and relations in the underlying database. The acquisition tools 
should work on the higher level of the domain model, since that provides a more 
general and transportable result. The knowledge acquisition facilities for JANUS will 
also need to be redesigned to support and to make maximal use of the power of the 
new meaning representation language based on intensional logic. 
4 New Underlying Technologies 
4.1 Coping with Ambiguity 
The new functionalities we described in the previous section, and the techniques 
we intend to use to achieve them, raise an issue which has important consequences 
for the design of JANUS: we will be faced with an explosion in the number of 
interpretations that the system will have to process; every sentence will be manifold 
ambiguous. One source of this phenomenon is the improvement of the semantic 
coverage and the broadening of the discourse context. Distinctions and ambiguities 
which so far were ignored will be dealt with: for instance, different interpretations and 
scopes of quantifiers will be considered, and different antecedents for pronouns. Even 
more serious is the processing of ill-formed sentences, which may require trying out 
all partial interpretations to see which one can be extended to a complete 
interpretation after relaxing one or more constraints. 
To cut down on the processing of spurious interpretations, it is very important 
that interpretations of sentences and their constituents be tested for plausibility at 
an early stage. Different techniques must probably be used in conjunction: 
o Simplification transformations may show that an interpretation is absurd, by 
reducing it to TRUE or FALSE or the empty set. 
o The discourse context and the model of the user's goals impose constraints 
on expected sentences. 
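
The first technique can be sketched with a small propositional simplifier. This is an assumption-laden illustration rather than the JANUS design: formulas are represented as nested tuples, only "and", "or" and "not" are simplified, and an interpretation is discarded when it collapses to a constant.

```python
from typing import Any, List

Formula = Any  # bool, atom name (str), or a tuple (operator, arg, ...)


def simplify(f: Formula) -> Formula:
    """Fold the constants True and False out of and/or/not formulas."""
    if not isinstance(f, tuple):
        return f
    op, *raw_args = f
    args = [simplify(a) for a in raw_args]
    if op == "and":
        if any(a is False for a in args):
            return False
        args = [a for a in args if a is not True]
        if not args:
            return True
        return args[0] if len(args) == 1 else ("and", *args)
    if op == "or":
        if any(a is True for a in args):
            return True
        args = [a for a in args if a is not False]
        if not args:
            return False
        return args[0] if len(args) == 1 else ("or", *args)
    if op == "not":
        (arg,) = args
        return (not arg) if isinstance(arg, bool) else ("not", arg)
    return (op, *args)


def plausible(interpretations: List[Formula]) -> List[Formula]:
    """Keep only interpretations that do not reduce to TRUE or FALSE."""
    simplified = [simplify(i) for i in interpretations]
    return [s for s in simplified if not isinstance(s, bool)]
```

Applied early, such a filter removes an interpretation before any later module spends time on it.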
4.2 Parallel Parsing 
Since some of the techniques that we intend to use to fight the ambiguity 
explosion are themselves rather computation-intensive, it is clearly unavoidable that 
the improved system functionality that we aim for will lead to a considerable increase 
in the amount of processing required. To avoid a serious degradation of the new 
system's response times, we will therefore move it to a suitable parallel machine such 
as BBN's Butterfly or Monarch, running a parallel Common Lisp. This in itself has 
rather serious consequences for the software design. It means that from the outset 
we will keep parallelizability of the software in mind. 
We have begun to address this issue in the area of syntax. A new declarative 
grammar is being written, which will ultimately have a coverage of English larger than 
the current RUS grammar; the grammar is written in a side-effect-free formalism (a 
context-free grammar with variables) so that different parsing algorithms may be 
explored which are easily parallelizable. The first such algorithm was implemented in 
May 1986 on BBN's Butterfly. 
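
Why a side-effect-free formalism helps can be shown with a toy example. The sketch below is not the new grammar or the Butterfly algorithm: the lexicon and the two rules are hypothetical, the only variable is grammatical number, and Python threads stand in for a parallel machine merely to show that independent calls can be issued concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import FrozenSet, Optional, Tuple

FAIL = object()  # sentinel for a failed unification


def unify(a: Optional[str], b: Optional[str]):
    """Unify two number values; None stands for an unbound variable."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else FAIL


# Hypothetical lexicon: word -> (category, number).
LEXICON = {
    "the": ("Det", None), "ship": ("N", "sg"), "ships": ("N", "pl"),
    "sails": ("V", "sg"), "sail": ("V", "pl"),
}
# Rules (parent, left, right); one number variable is shared by all three.
RULES = (("S", "NP", "V"), ("NP", "Det", "N"))


def analyses(words: Tuple[str, ...]) -> FrozenSet[Tuple[str, Optional[str]]]:
    """All (category, number) pairs spanning the words; a pure function."""
    if len(words) == 1:
        entry = LEXICON.get(words[0])
        return frozenset({entry}) if entry else frozenset()
    found = set()
    for split in range(1, len(words)):
        for lcat, lnum in analyses(words[:split]):
            for rcat, rnum in analyses(words[split:]):
                num = unify(lnum, rnum)
                if num is FAIL:
                    continue  # number disagreement blocks the rule
                for parent, left, right in RULES:
                    if (left, right) == (lcat, rcat):
                        found.add((parent, num))
    return frozenset(found)


def is_sentence(text: str) -> bool:
    return any(cat == "S" for cat, _ in analyses(tuple(text.split())))


sentences = ["the ship sails", "the ships sail", "the ship sail"]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(is_sentence, sentences))
```

Since analyses reads only its argument and the constant tables, the calls for different spans or different sentences cannot interfere with one another, so the order in which they are evaluated is free to vary.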
5 Contributions from Other Sites 
5.1 ISI/UMass: Generation 
We should not expect that JANUS will always be able to assess correctly which 
interpretation of a sentence is the intended one. In light of such situations, it is 
very important that the system can give a paraphrase of the input to the user, which 
shows the system's interpretation. This may be done either explicitly or as part of 
the answer. To be able to develop such capabilities, work on Natural Language 
Generation is needed. At USC/ISI a project directed by William Mann and Norman 
Sondheimer is underway to develop the generation system PENMAN, using the NIGEL 
systemic grammar. PENMAN will be integrated to become the generation component of 
JANUS. PENMAN itself consists of several subcomponents. Some of these, specifically 
the "text planning" component, will be developed through joint work between USC/ISI 
and David McDonald at the University of Massachusetts, based on the latter's 
experience with the MUMBLE system. 
5.2 UPenn: Cooperation and Clarification 
Under the direction of Aravind Joshi and Bonnie Webber at the University of 
Pennsylvania, several focussed studies have been carried out to investigate various 
aspects of cooperative system behaviour and clarification interactions. (For more 
detail, see their paper in this issue.) As part of the Strategic Computing Natural 
Language effort, UPenn will eventually develop this into a module which can be 
integrated into JANUS to further enhance its capabilities. 
References 
[Allen 83] 
Allen, J.F. 
Recognizing Intentions from Natural Language Utterances. 
In M. Brady and R.C. Berwick (editors), Computational Models of 
Discourse, pages 107-166. Massachusetts Institute of Technology 
Press, 1983. 

[Bates 83] 
Bates, M. and Bobrow, R.J. 
A Transportable Natural Language Interface for Information Retrieval. 
In Proceedings of the 6th Annual International ACM SIGIR Conference. 
ACM Special Interest Group on Information Retrieval and American 
Society for Information Science, Washington, D.C., June, 1983. 

[Carlson 79] 
Carlson, G. 
Reference to Kinds in English. 
Garland Press, New York, 1979. 

[Grosz 86] 
Grosz, B.J. and Sidner, C.L. 
The Structures of Discourse Structure. 
In L. Polanyi (editor), Discourse Structure. Ablex Publishers, Norwood, 
NJ, 1986. 

[Haas 86] 
Haas, A.R. 
A Syntactic Theory of Belief and Action. 
Artificial Intelligence, 1986. 
Forthcoming. 

[Hinrichs 81] 
Hinrichs, E. 
Temporale Anaphora im Englischen. 
1981. 
Unpublished ms., University of Tuebingen. 

[Montague 70] 
Montague, R. 
Pragmatics and Intensional Logic. 
Synthese 22:68-94, 1970. 

[Moser 83] 
Moser, M.G. 
An Overview of NIKL, the New Implementation of KL-ONE. 
In Sidner, C. L., et al. (editors), Research in Knowledge Representation 
for Natural Language Understanding - Annual Report, 1 September 
1982 - 31 August 1983, pages 7-26. BBN Laboratories Report No. 
5421, 1983. 

[Polanyi 84] 
Polanyi, L. and Scha, R. 
A Syntactic Approach to Discourse Semantics. 
In Proceedings of Int'l. Conference on Computational Linguistics. 
Stanford University, Stanford, CA, 1984. 

[Scha 76] 
Scha, R.J.H. 
Semantic Types in PHLIQA1. 
In Proceedings of the 6th International Conference on Computational 
Linguistics. 1976. 

[Scha 81] 
Scha, R.J.H. 
Distributive, Collective and Cumulative Quantification. 
Formal Methods in the Study of Language, Part 2. 
Mathematisch Centrum, Amsterdam, 1981, pages 483-512. 
Reprinted in: J.A.G. Groenendijk, T.M.V. Janssen and M.B.J. Stokhof 
(editors). Truth, Interpretation and Information. GRASS 3. 
Dordrecht, Foris, 1984. 

[Sidner 85] 
Sidner, C.L. 
Plan parsing for intended response recognition in discourse. 
Computational Intelligence 1(1):1-10, February, 1985. 

[Stallard 86] 
Stallard, D.G. 
A Terminological Simplification Transformation for Natural Language 
Question-Answering Systems. 
In Proceedings of the 24th Annual Meeting of the Association for 
Computational Linguistics. Association for Computational 
Linguistics, June, 1986. 

[Vilain 85] 
Vilain, M. 
The Restricted Language Architecture of a Hybrid Representation 
System. 
In Proceedings of IJCAI85, pages 547-551. International Joint 
Conferences on Artificial Intelligence, Inc., Morgan Kaufmann 
Publishers, Inc., Los Angeles, CA, August, 1985. 
