Simple NLP Techniques for Expanding Telegraphic Sentences 
Kathleen F. McCoy 
CIS Department 
University of Delaware 
Newark, DE 19716 
mccoy@cis, udel. edu 
Abstract 
Some people have disabilities which make 
it difficult for them to speak in an un- 
derstandable fashion. The field of Aug- 
mentative and Alternative Communication 
(AAC) is concerned with developing meth- 
ods to augment the communicative abil- 
ity of such people. Over the past 9 
years, the Applied Science and Engineer- 
ing Laboratories (ASEL) at the University 
of Delaware and the duPont Hospital for 
Children, has been involved with applying 
natural language processing (NLP) tech- 
nologies to the field of AAC. One of the 
major projects at ASEL (The COMPAN- 
SION project) has been concerned with the 
application of primarily lexical semantics 
and sentence generation technology to ex- 
pand telegraphic input into full sentences. 
While this project has shown some very 
promising results, its direct application to 
a communication device is somewhat ques- 
tionable (primarily because of the compu- 
tational power necessary to make the tech- 
nique fast). This paper describes some of 
the problems with bringing Compansion to 
a standard communication device and in- 
troduces some work being done in conjunc- 
tion with the Prentke Romich Company 
(PRC) (a well known communication de- 
vice manufacturer) on developing a pared- 
down version of Compansion for people 
with cognitive impairments. 
1 Introduction 
Some people have disabilities which make it difficult 
for them to speak in an understandable fashion. The 
field of Augmentative and Alternative Communica- 
tion (AAC) is concerned with developing methods 
to augment the communicative ability of such peo- 
ple. In addition to problems that make "speaking" 
difficult, AAC users often have difficulties in coor- 
dinating extremities (so typing on a standard key- 
board may be impossible and access to large keys is 
often very slow). Cognitive difficulties may also be 
present. The field of AAC is concerned with develop- 
ing methods that provide access to communicative 
material under reasonable time and cognitive con- 
straints. 
Over the past 9 years, the Applied Science and En- 
gineering Laboratories (ASEL) at the University of 
Delaware and the duPont Hospital for Children, has 
been involved with applying natural language pro- 
cessing (NLP) technologies to the field of AAC. One 
of the major projects at ASEL (The COMPANSION 
project) has been concerned with the application of 
primarily lexical semantics and sentence generation 
technology to expand telegraphic input into full sen- 
tences (McCoy et al., 1989), (Demasco and McCoy, 
1992), (McCoy et al., 1994). 
The project can best be thought of as a rate en- 
hancement technique used in the context of a writ- 
ing tool. Assuming the user is selecting full words 
at a time (so time of word selection is basically con- 
stant and is independent of the number of letters iri 
the word), the technique shows the most gain when 
used by a linguistically sophisticated user who de- 
sires well-formed English constructions. The system 
speeds rate by allowing the user to select basic con- 
tent and having the system provide expansions into 
well-formed sentences. The user may then select 
among the generated expansions with 1 additional 
keystroke (for example). 
Consider the following input: 
Mary think 3 watch give John Andrew. 
expanded as: 
Mary thinks that the 3 watches were given 
to John by Andrew. 
17 
Notice that assuming a root word can be selected 
in a single keystroke and endings added with ad- 
ditional keystrokes, the initial input would take 7 
keystrokes, while the expanded version would have 
required 16. 
The Compansion prototype contains three pro- 
cessing modules and requires a great deal of lexical 
knowledge: 
Word Order Parser - encodes a loose grammar 
of telegraphic sentences and determines and at- 
taches modifiers (e.g., 3 is an adjective which 
is modifying watch), determines part of speech 
information (e.g., think is a verb, watch is a 
noun), and passes sentence sized chunks to the 
next phase of processing (e.g., first 3 watch 
give John Andrew would be passed on to the 
next phase, and then Mary thinks with the re- 
sult of the previous processing). 
Semantic Parser - uses semantic information as- 
sociated with words to create a semantic rep- 
resentation (Fillmore, 1968), (Fillmore, 1977), 
(Allen, 1995), (Palmer, 1984), (Hirst, 1987) for 
each sentence. E.g., verb frame information is 
associated with each verb. This information in- 
dicates which cases the verb is likely to have and 
the semantic type of words that are likely to fill 
those cases. Individual nouns have associated 
semantic types. 
Sentence Generator - creates an actual English 
sentence from the semantic representation (E1- 
hadad, 1991). In this phase the system at- 
tempts to keep the word order that was orig- 
inally input. 
While the Compansion prototype is viewed as 
a promising and successful application of NLP to 
AAC, it raises some questions when viewed as a 
practical AAC system. 
• Unlimited Vocabulary - Compansion relies on 
having a large amount of semantic information 
associated with each word for the processing 
within the semantic parser. We have been in- 
vestigating gathering as much of this informa- 
tion as possible through online lexical resources 
(Zickus, 1995), (Zickus et al., 1995), but much 
of this information must still be hand encoded. 
This is particularly true of verbs which are the 
cornerstone of the semantic reasoning. While 
some information on noun semantic categoriza- 
tion can be gleaned from online lexical resources 
such as WordNet (Miller, 1990; Miller, 1995), it 
is well beyond the state of the art to glean the 
kind of verb semantics necessary from online re- 
sources. 
Sophisticated Grammatical Input - Sophisti- 
cated writers are apt to want to use compli- 
cated grammatical constructions which may lie 
outside the processing ability supplied by the 
Compansion technique. In some instances the 
fault may be that the system has not yet been 
programmed to deal with certain constructions 
(e.g., certain kinds of verb compliments). Such 
deficiencies can be remedied. Much more se- 
rious is the possibility of some grammatical 
constructions not being understandable in tele- 
graphic form even to a human reader. For ex- 
ample, relative clauses in telegraphic input may 
be impossible for a human to interpret correctly 
(at least, unless a great deal of world knowledge 
information is applied). Thus, there is a limit to 
the sophistication of grammatical constructions 
that are possible to disambiguate in telegraphic 
input. 
Sophisticated Grammatical Output - The sen- 
tence generator used by Compansion relies on 
a grammar which necessarily encodes some lim- 
ited set of grammatical constructions. 
These problems, coupled with the relatively low 
processing power and space on devices used for AAC, 
led us to question whether or not NLP is possible in 
viable AAC devices. Notice that the rate enhance- 
ment power of Compansion is heightened when so- 
phisticated linguistic constructions are used. On the 
other hand, it is exactly the situations where such 
constructions are most used that the other problem 
areas of the system are most prevalent. 
The question is: Are there uses for techniques 
such as Compansion that avoid some of the above 
mentioned problems? Is it feasible and beneficial to 
provide some kind of "pared-down" version of Com- 
pansion on an AAC device? 
In conjunction with the Prentke Romich Company 
(PRC) (a well known communication device man- 
ufacturer) we have been working on developing a 
pared-down version of Compansion for people with 
cognitive impairments (McCoy et al., 1997). In the 
next section we motivate the focus on this popu- 
lation. We indicate that not only might the tech- 
nique prove very useful for this population, but by 
focusing on this population some of the problems 
with Compansion can be eliminated. We describe 
our processing (a simplification of the processing in 
Compansion) and note some challenges that still re- 
main. 
18 
2 The Need: Target Population 
Description 
One population of AAC users that might greatly 
benefit from expanded telegraphic input are those 
who are young in age but who suffer from some 
cognitive impairments which affect their expressive 
language production. According to (Kumin, 1994) 
and (Roth and Casset-James, 1989), language pro- 
duction from a population that has such cognitive 
impairments includes: (1) short telegraphic utter- 
ances; (2) sentences consisting of concrete vocab- 
ulary (particularly nouns); (3) morphological and 
syntactical difficulties such as inappropriate use of 
verb tenses, plurals, and pronouns; (4) word addi- 
tions, omissions, or substitutions; and (5) incorrect 
word order. While people exhibiting these kinds of 
production problems may be understandable, they 
will often be perceived negatively in both social and 
educational situations. Therapy is often geared to- 
ward developing strategies that overcome or com- 
pensate for these production problems. Children 
who use AAC and have these kinds of difficulties 
face additional problems over speaking children with 
the same impairments both because they have addi- 
tional obstacles in accessing language elements (i.e., 
language elements must be accessed through a de- 
vice) and because language and literacy acquisition 
are not well understood in children who use AAC. 
Because of this, it is not clear what kind of interven- 
tions will be effective with these children. 
Our aim is to provide an AAC device which will 
be useful to this population both by allowing their 
output to be more standard, and as a potential lan- 
guage intervention (therapy) tool. 
3 Challenges with This Population 
Before any intelligent device can be developed for 
this population, the problem of lexical access must 
be solved. That is, we must find a method that en- 
ables the user to select the lexical items that they 
wish to communicate. The speech output commu- 
nication aids that PRC designs for commercial use 
incorporate an encoding technique called semantic 
compaction, commercially known as Minspeak R (a 
contraction of the phrase "minimum effort speech") 
(Baker, 1982), (Baker, 1984). Minspeak R is ulti- 
mately an abbreviation expansion system, but it is 
designed to eliminate much of the cognitive load as- 
sociated with abbreviation expansion. In using ab- 
breviation expansion, users are required to memo- 
rize a set of abbreviations that, if typed, will be 
expanded into full words. For example, the user 
might type "t " for CCthe " and ~Csch " for 
' 'school ' '. Of course, there is a tremendous 
amount of cognitive load associated with memoriz- 
ing abbreviations (and with developing memorable 
abbreviations for a large number of words). Suffice 
it to say that regular abbreviation expansion is not 
a viable option for the population being considered 
here. 
Minspeak x deals with the cognitive load associ- 
ated with standard abbreviation systems by form- 
ing abbreviations using multi-meaning icons rather 
than letters. Because the icons are rich in mean- 
ing and associations, a small number of icons (keys) 
can be used to represent a large vocabulary where 
each item can be selected using a memorable se- 
quence of 2-3 icons. The success of the general 
Minspeak R paradigm of vocabulary access led PRC 
to start designing tailored prestored vocabulary pro- 
grams known as Minspeak Application Programs 
(MAPs TM) for specific populations of users. These 
programs are concerned with providing both an ap- 
propriate vocabulary and a set of icons appropriately 
placed on the keyboard so as to allow communica- 
tion in an "automatic" fashion. 
One of these MAPsTM(the Communic-Ease 
MAP TM) was developed for users chronologically 
10 or more years of age with a language age of 5- 
6 years. This MAP provides access to a basic vo- 
cabulary and has proven to be an effective inter- 
face for users in our target population. It provides 
access to approximately 580 single words divided 
into 38 general categories. Most of these words are 
coded as 2-icon sequences. The first icon in the se- 
quence (the category icon) establishes the word cat- 
egory. For example, the <SKULL> icon indicates a 
body part word, the <MASKS> icon indicates a feel- 
ing word, and the <APPLE> icon indicates a food 
word. The second icon denotes the specific word. 
For example, <MASK> followed by <SUN> produces 
the word "happy"; <APPLE> followed by <APPLE>. 
produces the word "eat". The learning and use 
of icon sequences is facilitated by the incorporation 
of "icon prediction". In icon prediction the user is 
"prompted" for valid icon sequences using lights on 
keys. For example, once the first icon is hit (e.g., 
<MASK>) lights will appear on icons that lead to 
a word (e.g., all icons that complete a valid feeling 
word will be lit). 
In addition to the words which are accessed 
via the icon sequences, Communic-EaseTMcontains 
some morphology and allows the addition of end- 
ings to regular tense verbs and regular noun plurals. 
However, note that to accomplish this, additional 
keystrokes are required. Also, it is possible to spell 
words that are not included in the core vocabulary. 
19 
In practice, however, users with either slow access 
methods or poor language ability tend to produce 
telegraphic messages consisting of sequences of core 
vocabulary items without embellishing morphology. 
Our project builds a processing method on top of 
the Communic-Ease MApTMwhich will expand the 
telegraphic/mis-ordered input on the part of the user 
into well formed sentences. Notice that developing a 
~ystem for this particular population will overcome 
some of the difficulties faced with the general Com- 
pansion system built as a writing tool for people with 
sophisticated linguistic ability. First, this popula- 
tion will rely on a limited vocabulary. As was noted 
above, the users of this particular vocabulary access 
system generally use only the vocabulary items pro- 
grammed into the Communic-Ease MAP TM. While 
the method allows them to spell any word, in actu- 
ality spelling is rather limited and the spelled vocab- 
ulary items may be easier to anticipate. Second, the 
output structures of the sentences will not require so- 
phisticated syntactic constructions. This population 
requires limited output structures comprised primar- 
ily of fairly simple sentence structures. Finally, the 
system will not face the same sorts of input problems 
described in conjunction with the Compansion sys- 
tem. Again, this primarily follows from the language 
sophistication of the chosen population. 
On the other hand, this population of users does 
bring with it other difficulties. For example, it is 
likely that users may produce unusual sentence in- 
put. While we do not expect to see the same sorts of 
complications with the input described above with 
respect to Compansion, it is likely that the input 
will display unusual characteristics. For example, 
with linguistically sophisticated users we expect the 
input word order to mirror the desired surface form. 
This assumption does not hold for users with cogni- 
tive impairments. Thus we must carefully study this 
population to determine exactly what kind of input 
to expect. 
Other problems faced by this system have to do 
with the ability of the user to handle the decisions 
that are required of them. For instance, it may be 
the case that selecting from a set of expanded sen- 
tences may prove very difficult for this population 
who may become confused or may be unable to re- 
tain their desired sentence when given a list of sen- 
tences to choose from. Thus it becomes extremely 
important that the system present appropriate ex- 
pansions and that these expansions be ordered using 
a heuristic that accurately predicts the most likely 
expansion. 
Because of the cognitive impairments, it is also 
likely that the user will have a great deal of diffi- 
culty if the system acts in unexpected ways. Thus, 
the system must be extremely robust and capable of 
handling any input given to it. 
Finally, the system's interface must be carefully 
designed so as to make it easy for users to learn 
and use. In addition, the system must be fast and 
runnable on relatively inexpensive and portable PC 
platforms so as to make it cost effective. 
4 Simple Techniques 
In this project we have decided to collapse the three 
levels of processing found in Compansion into one 
level. The system is implemented in C++ for econ- 
omy of memory and for speed and compatibility con- 
siderations. 
The major processing in the system takes place in 
an augmented transition network type of grammar. 
The network itself encodes a grammar of the tele- 
graphic input expected from this population. The 
tests in the grammar may be made on the basis of 
syntactic or semantic features stored in the lexicon 
on each word. Some of the actions in the gram- 
mar are responsible for manipulating a particular 
register which encodes the "generated string" or ex- 
pansion associated with each state in the network. 
Thus these actions are responsible for adding deter- 
miners etc. Sets of registers axe also maintained for 
recording semantic aspects of the partial sentence 
(e.g., information such as what word is the agent). 
This information is primarily used to reconstruct an 
appropriate expansion if later input indicates inap- 
propriate decisions were made in earlier states of the 
parse. 
The augmented transition network formalism was 
chosen for this work mainly because it allowed par- 
allel traversal of all possible parses and therefore the 
ability to predict next input words. This allowed us 
to extend the icon prediction mechanism described 
in the Communic-Ease MApTMto the word level. 
For instance, in a situation where the user has typed 
an adjective, only icons that begin valid next words 
(e.g., adjective, noun) will be highlighted thus fa- 
cilitating learning. One can imagine this particular 
aspect of the system being expanded for therapy ses- 
sions. For example, it might be used to teach a user 
to use standard agent-verb-object sentences by high- 
lighting only words that fit into that pattern. 
5 Methodological Issues 
Our system functionality has been determined 
by a collection of transcripts from Communic- 
EaseTMusers. We have collected both raw keystroke 
data (so that we can establish the range of input 
20 
we expect from the population) and keystroke data 
from videotaped sessions where interpretations of 
the keystroke data are provided by a communication 
partner. This data allows us to ensure the output 
from the system is in fact appropriate. In addition 
it has been used to validate expected sentence struc- 
tures, validate the expectation that the core vocab- 
ulary will comprise most of the input, allow us to 
better anticipate the spelled vocabulary, and vali- 
date input expectations. 
Our methodology in using the data has been to 
partition it into several sets. First, some portion of 
each of the two kinds of data has been set aside for 
testing purposes. Thus it is not seen for purposes 
of system development. Because the videotaped ses- 
sions contain both input and its expansion, these 
are being used primarily as a means for tuning the 
expansion rules used in the grammar and the appro- 
priateness heuristics that order the expansions pro- 
duced by the system. We will attempt for the system 
to mimic the partner on the videotaped sessions. 
The raw keystroke data is being used in two ways. 
Most obviously it is being used to tune the gram- 
mar to the range of input. Secondly, some of the 
raw data is being associated with multiple interpre- 
tations deemed reasonable by one of the team mem- 
bers. These interpretations will be used to further 
tune the grammar expansions. 
Several evaluations of the completed prototype 
system are planned and made possible by the set- 
aside collected data. Ffi:st, the robustness of the 
grammar can be tested by determining the num- 
ber of completed input utterances found in the col- 
lected data that can be handled by the grammar. 
Second, the appropriateness of the grammar can be 
tested by determining how often the grammar's out- 
put matches the interpretation provided by the com- 
munication partner (in the video sessions containing 
interpretations by the partner) or by a human faced 
with the same sequence of words (for the raw data 
to which interpretations have been added). 
In addition to the theoretical grammar testing de- 
scribed above, we also plan an informal evaluation 
of the usability of the system. We plan to iteratively 
refine the interface by doing usability studies of our 
prototype with current users of the Communic-Ease 
MAP TM. We anticipate beginning this testing dur- 
ing the summer and fall of 1997. 
6 Conclusions 
We have motivated and described a system that is 
under development via a joint venture between the 
Applied Science and Engineering Laboratories of the 
University of Delaware and the duPont Hospital for 
Children and the Prentke Romich Company. Impor- 
tant features of the effort include a multidisciplinary 
team with technical expertise in various areas in- 
cluding NLP and clinical expertise with the target 
population. This effort focuses on a particular user 
population which enables us to constrain the system 
processing sufficiently to make the NLP application 
feasible. Our effort involves designing the system 
around the specific needs and abilities of the partic- 
ular population. 
Characteristics of the language used by the par- 
ticular population being studied has permitted us to 
apply some simple NLP techniques which are prov- 
ing to be sufficiently robust for this task. We an- 
ticipate the addition of some statistical reasoning 
particularly as a heuristic for ordering possible ex- 
pansions that the system deems appropriate. 
The system is in its prototype stage and currently 
consists of a PRC Liberator (a standard piece of 
hardware which provides access to vocabulary items 
through the Communic-EaseTMMAP) attached to a 
Pentium-based desktop PC running the user inter- 
face with Windows NT and a software-based text- 
to-speech synthesizer. When the final details of the 
user interface and the other software components are 
worked out, the completed prototype will be imple- 
mented on a tablet-based PC and will be field tested 
with current users of the Communic-EaseTMMAP. 
7 Acknowledgments 
This work has been supported by a Small Business 
Research Program Phase I Grant from the Depart- 
ment of Health and Human Services Public Health 
Service, and a Rehabilitation Engineering Research 
Center Grant from the National Institute on Dis- 
ability and Rehabilitation Research of the U.S. De- 
partment of Education (#H133E30010). Additional 
support has been provided by the Nemours Founda~ 
tion. 
The author would like to thank Arlene Badman, 
Patrick Demasco, David Hershberger, Clifford Kush- 
ler, and Christopher Pennington for their collab- 
oration on the design and implementation of this 
project. In addition we thank John Gray for his dis- 
cussions and implementation of many of the C++ 
aspects of the system, and Marjeta Cedilnik for her 
work on the grammar (and transformation rules). 

References 
Allen, James. 1995. Natural Language Understand- 
ing, Second Edition. Benjamin/Cummings, CA. 
Baker, B. 1982. Minspeak. Byte, page 186ff, 
September. 
Baker, B. 1984. Semantic compaction for sub- 
sentence vocabulary units compared to other en- 
coding and prediction systems. In Proceedings of 
the l Oth Conference on Rehabilitation Technology, 
pages 118-120, San Jose CA. RESNA. 
Demasco, Patrick W. and Kathleen F. McCoy. 1992. 
Generating text from compressed input: An in- 
telligent interface for people with severe mo- 
tor impairments. Communications of the ACM, 
35(5):68-78, May. 
Elhadad, Michael. 1991. FUF: The universal uni- 
fier user manual version 5.0. Technical report, 
Columbia University, Computer Science Depart- 
ment. 
Fillmore, C. J. 1968. The case for case. In E. Bach 
and R. Harms, editors, Universals in Linguistic 
Theory, pages 1-90, New York. Holt, Rinehart, 
and Winston. 
Fillmore, C. J. 1977. The case for case reopened. In 
P. Cole and J. M. Sadock, editors, Syntax and Se- 
mantics VIII: Grammatical Relations, pages 59-- 
81, New York. Academic Press. 
Hirst, Graeme. 1987. Semantic interpretation and 
the resolution of ambiguity. Cambridge University 
Press, Cambridge. 
Kumin, L. 1994. Communication skills in children 
with Down Syndrome: A guide for parents. Wood- 
bine House, Rockville, MD. 
McCoy, K., P. Demasco, Y. Gong, C. Pennington, 
and C. Rowe. 1989. Toward a communication 
device which generates sentences. In Proceed- 
ings of the 12th Annual Conference, New Orleans, 
Louisiana, June. RESNA. 
McCoy, Kathleen F., Patrick W. Demasco, Mark A. 
Jones, Christopher A. Pennington, Peter B. Van- 
derheyden, and Wendy M. Zickus. 1994. A com- 
munication tool for people with disabilities: Lex- 
ical semantics for filling in the pieces. In Proceed- 
ings of the First Annual ACM Conference on As- 
sistive Technologies (ASSETS94), pages 107-114, 
Marina del Ray, CA:. 
McCoy, Kathleen F., Patrick W. Demasco, Christo- 
pher A. Pennington, and Arlene Luberoff Bad- 
man. 1997. Some interface issues in developing 
intelligent communication aids for people with dis- 
abilities. In Proceedings of the 1997 International 
Conference on Intelligent User Interfaces, IUI97, 
Orlando, Florida. 
Miller, G. A. 1990. WordNet: An on-line lexical 
database. International Journal of Lexicography, 3(4). 
Miller, G. A. 1995. A lexical database for En- 
glish. Communications of the ACM, pages 39--41, 
November. 
Palmer, Martha S. 1984. Driving Semantics for 
a Limited Domain. Ph.D. thesis, University of 
Edinburgh. Chapter 2: Previous Computational 
Approaches to Semantic Analysis. 
Roth, F. P. and E. Casset-James. 1989. The 
language assessment process: Clinical implica- 
tions for individuals with severe speech impair- 
ments. Augmentative and Alternative Communi- 
cation, 5:165-172. 
Zickus, Wendy M. 1995. A software engineering 
approach to developing an object-oriented lexical 
access database and semantic reasoning module. 
Technical report 95-13, Department of Computer 
and Information Sciences, University of Delaware, 
Newark, DE. 
Zickus, Wendy M., Kathleen F. McCoy, Patrick W. 
Demasco, and Christopher A. Pennington. 1995. 
A lexical database for intelligent AAC systems. 
In Anthony Langton, editor, Proceedings of the 
RESNA '95 Annual Conference, pages 124-126, 
Arlington, VA. RESNA Press. 
