GRAMMATICAL AND UNGRAMMATICAL STRUCTURES IN USER-ADVISER DIALOGUES:
EVIDENCE FOR SUFFICIENCY OF RESTRICTED LANGUAGES IN NATURAL
LANGUAGE INTERFACES TO ADVISORY SYSTEMS
Raymonde Guindon
Microelectronics and Computer Technology Corporation
P.O. Box 200195
Austin, Texas 78720
guindon@mcc.com
Kelly Shuldberg¹
University of Texas, Austin & MCC
Joyce Conner
Microelectronics and Computer Technology Corporation
ABSTRACT 
User-adviser dialogues were collected in a typed Wizard-of-Oz
study ("man-behind-the-curtain study"). Thirty-two
users had to solve simple statistics problems using an un- 
familiar statistical package. Users received help on how to 
use the statistical package by typing utterances to what they 
believed was a computerized adviser. The observed limited 
set of users' grammatical and ungrammatical forms 
demonstrates the sufficiency of a very restricted grammar of 
English for a natural language interface to an advisory system.
The users' language shares many features of spoken
face-to-face language or of language generated under real- 
time production constraints (i.e., very simple forms of 
utterances). Yet, users also appeared to believe that the 
natural language interface could not handle fragmentary or 
informal language and users planned or edited their language 
to be more like formal written language (i.e., very infrequent 
fragments and phatics). Finally, users also appeared to 
believe in poor shared context between users and com- 
puterized advisers and referred to objects and events using 
complex nominals instead of faster-to-type pronouns. 
INTRODUCTION 
It has been argued that natural language interfaces with
very rich functionality are crucial to the effective use of ad- 
visory systems and that interfaces using formal languages, 
menus, or direct manipulation will not suffice (Finin, Joshi, 
and Webber, 1986). Designing, developing, and debugging a 
rich natural language interface (its parser, grammar, 
recovery strategies from unparsable input, etc.) are time-consuming
and labor-intensive. Nevertheless, natural language interfaces
can be quite brittle in the face of unconstrained input from the
user, as can be found in applications
such as user-advising. One step toward a solution to these 
problems would be the identification of a subset of grammatical
and ungrammatical structures that correspond to the
language generated by users in any user-advising situation,
irrespective of the domain. This subset could be used to 
design a core grammar, strategies to handle ungrammatical 
input, and some parsing heuristics portable to any natural 
language interface to advisory systems. This strategy would 
increase the habitability of the natural language interface 
(Watt, 1968; Trawick, 1983) and reduce its development 
cost. 
An important feature of this restricted subset is its independence
from a particular domain (e.g., statistics, medicine),
making it portable. This is in contrast with
another strategy which also capitalizes on restricted subsets 
of English, the use of sublanguages. There are naturally oc- 
curring subsets of English, usually associated with a par- 
ticular domain or trade that have been called sublanguages 
(Harris, 1968; Kittredge, 1982). Sublanguages are charac- 
terized by distinctive specialized syntactic structures, by the 
occurrence of only certain domain-dependent word subclasses 
in certain syntactic combinations, and by the inclusion of 
specific ungrammatical forms (Sager, 1982). However, the 
association of a sublanguage with a particular domain and
the emphasis on syntactic-semantic co-restrictions reduce the 
portability of a grammar defined on such a sublanguage. 
This paper presents an empirical characterization of 
users' language in a user-advising situation for the purpose
of defining a domain-independent restricted subset of gram- 
matical and ungrammatical structures to help design more 
habitable natural language interfaces to advisory systems. 
This paper also presents an interpretation of the factors that 
cause users to naturally limit themselves to a very restricted 
subset of English in typed communications between users and 
computerized advisers. We believe these factors will be 
found in any typed communications between users and ad- 
visers for the purposes of performing a primary task. Hence, 
the restricted subset of English should be general to any such 
situations. 
A STUDY OF USER-ADVISER DIALOGUES 
IN A WIZARD-OF-OZ SETTING
METHOD AND PROCEDURE
Thirty-two graduate students with basic statistical 
knowledge were asked to solve up to eleven simple statistics 
problems. Participants had to use an unfamiliar statistical
package to solve the problems. The upper window of the 
participants' screen was used to perform operations with the 
statistical package and the lower window was used to type 
utterances to the adviser. The participants were instructed 
to ask for help in English from what they believed was a computerized
adviser by typing in the help window. The
participants' and adviser's utterances were sent to each 
other's monitor and the utterances were recorded and time- 
stamped automatically to files. 
¹ Now at Automated Language Processing Systems,
Provo, Utah
RESULTS AND COMPARISON TO OTHER
STUDIES 
We are reporting only a small subset of our results, those 
to be compared to the results of Thompson (1980) and of 
Chafe (1982). The comparison is to identify the grammatical
and ungrammatical specializations specific to users' language 
with advisory systems and to help determine what features 
of user-advising situations might encourage or cause such 
specializations of structures. Chafe (1982) investigated Informal
Spoken language (i.e., dinner table conversations) and
Formal Written language (i.e., academic papers). Thompson
(1980), in her second study, compared three types of 
dialogues, Spoken Face-to-Face, Typed Human-Human 
(terminal-to-terminal) with both conversants knowing their 
counterpart was human, and Human-Computer using the 
REL natural language front-end. The task was information 
retrieval. 
The data tables report two sets of data: the percentage of
utterances with a particular form (e.g., one or more fragments,
one or more phatics), to compare to Thompson's
results and the corresponding number of occurrences of this 
form per 1000 words to compare to Chafe's results. When 
numbers are omitted from the tables, the corresponding data 
were not collected by Thompson or Chafe. Note that the 
reported data are only about users' utterances, and not the 
adviser's utterances. We will use typed user-adviser 
dialogues and Wizard-of-Oz condition to refer to the data of
our study. 
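
The two measures described above can be illustrated with a short sketch. Everything in it is an assumption made for illustration: the sample utterances, the small `PHATICS` word list, and whitespace tokenization are hypothetical and are not the data or the coding rules used in the study.

```python
from typing import Callable, List

def percent_utterances_with(utterances: List[str],
                            count_form: Callable[[str], int]) -> float:
    """Percentage of utterances containing one or more occurrences of a form
    (the measure compared with Thompson's results)."""
    if not utterances:
        return 0.0
    hits = sum(1 for u in utterances if count_form(u) > 0)
    return 100.0 * hits / len(utterances)

def occurrences_per_1000_words(utterances: List[str],
                               count_form: Callable[[str], int]) -> float:
    """Occurrences of a form per 1000 words
    (the measure compared with Chafe's results)."""
    total_words = sum(len(u.split()) for u in utterances)
    if total_words == 0:
        return 0.0
    total_occurrences = sum(count_form(u) for u in utterances)
    return 1000.0 * total_occurrences / total_words

# Hypothetical phatic word list, for illustration only.
PHATICS = {"well", "ok", "okay", "hmm"}

def count_phatics(utterance: str) -> int:
    """Count tokens that match the hypothetical phatic list."""
    tokens = (w.strip(".,?!") for w in utterance.lower().split())
    return sum(1 for t in tokens if t in PHATICS)

# Hypothetical utterances (27 words in total, 3 phatics in 2 utterances).
sample = [
    "How do I compute the mean of a column?",
    "Okay, what is the command to list the variables?",
    "I need to sort the data.",
    "Well, okay, thanks.",
]

pct = percent_utterances_with(sample, count_phatics)
rate = occurrences_per_1000_words(sample, count_phatics)
```

The two measures can diverge: an utterance with several occurrences of a form counts once in the percentage but several times in the per-1000-words rate, which is why both are reported.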
Completeness and Formality of Users' 
Utterances 
As can be seen in Table I, for completeness (i.e., 
fragments) and formality (i.e., phatics and and-connectors) 
users' utterances with advisory systems are more like 
Human-Computer dialogues and Formal Written language 
than Spoken Face-to-Face or Typed Human-Human 
dialogues. 
Table 1: Completeness and Formality

                              Wizard-of-Oz   Human-Computer   Typed Human-Human
Utterances with fragments         24%             19%               74%
Utterances with phatics            2%              4%               59%

[Remaining columns and rows, including the per-1000-words counts and the and-connector row, are illegible in the source.]
Users avoided casual forms of language since they 
produced only 24% of fragmentary utterances, as opposed to 
74% in Typed Human-Human dialogues, but similar to 19% 
in the Human-Computer condition. Similarly, we found 2% 
of utterances with phatics, as opposed to 59% in the Typed 
Human-Human dialogues, but similar to 4% in the Human-Computer
dialogues. Likewise, Chafe (1982) found no
phatics in Formal Written discourse, but about 23 per 1000
words in informal speech. There is a similar finding for and- 
connectors. 
Users in the typed user-adviser dialogues seem to expect 
the interface to be unable to handle fragmentary input such 
as found in Informal Spoken language and planned or
edited their language to be as complete and formal as
in the Human-Computer dialogues, and more complete and 
formal than the language in Typed Human-Human dialogues. 
This is the case even though the Wizard in our study hardly 
ever rejected or misunderstood any users' utterance, no mat- 
ter how fragmentary or ungrammatical it was. However, 
when conversants know that their counterpart is another 
human, their language contains a large percentage of frag- 
ments and phatics, even when typed. So it appears that a 
priori beliefs about the nature and abilities of the adviser 
(i.e., this is not a human) can determine the characteristics of 
the language produced by the user, even when task and linguistic
performances by the adviser were not negatively affected
by fragmentary language from the user.
Ungrammaticalities
Even though users seemed to attempt to edit or plan 
their utterances to be more complete and formal, 31% of the
utterances contained one or more ungrammaticalities
(excluding spelling and punctuation mistakes; with these included,
about 50% of utterances were ungrammatical). The most
frequent ungrammaticalities were fragments (13% of utterances
with part(s) of the utterance being one or more
fragments), missing constituents (14% of utterances with one
or more determiners missing), and lack of agreement between
constituents (5% of utterances). While users seemed to plan
or to edit their language to be as complete and formal as in
the Human-Computer dialogues, certain types of ungrammaticalities
were still produced. Two possible interpretations of
this finding are: 1) certain types of ungrammaticalities are
not easily under the conversant's control and cannot readily be
edited or planned away during the dialogue; 2) they
correspond to a telegraphic language assumed to be understood
by the interface.
It would be interesting to find whether two types of
ungrammaticalities really exist, some that can be avoided with
some planning and others that cannot be so easily avoided.
However, it is unclear whether the purposeful avoidance of
some ungrammaticalities by users can be capitalized upon to
reduce the need for sophisticated robust parsing, as we do not
know the cost to the users of avoiding certain types of
ungrammaticalities. On the other hand, knowing the nature
and frequency of the actual ungrammaticalities produced by
users, as provided by this study, facilitates the implementation
of robust parsing.
General Syntactic Features 
As can be seen in Table 2, users' utterances in typed user-adviser
dialogues resemble Informal Spoken discourse more
than Formal Written discourse. The difference in number of
occurrences per 1000 words between the Wizard-of-Oz condition
and the Informal Spoken condition is much smaller than
the same difference between the Wizard-of-Oz and the Formal
Written conditions.
Table 2: Occurrences per 1000 Words of
Various Syntactic Features

                            Wizard-of-Oz   Chafe Informal Speech   Chafe Formal Written
Passive voice                    1.0                5.0                   25.4
Coordinate conjunctions          6.7                5.8                   23.8
Attributive adjectives       [illegible]           33.5                  134.9
First person references         49.0               61.5                    4.6
Nominalizations                 11.4                9.7                   55.5

[Five further rows are illegible in the source.]
Short, simple (95% of our utterances were simple), active
sentences, with few coordinations, few subordinations, few
relative clauses, and few nominalizations (plus deletion of determiners
and unmarked agreement; see the section on
Ungrammaticalities) characterize the language in typed user-adviser
dialogues observed in our study. These same features
are features of unplanned language, of child language,
and of language
produced under real-time production constraints (Ochs, 1979;
Givon, 1979).
While formality and completeness of typed user-adviser 
dialogues resemble more Formal Written language, the 
general syntactic features of typed user-adviser dialogues 
resemble more Informal Spoken language. Formality and 
completeness appear to be independent properties of users' 
language from the general syntactic features, possibly 
planned independently. 
More important for the design of natural language interfaces,
the observation that typed user-adviser dialogues
resemble language produced under real-time production constraints
indicates that users are strained by typing utterances
to request help to perform a primary task. This constrains
the usability of natural language interfaces as interfaces to
advisory systems. One needs to identify the conditions under
which the benefits of obtaining help outweigh the costs of
typing in utterances to determine when natural language interfaces
are effective interfaces to advisory systems. On the
other hand, the natural restrictions on the language produced
by the users appear generalizable to any situation where real-time
production constraints exist, of which, we believe, any
typed interaction with an advisory system for the purpose of
performing a primary task is an instance.
Features Due Specifically to the User-Advising
Application
As can be seen in Table 3, there are fewer imperatives in
user-advising dialogues because the user cannot request the
adviser to perform a statistical operation. Moreover, we also
observe a goal-directed language with frequent to-infinitives
(I want/need to ...) and to-purpose clauses (What is the
command to compute ...), much more frequent than in Informal
Spoken or Formal Written language. We believe this is
the only feature that appears to be specific to the advisory
application, as opposed to being specific to communications under
real-time constraints. However, the goal-directedness of
the language may be specific to advisory systems for
procedural tasks as opposed to more general information
retrieval tasks. Of course, we are here excluding lexical
restrictions because they are expected and uninteresting, and
syntactic-semantic co-restrictions because of the desire for
easy portability.
Table 3: Features Specific to Advising

[Table body illegible in the source; its rows report imperatives and to-complements.]
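
The goal-directed forms described in this section (to-infinitives such as "I want/need to ..." and to-purpose clauses such as "What is the command to compute ...") can be spotted with simple surface patterns. The sketch below is only an illustration: the two regular expressions and the noun list in `PURPOSE_CLAUSE` are hypothetical and far narrower than any coding scheme a study of this kind would use.

```python
import re
from typing import List

# Hypothetical surface pattern for "I want/need to <verb>".
TO_INFINITIVE = re.compile(r"\bI\s+(?:want|need)\s+to\s+\w+", re.IGNORECASE)

# Hypothetical surface pattern for "<noun> to <verb>" purpose clauses;
# the noun list is an assumption chosen for this example.
PURPOSE_CLAUSE = re.compile(
    r"\b(?:command|way|function|option)\s+to\s+\w+", re.IGNORECASE
)

def classify_goal_form(utterance: str) -> List[str]:
    """Return the goal-directed forms whose surface pattern matches."""
    labels = []
    if TO_INFINITIVE.search(utterance):
        labels.append("to-infinitive")
    if PURPOSE_CLAUSE.search(utterance):
        labels.append("to-purpose-clause")
    return labels
```

For example, `classify_goal_form("I need to compute the variance.")` yields `["to-infinitive"]`, while an imperative such as "Sort the data." matches neither pattern. Because both forms are domain-independent, patterns of this kind would carry over to other advisory domains without lexical changes beyond the noun list.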
Complexity of Referring Expressions 
In our study, users produced mostly very simple sentence 
constructions, as if under real-time production constraints 
(e.g., users' utterances were short and 95% of them were 
simple (see the section on General Syntactic Features)). 
Nevertheless, very few pronouns occurred: only 3% of utterances
contained pronouns, similar to what was found in Formal
Written language, Human-Computer dialogues, and in
Cohen, Fertig, & Starr (1982) in their typed terminal-to-terminal
condition. This is surprising because pronouns are
very short to type. However, there were very frequent complex
nominals with prepositional phrases (e.g., a record of the
listing of the names of the features). At least 50% of the
user-adviser utterances had one or more prepositional
phrases. As can be seen in Table 4, most of the structurally
ambiguous prepositional attachments are to NPs, in fact,
mostly to the most contiguous/nearest NP. So, users prefer
longer-to-type complex nominals with explicit relations between
contiguous NPs over faster-to-type pronouns, even
though there is evidence that they are operating under real-time
production constraints. Because pronominal noun
phrases (and also deictic expressions) are so rare, it appears 
that users rely little on spatial context (i.e., the screen), lin- 
guistic context (i.e., the utterances produced so far), and task 
context (i.e., statistical commands typed so far) in producing 
referring expressions. One interpretation of this finding is 
that users believe that there is poor shared context between 
user and adviser when they do not share physical context (as 
in Formal Written language) or do not know the linguistic 
capabilities of the conversant (as in Human-Computer
dialogues). So, while in unplanned discourse speakers rely 
more on the context to express propositions and use more 
pronouns than in planned discourse (Ochs, 1979) and while 
user-adviser dialogues exhibit many features of unplanned 
discourse, users did not capitalize on context in producing 
referring expressions. It appears that the referential functions
in language can be planned independently of, and are
not necessarily subject to the same real-time production constraints
as, the predicative and other functions of language.
Again we are finding that typed user-adviser dialogues have 
some features of planned, Formal Written language but also 
have features of unplanned, Informal Spoken language. 
Table 4: Distribution of Prepositional
Attachments

[Table body illegible in the source; it reports counts of prepositional phrase attachments to the nearest NP, other NPs, VPs, and ambiguous sites.]
Nevertheless, not only are most prepositional attachments
to NPs to create precise descriptions of objects, they
are mostly to the most contiguous NP. This observation 
suggests that real-time production constraints nevertheless 
play some role in the production of referential expressions. 
Users appear to minimize resources allocated to the production
of referential expressions by reducing short-term
memory load through attachments to the lowest, most recent NP.
This interpretation is supported by studies that show that it
is easier to process right-branching structures than left-branching
ones (Yngve, 1960).
The finding that most prepositional phrases attach to 
NPs rather than VPs and moreover attach most often to the 
lowest, nearest NP is important for the semantic interpreta- 
tion of sentences because of the combinatorial explosion of 
possible attachments of prepositional phrases. 
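
A small sketch can make the combinatorial explosion and the nearest-NP preference concrete. The counting model is an assumption that is not taken from the study: for a verb and object NP followed by n prepositional phrases, where each phrase may attach to the verb or to any preceding NP without crossing attachments, the number of readings is the Catalan number C(n+1). The function names are hypothetical.

```python
from math import comb
from typing import List, Tuple

def catalan(n: int) -> int:
    """n-th Catalan number: 1, 1, 2, 5, 14, 42, ..."""
    return comb(2 * n, n) // (n + 1)

def attachment_readings(num_pps: int) -> int:
    """Readings of 'V NP PP ... PP' under the non-crossing attachment
    assumption stated above (a modelling assumption, not a study result)."""
    return catalan(num_pps + 1)

def attach_to_nearest_np(head_np: str,
                         pps: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Nearest-NP heuristic: attach each PP to the most recent NP.

    pps is a list of (preposition, noun phrase) pairs in surface order;
    the result pairs each PP with the NP it is attached to.
    """
    attachments = []
    nearest = head_np
    for prep, np in pps:
        attachments.append((f"{prep} {np}", nearest))
        nearest = np  # the NP inside this PP becomes the nearest site
    return attachments

# The example nominal from the section on referring expressions.
example = attach_to_nearest_np(
    "a record",
    [("of", "the listing"), ("of", "the names"), ("of", "the features")],
)
```

With three prepositional phrases after a verb and its object, the counting model gives 14 candidate readings, whereas the nearest-NP heuristic proposes a single right-branching structure to try first.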
DISCUSSION 
Users' utterances in typed user-adviser dialogues, when 
the users believe that the adviser is computerized, resemble 
Informal Spoken speech, except for referring expressions (i.e., 
frequent complex nominals) and for completeness and for- 
mality (i.e., few phatics and and-connectors, and relatively 
few fragments), in which case they resemble more Formal 
Written language. We would like to hypothesize that the 
grammatical and ungrammatical forms observed occur be- 
cause the communicative context and the application induce 
certain user beliefs and goals and induce certain processing
constraints which determine the most effective syntactic 
forms to communicate verbally. The communicative context 
describes dimensions of the situation in which the discourse is 
generated that are believed to affect the form of the discourse.
Examples of dimensions are: interaction, the extent
to which user and adviser can quickly interact, respond to 
each other; involvement, the extent to which the communica- 
tion is directed specifically to one person as opposed to an 
anonymous class of persons; spatial commonality, the degree 
to which the conversants see each other, see the same physi- 
cal environment, and know that they share this environment 
perceptually. As can be seen in Table 5, typed user-adviser
dialogues in a Wizard-of-Oz setting are more similar to Informal
Spoken language on dimensions of interaction and involvement,
but more similar to Formal Written language on
the dimension of spatial commonality. We would like to 
hypothesize that different values on these dimensions are as- 
sociated with different restricted languages produced by the 
users. Findings from Biber (1986) help support this
hypothesis. He performed a factor analysis on 545 text 
samples. He uncovered the following three dimensions: 
• INTERACTIVE vs. Edited: High personal involvement
and real-time constraints.
• SITUATED vs. Abstract contexts: Reliance on 
external situation, concrete vs. detached and 
deliberate. 
• IMMEDIATE vs. Reported: Reference to a cur- 
rent situation vs. removed or past situation. 
Table 5: Communicative Context Parameters

[Table body illegible in the source; its columns are Informal Spoken, Terminal-to-Terminal, Wizard-of-Oz, and Formal Written.]
From the set of features reported by Biber that loaded 
highly on the three dimensions, user-adviser dialogues had 
features of both interactive texts (e.g., many Wh-questions, 
many first person references, final prepositions) and edited 
texts (e.g., few phatics). This is because user-adviser 
dialogues, while written by users uncertain about the 
interface's ability to handle fragmentary and informal input, 
have a high degree of interaction and involvement of the 
conversants. The syntactic features observed in user-adviser
dialogues overlapped greatly with the features of situated
texts (e.g., few passives and nominalizations), except for the
frequent use of complex nominals and infrequent use of
pronouns and deictic expressions, and of immediate texts
(e.g., use of present tense, few third person pronouns). The
complexity of referring expressions uncovers a dimension not
revealed in Biber's work: the degree of believed shared
knowledge by the conversants. Our users seemed to assume
poor shared knowledge and relied on complex referring expressions
to ensure successful communication. Another
dimension is the conversants' belief in the ability of their 
counterpart to handle fragmentary or informal language. 
Informal Spoken face-to-face language is often unplanned, 
interactive, situated, immediate, and subject to real-time 
production constraints. So are users' typed utterances to advisory
systems. However, unlike in Informal Spoken face-to-face
language, users believe that there is poor shared context
between conversants and rely little on context in producing 
referring expressions and users do not assume that the inter- 
face can handle fragmentary or informal language. 
We would like to conclude by making the hypothesis that 
any typed terminal-to-terminal user-adviser dialogues will be
similar to Informal Spoken language, as was observed in our
study, because they are under the same communicative con- 
text and application. This provides a subset of grammatical 
and ungrammatical forms that can be used to define a core 
grammar portable to most user-advising situations, irrespec- 
tive of the domain. On the other hand, the complexity of 
referring expressions and the degree of completeness and for- 
mality of language may differ according to the users' beliefs 
about the linguistic capabilities of the interface.
ACKNOWLEDGEMENTS 
We wish to thank Elaine Rich, Kent Wittenburg, and 
Gregg Whittemore for useful comments on this research 
project. We also thank Sherry Kalin, Hans Brunner, and 
Gregg Whittemore for their help in collecting or analyzing 
the dialogues between users and adviser. 
REFERENCES 
Biber, D. (1986). Spoken and written textual dimensions
in English. Language, 62(2), 384-414.
Chafe, W.L. (1982). Integration and involvement in
speaking, writing, and oral literature. In D. Tannen (Ed.),
Spoken and written language: Exploring orality and
literacy. Norwood, NJ: Ablex.
Cohen, P.R., Fertig, S., & Starr, K. (1982). Dependencies
of discourse structure on the modality of communication:
Telephone vs. teletype. Proceedings of the 20th Annual
Meeting of the Association for Computational Linguistics.
University of Toronto, Ontario, Canada.
Finin, T.W., Joshi, A.K., & Webber, B.L. (1986). Natural
language interactions with artificial experts. Proceedings of
the IEEE, 74(7), 921-938.
Givon, T. (1979). From discourse to syntax: Grammar
as a processing strategy. In T. Givon (Ed.), Syntax and
Semantics: Discourse and syntax. New York: Academic
Press.
Grishman, R., Hirschman, L., & Nhan, N.T. (1986). Discovery
Procedures for Sublanguage Selectional Patterns: Initial
Experiments. Computational Linguistics, 12(3).
Harris, Z.S. (1968). Mathematical Structures in
Language. New York: Wiley (Interscience).
Kittredge, R. (1982). Variation and Homogeneity of Sublanguages.
In R. Kittredge & J. Lehrberger (Eds.),
Sublanguage: Studies of Language in Restricted Semantic
Domains. New York: Walter de Gruyter & Co.
Ochs, E. (1979). Planned and unplanned discourse. In
T. Givon (Ed.), Syntax and Semantics: Discourse and
syntax. New York: Academic Press.
Sager, N. (1982). Syntactic Formatting of Science Information.
In R. Kittredge & J. Lehrberger (Eds.), Sublanguage:
Studies of Language in Restricted Semantic Domains. New
York: Walter de Gruyter & Co.
Thompson, B.H. (1980). Linguistic analysis of natural 
language communication with computers. Proceedings of the 
8th International Conference on Computational
Linguistics. Tokyo, Japan. 
Trawick, D.J. (1983). Robust Sentence Analysis and 
Habitability. Doctoral Dissertation, California Institute of 
Technology, Pasadena. 
Watt, W.C. (1968). Habitability. American Documentation,
19(3), 338-351.
Yngve, V. (1960). A model and an hypothesis for language
structure. Proceedings of the American Philosophical
Society.
