"Lh\]guistic" Sentences and "Real" Sentences 
Masaru Tomita
Computer Science Department and 
Center for Machine Translation
Carnegie Mellon University 
This paper identifies two kinds of sentences: "linguistic"
sentences and "real" sentences. The former are the kind of sentences
that are often discussed in the (computational) linguistics literature,
such as those in Figure 1. The latter, on the other hand, are the kind of
sentences that appear in practical applications, such as those in
Figure 2. Whereas both are grammatical English sentences, they
appear to be significantly different. In this paper, we discuss the
characteristics of these two kinds of sentences, and claim that a
different approach is necessary to parse each kind of sentence.
John hit Mary.
Every man who owns a donkey beats it.
I saw a man with a telescope.
The horse raced past the barn fell.
Time flies like an arrow.
The mouse the cat the dog chased ate died.
John persuaded Mary to expect that he believes
that she likes an apple.
Figure 1: "Linguistic" Sentences
All processes (programs) in the destroyed window (or icon) are
killed (except nohuped processes; see nohup(1) in the HP-UX
Reference); therefore, make sure you really wish to destroy a
window or an icon before you perform this task.
This window contains an HP-UX shell (either a Bourne shell or
C-shell, depending on the value of the SHELL environment
variable; for details, see the "Concepts" section of the "Using
Commands" chapter).
Figure 2: "Real" Sentences 
It seems that problems in parsing sentences can be classified into
two categories: linguistically "interesting" problems and linguistically
"uninteresting" problems. Linguistically "interesting" problems are
those for which there are no obvious solutions, and reasonably
sophisticated theories are required to solve them, or those behind
which there are general linguistic principles, and a small number of
general rules can cope with them (e.g., relativization,
causativization, ambiguity, movement, garden-path, etc.). On the
other hand, linguistically "uninteresting" problems are those for
which there exist obvious solutions, or those behind which there is
no general linguistic principle, and it is just a matter of writing and
adding rules to cope with these problems (e.g., punctuation, date
and time expressions, idioms, etc.).
Figures 3 and 4 show example "interesting" and "uninteresting"
problems, respectively. While one could give an elegant
explanation of why the second sentence in Figure 3 is
ungrammatical, there is no particular reason why "15th July" is
ungrammatical, other than that it is simply not English.
John expects Mary to kiss herself.
* John expects Mary to kiss himself.
John expects Mary to kiss him.
Figure 3: An Interesting Problem
on July 15th 
on the fifteenth of July 
on 7/15 
* on 15th July
* in July 15th
Figure 4: An Uninteresting Problem
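The date expressions in Figure 4 can be used to illustrate what "just a matter of writing and adding rules" looks like in practice. The following is a minimal sketch in Python, not the grammar formalism of any actual system; the names `DATE_RULES` and `is_accepted`, and the use of regular expressions, are assumptions made purely for illustration. Each pattern is written by hand for one surface form, and nothing ties the patterns together.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Spelled-out ordinals; only a few are listed to keep the sketch short.
ORDINAL_WORDS = "first|second|third|fourth|fifth|fifteenth"

# Hypothetical rule set: one hand-written pattern per accepted form.
DATE_RULES = [
    re.compile(rf"on (?:{MONTHS}) \d{{1,2}}(?:st|nd|rd|th)"),    # on July 15th
    re.compile(rf"on the (?:{ORDINAL_WORDS}) of (?:{MONTHS})"),  # on the fifteenth of July
    re.compile(r"on \d{1,2}/\d{1,2}"),                           # on 7/15
]

def is_accepted(phrase: str) -> bool:
    """Return True if at least one rule matches the whole phrase."""
    return any(rule.fullmatch(phrase) for rule in DATE_RULES)

for phrase in ["on July 15th", "on the fifteenth of July", "on 7/15",
               "on 15th July", "in July 15th"]:
    # Print "*" in front of phrases that no rule accepts.
    print(("  " if is_accepted(phrase) else "* ") + phrase)
```

Even this small rule set is incomplete: it does not check that the ordinal suffix agrees with the number, it knows only a handful of spelled-out ordinals, and it says nothing about years or abbreviated month names. Covering each of these cases means adding another rule rather than refining a general principle.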
"Linguistic" sentences usually contain one or more linguistically 
interesting problems, with few or no linguistically uninteresting
problems. "Real" sentences, on the other hand, contain many
uninteresting problems, but fewer interesting problems. In the
(computational) linguistics literature, uninteresting problems can be
ignored, as long as everybody agrees that there are obvious
solutions for them. In practical applications, on the other hand, we
cannot ignore uninteresting problems, or systems simply do not
work.
One of the projects at the Center for Machine Translation at 
Carnegie Mellon University is to translate personal computer
manuals from English to Japanese and from Japanese to English.
In this project, and perhaps in any other practical project that has
to deal with "real" sentences, the system's failures are caused by a
few interesting problems and tons of uninteresting problems. There
often exist reasonable approximate solutions to interesting problems
in practical applications; for example, it is quite acceptable to
assume that there are no embedded relative clauses in computer
manuals, in order to simplify the (interesting) problem of
relativization. On the other hand, there are no quick solutions to
uninteresting problems other than writing a bunch of rules.
• We can never anticipate and prepare for all of these 
uninteresting problems in advance. It seems as if there
will always be such problems, no matter how carefully
and how many times we test and debug the system and
its grammar.
• The quantity of the knowledge sources (i.e., 
grammars/rules) has to be very large; unlike interesting
problems, rules for uninteresting problems can hardly be
generalized into a smaller number of rules, as each of
them represents an uninteresting problem with no 
general linguistic principles behind it. 
• It is more difficult for humans to test, debug, and
maintain a larger amount of knowledge sources 
accurately and consistently. 
• It is more difficult for a system to access a larger
amount of knowledge sources efficiently. 
These problems are much more serious than linguistically
"interesting" problems, and directly affect the performance of practical
systems.
