Ambiguity Resolution for Machine Translation of Telegraphic Messages 1
Young-Suk Lee 
Lincoln Laboratory 
MIT 
Lexington, MA 02173 
USA 
ysl@sst.ll.mit.edu
Clifford Weinstein 
Lincoln Laboratory 
MIT 
Lexington, MA 02173 USA 
cjw@sst.ll.mit.edu
Stephanie Seneff 
SLS, LCS 
MIT 
Cambridge, MA 02139 USA 
seneff@lcs.mit.edu
Dinesh Tummala 
Lincoln Laboratory 
MIT 
Lexington, MA 02173 USA 
tummala@sst.ll.mit.edu
Abstract 
Telegraphic messages with numerous instances of omission pose a
new challenge to parsing in that a sentence with omission causes
a higher degree of ambiguity than a sentence without omission.
Misparsing induced by omissions has a far-reaching consequence in
machine translation. Namely, a misparse of the input often leads
to a translation into the target language which has incoherent
meaning in the given context. This is more frequently the case if
the structures of the source and target languages are quite
different, as in English and Korean. Thus, the question of how we
parse telegraphic messages accurately and efficiently becomes a
critical issue in machine translation. In this paper we describe
a technical solution for the issue, and present the performance
evaluation of a machine translation system on telegraphic
messages before and after adopting the proposed solution. The
solution lies in a grammar design in which lexicalized grammar
rules defined in terms of semantic categories and syntactic rules
defined in terms of part-of-speech are utilized together. The
proposed grammar achieves a higher parsing coverage without
increasing the amount of ambiguity/misparsing when compared with
a purely lexicalized semantic grammar, and achieves a lower
degree of ambiguity/misparses without decreasing the parsing
coverage when compared with a purely syntactic grammar.
1 Introduction 
Achieving the goal of producing high quality machine translation
output is hindered by lexical and syntactic ambiguity of the
input sentences. Lexical ambiguity may be greatly reduced by
limiting the domain to be translated. However, the same is not
generally true for syntactic ambiguity. In particular, telegraphic
messages, such as military operations reports, pose a new
challenge to parsing in that frequently occurring ellipses in the
corpus induce a higher degree of syntactic ambiguity than for text
written in "grammatical" English. Misparsing triggered by the
ambiguity of the input sentence often leads to a mistranslation
in a machine translation system. Therefore, the issue becomes
how to parse telegraphic messages accurately and efficiently to
produce high quality translation output.
In general the syntactic ambiguity of an input text may be
greatly reduced by introducing semantic categories in the grammar
to capture the co-occurrence restrictions of the input string.
In addition, ambiguity introduced by omission can be reduced
by lexicalizing grammar rules to delimit the lexical items which
typically occur in phrases with omission in the given domain. A
drawback of this approach, however, is that the grammar coverage
is quite low. On the other hand, grammar coverage may be
maximized when we rely on syntactic rules defined in terms of
part-of-speech, at the cost of a high degree of ambiguity. Thus,
the goal of maximizing the parsing coverage while minimizing
the ambiguity may be achieved by adequately combining lexicalized
rules with semantic categories, and non-lexicalized rules
with syntactic categories. The question is how much semantic
and syntactic information is necessary to achieve such a goal.

1 This work was sponsored by the Defense Advanced Research
Projects Agency. Opinions, interpretations, conclusions, and
recommendations are those of the authors and are not necessarily
endorsed by the United States Air Force.

In this paper we propose that an adequate amount of lex- 
ical information to reduce the ambiguity in general originates 
from verbs, which provide information on subcategorization, and 
prepositions, which are critical for PP-attachment ambiguity res- 
olution. For the given domain, lexicalizing domain-specific ex- 
pressions which typically occur in phrases with omission is ade- 
quate for ambiguity resolution. Our experimental results show 
that the mix of syntactic and semantic grammar as proposed 
here has advantages over either a syntactic grammar or a lexi- 
calized semantic grammar. Compared with a syntactic grammar, 
the proposed grammar achieves a much lower degree of ambigu- 
ity without decreasing the grammar coverage. Compared with 
a lexicalized semantic grammar, the proposed grammar achieves 
a higher rate of parsing coverage without increasing the
ambiguity. Furthermore, the generality introduced by the syntactic
rules facilitates the porting of the system to other domains as
well as enabling the system to handle unknown words efficiently.
This paper is organized as follows. In section 2 we discuss 
the motivation for lexicalizing grammar rules with semantic cat- 
egories in the context of translating telegraphic messages, and 
its drawbacks with respect to parsing coverage. In section 3 we 
propose a grammar writing technique which minimizes the ambi- 
guity of the input and maximizes the parsing coverage. In section 
4 we give our experimental results of the technique on the basis 
of two sets of unseen test data. In section 5 we discuss system 
engineering issues to accommodate the proposed technique, i.e.,
integration of a part-of-speech tagger and the adaptation of the
understanding system. Finally section 6 provides a summary of 
the paper. 
2 Translation of Telegraphic Messages 
Telegraphic messages contain many instances of phrases with 
omission, cf. (Grishman, 1989), as in (1). This introduces a 
greater degree of syntactic ambiguities than for texts without 
any omitted element, thereby posing a new challenge to parsing. 
(1)
TU-95 destroyed 220 nm. (= An aircraft TU-95 was destroyed
at 220 nautical miles)
Syntactic ambiguity and the resultant misparse induced by
such an omission often leads to a mistranslation in a machine
translation system, such as the one described in (Weinstein et
al., 1996), which is depicted in Figure 1.

Figure 1: An Interlingua-Based English-to-Korean Machine
Translation System

The system depicted in Figure 1 has a language understanding
module TINA, (Seneff, 1992), and a language generation module
GENESIS, (Glass, Polifroni and Seneff, 1994), at the core. The
semantic frame is an intermediate meaning representation which
is directly derived from the parse tree and becomes the input to
the generation system. The hierarchical structure of the parse 
tree is preserved in the semantic frame, and therefore a misparse 
of the input sentence leads to a mistranslation. Suppose that 
the sentence (1) is misparsed as an active rather than a passive 
sentence due to the omission of the verb was, and that the prepo- 
sitional phrase 220 nm is misparsed as the direct object of the 
verb destroy. These instances of misunderstanding are reflected 
in the semantic frame. Since the semantic frame becomes the 
input to the generation system, the generation system produces 
the non-sensical Korean translation output, as in (2), as opposed 
to the sensible one, as in (3). 3 
(2) TU-95-ka 220 hayli-lul pakoy-hayssta 
TU-95-NOM 220 nautical mile-OBJ destroyed 
(3) TU-95-ka 220 hayli-eyse pakoy-toyessta 
TU-95-NOM 220 nautical mile-LOC was destroyed 
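
The contrast between (2) and (3) can be made concrete with a minimal sketch. It is not the GENESIS generation system; the frame layout, the function name `generate`, and the two lookup tables are assumptions introduced only for illustration. The glosses and Korean forms are taken from (2) and (3). The sketch shows how the role and voice recorded in the frame, which come straight from the parse, decide the case marker and the verb form in the output.

```python
# Toy illustration only: a hypothetical frame layout, not the GENESIS system.
# Case markers follow the glosses in (2) and (3): NOM -ka, OBJ -lul, LOC -eyse.
CASE_MARKER = {"NOM": "ka", "OBJ": "lul", "LOC": "eyse"}

# Korean verb forms copied from examples (2) and (3).
VERB_FORM = {
    ("destroy", "active"): "pakoy-hayssta",
    ("destroy", "passive"): "pakoy-toyessta",
}


def generate(frame):
    """Build a romanized Korean string from a (hypothetical) semantic frame."""
    role, phrase = frame["dependent"]
    parts = [
        f"{frame['subject']}-{CASE_MARKER['NOM']}",
        f"{phrase}-{CASE_MARKER[role]}",
        VERB_FORM[(frame["verb"], frame["voice"])],
    ]
    return " ".join(parts)


# Frame resulting from the misparse: active verb, "220 nm" as direct object.
misparsed = {"subject": "TU-95", "verb": "destroy", "voice": "active",
             "dependent": ("OBJ", "220 hayli")}

# Frame resulting from the intended parse: passive verb, "220 nm" as locative.
intended = {"subject": "TU-95", "verb": "destroy", "voice": "passive",
            "dependent": ("LOC", "220 hayli")}

print(generate(misparsed))
print(generate(intended))
```

Under these assumptions the first frame yields the string in (2) and the second the string in (3), so the error is already fixed in the frame before generation starts.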
Given that the generation of the semantic frame from the parse 
tree, and the generation of the translation output from the se- 
mantic frame, are quite straightforward in such a system, and 
that the flexibility of the semantic frame representation is well 
suited for multilingual machine translation, it would be more de- 
sirable to find a way of reducing the ambiguity of the input text 
to produce high quality translation output, rather than adjust- 
ing the translation process. In the sections below we discuss one 
such method in terms of grammar design and some of its side 
effects.
2.1 Lexicalization of Grammar Rules with 
Semantic Categories 
In the domain of naval operational report messages (MUC-II 
messages hereafter), 4 (Sundheim, 1989), we find two types of 
ellipsis. First, top level categories such as subjects and the copula 
verb be are often omitted, as in (4). 
(4) 
Considered hostile act (= This was considered to be a hostile 
act). 
Second, many function words like prepositions and articles are 
omitted. Instances of preposition omission are given in (5), where 
z stands for Greenwich Mean Time (GMT). 
(5) 
a. Haylor hit by a torpedo and put out of action 8 hours (= for
8 hours)
b. All hostile recon aircraft outbound 1300 z (= at 1300 z) 
If we try to parse sentences containing such omissions with the 
grammar where the rules are defined in terms of syntactic cat- 
egories (i.e. part-of-speech), the syntactic ambiguity multiplies. 
3 In the examples, NOM stands for the nominative case
marker, OBJ the object case marker, and LOC the locative
postposition.
4 MUC-II stands for the Second Message Understanding Conference.
MUC-II messages were originally collected and prepared
by NRaD (1989) to support DARPA-sponsored research in message
understanding.
To accommodate sentences like (5)a-b, the grammar needs to al- 
low all instances of noun phrases (NP hereafter) to be ambiguous 
between an NP and a prepositional phrase (PP hereafter) where 
the preposition is omitted. Allowing an input where the copula 
verb be is omitted in the grammar causes the past tense form 
of a verb to be interpreted either as the main verb with the ap- 
propriate form of be omitted, as in (6)a, or as a reduced relative 
clause modifying the preceding noun, as in (6)b. 
(6) 
Aircraft launched at 1300 z ... 
a. Aircraft were launched at 1300 z ... 
b. Aircraft which were launched at 1300 z ... 
Such instances of ambiguity are usually resolved on the basis 
of the semantic information. However, relying on a semantic 
module for ambiguity resolution implies that the parser needs 
to produce all possible parses of the input text and carry them
along, thereby requiring a more complex understanding process. 
One way of reducing the ambiguity at an early stage of pro- 
cessing without relying on a semantic module is to incorporate 
domain/semantic knowledge into the grammar as follows: 
• Lexicalize grammar rules to delimit the lexical items which 
typically occur in phrases with omission; 
• Introduce semantic categories to capture the co-occurrence 
restrictions of lexical items. 
Some example grammar rules instantiating these ideas are 
given in (7). 
(7)
a. .locative_PP
   {at in near off on ...} NP
   headless_PP
b. .headless_PP
   np_distance
   np_bearing
c. .np_distance
   numeric nautical_mile
   numeric yard
d. .temporal_PP
   {during after prior_to ...} NP
   time_expression
e. .time_expression
   [at] numeric gmt
f. .gmt
   z
(7)a states that a locative prepositional phrase consists of a 
subset of prepositions and a noun phrase. In addition, there is 
a subcategory headless_PP which consists of a subset of noun 
phrases which typically occur in a locative prepositional phrase 
with the preposition omitted. The head nouns which typically
occur in prepositional phrases with the preposition omitted are
nautical mile and yard. The rest of the rules can be read in a
similar manner. Because each rule names the specific words or
semantic classes it accepts, a bare noun phrase is analyzed as a
prepositional phrase only when its head is one of these items,
which reduces the syntactic ambiguity of the input text.
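
The way the rules in (7) restrict the analysis can be sketched as follows. This is a minimal illustration in Python, not the TINA rule formalism; the function names, the token-list input, and the small word lists (including the abbreviations "nm" and "yds") are assumptions made for the example.

```python
# Toy recognizer for rules (7)c, (7)e and (7)f. Illustrative only.
# Assumption: input is a list of lowercase tokens.

GMT = {"z"}                                   # (7)f: gmt -> z
DISTANCE_UNITS = {"nm", "yds", "yard"}        # assumed surface forms for (7)c


def is_numeric(token):
    return token.isdigit()


def match_time_expression(tokens, i):
    """(7)e: time_expression -> [at] numeric gmt. Returns end index or None."""
    j = i + 1 if i < len(tokens) and tokens[i] == "at" else i
    if j + 1 < len(tokens) and is_numeric(tokens[j]) and tokens[j + 1] in GMT:
        return j + 2
    return None


def match_np_distance(tokens, i):
    """(7)c: np_distance -> numeric nautical_mile | numeric yard."""
    if (i + 1 < len(tokens) and is_numeric(tokens[i])
            and tokens[i + 1] in DISTANCE_UNITS):
        return i + 2
    return None


def lexicalized_spans(tokens):
    """Return (label, start, end) for every span licensed by the toy rules."""
    matchers = (("time_expression", match_time_expression),
                ("np_distance", match_np_distance))
    spans = []
    for i in range(len(tokens)):
        for label, matcher in matchers:
            end = matcher(tokens, i)
            if end is not None:
                spans.append((label, i, end))
    return spans


print(lexicalized_spans("tu-95 destroyed 220 nm".split()))
print(lexicalized_spans("all hostile recon aircraft outbound 1300 z".split()))
print(lexicalized_spans("spencer engaged 12 aircraft".split()))
```

In this sketch only "220 nm" and "1300 z" are picked out as candidates for a phrase with an omitted preposition; an ordinary noun phrase such as "12 aircraft" is not, which is the restriction the lexicalized rules are meant to impose.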
2.2 Drawbacks 
Whereas the language processing is very efficient when a system 
relies on a lexicalized semantic grammar, there are some draw- 
backs as well. 
• Since the grammar is domain and word specific, it is not 
easily ported to new constructions and new domains. 
• Since the vocabulary items are entered in the grammar as 
part of lexicalized grammar rules, if an input sentence con- 
tains words unknown to the grammar, parsing fails. 
These drawbacks are reflected in the performance evaluation of 
our machine translation system. After the system was developed 
on all the training data of the MUC-II corpus (640 sentences, 12 
words/sentence average), the system was evaluated on the held- 
out test set of 111 sentences (hereafter TEST set). The results 
are shown in Table 1. The system was also evaluated on the 
data which were collected from an in-house experiment. For this 
experiment, the subjects were asked to study a number of MUC-II
sentences, and create about 20 MUC-II-like sentences. These
MUC-II-like sentences form data set TEST'. The results of the
system evaluation on the data set TEST' are given in Table 2.

Total No. of sentences                    111
No. of sentences with no unknown words    66/111 (59.5%)
No. of parsed sentences                   23/66 (34.8%)
No. of misparsed sentences                2/23 (8.7%)
Table 1: TEST Data Evaluation Results on the Lexicalized
Semantic Grammar

Total No. of sentences                    281
No. of sentences with no unknown words    239/281 (85.1%)
No. of parsed sentences                   103/239 (43.1%)
No. of misparsed sentences                15/103 (14.6%)
Table 2: TEST' Data Evaluation Results on the Lexicalized
Semantic Grammar

Table 1 shows that the grammar coverage for unseen data is
about 35%, excluding the failures due to unknown words. Table 2
indicates that even for sentences constructed to be similar to the
training data, the grammar coverage is about 43%, again excluding
the parsing failures due to unknown words. The misparse 5
rate with respect to the total parsed sentences ranges between
8.7% and 14.6%, which is considered to be highly accurate.
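
The percentages in Tables 1 and 2 can be re-derived from the raw counts with a short calculation. The sketch below is only a convenience check; the dictionary keys and function names are assumptions. The figure labelled `coverage_of_all` (parsed sentences over all sentences, unknown-word failures included) is not reported in the tables and is derived here from the same counts.

```python
def rate(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


# Raw counts copied from Table 1 (TEST) and Table 2 (TEST').
table1 = {"total": 111, "known": 66, "parsed": 23, "misparsed": 2}
table2 = {"total": 281, "known": 239, "parsed": 103, "misparsed": 15}


def summarize(t):
    return {
        "no_unknown_words": rate(t["known"], t["total"]),
        "coverage_of_known": rate(t["parsed"], t["known"]),
        # Derived here, not reported in the tables: parsed / all sentences.
        "coverage_of_all": rate(t["parsed"], t["total"]),
        "misparse_rate": rate(t["misparsed"], t["parsed"]),
    }


print(summarize(table1))
print(summarize(table2))
```

Counting unknown-word failures as parsing failures, the semantic grammar parses 23 of 111 TEST sentences and 103 of 281 TEST' sentences, which makes the coverage problem discussed in the next section more visible.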
3 Incorporation of Syntactic Knowledge 
Considering the low parsing coverage of a semantic grammar
which relies on domain specific knowledge, and the fact that the
successful parsing of the input sentence is a prerequisite for
producing translation output, it is critical to improve the parsing
coverage. Such a goal may be achieved by incorporating syntactic
rules into the grammar while retaining lexical/semantic
information to minimize the ambiguity of the input text. The
question is: how much semantic and syntactic information is
necessary? We propose a solution, as in (8):
(8)
(a) Rules involving verbs and prepositions need to be lexicalized
to resolve the prepositional phrase attachment ambiguity, cf.
(Brill and Resnik, 1993).
(b) Rules involving verbs need to be lexicalized to prevent
misparsing due to an incorrect subcategorization.
(c) Domain specific expressions (e.g. z, nm in the MUC-II corpus)
which frequently occur in phrases with omitted elements need to
be lexicalized.
(d) Otherwise, rely on syntactic rules defined in terms of
part-of-speech.
In this section, we discuss typical misparses for the syntac- 
tic grammar on experiments in the MUC-II corpus. We then 
illustrate how these misparses are corrected by lexicalizing the 
grammar rules for verbs, prepositions, and some domain-specific phrases. 
3.1 Typical Misparses Caused by Syntactic 
Grammar 
The misparses we find in the MUC-II corpus, when tested on a 
syntactic grammar, are largely due to the three factors specified in (9). 
5 The term misparse in this paper should be interpreted with
care. A number of the sentences we consider to be misparses are
not syntactic misparses, but semantically anomalous. Since
we are interested in getting the accurate interpretation in the
given context at the parsing stage, we consider parses which are
semantically anomalous to be misparses.
(9) i. Misparsing due to prepositional phrase attachment 
(hereafter PP-attachment) ambiguity 
ii. Misparsing due to incorrect verb subcategorizations 
iii. Misparsing due to the omission of a preposition, e.g.
1410 z instead of at 1410 z
Examples of misparses due to an incorrect verb subcategorization
and a PP-attachment ambiguity are given in Figure 2
and Figure 3, respectively. An example of a misparse due to
preposition omission is given in Figure 4.
In Figure 2, the verb intercepted incorrectly subcategorizes for a
finite complement clause.
In Figure 3, the prepositional phrase with 12 rounds is wrongly
attached to the noun phrase the contact, as opposed to the verb
phrase vp_active, to which it properly belongs.
Figure 4 shows that the prepositional phrase 1410 z with at
omitted is misparsed as a part of the noun phrase expression
hostile raid composition.
3.2 Correcting Misparses by Lexicalizing Verbs, 
Prepositions, and Domain Specific Phrases 
Providing the accurate subcategorization frame for the verb
intercept by lexicalizing the higher level category "vp" ensures that
it never takes a finite clause as its complement, leading to the
correct parse, as in Figure 5.
As for PP-attachment ambiguity, lexicalization of verbs and
prepositions helps in identifying the proper attachment site of the
prepositional phrase, cf. (Brill and Resnik, 1993), as illustrated
in Figure 6.
Misparses due to omission are easily corrected by deploying
lexicalized rules for the vocabulary items which occur in phrases
with omitted elements. For the misparse illustrated in Figure 4,
utilizing the lexicalized rules in (10) prevents 1410 z from being
analyzed as part of the subsequent noun phrase, as in Figure 7.
(10)
a. .time_expression
   [at] numeric gmt
b. .gmt
   z
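
The effect of lexicalizing verbs and verb-preposition pairs can be sketched with two small lookup tables. This is a toy illustration under stated assumptions, not the grammar used in the experiments: the table contents, the complement labels, the function names, and the default of noun attachment for unlisted pairs are all hypothetical. Only the verbs intercept and engage and the preposition with are taken from the examples discussed above.

```python
# Toy lexicon, illustrative only. Complement labels are hypothetical.
SUBCAT = {
    "intercept": {"np"},                 # no finite-clause complement
    "engage": {"np", "np+with_pp"},      # "engaged the contact with 12 rounds"
}

# Verb-preposition pairs whose PP attaches to the verb phrase (assumed list).
VERB_PREP = {("engage", "with")}


def allows(verb, complement):
    """True if the lexicalized verb rule licenses this complement type."""
    return complement in SUBCAT.get(verb, set())


def attachment_site(verb, prep):
    """Attach the PP to the verb if the pair is lexicalized.

    Assumption: unlisted pairs default to noun attachment in this sketch.
    """
    return "verb" if (verb, prep) in VERB_PREP else "noun"


print(allows("intercept", "finite_clause"))
print(attachment_site("engage", "with"))
print(attachment_site("engage", "of"))
```

With such entries, the analysis in which intercepted takes a finite clause is never proposed, and with 12 rounds is attached to the verb phrase headed by engaged, while of 5-inch remains attached to the preceding noun.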
4 Experimental Results 
In this section we report two types of experimental results. One
is the parsing results on two sets of unseen data TEST and
TEST' (discussed in Section 2) using the syntactic grammar
defined purely in terms of part-of-speech. The other is the parsing
results on the same sets of data using the grammar which combines
lexicalized semantic grammar rules and syntactic grammar
rules. The results are compared with respect to the parsing
coverage and the misparse rate. These experimental results are also
compared with the parsing results with respect to the lexicalized
semantic grammar discussed in Section 2.
4.1 Experimental Results on Data Set TEST
Total No. of sentences          111
No. of parsed sentences         84/111 (75.7%)
No. of misparsed sentences      24/84 (29%)
Table 3: TEST Data Evaluation Results on the Syntactic
Grammar

Total No. of sentences          111
No. of parsed sentences         86/111 (77%)
No. of misparsed sentences      9/86 (10%)
Table 4: TEST Data Evaluation Results on the Mixed
Grammar
In terms of parsing coverage, the two grammars perform equally
well (around 76%). In terms of misparse rate, however, the grammar
which utilizes only syntactic categories shows a much higher
rate of misparse (i.e. 29%) than the grammar which utilizes
both syntactic and semantic categories (i.e. 10%). Comparing
the evaluation results on the mixed grammar with those on the
lexicalized semantic grammar discussed in Section 2, the parsing
coverage of the mixed grammar is much higher (77%) than that
of the semantic grammar (34.8%). In terms of misparse rate,
both grammars perform equally well, i.e. around 9%. 6

Figure 2: Misparse due to incorrect verb subcategorization
(parse tree for "when intercepted the range of the aircraft to
enterprise was 30 nm")
Figure 3: Misparse due to PP-attachment ambiguity
(parse tree for "spencer engaged the contact with 12 rounds of
5-inch at 3000 yds")
Figure 4: Misparse due to Omission of Preposition
(parse tree for "1410 z hostile raid composition of 19 aircraft")
Figure 5: Parse Tree with Correct Verb Subcategorization
Figure 6: Parse Tree with Correct PP-attachment
Figure 7: Corrected Parse Tree
4.2 Experimental Results on Data Set TEST' 
Total No. of sentences          281
No. of sentences which parse    215/281 (76.5%)
No. of misparsed sentences      60/215 (28%)
Table 5: TEST' Data Evaluation Results on Syntactic
Grammar

Total No. of sentences          289
No. of parsed sentences         236/289 (82%)
No. of misparsed sentences      23/236 (10%)
Table 6: TEST' Data Evaluation Results on Mixed Grammar

Evaluation results of the two types of grammar on the TEST'
data, given in Table 5 and Table 6, are similar to those of the
two types of grammar on the TEST data discussed above.
To summarize, the grammar which combines syntactic rules
and lexicalized semantic rules fares better than the syntactic
grammar or the semantic grammar. Compared with a lexicalized
semantic grammar, this grammar achieves a higher
parsing coverage without increasing the amount of
ambiguity/misparsing. When compared with a syntactic grammar, this
grammar achieves a lower degree of ambiguity/misparsing without
decreasing the parsing rate.
5 System Engineering 
An input to the parser driven by a grammar which utilizes both 
syntactic and lexicalized semantic rules consists of words (to be 
covered by lexicalized semantic rules) and parts-of-speech (to be 
covered by syntactic rules). To accommodate the part-of-speech input to the parser, the input sentence has to be part-of-speech 
tagged before parsing. To produce an adequate translation out- 
put from the input containing parts-of-speech, there has to be 
a mechanism by which parts-of-speech are used for parsing pur- 
poses, and the corresponding lexical items are used for the se- 
mantic frame representation. 
5.1 Integration of Rule-Based Part-of-Speech 
Tagger 
To accommodate the part-of-speech input to the parser, we have 
integrated the rule-based part-of-speech tagger, (Brill, 1992), 
(Brill, 1995), as a preprocessor to the language understanding 
system TINA, as in Figure 8. An advantage of integrating a 
part-of-speech tagger over a lexicon containing part-of-speech in- 
formation is that only the former can tag words which are new 
to the system, and provides a way of handling unknown words. 
While most stochastic taggers require a large amount of training
data to achieve high rates of tagging accuracy, the rule-based
tagger achieves performance comparable to or higher than that
of stochastic taggers, even with a training corpus of a modest
size. Given that the size of our training corpus is fairly small
(total 7716 words), a transformation-based tagger is well suited
to our needs.

RULE-BASED PART-OF-SPEECH TAGGER -> LANGUAGE UNDERSTANDING (TINA)
-> LANGUAGE GENERATION (GENESIS) -> TEXT OUTPUT
Figure 8: Integration of the Rule-Based Part-of-Speech Tagger
as a Preprocessor to the Language Understanding System

6 The parsing coverage of the semantic grammar, i.e. 34.8%,
is after discounting the parsing failure due to words unknown to
the grammar. The reason why we do not give the statistics of the
parsing failure due to unknown words for the syntactic and the
mixed grammar is that the part-of-speech tagging process,
which will be discussed in detail in Section 5, has the effect of
handling unknown words, and therefore the problem does not arise.

The transformation-based part-of-speech tagger operates in 
two stages. Each word in the tagged training corpus has an 
entry in the lexicon consisting of a partially ordered list of tags, 
indicating the most likely tag for that word, and all other tags 
seen with that word (in no particular order). Every word is first 
assigned its most likely tag in isolation. Unknown words are 
first assumed to be nouns, and then cues based upon prefixes, 
suffixes, infixes, and adjacent word co-occurrences are used to 
upgrade the most likely tag. Secondly, after the most likely tag 
for each word is assigned, contextual transformations are used to 
improve the accuracy. 
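
The two stages described above can be sketched in a few lines of Python. This is a toy illustration, not the Brill tagger itself: the lexicon entries, the suffix cues, the single contextual transformation, and the Penn-Treebank-style tag names are assumptions chosen for the example, and only suffix and digit cues are modelled for unknown words.

```python
# Toy two-stage tagger in the spirit of transformation-based tagging.
# Assumption: every entry lists the most likely tag first.
LEXICON = {
    "to": ["TO"],
    "the": ["DT"],
    "contact": ["NN", "VB"],
}

# Contextual transformations: (from_tag, to_tag, required_previous_tag).
TRANSFORMATIONS = [("NN", "VB", "TO")]


def initial_tag(word):
    """Stage 1: most likely tag in isolation; unknown words start as nouns."""
    if word in LEXICON:
        return LEXICON[word][0]
    tag = "NN"                      # default guess for unknown words
    if word.isdigit():
        tag = "CD"                  # assumed cue: all digits -> cardinal
    elif word.endswith("ly"):
        tag = "RB"                  # assumed suffix cue
    elif word.endswith("ed"):
        tag = "VBD"                 # assumed suffix cue
    return tag


def tag(words):
    """Stage 2: apply each contextual transformation left to right."""
    tags = [initial_tag(w) for w in words]
    for from_tag, to_tag, prev_tag in TRANSFORMATIONS:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return list(zip(words, tags))


print(tag(["ordered", "to", "contact", "spencer"]))
```

Here contact is first tagged as a noun, its most likely tag in the toy lexicon, and is then changed to a verb because the preceding tag is TO; the unknown words ordered and spencer are handled by the default and suffix cues.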
We have evaluated the tagger performance on the TEST Data 
both before and after training on the MUC-II corpus. The results
are given in Table 7. Tagging statistics 'before training'
are based on the lexicon and rules acquired from the BROWN
CORPUS and the WALL STREET JOURNAL CORPUS. Tagging
statistics 'after training' are divided into two categories,
both of which are based on the rules acquired from training data
sets of the MUC-II corpus. The only difference between the two 
is that in one case (After Training I) we use a lexicon acquired 
from the MUC-II corpus, and in the other case (After Training 
II) we use a lexicon acquired from a combination of the BROWN 
CORPUS, the WALL STREET JOURNAL CORPUS, and the 
MUC-II database. 
Training Status        Tagging Accuracy
Before Training        1125/1287 (87.4%)
After Training I       1249/1287 (97%)
After Training II      1263/1287 (98%)
Table 7: Tagger Evaluation on Data Set TEST
Table 7 shows that the tagger achieves a tagging accuracy of 
up to 98% after training and using the combined lexicon, with 
an accuracy for unknown words ranging from 82 to 87%. These 
high rates of tagging accuracy are largely due to two factors: 
(1) combination of domain specific contextual rules obtained by
training on the MUC-II corpus with general contextual rules
obtained by training on the WSJ corpus; and (2) combination of the
MUC-II lexicon with the lexicon for the WSJ corpus.
5.2 Adaptation of the Understanding System 
The understanding system depicted in Figure 1 derives the se- 
mantic frame representation directly from the parse tree. The 
terminal symbols (i.e. words in general) in the parse tree are 
represented as vocabulary items in the semantic frame. Once we 
allow the parser to take part-of-speech as the input, the
parts-of-speech (rather than actual words) will appear as the terminal
symbols in the parse tree, and hence as the vocabulary items
in the semantic frame representation. We adapted the system so
that the part-of-speech tags are used for parsing, but are replaced
with the original words in the final semantic frame. Generation
can then proceed as usual. Figure 9 and (11) illustrate the parse
tree and semantic frame produced by the adapted system for the
input sentence 0819 z unknown contacts replied incorrectly.
Figure 9: Parse Tree Based on the Mix of Word and Part-of-Speech Sequence
(11)
{c statement
   :time_expression {p numeric_time
                       :topic {q gmt
                                 :name "z" }
                       :pred {p cardinal
                                :topic "0819" } }
   :topic {q nn_head
             :name "contact"
             :pred {p unknown
                      :global 1 } }
   :subject 1
   :pred {p reply_v
            :mode "past"
            :adverb {p incorrectly } } }
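
The adaptation can be sketched as keeping a position index alongside each parser terminal, so that tags drive the parse while the original words are recovered for the frame. This is a minimal sketch, not the TINA implementation: the function names, the set of lexicalized words, the tag names, and the flat list of (key, position) leaves standing in for a parse tree are all assumptions. The sketch also keeps surface words as they are, whereas the frame in (11) shows normalized forms such as "contact" and reply_v.

```python
# Toy illustration: parse on tags, fill the frame with the original words.
# Assumption: these words are covered by lexicalized rules and stay as words.
LEXICALIZED = {"z", "replied"}


def to_parser_input(tagged):
    """Replace non-lexicalized words by their tag, remembering positions."""
    return [(word if word in LEXICALIZED else pos_tag, i)
            for i, (word, pos_tag) in enumerate(tagged)]


def fill_frame(leaves, tagged):
    """leaves: (frame_key, position) pairs, a stand-in for parse-tree terminals.

    Each key is filled with the original word found at that position.
    """
    return {key: tagged[position][0] for key, position in leaves}


tagged = [("0819", "CD"), ("z", "NN"), ("unknown", "JJ"),
          ("contacts", "NNS"), ("replied", "VBD"), ("incorrectly", "RB")]

# Hypothetical terminal labels, loosely following the categories in Figure 9.
leaves = [("cardinal", 0), ("gmt", 1), ("adjective", 2),
          ("nn_head", 3), ("verb", 4), ("adverb", 5)]

print(to_parser_input(tagged))
print(fill_frame(leaves, tagged))
```

Because the positions are carried through unchanged, the generation component receives the same vocabulary items it would have received from a purely word-based parse.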
6 Summary 
In this paper we have proposed a technique which maximizes
the parsing coverage and minimizes the misparse rate for machine
translation of telegraphic messages. The key to the technique is
to adequately mix semantic and syntactic rules in the grammar.
We have given experimental results of the proposed grammar,
and compared them with the experimental results of a syntactic
grammar and a semantic grammar with respect to parsing
coverage and misparse rate, which are summarized in Table 8
and Table 9. We have also discussed the system adaptation to
accommodate the proposed technique.
Grammar Type          Parsing Rate    Misparse Rate
Semantic Grammar      34.8%           8.7%
Syntactic Grammar     75.7%           29%
Mixed Grammar         77%             10%
Table 8: TEST Data Evaluation Results on the Three Types
of Grammar

Grammar Type          Parsing Rate    Misparse Rate
Semantic Grammar      43.1%           14.6%
Syntactic Grammar     76.5%           28%
Mixed Grammar         82%             10%
Table 9: TEST' Data Evaluation Results on the Three
Types of Grammar

References 
Eric Brill. 1992. A Simple Rule-Based Part of Speech Tagger.
Proceedings of the Third Conference on Applied Natural Language
Processing, ACL, Trento, Italy.
Eric Brill. 1995. Transformation-Based Error-Driven Learning
and Natural Language Processing: A Case Study in Part-of-Speech
Tagging. Computational Linguistics, 21:4, pages 543-565.
Eric Brill and Philip Resnik. 1993. A Rule-Based Approach
to Prepositional Phrase Attachment Disambiguation. Technical
report, Department of Computer and Information Science,
University of Pennsylvania.
James Glass, Joseph Polifroni and Stephanie Seneff. 1994.
Multilingual Language Generation Across Multiple Domains.
Presented at the 1994 International Conference on Spoken Language
Processing, Yokohama, Japan.
Ralph Grishman. 1989. Analyzing Telegraphic Messages.
Proceedings of Speech and Natural Language Workshop, DARPA.
Stephanie Seneff. 1992. TINA: A Natural Language System for
Spoken Language Applications. Computational Linguistics,
18:1, pages 61-88.
Beth M. Sundheim. 1989. Navy Tactical Incident Reporting in a
Highly Constrained Sublanguage: Examples and Analysis.
Technical Document 1477, Naval Ocean Systems Center, San
Diego.
Clifford Weinstein, Dinesh Tummala, Young-Suk Lee, Stephanie
Seneff. 1996. Automatic English-to-Korean Text Translation
of Telegraphic Messages in a Limited Domain. To be presented
at the International Conference on Computational Linguistics
'96.
