An Interactive Rewriting Tool for Machine Acceptable Sentences 
Hideki Hirakawa*, Kouichi Nomura**, Mariko Nakamura** 
* Toshiba Corporation, R&D Center ** Toshiba Corporation, Tokyo System Center 
1, Komukaitoshiba-cho, Saiwai-kn, 3-22, Kata-machi, Fuchuu, 
Kawasaki 210, Japan Tokyo 183, Japan 
hirakawa@islxde.toshiba.co.jp { nomura, marl }@docs.sc.fuchu.toshiba.co.jp 
Abstract 
This paper proposes an interactive rewriting tool 
for supporting people in creating machine acceptable 
sentences. The experiment and evaluation of this tool 
conducted when applied to a pre-editing support tool 
in a Japanese-to-English machine translation 
(henceforth, MT) system are also described. 
1. Introduction 
Supporting people in creating machine acceptable 
sentences is one important issue of future NL~ appli- 
cations. Agents in computers will be able to provide 
people with a variety of services by grasping the con- 
tents of documents. An intelligent mail server, for 
example, can read mails and send appropriate replies 
semi or fully automatically if the mails are machine 
acceptable. This is a natural extension of the ideas 
explored in the object lens(Malone, Grant, Lai, Rao 
and Rosenblitt, 1986). 
However, it is difficult for humans to write in 
machine acceptable forms because of gaps between 
humans and machines. First, hnmans use semantics, 
context and common sense in interpreting a sentence, 
while limited resources are available to machines. Sec- 
ond, unlike machines, humans cannot simulate 
sentence analysis proce~ in a machine to check all pos- 
sibilities. As such, man-machine corporation would be 
one solution with the least burdens on humans. 
Prior to a general-purpose rewriting tool, we have 
been developing a pre-editing support tool for an MT 
system(Hirakawa,Nogami and Amano,1991). Since 
our central concern is to analyze human-computer 
interaction for creating machine acceptable sentences, 
MT systems for bilingmd and monolingnal users are 
closely related to our tool. Considering that the dif- 
ficulty of interactive systems lies in "so many 
interactions for a single sentence,"(Hutehins and 
Somers, 1992) we need a framework which produces 
the best result with least interactions. 
2. Interactive Rewriting Tool 
2.1 Design Principles 
The problems of rewriting include difficulties in 
learning its know-how and unpredictable effects of 
rewriting. The following are our design principles: 
1) Interaction-based system: Since rewriting is a 
dynamic process, the process should be interactive to 
deal with changes. 
2) Presentation of rewriting candidates: The system 
should present possible recommended revisions to 
users so that they can select the best choice and guar- 
antee the correctness of the rewriting. 
3) Minimization of interactions: To minimize fre- 
quencies of interactions in rewri6ng, the system 
should fully utilize knowledge available to improve 
the accuracy of the diagnoses. 
4) Optimization of interactions: Scales to measure the 
degree of rewriting(scalability) should be introduced 
to optimize interactions to obtain the maximum 
effects with minimum interactions. 
2.2 System Configuration 
The system consists of two main parts: a sentence 
checker and a user interface unit 
The sentence checker is composed of a sentence ana- 
lyzer, an information extractor and a rewriting 
candidate generator. The information extractor 
extracts the information necessary for rewriting, such 
as morphological and syntactic information, from the 
analysis results of an NIT system, to detect problem- 
atic phrases and generate recommended rewriting 
examples along with guidance messages to help users 
with rewriting. 
The user interface displays the original sentences 
which require correction with their recommended 
telne~II~.TL, tz\[~. g~-~Z'7-~X~F (~CTRL+\] zyt.~3E 
\[Input Sentence \[ \[ which needs rewritin~ 
~r~,:~,~ ~" sl-lg~ -\] 
~-~:. Choices for 
2o~b=r. rewriting 
/ 
,-I'~1"~, I- : r~;,-~ctions g ~e~ ~o~ ~ ~ Items to 
~,~t-c < r~,. rewriting be detected 
~£~. ~ ~ Guidance tz~,. co~B~:~o)~ messages 
Fig. 1 User Interface Unit Screen 
  
 207 
rewriting examples and guidance messages as in Fig. 
1. When there are several problems in one sentence, 
they are presented to users in order of importance. 
Knowledge for rewriting is accumulated as rules 
for the information extractor, which is actually a gen- 
eral-purpose information extracting tool-kit equipped 
with overall linguistic analyzing modules and infor- 
mation extracting/diagnosing modules. 
3. Knowledge for Rewriting Long Sentences 
Of those Japanese expressions that need rewriting 
before machine translation, long sentences which 
should be divided into shorter ones are the most 
important, as shown in previous studies(Kim, Ehara 
and Aizawa, 1992). 
3.1 Criteria for Detecting Long Sentences 
It is empirically known that simple factors, such 
as the number of characters or words in a sentence, are 
not sufficient to determine which sentences should be 
rewritten. Currently, we adopt both the number of 
words and the linguistic patterns to identify long sen- 
tences. This combined algorithm provides the 
precision ratio of 52% and the recall ratio of 96% for 
closed data. The two ratios improved by 9% and 6% 
respectively, compared with the case when the number 
of words in a sentence is the only determining factor. 
3.2 Generation of Rewriting Candidates 
The rewriting rules also generate candidates for 
rewriting expressions. There are four methods of sen- 
tence division: 
1) Simple division (60%): A sentence is divided into 
two at the division point inflecting the ending of 
the in'st sentence and inserting an appropriate con- 
nective at the beginning of the second sentence, 
where necessary. 
2) Supplementation of case Idlers (27%): A case, 
such as the subjective case, is supplemented after 
sentence division. 
3) Supplementation of verbs (7%): Verbs are supple- 
mented after sentence division. 
4) Others (6%): 
Of the above, method 1) has been implemented and 
method 2) is under development using semantic trees 
provided by the information extractor. 
4. Experiment and Evaluation 
The rewriting tool has been evaluated from two 
aspects: the knowledge which forms the basis of our 
pre-editing rules and the tool as a whole, including 
the interface. 
4.1 Evaluation of Rewriting Rules 
We have carried out an experiment on the evalua- 
tion of our rewriting knowledge using a new test 
text(211 sentences). The experiment showed almost 
the same result(62% for the precision ratio and 90% 
for the recall ratio) as is in Section 3.1. 
To evaluate the generation of rewriting candidates 
in terms of rewriting positions and rewriting expres- 
sions, the precision ratio and the recall ratio of these 
two have been calculated for the target text, focusing 
on the sentences for which the tool produced rewrit- 
ing candidates (i.e., type 1) in Section 3.2. 
The precision ratio and the recall ratio for identi- 
fying division points are 63% and 100% respectively; 
the ratios for the rewriting candidates are 10.5% and 
93% respectively. The recall ratios for both division 
points and rewriting expressions exceed 90%, which 
means that the probability of obtaining the correct 
positions and candidates are sufficiently high. The pre- 
cision ratio for substitution generation is low, but 5 
to 6 candidates per division point would not be much 
of a burden on the user. 
4.2 Evaluation of the Operating Cost 
A preliminary experiment was made on five sub- 
jects to evaluate an overall appraisal, including the 
interface, the rate of reduction in operation time and 
improvement in tim quality of rewriting. 
The rate of reduction in rewriting time for the 
five subjects averages 23%. The recall ratio went up 
from 83% to 96%, the precision ratio from 96% to 
99%. For those subjects whose recall ratio scored 
high when the tool was not used, the rate of reduc- 
tion in time also tends to be high. For others whose 
recall ratio is low when the tool was not used, the 
rate of reduction in time does not change much, but 
the recall ratio improved by far. That is, users capa- 
ble of rewriting without outside help can further 
shorten the total time using the tool. Moreover, 
those with low rewriting skills can benefit from the 
tool to improve rewriting quality. 
5. Conclusion 
We have proposed an interactive rewriting tool 
featuring the rewriting candidate generation capabili- 
ties to support people in creating machine acceptable 
sentences. This tool has been applied to MT pre-edit- 
ing in a Japanese-to-English MT system and showed 
promising results in the experimental evaluations. 

References 

Hirakawa, H.; Nogami, H.i and Amano, S. (1991). 
"EJ/JE Machine Translation System ASTRANSAC- 
Extensions toward Personalization." Proc. of MT 
SUMMIT III. 

Hntchins, W. J. and Somers, H. L. (1992). An Intro- 
duction to Machine Translation. ACADEMIC 
PRESS, pp. 154-155, pp. 324-325. 

Kim; Y. B.; F.hara, T.; and Aizawa, T. (1992). 
"Breaking Long Japanese Sentences Based on Morpho- 
logical Information."(in Japanese) Proc. of IPSJ 44th, 
vol. 3. 

Malone, T. W.; Grant, K. R.; Lai, K.; Rao, R.; and 
Rosenblitt, D. (1986). "Semi-Structured Messages are 
Surprisingly Useful for Computer-Supported Coordi- 
nation." Proc. of Computer-Supported Cooperative 
Work. 
