A Three-level Revision Model for Improving 
Japanese Bad-styled Expressions 
Yoshihiko HAYASHI 
NTT Network Information Systems Laboratories 
1-2356, Take, Yokosuka, Kanagawa, 238-03, Japan 
E-mail : hayashi@nttnlz.ntt.jp 
Abstract 
This paper proposes a three-level revision model for 
improving badly-styled Japanese expressions, especially 
in the field of technical communication. The model is a 
mixture of the regeneration-based model and the
rewriting-based model. The first level divides long
sentences, while the second level improves several
badly-styled expressions with iterative partial rewriting
operations. The last level performs regeneration, which
currently involves word ordering and punctuation to
reduce reading ambiguity. Experimental
results show that our model is effective in realizing 
practical revision support systems. 
1 Introduction 
It is well known that "revision is a large part of the
writing process" [1]. To provide computational aids for
revision, several style-checkers and revision support
systems have been developed [2],[3]. However, few
systems are capable of providing alternatives for an
expression judged to be badly styled [4]. In addition,
these systems simply show the alternative expressions;
the user must rewrite the original sentence while
referring to the suggested expressions.
We have developed a prototype of our sentence-level
Japanese revision support system called REVISE-S [5].
In the system, the user can improve his/her
sentences by simply selecting the most appropriate 
alternative from the candidates that the system 
generates. 
This paper proposes a three-level revision model 
for improving badly-styled Japanese expressions. We 
focus on the field of technical communication. An
architecture of the revision support system based on the
model is presented. Experimental results are shown that
demonstrate the effectiveness of the prototype system and
the validity of the proposed model.
2 Computational Aids to Revision 
2.1 Targets of the Computational Aids 
Mishima [6] has summarized the key conditions for
efficient technical communication via texts as follows:
(1) The reader must be able to easily understand the
text (Easy-understanding).
(2) The reader must be able to correctly understand
the text (Correct-understanding).
(3) The contents of the text must meet the reader's
purposes.
From this viewpoint, the task of revision is to rewrite
or regenerate the original text so that it satisfies
these conditions. The last condition is too hard
to support computationally; however, the first two
conditions are promising because we can apply natural
language processing technologies. Therefore, we
concentrate on the first two conditions in designing the
computer-assisted revision system.
2.2 Revising as Regeneration or Rewriting 
Is there a single computational revision model that is
suitable for practical use? The two models illustrated in
Fig.1 are the two extremes and provide the basis for
constructing a practical model: one is the
regeneration-based model (a) and the other is the
rewriting-based model (b).
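[Figure: diagrams of (a) the regeneration-based model and (b) the rewriting-based model; only the label "Prescriptive Generation Grammar" is legible in this copy.]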
Fig.1 Two Basic Revision Models. 
The regeneration-based model is a strong model; 
namely, if all of its components are perfectly 
constructed, the output text will be understandable and 
contain no badly-styled expressions. However, if some
component is incomplete, there is a possibility that
some input text will be fatally damaged. Therefore, for
this model to be practical, as a minimum, the
following two major problems must be overcome:
(1) The Analyzer must correctly capture the 
intermediate representation of the input text at a 
certain processing level. 
Actes de COLING-92, Nantes, 23-28 août 1992 665 Proc. of COLING-92, Nantes, Aug. 23-28, 1992
(2) The Generator must be equipped with the 
complete set of prescriptive generation grammar. 
The first problem is serious, especially in revision
support systems, and hard to overcome. This is
because input text may contain badly-styled expressions
that prevent correct computational analyses. The second
problem is also hard to solve, because no perfect set
of prescriptive generation grammar has been developed
to date. Furthermore, even if such a set could be
developed, single-pass generation has been pointed out
to have many drawbacks in producing optimal texts [1].
In spite of
these problems, the regeneration-based model is crucial 
for offsetting the weaknesses of the rewriting-based 
model. 
In the rewriting-based model, on the other hand, 
the original text is iteratively rewritten to improve each 
badly-styled expression that has been detected; only 
detected expressions are revised. This means that the 
rewriting-based model is a weak but practical model. 
Even if the set of revision rules is incomplete, the 
revision process will not destroy the original text 
entirely; the worst case is that the revision will be 
insufficient. Moreover, if the set of revision rules
successfully covers numerous badly-styled expressions,
the system can be expected to achieve good
performance. Thus, if most of the considered style
improvements can be handled with this model, we 
should combine it with the regeneration-based model. 
3 Classification of Japanese Badly- 
styled Expressions 
It is obvious from the previous discussion that the
effectiveness of the rewriting-based model depends on 
how many style improvements can be described as 
individual revision rules. Thus we must investigate 
what patterns of expressions should be considered to be 
badly-styled, especially in the technical 
communications field, and determine how many of 
them can be improved by revision rules. 
Table.1 Classification of Typical Japanese Bad-styled Expressions.
C/E: relative effect on Correct-understanding (C) and Easy-understanding (E).
Scope: linguistic level. G/T: General (G) or Technical Writing (T).
Rule: improvable with a revision rule (Y/N).
The Japanese example sentences given for some items are not legible in this copy.

Item                                                  C/E    Scope        G/T  Rule
Too Long Complex Sentences                            C = E  Sentence     G    N
Unclear Inter-clause Connective Expressions           C < E  Two Clauses  T    Y
Operation Directing Expressions with Reverse Step     (illegible)         T    Y
Partial Prohibitory Expressions                       C < E  Two Clauses  T    Y
Improper Voices (trans./intrans., passive/active)     C < E  Clauses      G    Y
Double Negatives                                      C < E  Two Clauses  T    Y
Ambiguous Negatives with Comparing Expression         C < E  Clauses      T    Y
Ambiguous Negatives with Quantifier                   C < E  Clauses      T    Y
Conditional Expressions with Negated Antecedent
  and Negated Consequence                             C < E  Two Clauses  T    Y
Violated Concord Expressions (Adjective and
  Predicate)                                          C = E  Clauses      G    Y
Violated Concord Expressions (Subject and Predicate)  C < E  Sentence     T    Y
Light Verb Expressions                                C < E  Clauses      T    Y
Ambiguous Modification Structures                     C > E  Sentences    G    N
To investigate these issues, we have classified
typical sentence-level Japanese badly-styled
expressions. The classification mainly used
examples from several books on technical
writing [6],[7] as well as general writing [8]. Textual
data from published manuals on computer systems was
also investigated. The result is briefly outlined in
Table.1. The viewpoints for classification are:
(1) Does the item affect easy-understanding or
correct-understanding?
(2) In which linguistic structure does the item occur?
(3) Is the item general or peculiar to technical
writing?
(4) Can the item be improved with an individual
revision rule?
The investigation showed that items peculiar to
technical writing mainly affect easy-understanding,
while general items principally affect
correct-understanding. In addition, most of the items
peculiar to technical writing can be improved by the
application of discrete revision rules. Fig.2 exemplifies
a revision rule for a typical badly-styled expression
peculiar to technical writing; the expression directs the
user's actions, but the actions are described in reverse
order. We can identify most badly-styled expressions
peculiar to technical writing by referring to particular
partial syntactic structure patterns. As shown in Fig.2,
such patterns allow bad styles to be detected and
rewritten. Therefore, it is valid to adopt the
rewriting-based model as the central component of our
model.
Type of the expression: Directing the user's operation 
[Figure: structural conversion diagram; not legible in this copy.]
Fig.2 A Partial Rewriting Operation as 
Structural Conversion with 
Lexical Operations. 
4 The Three-level Revision Model and 
the Prototype System 
4.1 The Model 
The previous section has shown that the rewriting-based
model is applicable to most of the style
improvements peculiar to technical writing. Table.1,
however, shows that there are a couple of items which
are poorly handled by the model. They are:
(a) excessively long complex sentences, and
(b) ambiguous modification structures.
These items cannot be detected and corrected by
particular revision rules, because they do not have
unique syntactic patterns. They cannot be
characterized by particular words and/or particular
linguistic attributes such as modality, tense, etc. Thus
these badly-styled expressions cannot be easily corrected
with particular structural conversion operations.
We are proposing a three-level revision model 
which combines the rewriting-based and regeneration- 
based models. The first level is for dividing excessively 
long complex sentences and is based on the 
regeneration-based model at the morphological level. 
The second level is for improving several badly-styled
expressions and is based on the rewriting-based model.
The third level is for syntactic/semantic level
regeneration, which involves word ordering and
punctuating to reduce the number of structural
ambiguities.
Our model is a three-level sequential model.
Here, the order of the components has the following
computational significance:
(1) As shown in 5.1, excessively long complex
sentences can be identified and divided with
morphological level information [9].
(2) If long sentences are divided at the early stage of
the total process, processing loads for the remaining
operations are significantly reduced.
(3) The style improving process should precede the
syntactic/semantic level regeneration process,
because the regeneration process should start with a
well-formed syntactic/semantic structure.
4.2 Issues in Improving Style 
Most style improvements can be realized by sequential
application of the revision rules. However, there are
two major design issues. One is how to feed back the
result of each rewriting operation to the initially
produced analysis results. The other is the handling of
structural ambiguity. That is, if the ambiguity is not
eliminated, combinatorial explosion is inevitable in
many aspects of the system. On the other hand, overall
structural disambiguation is computationally expensive
due to processes such as semantic analysis and context
analysis. Moreover, uniform application of these
processes violates one of the basic requirements of any
writing aid; that is, it is unacceptable to incur high
computational costs by processing good expressions
that require no revision.
We have three approaches to deal with these 
issues: 
(1) First, we detect all of the potential bad styles while
accepting structural ambiguity. Each bad style is
connected to an associated partial rewriting operation
specified by its pattern. These operations are defined in
a rule-base, so that the detection process amounts to the
activation of these rules.
(2) We then try to apply activated rules under an
expectation-driven control strategy. That is, the system
schedules the order of rule applications using a priority
that reflects how important the rewriting operation is in
improving the sentence. The scheduled application of a
rule initiates the structural disambiguation of the
applicable expression.
(3) During the revision process, internal data, such as
that generated by morphological or syntactic analyses
and by the bad-style detection process, varies as a result
of the partial rewriting operations. To avoid duplicative
analysis and detection, the system tracks exactly what
has been revised and keeps the internal data consistent
with the revision. This scheme solves the feedback
problem mentioned above.
4.3 The Architecture of the Prototype 
System 
Figure.3 shows the architecture of the prototype system 
REVISE-S based on the three-level revision model and 
the above design principles. 
[Fig.3: architecture diagram. Legible components: Morphological Analyzer, Sentence Divider, Diagnoser (Revision Rules), Revision Process Controller, Data Consistency Manager.]
The Morphological Analyzer divides the 
sentence string into word sequences. At this time, basic 
operational units (called 'Bunsetsu') are recognized. The 
sentence dividing algorithm in the Sentence Divider
utilizes the result of the morphological analysis, and is 
outlined in 5.1. The sentence dividing process is 
recursively invoked until each divided sentence satisfies 
some predefined condition that prevents further 
division. 
Next, the Syntactic Analyzer finds all possible 
binary relations between modifier Bunsetsu and 
modified Bunsetsu. The result is represented in a 
network called a Kakari-Uke network, which represents
all possible syntactic structures intensionally.
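
As a rough illustration of how such a network can hold every candidate structure implicitly, the following Python sketch stores, for each modifier Bunsetsu, its list of candidate heads, and enumerates explicit structures only on demand. The dictionary layout, the function names, and the two well-formedness constraints used here (each Bunsetsu modifies exactly one later Bunsetsu, and dependencies do not cross) are assumptions made for the example; the paper does not specify the internal representation of the network.

```python
from itertools import product
from typing import Dict, List, Tuple

Edge = Tuple[int, int]          # (modifier index, head index)
Network = Dict[int, List[int]]  # modifier index -> candidate head indices


def crosses(a: Edge, b: Edge) -> bool:
    """True if two modifier->head edges cross (heads lie to the right)."""
    (m1, h1), (m2, h2) = sorted([a, b])
    return m1 < m2 < h1 < h2


def enumerate_structures(network: Network) -> List[List[Edge]]:
    """Expand the implicit network into explicit non-crossing structures."""
    modifiers = sorted(network)
    structures = []
    for heads in product(*(network[m] for m in modifiers)):
        edges = list(zip(modifiers, heads))
        if not any(crosses(e1, e2)
                   for i, e1 in enumerate(edges)
                   for e2 in edges[i + 1:]):
            structures.append(edges)
    return structures


# Four Bunsetsu (0..3); the last one is the sentence-final head.
# The candidate lists below are hypothetical.
network = {0: [1, 2, 3], 1: [2, 3], 2: [3]}
structures = enumerate_structures(network)
```

In this toy network six head assignments are possible, but one of them contains crossing dependencies, so five explicit structures are produced. Keeping only the candidate lists avoids materializing the structures until a rewriting operation actually needs them.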
The Diagnoser, which utilizes the detection 
counterpart of the revision rule, finds all possible 
badly-styled expressions. The result semi-fires the 
conversion counterpart of the associated revision rule 
and constructs the agenda which lists the semi-fired rule 
instances. The Revision Process Controller sequences 
the successive execution of partial rewriting operations, 
and the Data Consistency Manager maintains 
consistency between the current sentence string and the 
internal data during the dynamic rewriting process. 
Finally, the regeneration process is invoked to 
generate a sentence with less reading ambiguity. 
5 Generating Alternative Expressions 
Each component in our revision model generates 
alternative expressions for the user. This section gives 
a brief outline of the generation of alternative 
expressions in each level component. 
5.1 Dividing Long Sentences 
Before dividing a long complex sentence, the
component must first decide whether the sentence should
be divided or not. If so, the component must then
identify the division point, which is indicated by the
top level clause boundary.
Finally the divided sentences must be generated. These 
processes can be conducted with morphological level 
information; that is, they do not require full syntactic 
parsing or any semantic interpretation. 
In the first step, the decision is made with a
discriminant function that computes the weighted sum
of the number of characters, the number of Bunsetsus,
the number of predicates (verbs and adjectives), etc.
Weighting coefficients and the threshold value for the
decision were determined through experiments.
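
The decision step can be sketched in Python as follows. This is only a minimal illustration of a weighted-sum discriminant with a threshold: the three features are the ones named above, but the weight values, the threshold, and all identifiers are hypothetical, since the experimentally determined values are not given in the paper.

```python
from dataclasses import dataclass


@dataclass
class SentenceStats:
    """Surface counts available after morphological analysis."""
    n_chars: int       # number of characters
    n_bunsetsu: int    # number of Bunsetsu units
    n_predicates: int  # number of verbs and adjectives


# Hypothetical values; the paper tuned its own weights and threshold
# through experiments and does not report them.
WEIGHTS = {"n_chars": 0.02, "n_bunsetsu": 0.1, "n_predicates": 0.5}
THRESHOLD = 3.0


def division_score(stats: SentenceStats) -> float:
    """Weighted sum of the surface counts."""
    return (WEIGHTS["n_chars"] * stats.n_chars
            + WEIGHTS["n_bunsetsu"] * stats.n_bunsetsu
            + WEIGHTS["n_predicates"] * stats.n_predicates)


def should_divide(stats: SentenceStats) -> bool:
    """True when the score exceeds the threshold."""
    return division_score(stats) > THRESHOLD


short_sentence = SentenceStats(n_chars=20, n_bunsetsu=4, n_predicates=1)
long_sentence = SentenceStats(n_chars=90, n_bunsetsu=18, n_predicates=4)
```

With these illustrative values, `should_divide(long_sentence)` holds while `should_divide(short_sentence)` does not. Because every feature is a simple count, the decision needs no syntactic parsing.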
[Japanese example sentence; not legible in this copy] (a)
Top Level Clause Boundary 
(b-l) 
The process advances while saving the result. 
Thus the result remains, even if error occurs. 
(b-2) 
The result remains, even if error occurs. 
Because the process advances while saving the result. 
Fig.4 An Example of the Sentence Division. 
The second step roughly analyzes the intra-sentence
connective structure [9] and produces a shallow
level intermediate representation as illustrated in
Fig.4(a). The key to this process is the inter-predicate
dependency relation analysis, which utilizes a set of
dependency rules. These rules are based on the
classification of predicate expressions (including modal,
tense, and aspectual suffixes) in terms of their strength
in forming connective structures. One significant point
in the process is that the connective structure need not
be fully disambiguated, because the main purpose of the
analysis is identification of the division point; namely,
there are cases where the division point can be uniquely
identified even though the connective structure is
ambiguous.
The final step generates the divided sentence
strings by applying generation rules to the intermediate
representation. Fig.4(b) gives the generated alternatives
((b-1),(b-2)) for the example in Fig.4(a). In the process,
the ordering of the divided sentences and the choice of
the conjunctive expression which provides cohesion
between divided sentences are major considerations. In
Fig.4(a), two top level clauses are connected with a
causal relation. Thus associated conjunctive expressions
(underlined in Fig.4(b)) are generated according to the
alternatives in sentence ordering. To determine which is
better, contextual processing is required; however, the
determination is currently left to the user's selection.
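
The generation of the two alternatives for a causal relation can be sketched as below. The sketch works on the English glosses from Fig.4 rather than on Japanese, and the function name and the choice of "Thus" and "Because" as the conjunctive expressions are assumptions taken from those glosses, not the system's actual generation rules.

```python
from typing import List


def _capitalize(clause: str) -> str:
    """Upper-case the first letter so the clause can start a sentence."""
    return clause[0].upper() + clause[1:]


def divide_causal(cause: str, result: str) -> List[str]:
    """Return both orderings of a cause/result pair as divided sentences.

    Each ordering gets the conjunctive expression that keeps the causal
    link explicit once the clauses become separate sentences.
    """
    cause_first = f"{_capitalize(cause)}. Thus {result}."
    result_first = f"{_capitalize(result)}. Because {cause}."
    return [cause_first, result_first]


alternatives = divide_causal(
    cause="the process advances while saving the result",
    result="the result remains, even if error occurs",
)
```

The two strings in `alternatives` correspond to (b-1) and (b-2) in Fig.4; both are offered because choosing between them would require contextual processing.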
5.2 Rewriting through Partial Structural 
Conversions 
The main stream of the algorithm in the style
improvement component is summarized in Fig.5. The rest
of this subsection briefly introduces topics in each step
(details are given in [5]).
Detect all Possible Bad-styled Expressions ;
Construct the Agenda and
the Revision Process Manager ;
while (1) do
    Select an unmarked rule instance with the highest
    priority from the agenda ;
    if there is no such rule instance then
        break ;
    Test its presupposition ;
    if the presupposition holds then {
        Apply the associated partial rewriting operation ;
        if the operation succeeds then
            Analyze the difference and
            maintain data consistency ; }
    Mark the instance as "done" ;
end while ;
Fig.5 Main Stream of the Algorithm in 
Style Improvement Component. 
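
The control loop of Fig.5 can be approximated in Python as follows. This is a simplified sketch, not the REVISE-S implementation: the sentence is a plain string, presuppositions and rewriting operations are ordinary functions, the difference analysis step is reduced to a comment, and the English rules in the agenda are hypothetical stand-ins for the Japanese revision rules.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RuleInstance:
    name: str
    priority: int                          # higher means applied earlier
    presupposition: Callable[[str], bool]  # tested just before application
    rewrite: Callable[[str], str]          # the partial rewriting operation
    done: bool = False


def improve_style(sentence: str, agenda: List[RuleInstance]) -> str:
    """Apply agenda entries in priority order, re-testing each presupposition."""
    while True:
        pending = [r for r in agenda if not r.done]
        if not pending:
            break
        rule = max(pending, key=lambda r: r.priority)
        if rule.presupposition(sentence):
            sentence = rule.rewrite(sentence)
            # A full system would analyze the difference here and update
            # its internal data to stay consistent with the rewrite.
        rule.done = True
    return sentence


# Hypothetical rule instances, for illustration only.
agenda = [
    RuleInstance("double_negative", 2,
                 lambda s: "is not unable to" in s,
                 lambda s: s.replace("is not unable to", "can")),
    RuleInstance("light_verb", 1,
                 lambda s: "performs a check of" in s,
                 lambda s: s.replace("performs a check of", "checks")),
    RuleInstance("unmatched_rule", 3,
                 lambda s: "cannot but" in s,
                 lambda s: s.replace("cannot but", "must")),
]
revised = improve_style(
    "The device is not unable to start if it performs a check of the disk.",
    agenda,
)
```

The highest-priority instance is scheduled first even though its presupposition turns out not to hold; it is simply marked as done and the loop moves on, which mirrors the expectation-driven scheduling described in 4.2.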
Detection of Badly-styled Expressions
The Diagnoser detects badly-styled expressions from the
Kakari-Uke network which contains all detectable
syntactic structures. The process is the semi-firing of
the partial rewriting rules, because each detected
badly-styled expression is associated with a rewriting
rule specified by the type of the bad-style pattern.
'Semi-firing' means that some of the focused rules may be
deactivated later in response to on-demand structural
disambiguation or partial rewriting. From the
computational viewpoint, the detection process should
be regarded as a sort of feature extraction process. This
allows the diagnosis process to be realized as an
interpretation of a data-flow network; namely, the
terminal node finally activated indicates which
badly-styled expression has been detected, and the
node's own data provides the justification.
Constructing Agenda and Revision Process Manager
The rules semi-fired through the process described in
the previous section are instantiated based on their
justifications. The instances are then placed on the
agenda. These justifications specify the partial syntactic
structures concerned with the detection patterns.
Therefore, they are presuppositions for the application
of the associated rewriting operations.
A justification is represented as a conjunction of
predicates on modification relations between two
Bunsetsus (called the Kakari-Uke conditions) and
predicates on Bunsetsu properties (called the
Bunsetsu property conditions). For instance, the
conjunctive formula stated below is the justification of
the detection pattern shown in Fig.2.
intention_verb(X) ∧ before_NP(Y) ∧ directive(Z) ∧ modify(X, Y) ∧ modify(Y, Z)
Each literal of the formula is called a primitive
condition. Literals must be treated as a sort of
assumption, because any of them may become
unsatisfied due to structural disambiguation
and/or partial rewriting operations.
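
To make the notion of a justification concrete, the sketch below represents each primitive condition as a tuple and tests the conjunction against the current state of the analysis. The tuple encoding, the function names, and the property names are illustrative assumptions; only the split into Bunsetsu property conditions and Kakari-Uke conditions comes from the text.

```python
from typing import Dict, Iterable, Set, Tuple

# ("prop", property_name, bunsetsu) or ("modify", modifier, head)
Condition = Tuple[str, str, str]


def holds(cond: Condition,
          properties: Dict[str, Set[str]],
          relations: Set[Tuple[str, str]]) -> bool:
    """Test one primitive condition against the current analysis state."""
    kind, first, second = cond
    if kind == "prop":
        return first in properties.get(second, set())
    if kind == "modify":
        return (first, second) in relations
    raise ValueError(f"unknown condition kind: {kind}")


def justification_holds(justification: Iterable[Condition],
                        properties: Dict[str, Set[str]],
                        relations: Set[Tuple[str, str]]) -> bool:
    """A justification is the conjunction of its primitive conditions."""
    return all(holds(c, properties, relations) for c in justification)


# Property names here are illustrative, not the system's own vocabulary.
justification = [
    ("prop", "intention_verb", "X"),
    ("prop", "before_NP", "Y"),
    ("prop", "directive", "Z"),
    ("modify", "X", "Y"),
    ("modify", "Y", "Z"),
]
properties = {"X": {"intention_verb"}, "Y": {"before_NP"}, "Z": {"directive"}}
relations = {("X", "Y"), ("Y", "Z")}
```

If structural disambiguation later rejects the relation between X and Y, the same call with that pair removed from `relations` returns `False`, which is why each literal is treated as an assumption rather than a fact.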
The Revision Process Manager for managing the
presuppositions is constructed at the same time as the
agenda. It holds a list of Bunsetsu property conditions
and a list of the Kakari-Uke conditions. The data
structure is suitable for managing all presuppositions
systematically because rule instances that share the
same primitive condition are immediately found
through a data slot which is indexed by the primitive
conditions and contains pointers to the rule instances.
Expectation-Driven Control and On-demand
Disambiguation
The priorities preassigned to the instances on the
agenda sequence the successive application of partial
rewriting operations within the revision process. That
is, important rewriting operations are assigned high
priority values and are scheduled for earlier
application, even if their presuppositions are not
confirmed prior to their application. To actually apply a
scheduled rewriting operation, its presupposition is
tested first. At this time, the Disambiguator, which
involves the application of heuristic disambiguation
rules and/or user interactions, is invoked, and the
minimum range of structural ambiguities is resolved in
expectation of applying the scheduled rewriting
operation.
Partial Rewriting
If the presupposition is confirmed to be satisfied, the
associated partial rewriting operation is applied. Before
commencing any partial rewriting operation, a
sub-network concerned with the scope of the rewriting is
first extracted from the Kakari-Uke network according to
the given scope name such as 'simple sentence' or 'noun
phrase'. Second, the extracted sub-network is
converted into a set of dependency trees, wherein each
element is an explicitly represented possible syntactic
structure. Third, the partial rewriting rule defined by the
structural conversion with the lexical operations is
applied to the trees. Alternative expressions are
generated from rule application. A partial rewriting
operation is completed by user selection or rejection of
the generated expressions. The partial dependency trees
giving the selected partial expression are then converted
to the sub-network again, and restored in the Kakari-Uke
network. This process is iterated until the agenda
has no more applicable rule instances.
Maintenance of Data Consistency with Constraint
Propagation
Because the structural disambiguation and the partial
rewriting operations affect internal data, the system
must maintain the consistency of internal data
whenever these operations are invoked. New
information may be obtained as a result of the invoked
operations, i.e., the acceptance/rejection of some
modification or the change of some Bunsetsu structure.
The new information can be considered as newly added
constraints, so that data consistency can be maintained
by propagating these constraints to the dependent
internal data.
For instance, if the Revision Process Manager is 
notified by the Difference Analyzer that a particular 
Bunsetsu no longer has a certain property according to a 
particular partial rewriting operation, the rule instances 
which share the condition are immediately deactivated. 
Another typical example of the constraint propagation 
is created by structural disambiguation. If some Kakari- 
Uke relation is confirmed by the Disambiguator, 
exclusive Kakari-Uke relations are rejected at this time. 
This causes the deactivation of the rule instances which 
have these rejected Kakari-Uke relations as their 
primitive conditions.
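
The two propagation examples above can be sketched with an index from primitive conditions to the rule instances that depend on them. The class below is a hypothetical simplification: its name echoes the Revision Process Manager, but the method names, the tuple encoding of conditions, and the rule names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Set


class RevisionProcessManager:
    """Index from primitive conditions to dependent rule instances."""

    def __init__(self) -> None:
        self.index = defaultdict(set)  # condition -> names of rule instances
        self.active: Set[str] = set()

    def register(self, rule_name: str, conditions: Iterable[Hashable]) -> None:
        self.active.add(rule_name)
        for cond in conditions:
            self.index[cond].add(rule_name)

    def invalidate(self, condition: Hashable) -> Set[str]:
        """Deactivate every active rule instance that relies on the condition."""
        hit = self.index.pop(condition, set()) & self.active
        self.active -= hit
        return hit

    def confirm_relation(self, modifier: str, head: str,
                         candidate_heads: Iterable[str]) -> Set[str]:
        """Confirming modifier->head rejects the modifier's other candidates."""
        deactivated: Set[str] = set()
        for other in candidate_heads:
            if other != head:
                deactivated |= self.invalidate(("modify", modifier, other))
        return deactivated


manager = RevisionProcessManager()
manager.register("r1", [("modify", "B1", "B2"), ("prop", "passive", "B2")])
manager.register("r2", [("modify", "B1", "B3")])
manager.register("r3", [("prop", "passive", "B2")])

# Disambiguation confirms that B1 modifies B2, so B1 -> B3 is rejected.
by_disambiguation = manager.confirm_relation("B1", "B2", ["B2", "B3"])
# A rewrite removes the 'passive' property from B2.
by_rewriting = manager.invalidate(("prop", "passive", "B2"))
```

Because the index is keyed by the condition itself, every affected rule instance is found with a single lookup and no re-analysis of the sentence is needed.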
5.3 Word Ordering and Punctuating as 
Regeneration 
Appropriate word ordering and punctuating help reduce
the ambiguity in reading. Furthermore, they increase
readability. In Japanese, however, word order is
relatively free at the sentence constituent level and there 
are no strict grammatical restrictions on punctuation. 
Thus optimal word order and punctuation can not be 
decided only with syntactic information; reading and 
writing preferences must be considered. 
Our regeneration algorithm takes the syntactic 
structure (dependency tree structure) as its input and 
regenerates a new syntactic structure with fewer reading
ambiguities. The algorithm employs the following
heuristics based on the preferences in word ordering [10]:
(1) Constituents which include the thematic marker 
(post position 'ha') are put at the head of the 
sentence, and punctuation marks are put after them. 
(2) Punctuation marks are placed on clause 
boundaries. 
(3) Heavier constituents (containing more 
Bunsetsus) are made to precede light constituents on 
the same syntactic level. 
The algorithm first determines the constituent which 
includes the thematic marker. The constituent is 
positioned at the head of the sentence and a punctuation
mark (Japanese comma) follows it. Next, a
punctuation mark is added to the Bunsetsu which
indicates the top level clause boundary. Then, at each
syntactic level, constituents are sorted by their weight. 
Of course, the initially located constituents that include 
thematic markers are not moved by this constituent 
sorting operation. Finally, if the regenerated sentence 
string differs from the original, it is submitted for user 
confirmation. 
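
The three heuristics can be sketched for a single syntactic level as follows. This is a minimal sketch under stated assumptions: constituents are flat lists of Bunsetsu labels, the sentence-final predicate is passed separately, an ASCII comma stands in for the Japanese comma, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Constituent:
    bunsetsus: List[str]
    thematic: bool = False         # contains the thematic marker 'ha'
    clause_boundary: bool = False  # ends at a top level clause boundary


def regenerate(constituents: List[Constituent], predicate: str) -> str:
    """Reorder and punctuate the constituents of one syntactic level."""
    themes = [c for c in constituents if c.thematic]
    others = [c for c in constituents if not c.thematic]
    # Heavier constituents first; sorted() is stable, so constituents of
    # equal weight keep their original relative order.
    others = sorted(others, key=lambda c: len(c.bunsetsus), reverse=True)
    parts = []
    for c in themes + others:
        text = " ".join(c.bunsetsus)
        if c.thematic or c.clause_boundary:
            text += ","
        parts.append(text)
    parts.append(predicate)
    return " ".join(parts)


regenerated = regenerate(
    [
        Constituent(["B1", "B2"], thematic=True),
        Constituent(["B3"]),
        Constituent(["B4", "B5"], clause_boundary=True),
    ],
    predicate="B6",
)
```

For this input the thematic constituent moves to the front with a comma, the two-Bunsetsu constituent precedes the one-Bunsetsu constituent and receives the clause-boundary comma, and the predicate stays in final position.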
Figure.6 gives an example. In this example, B2 
is the Bunsetsu that contains the thematic marker and 
B5 indicates the top-level clause boundary. According 
to the regeneration algorithm, the segment (B1-B2) is
placed at the head of the sentence and the Japanese
comma is added. The segment (B4-B5) precedes
segment (B3) because of its weight (two Bunsetsus) and
another comma is added.
B1 B2 B3 B4 B5 B6 
This device will work manually, if the automatic-mode has 
been canceled. 
Fig.6 An Example of the Regeneration. 
6 Evaluation 
An evaluation experiment to show the effectiveness of 
the prototype system and the validity of the proposed 
revision model was made by using 113 sentences taken 
from published manuals and constructed examples. The 
points for evaluation were how much the system
contributes to easy-understanding and correct- 
understanding. 
6.1 Readability 
There is no established way to evaluate 
understandability of texts. In this paper, we treated 
understandability as roughly equivalent to readability, 
because readability is encompassed by 
understandability. 
The readability measure used in the experiment
was proposed by Tateishi et al. [11] for Japanese texts.
The method computes the readability with the
following formula, which utilizes surface level
information. The term RS' indicates the readability;
higher values indicate the text is more readable. The
coefficients were determined through statistical analyses
to normalize the mean value to 50 and the standard
deviation to 10.
RS' = -0.12 x ls - 1.37 x la + 7.4 x lh - 23.18 x lc
      - 5.4 x lk - 4.67 x cp + 115.79
The terms are: ls, length of the sentences; la, mean
length of alphabetical character runs; lh, mean length
of Hiragana character runs; lc, mean length of Kanji
character runs; lk, mean length of Katakana character
runs; cp, mean number of commas per sentence.
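
For reference, the formula above translates directly into a small Python function. The coefficients are those given in the text; the function name is hypothetical, and extracting the six surface measurements from Japanese text is assumed to be done elsewhere.

```python
def readability_score(ls: float, la: float, lh: float,
                      lc: float, lk: float, cp: float) -> float:
    """RS' readability score; higher values mean more readable text.

    ls: length of the sentences
    la: mean length of alphabetical character runs
    lh: mean length of Hiragana character runs
    lc: mean length of Kanji character runs
    lk: mean length of Katakana character runs
    cp: mean number of commas per sentence
    """
    return (-0.12 * ls - 1.37 * la + 7.4 * lh - 23.18 * lc
            - 5.4 * lk - 4.67 * cp + 115.79)
```

Since the coefficient of `ls` is negative, shortening sentences while leaving the other measurements unchanged raises the score, which is consistent with sentence division improving readability.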
The system increased the RS' value from 42.5 to
49.0. This means that the readability was increased by
the system. Sentence division and punctuation were the
main contributors to this improvement.
6.2 Structural Ambiguity 
It is also difficult to quantitatively estimate correct- 
understanding. In this paper, we estimate the level of 
correct-understanding from the structural ambiguity, 
because structurally ambiguous sentences/expressions 
obviously degrade correct-understanding. However,
measuring systematic [12] or reading ambiguity with
algorithms is still a difficult problem. Thus we use
computational ambiguity to approximate systematic
ambiguity. The Japanese dependency structure analyzer
developed by Shirai [13] was used for this purpose.
The original texts led to 18.4 analyses per 
sentence on average. After the texts were corrected by 
the prototype system, only 7.9 analyses were produced. 
This means that the system successfully reduced the
amount of structural ambiguity. The major contributors
to this improvement were sentence division and word
ordering. Style improvements leading to drastic changes
in the syntactic structure also contributed to this
improvement.
Incidentally, after revision, only 4.9 possible
syntactic structures remained per sentence on average
within the internal data of the system. This is
considerably fewer than the 7.9 obtained by reanalyzing
the revised text. Thus where the revised text is
processed further (for instance, translation or
summarization), the use of the internal data will help
to reduce the disambiguation burden on the remaining
processes.
6.3 Validity of the Model 
The validity of the proposed revision model was not
directly evaluated in the experiments. However, the
validity of the component order is supported by the fact
that structural ambiguity is progressively reduced with
each processing step. If the style improvement component
preceded the sentence division component, the structural
conversion processes to improve the badly-styled
expressions would handle numerous fruitless syntactic
structures and generate too many inappropriate
alternative expressions. Moreover, if the
syntactic/semantic regeneration component preceded the
style improvement component, each structural
conversion rule would have to be constructed so as to
preserve the word order and punctuation marks; this
would make the rules harder to write.
7 Concluding Remarks 
This paper has proposed a three-level revision model for 
improving badly-styled Japanese expressions and
introduced a prototype revision support system based
on the model. Experimental results show that the
system successfully improves the readability of texts
and reduces the structural ambiguities they contain. The
three-level model effectively realizes a practical revision 
support system. 
However, a remaining requirement from the real 
technical writing field is that expert knowledge from 
technical writers should be accumulated to cover a wider 
variety of badly-styled expressions. In addition, 
contextual information must be handled, both for
providing contextually adequate alternative expressions
and for improving contextually-poor expressions and
rhetorical structures.
The proposed model, and the prototype system as
its embodiment, will give a powerful foundation to
other applications, including intelligent tutoring
systems and pre-editing systems for machine
translation.
Acknowledgments 
The author wishes to extend his gratitude to Gen-ichiro
Kikui, who developed the rewriting rule application
mechanism, and to Eiji Takeishi, for his contributions
in developing the sentence division module. Thanks are
also due to the members of the Message Processing
Systems Laboratory for their helpful discussions.

References 

[1] Vaughan, M.M. and McDonald, D.D. (1988) A
Model of Revision in Natural Language Generation
System, Proc. of the 26th Annual Meeting of the
Association for Computational Linguistics, 90-96.

[2] Thurmair, G. (1990) Parsing for Grammar and Style
Checking, Proc. of the 13th International Conference
on Computational Linguistics, 365-370.

[3] Richardson, S.D. and Braden-Harder, L.C. (1988)
The Experience of Developing a Large-scale Natural
Language Text Processing System: CRITIQUE, Proc.
of the 2nd Conference on Applied Natural Language
Processing, 195-202.

[4] Hakomori, S. et al. (1988) A Correction System for
Japanese Text (in Japanese), IPSJ SIG-NL, 65-7.

[5] Hayashi, Y. (1991) Improving Bad Japanese Writing
Styles through Partial Rewriting Operations, Proc. of
the Natural Language Processing Pacific Rim
Symposium, 30-37.

[6] Mishima, H. (1990) Technical Writing for Engineers
and Students (in Japanese), Kyouritsu-Shuppan, Tokyo.

[7] Technical Communication Associates (Eds.) (1988)
An Exciting Stylebook for Documentation (in
Japanese), NIKKEI-BP, Tokyo.

[8] Iwabuchi, E. (Eds.) (1988) Bad Styles, the 3rd Edition
(in Japanese), Nihon-Hyouronsya, Tokyo.

[9] Takeishi, E. and Hayashi, Y. (1990) A Method to
Decide Division Points of Japanese Complex Sentences
(in Japanese), Proc. of the 4th Annual Conference of
JSAI, 9-6.

[10] Saeki, T. (1975) Word Order in Modern Japanese
(in Japanese), Kasama-syoin, Tokyo.

[11] Tateishi, K. et al. (1988) Derivation of Readability
Formula of Japanese Texts (in Japanese), IPSJ SIG-
DPHI, 18-4.

[12] Hindle, D. and Rooth, M. (1991) Structural
Ambiguity and Lexical Relations, Proc. of the 29th
Annual Meeting of the ACL, 229-236.

[13] Shirai, S. (1987) Table-driven Japanese Phrase
Dependency Analysis in Japanese-to-English
Translation System ALT-J/E (in Japanese), The 34th
Annual Convention IPS Japan, 5W-5.
