Word Order Acquisition from Corpora

Kiyotaka Uchimoto†, Masaki Murata†, Qing Ma†, Satoshi Sekine*, and Hitoshi Isahara†

†Communications Research Laboratory
Ministry of Posts and Telecommunications
588-2, Iwaoka, Iwaoka-cho, Nishi-ku
Kobe, Hyogo, 651-2492, Japan
{uchimoto, murata, qma, isahara}@crl.go.jp

*New York University
715 Broadway, 7th floor
New York, NY 10003, USA
sekine@cs.nyu.edu
Abstract

In this paper we describe a method of acquiring word order from corpora. Word order is defined as the order of modifiers, or the order of phrasal units called 'bunsetsu' which depend on the same modifiee. The method uses a model which automatically discovers the tendency of word order in Japanese by using various kinds of information in and around the target bunsetsus. This model shows us to what extent each piece of information contributes to deciding the word order and which word order tends to be selected when several kinds of information conflict. The contribution rate of each piece of information in deciding word order is efficiently learned by a model within a maximum entropy framework. The performance of this trained model can be evaluated by checking how many instances of word order selected by the model agree with those in the original text. In this paper, we show that even a raw corpus that has not been tagged can be used to train the model, if it is first analyzed by a parser. This is possible because the word order of the text in the corpus is correct.
1 Introduction

Although it is said that word order is free in Japanese, linguistic research shows that there are certain word order tendencies: adverbs of time, for example, tend to precede subjects, and bunsetsus in a sentence that are modified by a long modifier tend to precede other bunsetsus in the sentence. Knowledge of these word order tendencies would be useful in analyzing and generating sentences.

In this paper we define word order as the order of modifiers, or the order of bunsetsus which depend on the same modifiee. There are several elements which contribute to deciding the word order, and they are summarized by Saeki (Saeki, 1998) as basic conditions that govern word order. Interpreting these conditions according to our definition, we can summarize them as follows.
Componential conditions

• A bunsetsu having a deep dependency tends to precede a bunsetsu having a shallow dependency.

When there is a long distance between a modifier and its modifiee, the modifier is defined as a bunsetsu having a deep dependency. For example, the usual word order of modifiers in Japanese is the following: a bunsetsu which contains an interjection, a bunsetsu which contains an adverb of time, a bunsetsu which contains a subject, and a bunsetsu which contains an object. Here, the bunsetsu containing an adverb of time is defined as a bunsetsu having a deeper dependency than the one containing a subject. We call the concept representing the distance between a modifier and its modifiee the depth of dependency.

• A bunsetsu having wide dependency tends to precede a bunsetsu having narrow dependency.

A bunsetsu having wide dependency is defined as a bunsetsu which does not rigidly restrict its modifiee. For example, the bunsetsu "Tokyo_e (to Tokyo)" often depends on a bunsetsu which contains a verb of motion such as "iku (go)," while the bunsetsu "watashi_ga (I)" can depend on a bunsetsu which contains any kind of verb. Here, the bunsetsu "watashi_ga (I)" is defined as a bunsetsu having wider dependency than the bunsetsu "Tokyo_e (to Tokyo)." We call the concept of how rigidly a modifier restricts its modifiee the width of dependency.
Syntactic conditions

• A bunsetsu modified by a long modifier tends to precede a bunsetsu modified by a short modifier. A long modifier is a long clause, or a clause that contains many bunsetsus.

• A bunsetsu containing a reference pronoun tends to precede other bunsetsus in the sentence.

• A bunsetsu containing a repetition word tends to precede other bunsetsus in the sentence. A repetition word is a word referring to a word in a preceding sentence. For example, Taro and Hanako in the following text are repetition words: "Taro and Hanako love each other. Taro is a civil servant and Hanako is a doctor."

• A bunsetsu containing the case marker "wa" tends to precede other bunsetsus in the sentence.
A number of studies have tried to discover the relationship between these conditions and word order in Japanese. Tokunaga and Tanaka proposed a model for estimating Japanese word order based on a dictionary. They focused on the width of dependency (Tokunaga and Tanaka, 1991). Under their model, however, word order is restricted to the order of case elements of verbs, and it has been pointed out that the model can deal only with obligatory cases and cannot deal with contextual information (Saeki, 1998). An N-gram model for detecting word order has also been proposed by Maruyama (Maruyama, 1994), but under this model word order is defined as the order of morphemes in a sentence. The problem setting of Maruyama's study thus differed from ours, and the conditions listed above were not taken into account in that study. As for estimating word order in English, a statistical model has been proposed by Shaw and Hatzivassiloglou (Shaw and Hatzivassiloglou, 1999). Under their model, however, word order is restricted to the order of premodifiers, or modifiers depending on nouns, and the model does not simultaneously take into account the many elements that contribute to determining word order. It would be difficult to apply the model to estimating word order in Japanese, considering the many conditions listed above.
In this paper, we propose a method for acquiring from corpora the relationship between the conditions itemized above and word order in Japanese. The method uses a model which automatically discovers the tendency of word order in Japanese by using various kinds of information in and around the target bunsetsus. This model shows us to what extent each piece of information contributes to deciding the word order and which word order tends to be selected when several kinds of information conflict. The contribution rate of each piece of information in deciding word order is efficiently learned by a model within a maximum entropy (M.E.) framework. The performance of the trained model can be evaluated according to how many instances of word order selected by the model agree with those in the original text. Because the word order of the text in the corpus is correct, the model can be trained using a raw corpus instead of a tagged corpus, if it is first analyzed by a parser. In this paper, we show experimental results demonstrating that this is indeed possible even when the parser is only 90% accurate.

This work is part of corpus-based text generation. Given the dependencies between bunsetsus, a whole sentence can be generated in a natural order by using the trained model. This could be helpful for several applications such as refinement support and text generation in machine translation.
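To make this generation scenario concrete, the sketch below is our own illustration of the idea, not part of the paper's experiments: given a dependency tree of bunsetsus (in Japanese, modifiers precede their modifiee), order each head's modifiers with the word order model and linearize recursively. The helper best_modifier_order stands in for the estimation procedure described later in Section 2.2.

```python
# A hedged sketch of corpus-based generation with the word order model.
# children maps a head bunsetsu to its (unordered) modifiers;
# best_modifier_order is an assumed callback that returns the modifiers
# in the order judged most appropriate by the trained model.
def linearize(head, children, best_modifier_order):
    ordered = best_modifier_order(children.get(head, []))
    out = []
    for child in ordered:
        out.extend(linearize(child, children, best_modifier_order))
    out.append(head)  # in Japanese, the modifiee follows its modifiers
    return out

# Illustrative call with a dummy ordering function:
# children = {"sita.": ["Taro_wa", "kinou", "tennis_wo"]}
# linearize("sita.", children, lambda ms: sorted(ms)) returns a full
# sentence order with the verb last.
```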
2 Word Order Acquisition and Estimation

2.1 Word Order Model

This section describes a model which estimates the likelihood of the appropriate word order. We call this model a word order model, and we implemented it within an M.E. framework.

Given the tokenization of a test corpus, the problem of word order estimation in Japanese can be reduced to the problem of assigning one of two tags to each relationship between two modifiers. A relationship could be tagged with "1" to indicate that the order of the two modifiers is appropriate, or with "0" to indicate that it is not. Ordering all modifiers so as to assign the tag "1" to all relationships indicates that all modifiers are in the appropriate word order. The two tags form the space of "futures" in the M.E. formulation of our estimation problem of word order between two modifiers. The M.E. model, as well as other similar models, allows the computation of P(f|h) for any f in the space of possible futures, F, and for every h in the space of possible histories, H. A "history" in maximum entropy is all of the conditioning data that enable us to make a decision in the space of futures. In the estimation problem of word order, we can reformulate this in terms of finding the probability of f associated with the relationship at index t in the test corpus as:

P(f|h_t) = P(f | information derivable from the test corpus related to relationship t)
The computation of P(f|h) in any M.E. model is dependent on a set of "features" which should be helpful in making a prediction about the future. Like most current M.E. models in computational linguistics, our model is restricted to features which are binary functions of the history and future. For instance, one of our features is

g(h, f) = \begin{cases} 1 & \text{if } has(h, x) = \text{true}, \; x = \text{"Mdfr1-Head-POS(Major): verb"}, \text{ and } f = 1 \\ 0 & \text{otherwise.} \end{cases}    (1)
Here "has(h, x)" is a binary function which returns true if the history h has feature x. We focus on the attributes of a bunsetsu itself and on the features occurring between bunsetsus.
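As an illustration, the feature in Eq. (1) can be written directly as code. The following is a minimal sketch; the representation of a history as a set of attribute strings is our own assumption, not the paper's, though the attribute name is taken from Table 3.

```python
# A sketch of the binary feature g of Eq. (1). We assume a history h is
# represented as a set of attribute strings; this encoding is
# illustrative, not the authors' own data structure.
def g_example(h, f):
    return 1 if "Mdfr1-Head-POS(Major):verb" in h and f == 1 else 0

# g_example({"Mdfr1-Head-POS(Major):verb", "Mdfe-Period:[exist]"}, 1) == 1
# g_example({"Mdfe-Period:[exist]"}, 1) == 0
```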
Given a set of features and some training data, the maximum entropy estimation process produces a model in which every feature g_i has associated with it a parameter \alpha_i. This allows us to compute the conditional probability as follows (Berger et al., 1996):

P(f|h) = \frac{\prod_i \alpha_i^{g_i(h,f)}}{Z_\alpha(h)},    (2)

Z_\alpha(h) = \sum_f \prod_i \alpha_i^{g_i(h,f)}.    (3)
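To make Eqs. (2) and (3) concrete, the sketch below computes the conditional probability from a list of binary feature functions and their weights. It is a minimal illustration of the formulas under our own representational assumptions, not the estimation procedure itself; in practice the weights come from M.E. training.

```python
# A minimal sketch of Eqs. (2) and (3): P(f|h) is the product of the
# weights alpha_i of the features that fire on (h, f), normalized over
# all possible futures.
def me_prob(h, f, features, alphas, futures=(0, 1)):
    def unnormalized(future):
        p = 1.0
        for g, alpha in zip(features, alphas):
            if g(h, future):        # g_i is a binary function of (h, f)
                p *= alpha
        return p

    z = sum(unnormalized(future) for future in futures)   # Eq. (3)
    return unnormalized(f) / z                             # Eq. (2)
```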
The maximum entropy estimation technique guarantees that for every feature g_i, the expected value of g_i according to the M.E. model will equal the empirical expectation of g_i in the training corpus. In other words:
\sum_{h,f} \tilde{P}(h,f) \, g_i(h,f) = \sum_{h,f} \tilde{P}(h) \, P_{ME}(f|h) \, g_i(h,f),    (4)

where \tilde{P} is an empirical probability and P_{ME} is the probability assigned by the M.E. model.

Table 1: Example of estimating the probabilities of word orders.

"kinou (yesterday) / Taro_wa (Taro) / tennis_wo (tennis) / sita. (played.)"   P_{kinou,Taro_wa} × P_{kinou,tennis_wo} × P_{Taro_wa,tennis_wo} = 0.6 × 0.8 × 0.7 = 0.336
"kinou (yesterday) / tennis_wo (tennis) / Taro_wa (Taro) / sita. (played.)"   0.6 × 0.8 × 0.3 = 0.144
"Taro_wa (Taro) / kinou (yesterday) / tennis_wo (tennis) / sita. (played.)"   0.4 × 0.8 × 0.7 = 0.224
"Taro_wa (Taro) / tennis_wo (tennis) / kinou (yesterday) / sita. (played.)"   0.4 × 0.2 × 0.7 = 0.056
"tennis_wo (tennis) / kinou (yesterday) / Taro_wa (Taro) / sita. (played.)"   0.6 × 0.2 × 0.3 = 0.036
"tennis_wo (tennis) / Taro_wa (Taro) / kinou (yesterday) / sita. (played.)"   0.4 × 0.2 × 0.3 = 0.024
We define a word order model as a model which learns the appropriate order of each pair of modifiers which depend on the same modifiee. This model is derived from Eq. (2) as follows. Assume that there are two bunsetsus B_1 and B_2 which depend on the bunsetsu B, and that h is the information derivable from the test corpus. The probability that "B_1 B_2" is the appropriate order is given by the following equation:

P(1|h) = \frac{\prod_{i=1}^{k} \alpha_{1,i}^{g_i(h,1)}}{\prod_{i=1}^{k} \alpha_{1,i}^{g_i(h,1)} + \prod_{i=1}^{k} \alpha_{0,i}^{g_i(h,0)}},    (5)

where g_i (1 ≤ i ≤ k) is a feature and "1" indicates that the order is appropriate. The terms \alpha_{1,i} and \alpha_{0,i} are estimated from a corpus which is morphologically and syntactically analyzed. When there are three or more bunsetsus that depend on the same modifiee, the probability is estimated as follows. For n bunsetsus B_1, B_2, ..., B_n which depend on the bunsetsu B and for the information h derivable from the test corpus, the probability that "B_1 B_2 ... B_n" is the appropriate order, or P(1|h), is represented as the probability that every two bunsetsus "B_i B_{i+j} (1 ≤ i ≤ n-1, 1 ≤ j ≤ n-i)" are in the appropriate order, or P({W_{i,i+j} = 1 | 1 ≤ i ≤ n-1, 1 ≤ j ≤ n-i} | h). Here "W_{i,i+j} = 1" represents that "B_i B_{i+j}" is the appropriate order. Let us assume that the W_{i,i+j} are independent of each other. Then P(1|h) is derived as follows:

P(1|h) = P({W_{i,i+j} = 1 | 1 ≤ i ≤ n-1, 1 ≤ j ≤ n-i} | h)
       = \prod_{i=1}^{n-1} \prod_{j=1}^{n-i} P(W_{i,i+j} = 1 | h)
       = \prod_{i=1}^{n-1} \prod_{j=1}^{n-i} P(1 | h_{i,i+j}),    (6)

where h_{i,i+j} is the information derivable when focusing on the bunsetsu B and its modifiers B_i and B_{i+j}.
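Eq. (6) amounts to multiplying the pairwise probabilities over every pair of modifiers. A small sketch, assuming pair_prob(a, b) is a callback standing in for P(1|h_{i,i+j}) from the trained model:

```python
# A sketch of Eq. (6): the probability that modifiers B_1 ... B_n are in
# the appropriate order is the product, over all pairs (B_i, B_{i+j}),
# of the pairwise probability that "B_i B_{i+j}" is appropriate.
def order_prob(modifiers, pair_prob):
    p = 1.0
    n = len(modifiers)
    for i in range(n - 1):
        for j in range(i + 1, n):
            p *= pair_prob(modifiers[i], modifiers[j])
    return p
```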
For example, in the sentence "kinou (yesterday) / Taro_wa (Taro) / tennis_wo (tennis) / sita. (played.)," where a "/" represents a bunsetsu boundary, there are three bunsetsus that depend on the verb "sita (played)." We train a word order model under the assumption that the orders of the three pairs of modifiers ("kinou" and "Taro_wa," "kinou" and "tennis_wo," and "Taro_wa" and "tennis_wo") are appropriate. We use various kinds of information in and around the target bunsetsus as features. For example, the feature that a noun of time precedes a proper noun is derivable from the order "kinou (yesterday) / Taro_wa (Taro) / sita. (played.)," and the feature that a case element followed by the case marker "wa" precedes a case element followed by the case marker "wo" is derivable from the order "Taro_wa (Taro) / tennis_wo (tennis) / sita. (played.)."
2.2 Word Order Estimation

This section describes the algorithm for estimating word order by using a trained word order model. Word order estimation is defined as deciding the order of modifiers, or bunsetsus which depend on the same modifiee. The input of this task consists of the modifiers and the information necessary to know whether or not features are found. The output is the order of the modifiers. We assume that lexical selection in each bunsetsu has already been done and that all dependencies in a sentence have been found. The information necessary to know whether or not features are found is morphological, syntactic, semantic, and contextual information, and the locations of bunsetsu boundaries. The features used in our experiments are described in Section 3.

Word order is estimated in the following steps.

Procedures
1. All possible orders of the modifiers are found.
2. For each order, the probability that it is appropriate is estimated by the word order model, or Eq. (6).
3. The order with the highest probability of being appropriate is selected.
For example, given the sentence "kinou (yesterday) / Taro_wa (Taro) / tennis_wo (tennis) / sita. (played.)," the modifiers of the verb "sita (played)" are the three bunsetsus "kinou (yesterday)," "Taro_wa (Taro)," and "tennis_wo (tennis)." Their appropriate order is estimated in the following steps.

1. The probabilities that the orders of the three pairs of modifiers, "kinou" and "Taro_wa," "kinou" and "tennis_wo," and "Taro_wa" and "tennis_wo," are appropriate are estimated. Assume, for example, that P_{kinou,Taro_wa}, P_{kinou,tennis_wo}, and P_{Taro_wa,tennis_wo} are respectively 0.6, 0.8, and 0.7.

2. As shown in Table 1, probabilities are estimated for all six possible orders. The order "kinou / Taro_wa / tennis_wo / sita.," which has the highest probability, is selected as the most appropriate order.
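This procedure can be traced in a few lines of code. The sketch below enumerates the six orders and reproduces the products of Table 1, under the assumed pairwise probabilities 0.6, 0.8, and 0.7 from step 1 (the product of Eq. (6) is inlined in score):

```python
from itertools import permutations

# Assumed pairwise probabilities from step 1; the reverse order of a
# pair gets probability 1 - p.
PAIR = {("kinou", "Taro_wa"): 0.6,
        ("kinou", "tennis_wo"): 0.8,
        ("Taro_wa", "tennis_wo"): 0.7}

def pair_prob(a, b):
    # Probability that "a b" is the appropriate order of the pair.
    return PAIR[(a, b)] if (a, b) in PAIR else 1.0 - PAIR[(b, a)]

def score(order):
    # Eq. (6): product of the pairwise probabilities over all pairs.
    p = 1.0
    for i in range(len(order) - 1):
        for j in range(i + 1, len(order)):
            p *= pair_prob(order[i], order[j])
    return p

candidates = {o: score(o)
              for o in permutations(("kinou", "Taro_wa", "tennis_wo"))}
best = max(candidates, key=candidates.get)
# best == ("kinou", "Taro_wa", "tennis_wo") with probability
# 0.6 * 0.8 * 0.7 = 0.336, the highest of the six orders in Table 1.
```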
2.3 Performance Evaluation

The performance of a word order model can be evaluated in the following way. First, extract from a test corpus the bunsetsus having two or more modifiers. Then, using those bunsetsus and their modifiers as input, estimate the orders of the modifiers as described in Section 2.2. The percentage of the modifiees whose modifiers' word order agrees with that in the original text then gives what we call the agreement rate. It is a measure of how close the word order estimated by the model is to the actual word order in the original text.

Table 2: Example of modifiers extracted from a corpus.

Bunsetsu number | Strings in a bunsetsu | Modifiee (bunsetsu number) | Label | Modifiers whose modifiee is the bunsetsu in the left column
0 | Taro_to (Taro and)   | 1 | P |
1 | Hanako_to (Hanako)   | 5 |   | Taro_to (0)
2 | tennis_no (tennis)   | 3 |   |
3 | siai_ni (tournament) | 4 |   | tennis_no (2)
4 | dete, (participate,) | 5 | P | Taro_to (0), Hanako_to (1), siai_ni (3)
5 | yusyo_sita. (won.)   | - |   | Taro_to (0), Hanako_to (1), dete (4)
We use the following two measurements to calculate the agreement rate.

Pair of modifiers: The first measurement is the percentage of the pairs of modifiers whose word order agrees with that in the test corpus. For example, given the sentence "kinou (yesterday) / Taro_wa (Taro) / tennis_wo (tennis) / sita. (played.)" in a test corpus, if the word order estimated by the model is "kinou (yesterday) / tennis_wo (tennis) / Taro_wa (Taro) / sita. (played.)," then the orders of the pairs of modifiers in the original sentence are "kinou / Taro_wa," "kinou / tennis_wo," and "Taro_wa / tennis_wo," and those in the estimated word order are "kinou / tennis_wo," "kinou / Taro_wa," and "tennis_wo / Taro_wa." The agreement rate is 67% (2/3) because two of the three orders are the same as those in the original sentence.

Complete agreement: The second measurement is the percentage of the modifiees whose modifiers' word order agrees with that in the test corpus.
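Both measurements are straightforward to compute. A minimal sketch, checked against the example above (the estimated order agrees on two of the three modifier pairs, i.e. 67%):

```python
# A sketch of the two agreement measurements of Section 2.3.
def pair_agreement(estimated, observed):
    """Fraction of modifier pairs whose relative order matches."""
    pos = {m: i for i, m in enumerate(estimated)}
    pairs = [(a, b) for i, a in enumerate(observed)
             for b in observed[i + 1:]]
    agree = sum(1 for a, b in pairs if pos[a] < pos[b])
    return agree / len(pairs)

def complete_agreement(items):
    """Fraction of modifiees whose whole modifier order matches."""
    return sum(1 for est, obs in items if est == obs) / len(items)

est = ["kinou", "tennis_wo", "Taro_wa"]
obs = ["kinou", "Taro_wa", "tennis_wo"]
assert round(pair_agreement(est, obs), 2) == 0.67  # 2 of 3 pairs agree
```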
3 Experiments and Discussion

In our experiments, we used the Kyoto University text corpus (Version 2) (Kurohashi and Nagao, 1997), a tagged corpus of the Mainichi newspaper. For training, we used 17,562 sentences from newspaper articles appearing in 1995, from January 1st to January 8th and from January 10th to June 9th. For testing, we used 2,394 sentences from articles appearing on January 9th and from June 10th to June 30th.
3.1 Definition of Word Order in a Corpus

In the Kyoto University corpus, each bunsetsu has only one modifiee. When a bunsetsu B_m depends on a bunsetsu B_d and there is a bunsetsu B_p that depends on and is coordinate with B_d, B_p has not only the information that its modifiee is B_d but also a label indicating a coordination, that is, the information that it is coordinate with B_d. This information indirectly shows that the bunsetsu B_m can depend on both B_p and B_d. In this case, we consider B_m a modifier of both B_p and B_d.

Under this condition, modifiers of a bunsetsu B are identified in the following steps (a code sketch of this procedure follows the example below).

1. Bunsetsus that depend on the bunsetsu B are classified as modifiers of B.
2. When B has a label indicating a coordination, bunsetsus that are to the left of B and depend on the same modifiee as B are classified as modifiers of B.
3. Bunsetsus that depend on a modifier of B and have a label indicating a coordination are classified as modifiers of B. The third step is repeated.
When the above procedure is completed, all bunsetsus that coordinate with each other are identified as modifiers which depend on the same modifiee. For example, from the data listed on the left side of Table 2, the modifiers listed in the right-hand column are identified for each bunsetsu. "Taro_to (Taro and)," "Hanako_to (Hanako)," and "dete, (participate,)" are all identified as modifiers which depend on the same modifiee, "yusyo_sita. (won.)."
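As referenced above, here is a sketch of the three-step procedure applied to the dependencies of Table 2. The dictionary encoding of the dependencies is our own illustration:

```python
# deps maps a bunsetsu number to (modifiee, label); label "P" marks a
# coordination, as in the Kyoto University corpus example of Table 2.
deps = {0: (1, "P"),   # Taro_to   -> Hanako_to   (coordinate)
        1: (5, None),  # Hanako_to -> yusyo_sita.
        2: (3, None),  # tennis_no -> siai_ni
        3: (4, None),  # siai_ni   -> dete,
        4: (5, "P")}   # dete,     -> yusyo_sita. (coordinate)

def modifiers(b):
    # Step 1: bunsetsus that depend on b directly.
    mods = {m for m, (d, _) in deps.items() if d == b}
    # Step 2: if b is coordinate (label P), bunsetsus to its left that
    # depend on the same modifiee as b.
    if b in deps and deps[b][1] == "P":
        head = deps[b][0]
        mods |= {m for m, (d, _) in deps.items() if d == head and m < b}
    # Step 3 (repeated): bunsetsus that depend on a modifier of b and
    # carry the coordination label.
    changed = True
    while changed:
        new = {m for m, (d, lab) in deps.items()
               if d in mods and lab == "P"}
        changed = not new <= mods
        mods |= new
    return sorted(mods)

assert modifiers(4) == [0, 1, 3]  # Taro_to, Hanako_to, siai_ni
assert modifiers(5) == [0, 1, 4]  # Taro_to, Hanako_to, dete
```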
3.2 Experimental Results

The features used in our experiment are listed in Tables 3 and 4. Each feature consists of a type and a value. The features consist basically of attributes of the bunsetsu itself and of syntactic and contextual information. We call the features listed in Table 3 'basic features.' We selected them manually so that they reflect the basic conditions governing word order that were summarized by Saeki (Saeki, 1998). The features in Table 4 are combinations of basic features ('combined features') and were also selected manually. They are represented by the name of the target bunsetsu plus the feature type of the basic features. The total number of features was about 190,000, and 51,590 of them were observed in the training corpus three or more times. These were the ones we used in our experiment.
The following terms are used in these tables:

Mdfr1, Mdfr2, Mdfe: The word order model described in Section 2.1 estimates the probability that modifiers are in the appropriate order as the product of the probabilities of all pairs of modifiers. When estimating the probability for each pair of modifiers, the model assumes that the two modifiers are in the appropriate order. Here we call the left modifier Mdfr1, the right modifier Mdfr2, and their modifiee Mdfe.

Head: the rightmost word in a bunsetsu other than those whose major part-of-speech category is "special marks," "post-positional particles," or "suffixes." (Part-of-speech categories follow those of JUMAN (Kurohashi and Nagao, 1998).)
Table 3: Basic features. Each category lists its target bunsetsus, its feature types, and its feature values (with the number of distinct values in parentheses). "Without" gives the accuracy of word order estimation without that feature set, in the "pair of modifiers" / "complete agreement" measurements, with the change from the formal experiment (87.44% / 75.41%) in parentheses.

1. Mdfr1, Mdfr2, Mdfe. Head-Lex(String): fundamental forms of head words (5,066). Without: 86.65% (-0.79%) / 73.87% (-1.54%).
2. Mdfr1, Mdfr2, Mdfe. Head-POS(Major): verb, adjective, noun, ... (11); Head-POS(Minor): common noun, quantifier, ... (24). Without: 87.07% (-0.37%) / 75.03% (-0.38%).
3. Mdfr1, Mdfr2, Mdfe. Head-Inf(Major): vowel verb, ... (30); Head-Inf(Minor): stem, fundamental form, ... (60). Without: 87.39% (-0.05%) / 75.20% (-0.21%).
4. Mdfr1, Mdfr2, Mdfe. Head-SemFeat(110), Head-SemFeat(111), ..., Head-SemFeat(433): true (1) each. Without: 87.21% (-0.23%) / 75.20% (-0.21%).
5. Mdfr1, Mdfr2, Mdfe. Type(String): ga, wa, ... (73); Type(Major): post-positional particle, ... (43); Type(Minor): case marker, imperative form, ... (102). Without: 84.78% (-2.66%) / 70.03% (-5.38%).
6. Mdfr1, Mdfr2, Mdfe. JOSHI1(String): ga, wa, ... (63); JOSHI1(Minor): [nil], case marker, ... (5). Without: 87.32% (-0.12%) / 75.14% (-0.27%). JOSHI2(String): ... (63); JOSHI2(Minor): case marker, ... (4). Without: 87.39% (-0.05%) / 75.54% (+0.13%).
7. Mdfr1, Mdfr2, Mdfe. Period: [nil], [exist] (2). Without: 87.14% (-0.30%) / 74.86% (-0.55%).
8. Mdfr1, Mdfr2: NumberOfMdfrs: A(0), B(1), C(2), D(3 or more) (4); Mdfe: NumberOfMdfrs: A(2), B(3), C(4 or more) (3). Without: 87.40% (-0.04%) / 75.36% (-0.05%).
9. Mdfr1, Mdfr2, Mdfe. Coordination: P(Coordinate), A(Apposition), D(otherwise) (3). Without: 86.26% (-1.18%) / 73.61% (-1.80%).
10. Mdfr1, Mdfr2. Mdfr1-MdfrType-IDto-Mdfr2-Type, Mdfr2-MdfrType-IDto-Mdfr1-Type, Mdfr1-MdfrType-IDto-Mdfr2-MdfrType: True, False (2) each. Without: 87.34% (-0.10%) / 75.09% (-0.32%).
11. Mdfr1, Mdfr2, Mdfe. Repetition-Head-Lex, Repetition-Mdfr-Head-Lex: [nil], [exist] (2) each. Without: 87.31% (-0.13%) / 75.14% (-0.27%).
12. Mdfr1, Mdfr2. ReferencePronoun: [nil], [exist] (2); ReferencePronoun(String): ... (42). Without: 87.27% (-0.17%) / 75.12% (-0.29%).
(Total: 90)
Head-Lex: the fundamental form (uninflected form) of the head word. Only words with a frequency of five or more are used.
Head-Inf: the inflection type of the head.

SemFeat: We use the upper three layers of bunrui goihyou (NLRI (National Language Research Institute), 1964) as semantic features. Bunrui goihyou is a Japanese thesaurus that has a tree structure and consists of seven layers. The tree has words in its leaves, and each word has a figure indicating its category number. For example, the figure in parentheses of a feature such as "Head-SemFeat(110)" in Table 3 shows the upper three digits of the category number of the head word, that is, the ancestor node of the head word in the third layer of the tree.
Type: the rightmost word other than those whose major part-of-speech category is "special marks." If the major category of the word is neither "post-positional particles" nor "suffixes," and the word is inflectable, then the type is represented by the inflection type. (The inflection types follow those of JUMAN.)

JOSHI1, JOSHI2: JOSHI1 is the rightmost post-positional particle in the bunsetsu. If there are two or more post-positional particles in the bunsetsu, JOSHI2 is the second-rightmost post-positional particle.

NumberOfMdfrs: the number of modifiers.
Mdfr1-MdfrType, Mdfr2-MdfrType: the types of the modifiers of Mdfr1 and Mdfr2.

X-IDto-Y: X is identical to Y.

Repetition-Head-Lex: a repetition word appearing in a preceding sentence.

ReferencePronoun: a reference pronoun appearing in the target bunsetsu or in its modifiers.

Categories 1 to 6 in Table 3 represent attributes of a bunsetsu, categories 7 to 10 represent syntactic information, and categories 11 and 12 represent contextual information.
The results of our experiment are listed in Table 5. The first line shows the agreement rate when we estimated the word order for 5,278 bunsetsus that have two or more modifiers and were extracted from the 2,394 test sentences appearing on January 9th and from June 10th to June 30th. We used bunsetsu boundary information and the syntactic and contextual information that was derivable from the test corpus and related to the input bunsetsus. As syntactic information we used dependency information, coordinate structure, and information on whether the target bunsetsu is at the end of a sentence. As contextual information we used the preceding sentence. The values in the row labeled Baseline1 in Table 5 are the agreement rates obtained when the order of every pair of modifiers was selected randomly. The values in the Baseline2 row are the agreement rates obtained when we used the following equation instead of Eq. (5):

P(1|h) = \frac{freq(w_{12})}{freq(w_{12}) + freq(w_{21})}.    (7)
Table 4: Combined features. For each feature group, the bracketed values give the accuracy without the feature set ("pair of modifiers" / "complete agreement," change from the formal experiment in parentheses).

Twin features [without: 87.23% (-0.21%) / 74.65% (-0.76%)]:
(Mdfr1-Type, Mdfr2-Type), (Mdfr1-Type, Mdfe-Head-Lex), (Mdfr1-Type, Mdfe-Head-POS), (Mdfr1-Type, Mdfr1-Coordination), (Mdfr1-Type, Mdfr2-MdfrType-IDto-Mdfr1-Type), (Mdfr2-Type, Mdfe-Head-Lex), (Mdfr2-Type, Mdfe-Head-POS), (Mdfr2-Type, Mdfr2-Coordination), (Mdfr2-Type, Mdfr1-MdfrType-IDto-Mdfr2-Type), (Mdfr1-Head-Lex, Mdfe-Period), (Mdfr1-Head-POS, Mdfe-Period), (Mdfr1-Head-POS, Mdfr1-Repetition-Head-Lex), (Mdfr2-Head-Lex, Mdfe-Period), (Mdfr2-Head-POS, Mdfe-Period), (Mdfr2-Head-POS, Mdfr2-Repetition-Head-Lex)

Triplet features [without: 87.22% (-0.22%) / 74.86% (-0.55%)]:
(Mdfr1-Type, Mdfr2-Type, Mdfe-Head-Lex), (Mdfr1-Type, Mdfr2-Type, Mdfe-Head-POS), (Mdfr1-Type, Mdfr1-Coordination, Mdfe-Type), (Mdfr2-Type, Mdfr2-Coordination, Mdfe-Type), (Mdfr1-JOSHI1, Mdfr1-JOSHI2, Mdfe-Head-Lex), (Mdfr1-JOSHI1, Mdfr1-JOSHI2, Mdfe-Head-POS), (Mdfr2-JOSHI1, Mdfr2-JOSHI2, Mdfe-Head-Lex), (Mdfr2-JOSHI1, Mdfr2-JOSHI2, Mdfe-Head-POS)

All of the above combined features [without: 85.79% (-1.65%) / 71.67% (-3.74%)]
Table 5: Results of agreement rates.

            | Pair of modifiers      | Complete agreement
Our method  | 87.44% (12,361/14,137) | 75.41% (3,980/5,278)
Baseline1   | 48.96% (6,921/14,137)  | 33.10% (1,747/5,278)
Baseline2   | 49.20% (6,956/14,137)  | 33.84% (1,786/5,278)
Here we assume that B1 and B2 are modifiers, that their modifiee is B, and that the word types of B1 and B2 are respectively w1 and w2. The values freq(w12) and freq(w21) then respectively represent the frequencies with which w1 and w2 appeared in the orders "w1, w2, and w" and "w2, w1, and w" in Mainichi newspaper articles from 1991 to 1997. Equation (7) means that given the sentence "Taro_wa (Taro) / tennis_wo (tennis) / sita. (played.)," the one of the two possibilities "wa / wo / sita." and "wo / wa / sita." that has the higher frequency is selected. (When w1 and w2 were the same word, we used the head words in B1 and B2 as w1 and w2. When one of freq(w12) and freq(w21) was zero and the other was five or more, we used the frequencies with which they appeared in the orders "w1 w2" and "w2 w1," respectively, instead of freq(w12) and freq(w21). When both freq(w12) and freq(w21) were zero, we instead used random figures between 0 and 1.)
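A sketch of the Baseline2 decision, including the random fallback described in the parenthetical note (the five-or-more fallback using modifiee-free counts is omitted for brevity):

```python
import random

# Baseline2, Eq. (7): choose the order of a modifier pair by the corpus
# frequencies of the two word types appearing in either order before
# their modifiee. The counts are assumed inputs.
def baseline2_prob(freq_w12, freq_w21):
    if freq_w12 == 0 and freq_w21 == 0:
        return random.random()  # both orders unseen: random figure in [0, 1)
    return freq_w12 / (freq_w12 + freq_w21)

# "wa" is placed before "wo" when
# baseline2_prob(freq_wa_wo, freq_wo_wa) > 0.5.
```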
3.3 Features and Agreement Rate

This section describes how much each feature set contributes to improving the agreement rate.

The values listed in the rightmost columns of Tables 3 and 4 show the performance of word order estimation without each feature set. The values in parentheses are the percentage of improvement or degradation relative to the formal experiment. In these experiments, when a basic feature was deleted, the combined features that included the basic feature were also deleted. The most useful feature is the type of bunsetsu, which basically signifies the case marker or inflection type. This result is close to our expectations.

We selected features that, according to linguistic studies, reflect the basic conditions governing word order as much as possible. The rightmost columns of Tables 3 and 4 show the extent to which each condition contributes to improving the agreement rate. However, each category of features might be coarser than what is linguistically interesting. For example, all case markers, such as "wa" and "wo," were classified into the same category and were therefore deleted together in the experiments in which single categories were removed. An experiment that considers each of these markers separately would help us verify the importance of each marker. Likewise, if new features emerge from future linguistic research on word order, experiments lacking each feature separately would help us verify their importance in the same manner.
3.4 Training Corpus and Agreement Rate

The agreement rates for the training corpus and the test corpus are shown in Figure 1 as a function of the amount of training data (number of sentences).
Even with a very small training set (250 sentences), the agreement rates in the "pair of modifiers" and "Complete agreement" measurements were respectively 82.54% and 68.40%. These rates are considerably higher than those of the baselines, indicating that word order in Japanese can be acquired from newspaper articles even with a small training set.

Figure 1: Relationship between the amount of training data and the agreement rate.
With 17,562 training sentences, the agreement rate in the "Complete agreement" measurement was 75.41%. We randomly selected and analyzed 100 modifiees from the 1,298 modifiees whose modifiers' word order did not agree with that in the original text. We found that 48 of them were in a natural order and 52 of them were in an unnatural order. The former result shows that word order is relatively free and several orders are often acceptable. The latter result shows that the word order acquisition was not sufficient. To complete the acquisition we need more training data and features which take into account information different from that in Tables 3 and 4. We found many idiomatic expressions among the unnatural word order results, such as "houchi-kokka_ga (a country under the rule of law) / kiite (to listen) / akireru (to be disgusted)," "souan-sita-no_ga (origination) / somosomo-no (at all) / hajimari (the beginning)," and "aji_ni (taste) / seikon (one's heart and soul) / komeru (to put something into something)." We think that the appropriate word order for these idiomatic expressions could be acquired if we had more training data. We also found several coordinate structures among the unnatural word order results, suggesting that we should survey linguistic studies on coordinate structures and try to find efficient features for acquiring word order from coordinate structures.
We did not use the results of semantic and contextual analyses as input because corpora with semantic and contextual tags were not available. If such corpora were available, we could make more efficient use of the features dealing with semantic features, reference pronouns, and repetition words. We plan to build corpora with semantic and contextual tags and to use these tags as input.
3.5 Acquisition from a Raw Corpus

In this section, we show that a raw corpus, instead of a tagged corpus, can be used to train the model if it is first analyzed by a parser. We used the morphological analyzer JUMAN and the parser KNP (Kurohashi, 1998), which is based on a dependency grammar, to extract from a raw corpus the information for detecting whether or not each feature is found. The accuracy of JUMAN in detecting morphological boundaries and part-of-speech tags is about 98%, and the parser's dependency accuracy is about 90%. These results were obtained from analyzing Mainichi newspaper articles.

We used 217,562 sentences for training. When these sentences were all extracted from a raw corpus, the agreement rate was 87.64% for "pair of modifiers" and 75.77% for "Complete agreement." When the 217,562 training sentences consisted of the sentences from the tagged corpus (17,562 sentences) used in our formal experiment and sentences from a raw corpus, the agreement rate for "pair of modifiers" was 87.66% and that for "Complete agreement" was 75.88%. These rates were about 0.5% higher than those obtained when we used only the sentences from the tagged corpus. Thus, we can improve word order acquisition by adding information from a raw corpus even if we do not have a large tagged corpus. The results also indicate that parser accuracy is not critical for word order acquisition and that an accuracy of about 90% is sufficient.
4 Conclusion

This paper described a method of acquiring word order from corpora. We defined word order as the order of modifiers which depend on the same modifiee. The method uses a model which estimates the likelihood of the appropriate word order. The model automatically discovers the tendency of word order in Japanese by using various kinds of information in and around the target bunsetsus plus syntactic and contextual information. The contribution rate of each piece of information in deciding word order is efficiently learned by a model implemented within an M.E. framework. Comparing the results of experiments controlling for each piece of information, we found that the type of information having the greatest influence was the case marker or inflection type in a bunsetsu. Analyzing the relationship between the amount of training data and the agreement rate, we found that word order could be acquired even with a small set of training data. We also found that a raw corpus as well as a tagged corpus can be used to train the model, if it is first analyzed by a parser. The agreement rate was 75.41% for the Kyoto University corpus. We analyzed the modifiees whose modifiers' word order did not agree with that in the original text, and found that 48% of them were in a natural order. This shows that, in many cases, word order in Japanese is relatively free and several orders are acceptable.

The texts we used were newspaper articles, which tend to have a standard word order, but we think that word order tends to differ between different styles of writing. We would therefore like to carry out experiments with other types of texts, such as novels, whose styles differ from that of newspapers.

It has been difficult to evaluate the results of text generation objectively because there have been no good standards for evaluation. By using the standard we describe in this paper, however, we can evaluate results objectively, at least for word order estimation in text generation.

We expect that our model can be used for several applications as well as for linguistic verification, such as text refinement support and text generation in machine translation.

References 

Adam L. Berger, Stephen A. Della Pietra, and Vincent J. Della Pietra. 1996. A Maximum Entropy Approach to Natural Language Processing. Computational Linguistics, 22(1):39-71.

Sadao Kurohashi and Makoto Nagao. 1997. Kyoto University Text Corpus Project. In Proceedings of the Third Annual Meeting of the Association for Natural Language Processing, pages 115-118. (in Japanese).

Sadao Kurohashi and Makoto Nagao. 1998. Japanese Morphological Analysis System JUMAN Version 3.6. Department of Informatics, Kyoto University.

Sadao Kurohashi. 1998. Japanese Dependency/Case Structure Analyzer KNP Version 2.0b6. Department of Informatics, Kyoto University.

Hiroshi Maruyama. 1994. Experiments on Word-Order Recovery Using N-gram Models. In The 49th Annual Convention IPS Japan. (in Japanese).

NLRI (National Language Research Institute). 1964. Word List by Semantic Principles. Syuei Syuppan. (in Japanese).

Tetsuo Saeki. 1998. Yousetsu nihongo no gojun (Survey: Word Order in Japanese). Kuroshio Syuppan. (in Japanese).

James Shaw and Vasileios Hatzivassiloglou. 1999. Ordering Among Premodifiers. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL), pages 135-143.

Takenobu Tokunaga and Hozumi Tanaka. 1991. On Estimating Japanese Word Order Based on Valency Information. Keiryo Kokugogaku (Mathematical Linguistics), 18(2):53-65. (in Japanese).
