Reducing Parsing Complexity by Intra-Sentence Segmentation 
based on Maximum Entropy Model 
Sung Dong Kim, Byoung-Tak Zhang, Yung Taek Kim
School of Computer Science and Engineering, 
Seoul National University, Korea 
{sdkim,btzhang}@scai.snu.ac.kr, ytkim@cse.snu.ac.kr
Abstract 
Long sentence analysis has been a critical 
problem because of high complexity. This paper
addresses the reduction of parsing complexity
by intra-sentence segmentation, and
presents a maximum entropy model for determining
segmentation positions. The model
features lexical contexts of segmentation posi- 
tions, giving a probability to each potential 
position. Segmentation coverage and accu- 
racy of the proposed method are 96% and 
88% respectively. The parsing efficiency is im- 
proved by 77% in time and 71% in space. 
1 Introduction 
Long sentence analysis has been a critical 
problem in machine translation because of 
high complexity. In EBMT (example-based 
machine translation), the longer a sentence 
is, the less possible it is that the sentence 
has an exact match in the translation archive, 
and the less flexible an EBMT system will be 
(Cranias et al., 1994). In idiom-based ma- 
chine translation (Lee, 1993), long sentence 
parsing is difficult because more resources are 
spent during idiom recognition phase as sen- 
tence length increases. A parser is often un- 
able to analyze long sentences owing to their 
complexity, though they have no grammatical 
errors (Nasukawa, 1995). 
In English-Korean machine translation, an
idiom-based approach is adopted to overcome
the structural differences between the two languages
and to get more accurate translations.
The parser is a chart parser capable
of idiom recognition and translation, which
is adapted to English-Korean machine translation.
Idioms are recognized prior to syntactic
analysis and the part of a sentence for
an idiom takes an edge in a chart (Winograd, 
1983). When parsing long sentences, an ambiguity
in an idiom's range may cause more
edges than the number of words included in
the idiom (Yoon, 1994), which greatly increases
parsing complexity. A parser in a practical
machine translation system should be able to
analyze long sentences in a reasonable time.
Most context-free parsing algorithms have
$O(n^3)$ parsing complexity in terms of time
and space, where $n$ is the length of a sentence
(Tomita, 1986). Our work is motivated
by the fact that parsing becomes more
efficient as $n$ becomes shorter. This paper
deals with the problem of parsing complex- 
ity by way of reducing the length of sentence 
to be analyzed. This reduction is achieved 
by intra-sentence segmentation, as
distinct from the inter-sentence segmentation
that is used for text categorization
(Beeferman et al., 1997) or sentence boundary 
identification (Palmer and Hearst, 1997) (Rey- 
nar and Ratnaparkhi, 1997). Intra-sentence 
segmentation plays a role as a preliminary 
step to a chart-based, context-free parser in 
English-Korean machine translation. 
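As a rough illustration of why shortening $n$ helps, consider a back-of-the-envelope estimate. It assumes an idealized cost of exactly $c\,n^3$ per parse and a split into $k$ equal segments, and it ignores the cost of recombining the segment analyses. These are simplifying assumptions made here for illustration, not measurements from this paper.

```latex
\[
  k \cdot c \left(\frac{n}{k}\right)^{3} = \frac{c\,n^{3}}{k^{2}},
  \qquad\text{e.g. } n = 30,\ k = 2:\quad
  2 \cdot 15^{3} = 6750 \ \text{versus}\ 30^{3} = 27000 .
\]
```

Under these assumptions, a single safe split into two halves would already cut the idealized cost to one quarter.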
There have been several methods for
reducing parsing complexity by intra-sentence
segmentation. Lyon and Frank (1995) and
Lyon and Dickerson (1997) took
advantage of the fact that declarative
sentences almost always consist of three segments:
[pre-subject : subject : predicate].
The complexity could be reduced by decomposing
a sentence into three sections. Pattern
rules (Li et al., 1990) and sentence patterns
(Kim and Kim, 1995) were used to segment
long English sentences. They showed low segmentation
coverage, which means that many
long sentences are not segmented by the
pattern rules or sentence patterns. They also
require much human effort to construct pattern
rules or collect sentence patterns. These
factors may prevent them from being applicable to
practical machine translation systems.
This paper presents a trainable model for 
identifying potential segmentation positions 
in a sentence and determining appropriate 
segmentation positions. Given a corpus annotated
with segmentation positions, our model
automatically learns the contextual evidence
about segmentation positions, which relieves
humans of the burden of constructing pattern rules or
sentence patterns. This evidence is combined
under the maximum entropy framework
(Jaynes, 1957) to estimate the probability for
each position. By intra-sentence segmentation
based on the proposed model, we improve
parsing efficiency by 77% in
time and 71% in space.
In Section 2 we introduce the maximum en- 
tropy model. Section 3 describes features in- 
corporated into the model and the process of 
identifying potential segmentation positions. 
The determination schemes of segmentation 
positions are described in Section 4. Segmen- 
tation performance of the model is presented 
with the degree of contribution to efficient 
parsing by the segmentation in Section 5. We 
also compare our approach with other intra-sentence
segmentation approaches. Section 6
draws conclusions and outlines future
work.
2 Maximum Entropy Modeling 
Sentence patterns or pattern rules specify the
sub-structures of the sentences. That is, segmentation
positions are determined in view of
the global sentence structure. If no
rule or pattern matches a given sentence,
the sentence cannot be segmented.
We assume that whether a word is a segmenta- 
tion position depends on its surrounding con- 
text. We try to find factors that affect the de- 
termination of segmentation positions. Maxi- 
mum entropy is a technique for automatically 
acquiring knowledge from incomplete infor- 
mation, without making any unsubstantiated 
assumptions. It masters subtle effects so that 
we may accurately model subtle dependencies. 
It does not make any unwarranted assump- 
tions, which means that maximum entropy 
learns exactly what the data says. Therefore 
it can perform well on unseen data. 
The idea is to construct a model that assigns
a probability to each potential segmentation
position in a sentence. We build a probability
distribution $p(y|x)$, where $y \in \{0, 1\}$
is a random variable specifying the potential
segmentation position in a context $x$. A feature
of a context is a binary-valued indicator
function $f$ expressing the information about a
specific context.
Given a training sample of size $N$,
$(x_1, y_1), \ldots, (x_N, y_N)$, an empirical probability
distribution can be defined as
$$\tilde{p}(x, y) = \frac{\#(x, y)}{N},$$
where $\#(x, y)$ is the number of occurrences of
$(x, y)$. The expected value of feature $f_i$ with
respect to the empirical distribution $\tilde{p}(x, y)$ is
expressed as
$$\tilde{p}(f_i) = \sum_{x,y} \tilde{p}(x, y)\, f_i(x, y),$$
and the expected value of $f_i$ with respect to
the probability distribution $p(y|x)$ is
$$p(f_i) = \sum_{x,y} \tilde{p}(x)\, p(y|x)\, f_i(x, y),$$
where $\tilde{p}(x)$ is the empirical distribution of $x$
in the corpus. We want to build a probability
distribution $p(y|x)$ that is required to accord
with the features $f_i$ useful in selecting segmentation
positions: $p(f_i) = \tilde{p}(f_i)$ for all $f_i \in \mathcal{F}$,
where $\mathcal{F}$ is the set of candidate features. This
makes the probability distribution be built on
only training data.
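The empirical quantities above can be computed directly from counts. The following is a minimal sketch, assuming a context x is represented as a (previous word, word) pair; the toy sample and the function names are hypothetical and serve only to illustrate the definitions of the empirical distribution and the empirical feature expectation.

```python
from collections import Counter

def empirical_distribution(samples):
    """Return p~(x, y) = #(x, y) / N for a list of (x, y) pairs."""
    counts = Counter(samples)
    n = len(samples)
    return {pair: c / n for pair, c in counts.items()}

def empirical_expectation(feature, samples):
    """Return p~(f) = sum over (x, y) of p~(x, y) * f(x, y)."""
    p_tilde = empirical_distribution(samples)
    return sum(prob * feature(x, y) for (x, y), prob in p_tilde.items())

# Hypothetical toy sample: x = (previous word, word), y = 1 marks a segmentation position.
samples = [
    (("say", "that"), 1),
    (("say", "that"), 1),
    (("say", "that"), 0),
    (("the", "book"), 0),
]

def f_say_that(x, y):
    """Binary indicator feature on the context and the label."""
    prev_word, word = x
    return 1 if word == "that" and prev_word == "say" and y == 1 else 0

print(empirical_expectation(f_say_that, samples))  # 0.5
```

Here the pair (("say", "that"), 1) occurs twice among four samples, so the feature's empirical expectation is 2/4.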
Given a feature set $\mathcal{F}$, let $C$ be the subset
of all distributions $\mathcal{P}$ that satisfies the requirement
$p(f_i) = \tilde{p}(f_i)$:
$$C \equiv \{p \in \mathcal{P} \mid p(f_i) = \tilde{p}(f_i), \text{ for all } f_i \in \mathcal{F}\}. \quad (1)$$
We choose a probability distribution consistent
with all the facts, but otherwise as uniform
as possible. The uniformity of the probability
distribution $p(y|x)$ is measured by the
conditional entropy:
$$H(p) = -\sum_{x,y} p(x, y) \log p(y|x) = -\sum_{x,y} \tilde{p}(x)\, p(y|x) \log p(y|x).$$
Thus, the probability distribution with maxi- 
mum entropy is the most uniform distribution. 
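In building a model, we consider the linear
exponential family $Q$ given as
$$Q(f) = \Big\{ p(y|x) = \frac{1}{Z_\lambda(x)} \exp\Big(\sum_i \lambda_i f_i(x, y)\Big) \Big\}, \quad (2)$$
where $\lambda_i$ are real-valued parameters and
$Z_\lambda(x)$ is a normalizing constant:
$$Z_\lambda(x) = \sum_y \exp\Big(\sum_i \lambda_i f_i(x, y)\Big).$$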
The intersection of the class $Q$ of exponential
models with the class of desired distributions
(1) is nonempty; it contains
the maximum entropy distribution, which
is unique (Ratnaparkhi, 1994).
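To make the exponential form (2) concrete, the sketch below evaluates p(y|x) for given weights. The feature, the weight value, and the function names are hypothetical choices for illustration; only the formula itself comes from the text.

```python
import math

def conditional_prob(x, features, weights, labels=(0, 1)):
    """Return {y: p(y|x)} with p(y|x) = exp(sum_i w_i * f_i(x, y)) / Z(x)."""
    scores = {
        y: math.exp(sum(w * f(x, y) for f, w in zip(features, weights)))
        for y in labels
    }
    z = sum(scores.values())  # normalizing constant Z(x)
    return {y: s / z for y, s in scores.items()}

def f_that(x, y):
    """Hypothetical feature: fires when the word is 'that' and y = 1."""
    return 1 if x == "that" and y == 1 else 0

# With weight log(3), the score for y = 1 is 3 and for y = 0 is 1.
probs = conditional_prob("that", [f_that], [math.log(3)])
print(round(probs[1], 2))  # 0.75
```

With all weights at zero the same function returns the uniform distribution over y, which is the maximum entropy choice when no constraint is active.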
Finding $p_* \in C$ that maximizes $H(p)$ is a
problem in constrained optimization, whose solution
cannot be written explicitly in general. Therefore,
we take advantage of the fact that the
models in $Q$ that satisfy $p(f_i) = \tilde{p}(f_i)$ can
be explained under the maximum likelihood
framework (Ratnaparkhi, 1994). The maximum
likelihood principle also gives the unique distribution
$p_*$, the intersection of the class $Q$
with $C$.
We assume each occurrence of $(x, y)$ is
sampled independently. Thus, the log-likelihood
$L_{\tilde{p}}(p)$ of the empirical distribution $\tilde{p}$ as predicted
by a model $p$ can be defined as
$$L_{\tilde{p}}(p) \equiv \log \prod_{x,y} p(y|x)^{\tilde{p}(x,y)} = \sum_{x,y} \tilde{p}(x, y) \log p(y|x).$$
That is, the model we want to build is
$$p_* = \arg\max_{p \in C} H(p) = \arg\max_{q \in Q} L_{\tilde{p}}(q).$$
The parameters $\lambda_i$ of the exponential model (2)
are obtained by the Generalized Iterative Scaling
algorithm (Darroch and Ratcliff, 1972).
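A minimal sketch of Generalized Iterative Scaling for a conditional model is given below. It is an illustration under stated assumptions, not the authors' implementation: the slack feature that pads every feature sum up to a constant C is the standard GIS device, features whose expectation is zero are simply skipped, and the toy data, feature, and function names are hypothetical.

```python
import math
from collections import Counter

def cond_prob(x, feats, weights, labels=(0, 1)):
    """p(y|x) under the exponential model: exp(sum_i w_i f_i(x, y)) / Z(x)."""
    scores = {y: math.exp(sum(w * f(x, y) for f, w in zip(feats, weights)))
              for y in labels}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

def train_gis(samples, features, labels=(0, 1), iterations=200):
    """Fit the weights so that model expectations match empirical ones."""
    n = len(samples)
    context_counts = Counter(x for x, _ in samples)
    # GIS assumes sum_i f_i(x, y) equals a constant C; a slack feature pads the sum.
    C = max(sum(f(x, y) for f in features)
            for x in context_counts for y in labels)
    if C == 0:
        raise ValueError("no feature fires on the training contexts")

    def slack(x, y):
        return C - sum(f(x, y) for f in features)

    feats = list(features) + [slack]
    # Empirical expectations p~(f_i).
    empirical = [sum(f(x, y) for x, y in samples) / n for f in feats]
    weights = [0.0] * len(feats)
    for _ in range(iterations):
        # Model expectations p(f_i) = sum_x p~(x) sum_y p(y|x) f_i(x, y).
        expected = [0.0] * len(feats)
        for x, count in context_counts.items():
            p = cond_prob(x, feats, weights, labels)
            for y in labels:
                for i, f in enumerate(feats):
                    expected[i] += (count / n) * p[y] * f(x, y)
        for i in range(len(feats)):
            # Simplification: features with a zero expectation are left untouched.
            if empirical[i] > 0 and expected[i] > 0:
                weights[i] += math.log(empirical[i] / expected[i]) / C
    return feats, weights

# Hypothetical toy data: context "a" is a segmentation position in 3 of 4 cases.
samples = [("a", 1)] * 3 + [("a", 0)] + [("b", 1)] * 2 + [("b", 0)] * 2

def f_a_seg(x, y):
    return 1 if x == "a" and y == 1 else 0

feats, weights = train_gis(samples, [f_a_seg])
print(round(cond_prob("a", feats, weights)[1], 3))  # 0.75
```

In this toy case the constraint on the single feature forces p(1|a) to 0.75, while context "b", on which no informative feature fires, stays at the uniform value 0.5.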
3 Construction of Features 
This section describes the features. From a 
corpus, contextual evidences of segmentation 
positions are collected and combined, result- 
ing in features. The features are used in iden- 
tifying potential segmentation positions and 
included in the model. 
3.1 Segmentable Positions and Safe 
Segmentation 
A sentence is constructed by the combina- 
tion of words, phrases, and clauses under the 
well-defined grammar. A sentence can be seg- 
mented into shorter segments that correspond 
to the constituents of the sentence. That is, 
segments correspond to the nonterminal symbols
of the context-free grammar¹. The position
of a word is called a segmentable position
if it can be the starting position of a specific
segment.
¹ Nonterminal symbols include the ones for phrases,
such as NP (noun phrase) and VP (verb phrase),
Though the analysis complexity can be re- 
duced by segmenting a sentence, there is 
a mis-segmentation risk that causes parsing
failures. A segmentation is called a
safe segmentation if it results in coherent
blocks of words. In English-Korean translation,
a safe segmentation is defined as one
that generates safe segments. A segment is
safe, when there is a syntactic category sym- 
bol N P dominating the segment and the seg- 
ment can be combined with adjacent segments 
under a given grammar. In Figure 1, (a) is an 
unsafe segmentation because the second seg- 
ment cannot be analyzed into one syntactic 
category, resulting in parsing failure. By the 
safe segmentation (b), the first segment cor- 
responds to a noun phrase and the second to 
a verb phrase, so that we can get a correct 
analysis result. 
(a) | The students | who study hard will pass the exam |
(b) | The students who study hard | will pass the exam |
Figure 1: Examples of unsafe/safe segmentation
in English-Korean translation.
3.2 Lexical Contextual Constraints 
A lexical context of a word includes a seven-word
window: three words to the left of the word,
three words to the right, and the word itself.
It also includes the parts of speech of these
words, subcategorization information for the two
words to the left, and a position value. The
position value $pos_i\_v$ of the $i$th word $w_i$ is calculated as
$$pos_i\_v = \left\lceil \frac{i}{n} \times R \right\rceil,$$
where $n$ is the number of words and $R$ represents
the number of regions in the sentence².
A region is a sequentially ordered block of
words in a sentence, and $pos_i\_v$ represents the
region in which a word lies. It is included to
reflect the influence of the position of a word
on being a segmentation position. Thus, the
lexical context of a word is represented by 17
attributes as shown in Figure 2.
¹ (continued) and the ones for clauses, such as RLCL (relative clause)
and SUBCL (subordinate clause).
² It is a heuristically set value, and we set R as 4.
s_position? 
word_i
w_{i-3}, ..., w_{i+3}
p_{i-3}, ..., p_{i+3}
s_cat_{i-2}, s_cat_{i-1}
pos_i_v
Figure 2: The structure of lexical context. 
An example of training data and the resulting
lexical context is shown in Figure 3.
The symbol '#' represents a segmentation position
marked by human annotators. Therefore,
the lexical context of the word when includes the
value 1 for attribute s_position? and the following:
the three words to the left of when (became,
terribly, and worried) and their parts of speech
(VERB ADV ADJ), the three words
to the right (they, saw, and what) and their parts
of speech (PRON VERB PRON), subcategorization
information for the two words to the
left (0 1), and the position value (2).
Of course his parents became terribly worried 
#when they saw what was happening 
to Atzel. 
( 1 when became terribly worried they saw 
what VERB ADV ADJ PRON VERB 
PRON 0 1 2 ) 
Figure 3: An example of training data and
its lexical context.
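The construction of the 17-attribute lexical context in Figure 3 can be sketched as follows. Several points are assumptions made for this illustration: the position value is taken to be the ceiling of i/n × R with a 1-based index i, the sentence-final period is counted as a token (which gives n = 16 and reproduces the value 2 in Figure 3), and tags or subcategorization values that Figure 3 does not show are filled with placeholders. All function names are hypothetical.

```python
def position_value(i, n, regions=4):
    """Region of the i-th token (1-based) among n tokens: ceil(i / n * R)."""
    return (i * regions + n - 1) // n  # integer ceiling of i * R / n

def window(seq, k, pad=None):
    """Three items left and three items right of 0-based index k, padded at the edges."""
    padded = [pad] * 3 + list(seq) + [pad] * 3
    c = k + 3
    return padded[c - 3:c], padded[c + 1:c + 4]

def lexical_context(tokens, tags, subcat, i, is_seg, regions=4):
    """Assemble the attributes of Figure 2 for the i-th token (1-based)."""
    k = i - 1
    left_w, right_w = window(tokens, k)
    left_p, right_p = window(tags, k)
    left_s, _ = window(subcat, k)
    return (int(is_seg), tokens[k], *left_w, *right_w, *left_p, *right_p,
            left_s[1], left_s[2], position_value(i, len(tokens), regions))

tokens = ("Of course his parents became terribly worried "
          "when they saw what was happening to Atzel .").split()
tags = ["?"] * len(tokens)            # placeholder for tags not shown in Figure 3
tags[4:7] = ["VERB", "ADV", "ADJ"]    # became, terribly, worried
tags[8:11] = ["PRON", "VERB", "PRON"] # they, saw, what
subcat = [None] * len(tokens)         # placeholder for values not shown in Figure 3
subcat[5], subcat[6] = 0, 1           # terribly, worried

lc = lexical_context(tokens, tags, subcat, i=8, is_seg=True)
print(lc)
```

Under these assumptions the resulting tuple has the same 17 attributes, in the same order, as the lexical context listed in Figure 3.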
To get reliable statistics, much training
data is required. To alleviate this problem,
we generate lexical contextual constraints
by combining lexical contexts and
collect statistics for them. To generate lexical
contextual constraints and to identify
segmentable positions, we define two operations,
join ($\oplus$) and consistency ($\equiv$). Let
$(a_1, \ldots, a_n)$ and $(b_1, \ldots, b_n)$ be lexical contexts
and $(C_1, \ldots, C_n)$ be a lexical contextual
constraint. The operation join is defined as
$$(a_1, \ldots, a_n) \oplus (b_1, \ldots, b_n) = (C_1, \ldots, C_n),$$
$$C_i = \begin{cases} * & \text{if } a_i \neq b_i \\ a_i & \text{if } a_i = b_i \end{cases},$$
where '*' is a don't-care term accepting any
value. A lexical contextual constraint is generated
as the result of a join operation. The
consistency is defined as
$$\big((a_1, \ldots, a_n) \equiv (C_1, \ldots, C_n)\big) = \begin{cases} 1 & \text{if } (C_i = a_i \text{ or } C_i = \text{'*'}) \text{ for all } 1 \le i \le n \\ 0 & \text{otherwise} \end{cases}$$
The algorithm for generating lexical contextual
constraints is shown in Figure 4.
• Input: a set of active lexical contexts
$LC_w = \{lc_1 \ldots lc_n\}$ for word $w$,
where $lc_i = (a_1, \ldots, a_n)$.
• Output: a set of lexical contextual
constraints $LCC_w = \{lcc_1 \ldots lcc_k\}$,
where $lcc_i = (C_1, \ldots, C_n)$.
1. Initialize $LCC_w = \emptyset$
2. Do the following for each $lc_i \in LC_w$
(a) For all $lc_j$ ($j \neq i$), $Count(lc_j)$ = # of
matched attributes with $lc_i$
(b) $max\_cnt = \max_{lc_j \in LC_w} Count(lc_j)$
(c) For all $lc_j$ where $Count(lc_j) = max\_cnt$,
$lcc = lc_i \oplus lc_j$, $LCC_w \leftarrow LCC_w \cup \{lcc\}$
Figure 4: Algorithm for generating lexical
contextual constraints.
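The join and consistency operations and the procedure of Figure 4 can be sketched in a few lines. This is an illustrative reading of the figure, assuming lexical contexts are plain tuples; the three-attribute contexts in the example are hypothetical and much shorter than the 17-attribute contexts used in the paper.

```python
DONT_CARE = "*"

def join(a, b):
    """Keep attributes on which two contexts agree; replace the rest with '*'."""
    return tuple(x if x == y else DONT_CARE for x, y in zip(a, b))

def consistent(lc, lcc):
    """Return 1 if every attribute of lcc is '*' or equals the one in lc, else 0."""
    return int(all(c == DONT_CARE or c == a for a, c in zip(lc, lcc)))

def generate_constraints(contexts):
    """For each context, join it with the context(s) sharing the most attributes."""
    constraints = set()
    for i, lc_i in enumerate(contexts):
        counts = {j: sum(x == y for x, y in zip(lc_i, lc_j))
                  for j, lc_j in enumerate(contexts) if j != i}
        if not counts:
            continue  # a single context has nothing to be joined with
        max_cnt = max(counts.values())
        for j, cnt in counts.items():
            if cnt == max_cnt:
                constraints.add(join(lc_i, contexts[j]))
    return constraints

# Hypothetical, shortened lexical contexts for the word "when".
contexts = [
    ("when", "worried", "they"),
    ("when", "worried", "he"),
    ("when", "home", "she"),
]
lccs = generate_constraints(contexts)
print(sorted(lccs))  # [('when', '*', '*'), ('when', 'worried', '*')]
```

The frequency of a constraint is then the number of contexts consistent with it, for example `sum(consistent(lc, ("when", "*", "*")) for lc in contexts)`, which is 3 for the contexts above.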
An lcc plays the role of a feature. The following
is an example of a feature.
$$f(x, y) = \begin{cases} 1 & \text{if } x_{word} = \text{``that''} \text{ and } x_{i-1} = \text{``say''} \text{ and } y = 1 \\ 0 & \text{otherwise} \end{cases}$$
We collect the statistics for each lcc. The frequency
of each lcc is counted as the number
of lexical contexts that satisfy the consistency
operation with the lcc:
$$\#(lcc) = \sum_{i=1}^{n} (lc_i \equiv lcc).$$
Identifying segmentable positions is performed
by applying the consistency operation to
the lexical context of word $w$ and $lcc \in LCC_w$.
A word whose lexical context is consistent
with an lcc is identified as a segmentable position.
4 Determination Schemes of 
Segmentation Positions 
Segmentation positions are determined 
through two steps: identifying segmentable 
positions and selecting the most appropriate 
position among them. Segmentable positions 
are identified using the consistency operation. 
Maximum entropy model in Section 2 gives a 
probability to each position. 
Segmentation performance is measured in
terms of coverage and accuracy. Coverage is
the ratio of the number of actually segmented
sentences to the number of segmentation target
sentences that are longer than $\alpha$ words,
where $\alpha$ is a fixed constant distinguishing long
sentences from short ones. Accuracy is evaluated
in terms of the safe segmentation ratio.
They are defined as follows:
$$\text{coverage} = \frac{\#\text{ of actually segmented Sent.}}{\#\text{ of Sent. to be segmented}} \quad (3)$$
$$\text{accuracy} = \frac{\#\text{ of Sent. with safe segmentation}}{\#\text{ of actually segmented Sent.}} \quad (4)$$
4.1 Baseline Scheme 
No contextual information is used in identify- 
ing segmentable positions. They are empiri- 
cally identified. A word that is tagged as a 
segmentation position more than 5 times is 
identified as a segmentable position. A set of
segmentable positions, $\mathcal{D}$, is as follows.
$$\mathcal{D} = \{w_i \mid w_i \text{ is tagged as segmentation position and } \#(\text{tagged } w_i) \ge 5\}$$
In order to select the most appropriate position,
the segmentation appropriateness of
each position is evaluated by the probability
of word $w_i$:
$$p(w_i) = \frac{\#\text{ of tagged } w_i}{\#\text{ of } w_i \text{ in the corpus}}$$
$p(w_i)$ represents the tendency that word $w_i$
will be used as a segmentation position. A
segmentation position $w_*$ is selected as the one
that has the highest $p(w_i)$ value:
$$w_* = \arg\max_{w_i \in \mathcal{D}} p(w_i).$$
This scheme serves as a baseline for comparing 
the segmentation performance of the models. 
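The baseline scheme amounts to counting. The sketch below assumes the training corpus is given as sentences of (word, is_segmentation_position) pairs; this representation, the toy corpus, and the function names are hypothetical.

```python
from collections import Counter

def train_baseline(tagged_corpus, min_count=5):
    """Return {word: p(word)} for words tagged as a position at least min_count times."""
    total, tagged = Counter(), Counter()
    for sentence in tagged_corpus:
        for word, is_seg in sentence:
            total[word] += 1
            if is_seg:
                tagged[word] += 1
    return {w: tagged[w] / total[w] for w in tagged if tagged[w] >= min_count}

def select_position(words, probs):
    """Return the index of the segmentable word with the highest p(word), or None."""
    candidates = [k for k, w in enumerate(words) if w in probs]
    if not candidates:
        return None
    return max(candidates, key=lambda k: probs[words[k]])

# Hypothetical toy corpus: "when" is tagged 6 times out of 8, "that" 5 times out of 10.
corpus = (
    [[("he", 0), ("left", 0), ("when", 1), ("it", 0), ("rained", 0)]] * 6
    + [[("when", 0), ("ready", 0)]] * 2
    + [[("they", 0), ("say", 0), ("that", 1), ("it", 0), ("works", 0)]] * 5
    + [[("that", 0), ("book", 0)]] * 5
)
probs = train_baseline(corpus)
sentence = "they say that he left when it rained".split()
print(probs)                             # {'when': 0.75, 'that': 0.5}
print(select_position(sentence, probs))  # 5
```

Both "that" and "when" are segmentable in the example sentence; the word "when" at index 5 is selected because its probability is higher.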
4.2 A Scheme using Lexical 
Contextual Constraints 
Lexical contextual constraints are used in
identifying segmentable positions. Compared
with the baseline scheme, this scheme considers
contextual information of a word. All
words consistent with the defined lexical contextual
constraints form a set of segmentable
positions $\mathcal{D}$.
The maximum likelihood principle gives a
probability distribution for $p(y \mid lcc_{w_i})$, where
$y \in \{0, 1\}$. Segmentation appropriateness is
evaluated by $p(1 \mid lcc_{w_i})$. The position with the
highest $p(1 \mid lcc_{w_i})$ becomes the segmentation
position:
$$w_* = \arg\max_{w_i \in \mathcal{D}} p(1 \mid lcc_{w_i}).$$
4.3 A Scheme using Lexical 
Contextual Constraints with 
Word Sets 
Due to insufficient training samples for con- 
structing lexical contextual constraints, some 
segmentable positions may not be identified. 
To alleviate this problem we introduce word
sets whose elements have linguistically similar
features. We define four word sets: coordinate
conjunction set, subordinate conjunction set,
interrogative set, and auxiliary verb set. The categories
of word sets and examples of their
members are shown in Table 1.
Table 1: The word sets and examples.
Word Set                   Examples
Coordinate Conjunctions    and, or, but
Subordinate Conjunctions   if, when, ...
Interrogatives             how, what, ...
Auxiliary Verbs            can, should, ...
Coordinate conjunctions have only 3 members,
but they frequently appear in long sentences.
Subordinate conjunctions have 25
members, interrogatives 5 members, and auxiliary
verbs 12 members at present. The words
belonging to each word set are treated equally.
Lexical contextual constraints are constructed
for words and word sets, so statistics are
collected for both of them. The set of segmentable
positions $\mathcal{D}$ is defined somewhat differently
as:
$$\mathcal{D} = \{w_i, ws_j \mid (lc_{w_i} \equiv lcc_{w_i}) = 1 \text{ or } (lc_{ws_j} \equiv lcc_{ws_j}) = 1\},$$
where $ws_j$ denotes a word set to which the $j$th
word in a sentence belongs.
In this scheme, $p(1 \mid lcc_{w_i})$ or $p(1 \mid lcc_{ws_j})$
expresses the segmentation appropriateness of
the position. Therefore, a segmentation position
is determined by
$$w_* = \arg\max_{\{w_i, ws_j\} \in \mathcal{D}} \{p(1 \mid lcc_{w_i}),\, p(1 \mid lcc_{ws_j})\}.$$
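The selection rule with word sets can be sketched as a maximum over both kinds of probabilities. In the sketch below the word sets contain only the example members listed in Table 1 (the full membership is not given in the text), and the candidate probabilities are hypothetical numbers standing in for the model outputs; all names are illustrative.

```python
# Only the example members from Table 1; the full sets are not listed in the paper.
WORD_SETS = {
    "COORD_CONJ": {"and", "or", "but"},
    "SUBORD_CONJ": {"if", "when"},
    "INTERROG": {"how", "what"},
    "AUX_VERB": {"can", "should"},
}

def word_set_of(word):
    """Return the name of the word set containing the word, or None."""
    for name, members in WORD_SETS.items():
        if word in members:
            return name
    return None

def select_position(candidates):
    """Pick the position with the highest word-level or word-set-level probability.

    candidates: list of (index, p_word, p_word_set); a probability is None
    when no consistent constraint of that kind exists for the position.
    """
    best_index, best_p = None, -1.0
    for index, p_word, p_set in candidates:
        for p in (p_word, p_set):
            if p is not None and p > best_p:
                best_index, best_p = index, p
    return best_index, best_p

# Hypothetical probabilities for three segmentable positions in one sentence.
candidates = [(2, 0.40, None), (5, None, 0.72), (7, 0.55, 0.61)]
print(word_set_of("when"))          # SUBORD_CONJ
print(select_position(candidates))  # (5, 0.72)
```

Position 5 is only identified through its word set, which illustrates how word sets can recover positions that word-level constraints miss.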
5 Experiments 
5.1 Corpus and Construction of the 
Maximum Entropy Model 
We construct the corpus from two different
domains, where the sentences longer than 15
words are extracted³. The training portion is
used to generate lexical contextual constraints
and to collect statistics for maximum entropy
model construction. From high school English
texts, 1500 sentences are tagged with segmentation
positions by human annotators. Two people who
have some knowledge of English syntactic
structures read the sentences and marked words
as segmentation positions where they paused.
After generating lexical contextual con- 
straints, we constructed the maximum en- 
tropy model p(ylx), where x is a lexical con- 
textual constraint and y E {0,1}. The model 
incorporates features that occur more than 5 
times in the training data. 3626 candidate fea- 
tures were generated without word sets and 
3878 features with word sets. In Table 2, 
training time and the number of active fea- 
tures of the model are shown. 
Segmentation performance is evaluated using
the test portion, which consists of 1800 sentences
from two domains: high school English texts
and the Byte Magazine.
³ The sentences with commas are excluded because
a comma is an explicit segmentation position. Segments
resulting from a segmentation at commas may already be
of manageable size. Our work is to segment long
sentences without explicit segmentation positions.
Table 2: Construction of models.
                    Training Time   # of Active Features
Without Word Sets   10 min          2720
With Word Sets      12 min          2910
5.2 Segmentation Performance 
In addition to coverage and accuracy, SC 
value is also defined to express the degree of 
contribution to efficient parsing by segmenta- 
tion. It is the ratio of the sentences that can 
benefit from intra-sentence segmentation. If a 
long sentence is not segmented or is segmented 
at unsafe segmentation positions, the sentence 
is called a segmentation error sentence. 
SC value is calculated as
$$SC = 1 - \frac{\#\text{ of segmentation error sentences}}{\#\text{ of segmentation target sentences}}.$$
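Coverage (3), accuracy (4), and the SC value can all be computed from three counts. The sketch below assumes that every segmented sentence is either safely or unsafely segmented, so the error sentences are the unsegmented ones plus the unsafely segmented ones; the counts in the example are hypothetical.

```python
def segmentation_metrics(n_target, n_segmented, n_safe):
    """Return (coverage, accuracy, SC) from sentence counts."""
    coverage = n_segmented / n_target
    accuracy = n_safe / n_segmented
    # Error sentences: not segmented at all, or segmented at an unsafe position.
    n_error = (n_target - n_segmented) + (n_segmented - n_safe)
    sc = 1 - n_error / n_target
    return coverage, accuracy, sc

# Hypothetical counts: 1000 target sentences, 907 segmented, 807 of them safely.
coverage, accuracy, sc = segmentation_metrics(1000, 907, 807)
print(round(coverage, 3), round(accuracy, 3), round(sc, 3))  # 0.907 0.89 0.807
```

Under this reading, SC equals the fraction of target sentences that are safely segmented, which is the product of coverage and accuracy.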
A sentence longer than $\alpha$ words is considered
a segmentation target sentence,
where $\alpha$ is set to 12. Table 3 compares segmentation
performance for each determination
scheme.
Table 3: Segmentation performance of the determination
schemes of segmentation position.
Determination Schemes   Coverage/Accuracy (%)   SC
Baseline                100/77.6                0.776
LCC                     90.7/89                 0.808
LCC with Word Sets      95.8/87.9               0.865
By the comparison of the baseline scheme 
with others, the accuracy is observed to de- 
pend on the context information. Word sets 
are helpful for increasing coverage with less 
degradation of accuracy. Each scheme has su- 
periority in terms of the different measures. 
But in terms of applicability to practical sys- 
tems, the third scheme is best for our purpose. 
Table 4 shows the segmentation performance 
of the scheme using LCC with word sets. 
Table 4: Segmentation performance of LCC
with word sets.
Domain                     Sent. Length   Coverage/Accuracy (%)   SC
High-School English Text   15~19          99.0/95.9               0.95
                           20~24          100/94.0                0.94
                           25~29          96.0/81.3               0.78
                           30~            100/67.5                0.68
Byte Magazine              15~19          94.0/92.6               0.87
                           20~24          91.0/91.2               0.83
                           25~29          92.5/94.6               0.88
                           30~            93.5/86.1               0.81
Total (1800 sentences)                    95.8/87.9               0.87
The SC value for the sentences from the same
domain as the training data is about 0.88, and
about 0.85 for the sentences from the Byte
Magazine. Though the values differ slightly between
test domains, about 87% of long sentences
can be parsed with less complexity and
without causing parsing failures. This suggests
that the intra-sentence segmentation method
can be utilized for efficient parsing of long
sentences.
5.3 Parsing Efficiency 
Parsing efficiency is generally measured by 
the required time and memory for parsing. 
In most cases, parsing sentences longer than 
30 words could not complete without intra- 
sentence segmentation. Therefore, the parsing 
is performed for the sentences longer than 15 
and less than 30 words. Ultra-Sparc 30 ma- 
chine is used for experiments. The efficiency 
improvement was measured by 
$$EI_{time} = \frac{t_{unseg} - t_{seg}}{t_{unseg}} \times 100,$$
$$EI_{memory} = \frac{m_{unseg} - m_{seg}}{m_{unseg}} \times 100,$$
where $t_{unseg}$ and $m_{unseg}$ are the time and memory
used during parsing without segmentation, and $t_{seg}$ and
$m_{seg}$ are those for parsing with segmentation.
Table 5 summarizes the results. 
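The efficiency improvement is a relative reduction expressed in percent. The short sketch below applies the formula to the time and memory values reported in Table 5; the function name is an illustrative choice.

```python
def efficiency_improvement(unseg, seg):
    """Relative reduction in percent: (unseg - seg) / unseg * 100."""
    return (unseg - seg) / unseg * 100

# Values from Table 5: (without segmentation, with segmentation).
high_school = {"time_sec": (19.6, 4.6), "memory_mb": (3.4, 0.9)}
byte_magazine = {"time_sec": (25.1, 5.4), "memory_mb": (3.7, 1.1)}

for name, domain in (("High-School English Text", high_school),
                     ("Byte Magazine", byte_magazine)):
    ei_time = efficiency_improvement(*domain["time_sec"])
    ei_memory = efficiency_improvement(*domain["memory_mb"])
    print(name, round(ei_time, 1), round(ei_memory, 1))
# High-School English Text 76.5 73.5
# Byte Magazine 78.5 70.3
```

The computed percentages agree with the improvement figures listed in Table 5.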
Table 5: Comparison of parsing efficiency
with/without segmentation (time / memory).
                       High-School English Text   Byte Magazine
With Segmentation      4.6 sec / 0.9 MB           5.4 sec / 1.1 MB
Without Segmentation   19.6 sec / 3.4 MB          25.1 sec / 3.7 MB
Improvement            76.5% / 73.5%              78.5% / 70.3%
By segmenting long sentences into several
manageable-sized segments, we can parse long
sentences with much less time and space.
5.4 Comparison with Related Works
The intra-sentence segmentation method
based on the maximum entropy model is compared
with other approaches in terms of the
segmentation coverage and the improvement
of parsing efficiency.
Lyon and Frank (1995) and Lyon and Dickerson
(1997) segment a sentence into
three segments. Though parsing efficiency can
be improved by segmenting a sentence, this
method may be applied only to simple sentences⁴.
Long sentences are generally coordinate
sentences⁵ or complex sentences⁶. They
have more than two subjects, so applying this
method to such sentences seems to be inappropriate.
In (Kim and Kim, 1995), sentence patterns
are used to segment long sentences. This
method improves parsing efficiency by 30% in
time and 58% in space. However, collecting
sentence patterns requires much human effort
and segmentation coverage is only about 36%.
Li's method (Li et al., 1990) for sentence
segmentation also depends upon labor-intensive
pattern rules. Its segmentation coverage
seems to be unsatisfactory for practical
machine translation systems.
The proposed method can be applied to coordinate
and complex sentences as well as simple
sentences. It shows segmentation coverage
of about 96%. In addition, it needs no
human effort other than constructing training
data. Human annotators have only to read
sentences and mark segmentation positions,
which is simpler than collecting pattern
rules or sentence patterns. We also obtain
much improved parsing efficiency: about 77%
in time and about 71% in space.
⁴ A simple sentence has one subject and one predicate.
⁵ A coordinate sentence results from the combination
of several simple sentences by coordinate conjunctions.
⁶ A complex sentence consists of a main clause and
several subordinate clauses.
6 Conclusion and Future Work 
Practical machine translation systems should 
be able to accommodate long sentences. Thus 
intra-sentence segmentation is required as a 
means for reducing parsing complexity. This 
paper presents a method for intra-sentence 
segmentation based on the maximum entropy 
model. The method builds statistical models 
automatically from a text corpus to provide 
the segmentation appropriateness for safe seg- 
mentation. 
In the experiments with 1800 test sentences,
about 87% of them benefited from segmentation.
The statistical intra-sentence segmentation
method can also relieve humans of
the burden of constructing information such
as segmentation rules or sentence patterns.
Experiments suggest that the proposed maxi- 
mum entropy models can be incorporated into 
the parser for practical machine translation 
systems. 
Further work can be done in two directions.
First, studies on recovery mechanisms
for unsafe segmentation before parsing
seem necessary since unsafe segmentation may
cause parsing failures. Second, parsing control
mechanisms should be studied that exploit the
characteristics of segmentation positions and
the parallelism among segments. This will enhance
parsing efficiency further.

References 
D. Beeferman, A. Berger, and J. Lafferty. 1997. 
Text Segmentation using Exponential Models. 
In Second Conference on Empirical Methods in 
Natural Language Processing. Providence, RI. 
Lambros Cranias, Harris Papageorgiou, and Stelios
Piperdis. 1994. A Matching Technique in
Example-Based Machine Translation. In Proceedings
of 1995 COLING, pages 100--104.
J.N. Darroch and D. Ratcliff. 1972. Generalized 
Iterative Scaling for Log-linear Models. The 
Annals of Mathematical Statistics, 43(5):1470- 
1480. 
E.T. Jaynes. 1957. Information Theory and Sta- 
tistical Mechanics. Physical Review, 106:620- 
630. 
Sung Dong Kim and Yung Taek Kim. 1995. 
Sentence Analysis using Pattern Matching in 
English-Korean Machine Translation. In Pro- 
ceedings of the 1995 ICCPOL, Oct. 25-28. 
Ho Suk Lee. 1993. Automatic Construction of 
Transfer Dictionary based on the Corpus for 
English-Korean Machine Translation. Ph.D. 
thesis, Seoul National University. In Korean. 
Wei-Chuan Li, Tzusheng Pei, Bing-Huang Lee,
and Chuei-Feng Chiou. 1990. Parsing Long English
Sentences with Pattern Rules. In Proceedings
of 25th Conference of COLING, pages 410--412.
Caroline Lyon and Bob Dickerson. 1997. Reduc- 
ing the Complexity of Parsing by a Method of 
Decomposition. In International Workshop on 
Parsing Technology, September. 
Caroline Lyon and Ray Frank. 1995. Neural Net- 
work Design for a Natural Language Parser. 
In International Conference on Artificial Neural 
Networks. 
Tetsura Nasukawa. 1995. Robust Parsing Based 
on Discourse Information. In 33rd Annual 
Meeting of the ACL, pages 33-46.
David D. Palmer and Marti A. Hearst. 1997. 
Adaptive Multilingual Sentence Boundary 
Disambiguation. Computational Linguistics, 
23(2):241-265. 
A. Ratnaparkhi. 1994. A Simple Introduction 
to Maximum Entropy Models for Natural Language
Processing. Technical report, Institute
for Research in Cognitive Science, University of 
Pennsylvania 3401 Walnut Street, Suite 400A 
Philadelphia, PA 19104-6228, May. IRCS Re- 
port 97-08. 
J.C. Reynar and A. Ratnaparkhi. 1997. A Maxi- 
mum Entropy Approach to Identifying Sentence 
Boundaries. In Proceedings of the Fifth Confer- 
ence on Applied Natural Language Processing, 
pages 16--19. Washington D.C. 
Masaru Tomita. 1986. Efficient Parsing for Natural
Language. Kluwer Academic Publishers.
T. Winograd. 1983. Language as a Cognitive Pro- 
cess: Syntax, volume 1. Addison-Wesley. 
Sung Hee Yoon. 1994. Efficient Parser to Find 
Bilingual Idiomatic Expressions for English-Korean
Machine Translation. In Proceedings of
the 1994 ICCPOL, pages 455-460. 
