A HARDWARE ALGORITHM 
FOR HIGH SPEED MORPHEME EXTRACTION 
AND ITS IMPLEMENTATION 
Toshikazu Fukushima, Yutaka Ohyama and Hitoshi Miyai 
C&C Systems Research Laboratories, NEC Corporation 
1-1, Miyazaki 4-chome, Miyamae-ku, Kawasaki City, Kanagawa 213, Japan 
(fuku@tsl.cl.nec.co.jp, ohyama~tsl.cl.nec.co.jp, miya@tsl.cl.nec.co.jp) 
ABSTRACT 
This paper describes a new hardware algorithm 
for morpheme extraction and its implementation 
on a specific machine (MEX-I), as the first step 
toward achieving natural language parsing accel- 
erators. It also shows the machine's performance, 
100-1,000 times faster than a personal computer. 
This machine can extract morphemes from 10,000 
character Japanese text by searching an 80,000 
morpheme dictionary in I second. It can treat 
multiple text streams, which are composed of char- 
acter candidates, as well as one text stream. The 
algorithm is implemented on the machine in linear 
time for the number of candidates, while conven- 
tional sequential algorithms are implemented in 
combinational time. 
1 INTRODUCTION 
Recent advancement in natural language pars- 
ing technology has especially extended the word 
processor market and the machine translation sys- 
tem market. For further market extension or new 
market creation for natural language applications, 
parsing speed-up as well as improving parmng ac- 
curacy is required. First, the parsing speed-up 
directly reduces system response time required in 
such interactive natural language application sys- 
tems as those using natural language interface, 
speech recognition, Kana-to-Kanjl i conversion, 
which is the most popular Japanese text input 
method, and so on. Second, it also increases the 
advantage of such applications as machine transla- 
tion, document proofreading, automatic indexing, 
and so on, which are used to treat a large amount 
of documents. Third, it realizes parsing meth- 
ods based on larger scale dictionary or knowledge 
database, which are necessary to improve parsing 
accuracy. 
Until now, in the natural language processing 
field, the speed-up has depended mainly on perfor- 
mance improvements achieved in sequential pro- 
cesslng computers and the development of sequen- 
tial algorithms. Recently, because of the further 
IKan~ characters are combined consonant and vowel 
symbols used in written Japanese. Kanjl characters ~re 
Chinese ideographs. 
speeded-up requirement, parallel processing com- 
puters have been designed and parallel parsing al- 
gorithms (Matsumoto, 1986) (Haas, 1987) (Ryt- 
ter, 1987) -(Fukushima, 1990b) have been pro- 
posed. However, there are many difficult problems 
blocking efficient practical use of parallel process- 
ing computers. One of the problems is that ac- 
cess confiicts occur when several processors read 
or write a common memory simultaneously. An- 
other is the bottle-neck problem, wherein commt- 
nication between any two processors is restricted, 
because of hardware scale limitation. 
On the other hand, in the pattern processing 
field, various kinds of accelerator hardware have 
been developed. They are designed for a special 
purpose, not for general purposes. A hardware 
approach hasn't been tried in the natural language 
processing field yet. 
The authors propose developing natural lan- 
guage parsing accelerators, a hardware approach 
to the parsing speed-up (Fukushima, 1989b) 
-(Fukushima, 1990a). This paper describes a new 
hardware algorithm for high speed morpheme ex- 
traction and its implementation on a specific ma- 
chine. This morpheme extraction machine is de- 
signed as the first step toward achieving the nat- 
ura\] language parsing accelerators. 
2 MACHINE DESIGN 
STRATEGY 
2.1 MORPHEME EXTRACTION 
Morphological analysis methods are generally 
composed of two processes: (1) a morpheme ex- 
traction process and (2) a morpheme determina- 
tion process. In process (1), all morphemes, which 
are considered as probably being use<\] to construct 
input text, are extracted by searching a morpheme 
dictionary. These morphemes are extracted as 
candidates. Therefore, they are selected mainly 
by morpheme conjunction constraint. Morphemes 
which actually construct the text are determined 
in process (2). 
The authors selected morpheme extraction as 
the first process to be implemented on specific 
hardware, for the following three reasons. First 
is that the speed-up requirement for the morpho- 
logical analysis process is very strong in Japanese 
307 
Input Text ........ 
~.p)i~ C. ...... ~ Iverb 
! ! i ' ' i I noun 
; I i ,1", ; ~'~,~: I noun 
~MorphemeExtraction~l fi~ inoun ~.~ Process 
..,) ,ti~ inou n 
~ i Morpheme Dictionary 
!~; postposition 
i .....  su,,x 
!~, :verb 
I I 
I I : , ~,~ noun 
i .......... d 
'" ........ "1 i ~)f :suffix Extracted 
= ' Morphemes 
i i~#~. :noun 
= , ......... / 
I I 
..... !vo,  
~)f ! no,,n 
; ......... I 
Figure h Morpheme Extraction Process for 
Japanese Text 
2.2 STRATEGY DISCUSSION 
In conventional morpheme extraction methods, 
which are the software methods used on sequential 
processing computers, the comparison operation 
between one key string in the morpheme dictio- 
nary and one sub-string of input text is repeated. 
This is one to one comparison. On the other hand, 
many to one comparison or one to many compar- 
ison is practicable in parallel computing. 
Content- addressable mem- 
ories (.CAMs) (Chlsvln, 1989) (Yamada, 1987) re- 
allze the many to one comparison. One sub-string 
of input text is simultaneously compared with all 
key strings stored in a CAM. However, presently 
available CAMs have only a several tens of kilo- 
bit memory, which is too small to store data for a 
more than 50,000 morpheme dictionary. 
The above mentioned parallel processing com- 
puters realize the one to many comparison. On 
the parallel processing computers, one processor 
searches the dictionary at one text position, while 
another processor searches the same dictionary at 
the next position at the same time (Nakamura, 
1988). However, there is an access conflict prob- 
lem involved, as already mentioned. 
The above discussion has led the authors to the 
following strategy to design the morpheme extrac- 
tion machine (Fukushima, 1989a). This strategy is 
to shorten the one to one comparison cycle. Simple 
architecture, which will be described in the next section, can realize this strategy. 
text parsing systems. This process is necessary for 
natural language parsing, because it is the first 
step in the parsing. However, it is more labo- 
rious for Japanese and several other languages, 
which have no explicit word boundaries, than for 
Engllsh and many European languages (Miyazald, 
1983) (Ohyama, 1986) (Abe, 1986). English text 
reading has the advantage of including blanks be- 
tween words. Figure 1 shows an example of the 
morpheme extraction process for Japanese text. 
Because of the disadvantage inherent in reading 
difficulty involved in all symbols being strung to- 
gether without any logical break between words, 
the morpheme dictionary, including more than 
50,000 morphemes in Japanese, is searched at al- 
most all positions of Japanese text to extract mor- 
phemes. The authors' investigation results, indi- 
cating that the morpheme extraction process re- 
quires using more than 70 % of the morphologi- 
cal analysis process time in conventional Japanese 
parsing systems, proves the strong requirement for 
the speed-up. 
The second reason is that the morpheme ex- 
traction process is suitable for being implemented 
on specific hardware, because simple character 
comparison operation has the heaviest percentage 
weight in this process. The third reason is that 
this speed-up will be effective to evade the com- 
mon memory access conflict problem mentioned in Section 1. 
308 
3 A HARDWARE ALGO- 
RITHM FOR MOR- 
PHEME EXTRACTION 
3.1 FUNDAMENTAL 
ARCHITECTURE 
A new hardware algorithm for the morpheme 
extraction, which was designed with the strategy 
mentioned in the previous section, is described in 
this section. 
The fundamental architecture, used to imple- 
ment the algorithm, is shown in Fig. 2. The main 
components of this architecture are a dictionary 
block, a shift register block, an index memory, an 
address generator and comparators. 
The dictionary block consists of character mem- 
ories (i.e. 1st character memory, 2nd character 
memory, ..., N-th character memory). The n-th 
character memory (1 < n < N) stores n-th charac- 
ters of all key strings \]-n th~ morpheme dictionary, 
as shown in Fig. 3. In Fig. 3, "iI~", "~f", "@1:~ 
", "~", "~", and so on are Japanese mor- 
phemes. As regarding morphemes shorter than 
the key length N, pre-deflned remainder symbols 
/ill in their key areas. In Fig. 3, '*' indicates the remainder symbol. 
The shift register block consists of character reg- 
isters (i.e. 1st character register, 2nd character reg- 
ister,..., N-th character register). These registers 
Address~'~._____J Index J,,~ enerator~/'--"--\] Memory 
cM ~*(~,comlpStrator~*~ lstCRli 
 iiiiiiiiiii i iii i!ii; ! ii!ili! i;i 
I I' ,i 
TI N-th CM mparator~ 
, ............. ...... ..--.-.-~.-~ 
Mazcn ~lg Dictionary Block 
CM --- Character Memory 
.... t 
N-th CR,I 
Text Register Block 
CR = Character Register 
Figure 2: Fundamental Architecture 
.j 
Index Memory 
I 
il: IIm~ ~= 
\[in * 
I1: 
I1~ 
I1~ * 
I1: I 
1 2 
| • 
! 
! * 
"3(" "X'li. ...... l "X" 
• !, *Ii ...... ~, * ii 
li. 3 4 N-th 
Character Memory 
Figure 3: Relation between Character Memories 
and Index Memory 
2 
3 ~: 
4 J~ Shift Shift 
7, 
8 Ul I~1 L~ 
(a) (b) (c 
ggg gg 
(d) (e) 
Figure 4: Movement in Shift Register Block 
store the sub-string of input text, which can be 
shifted, as shown in Fig. 4. The index memory re- 
ceives a character from the 1st character register. 
Then, it outputs the top address and the number 
of morphemes in the dictionary, whose 1st char- 
acter corresponds to the input character. Because 
morphemes are arranged in the incremental order 
of their key string in the dictionary, the pair for the 
top address and the number expresses the address 
range in the dictionary. Figure 3 shows the rela- 
tion between the index memory and the character 
memories. For example, when the shift register 
block content is as shown in Fig. 4(a), where '~' 
is stored in the 1st character register, the index 
memory's output expresses the address range for 
the morpheme set {"~", "~", "~\]~", "~\]~ 
~\[~", "~\]~", ..., "~J"} in Fig. 3. 
The address generator sets the same address to 
all the character memories, and changes their ad- 
dresses simultaneously within the address range 
which the index memory expresses. Then, the dic- 
tionary block outputs an characters constructing 
one morpheme (key string with length N ) simul- 
taneously at one address. The comparators are 
N in number (i.e. 1st comparator, 2nd compara- 
,or, ..., N-th comparator). The n-th comparator 
compares the character in the n-th character reg- 
ister with the one from the  -th character mem- 
ory. When there is correspondence between the 
two characters, a match signal is output. In this 
comparison, the remainder symbol operates as a 
wild card. This means that the comparator also 
outputs a match signal when the ~-th character 
memory outputs the remainder symbol. Other- 
wise, it outputs a no match signal. 
The algorithm, implemented on the above de- 
scribed fundamental architecture, is as follows. 
• Main procedure 
Step 1: Load the top N characters from the 
input text into the character registers in 
the shift register block. 
309 
Step 2: While the text end mark has not ar- 
rived at the 1st character register, im- 
plement Procedure 1. 
• Procedure 1 
Step I: Obtain the address range for the 
morphemes in the dictionary, whose ist 
character corresponds to the character in 
the 1st character register. Then, set the 
top address for this range to the current 
address for the character memories. 
Step 2: While the current address is in this 
range, implement Procedure 2. 
Step 3: Accomplish a shift operation to the 
shift register block. 
• Procedure 2 
Step 1: Judge the result of the simultane- 
ous comparisons at the current address. 
When all the comparators output match 
signals, detection of one morpheme is in- 
dicated. When at least one comparator 
outputs the no match signal, there is no 
detection. 
Step 2: Increase the current address. 
For example, Fig. 4(a) shows the sub-string in 
the shift register block immediately after Step 
1 for Main procedure, when the input text is 
"~J~}~L~ bfc...". Step 3 for 
Procedure I causes such movement as (a)-*(b), 
(b)--*(c), (c)---*(d), (d)--*(e), and so on. Step 1 
and Step 2 for Procedure 1 are implemented in 
each state for (a), (b), (c), (d), (e), and so on. 
In state (a) for Fig. 4, the index memory's out- 
put expresses the address range for the morpheme 
set {"~", "~"~", "~'~", "~;", "~:~\]~", ..., 
"~J"} if the dictionary is as shown in Fig. 3. 
Then, Step 1 for Procedure 2 is repeated at 
each address for the morpheme set {"~:", "~", 
,,~f~f,,, ,,~:~,,, ,,~f,,, ..., ,,~,,}. 
Figure 5 shows two examples of Step 1 for Pro- 
cedure 2. In Fig. 5(a), the current address for 
the dictionary is at the morpheme "~". In 
Fig. 5(b), the address is at the morpheme "~$; 
\]~". In Fig. 5(a), all of the eight comparators 
output match signals as the result of the simul- 
taneous comparisons. This means that the mor- 
pheme "~" has been detected at the top po- 
sition of the sub-string "~~j~:~ ~ L". On 
the other hand, in Fig. 5(b), seven comparators 
output match signals, but one comparator, at 2nd 
position, outputs a no match slgual, due to the 
discord between the two characters, '~' and '~\[~'. 
This means that the morpheme "~\]~" hasn't 
been detected at this position. 
Key String Text Sub-string from Dictionary Block in Shift Register Block 
/Comparators ~ comParators\ 
2 2 ,.*X~ 2 
3 3 ~ 3 
4 .~C~ 4 ,,,(~. 4 
$ $ 
"~-~)~" is detected. "~" is NOT detected. 
(a) (b) 
0 shows match in a comparator. 
X shows no match in a comparator. 
Figure 5: Simultaneous Comparison in Fundamen- 
tal Architecture 
3.2 EXTENDED 
ARCHITECTURE 
The architecture described in the previous sec- 
tion treats one stream of text string. In this sec- 
tion, the architecture is extended to treat multi- 
ple text streams, and the algorithm for extract- 
ing morphemes from multiple text streams is pro- 
posed. 
Generally, in character recognition results or 
speech recognition results, there is a certain 
amount of ambignJty, in that a character or a syl- 
lable has multiple candidates. Such multiple can- 
didates form the multiple text streams. Figure 
6(a) shows an example of multiple text streams, 
expressed by a two dimensional matrix. One di- 
mension corresponds to the position in the text. 
The other dimension corresponds to the candi- 
date level. Candidates on the same level form one 
stream. For example, in Fig. 6(a), the character 
at the 3rd position has three candidates: the 1st 
candidate is '~', the 2nd one is '~' and the 3rd 
one is '\]~'. The 1st level stream is "~\]:~:.~...". 
The 2nd level stream is "~R...". The 3rd 
level stream is "~R ~... ". 
Figure 6(b) shows an example of the morphemes 
extracted from the multiple text streams shown in 
Fig. 6(a)..In the morpheme extraction process for 
the multiple text streams, the key strings in the 
morpheme dictionary are compared with the com- 
binations of various candidates. For example, "~ 
~", one of the extracted morphemes, is com- 
posed of the 2nd candidate at the 1st position, 
the 1st candidate at the 2nd position and the 3rd 
candidate at the 3rd position. The architecture 
described in the previous section can be easily ex- 
tended to treat multiple text streams. Figure 7 
310 
(a) Multiple Text Streams 
*-Position in Text--* 
1234 
Candidate Level 2 ;1~ ~ ~ 
~verb 
! 
.~ inoun 
\[\] inoun 
i~ I~ i noun (b) Extracted \[p) i suffix 
Morphemes \[~\]i .,~ !noun 
noun noun 
I verb 
~: i nou. 
• '~ iverb 
i ......... • 
Figure 6: Morpheme Extraction from Multiple 
Text Streams 
Address~. \] Index '1~ enerator  Memory 
..... 
..I , • "'1 I 
b\[ 1st CM ~'( comlpStrator}*~ 
li '1 I ======================= 
I! I , 2nd , 
I~';, I 2ndCM I'~(Comparator)' ~ 
......... ...... 
Shift Register 
._.....~ Block 
"':'."'11" ..... 
li; ......... I;: ..... !l N-th CM \[k.C~C°m;arat°r~ 2-N CR . 
......... ~ .... bl~¥E~i,;h-~:: D,cttonary Block 'g 1st Le~el 2ndlLevel M~h Level 
Stream St\[earn Stream CM 
= Character Memory m-n CR = m-th Level n-th Character Register 
Figure 7: Extended Architecture 
311 
shows the extended architecture. This extended 
architecture is different from the fundamental ar- 
chitecture, in regard to the following three points. 
First, there are M sets of character registers in 
the shift register block. Each set is composed of 
N character registers, which store and shift the 
sub-string for one text strearn. Here, M is the 
number of text streams. N has already been in- 
troduced in Section 3.1. The text streams move 
simultaneously in all the register sets. 
Second, the n-th comparator compares the char- 
a~'ter from the n-th character memory with the M 
characters at the n-th position in the shift regis- 
ter block. A match signal is output, when there 
is correspondence between the character from the 
memory and either of the M characters in the reg- 
isters. 
Third, a selector is a new component. It changes 
the index memory's input. It connects one of the 
registers at the 1st position to sequential index 
memory inputs in turn. This changeover occurs 
M times in one state of the shift register block. 
Regarding the algorithm described in Section 
3.1, the following modification enables treating 
multiple text streams. Procedure 1 and Pro- 
cedure 1.5, shown below, replace the previous 
Procedure 1. 
• Procedure 1 
Step 1: Set the highest stream to the current 
level. 
Step 2: While the current level has not ex- 
ceeded the lowest stream, implement 
Procedure 1.5. 
Step 3: Accomplish a shift operation to the 
shift register block. 
• Procedure 1.5 
Step 1: Obtain the address range for the 
morphemes in the dictionary, whose 1st 
character corresponds to the character in 
the register at the 1st position with the 
current level. Then, set the top address 
for this range to the current address for 
the character memories. 
Step 2: While the current address is in this 
range, implement Procedure 2. 
Step 3: Lower the current level. 
Figure 8 shows an example of Step 1 for Proce- 
dure 2. In this example, all of the eight compara- 
tors output the match signal as a result of simulta- 
neous comparisons, when the morpheme from the 
dictionary is "~:". Characters marked with 
a circle match the characters from the dictionary. 
This means that the morpheme "~:" has been 
detected. 
When each character has M candidates, the 
worst case time complexity for sequential mor- 
pheme extraction algorithms is O(MN). On 
the other hand, the above proposed algorithm 
(Fukushima's algorithm) has the advantage that 
the time complexity is O(M). 
Sub-Strings 
Key String for Multiple Text Streams 
from Dictionary Block in Shift Regoster Block 
Comparators ,,~ 
"o  l®l L 
4 ~ ,=*(~ i i 
! ! ! 
~. 1 2 3 
"~/i" is detected. 
Figure 8: Simultaneous Comparison in Extended 
Architecture 
,-- MEX-I 
PC-9801VX 
Hamaguchi's hardware algorithm (Ham~guchi, 
1988), proposed for speech recognition systems, is 
similax to Fukushima's algorithm. In Hamaguchi's 
algorithm, S bit memory space expresses a set of 
syllables, when there are S different kinds of syl- 
lables ( S = 101 in Japanese). The syllable candi- 
dates at the saxne position in input phonetic text 
are located in one S bit space. Therefore, H~n- 
aguchi's algorithm shows more advantages, as the 
full set size of syllables is sm~ller s~nd the num- 
ber of syllable candidates is larger. On the other 
ha~d, Fukushima's ~Igorithm is very suitable for 
text with a large character set, such as Japanese 
(more than 5,000 different chaxacters are com- 
puter re~able in Japanese). This algorithm ~Iso 
has the advantage of high speed text stream shift, 
compared with conventions/algorithms, including 
Hamaguchi's. 
4 A MORPHEME EX- 
TRACTION MACHINE 
4.1 A MACHINE OUTLINE 
This section describes a morpheme extraction 
machine, called MEX-I. It is specific hardware 
which realizes extended architecture and algo- 
rithm proposed in the previous section. 
It works as a 5ackend machine for NEC Per- 
sons/Computer PC-9801VX (CPU: 80286 or V30, 
clock: 8MHz or 10MHz). It receives Japanese text 
from the host persona/computer, m~d returns mor- 
phemes extracted from the text after a bit of time. 
312 
Figure 9: System Overall View 
Figure 9 shows an overall view of the system, in- 
cluding MEX-I and its host persona/ computer. 
MEX-Iis composed of 12 boards. Approximately 
80 memory IC chips (whose total memory storage 
capacity is approximately 2MB) and 500 logic IC 
chips are on the boards. 
The algorithm parameters in MEX-I axe as fol- 
low. The key length (the maximum morpheme 
length) in the dictionary is 8 (i.e. N = 8 ). 
The maximum number of text streams is 3 (i.e. 
M = 1, 2, 3). The dictionary includes approxi- 
mately 80,000 Japanese morphemes. This dictio- 
nary size is popular in Japanese word processors. 
The data length for the memories a~d the registers 
is 16 bits, corresponding to the character code in 
Japanese text. 
4.2 EVALUATION 
MEX-I works with 10MHz clock (i.e. the clock 
cycle is lOOns). Procedure 2, described in Sec- 
tion 3.1, including the simultaneous comparisons, 
is implemented for three clock cycles (i.e. 300ns). 
Then, the entire implementation time for mor- 
pheme extraction approximates A x D x L x M x 
300n8. Here, D is the number of all morphemes in 
the dictionary, L is the length of input text, M is 
the number of text streams, and A is the index- 
ing coef~dent. This coei~cient means the aver- 
age rate for the number of compared morphemes, 
compared to the number of all morphemes in the 
dictionary. 
31ementation Time \[sec\] Im A=O.O05 
6 • Newspapers .," l i 
r o • Technical Reports / 
5 • Novels ,'" 
,," • A=0.003 
o" 
4 / • 
• •" so 
3 / • 
• s~ ao ~° 
2 /• .I A=0.001 
j/ o. 
• so ° • ......--'''''" 
1 o ° o o ._..-'" 
ss o• ~ ...---" 
.... I'" I I 1 I I ) 
O 10,000 20,000 30,000 40,000 50,000 60,000 
Number of Candidates in Text Streams (=LXM) 
Figure 10: Implementation Time Measurement 
Results 
The implementation time measurement results, 
obtained for various kinds of Japanese text, are 
plotted in Fig. 10. The horizontal scale in Fig. 10 
is the L x M value, which corresponds to the num- 
ber of characters in all the text streams. The ver- 
tical scale is the measured implementation time. 
The above mentioned 80,000 morpheme dictio- 
nary was used in this measurement. These re- 
sults show performance wherein MEX-I can ex- 
tract morphemes from 10,000 character Japanese 
text by searching an 80,000 morpheme dictionary 
in 1 second. 
Figure 11 shows implementation time compari- 
son with four conventional sequential algorithms. 
The conventional algorithms were carried out on 
NEC Personal Computer PC-98XL 2 (CPU: 80386, 
clock: 16MHz). Then, the 80,000 morpheme dic- 
tionary was on a memory board. Implementation 
time was measured for four diferent Japanese text 
samplings. Each of them forms one text stream, 
which includes 5,000 characters. In these measure- 
ment results, MEX-I runs approximately 1,000 
times as fast as the morpheme extraction pro- 
gram, using the simple binary search algorithm. 
It runs approximately 100 times as fast as a pro- 
gram using the digital search algorithm, which has 
the highest speed among the four algorithms. 
Morpheme Extraction Methods Text1 Text2 Text3 Text4 
Programs Based on Sequential Algorithms \[sec\] 
• Binary Search Method (Knuth, 197S) 564 642 615 673 
• Binary Search Method 133 153 147 155 
Checking Top Character Index 
• Ordered Hash Method (~e. 1074) 406 440 435 416 
• Digital Search Method (Knuth, 1973) 52 56 54 54 
with Tree Structure Index 
MEX-I 0.56 0.50 0.51 0.44 
Figure lh Implementation Time Comparison for 
5,000 Character Japanese Text 
toward achieving natural language parsing accel- 
erators, which is a new approach to speeding up 
the parsing. 
The implementation time measurement results 
show performance wherein MEX-I can extract 
morphemes from 10,000 character Japanese text 
by searching an 80,000 morpheme dictionary in 1 
second. When input is one stream of text, it runs 
100-1,000 times faster than morpheme extraction 
programs on personal computers. 
It can treat multiple text streams, which are 
composed of character candidates, as well as one 
stream of text. The proposed algorithm is imple- 
mented on it in linear time for the number of can- 
didates, while conventional sequential algorithms 
are implemented in combinational time. This is 
advantageous for character recognition or speech 
recognition. 
Its architecture is so simple that the authors be- 
lieve it is suitable for VLSI implementation. Ac- 
tually, its VLSI implementation is in progress. A 
high speed morpheme extraction VLSI will im- 
prove the performance of such text processing ap- 
plications in practical use as Kana-to-Kanji con- 
version Japanese text input methods and spelling 
checkers on word processors, machine translation, 
automatic indexing for text database, text-to- 
speech conversion, and so on, because the mor- 
pheme extraction process is necessary for these 
applications. 
The development of various kinds of accelera- 
tor hardware for the other processes in parsing 
is work for the future. The authors believe that 
the hardware approach not only improves conven- 
tional parsing methods, but also enables new pars- 
ing methods to be designed. 
5 CONCLUSION 
This paper proposes a new hardware algorithm 
for high speed morpheme extraction, and also de- 
scribes its implementation on a specific machine. 
This machine, MEX.I, is designed as the first step 
313 
REFERENCES 
Abe, M., Ooskima, Y., Yuura~ K. mad Takeichl, 
N. (1986). "A Kana-Kanji Translation System for Non-segmented Input Sentences Based on Syntac- 
tic and Semantic Analysis", Proc. 11th Interna- tional Conference on Computational Linguistics: 
280-285. Amble, O. and Knuth, D. E. (1974). "Ordered 
Hash Tables", The Computer Journal, 17(~): 135-142. 
Bear, J. (1986). "A Morphological r.e, eognizer with Syntactic and Phonological Rules, 
Proe. llth International Conference on Computational 
Linguistics: 272-276. Chisvin, L. and Duckworth, R. J. (1989). 
"Content-Addressable and Associative Memory: Alternatives to the Ubiquitous RAM", 
Computer. 51-64. 
Fukushlma, T., Kikuchi, Y., Ohya~a~ Y. and Miy~i, H. (1989a). "A Study of the Morpheme 
Extraction Methods with Multi-matching Tech- nique" (in Japanese), 
Proc. 3gth National Conven- 
tion of Information Processing Society of Japan: 591-592. 
Fukuskima, T., Ohyam% Y. and Miy~i, H. (1989b). "Natural Language Parsing Accelera- 
tors (1): An Experimental Machine for Morpheme Extraction" (in Japanese), 
Proc. 3gth National Convention o.f Inlormation Processing Society oJ 
Japan: 600--601. Fukushima, T., Ohyama, Y. and Miy~i, H. 
1990a). "Natural Language Parsing Accelerators I): An Experimental Machine for Morpheme Ex- 
traction" (in Japanese), SIC, Reports of Informa- tion Processing Society of Japan, NL75(9). 
Fukushima, T. (19901)). "A Parallel Recogni- tion Algorithm of Context-free Language by Ar- 
ray Processors"(in Japanese), Proc. 40t1~ National 
Convention oJ Information Processing Society of 
Japan: 462-463. Haas, A. (1987). "Parallel Parsing for Unifi- 
cation Grammar", Proc. l Oth International Joint 
Conference on Artificial Intelligence: 615-618. Hamaguehl, 
S. mad Suzuki, Y. (1988). "Haxdwaxe-matchlng Algorithm for High Speed Linguistic Processing in 
Continuous Speech-recognitlon Systems", $~stems 
and Computers in Japan, 19(_7~. 72-81. Knuth, D. E. (1973). 
Sorting and Search- ing, The Art of Computer Programming, Vol.3. 
Addlson-Wesley. Koskenniemi, K. (1983). "Two-level Model for 
Morphological Analysis", Proe. 8th International Joint Conference on Artificial Intelligence: 683-- 
685. 
Matsumoto, Y. (1986). "A Parallel Parsing Sys- tem for Natural Language Analysis", 
Proc. 3rd International Conference of Logic Programming, 
Lecture Notes in Computer Science: 396-409. Miyazakl, M., Goto, S., Ooyaxna, Y. and ShiraJ, 
S. (1983). "Linguistic Processing in a Japanese- text-to-speech-system", 
International Conference on Text Processing with a Large Character Set: 
315-320. Nak~mura, O., Tanaka, A. and Kikuchi, H. 
(1988). "High-Speed Processing Method for the 
314 
Morpheme Extraction Algorithm" (in Japanese), Proc. 
37th National Convention oJ Information 
Processing Society of Japan: 1002-1003. Ohyama, Y., Fukushim~, T., Shutoh, 2". and 
Shutoh, M. (1986). "A Sentence Analysis Method for a Japanese Book Reading Machine for the 
Blind", Proc. ~4th Annual Meeting of Association 
for Computational Linguistics: 165--172. Russell, G. J., Ritchie, G. D., Pulmaa, S. G. and 
Black, A. W. (1986). "A Dictionary and Morpho- logical 
Analyser for English", Proc. llth Interna- 
tional Conference on Computational Linguistics: 
277-279. Rytter, W. (1987). "Parallel Time O(log n) 
Recognition of Unambiguous Context-free Lan- 
guages", Information and Computation, 75: 75-- 
86. Yamad~, H., Hirata, M., Nag~i, H. and Tal~- 
h~hi, K. (1987). "A High-speed String-search En- gine", 
IEEE Journal of Solid-state Circuits, SC- ~(5): 829-834. 
