<?xml version="1.0" standalone="yes"?> <Paper uid="A94-1017"> <Title>Real-Time Spoken Language Translation Using Associative Processors</Title> <Section position="4" start_page="0" end_page="105" type="intro"> <SectionTitle> 2 TDMT and its Cost Analysis 2.1 Outline of TDMT </SectionTitle> <Paragraph position="0"> In TDMT, transfer knowledge is the primary knowledge, which is described by an example-based framework (Nagao, 1984). A piece of transfer knowledge describes the correspondence between source language expressions (SEs) and target language expressions (TEs) as follows, to preserve the translational equivalence:</Paragraph> <Paragraph position="2"> Eij indicates the j-th example of TEi. For example, the transfer knowledge for source expression &quot;X no Y&quot; is described as follows~:</Paragraph> <Paragraph position="4"> TDMT utilizes the semantic distance calculation proposed by Sumita and Iida (Sumita and Iida, 1992). Let us suppose that an input, I, and each example, Eij, consist oft words as follows:</Paragraph> <Paragraph position="6"> Then, the distance between I and Eij is calculated as follows:</Paragraph> <Paragraph position="8"> The semantic distance d(Ik, Eijk) between words is reduced to the distance between concepts in a thesaurus (see subsection 3.2 for details). The weight Wk is the degree to which the word influences the selection of the translation 3.</Paragraph> <Paragraph position="9"> The flow of selecting the most plausible TE is as follows: (1) The distance from the input is calculated for all examples.</Paragraph> <Paragraph position="10"> (2) The example with the minimum distance from the input is chosen.</Paragraph> <Paragraph position="11"> (3) The corresponding TE of the chosen example is extracted.</Paragraph> <Paragraph position="12"> Processes (1) and (2) are called ER (Example-Retrieval) hereafter.</Paragraph> <Paragraph position="13"> Now, we can explain the top-level TDMT algorithm: null (a) Apply the transfer knowledge to an input sentence and produce possible source structures in which SEs of the transfer knowledge are combined. null (b) Transfer all SEs of the source structures to the most appropriate TEs by the processes (1)-(3) above, to produce the target structures. (c) Select the most appropriate target structure from among all target structures on the basis of the total semantic distance.</Paragraph> <Paragraph position="14"> For example, the source structure of the following Japanese sentence is represented by a combination of SEs with forms such as (X no Y), (X ni Y), (X de Y), (X ga Y) and so on: dainihan no annaisyo n| { second version, particle, announcement, particle, kaigi de happyou-sareru ronbun conference, particle, be presented, paper, no daimoku ga notte-orimasu particle, title, particle, be written }</Paragraph> <Section position="1" start_page="101" end_page="102" type="sub_section"> <SectionTitle> 2.2 The Analysis of Computational Cost </SectionTitle> <Paragraph position="0"> Here, we briefly investigate the TDMT processing time on sequential machines.</Paragraph> <Paragraph position="1"> For 746 test sentences (average sentence length: about 10 words) comprising representative Japanese sentences 4 in a conference registration task, the average translation time per sentence is about 3.53 seconds in the TDMT prototype on a sequential machine (SPARCstation2). ER is embedded as a subroutine call and is called many times during the translation of one sentence. The average number of ER calls per sentence is about 9.5. 
<Section position="1" start_page="101" end_page="102" type="sub_section"> <SectionTitle> 2.2 The Analysis of Computational Cost </SectionTitle> <Paragraph position="0"> Here, we briefly investigate the TDMT processing time on sequential machines.</Paragraph> <Paragraph position="1"> For 746 test sentences (average sentence length: about 10 words) comprising representative Japanese sentences in a conference registration task (see footnote 4), the average translation time per sentence is about 3.53 seconds in the TDMT prototype on a sequential machine (SPARCstation2). ER is embedded as a subroutine and is called many times during the translation of one sentence; the average number of ER calls per sentence is about 9.5. Figure 1 shows the rates of ER time and other processing time. The longer the total processing time, the higher the rate of ER time: the rate rises from about 43% to about 85%, with an average of 71%. Thus, ER is the dominant part of the total processing time. The ATR dialogue database (Ehara et al., 1990), which contains about 13,000 sentences for a conference registration task, has an average sentence length of about 14 words. We therefore assume in the remainder of this subsection and in subsection 3.5 that the average length of a Japanese spoken sentence is 14 words, and we use statistics for 14-word sentences when estimating the times of a large-vocabulary TDMT system. The expected translation time of a 14-word sentence is about 5.95 seconds, which is much longer than the utterance time. The expected number of ER calls for a 14-word sentence is about 15, and the expected ER time and its rate are about 4.32 seconds and about 73%, respectively.</Paragraph> <Paragraph position="2"> Footnote 4: We have 825 test sentences, as described in footnote 1 in section 1. These sentences cover basic expressions used in Japanese ability tests conducted by the government and in Japanese education courses used by many schools for foreigners (Uratani et al., 1992). The sentences were reviewed by Japanese linguists. In the experiments in this paper, we used 746 sentences, excluding sentences translated by exact match.</Paragraph> <Paragraph position="3"> Here, we consider whether a large-vocabulary TDMT system can attain a real-time response. In the TDMT prototype, the vocabulary size and the number of examples, N, are about 1,500 and 12,500, respectively; N depends on the vocabulary size. The vocabulary size of the average commercially available machine translation system is about 100,000. Thus, in the large-vocabulary system, N is about 830,000 (≈ 12,500 × 100,000 / 1,500), in direct proportion to the vocabulary size. For the sake of convenience, we assume N = 1,000,000.</Paragraph> <Paragraph position="4"> The ER time is nearly proportional to N because of process (1) described in subsection 2.1. Therefore, the expected translation time of a 14-word sentence in the large-vocabulary system on a SPARCstation2 (28.5 MIPS) is about 347.2 seconds (= [ER time] + [other processing time] = [4.32 × 1,000,000 / 12,500] + [5.95 - 4.32] = 345.6 + 1.63). ER consumes 99.5% of the translation time.</Paragraph> <Paragraph position="5"> A 4,000 MIPS sequential machine should be available in 10 years, since MIPS ratings are increasing at a rate of about 35% per year and a 200 MIPS machine (i.e., the DEC Alpha/7000) already exists. The translation time of the large-vocabulary system on the 4,000 MIPS machine is expected to be about 2.474 (≈ 347.2 × 28.5 / 4,000) seconds, of which about 2.462 (≈ 345.6 × 28.5 / 4,000) seconds will be spent on ER. Therefore, although the 1,500-word TDMT prototype will run quickly on the 4,000 MIPS machine, a sequential implementation will not be scalable; in other words, the translation time will still be too long for real-time applications. We have therefore decided to exploit the parallelism of associative processors.</Paragraph>
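The projection above is simple scaling arithmetic; the short sketch below, our own illustration with the paper's figures hard-coded as assumptions, reproduces the 347.2-second and 2.47-second estimates.

```python
# Back-of-the-envelope projection from subsection 2.2 (all figures taken from the text).
N_PROTO, N_LARGE = 12_500, 1_000_000         # examples in the prototype vs. assumed large system
ER_TIME, TOTAL_TIME = 4.32, 5.95             # seconds per 14-word sentence on a SPARCstation2
OTHER_TIME = TOTAL_TIME - ER_TIME            # ~1.63 s of non-ER processing

# ER time scales roughly linearly with N, since process (1) touches every example.
large_er = ER_TIME * N_LARGE / N_PROTO       # ~345.6 s
large_total = large_er + OTHER_TIME          # ~347.2 s

# Rescale by the MIPS ratio for a hypothetical 4,000 MIPS sequential machine.
mips_now, mips_future = 28.5, 4_000
print(large_total * mips_now / mips_future)  # ~2.47 s, still dominated by ER
```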
<Paragraph position="6"> Careful analysis of the computational cost in the sequential TDMT prototype revealed that ER for the top 10 SEs (source language expressions) accounts for nearly 96% of the entire ER time. The expected number of ER calls for the top 10 SEs of a 14-word sentence is about 6. Table 1 shows the rate of ER time for each SE in the transfer knowledge. Function words such as "wa", "no", "o", "ni", and "ga", which appear in these SEs, are used very frequently in Japanese sentences. They are polysemous, so their translation is complicated, and for that reason the number of examples associated with these SEs is very large. In sum, the computational cost of retrieving examples for SEs containing function words grows roughly with the square of the function words' frequency: frequent function words both trigger ER more often and have more stored examples per call. In an English-to-Japanese version of TDMT, the number of examples associated with SEs that include function words such as "by", "to", and "of" is very large as well. For these reasons, we decided to parallelize ER for the top 10 SEs of the Japanese-to-English transfer knowledge.</Paragraph> </Section> <Section position="2" start_page="102" end_page="102" type="sub_section"> <SectionTitle> 3.1 ER on Associative Processors (APs) </SectionTitle> <Paragraph position="0"> As described in the previous subsection, parallelizing ER is not only unavoidable but also promising. Preliminary experiments with ER on the massively parallel associative processor IXM2 (Higuchi et al., 1991a; Higuchi et al., 1991b) have been successful (Sumita et al., 1993).</Paragraph> <Paragraph position="1"> The IXM2 is the first massively parallel associative processor that clearly demonstrates the computing power of a large Associative Memory (AM). The AM features not only storage operations but also logical operations such as retrieval by content; parallel search and parallel write are particularly important. The IXM2 consists of associative processors (APs) and communication processors. Each AP has an AM of 4K words of 40 bits, plus an IMS T801 Transputer (25 MHz).</Paragraph> </Section> <Section position="3" start_page="102" end_page="103" type="sub_section"> <SectionTitle> 3.2 Semantic Distance Calculation on APs </SectionTitle> <Paragraph position="0"> As described in subsection 2.1, the semantic distance between words is reduced to the distance between concepts in a thesaurus. The distance between concepts is determined by their positions in the thesaurus hierarchy and varies from 0 to 1. When the thesaurus has (n + 1) layers, the distance k/n is assigned to the classes in the k-th layer from the bottom (0 ≤ k ≤ n). In Figure 2, n is 3, k ranges from 0 to 3, and the distance d is 0/3 (= 0), 1/3, 2/3, and 3/3 (= 1) from the bottom.</Paragraph> <Paragraph position="1"> The semantic distance is calculated from the thesaurus code, which directly encodes the thesaurus hierarchy as in Table 2, instead of by traversing the hierarchy. Our n is 3 and the width of each layer is 10, so each word is assigned the three-digit decimal code of the concept to which it corresponds.</Paragraph> <Paragraph position="2"> Here, we briefly introduce the semantic distance calculation on an AM (Associative Memory), referring to Figure 3. The input data is 344, the thesaurus code of the word "uchiawase [meeting]". The code of each example, for instance 316 for "teisha [stopping]" and 344 for "kaigi [conference]", is stored in a word of the AM.</Paragraph>
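Before turning to the AM search itself, the code-based distance can be computed by comparing the decimal digits of two codes from the most significant position; the sketch below is our own illustration of that idea under the stated assumptions (n = 3, three-digit codes), not the IXM2 implementation.

```python
# Illustrative thesaurus-code distance for a 4-layer (n = 3) thesaurus with
# three-digit decimal codes, as described above.

def code_distance(code_a: str, code_b: str, n: int = 3) -> float:
    """Return k/n, where k = n minus the length of the codes' common prefix,
    i.e. the most specific common class lies in the k-th layer from the bottom."""
    assert len(code_a) == len(code_b) == n
    prefix = 0
    for a, b in zip(code_a, code_b):
        if a != b:
            break
        prefix += 1
    return (n - prefix) / n

print(code_distance("344", "344"))  # 0.0: e.g. uchiawase vs. kaigi (same concept code)
print(code_distance("344", "316"))  # 2/3: the codes share only their first digit
```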
<Paragraph position="3"> The algorithm for searching for the examples whose distance from the input is 0 is as follows (see footnote 6): (I) Issue a command that searches for the AM words whose three-digit code matches the input. (The search is performed on all words simultaneously, and matched words are marked.) (II) Get the addresses of the matched words one by one and add the distance, 0, to the variable that corresponds to each address. The search in process (I) is done entirely by the AM and is what accelerates ER; process (II) is done by a transputer and is sequential.</Paragraph> <Paragraph position="4"> Footnote 6: An algorithm that searches for examples whose distance from the input is 1/3, 2/3, or 3/3 is similar.</Paragraph> </Section> <Section position="4" start_page="103" end_page="103" type="sub_section"> <SectionTitle> 3.3 Configuration of TDMT Using APs </SectionTitle> <Paragraph position="0"> Following the performance analysis in subsection 2.2, we have implemented ER for the top 10 SEs on APs. Figure 4 shows the TDMT configuration using APs in which ER for the top 10 SEs is implemented. The 10 APs (AP1, AP2, ..., AP10) and the transputer (TP) directly connected to the host machine (SPARCstation2) are connected in a tree configuration (see footnote 7).</Paragraph> <Paragraph position="1"> Footnote 7: The tree is 3-ary because a transputer has four connectors. The TDMT main program is written in Lisp and executed on the host machine. The ER routine is programmed in Occam2; it is called by the main program and runs on the TP and the APs.</Paragraph> <Paragraph position="2"> The algorithm for ER in TDMT using APs is as follows: (i) get the input data and send it from the host to the TP; (ii) distribute the input data to all APs; (iii) each AP carries out ER and obtains the minimum distance and the number of the example that attains it; (iv) each AP and the TP receive the data from the lower APs (if any), merge it with their own result, and send the merged result upward (see the sketch below).</Paragraph> <Paragraph position="3"> With the configuration shown in Figure 4, we studied two methods of storing examples. Homo-loading (HM): the examples associated with one SE are stored in one AP; that is, each AP is loaded with examples of the same SE. Hetero-loading (HT): the examples associated with one SE are divided equally among the 10 APs; that is, each AP is loaded with examples of 10 different SEs.</Paragraph> </Section>
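To make step (iv) concrete, here is a small, purely illustrative simulation of our own (not the Occam2 routine) of how per-AP results are merged upward through a 3-ary tree: each node combines its children's minima with its own and forwards a single (distance, example number) pair.

```python
# Illustrative upward merge of ER results in a tree of APs (cf. steps (i)-(iv)).
# This simulates the data flow only; the real routine runs on transputers.
from dataclasses import dataclass, field

@dataclass
class Node:
    local_result: tuple                      # (min_distance, example_number) found by this AP
    children: list = field(default_factory=list)

def merge_upward(node: Node) -> tuple:
    """Return the best (distance, example_number) pair in this node's subtree."""
    best = node.local_result
    for child in node.children:              # receive from directly connected lower APs
        best = min(best, merge_upward(child))  # keep the smaller distance
    return best                              # send the merged result upward

# Toy tree: a root with three children, one of which has a child of its own.
tree = Node((0.66, 7), [Node((0.33, 12)),
                        Node((1.0, 3), [Node((0.0, 41))]),
                        Node((0.66, 9))])
print(merge_upward(tree))                    # (0.0, 41): the nearest example in the whole tree
```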
<Section position="5" start_page="103" end_page="104" type="sub_section"> <SectionTitle> 3.4 Experimental Results </SectionTitle> <Paragraph position="0"> Figure 5 plots the speedup of ER in TDMT using APs over sequential TDMT for the two storing methods. The speedup with the HT method is greater than with the HM method, partly because the sequential part of ER is proportional to the number of examples in question. With the HT method, the average speedup is about 16.4 (= [average time per sentence in sequential TDMT] / [average time per sentence with the HT method] ≈ 2489.7 / 152.2 msec). For the 14-word sentences, the average speedup is about 20.8 (≈ 4324.7 / 208.0 msec), and the ER time for the top 10 SEs is about 85.4 milliseconds out of the total 208.0 milliseconds.</Paragraph> <Paragraph position="1"> Figure 6 shows a screen comparing TDMT using APs with sequential TDMT.</Paragraph> </Section> <Section position="6" start_page="104" end_page="105" type="sub_section"> <SectionTitle> 3.5 Scalability </SectionTitle> <Paragraph position="0"> In this subsection, we consider the scalability of TDMT using APs with the HT method. We estimate the ER time with the 1,000,000 examples required by a large-vocabulary TDMT system (see subsection 2.2).</Paragraph> <Paragraph position="1"> Assuming that each AP holds the same number of examples as in the experiment (1,250 = 12,500 / 10), 800 (= 1,000,000 / 1,250) APs are needed to store 1,000,000 examples. Figure 7 shows the 800 APs arranged in a tree structure (\sum_{l=1}^{L} 3^l ≥ 800; the minimum L is 6 layers). In the remainder of this subsection, we use the statistics (time, etc.) for the 14-word sentences (see footnote 8).</Paragraph> <Paragraph position="2"> Footnote 8: This is the average sentence length in the ATR dialogue database; see subsection 2.2.</Paragraph> <Paragraph position="3"> The translation time is divided into the ER time on the APs and the processing time on the host machine; the former is further divided into the computing time on each AP and the communication time between APs. The ER time on the APs in the experiment is about 85.4 milliseconds, as described in subsection 3.4. The computing time per sentence on each AP is the same as in the experiment, approximately 84.1 milliseconds of the 85.4 milliseconds. The communication time between APs is the critical factor and increases as the number of APs increases. There are two kinds of communication: distribution of the input data and collection of the resulting ER data.</Paragraph> <Paragraph position="4"> The input-data distribution time is the sum of the distribution times TP to AP1, AP1 to AP2, ..., AP4 to AP5, and AP5 to AP6, that is, 6 times the distribution time between two directly connected APs (see Figure 7), because a transputer can send data to the other directly connected transputers in parallel (e.g., AP5 to AP6, AP5 to AP7, AP5 to AP8). The average number of ER calls is about 6, and the average distribution time between directly connected APs is about 0.05 milliseconds. Therefore, the total input-data distribution time per sentence in the configuration of Figure 7 is nearly 1.8 (= 0.05 × 6 × 6) milliseconds.</Paragraph> <Paragraph position="5"> The time required to collect the resulting data is the sum of the processing times of process (iv), explained in subsection 3.3, at the TP, AP1, ..., AP4, and AP5 in Figure 7. It takes about 0.04 milliseconds, on average, for each AP to receive the resulting data from the lower APs, and about 0.02 milliseconds, on average, to merge the minimum distances and example numbers. Therefore, the total collection time is expected to be about 2.2 (= (0.04 + 0.02) × 6 × 6) milliseconds.</Paragraph> <Paragraph position="6"> Thus, the total communication time is about 4.0 (= 1.8 + 2.2) milliseconds, and the total processing time on the APs is about 88.1 (= 84.1 + 4.0) milliseconds. This is about 3,920 (≈ 345.6 / 0.0881) times faster than the SPARCstation2. It is clear, then, that communication has little impact on scalability, because it is governed by the tree depth and a small coefficient.</Paragraph>
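The whole estimate can be reproduced with a few lines of arithmetic; the sketch below, our own illustration with the per-hop costs quoted above hard-coded as assumptions, recovers the tree depth and the 1.8, 2.2, and 88.1 millisecond figures.

```python
# Reproduces the subsection 3.5 estimate (figures taken from the text, not measured here).
EXAMPLES_NEEDED, EXAMPLES_PER_AP = 1_000_000, 12_500 // 10   # 1,250 examples per AP
aps_needed = EXAMPLES_NEEDED // EXAMPLES_PER_AP              # 800 APs

# Depth of the smallest 3-ary tree below the TP that holds that many APs.
depth, capacity = 0, 0
while capacity < aps_needed:
    depth += 1
    capacity += 3 ** depth          # sum_{l=1..L} 3^l >= 800  ->  L = 6

ER_CALLS = 6                        # ER calls per 14-word sentence for the top 10 SEs
DIST_HOP = 0.05                     # ms to pass input data down one level
COLLECT_HOP = 0.04 + 0.02           # ms to receive and merge results at one level
distribute = DIST_HOP * depth * ER_CALLS       # ~1.8 ms
collect = COLLECT_HOP * depth * ER_CALLS       # ~2.2 ms
total_ap_time = 84.1 + distribute + collect    # ~88.1 ms per sentence
print(depth, round(distribute, 2), round(collect, 2), round(total_ap_time, 1))
```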
<Paragraph position="7"> Therefore, TDMT using APs scales well as the number of examples increases and can attain a real-time response.</Paragraph> </Section> </Section> </Paper>