<?xml version="1.0" standalone="yes"?>
<Paper uid="C90-2054">
  <Title>A POLYNOMIAL-ORDER ALGORITHM FOR OPTIMAL PHRASE SEQUENCE SELECTION FROM A PHRASE LATTICE AND ITS PARALLEL LAYERED IMPLEMENTATION</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper deals with a problem of selecting an optimal phrase sequence from a phrase lattice, which is often encountered in language processing such as word processing and post-processing for speech recognition.</Paragraph>
    <Paragraph position="1"> The problem is formulated as one of combinatorial optimization, and a polynomial order algorithm is derived. This algorithm finds an optimal phrase sequence and its dependency structure simultaneously, and is therefore particularly suited for an interface between speech recognition and various language processing. What the algorithm does is numerical optimization rather than symbolic operation unlike conventional parsers. A parallel and layered structure to implement the algorithm is also presented, Although the language taken up here is Japanese, the algorithm can be extended to cover a wider :family of languages.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> In Japanese language processing related to speech recognition and word processing, we often encounter a problem of selecting a phrase :sequence which constitutes the most acceptable sentence from a phrase lattice, that is, a set of phrases with various starting and ending positions, By solving this problem, linguistic ambiguities and/or uncertainties coming from the inaccuracy in speech :recognition are expected to be resolved. null This problem can be solved, in principle, by enumerating all the possible combinations of the phrases and measuring the syntactic and semantic acceptability of each phrase sequence as a sentence. Obviously, however, the amount of computation in this enumerative method grows exponentially with respect to the length of the sequence and becomes intractable even for a moderate problem size.</Paragraph>
    <Paragraph position="1"> In this paper we formulate this task as a combinatorial optimization problem and derive a set of recurrence equations, which leads to an algorithm of polynomial order in time and space. We utilize the idea of dependency grammar \[Hays 64\] for defining the acceptability of a phrase sequence as a Japanese sentence.</Paragraph>
    <Paragraph position="2"> With a review of recent theoretical development on this topic, a parallel and layered implementation of the algorithm is presented. null</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2. Dependency Structure of Japanese
</SectionTitle>
    <Paragraph position="0"> In Japanese, words and morphemes are concatenated to form a linguistic unit called 'bnnsetsu', which is referred to as simply 'phrase' here. h typical phrase consists of a content word followed by some functional morphemes, h Japanese sentence is a sequence of phrases with a structure which can be described by a diagram as in Fig. 1 \[Hashimoto 463. For a sequence of phrases XlXZ...x n to be a well-formed Japanese sentence, it must have a structure satisfying the following constraints \[Yoshida 72\]: (el) For any i (l&lt;i&lt;n-1), there exists unique j (i&lt;j&lt;n) such that x i modifies xj in a wide sense.</Paragraph>
    <Paragraph position="1"> (c2) For any i,j,k,1 (l&lt;i&lt;j&lt;k&lt;l&lt;n), it never occurs that x i modifies x k and xj modifies x I.</Paragraph>
    <Paragraph position="2"> A structure satisfying these constraints is called a dependency structure here. Mere formally we define a dependency structure as follows \[Ozeki 86a\], Definition 1  (1) If x 0 is a phrase, then &lt;x0&gt; is a dependency structure, (2) If X 1 ..... X n are dependency structures  and x 0 is a phrase, then &lt;Xl...X n x0&gt; is a dependency structure.</Paragraph>
    <Paragraph position="3"> A dependency structure &lt;XI...X n x0&gt; (Xi=&lt;...xi&gt;) implies that each x i, which is the last phrase in X i, modifies x 0, It is easily verified that a structure satisfying the constraints (el) and (c2) is a dependency structure in the sense of Definition 1 and vice versa \[Ozeki 86a3.</Paragraph>
    <Paragraph position="4"> When a dependency structure X is composed of phrases Xl,X 2 ..... x n we say that X is a dependency structure on XlX2...x n. The set of all the dependency structures on XlX2...x n is denoted as K(XlX2...Xn): and for a sequence of phrase sets A1,A 2 ..... A n , we define KB(A 1 ,A 2 ..... A n) ={X\[XeK(XlX2...Xn), xieh i (l&lt;i&lt;n)}.</Paragraph>
    <Paragraph position="5"> Fig.1 Example of dependency structure in Japanese. A,B .... are phrases.</Paragraph>
    <Paragraph position="6">  3. Acceptability of a Dependency Structure  For a pair of phrases x 1 and x 0' we can think of a penalty imposed on a modifier-modificant relation between x 1 and x 0. This non-negative value is denoted as pen(xl;x0). The smaller value of pen(xl;x 0) represents the more natural linguistic relation. Although it is very important to establish a way of computing pen(xl;x0), we will not go into that problem in this paper. Based on the 'local' penalty, a 'global' penalty P(X) of a dependency structure X is defined recursively as follows \[0zeki 86a\].</Paragraph>
    <Paragraph position="7"> Definition 2  (1) For X=&lt;x&gt;, P(X)=O.</Paragraph>
    <Paragraph position="8"> (2) For X=&lt;Xl...X n xo&gt;, where Xi=&lt;...xi&gt; (I&lt;i&lt;n) is a dependency structure,</Paragraph>
    <Paragraph position="10"> +pen(xl;xo)+.../pen(xn;XO).</Paragraph>
    <Paragraph position="11"> Note that P(X) is the sum of the penalty of all the phrase pairs which are supposed to be in modifier-modificant relation in the dependency structure X. This function is invariant under permutation of X 1 ..... X n in accordance with the characteristic of Japanese. null 4. Formulation of the Problem For simplicity, let us begin with a special type of phrase lattice composed of a sequence of phrase sets BI,B 2 ..... B N as shown in Fig.2, which we call phrase matrix. Suppose we are given a phrase matrix and a reliability function s: BIUB2U...UB N --&gt; R+, where R+ denotes the set of non-negative real numbers. The smaller value of s(x) represents the higher reliability of x. We encounter this special type of phrase lattice in isolated phrase speech recognition. In that case B i is the set of output candidates for the ith utterance, and s(x) is the recognition score for a candidate phrase x. For a dependency structure X on a phrase sequence XlX2...x N, the total reliability of X is defined as S(X)= S(Xl)+...+S(XN).</Paragraph>
    <Paragraph position="12"> Combining the acceptability and the reliability, we define an objective function</Paragraph>
    <Paragraph position="14"> Fig.2 Phrase matrix. B 1 ..... B N are phrase sets.</Paragraph>
    <Paragraph position="15"> Then the central problem here is formulated as the following combinatorial optimization problem \[Matsunaga 86, 0zeki 86a\]. Problem Find a dependency structure XeKB(B1,B 2 ..... B N) which minimizes the objective function F(X). By solving this problem, we can obtain the optimal phrase sequence and the optimal dependency structure on the sequence simultaneously. null</Paragraph>
    <Paragraph position="17"> where C denotes combination. This oecomes a huge number even for a moderate problem size, rendering an enumerative method prac- null tically impossible.</Paragraph>
    <Paragraph position="18"> 5. Recurrence equations and a resulting  algorithm Combining two dependency structures X and Y=&lt;YI ..... Ym,Y&gt;, a new dependency structure &lt;X,Y 1 ..... Ym,y&gt; is obtained which is denoted as X O V. Conversely, any dependency struc null ture Z with length greater than 1 can be decomposed as Z= X@ Y, where X is the top dependency structure in Z. Moreover, it is easily verified from the definition of the objective function that</Paragraph>
    <Paragraph position="20"> where x and y are the last phrases in X and Y, respectively. The following argument is based on this fact.</Paragraph>
    <Paragraph position="21"> We denote elements in B i as Xjl,Xi2 ..... For l&lt;i&lt;j&lt;N and l&lt;p&lt;lBj\[,'where \[Bj\['denotes the number of elements in Bj, we define</Paragraph>
    <Paragraph position="23"> Then the following recurrence equations  hold for opt(i,j;p) and opts(i,j;p), respectively \[Ozeki 86a\].</Paragraph>
    <Paragraph position="24"> Proposition 1 For l&lt;i~jJN and I~p&lt;\[Bj\[ (1) if i=j, then opt(i,j;p)=s(Xjp), (2) and if i&lt;j, then</Paragraph>
    <Paragraph position="26"> (1) if i=j, then opts(i,j;p)=&lt;Xjp&gt;, (2) and if i&lt;j, then</Paragraph>
    <Paragraph position="28"> where *k is the best segmentation point and *q is the best phrase number in Bgk: (*k,*q)=argmin{f(k,q)\[i~k&lt;j-l,l&lt;q~\[Bk\[}.</Paragraph>
    <Paragraph position="29"> According to Proposition 1, if the values of opt(i,k;q) and opt(k/l,j;p) are known for l~k&lt;j-1 and l&lt;q&lt;\[Bk\[, it is possible to calculate the value of opt(i,j:p) by searching the best segmentation point and the best phrase number at the segmentation point.</Paragraph>
    <Paragraph position="30"> This fact enables us to calculate the value  of opt(1,N'p) recursively, starting with opt(i,i;q) (lJi&lt;N,lJqJlBiI). This is the principle of dynamic programming \[Bellman 57\].</Paragraph>
    <Paragraph position="31"> Let *p= argmin{opt(1,N'p) ll&lt;p&lt;lBN\[}, then we have the final solution</Paragraph>
    <Paragraph position="33"> The opts(1,N'*p) can be calculated recursively using Proposition 2. Fig.3 illustrates an algorithm translated from these recurrence equations \[Ozeki 86a\]. This algorithm uses two tables, tablel and table2, of upper triangular matrix form as shown in Fig.4. The (i,j) element of the matrix has \[Bil 'pigeon-holes'. The value of opt(i,j;p) ts &amp;quot; stored in tablel and the pair of the best segmentation point and the best phrase number is stored in tableZ. It should be noted that there is much freedom in the order of scanning i,j and p, which will be utilized when we discuss a parallel implementation of the algorithm.</Paragraph>
    <Paragraph position="35"> to fill tablel is O(M2N3).</Paragraph>
    <Paragraph position="36"> These recurrence equations and algorithm can be easily extended so that they can handle a general phrase lattice. A Phrase lattice is a set of phrase sets, which looks like Fig.5. B(i,j) denotes the set of phrases beginning at character position i and ending at j. A phrase lattice is oh-rained, for example, as the output of a continuous speech recognition system, and also as the result of a morphological analysis of non-segmented Japanese text spelled in kana characters only. We denote the elements of B(~j~ as Xijl,Xij 2 ..... and in parallel wi be definition of opt and opts, we define opt' and opts' as follows.</Paragraph>
    <Paragraph position="37"> For l&lt;i&lt;m&lt;j(N and Xmj p, opt'(i,j,m;p) =the minimum value of \[P(X)iS(X)\] as X runs over all the dependency structures on all the possible phrase sequences beginning at i and ending at j with the last phrase being fixed as Xmj p, and opts'(i,j,m;p) =the dependency structure which gives the above minimum.</Paragraph>
    <Paragraph position="38"> Then recurrence equations similar to Proposition 1 and Proposition 1' hold for</Paragraph>
    <Paragraph position="40"> (2) and if i&lt;m, then</Paragraph>
    <Paragraph position="42"> (2) and if i&lt;m, then</Paragraph>
    <Paragraph position="44"> where *k is the best segmentation point, *n is the top position of the best phrase at the segmentation point and *q is the best phrase number in B(*n,*k): (~k,$n,*q) =argmin{f(k,n,q) li&lt;n&lt;k&lt;m-l,lJqJIB(n,k)\[}. The minimum is searched on 3 variables in this case. It is a straight forward matter to translate these recurrence equations into an algorithm similar to Fig.3 \[Ozeki 88b, Kohda 86\]. In this case, the order of amount of computation is O(M2NS), where M=IB(i,j)I and N is the number of starting and ending positions of phrases in the top layer</Paragraph>
    <Paragraph position="46"> Also, we can modify the algorithm in such a way that up to kth optimal solutions are obtained.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6. Parallel and Layered Implementation
</SectionTitle>
    <Paragraph position="0"> When only one processor is available, the amount of computation dominates the processing time. On the other hand, when there is no limit as to the number of processors, the processing time depends on how much of the computation can be executed in parallel.</Paragraph>
    <Paragraph position="1"> There exists a tidy parallel and layered structure to implement the above algorithm.</Paragraph>
    <Paragraph position="2"> For simplicity, let us confine ourselves to a phrase matrix case here. Furthermore, let us first consider the case where there is only one element x i in each of the phrase set B i. If we define opt''(i,j)=min{P(X)lXeK(x i ..... xj)} then Proposition 1 is reduced to the following simpler form.</Paragraph>
    <Paragraph position="3"> Proposition 3 For lJiJjJN,  (1) if i=j, then opt&amp;quot;(i,j)=O, (2) and if i&lt;j, then</Paragraph>
    <Paragraph position="5"> It is easy to see that opt''(i,j) and opt&amp;quot;(i/m,j/m) (m~O) can be calculated independently of each other. This motivates us to devise a parallel and layered computation structure in which processing elements are arranged in a 2-dimensional array as shown in Fig.6. There are N(N+I)/2 processing elements in total. The node(i,j) has an internal structure as shown in Fig.7, and is connected with node(i,k) and node(k/l,j) (lJk&lt;j-1) as in Fig.8. The bottom elements, node(i,i)'s (l&lt;i&lt;N), hold value 0 and do nothing else. The node(i,j) calculates the value of opt&amp;quot;(i,j) and holds the result in memory i together with the optimal segmentation point in memory 2. Within a layer all the nodes work independently in parallel and the computation proceeds from the lower to upper layer. An upper node receives information about a longer sub-sequence than a lower node: an upper node processes more global information than a lower node. When \[. oinio;zatio.</Paragraph>
    <Paragraph position="6"> ...</Paragraph>
    <Paragraph position="7"> x, '&amp;quot;ut t on  the top element, node(1,N), finishes its iob, each node holds information which is uecessary to compose the optimal dependency '.~tructure on XlX2...x N. This computation ~;tructure, having many simple inter-related computing elements, might be reminiscent of a conneetionist model or a neural network.</Paragraph>
    <Paragraph position="8"> This result can be easily extended, based ,:)n Proposition 1, to the case in which each phrase set has more than one elements. In i:his case processing elements are arranged in a 3-dimensional array as shown in Fig.9.</Paragraph>
    <Paragraph position="9"> The bottom elements, node(i,i;p)'s, hold the value of s(Xip). The node(i,jp) calculates I:he value of opt(i,j;p). The computation i,roceeds from tile lower to upper layer just as in the previous simpler case. Further extension of this str.ucture is also possible :',o that it can handle a general phrase latl;ice. null</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
7. Related Works
</SectionTitle>
    <Paragraph position="0"> The problem of selecting an appropriate ?hrase sequence from a phrase lattice has been treated in the field of Japanese word ?recessing, where a non-segmented Japanese t:ext spe\].led in kana character must be converted into an orthographic style spelled in kana and kanji. Several practical methods have been devised so far. Among them, the approach in \[Oshima 86\] is close in idea to the present one in that it utilizes the Japanese case grammar in order to disambiguate a phrase lattice. However, their method is enumeration-oriented and some kind of heuristic process is necessary to reduce the size of the phrase lattice before syntactic analysis is performed.</Paragraph>
    <Paragraph position="1"> In order to disambiguate the result of speech recognition, an application of dependency analysis was attempted \[Matsunaga 86, Matsunaga 87\]. The algorithm used is a bottom-up, depth-first search, and it is reported that it takes considerable processing time. By introducing a beam search technique, computing time can be very much reduced \[Nakagawa 87\], but with loss of global optimality.</Paragraph>
    <Paragraph position="2"> Perhaps tile most closely related algorithm will be (extended)CYK algorithm with probabilistic rewriting rules \[Levinson 85, Ney 87, Nakagawa 87\]. In spite of the difference in the initial ideas and the formulations, both approaches lead to similar bottom-up, breadth-first algorithms based on the principle of dynamic programming.</Paragraph>
    <Paragraph position="3"> In Fig.2, if each phrase set has only one phrase, and the value of between-phrase penalty is 0 or 1, then the algorithm reduces to the conventional Japanese dependency analyzer \[Hitaka 80\]. Thus, the algorithm presented here is a twofold extension of the conventional Japanese dependency analyzer: the value of between-phrase penalty can take an arbitrary real number and it can analyze not only a phrase sequence but a phrase matrix and a phrase lattice in polynomial time.</Paragraph>
    <Paragraph position="4"> We have considered a special type of dependency structure ill this paper, in which a modificant never precedes the modifier as is normally the case in Japanese. It has been shown that the algorithm can be extended to cover a more general dependency structure \[Katoh 893.</Paragraph>
    <Paragraph position="5"> The fundamental algorithm presented here has been modified and extended, and utilized for speech recognition \[Matsunaga 88\].</Paragraph>
    <Paragraph position="6"> 8. Concluding Remarks In the method presented here, the linguistic data and the algorithm are completely separated. The linguistic data are condensed in the penalty function which measures the naturalness of modifier-modificant relation between two phrases. No heuristics has slipped into the algorithm. This makes the whole procedure very transparent.</Paragraph>
    <Paragraph position="7"> The essential part of the algorithm is execution of numerical optimization rather than symbolic matching unlike conventional parsers. Therefore it can be easily implemented on an arithmetic processor such as DSP (Digital Signal Processor). The parallel 5 315 and layered structure will fit LSI implementation. null An obvious limitation of this method is that it takes account of only pair-wise relation between phrases. Because of this, the class of sentences which have a low penalty in the present criterion tends to be broader than the class of sentences which we normally consider acceptable. Nevertheless, this method is useful in reducing the number of candidates so that a more sophisticated linguistic analysis becomes possible within realistic computing time in a later stage.</Paragraph>
    <Paragraph position="8"> A reasonable way of computing the penalty for a phrase pair is yet to be established.</Paragraph>
    <Paragraph position="9"> There seems to be two approaches to this problem: a deterministic approach taking syntactic and semantic relation between two phrases into consideration, and a statistical one based on the frequency of co-occufence of two phrases.</Paragraph>
  </Section>
class="xml-element"></Paper>