<?xml version="1.0" standalone="yes"?>
<Paper uid="P91-1038">
  <Title>A Preference-first Language Processor Integrating the Unification Grammar and Markov Language Model for Speech Recognition Applications</Title>
  <Section position="3" start_page="293" end_page="294" type="metho">
    <SectionTitle>
2. The Proposed
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="293" end_page="294" type="sub_section">
      <SectionTitle>
Language Processor
</SectionTitle>
      <Paragraph position="0"> The language processor proposed in this paper is shown in Fig. 1, where an acoustic signal preprocessor is included to form a complete speech recognition system. The language processor consists of a language model and a parser. The language model properly integrates the unification grammar and the Markov language model, while the parser is defined based on the augmented chart and the preference-first parsing algorithm. The input speech signal is first processed by the acoustic signal preprocessor; the corresponding word lattice will thus be generated and constructed onto the augmented chart. The parser will then proceed to build possible constituents from the word lattice on the augmented chart in accordance with the language model and the preference-first parsing algorithm. Below, except the preference-first parsing algorithm presented in detail in the next section, all of other elements are briefly summarized.</Paragraph>
      <Paragraph position="1"> The Laneua~e Model The goal of the language model is to participate in the selection of candidate constituents for a sentence to be identified. The proposed language model is composed of a PATR-II-like unification grammar (Sheiber, 1986; Chien, 1990a) and a first-order Markov language model (Jelinek, 1976) and thus, combines many features of the grammatical and statistical language modeling approaches. The PATR-II-Iike unification grammar is used primarily to distinguish between well-formed, acceptable word sequences against ill-formed ones, and then to represent the structural phrases and categories, or to fred the intended meaning depending on different applications. The first-order Markov kmguage model, on the other hand, is used to guide the parser toward correct search directions, such that many noisy word hypotheses can be rejected and many unnecessary constituents can be avoided, and the most promising sentence hypothesis can thus be easily found. In this way the weakness in either the PATR-II-like unification grammar (Sheiber, 1986), e.g., the heavy reliance on rigid linguistic information, or the first-order Markov language model (Jelinek, 1976), e.g., the need for a large training corpus and the local prediction scope can also be effectively remedied.</Paragraph>
      <Paragraph position="2"> The Augmented Chart and the Word l~atticC/ Parsing Scheme Chart is an efficient and widely used working structure in many natural language processing systems (Kay, 1980; Thompson, 1984), but it is basically designed to parse a sequence of fixed and known words instead of an ambiguous word lattice.</Paragraph>
      <Paragraph position="3"> The concept of the augmented chart has recently been successfully developed such that it can be used to represent and parse a word lattice (Chien, 1990b).</Paragraph>
      <Paragraph position="4"> Any given input word lattice for parsing can be represented by the augmented chart through a mapping procedure, in which a minimum number of vertices are used to indicate the end points for all word hypotheses in the lattice, and an inactive edge is used to represent every word hypotheses. Also, specially designed jump edges are constructed to link some edges whose corresponding word hypotheses can possibly be connected but themselves are physically separated in the chart. In this way the basic operation of a chart parser can thus be properly performed on a word lattice. The difference is that two separated edges linked by a jump edge can also be combined as long as the required condition is satisfied. Note that in such a scheme, every constituents (edge) will be constructed only once, regardless of the fact that it may be shared by many different sentence hypotheses.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="294" end_page="295" type="metho">
    <SectionTitle>
3. The Preference-first Parsing Algorithm
</SectionTitle>
    <Paragraph position="0"> The preference-first parsing algorithm is developed based on the augmented chart summarized above, so that the difficult word lattice parsing problem is reduced to essentially a well-known chart parsing problem. This parsing algorithm is a general algorithm, in which various preference-first parsing strategies defined by different construction principles and decision rules can be combined with the island-driven parsing concept, so that the constituent selection and search directions can be appropriately determined by Markovian probabilities, thus rejecting many noisy word hypotheses and significantly reducing the search space. In this way, not only can the features of the grammatical and statistical approaches be combined, but the effects of the two different approaches are reflected and integrated in a single algorithm such that overall performance can be appropriately optimized. Below, more details about the algorithm will be given.</Paragraph>
    <Paragraph position="1"> Example Construction principles: random mincit)le: at 1my ~ nmd~ly select It c~adidatc conslJt ucnt to be constttlct~ probability selection l~rinciole: at any dmc the candi~lltC/ consdtucnt with file highest probability will bC/ constnlcte.d ftrst length ~,cleclion ~Hnc~ole: at any time the candidate constituent with the largest numt component word hypoth~es will be constructed ftrst len~,th~robabilltv xe/ection Drlnci~le: at any tlmC/ the c~mdldatC/ constituent with the highest probability among those with the largest number of component &amp;quot;~td hypotheses wltt b~ C/otts~ctcd tint Example Decision rules: hi~hcst nrc, bab~titv rule; ~fft~r lilt grammatical scntoncc constituents have been {ound, one with the higher probability L~ taken as tlc re~uh ~rst- 1 rulG: the rtrst grlunmatlcal ~:ntcncC/ constilucnt obtained during the con~ of parsing is ulkcn as the Rsuh first-k rule: the sontcncC/ constltmmt with ~hc highest probability among the first k c.o~s~ctC/d C/rammadcal scnunac~ constituents obkaincd during thc course ol'parsi;~ is taken as the result The performance of these various construction principles and decision rules will be discussed in Sections 5 and 6 based on experimental results.</Paragraph>
    <Section position="1" start_page="294" end_page="294" type="sub_section">
      <SectionTitle>
Probability Estimation
for Constructed Constituents
</SectionTitle>
      <Paragraph position="0"> In order to make the unification-based parsing algorithm also capable of handling the Markov language model, every constructed constituent has to be assigned a probability. In general, for each given constituent C a probability P(C) = P(W c) is assigned, where W c is the component word hypothesis sequence of C and P('W c) can be evaluated from the Markov language model. Now, when an active constituent A and an inactive constituent I form a new constituent N, the probability P(N) can be evaluated from probabilities P(A) and P(I). Let W n, W a, W i be the component word hypothesis sequences of N, A, and I respectively. Without loss of generality, assume A is to the left of I, thereby Wn = WaWi = Wal ..... Wam,Wil ..... Win, where wak is the k-th word hypothesis of Wa and Wik the k-th word hypothesis of Wi. Then,</Paragraph>
      <Paragraph position="2"> This can be easily evaluated in each parsing step.</Paragraph>
    </Section>
    <Section position="2" start_page="294" end_page="295" type="sub_section">
      <SectionTitle>
The Preference-first Construction
Principles and Decision Rules
</SectionTitle>
      <Paragraph position="0"> Since P(C) is assigned to every constituent C in the augmented chart, various parsing strategies can be developed for the preference-first parsing algorithm for different applications. For example, there can be various construction principles to determine the order of constituent construction for all possible candidate constituents. There can also be various decision rules to choose the output sentence among all of the constructed sentence constituents. Some examples for such construction principles and decision rules are listed in the following.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="295" end_page="295" type="metho">
    <SectionTitle>
4. The Experimental System
</SectionTitle>
    <Paragraph position="0"> An experimental system based on the proposed language processor has been developed and tested on a small lexicon, a Markov language model, and a simple set of unification grammar rules for the Chinese language, although the present model is in fact language independent. The system is written in C language and performed on an IBM PC/AT.</Paragraph>
    <Paragraph position="1"> The lexicon used has a total of 1550 words. They are extracted from the primary school Chinese text books currently used in Taiwan area, which arc believed to cover the most frequently used words and most of the syntactic and semantic structures in th~ everyday Chinese sentences. Each word stored in lexicon (word entry) contains such information as the. word name, the pronunciations (the phonemes), the lexical categories and the corresponding feature structures. Information contained in each word entry is relatively simple except for the verb words, because verbs have complicated behavior and will play a central role in syntactic analysis, The unification grammar constructed includes about 60 rules. It is believed that these rules cover almost all of the sentences used in the primary school Chinese text books. The Markov language model is trained using the primary school Chinese text books as training corpus. Since there are no boundary markers between adjacent words in written Chinese sentences, each sentence in the corpus was first segmented into a corresponding word string before used in the model training. Moreover, the test data include 200 sentences randomly selected from 20 articles taken from several different magazines, newspapers and books published in Taiwan area. All the words used in the test sentences are included in the lexicon.</Paragraph>
  </Section>
  <Section position="6" start_page="295" end_page="295" type="metho">
    <SectionTitle>
5. Test Results (I) -- Initial
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="295" end_page="295" type="sub_section">
      <SectionTitle>
Preference-first Parsing
Strategies
</SectionTitle>
      <Paragraph position="0"> The present preference-first language processor is a general model on which different parsing strategies defined by different construction principles and decision rules can be implemented. In this and the next sections, several attractive parsing strategies are proposed, tested and discussed under the test conditions presented above. Two initial tests, test I and II, were first performed to be used as the baseline for comparison in the following. In test I, the conventional unification-based grammatical analysis alone is used, in which all the sentence hypotheses obtained from the word lattice were parsed exhaustively and a grammatical sentence constituent was selected randomly as the result; while in test II the first-order Markov modeling approach alone is used, and a sentence hypothesis with the highest probability was selected as the result regardless of the grammatical structure. The correct rate of recognition is defined as the averaged percentage of the correct words in the output sentences. The correct rate of recognition and the approximated average time required are found to be 73.8% and 25 see for Test I, as well as 82.2% and</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="295" end_page="296" type="metho">
    <Paragraph position="0"> 3 sec for Test II, as indicated in the first two rows of Table 1. In the following, the unification grammar and the Markov language model will be integrated in the language model to obtain better results.</Paragraph>
    <Paragraph position="1"> The parsing strategy 1 uses the random selection principle and the highest probability rule ( as listed in Section 3), and the entire word lattice will be parsed exhaustively. The total number of constituents constructed during the course of parsing for each test sentence are also recorded. The results show that the correct rate of recognition can be as high as 98.3%. This indicates that the language processor based on the integration of the unification grammar and the Markov language model can in fact be very reliable. That is, most of the interferences due to the noisy word hypotheses are actually rejected by such an integration. However, the computation load required for such an exhaustive parsing strategy turns out to be very high (similar to that in Test 13, i.e., for each test sentence in average 305.9 constituents have to be constructed and it takes about 25 sec to process a sentence on the IBM PC/AT. Such computation requirements will make this strategy practically difficult for many applications. All these test data together with the * results for the other three parsing strategies 2-4 are listed in Table 1 for comparison.</Paragraph>
    <Paragraph position="2"> The basic concept of parsing strategy 2 (using the probability selection principle and the first-1 rule, as listed in Section 3 ) is to use the probabilities of the constituents to select the search direction such that significant reduction in computation requirements can be achieved. The test results (in the fourth row of Table 1) show that with this strategy for each test sentence in average only 152.4 constituents are constructed and it takes only about 12 see to process a sentence on the PC~AT, and the high correct rate of recognition of parsing strategy 1 is almost preserved, i.e., 96.0%. Therefore this strategy represents a very good made, off, i.e., the computation requirements are reduced by a factor of 0.50 ( the constituent reduction ratio in the last second column of Table 1 is the ration of the average number of built constituents to that of Strategy 1), while the correct rate is only degraded by 2.3%.</Paragraph>
    <Paragraph position="3"> However, such a speed (12 sac for a sentence) is still  very low especially if real-time operation is considered.</Paragraph>
    <Paragraph position="4"> 6. Test Results (1I) --</Paragraph>
    <Section position="1" start_page="295" end_page="296" type="sub_section">
      <SectionTitle>
Improved Best-first Parsing
Strategies
</SectionTitle>
      <Paragraph position="0"> In a further analysis all of the constituents constructed by parsing strategy 1 were first divided into two classes: correct constituents and noisy constituents. A correct constituent is a constituent without any component noisy word hypothesis; while a noisy constituent is a constituent which is not correct. These two classes of constituents were then categorized according to their length (number of word hypotheses in the constituents). The average probability values for each category of correct and noisy constituents were then evaluated. The results are plotted in Fig. 2, where the vertical axis shows the average probability values and the horizontal axis denotes the length of the constituent. Some observations can be made as in the following. First, it can be seen that the two curves in Fig. 2 apparently diverge, especially for longer constituents, which implies that the Markovian probabilities can effectively discriminate the noisy constituents against the correct constituents (note that all of thoze constituents are grammatical), especially for longer constituents. This is exactly why parsing strateg~ :I and 2 can provide very high correct rat~,~.</Paragraph>
      <Paragraph position="1"> Furthermore, Fig. 2 also shows that in gene~l the probabilities for shorter constituents wo~(i usually be much higher than those for longer constituents. This means with parsing strategy 2 almost all short constituents; no matter noisy or  correct, would be constructed first, and only those long noisy constituents with lower probability values can be rejected by the parsing strategy 2. This thus leads to the parsing strategies 3 and 4 discussed below.</Paragraph>
      <Paragraph position="2"> In parsing strategy 3 (using the length/probability selection principle and First-1 rule, as listed in Section 3), the length of a constituent is considered first, because it is found that the correct constituents have much better chance to be obtained very quickly by means of the Markovian probabilities for longer constituents than shorter correct constituents, as discussed in the above. In this way, the construction of the desired constituents would be much more faster and very significant reduction in computation requirements can be achieved. The test results in the fifth row of Table 1 show that with this strategy in average only 70.2 constituents were constructed for a sentence, a constituent reduction ratio of 0.27 is found, and it takes only about 4 sec to process a sentence on PC/AT, which is now very close to real-time. However, the correct rate of recognition is seriously degraded to as low as 85.8%, apparently because some correct constituents have been missed due to the high speed construction principle. Fortunately, after a series of experiments, it was found that in this case the correct sentences very often appeared as the second or the third constructed sentences, if not the first. Therefore, the parsing strategy 4 is proposed below, in which everything is the same as parsing strategy 3 except that the first-1 decision rule is replaced by the first-3 decision rule. In other words, those missed correct constituents can very possibly be picked up in the next few steps, if the final decision can be slightly delayed.</Paragraph>
      <Paragraph position="3"> The test results for parsing strategy 4 listed in the sixth row of Table 1 show that with this strategy the correct rate of recognition has been improved to 93.8% and the computation complexity is still close to that of parsing strategy 3, i.e., the average number of constructed constituents for a sentence is 91.0, it takes about 5 sec to process a sentence, and a constituent reduction ratio of 0.29 is achieved. This is apparently a very attractive approach considering both the accuracy and the computation complexity. In fact, with the parsing strategy 4, only those noisy word hypotheses which both have relatively high probabilities and can be unified with their neighboring word hypotheses can cause interferences.</Paragraph>
      <Paragraph position="4"> This is why the noisy word hypothesis interferences can be reduced, and the present approach is therefore not sensitive at all to the increased number of noisy word hypotheses in a very large vocabulary environment. Note that although intuitively the integration of grammatical and statistical approaches would imply more computation requirements, but here in fact the preference-first algorithm provides correct directions of search such that many noisy constituents are simply rejected and the reduction of the computation complexity makes such ah integration also very attractive in terms of computation requirements.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>