<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-3165">
  <Title>Interactive Speech Understanding</Title>
  <Section position="3" start_page="0" end_page="12" type="metho">
    <SectionTitle>
2 Enhanced GLR Parsing for Speech Understanding
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Speech Understanding
</SectionTitle>
      <Paragraph position="0"> Ill this section, tile GI,R patrsing method is described first. Then some techniques which enhatnce the robustness are described.</Paragraph>
      <Paragraph position="1"> AcrEs DE COLING-92. NANTES, 23.28 ^ot~'r 1992 1 0 S 3 Paoc. OF COLING-92, NAtcrras. Aoo. 23-28, 1992</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Background: GLR Parsing
</SectionTitle>
      <Paragraph position="0"> The LR parsing technique was originally developed for the compilers of programming languages \[1\] and has been extended for natural language processing \[11\]. The GLI\[ parsing analyzes the input sequence from left to right with no backtracking by looking at the parsing table constructed from the context-flee grammar rules in advance.</Paragraph>
      <Paragraph position="1"> An example grammar and its parsing table are shown in Figure l and Figure 2 respectively.</Paragraph>
      <Paragraph position="2"> Entries &amp;quot;s n&amp;quot; in the action table (the left part of the table) indicate the action &amp;quot;shift one word from the input 1)uffcr onto the stack and go to state n&amp;quot;. Entries &amp;quot;r n&amp;quot; indicate tile action &amp;quot;reduce constituents on the stack usiug rule n&amp;quot;. The entry &amp;quot;ace&amp;quot; stands for the action &amp;quot;accept&amp;quot;, and t)lank spaces represent &amp;quot;error&amp;quot;. &amp;quot;$&amp;quot; in the action table is the end-of-inl)ut symbol. The goto table (the right part of the table) decides to which state the parser shouhl go after a reduce action. The LR parsing table in Figure 2 is different fi'om regular LR tables utilized by the compilers in that there are multiple entries, called conflicts, on the rows of state 11 and 12. While the encountered entry has only one aztion, parsing proceeds exactly the same way as the regular LR parsing.</Paragraph>
      <Paragraph position="3"> In case there are multiple actions in an entry, all the actions are executed with the graph-structured stack \[11\].</Paragraph>
      <Paragraph position="4">  (1) S --&gt; NP VP (2) S --&gt; S PP (3) NP --&gt; n (4) NP --&gt; det n (5) NP --&gt; NP PP (6) PP --&gt; prep NP (7) VP --&gt; v NP</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="12" type="sub_section">
      <SectionTitle>
2.2 GLR Parsing for Erroneous Sentences
</SectionTitle>
      <Paragraph position="0"> The original GLR parsing method was not designed to handle ungrammatical sentences. This feature is acceptable if the domain is strictly defined and input sentences are correct at all times.</Paragraph>
      <Paragraph position="1"> Unfortunately, accuracy of speech recognition is not 100%. Common errors in speech recognition are insertions, deletions (missing words), and sub-</Paragraph>
      <Paragraph position="3"> stitntions. Some techniques have been developed to handle erroneous sentences for the GLR parsing \[12, 10\].</Paragraph>
      <Paragraph position="4"> * The action table can be looked up in a predictive way to handle a missing word.</Paragraph>
      <Paragraph position="5"> Namely, a set of possible terminal symbols {Ti} at State i can be missing word candidates. null * This way of using the action table is also useful to handle substitution and insertion errors. I.e., the table can tell which part of the input should be replaced by a specific symbol or ignored.</Paragraph>
      <Paragraph position="6"> '\['he parser explores every possibility in paralleP.</Paragraph>
    </Section>
    <Section position="4" start_page="12" end_page="12" type="sub_section">
      <SectionTitle>
2.3 Gap-filling Technique
</SectionTitle>
      <Paragraph position="0"> The techniques described in tile previous section can not handle such a big noise as two consecutive missing words. To cope with this, the gap-filling technique \[9\] is presented here.</Paragraph>
      <Paragraph position="1"> In tile gap-filling GLR parsing, the goto table is consulted just the same way as the action table, in addition to its regular usage. Namely, at state si which is expecting shift action(s), the parser also consults the gore table. If an entry m exists along the row of state sl under the column lie practice, pruning is incorporated to reduce search by using the likelihood attached to each word in the speech hy potheses, ACIT~ DE COLING-92. NANTES. 23-28 ^OI~T 1992 1 0 5 4 PROC. OF COL1NG-92. NAN'r~s, AU6.23-28, 1992 labeled with nontel'nlinM 1), the parser shifts D onto the stack an(l goes to state m. Note that no .:zINP\] -~ input is scanned when this action is performed. ..~):\]~-+~ When the input is in&lt;:omplete, the parser pro 0% \&amp;quot;,. wo cut Bad duces hyl)otheses with a fake nonterminal at tile +&amp;quot;+ ~n .+~v&amp;quot;. adj noisy position. ::, ....&amp;quot; +&amp;quot;....</Paragraph>
      <Paragraph position="2"> We show an example of l)arsing an incorrectly NO 2 Wp \] -B INP\] ~2 recognized sentence &amp;quot;we cut sad with a kuife&amp;quot; using the grammar in I:igure 12 and the LIC/ table in Figure 2. :~ At the initial state 0, the got() ta-Ill( +, tells that the nonterminals N P and S can I&gt;e shifte(1 using the gap-filling technique. Although the first wor&lt;t &amp;quot;we&amp;quot; (noun) is expected at state 0, these fake+ nonterminals are ere+areal (\]&amp;quot;igure 3) in ca+se &amp;quot;we&amp;quot; is an incorrectly recognized word.</Paragraph>
      <Paragraph position="3"> Tile new states for the fake tlonterminals NP and S are 2 and 1, resi&gt;ectively, q'he goto table tells that fake nonterminals PP and VP can be place(\[ at state 2. In this case, however, we do not create these nonterlltinals~ l)eeause two fake l/oIltel'lltinals r+u'ely need to I&gt;e I)\[a(:e&lt;\] adjacently in prac |ice. No further fake nonterminal is att, a.ched to ii i tile fake nonterminal S for the same reason. ~! v+,</Paragraph>
      <Paragraph position="5"> '&amp;quot;,, we OOl had with a knilo n v ad i prop ell n  Iu parsing the third word &amp;quot;sad&amp;quot;, a fake nonterminal \[NP\] to word &amp;quot;cut&amp;quot; keeps the correct path (Figure ,1).</Paragraph>
      <Paragraph position="6"> l'arsing continues in this way and the linal situation is shown in Figure 5. As a result, the parser tinds two snccessfifl parses: (n (v (\[NP\] (prep (det n))))) ((n (v \[NP\])) (prep (de |n))) Namely, the \])arser Jinds &lt;rot that the third word is incorrect and must be the word(s) in NP category.</Paragraph>
      <Paragraph position="7"> 2'J'he terminal symbols of this grammar are grammatical category names called prcterminals. A lexicon should be prepared to map all actual word to its pretcrmina\]. :~'l'hc techniques in the previous section arc enough for parsing this erroneous se/dencc. We use this eXaml&gt;le only for describing Ihe gal~ |iliing techJ,illue.</Paragraph>
      <Paragraph position="9"/>
    </Section>
  </Section>
  <Section position="4" start_page="12" end_page="12" type="metho">
    <SectionTitle>
3 Interactive Speech Understanding
</SectionTitle>
    <Paragraph position="0"> standing In this section, the rot)tLstness &lt;)f tlw (;LR parser with various error-recovery techniques (esl)ecia.lly the gap-filling te(:htdque) aga.inst a noisy input is described. Then an interactive way to resolve the unidentified portion is I(reseld.ed.</Paragraph>
    <Section position="1" start_page="12" end_page="12" type="sub_section">
      <SectionTitle>
3.1 Resolving Unidentified Portion
</SectionTitle>
      <Paragraph position="0"> The gap-filling teehniqtm enhances the robustness of the (HAl parsing in handling a noisy int)ut as folk&gt;ws: * A fake nonterminals fills big missing constituents of the input which would yiehl no hylmtheses without the gap-.tilling func+tion. * The gap+filling fiHtction enables an LR parser to perform reduce actions only when the action creates a definite high-score nontermihal. The fake nonterminal is likely to I)e citllel +111 illSel'ti&lt;')ii of all tlllkllo~,vii word, ACI'ES DE COLING-92, NANTES, 23-28 AO(ff 1992 1 0 5 5 PROC. Ol: COLING-92, NANTES. AUG. 2,3+28, 1992 A gap filled with a fake nonterminal can be resolved by reanalysis of the input under the constraint that that portion of the input should yield the specific nonterminal. This top-down reanalysis would be effective against the genuinely bottom-up GLR parsing. In practice, however, a more reliable way is to ask the user to speak only the missed portion. In the previous example, only the portion of \[NP\] shouhl lie st)oken again.</Paragraph>
      <Paragraph position="1"> The parser can analyze the re-utterance efficiently ,as follows:</Paragraph>
    </Section>
    <Section position="2" start_page="12" end_page="12" type="sub_section">
      <SectionTitle>
3.2 Handling Unknown Words
</SectionTitle>
      <Paragraph position="0"> If the reutterance cau not be parsed correctly even by the reutteraime, the unidentified portion is likely to contain an unknown word. Finding an unknown word by a specific nonterminal symbol enables the interactive grammar augmentation as the following, for instance.</Paragraph>
      <Paragraph position="1"> The parser can not identily the ~ollowing portion of your input.</Paragraph>
      <Paragraph position="2"> We cut \[NP\] with a knife If this is a new word in the category of \[NP\] a rule NP --&gt; (recog. result of the 2nd utterance) will be added to the grammar. Is this ok? Handling unknown words is important in natural language processing. For example, Kainioka et al. \[5\] proposed a mechanisnl which parses a sentencc with unknown words nsing Delinite C, lause Gralumars. The efficient gap-filling technique of handling unknown words is quite useful in practical systems and enhances the robustness of the GLR parsing greatly.</Paragraph>
      <Paragraph position="3"> When an unknown word W,,~, is detected, the word should be incorporated into the system. If the grammar is separated from the lexicon, the word can be easily added to the dictionary. If the grammar contains the lexicon, the LR table should be augmented incrementally in the following way.</Paragraph>
      <Paragraph position="4"> 1. For each state si which has an entry under the column of the nonterminal D($~k,) in the goto table, add shift action &amp;quot;s m&amp;quot; (m is the new state number) for W .... (If Wnew consists of such multiple words as &amp;quot;get rid of', a new state should be created for each element of the words. ) 2. Add reduce action &amp;quot;r p&amp;quot; (p is the new rule number) for all the terminals on the row of state nl.</Paragraph>
      <Paragraph position="5"> Before we close this section, wc should consider side etfects of the gap-tilling technique. It is true that putting fake nonterminals expands search. Thus, some side effect might appear if the accuracy of input is not good. Namely, input should be good enough to produce distinct fake nonterminals and real nonterminals. Although it is difficult to analyze this phenomenon theoretically, the following natural heuristics can minimize the search growth.</Paragraph>
      <Paragraph position="6"> o Two consecutive fake uontermiuals are not allowed as shown in the previous section.</Paragraph>
      <Paragraph position="7"> * When a word (Wi) can be shifted to both a fake nonternfinal D.fake and a same-name real nonterminal D~e,z, only D~,t should be valid.</Paragraph>
      <Paragraph position="8"> * When D:,~ and D,.~l (:an be bundled using the local ambiguity packing \[111 tecbnique, discard l) f (,k,,.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="12" end_page="12" type="metho">
    <SectionTitle>
4 Experiments: Parsing Spoken Sentences
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="12" end_page="12" type="sub_section">
      <SectionTitle>
Parsing Spoken Sentences
</SectionTitle>
      <Paragraph position="0"> We evaluated effectiveness of tlle enhanced GLR parsing by spoken input. We used a device which recognizes a .lapanese utterance and produces its phoneme sequence \[4\]. The parser we used is 1)ased on the (-HA/ parser exploring the possibilities of substituted/inserted/deleted phonemes \[10\] by looking up the eonfilsion mntrix, which was constructed from the large vocabulary data.</Paragraph>
      <Paragraph position="1"> The confusion matrix is also used to mssign the score to each explored phoneme, because the recogldtion device gives neither the alternative phoneme candidates nor the likelihood of hypothesized phonemes. The gap-filling fimction is incorporated iuto the parser in the following experiments. Parsing a l&gt;honeme seqnence might sound less pot&gt;ular than I)arsing a word lattice in speech AcrEs DE COUNG-92, NANTES, 23-28 Ao(:r 1992 1 0 S 6 PROC. OF COLING-92, NANTES, AU6.23-28, 1992 recognition. Because the parser builds a lattice dynamically in parsing the sequence from left to right using a CFG which contains the dictionary, no static lattice is necessary.</Paragraph>
      <Paragraph position="2"> 125 sentences (five speakers pronounced 25 sentences) were tested in tim domain called &amp;quot;conversation between doctors and patients.&amp;quot; 111 sentences were parsed correctly \[88.8 %\] (the correct sentence was obtained as the top-scored hypothesis). 14 failed sentences can be classified into three groups: (i) 4 sentences were parsed as the top-scored hypotlmses with fake nonterminals. Thus the parser asked the user to speak the unidentitied portion again.</Paragraph>
      <Paragraph position="3"> (ii) 6 sentences were parsed incorrectly in that the correct sentence did not get the highest score mainly because the incorrect nonterminal had a slightly higher score than the correct one. In this case, both the closely-scored correct and incof rect nontermin~s are packed into one nouterlllinal using the local ambiguity packing technique in an efficient implementation. In this situation the parser should ask the user to speak only that unclear portion in the same way as in (i) instead of producing a barely top-scored hypothesis. In the current implementation the parser asks the user which word is the correct one.</Paragraph>
      <Paragraph position="4"> (iii) 4 sentences were pronounced very I)adly.</Paragraph>
      <Paragraph position="5"> The user has to speak the whole sentence again.</Paragraph>
      <Paragraph position="6"> 5 sentences with unknown words were also tested, in all eases, the unknown word was detected. null This result shows that interactive partial re-utterance is very effective both for error-recovery and for detection of unknown words.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>