<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2159">
  <Title>PAUSE AS A PHRASE DEMARCATOR FOR SPEECH AND LANGUAGE PROCESSING</Title>
  <Section position="5" start_page="988" end_page="990" type="evalu">
    <SectionTitle>
ANALYSIS
</SectionTitle>
    <Paragraph position="0"> To examine the feasibility of integrating into syntactic rules both pausal phenomena and the features of spontaneous speech studied in Section 2, we prepared three different sets of rules. In all three sets, rules have been explicitly modified to represent pausal phenomena. The first set, Pause, contains only such modifications, while the other two sets add one additional spontaneous feature each: rule set Emphasis permits use of the emphasis marker desune after a noun phrase, while rule set Turn allows postpositional utterances at the end of a turn.</Paragraph>
    <Paragraph position="1"> We conducted preliminary speech recognition experiments with a parser which uses linguistic constraints written as a CFG.</Paragraph>
    <Paragraph position="2"> 3.1 Linguistic Constraints. To represent our underlying linguistic constraints we adapted existing syntactic rules developed for speech recognition [6]. Earlier experiments using bunsetsu-based speech input showed 70% sentence recognition accuracy for the top candidate and 8,1% for the top 5 candidates.</Paragraph>
    <Paragraph position="3"> The format for all of our syntactic rules is as follows: (&lt;CAT1&gt; &lt;--&gt; (&lt;CAT2&gt; &lt;CAT3&gt;)) Nonterminals are surrounded by &lt;&gt;. The above rule indicates that CAT1 consists of CAT2 and CAT3.</Paragraph>
    <Paragraph position="4"> We denote the categories in interpausal phrase rules in lower-case and the categories in interpausal phrase-based sentence rules in upper-case.</Paragraph>
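    <Paragraph> The rule notation and the case convention described above can be read mechanically. The following Python sketch is only an illustration: the names parse_rule and is_sentence_level are hypothetical helpers introduced here, not part of the authors' system, and the accepted syntax is inferred solely from the single rule example shown above.
<![CDATA[
```python
import re

# Hypothetical helpers: the rule syntax is inferred from the one example in the
# text; this is an illustrative reader, not the authors' tooling.
RULE_RE = re.compile(
    r"^\(\s*<([^<>\s]+)>\s*<-->\s*\(((?:\s*<[^<>\s]+>)+)\s*\)\s*\)$"
)
SYMBOL_RE = re.compile(r"<([^<>\s]+)>")


def parse_rule(text):
    """Split '(<LHS> <--> (<RHS1> <RHS2> ...))' into (lhs, [rhs, ...])."""
    match = RULE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a rule in the expected format: {text!r}")
    lhs = match.group(1)
    rhs = SYMBOL_RE.findall(match.group(2))
    return lhs, rhs


def is_sentence_level(category):
    """Upper-case names denote sentence-level categories, lower-case phrase-level."""
    return category.isupper()
```
]]>
    Applied to the example rule, parse_rule yields CAT1 as the left-hand side and the list CAT2, CAT3 as the right-hand side, and is_sentence_level classifies CAT1 as a sentence-level category because it is written in upper-case.</Paragraph>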
    <Paragraph position="5"> In the rule set Pause we prepared about d5 phrases that can end with a pause: postpositional phrases, conjunctive phrases, adnominal verbal phrases marked with a special conjugation form, phrases that end with a conjunctive postposition, adnominal phrases with the genitive postposition no, and coordinate verbal phrases. The first three rules are as follows:</Paragraph>
    <Paragraph position="7"> In the rule set Emphasis we prepared seven additional rules for treating the emphasis marker desune, represented as follows:</Paragraph>
    <Paragraph position="9"> Methods for combining interpausal phrases to obtain an overall utterance meaning require further study. At this stage we defined a sentence very loosely. It can be an interjection; an interjection followed by a combination of interpausal phrases; or simply a combination of interpausal phrases. To allow fragmentary utterances, in the rule set Turn, we also introduced a sentence consisting of a nominal phrase, which may contain adnominal phrases. Complete sentences in Turn are defined as follows:  A given phoneme string can belong to several categories. For instance, de can be a postposition or a copula conjugation form. The number of different phoneme strings is 503 for Pause and Turn, and 504 for Emphasis.</Paragraph>
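    <Paragraph> The loose sentence definition given above can be stated as a small acceptance check. This is a minimal sketch under stated assumptions: the category labels below are placeholders invented for illustration (the actual nonterminal names are not shown in this section), and the allow_fragments flag is a hypothetical stand-in for the extra nominal-phrase sentence of rule set Turn.
<![CDATA[
```python
# Placeholder category labels (assumptions, not the paper's nonterminals).
INTERJECTION = "interjection"
PHRASE = "interpausal_phrase"
NOMINAL = "nominal_phrase"


def is_sentence(categories, allow_fragments=False):
    """Check a sequence of phrase categories against the loose definition."""
    cats = list(categories)
    if not cats:
        return False
    # Turn-style fragment: a lone nominal phrase counts as a sentence.
    if allow_fragments and cats == [NOMINAL]:
        return True
    # An optional leading interjection may be followed only by interpausal phrases.
    rest = cats[1:] if cats[0] == INTERJECTION else cats
    return all(c == PHRASE for c in rest)
```
]]>
    With allow_fragments left off, a lone nominal phrase is rejected, which mirrors the difference between the Pause and Turn rule sets as described in the text.</Paragraph>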
    <Section position="1" start_page="989" end_page="990" type="sub_section">
      <SectionTitle>
3.2 Speech Recognition Experiment
</SectionTitle>
      <Paragraph position="0"> We conducted a speech recognition experiment with 118 test sentences concerning secretarial services for an international conference. A professional broadcaster uttered the sentences without any special constraints such as pause placement.</Paragraph>
      <Paragraph position="1"> For our speech recognition parser, we used HMM-LR [14], which is a combination of generalized LR parsing and Hidden Markov Models (HMM). The system predicts phonemes by using an LR parsing table and drives HMM phoneme verifiers to detect or verify them without any intervening structure such as a phoneme lattice. Linguistic rules for parsing can be written in CFG format.</Paragraph>
      <Paragraph position="2"> As mentioned in section 3.1, we explicitly defined rules that can end with pauses in linguistic constraints. According to the pause model, a pause can last from 1 to 150 frames, where a frame lasts 9 msec. Examples (1) and (2) show the results of HMM-LR Japanese speech recognition (see footnote 2). (1) shows sample results of rule set Pause and (2) shows sample results of Turn. The phoneme strings which were actually pronounced are enclosed in | |:  In the examples, the symbols &gt;, -, N and P have special meaning: A correctly recognized phrase is marked with &gt;. A word boundary is marked with -.</Paragraph>
      <Paragraph position="3"> A syllabic nasal is transcribed N. A pause is marked with p.</Paragraph>
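      <Paragraph> The pause model mentioned above fixes a range of pause lengths in frames. As a small worked example, the Python sketch below converts that range to milliseconds. It assumes a frame length of 9 msec; the constant and function names are hypothetical and chosen only for illustration.
<![CDATA[
```python
FRAME_MS = 9            # frame length in milliseconds (assumed reading of the text)
MIN_PAUSE_FRAMES = 1    # shortest pause allowed by the pause model
MAX_PAUSE_FRAMES = 150  # longest pause allowed by the pause model


def pause_duration_ms(frames):
    """Return the duration in msec of a pause lasting the given number of frames."""
    if not MIN_PAUSE_FRAMES <= frames <= MAX_PAUSE_FRAMES:
        raise ValueError(f"pause length outside the model range: {frames}")
    return frames * FRAME_MS
```
]]>
      Under this assumption the model covers pauses from 9 msec (1 frame) up to 1350 msec (150 frames).</Paragraph>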
      <Paragraph position="4"> Example (1) shows typical recognition errors involving postpositions like no, ni, ga, and o, which often receive reduced pronunciation in natural speech. The surrounding context may aggravate the problem.</Paragraph>
      <Paragraph position="5"> Here, for instance, topic marker wa is erroneously recognized as object marker o in the environment of preceding and subsequent phoneme o. The possible introduction of pauses at such junctures further complicates the recognition problem. Analysis deeper than CFG parsing will often be needed to filter unlikely candidates. Example (2) demonstrates the dangers of allowing postpositional phrases to end utterances. Here, all recognition candidates other than the third are inappropriate postpositional phrases. To recognize the unlikelihood of such candidates, we will need further controls, such as discourse management.</Paragraph>
      <Paragraph position="6"> Our resulting sentence speech recognition accuracies are shown in Table 5. For instance, using rule set Pause, the correct candidate was the highest-ranking candidate (Rank 1) 50.0 percent of the time, while the correct candidate was among the top 5 candidates (Rank 5) 55.9 percent of the time.</Paragraph>
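      <Paragraph> The Rank 1 and Rank 5 figures quoted above are top-k accuracies. The following Python sketch shows how such a figure can be computed from the rank of the correct candidate in each sentence. The function name and the toy rank list are hypothetical; no per-sentence data from the experiment is reproduced here.
<![CDATA[
```python
def top_k_accuracy(ranks, k):
    """Percentage of sentences whose correct candidate is ranked within the top k.

    ranks holds, per test sentence, the 1-based rank of the correct candidate,
    or None when the correct candidate was not produced at all.
    """
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return 100.0 * hits / len(ranks)


# Toy data (invented for illustration only).
toy_ranks = [1, 1, 3, None]
rank1 = top_k_accuracy(toy_ranks, 1)
rank5 = top_k_accuracy(toy_ranks, 5)
```
]]>
      As an arithmetic observation only, with 118 test sentences a Rank 1 score of 50.0 percent would correspond to 59 sentences and a Rank 5 score of 55.9 percent to 66 sentences; these counts are inferred from the percentages and are not stated in the text.</Paragraph>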
      <Paragraph position="7"> 2 The maximum overall beam width, called the global beam width, is set at 100, and the maximum beam width of each branch, the local beam width, is 12.</Paragraph>
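      <Paragraph> One possible reading of the two beam limits in the footnote above is a two-level pruning step: keep at most 12 hypotheses per branch, then at most 100 overall. The Python sketch below illustrates that reading only. The hypothesis representation, the notion of a branch identifier, and the function name prune are assumptions made for this example, and the actual HMM-LR search may apply the limits differently.
<![CDATA[
```python
import heapq
from collections import defaultdict

GLOBAL_BEAM = 100  # global beam width from the footnote
LOCAL_BEAM = 12    # local (per-branch) beam width from the footnote


def prune(hypotheses):
    """Two-level beam pruning over (score, branch_id, payload) tuples.

    Higher scores are better. The tuple layout is an assumption for this sketch.
    """
    by_branch = defaultdict(list)
    for hyp in hypotheses:
        by_branch[hyp[1]].append(hyp)
    survivors = []
    for items in by_branch.values():
        # Local beam: keep the best LOCAL_BEAM hypotheses of each branch.
        survivors.extend(heapq.nlargest(LOCAL_BEAM, items, key=lambda h: h[0]))
    # Global beam: keep the best GLOBAL_BEAM hypotheses overall.
    return heapq.nlargest(GLOBAL_BEAM, survivors, key=lambda h: h[0])
```
]]>
      In this sketch the local limit is applied first, so a single high-scoring branch cannot occupy more than 12 of the 100 global slots.</Paragraph>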
      <Paragraph position="9"> With the underlying linguistic rules for the three rule sets, earlier experiments had achieved 70% sentence speech recognition accuracy for speech input with explicit pauses at bunsetsu boundaries. Our best present results for spontaneous speech are much more modest: 50%.</Paragraph>
      <Paragraph position="10"> Table 5 shows that the introduction of the emphasis marker desune did not affect processing: as seen in Table 4, rule set Emphasis has a slightly higher perplexity than Pause, but we had exactly the same results for the two. On the other hand, the perplexities of Pause and Turn are identical, but the treatment of fragmentary utterances did decrease recognition accuracy.</Paragraph>
    </Section>
  </Section>
</Paper>