<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2154">
  <Title>Modeling Topic Coherence for Speech Recognition</Title>
  <Section position="4" start_page="913" end_page="914" type="metho">
    <SectionTitle>
3 Sublanguage Component
</SectionTitle>
    <Paragraph position="0"> The sublanguage component performs the following four steps:  1. Select keywords from previously uttered sentences 2. Collect similar articles from a large corpus based on the keywords 3. Extract sublanguage words from the similar articles 4. Compute scores of N-best hypotheses based on the sublanguage words  A sublanguage analysis is performed separately for each sentence in an article (after the first sentence). There are several parameters in these processes, and the values of the parameters we used for this experiment will be summarized at the end of each section below. We generally tried several parameter values and the values shown in this paper are the best ones on our training data set. We used a large corpus in the experiment as the source for similar articles. This corpus includes 146,000 articles, or 76M tokens, from January 1992 to July 1995 of North American Business News, which consists of Dow Jones Information Services, New York Times, Reuters North American Business Report, Los Angeles Times, and Washington Post. This corpus has no overlap with the evaluation data set, which is drawn from August 1995 North American Business News.</Paragraph>
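The four steps above can be sketched as a per-sentence loop. Everything below is a hypothetical stand-in for the components the text describes, not the authors' implementation; the four helper callables correspond to the four steps.

```python
def rescore_article(nbest_per_sentence, select_keywords,
                    collect_similar_articles, extract_sublanguage_words,
                    score):
    """Apply the four sublanguage steps to each sentence after the first.

    nbest_per_sentence: list of N-best lists, one per uttered sentence.
    The four callables stand in for steps 1-4 of the text.
    """
    history, results = [], []
    for i, nbest in enumerate(nbest_per_sentence):
        if i == 0:
            # First sentence: no prior discourse, keep the recognizer's
            # top hypothesis unchanged.
            best = nbest[0]
        else:
            keywords = select_keywords(history)                   # step 1
            articles = collect_similar_articles(keywords)         # step 2
            sub_words = extract_sublanguage_words(articles)       # step 3
            best = max(nbest, key=lambda h: score(h, sub_words))  # step 4
        results.append(best)
        history.append(nbest)
    return results
```

In a real system the returned hypothesis would be chosen by a weighted combination of this score with the recognizer's own scores; the sketch isolates only the sublanguage loop.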
    <Paragraph position="1"> Now, each step of our sublanguage component will be described in detail.</Paragraph>
    <Section position="1" start_page="913" end_page="914" type="sub_section">
      <SectionTitle>
Select Keywords
</SectionTitle>
      <Paragraph position="0"> The keywords which will be used in retrieving similar articles are selected from previously dictated sentences. The system we will describe here is an incremental adaptation system, which uses only the information the system has acquired from the previous utterances. So it does not know the correct transcriptions of prior sentences or any information about subsequent sentences in the article.</Paragraph>
      <Paragraph position="1"> Not all of the words from the prior sentences are used as keywords for retrieving similar articles. As is the practice in information retrieval, we filtered out several types of words. First of all, we know that closed-class words and high-frequency words appear in most of the documents regardless of the topic, so it is not useful to include these as keywords. On the other hand, very low-frequency words sometimes introduce noise into the retrieval process because of their peculiarity.</Paragraph>
      <Paragraph position="2"> Only open-class words of intermediate frequency (actually frequency from 6 to 100,000 in the corpus of 146,000 articles) are retained as keywords and used in finding the similar articles. Also, because the N-best sentences inevitably contain errors, we set a threshold for the appearance of words in the N-best sentences. Specifically, we require that a word appear at least 15 times in the top 20 N-best sentences (as ranked by SRI's score) to qualify as a keyword for retrieval.</Paragraph>
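Under the thresholds just stated, keyword selection could look like the following sketch. The stopword set and the frequency dictionary are assumptions; only the numeric thresholds (6 to 100,000 corpus occurrences, 15 appearances in the top 20 N-best sentences) come from the text.

```python
def select_keywords(nbest_20, corpus_freq, closed_class):
    """Keep open-class words of intermediate corpus frequency (6 to
    100,000 occurrences) that appear at least 15 times across the top
    20 N-best sentences.

    nbest_20: the 20 best hypotheses as whitespace-tokenized strings.
    corpus_freq: word -> frequency in the 146,000-article corpus.
    closed_class: assumed stopword set of closed-class words.
    """
    counts = {}
    for sentence in nbest_20:
        for w in sentence.split():
            counts[w] = counts.get(w, 0) + 1
    return {w for w, c in counts.items()
            if c >= 15
            and w not in closed_class
            and 100000 >= corpus_freq.get(w, 0) >= 6}
```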
      <Paragraph position="3">  The set of keywords is used in order to retrieve similar articles according to the following formulas. Here Weight(w) is the weight of word w, F(w) is the frequency of word w in the 20 N-best sentences, M is the total number of tokens in the corpus, f(w) is the frequency of word w in the corpus, AScore(a) is the article score of article a, which indicates the similarity between the set of keywords and the article, and n(a) is the number of tokens in article a.</Paragraph>
      <Paragraph position="5"> Each keyword is weighted by the product of two factors. One of them is the frequency of the word in the 20 N-best sentences, and the other is the log of the inverse probability of the word in the large corpus. This is a standard metric of information retrieval based on the assumption that higher-frequency words provide less information about topics (Sparck-Jones, 1973). Article scores (AScore) for all articles in the large corpus are computed as the sum of the weighted scores of the selected keywords in each article, and are normalized by the log of the size of each article. This score indicates the similarity between the set of keywords and the article. We collect the most similar 50 articles from the corpus. These form the "sublanguage set", which will be used in analyzing the following sentence in the test article.</Paragraph>
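The formulas themselves were lost in this version of the text, so the sketch below reconstructs them from the prose: Weight(w) = F(w) · log(M / f(w)), and AScore(a) is the sum of matched keyword weights divided by log n(a). The exact normalization, and whether a keyword is counted once per article or once per occurrence, are assumptions (this sketch counts occurrences).

```python
import math

def weight(w, F, M, f):
    """Weight(w) = F(w) * log(M / f(w)): frequency in the 20 N-best
    sentences times the log inverse corpus probability of the word."""
    return F[w] * math.log(M / f[w])

def ascore(article_tokens, keywords, F, M, f):
    """AScore(a): sum of weights of the selected keywords occurring in
    article a, normalized by the log of the article length n(a)."""
    total = sum(weight(w, F, M, f) for w in article_tokens if w in keywords)
    return total / math.log(len(article_tokens))

def sublanguage_set(articles, keywords, F, M, f, k=50):
    """The k (= 50 in the paper) highest-scoring articles form the
    sublanguage set used for the following sentence."""
    return sorted(articles, key=lambda a: ascore(a, keywords, F, M, f),
                  reverse=True)[:k]
```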
      <Paragraph position="6"> Sublanguage words are extracted from the collected sublanguage articles. This extraction was done in order to filter out topic-unrelated words. Here, we exclude function words, as we did for keyword selection, because function words are generally common throughout different sublanguages. Next, to find strongly topic-related words, we extracted words which appear in at least 3 out of the 50 sublanguage articles. Also, the document frequency in the sublanguage articles has to be at least 3 times the word frequency in the large corpus:</Paragraph>
      <Paragraph position="8"> Here, DF(w) is the number of documents in which the word appears. We can expect that these methods eliminate less topic-related words, so that only strongly topic-related words are extracted as the sublanguage words.</Paragraph>
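A sketch of the extraction filter. The inequality itself was lost in extraction; comparing the relative document frequency in the sublanguage set against 3× the word's relative frequency in the large corpus is a reconstruction from the prose and should be treated as an assumption.

```python
def extract_sublanguage_words(articles, corpus_freq, M, function_words):
    """Extract sublanguage words from the collected similar articles.

    articles: list of token lists (the 50 sublanguage articles).
    Keeps non-function words with document frequency DF(w) of at least
    3 whose relative document frequency is at least 3x the word's
    relative corpus frequency (assumed form of the lost inequality).
    """
    df = {}
    for tokens in articles:
        for w in set(tokens):  # document frequency: count each article once
            df[w] = df.get(w, 0) + 1
    n = len(articles)
    return {w for w, d in df.items()
            if w not in function_words
            and d >= 3
            and d / n >= 3 * corpus_freq.get(w, 0) / M}
```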
      <Paragraph position="9">  Finally, we compute scores of the N-best hypotheses generated by the speech recognizer. The top 100 N-best hypotheses (according to SRI's score) are rescored. The sublanguage score we assign to each word is the logarithm of the ratio of the document frequency in the sublanguage articles to the word frequency of the word in the large corpus.</Paragraph>
      <Paragraph position="10"> The larger this score of a word, the more strongly the word is related to the sublanguage we found through the prior discourse.</Paragraph>
      <Paragraph position="11"> The score for each sentence is calculated by accumulating the scores of the selected words in the hypothesis. Here HScore(h) is the sublanguage score of hypothesis h.</Paragraph>
      <Paragraph position="13"> This formula can be motivated by the fact that the sublanguage score will be combined linearly with general language model scores, which mainly consist of the logarithm of the trigram probabilities. The denominator of the log in Formula 4 is the unigram probability of word w. Since it is the denominator of a logarithm, it works to reduce the effect of the general language model which may be embedded in the trigram language model score.</Paragraph>
      <Paragraph position="14"> The numerator is a pure sublanguage score and it works to add the score of the sublanguage model to the other scores.</Paragraph>
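Formula 4 is missing from this version of the text; the sketch below reconstructs it from the prose (a per-word log ratio of relative document frequency in the sublanguage set to unigram corpus probability, summed over the sublanguage words in the hypothesis). The normalizations are assumptions.

```python
import math

def word_subscore(w, DF, n_articles, f, M):
    """Per-word sublanguage score: log of (relative document frequency
    in the sublanguage set) over (unigram probability in the large
    corpus) -- the assumed form of the lost Formula 4."""
    return math.log((DF[w] / n_articles) / (f[w] / M))

def hscore(hypothesis, sub_words, DF, n_articles, f, M):
    """HScore(h): accumulate the subscores of the sublanguage words
    that appear in hypothesis h."""
    return sum(word_subscore(w, DF, n_articles, f, M)
               for w in hypothesis.split() if w in sub_words)
```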
    </Section>
  </Section>
  <Section position="5" start_page="914" end_page="914" type="metho">
    <SectionTitle>
4 Cache model
</SectionTitle>
    <Paragraph position="0"> A cache model was also used in our experiment.</Paragraph>
    <Paragraph position="1"> We did not use all the words in the previous utterances, but rather filtered out several types of words in order to retain only topic-related words.</Paragraph>
    <Paragraph position="2"> We actually used all of the "selected keywords" as explained in the last section for our cache model.</Paragraph>
    <Paragraph position="3"> Scores for the words in the cache (CScore(w)) are computed in a similar way to that for sublanguage words. Here, N' is the number of tokens in the previously uttered N-best sentences.</Paragraph>
    <Paragraph position="5"/>
  </Section>
  <Section position="6" start_page="914" end_page="915" type="metho">
    <SectionTitle>
5 Experiment
</SectionTitle>
    <Paragraph position="0"> The speech recognition experiment has been conducted as a part of the 1995 ARPA continuous speech recognition evaluation under the supervision of NIST (NIST, 1996). The conditions of the experiment are:
* The input is read speech of unlimited vocabulary texts, selected from several sources of North American Business (NAB) news from the period 1-31 August 1995
* Three non-close-talking microphones are used anonymously for each article
* All speech is recorded in a room with background noise in the range of 47 to 61 dB (A-weighted)
* The test involves 20 speakers and each speaker reads 15 sentences which are taken in sequence from a single article
* Speaker gender is unknown
The SRI system, which we used as the base system, produces N-best (with N=100) sentences and six kinds of scores, as explained before. We produce two additional scores based on the sublanguage model and the cache model. The two scores are linearly combined with SRI's six scores. The weights of the eight scores are determined by minimizing the word error on the training data set. The training data set has speech data recorded under the same conditions as the evaluation data set. The training data set consists of 256 sentences, 17 articles (a part of the ARPA 1995 CSR "dev test" data distributed by NIST) and does not overlap the evaluation data set.</Paragraph>
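The eight-score linear combination can be sketched as follows. The weight-tuning step itself (minimizing word error on the training set) is omitted; the weights are simply inputs here.

```python
def combined_score(scores, weights):
    """Linearly combine SRI's six scores with the sublanguage and cache
    scores: eight scores, eight weights (weights would be tuned to
    minimize word error on the training data)."""
    assert len(scores) == len(weights) == 8
    return sum(wt * s for wt, s in zip(weights, scores))

def pick_best(score_vectors, weights):
    """Return the index of the hypothesis whose eight-score vector has
    the highest combined score."""
    return max(range(len(score_vectors)),
               key=lambda i: combined_score(score_vectors[i], weights))
```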
    <Paragraph position="1"> The evaluation is done with the tuned parameters of the sublanguage component and the weights of the eight scores decided by the training optimization. Then the evaluation is conducted using 300 sentences, 20 articles (the ARPA 1995 CSR "eval test" data distributed by NIST), disjoint from the dev test and training corpus. The evaluation of the sublanguage method has to be done by comparing the word error rate (WER) of the system with sublanguage scores to that of the SRI system without sublanguage scores.</Paragraph>
    <Paragraph position="2"> Inevitably, this evaluation is affected by the performance of the base system. In particular, the number of errors for the base system and the minimum number of errors obtainable by choosing the N-best hypotheses with minimum error are important. (We will call the latter kind of error "MNE" for "minimal N-best errors".) The difference of these numbers indicates the possible improvement we can achieve by rescoring the hypotheses using additional components.</Paragraph>
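MNE is an oracle quantity; a sketch of how it would be computed from an N-best list, using the standard word-level edit distance for counting errors (the paper does not spell out the scoring, so this metric is an assumption).

```python
def word_errors(hyp, ref):
    """Word-level Levenshtein distance between a hypothesis and the
    reference transcription: substitutions, insertions, deletions."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, 1):
        cur = [i]
        for j, rw in enumerate(r, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (hw != rw)))  # substitution
        prev = cur
    return prev[-1]

def mne(nbest, ref):
    """Minimal N-best errors: the fewest word errors attainable by an
    oracle that picks the best hypothesis from the N-best list."""
    return min(word_errors(h, ref) for h in nbest)
```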
    <Paragraph position="3"> We can't expect our sublanguage model to fix all of the 375 word errors (non-MNE). For one thing, there are a lot of word errors unrelated to the article topic, for example function word replacement ("a" replaced by "the"), or deletion or insertion of topic-unrelated words (missing "over"). Also, the word errors in the first sentence of each article are not within our means to fix.1</Paragraph>
  </Section>
  <Section position="7" start_page="915" end_page="915" type="metho">
    <SectionTitle>
6 Result
</SectionTitle>
    <Paragraph position="0"> The absolute improvement using the sublanguage component over SRI's system is 0.65%, from 25.37% to 24.72%, as shown in Table 3. That is, the number of word errors is reduced from 1522 to 1483. This means that 10.40% of the possible improvement was achieved (39 out of 375). The absolute improvement looks tiny; however, the relative improvement excluding MNE, 10.40%, is quite impressive, because there are several types of error which cannot be corrected by the sublanguage model, as was explained before.</Paragraph>
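The arithmetic behind these figures, as a quick check:

```python
# Numbers reported in Section 6: baseline 1522 word errors, 1483 with
# the sublanguage component, and 375 errors that an N-best oracle could
# still fix (non-MNE).
baseline_errors = 1522
system_errors = 1483
fixable = 375  # errors with a correct alternative in the N-best list

corrected = baseline_errors - system_errors        # 39 errors fixed
relative_improvement = 100 * corrected / fixable   # share of fixable errors
```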
    <Paragraph position="1"> The following is an example of the actual output of the system. (This is a relatively badly recognized example.)</Paragraph>
    <Paragraph position="2"> in recent weeks hyundai corporation and fujitsu limited announced plans for memory chip plants in oregon at projected costs of over one billion dollars each
in recent weeks CONTINENTAL VERSION SUGGESTS ONLY limited announced plans for MEMBERSHIP FINANCING FOR IT HAD projected COST of one DAY each
in recent weeks CONTINENTAL VERSION SUGGESTS ONLY limited announced plans for memory chip plants in WORTHINGTON PROJECT COST of one MILLION each</Paragraph>
    <Paragraph position="3"> The first sentence is the correct transcription, the second one is SRI's best-scored hypothesis, and the third one is the hypothesis with the highest combined score of SRI and our models. This sentence is the 15th in an article on memory chip production. As you can see, a mistake in SRI's hypothesis, membership instead of memory and chip, was replaced by the correct words. However, other parts of the sentence, like hyundai corporation and fujitsu, were not amended. We found that this particular error is one of the MNE, for which there is no correct candidate in the N-best hypotheses. Another error, million or day instead of billion, is not a MNE. There exist some hypotheses which have billion at the right spot (the 47th candidate is the top candidate which has the word). Our sublanguage model works to replace the word day by million, but this was not the correct word.</Paragraph>
    <Paragraph position="4"> 1 Note that, in our experiment, a few errors in first sentences were corrected, because of the weight optimization based on the eight scores which includes all of SRI's scores. But this effect is very minor and these improvements are offset by a similar number of disimprovements caused by the same reason.</Paragraph>
  </Section>
</Paper>