<?xml version="1.0" standalone="yes"?>
<Paper uid="C80-1071">
  <Title>SPEECH RECOGNITION SYSTEM FOR SPOKEN JAPANESE SENTENCES</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2. Acoustic Analyser and Matching Method
</SectionTitle>
    <Paragraph position="0"> A psychology-based model is used to obtain a neat phoneme string from the speech wave, using the following feature parameters determined every ten milliseconds [5].</Paragraph>
    <Paragraph position="1">  (i) Maximum value of amplitudes, (ii) Number of zero-crossings, (iii) Normalized prediction error, (iv) Parcor-coefficients, (v) Variation of Parcor-coefficients between successive frames, (vi) Frequency spectrum, (vii) Formant frequencies.</Paragraph>
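    <Paragraph> As an illustration only, parameters (i) and (ii) can be computed for each ten-millisecond frame as in the following minimal Python sketch. The function name frame_features, the sample_rate argument and the use of non-overlapping frames are assumptions made for illustration; they are not taken from the analyser described in [5], and parameters (iii) to (vii) are not covered.

```python
def frame_features(samples, sample_rate, frame_ms=10):
    """Return (max_amplitude, zero_crossings) for each consecutive frame.

    Assumption: frames do not overlap and a trailing partial frame is dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # (i) Maximum value of amplitudes within the frame.
        max_amp = max(abs(x) for x in frame)
        # (ii) Number of zero-crossings: sign changes between neighbours.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:])
            if (a >= 0) != (b >= 0)
        )
        features.append((max_amp, crossings))
    return features
```

For a signal sampled at 1000 Hz each frame holds ten samples, so a strictly alternating signal gives nine zero-crossings per frame.</Paragraph>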
    <Paragraph position="2">  The output phonemes and their decision methods are given in Table 2.1. The output phoneme strings contain the 5 Japanese vowels, a nasal group, an unvoiced stop consonant group, /s/, /h/, /r/, buzz parts and silence. Discrimination of individual stop consonants [8] and of individual nasal consonants is not yet implemented in this system. Vowels and /s/ of long duration, together with silent parts, are used as characteristic phonemes. Besides an ordinary word dictionary, a characteristic phoneme dictionary is prepared, which presents the major acoustic features of each word. (This dictionary exists only implicitly and is composed automatically from the word dictionary, which is written in Roman letters.) These major features are used to reduce the number of candidate words.</Paragraph>
    <Paragraph position="3"> For matching a phoneme string containing erroneous phonemes against items of the word dictionary or the characteristic phoneme dictionary, a new matching method using graph theory has been devised [7]. The acoustic and matching processes are the same as those in the previous systems.
3. Knowledge Representation
3.1. Syntactic Knowledge
3.1.1. Classification of Japanese words for machine recognition
In order to recognize continuously spoken natural language automatically, it is necessary to use syntactic rules. However, the original form of Japanese grammar as written by grammarians is not necessarily suitable for mechanical recognition. Moreover, it is very difficult to reduce the number of predicted words by syntactic information alone, because Japanese does not require word order to be kept so rigorously. Taking these conditions into account, Japanese words are classified as described in the following section, and the syntax is preferably represented by state transition networks, as shown in section 3.1.3.</Paragraph>
    <Paragraph position="4"> 3.1.1.1. Classification of words by parts of speech
Each word is classified grammatically as given in Table 3.1. In Japanese, nouns, pronouns, numerals and quasi-nouns (KEISHIKI-MEISHI in Japanese) are called substantives (inflexionless parts of speech in Japanese grammar; TAIGEN in Japanese), and verbs, auxiliary verbs and adjectives are called inflexional words (inflexional parts of speech; YOGEN in Japanese). The words No. 1 - No. 11 in Table 3.1 are inflexionless words and the words No. 12 - No. 15 are inflexional words. In No. 16 the inflexion rules necessary for each inflexional word are written in appropriate forms. The additional word "carriage return" in No. 17 is a special symbol. We ask each speaker to utter the word "carriage return" at the end of each sentence in order to inform the recognizer of the end of the sentence. Japanese verbs, adjectives and auxiliary verbs are inflexional. Verb inflexion has traditionally been classified into 5 types: GODAN-KATSUYO (inflexion), KAMI-ICHIDAN-KATSUYO, SHIMO-ICHIDAN-KATSUYO, SAGYO-HENKAKU-KATSUYO and KAGYO-HENKAKU-KATSUYO. We instead classify verbs into 14 types, as given in Table 3.2, taking into account the combination of the stem, a consonant following the stem and the inflexional ending of each word. Examples are shown in Fig. 3.1. By so doing, the number of inflexion tables becomes smaller.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <Paragraph position="1"> We have classified the adjectives and verbal adjectives (KEIYO-DOSHI in Japanese) into 3 types according to their inflexion. Two of these types are shown in Fig. 3.2.</Paragraph>
      <Paragraph position="2"> The inflexion of auxiliary verbs is the same as the traditional one. Some examples are</Paragraph>
      <Paragraph position="4"> Inflexion of verbs: IKU (go) (No. 1 in Table 3.2) and YOMU (read) (No. 6 in Table 3.2). RENYO or RENTAI means that the following word must be inflexional or substantive, respectively. The following words TA and DA are auxiliary verbs and</Paragraph>
      <Paragraph position="6"> Examples of inflexion of an adjective and a verbal adjective. The numbers in parentheses correspond to those in Fig. 3.1.</Paragraph>
      <Paragraph position="8"> Examples of inflexion of auxiliary verbs. The numbers in parentheses correspond to those in Fig. 3.1.</Paragraph>
      <Paragraph position="9">  3.1.1.2. Classification of words by syntactic functions
In a Japanese sentence some words, such as substantives and verbs, express material (noema), and others, such as particles and auxiliary verbs, express syntactic function (noesis) [9]. The latter controls the syntactic function of the former or, in other words, gives a material word or phrase a modifying function, and the two usually appear as a pair in a sentence. The pair is called a phrase, and modifying relations are established between phrases. These modifying relations between phrases compose a sentence. In some cases a phrase consists of only a single word, such as an adjective, an adverb or some inflexional word, which is not accompanied by any word expressing a syntactic function and itself carries a syntactic function. Some examples are shown here.</Paragraph>
      <Paragraph position="11"> (b) Modification of an inflexional word or phrase. Some examples are shown in (ii) above.
(c) Termination (the end of a sentence).
3.1.3. Syntactic state transition network
A syntactic state transition network is a network which represents Japanese syntax [10]. The standard form is shown in Fig. 3.4, where each S represents a syntactic state, an arrow a transition path to the next state, C a part of speech, and I syntactic information. Thus, if a state S0 is followed by the part of speech C0, the state makes a context-free transition to S1, outputting syntactic information I0.</Paragraph>
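      <Paragraph> The standard form can be illustrated by a small table-driven sketch in Python. The states, parts of speech and syntactic information below are hypothetical placeholders chosen for illustration and do not reproduce the actual network of the system; only the transition rule (a state followed by a part of speech leads to the next state and outputs syntactic information) is taken from the text.

```python
# Hypothetical network: (state, part_of_speech) -> (next_state, syntactic_info).
# None of these arcs is taken from the paper's figures.
TRANSITIONS = {
    ("S0", "noun"): ("S1", "substantive"),
    ("S1", "particle"): ("S0", "modifies an inflexional phrase"),
    ("S0", "verb"): ("S2", "predicate"),
    ("S2", "carriage return"): ("END", "termination"),
}


def predict(state):
    """Parts of speech that may follow the given state."""
    return sorted(c for (s, c) in TRANSITIONS if s == state)


def step(state, part_of_speech):
    """Follow one arc and return (next_state, syntactic_info)."""
    return TRANSITIONS[(state, part_of_speech)]


def run(parts_of_speech, state="S0"):
    """Follow a sequence of arcs, collecting the syntactic information output."""
    outputs = []
    for c in parts_of_speech:
        state, info = step(state, c)
        outputs.append(info)
    return state, outputs
```

In this sketch predict lists the parts of speech that may follow a state, and run collects the syntactic information output along a path through the network.</Paragraph>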
      <Paragraph position="12"> A transition network is also applied to an inflexional word and represents its inflexion. In speech recognition it is necessary to follow the whole transition from the stem of an inflexional word to the end of its inflexion; in other words, to predict the stem of an inflexional word together with its inflexional ending, and to output the syntactic information comprehensively for the whole word including its inflexion. Fig. 3.5 shows an example of a transition network and accompanying syntactic information for two verbs "IKU (go)"</Paragraph>
      <Paragraph position="14"> Standard form of syntactic state transition network. S0, S1: states, C0: part of speech or inflexion, I0: syntactic information.</Paragraph>
      <Paragraph position="16"> and "YOMU (read)" with their inflexion and syntactic information. X/Z means that X is the output letters and Z is the syntactic information. ~: empty, CR: carriage return, P: particle; the numbers correspond to those in Fig. 3.1.</Paragraph>
      <Paragraph position="17">  and "YOMU (read)". This procedure corresponds to predicting all possible combinations of a verb with auxiliary verbs. For example, for the word "go", it may be better to predict the probable combinations go, goes, will go, will have gone, went and so on, though the number of probable combinations will be restricted.</Paragraph>
      <Paragraph position="18"> The syntactic state transition network not only predicts combinable words but also outputs syntactic information about the modifying relations between phrases.</Paragraph>
      <Paragraph position="19">  3.2. Knowledge about Vocabulary
3.2.1. Word dictionary
Each word is entered in the word dictionary in a group according to its part of speech, as shown in Fig. 3.6. Each entry and its inflexion table are represented in Roman letters together with semantic information. When a part of speech is predicted using the syntactic state transition network, the word group of the predicted part of speech is picked out from the dictionary.</Paragraph>
      <Paragraph position="20"> 3.2.2. Automatic translating routine for Roman letter strings and inflexion tables
This routine translates a word written in Roman letters into a phoneme string using a table [11]. The translated phoneme string of a predicted word is used as a reference for matching against an input phoneme string. This routine can also extract the characteristic phoneme string of a word. The characteristic phoneme string of a word contains only phonemes that can be reliably extracted from the speech wave. It is composed of vowels, /s/ and silence, and represents the major acoustic information of the word. Some examples of the phoneme strings are shown in Table 3.3.</Paragraph>
      <Paragraph position="21"> Both the phoneme string and the characteristic phoneme string of a predicted word are used for matching against an input phoneme string. These phoneme strings are not stored in the word dictionary. The system has only one word dictionary, written in Roman letters, and the phoneme strings necessary for matching are produced each time from the word dictionary using the translating routine. This makes it very easy to extend the vocabulary. In the phoneme strings, P denotes an unvoiced stop, N a nasal, B buzz and "." silence.</Paragraph>
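      <Paragraph> The translating routine and the extraction of characteristic phonemes may be sketched in Python as follows. The letter-to-phoneme table here is a hypothetical fragment and is not the table of [11]; the symbols P and N follow the notation for the unvoiced stop group and the nasal group, and the function names are introduced only for this illustration.

```python
# Hypothetical letter-to-phoneme table (assumption, not the table of [11]).
TABLE = {
    "a": "A", "i": "I", "u": "U", "e": "E", "o": "O",
    "k": "P", "t": "P", "p": "P",   # unvoiced stop group
    "m": "N", "n": "N",             # nasal group
    "s": "S", "h": "H", "r": "R",
}

# Characteristic phonemes: vowels, /s/ and silence (written ".").
CHARACTERISTIC = set("AIUEOS.")


def to_phonemes(romaji):
    """Translate a Roman-letter entry into a phoneme string."""
    return "".join(TABLE[ch] for ch in romaji.lower())


def characteristic(phonemes):
    """Keep only the characteristic phonemes of a phoneme string."""
    return "".join(p for p in phonemes if p in CHARACTERISTIC)


def references(word):
    """Produce both reference strings on demand from a dictionary entry."""
    phonemes = to_phonemes(word)
    return phonemes, characteristic(phonemes)
```

Because both strings are derived on demand from the Roman-letter entry, adding a word requires only a new dictionary entry; under the assumed table, references("iku") gives the pair ("IPU", "IU").</Paragraph>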
    </Section>
  </Section>
</Paper>