<?xml version="1.0" standalone="yes"?> <Paper uid="C69-3301"> <Title>CONTINUE END PROGRAM III C AUTOMATIC DISCRIMINATION OF N AND M C UNIVERSITY OF OULU FINLAND C INSTITUTE OF PHONETICS C C LOGIC OF THE PROGRAM: C 1: CALCULATE THE MEANS OF THE AMPLITUDES AT THE NINE C MEASUREMENT POINTS, WHICH ARE THE MOST DISCRIMINATING C POINTS ON THE FREQUENCY AXIS FOR N AND M. C 2: SET THE AMPLITUDES IN ORDER OF MAGNITUDE! C 3: INDICATE THE ORDINAL NUMBERS OF THE AMPLITUDES! C 4: FORM THE GENERAL NUMERICAL MODEL FOR N AND M C ON THE BASIS OF THE ORDINAL NUMBERS! C 5: CALCULATE THE MODELS OF NEW NASAL SOUNDS WITH C THE SAME METHOD! C RESOLVE THE PROBLEM: IS THE NEW NASAL SOUND AN N OR C AN M? COMPARE ITS MODEL WITH THE GENERAL C MODELS OF N AND M! C DIMENSION ASUM(9), AMEAN(9), BSUM(9), BMEAN(9), NUM(9) DIMENSION AMPLIT(9), NUMBER(9), INUMBR(9) C C COMPUTATION OF THE MEANS IN THE BASIC MATERIAL C CONSISTING OF A SET OF N AND M SOUNDS</Title> <Section position="2" start_page="0" end_page="0" type="abstr"> <SectionTitle> AUTOMATIC RECOGNITION OF SPEECH SOUNDS BY A DIGITAL COMPUTER </SectionTitle> <Paragraph position="0"> Three contributions concerning the discrimination of the momentary spectra of some selected Finnish and German sounds. The main difficulties in speech recognition may be listed as follows: 1. Which should be the basic linguistic units to be recognized: sounds (allophones), phonemes, segment combinations, syllables, or words? 2. Should the output text be written orthographically? If so, how should the differences between the phonemic form of an utterance and its orthography be resolved? 3. If the word is chosen as the basic unit of recognition, how should the problem of grammatical inflection (e.g. in Finnish) be resolved? 4. How can the recognition automaton decide where there is a boundary between two words or two sentences? 
vant &quot; - noise; one must also take into account the noise produced by the automaton itself. 8. How can the points in the speech continuum on which recognition can be based be localized? Is there one special acoustic segment (or momentary spectrum) for every sound that is characteristic of that sound? 9. It has been shown that segments which are linguistically identical can be acoustically different. The differences are due to the following factors: (1) The same speaker cannot produce two exactly similar sounds, because the concept of identity is a human abstraction. (2) Different speakers produce linguistically the same sound in different ways. (3) Linguistically the same sound can be modified acoustically by word prominence, sentence prominence, environment, emotional factors, speech tempo, the dialectal background of the speaker, speech defects, huskiness, and so on. 10. Linguistically different sounds can be acoustically similar. 11. Should the phonotactic structures (Sigurd) or the characteristic sequences (Pike) of a language be taken into account when creating the recognition program?</Paragraph> <Paragraph position="1"> 12. The technical problems form a large part of speech recognition. They concern the mechanical solutions and the recognition program. 1. Vowel recognition based on some selected vowel variables and discriminant analysis.</Paragraph> <Paragraph position="2"> The probability of correct identification of the acoustically close German vowel phonemes /i:, I, e:, ε, y:/ and /Y/ was calculated on the basis of spectrographic input data and discriminant analysis (literature 1, 2, and 3). One male speaker was used. The following variables were measured: the frequencies of the first four formants (F1 ... F4), their amplitudes (L1 ... L4), the amplitude of the zero (minimum) point between F1 and F2 (here called LZ1) and that between F2 and F3 (LZ2), and the duration of the vowels. The probability of correct identification was 94 per cent on average. The highest identification probability was shown by the phoneme /e:/ (98.9 %) and the lowest by the phoneme /Y/ (85.7 %). The sounds were taken from sentences read by the informant.</Paragraph> <Paragraph position="3"> In the actual classification procedure, which was connected to the probabilistic recognition program, 6 of 103 identifications were false. In order of discriminatory power, the variables studied ranked as follows: F2, LZ1, F1, F4, duration, L1, F3, L4, LZ2, L3. One must take into account the possibility that two variables with good discriminatory power correlate with each other. In this case the better one is placed high in the list, but the other comes later than its real discriminatory power implies, because the correlation is taken into account. If the better variable were not considered, the weaker variable would perhaps take its place (if the correlation is strong enough). This may explain why F3 comes after F4 (the correlation of F2 with F3 is strong for the vowels studied).</Paragraph> <Paragraph position="4"> The energy minimum between F1 and F2 (LZ1) had good discriminatory power. This shows that the acoustic signal can contain cues which are usable in automatic recognition but need not be relevant for perception (cf. Tillmann, p. 1~9). 2. Recognition based on the discrimination of the numerical models of sounds.</Paragraph> <Paragraph position="5"> In the second experiment the input data of the recognition program consisted of numerical describers of the sounds. They were formed by measuring the spectra of the sounds at constant points.
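Reading a describer off a section amounts to sampling the spectrum at a fixed grid of frequencies. The Python sketch below is illustrative only. It assumes the section is available digitally as (frequency, amplitude) pairs and that amplitudes between measured frequencies may be linearly interpolated. In the original work the values were read from sonagraph sections. The names measurement_grid and describer are hypothetical.

```python
from bisect import bisect_left


def measurement_grid(n_points, spacing_hz, start_hz=0.0):
    """Return n_points equally spaced frequencies in Hz, starting at start_hz."""
    return [start_hz + k * spacing_hz for k in range(n_points)]


def describer(section, grid):
    """Sample a spectral section at fixed frequencies (hypothetical helper).

    section: list of (frequency_hz, amplitude_db) pairs sorted by frequency,
             an assumed digital stand-in for a measured section.
    grid:    frequencies at which the amplitudes are read.
    Between measured frequencies the amplitude is linearly interpolated
    (an assumption); outside the measured range the nearest end value is used.
    """
    freqs = [f for f, _ in section]
    amps = [a for _, a in section]
    out = []
    for f in grid:
        i = bisect_left(freqs, f)  # freqs[i - 1] is below f, freqs[i] is at or above f
        if i == 0:
            out.append(amps[0])
        elif i == len(freqs):
            out.append(amps[-1])
        else:
            f0, f1 = freqs[i - 1], freqs[i]
            a0, a1 = amps[i - 1], amps[i]
            out.append(a0 + (a1 - a0) * (f - f0) / (f1 - f0))
    return out


section = [(0, 40.0), (1000, 60.0), (2000, 50.0), (4000, 30.0)]  # toy values, not data
print(describer(section, measurement_grid(5, 1000.0)))  # [40.0, 60.0, 50.0, 40.0, 30.0]
```

Calling measurement_grid(33, 121.0) produces the 33-point grid with 121 Hz spacing used in the third experiment. Its last point lies at 3872 Hz, just under 4 kHz.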
The describer of a sound thus consisted of a series of numbers indicating the amplitude at constant, selected frequencies. A narrow-band filter (45 Hz bandwidth) was used to produce the sections that formed the measured material. 32 measurement points within a range of 4 kHz were used.</Paragraph> <Paragraph position="6"> The describers of 330 Finnish sound manifestations were calculated. These sounds represented the 8 short Finnish vowel phonemes and the 3 nasal phonemes /a, e, i, o, u, y, ä, ö, m, n, ŋ/. 30 representatives of each phoneme were taken from sentences read by a single male speaker.</Paragraph> <Paragraph position="7"> The data thus obtained were stored and submitted to discriminant analysis. The measurement points were treated as variables.</Paragraph> <Paragraph position="8"> The probability of correct recognition was about 60 to 70 % on average. One must bear in mind, however, that the localization of the sections was (under the circumstances) not very exact and that the technical equipment was unfortunately not the best.</Paragraph> <Paragraph position="9"> 3. Recognition based on the numerical models of sounds and a special recognition program.</Paragraph> <Paragraph position="10"> In the third recognition experiment, an attempt was made to classify Finnish nasal sounds belonging to the phonemes /n/ or /m/ automatically on the basis of the numerical describers discussed in the preceding chapter.</Paragraph> <Paragraph position="11"> First, the frequency range of 4 kHz was studied by means of 33 constant measurement points spaced 121 Hz apart. The 'general' describers for /n/ and /m/ were calculated by means of PROGRAM I (below).</Paragraph> <Paragraph position="12"> The basic material consisted of 87 wide-band sections (made with a Kay Electric Co. Sound Sona-Graph, model 6061-B). The sections were made at the target point of F2 of the nasals in single words (all possible environments were considered).
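The procedure listed in the PROGRAM III comments can be sketched in Python. It takes mean amplitudes at nine points, sets them in order of magnitude, writes down their ordinal numbers, forms a general model for /n/ and /m/, and compares a new sound's model with those general models. This is a reading of the comments, not a transcription of the Fortran. The sketch makes three assumptions:
- The ordinal numbers are the frequency-axis indices (1 to 9) of the points, listed in order of decreasing amplitude.
- Tied points keep their frequency order.
- Models are compared with a hypothetical rank distance (Spearman's footrule), because the source does not say how the comparison is done.

The function names and all amplitude values are invented for illustration.

```python
def ordinal_model(amplitudes):
    """Encode amplitudes at nine measurement points as one nine-digit number.

    Points are numbered 1..9 along the frequency axis. They are sorted by
    decreasing amplitude and their numbers are written one after another.
    Python's sort is stable, so tied points keep frequency order (assumption).
    """
    if len(amplitudes) != 9:
        raise ValueError("expected amplitudes at nine measurement points")
    order = sorted(range(9), key=lambda i: -amplitudes[i])
    return int("".join(str(i + 1) for i in order))


def mean_amplitudes(sounds):
    """Step 1: mean amplitude at each of the nine points over a set of sounds."""
    return [sum(s[i] for s in sounds) / len(sounds) for i in range(9)]


def rank_positions(model):
    """Map each point number (1..9) to its position (0..8) in a model."""
    return {int(d): pos for pos, d in enumerate(str(model))}


def model_distance(model_a, model_b):
    """Hypothetical comparison: sum of absolute rank differences (footrule)."""
    ra, rb = rank_positions(model_a), rank_positions(model_b)
    return sum(abs(ra[p] - rb[p]) for p in range(1, 10))


def classify(amplitudes, model_n, model_m):
    """Step 5 and the final comparison: pick the closer general model.

    Ties go to 'n' (an arbitrary choice made for this sketch).
    """
    new_model = ordinal_model(amplitudes)
    d_n = model_distance(new_model, model_n)
    d_m = model_distance(new_model, model_m)
    return "m" if d_n > d_m else "n"


# Invented toy amplitudes (dB) at nine points; not measurements from the paper.
n_sounds = [[50, 48, 40, 35, 30, 42, 38, 25, 20],
            [52, 46, 41, 33, 31, 44, 36, 27, 22]]
m_sounds = [[55, 30, 45, 40, 20, 25, 35, 38, 28],
            [53, 32, 47, 41, 22, 24, 33, 39, 26]]

model_n = ordinal_model(mean_amplitudes(n_sounds))  # general model for n
model_m = ordinal_model(mean_amplitudes(m_sounds))  # general model for m
new_sound = [51, 45, 41, 33, 29, 44, 37, 26, 21]
print(model_n, model_m, classify(new_sound, model_n, model_m))  # 126374589 134872965 n
```

With these toy values the new sound's model coincides with the /n/ model, so its distance to /n/ is 0 and it is classified as /n/. Once the general models exist, each new sound needs only one sort and one comparison per class. This is consistent with the source's remark that classification is much faster than with discriminant analysis.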
The describers of /n/ and /m/ are presented graphically in Fig. 1. The influence of the environment on the dental nasal (n) does not seem to be very great (Fig. 2; one male Finnish speaker). Second, the numerical describers were restricted so that only nine constant measurement points were considered. The nine points with the best discriminatory power were sought by means of PROGRAM II (below).</Paragraph> <Paragraph position="13"> Third, the 'general' numerical models for both phonemes were calculated on the basis of these nine points. The logic of the procedure is described briefly at the beginning of the program (PROGRAM III).</Paragraph> <Paragraph position="14"> With the same method the numerical model of a new nasal sound was calculated (PROGRAM III), and the nasal sound was classified by comparing its model with the general (mean) models of /n/ and /m/.</Paragraph> <Paragraph position="15"> The main idea of the classification is that the amplitudes at the nine measurement points are set in order of magnitude, and their relative places on the frequency axis are then indicated by means of ordinal numbers (nine possibilities). The ordinal numbers are placed one after another so that they form one single number. This number is handled as the numerical model of a group of nasal sounds or of a single nasal sound.</Paragraph> <Paragraph position="16"> With the method described here, the classification time for a sound is only a fraction of that needed with discriminant analysis.</Paragraph> <Paragraph position="17"> Final comments. Every language needs its own recognition program consisting of subprograms, which can be very different.
That a recognition program can be worked out at all implies that a sufficient amount of acoustic knowledge about the language in question is available.</Paragraph> <Paragraph position="18"> It is possible that complete speech recognition will not succeed with the computers now available, so that we must wait until biological computers are at our disposal. (continued after the programs)</Paragraph> </Section> </Paper>