File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/98/w98-0906_abstr.xml
Size: 27,067 bytes
Last Modified: 2025-10-06 13:49:32
<?xml version="1.0" standalone="yes"?> <Paper uid="W98-0906"> <Title>Loanword formation: a neural network approach</Title> <Section position="1" start_page="0" end_page="52" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> Loanword phonology seeks to model the process by which foreign words are 'nativised', or incorporated into the phonological system of the 'borrowing' language. We can conceive of this as a parsing of the phonetic input provided by the foreign word forms in accordance with the phonological output constraints of the borrowing language.</Paragraph> <Paragraph position="1"> Following Silverman (1992), we conceive of loanword formation as fundamentally a two-stage process. The first stage parses the phonetic input into segmentally organised phonetic feature bundles, interpretable as segmental targets in the borrowing language. In the second stage, these segmental targets are parsed into phonological structures (syllables, morae, feet, etc.) compatible with the word prosody of the borrowing language.</Paragraph> <Paragraph position="2"> Japanese borrowings from English provide a good test-bed for models of loanword formation: examples are abundant and, while the segmental mapping from English to Japanese is relatively straightforward, the word-level prosodies of the two languages are strikingly different, providing ample opportunity to observe prosodic restructuring in loanword formation.</Paragraph> <Paragraph position="3"> Phonologists have typically sought to construct 'symbolic' or 'analytical' parsing algorithms for this task, either with or without reference to a framework of learnability theory. This approach came under strong challenge from 'empirical' connectionist models of language processing in the 1980s.</Paragraph> <Paragraph position="4"> The debate between these competing paradigms, or the search for some suitable 'hybrid', continues. 
It may be argued that loanword formation provides a more restrictive, and hence better controlled, environment for studying parsing mechanisms than other natural language processing tasks, which involve a host of lexical, morphological, syntactic or pragmatic influences.</Paragraph> <Paragraph position="5"> In this paper, we report the results obtained from a feed-forward neural network (NN) trained on a corpus of 1100 American English loanwords 'borrowed' into Japanese after World War II. We make preliminary comparisons with an analytical, constraint-based approach to modeling loanword formation.</Paragraph> <Paragraph position="6"> An intended application of the NN parser was an English-Japanese translator for proper names and place names, which would map an English phoneme sequence into Katakana (e.g., Brisbane /brIzbOn/ => ブリズベン). The theoretical aim of this study is to investigate the learning mechanisms required for phonological parsing in loanword formation. Introduction The Japanese language has borrowed thousands of words from English, particularly since World War II, under the overwhelming economic and cultural influence of the United States. This massive borrowing over a comparatively short period provides a unique window on the processes of loanword formation. Borrowed words may reflect varying degrees of nativisation to the phonological patterns of the borrowing language.</Paragraph>
The main factor underlying poor recognition of the English source words appears to lie in the extensive resyllabification, involving vowel epenthesis, which is required to parse the segmental input into Japanese prosodic frames. Some examples are given below: orthographic phonemic Japanese Olympic olimpik oriNpikku truck trak torakku cut kat katto cud kad kado cart kart kaato cat kmt kJatto Japanese syllable structure is predominantly CV, lacking complex onsets and codas. To maintain faithful representation of the segment structure of the English source words, extensive use is made of the epenthetic vowels (chiefly, ha/, /o/ and /I/). The temporal structure of the English source word is converted to Japanese moraic timing. English tense (long) vowels and diphthongs are usually treated as two-mora units, whereas lax (short) vowels are assigned to a single mora. (Stress pattern in the English source word plays a moderating role. Tense vowels or diphthongs in unstressed syllables may emerge as one-mora vowels, as in Olympic \[ollmptk\].) English voiceless obstruents following a lax vowel are almost always treated as geminate (two mora) stops. But voiced obstruents usually do not geminate in this environment in loanword formation. Also, if the preceding vowel is tense (long) in the English source word, gemination of voiceless obstruents does not occur in the loanword. The reasons for these timing changes need not concern us here.</Paragraph> <Paragraph position="8"> The segmental mapping from the English source word to segments in the Japanese loans is basically one-for-one, without any feature exchange between adjacent segments. There are some palatalized consonants in Japanese borrowings which appear to 'pick up' the palatalization feature from the following vowel (see, cat above). However, this featureswapping between adjacent segments occurs only under quite restrictive conditions. Note that Japanese has no sound which corresponds to the low front English/~e/. 
We hypothesise that Japanese listeners respond to the 'palatal' quality of English /æ/ by relocating this feature onto the preceding consonant, thereby regularizing the foreign phonetic contrast to Japanese phonemics.</Paragraph> <Paragraph position="9"> The examples given above illustrate, within the limits of a phonemic transcription, the major phonological transformations required to parse English source words into loanwords in Japanese. Implicit in these transformations, though not explicitly captured by the phonemic representation, are the contrasting prosodic structures, such as syllable structure constraints, that motivate them.</Paragraph> <Paragraph position="10"> Modeling loanword formation In an 'analytic' account of loanword formation, these prosodic constraints would be explicitly represented in the form of rewrite rules, filters, or well-formedness constraints on particular aspects of prosodic structure. The output of parsing would be the assignment of an explicit prosodic phrase marker. By contrast, in an NN-based account of loanword formation, the details of prosodic feature assignment are not explicitly represented, but are typically regarded as emergent features, contained in the weight states of the trained network. The NN demonstrates acquisition of the prosodic constraints on word formation by being able to transform the phonetic segments and features of input source words into well-formed phonemic sequences in the borrowing language. (The further step of transforming Japanese phonemic representations into Katakana script is trivial.) In a complete model of loanword formation, it would of course be necessary to specify more precisely how the stage I extraction of phonetic segments and features from the speech signal is achieved, and what perceptual filtering is applied to the segmental phonetic properties of foreign words as they are processed through the ear of the native listener. 
These initial auditory representations are not, strictly speaking, sequences of phonemic segments or features. Rather, they are likely to be evanescent and somewhat fragmentary, made up of phoneme-like segments and 'foreign' phonetic features salient enough that the perceptual mechanism cannot ignore them.</Paragraph> <Paragraph position="11"> However, for the purposes of modeling the stage II processes of loanword formation, whether with an analytical or a statistical NN model, we assumed an initial segmental parsing of the input word into sequences of phoneme-like feature bundles (see Appendix I for the segments and features used).</Paragraph> <Paragraph position="12"> The Database We compiled a database of 1100 words from a dictionary of neologisms borrowed mainly from American usage in the post-war era (Bailey, 1962).</Paragraph> <Paragraph position="13"> The English phonemic transcriptions of these words were obtained from the Carnegie</Paragraph> <Section position="1" start_page="46" end_page="47" type="sub_section"> <SectionTitle> Mellon Pronouncing Dictionary </SectionTitle> <Paragraph position="0"> (ftp://ftp.cs.cmu.edu/project/fgdata/dict/), which generally reflects American rather than British pronunciation.</Paragraph> <Paragraph position="1"> Network architecture We use a two-layer feed-forward neural network with 65 inputs, 20 hidden units and 53 outputs. A featural representation is employed for the (English) input and a phonemic representation for the (Japanese) output. 
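A minimal pure-Python sketch of a 65-20-53 forward pass follows. It assumes logistic (sigmoid) units in both layers: the paper gives the layer sizes and a cross-entropy training criterion but no activation functions. It also anticipates the input window the paper describes next (13 features for the current phoneme and for two neighbours on each side), padding word edges with an empty symbol "_". The feature names and values are hypothetical placeholders for the set in Appendix I, filled in only for /k/, /æ/ (written ae) and /t/.

```python
import math
import random

N_FEATURES, WINDOW, N_HIDDEN, N_OUTPUTS = 13, 5, 20, 53

# Hypothetical 13 binary features; the real set is in the paper's Appendix I.
FEATURE_NAMES = ["consonantal", "syllabic", "sonorant", "voiced", "nasal",
                 "continuant", "labial", "coronal", "dorsal", "high",
                 "low", "back", "tense"]
# Hypothetical values for three phonemes; "_" pads word edges (all zeros).
FEATURES = {
    "_": set(),
    "k": {"consonantal", "dorsal", "high", "back"},
    "ae": {"syllabic", "sonorant", "voiced", "continuant", "low"},
    "t": {"consonantal", "coronal"},
}


def feature_vector(phoneme):
    active = FEATURES[phoneme]
    return [1.0 if name in active else 0.0 for name in FEATURE_NAMES]


def encode_windows(phonemes):
    """One 65-value input per phoneme: two left neighbours, itself, two right."""
    padded = ["_", "_"] + list(phonemes) + ["_", "_"]
    inputs = []
    for i in range(len(phonemes)):
        x = []
        for p in padded[i:i + WINDOW]:
            x.extend(feature_vector(p))
        inputs.append(x)
    return inputs


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    # Small random weights; the extra last entry in each row is the bias.
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def apply_layer(weights, x):
    # zip stops at len(x), so row[-1] is added separately as the bias.
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1])
            for row in weights]


rng = random.Random(0)
W_HIDDEN = init_layer(N_FEATURES * WINDOW, N_HIDDEN, rng)  # 65 -> 20
W_OUTPUT = init_layer(N_HIDDEN, N_OUTPUTS, rng)            # 20 -> 53


def forward(x):
    return apply_layer(W_OUTPUT, apply_layer(W_HIDDEN, x))


inputs = encode_windows(["k", "ae", "t"])   # windows /__kæt/, /_kæt_/, /kæt__/
print(len(inputs), len(inputs[0]), len(forward(inputs[0])))  # 3 65 53
```

The weights here are untrained; the sketch only fixes the shapes and data flow that back-propagation would then adjust.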
This architecture was inspired by NETtalk (Sejnowski and Rosenberg, 1987), although the task performed is, in a sense, the reverse of NETtalk's, since the latter used an orthographic representation for its input and a featural representation for its output.</Paragraph> <Paragraph position="2"> If the aim of the exercise were to model the way humans learn the task of loanword formation, it would be more appropriate to train the network on some task involving only the target language before testing it on input from the source language. In that scenario, the training task would presumably involve deriving the correct surface form from some hypothesised underlying structure. However, many choices and assumptions would have to be made to devise an appropriate framework and representation for these underlying structures. By avoiding such choices, our approach has the advantage of allowing phonological constraints to be studied in a more canonical context. But it means that the network is really performing a composite task, since it can make use of the constraints and statistics of the source language as well as those of the borrowing language.</Paragraph> <Paragraph position="3"> The 65 inputs are divided into 5 groups of 13, which encode the phonological features of the current phoneme, the two preceding phonemes and the two following phonemes (13 features × 5 phonemes = 65 inputs in all).</Paragraph> <Paragraph position="4"> The featural input representation has several advantages over a phonemic one:</Paragraph> <Paragraph position="6"> (1) it reduces the number of inputs; (2) features often influence the form of loanwords in a systematic way; and (3) the same word is often rendered differently in different dialects (for example, British vs. American English), and the featural representation is less sensitive to this variation than a phonemic one would be.</Paragraph> <Paragraph position="7"> Input and output phonemes do not always correspond one-to-one. 
In some cases a phoneme may be deleted, or it may have a consonant and/or a vowel appended to it. To allow for these possibilities, we divide the outputs of our network into three groups. The first group has one output for each possible phoneme (consonant or vowel); the second group has one output for each possible consonant; the third group has one output for each possible vowel. Each group has one additional output representing the 'empty' phoneme '_'. Since there are 20 consonants and 5 vowels in Japanese, the groups have 20 + 5 + 1 = 26, 20 + 1 = 21 and 5 + 1 = 6 outputs, for a total of 26 + 21 + 6 = 53.</Paragraph> <Paragraph position="8"> For example, consider the English word cat, which has the phonemic representation /kæt/ and becomes /kjatto/ in Japanese. The network views this as three separate training items: /__kæt/ → k j _; /_kæt_/ → a _ _; /kæt__/ → t t o.</Paragraph> <Paragraph position="10"> This means that the network, when presented with the features encoding the input /__kæt/, should be trained to produce an activation of 1.0 for the /k/ output of the first group, the /j/ output of the second group and the /_/ (empty) output of the third group (and an activation of 0.0 for the other 50 outputs).</Paragraph> <Paragraph position="11"> In the testing phase, the output with the largest activation is selected within each group, and this determines the three-phoneme sequence chosen by the network to correspond with the salient input phoneme.</Paragraph> <Paragraph position="12"> The networks were trained by back-propagation (Rumelhart et al., 1986) for 100 epochs, with a learning rate of 0.01 and a momentum of 0.9, using the cross-entropy minimization criterion.</Paragraph> </Section> <Section position="2" start_page="47" end_page="48" type="sub_section"> <SectionTitle> Results </SectionTitle> <Paragraph position="0"> Each of 11 networks was trained on 1000 words from the database and tested on the other 100 words. 
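This 11-way design amounts to standard cross-validation. The paper does not say how words were assigned to networks; taking every 11th word of a shuffled list, as in the sketch below, is an assumption chosen because it guarantees that each word lands in exactly one test set.

```python
import random


def cross_validation_folds(words, n_folds=11, seed=0):
    """Yield (train, test) lists; each word appears in exactly one test list."""
    order = list(words)
    random.Random(seed).shuffle(order)
    for fold in range(n_folds):
        test = order[fold::n_folds]
        train = [w for i, w in enumerate(order) if i % n_folds != fold]
        yield train, test


words = [f"word{i}" for i in range(1100)]   # stand-ins for the 1100 loanwords
folds = list(cross_validation_folds(words))
print(len(folds), len(folds[0][0]), len(folds[0][1]))  # 11 1000 100
```

Because 1100 is divisible by 11, every fold has exactly 100 test words and 1000 training words.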
Each word occurred in the test set of exactly one network.</Paragraph> <Paragraph position="1"> The 1100 words in the database had an average of 8.8 phonemes per word, making a total of 9658 input phonemes. Each of these input phonemes can produce output consisting of a head phoneme (group 1 outputs), plus an optional added consonant (group 2) and/or an optional added vowel (group 3).</Paragraph> <Paragraph position="2"> Figures 1-3 show the percentage of errors on the training and test sets for each of these three groups. In our data set, the head phoneme was nonempty 97% of the time, while the added consonant and added vowel were nonempty only 4% and 17% of the time, respectively, so the network error is much smaller for the latter two groups. After 30 epochs, the training and test errors for the head phoneme reach 7.4% and 9.9%, respectively.</Paragraph> <Paragraph position="3"> For the added consonant they reach 1.1% and 2%, respectively.</Paragraph> <Paragraph position="4"> For the added vowel they reach 1.1% and 1.6%, respectively. After this point, the training error continues to fall while the test error levels off. (Note: the test error was computed at the end of each epoch, while the training error was computed during the epoch. The training error may therefore exceed the test error in the first few epochs.)</Paragraph> </Section> <Section position="3" start_page="48" end_page="52" type="sub_section"> <SectionTitle> Error Analysis </SectionTitle> <Paragraph position="0"> An analysis was undertaken of all 'errors': cases where there was a discrepancy between the romanji (romanized Japanese) representation of a loanword and the phonemic representation assigned by the fully trained network, when the item in question was not included in the training set. These are summarised in Table 1. 
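Discrepancies of this kind are found by decoding the network's 53 outputs and comparing them with the romanji form. The sketch below shows the three-group coding described under Network architecture (one-hot targets within each group, and a winner-take-all choice per group at test time), together with a per-group error rate of the kind plotted in Figures 1-3. The 20 consonant symbols are a hypothetical placeholder, since the paper gives only the counts (20 consonants, 5 vowels).

```python
# Hypothetical symbols: the paper states 20 consonants and 5 vowels.
CONSONANTS = ["k", "g", "s", "z", "t", "d", "n", "h", "b", "p",
              "m", "r", "j", "w", "N", "S", "C", "J", "f", "ts"]
VOWELS = ["a", "i", "u", "e", "o"]
EMPTY = "_"

GROUPS = [CONSONANTS + VOWELS + [EMPTY],  # group 1: head phoneme (26 outputs)
          CONSONANTS + [EMPTY],           # group 2: added consonant (21)
          VOWELS + [EMPTY]]               # group 3: added vowel (6)


def encode_target(head, consonant, vowel):
    """1.0 for the chosen output in each group, 0.0 for the other 50."""
    target = []
    for symbols, chosen in zip(GROUPS, (head, consonant, vowel)):
        target.extend(1.0 if s == chosen else 0.0 for s in symbols)
    return target


def decode(activations):
    """Winner-take-all within each group of the 53 activations."""
    choice, start = [], 0
    for symbols in GROUPS:
        scores = activations[start:start + len(symbols)]
        best = max(range(len(symbols)), key=scores.__getitem__)
        choice.append(symbols[best])
        start += len(symbols)
    return tuple(choice)


def render(triples):
    """Concatenate decoded triples into a romanized word, dropping '_'."""
    return "".join(s for triple in triples for s in triple if s != EMPTY)


def group_error_rates(predicted, targets):
    """Fraction of input phonemes whose choice differs, for each group."""
    n = len(targets)
    return [sum(p[g] != t[g] for p, t in zip(predicted, targets)) / n
            for g in range(3)]


# The paper's example: cat /kæt/ -> kjatto as three training items.
cat = [("k", "j", EMPTY), ("a", EMPTY, EMPTY), ("t", "t", "o")]
outputs = [encode_target(*item) for item in cat]  # stand-in for network output
decoded = [decode(o) for o in outputs]
print(sum(len(g) for g in GROUPS), render(decoded))  # 53 kjatto
print(group_error_rates(decoded, cat))               # [0.0, 0.0, 0.0]
```

Here the targets themselves stand in for network activations, so decoding recovers them exactly; with a trained network, the per-group rates would be compared against the romanji forms.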
The 'error' categories are not necessarily mutually exclusive; nor do they necessarily indicate an error on the part of the network, but simply a discrepancy between the dictionary-based phonemicization (romanji transcription) and that assigned by the network.</Paragraph> <Paragraph position="1"> Discrepancies of schwa vowel colouring: The most common 'error', or discrepancy between the network-assigned Japanese phonemicization and the romanji dictionary form, involved vowels corresponding to a schwa [ə] in the English phonemic representations. Japanese has no equivalent to schwa, and romanji transcriptions of such vowels are guided by English spelling in the selection of an appropriate symbol. Because the network had no access to the orthographic representations of the English source words, these discrepancies were frequent.</Paragraph> <Paragraph position="2"> Discrepancies of vowel length: The second most frequent error involved discrepancies of vowel length. English tense vowels and diphthongs should be perceived as long (two-mora) vowels in Japanese.</Paragraph> <Paragraph position="3"> However, stress and position in the word may act as moderating influences. In primary stressed position English vowels are lengthened, while reduced vowels in unstressed syllables may be very short.</Paragraph> <Paragraph position="4"> Therefore, tense English vowels in unstressed position may not be perceived as long (bimoraic) by Japanese listeners.</Paragraph> <Paragraph position="5"> An analysis of discrepancies between the network predictions and the romanji vowel durations (Table 2) revealed that only 13% of cases (example [1]) could be accounted for by shortening of a tense vowel in an unstressed syllable.</Paragraph> <Paragraph position="6"> 
2. a k ut i b i t i i z u activities a k uC i b i t i z u ~ektivotiiz 42 39%
3. a n a r o J i i analogy a n a r o J i on~elo~ii 21 20%
4. a d ob a N t e eJ i advantage 3 03% a_ddob e N C i JJi odvaenticl3
5. a N C o b e anchovy 14 13% a N C o o b i i ~entJ'ouvii
6. C e kkuo f u checkoff C_e_k o_o_f u t~ekoof 8 07%
7. a u too b ufa SS o N out-of-fashion au ut oa b ufa SS o N autavf~eSan 3 03%
*Source of discrepancy indicated in bold.
In the majority of cases (59%), the discrepancy between the romanji and the network-assigned vowel length was caused by the network shortening phonemically long/tense vowels in final position (2[2]) or elsewhere in the word (2[3]). We are presently unable to account for this behaviour of the network. In 13% of cases (2[5]), vowel length discrepancies could be traced to irregular romanization by the compilers of the Japanese dictionary, and in 7% of cases (2[6]) to errors of vowel length phonemicization in the American dictionary. The influence of English spelling could be clearly seen in the romanji forms in 3% of cases: e.g., advantage => adobaNteeJi, because age => eeJi, though the pronunciation [ədvæntɪdʒ] indicates that the vowel is short/lax. In a small proportion of cases (3%, 2[7]), the vowel length discrepancy appeared to be attributable to the network's allowance of three-mora vowel sequences within a single syllable (super-heavy syllables), which Japanese syllable structure does not sanction.</Paragraph> <Paragraph position="7"> Gemination of obstruents: The basic rule of gemination for English loanwords is that voiceless obstruents geminate following short vowels in stressed syllables. Voiced stops also geminate, irregularly, in this environment. The most common error of gemination involved the network failing to geminate a voiceless obstruent in the expected environment (Table 3, example [1]). But the network also inappropriately produced geminates following an unstressed vowel (3[2]), though not consistently. We observed that gemination did not occur in romanji forms derived from a consonant cluster in the English word (e.g., the /k/ in vector does not geminate, becoming romanji /bekutoru/). 
However, this constraint was not respected by the network (3[3]). The network occasionally produced geminate consonants in connection with its propensity to shorten vowels (3[4]). Gemination of voiced obstruents was irregular in the romanized forms (3[5,7]), and consequently in the network output as well (3[4,5]). Nor was the gemination of voiceless obstruents entirely regular in the romanji forms (3[9]).</Paragraph> <Paragraph position="8"> 
1. a N r a kk i i unlucky-net a N r a k i i onlakiinet 18
2. a n e k ud o o t o anecdote a n e kkud o t o ~nokdout 12
3. b e k ut o r u vector b e kkut o a vektor 10
4. f ii d ob a kku feedback f i ddob a kku fiidbmk
5. b o bbus ur e e bob-sleigh 3 b o b us ur e e bobslei
6. a d ob a N s u advance a ddob a N s u adv~ns
7. a d or I b u ad-lib a ddor I b u aedlib
8. a_i_ky_a CC a a eye-catcher 2 a i kk a CC a a aik~etfer
9. fe t i S i z u m u fetishism fe tt i S i z u m u fetifizem
* The inferred site of error is indicated in bold.
Vowel epenthesis: Errors of vowel epenthesis are of particular interest in assessing the network's capacity to adapt to Japanese syllable structure.</Paragraph> <Paragraph position="9"> Epenthetic vowels are very frequent in English loanwords, averaging 1.28 per word in the romanji forms of the current data set (1.28 × 1100, or slightly over 1400 occurrences). However, only 39 discrepancies of vowel epenthesis were observed (an error rate of about 3%).</Paragraph> <Paragraph position="10"> Furthermore, the largest subcategory of epenthesis errors was related to word boundaries in compound forms (see Table 4).
desired output, English orthography, incidence / actual output, English phonemes
a t o m i kkue i J i   atomic-age / a t o m i kk e e J i   otomikeid33
d or a i b ui N   drive-in   11 / d or a i b i N   draivin
*Epenthesis associated with the boundary is indicated in bold.
The network, having no information about word boundaries, could not predict epenthetic vowels at word endings in compound forms. 
However, such boundaries are inherently problematic in loanword formation, as they may or may not be apparent to speakers of the borrowing language.</Paragraph> <Paragraph position="11"> The quality of the epenthetic vowel ([i], [u], [a], [e], [o]) was also well predicted by the network, with only 10 discrepancies (fewer than 1%), all caused by irregular romanization, some of it clearly related to English spelling.</Paragraph> <Paragraph position="12"> Diphthongs /ou/ and /eI/: The diphthongs /ou/ and /eI/ are usually represented as the long vowels 'oo' and 'ee' in romanji, but not consistently. In approximately 40% of cases, English /eI/ was rendered as romanji 'ei', and in 32% of cases English /ou/ was converted to the short vowel 'o'. These irregularities in the romanji representation of the diphthongs /eI/ and /ou/ are the probable cause of the occasional discrepancies with the network predictions. In summary, the network's performance in predicting the phonological forms of loanwords in Japanese was, on the whole, very good, except for vowel quality in reduced syllables of English words and the prediction of vowel length and consonant gemination in romanji forms. Except in the case of English vowels in reduced syllables, spelling was found to play a subordinate role to the phonological features of words in the source language. Some improvement in the prediction of vowel length, and substantial improvement in the prediction of consonant gemination, may be expected from giving the network access to the locus of primary stress in English words. We are currently investigating this.</Paragraph> <Paragraph position="13"> We confine discussion to observations on the main practical and theoretical objectives of this ongoing study. With respect to the goal of devising an English-Japanese translator for proper names and place names, which will convert English phonemic representations to romanji or kana forms, we find the results encouraging. 
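The final rendering step of such a translator, from romanized Japanese phonemes to Katakana, is (as the paper notes) essentially a table lookup. A minimal sketch follows, assuming the romanized notation of this paper's examples (N for the moraic nasal, doubled consonants for geminates, doubled vowels for long vowels, j for a palatalized onset) and a deliberately partial, hypothetical kana table; symbols such as S, C, J, f and w are not handled and raise an error.

```python
VOWELS = "aiueo"
VOWEL_KANA = ("ア", "イ", "ウ", "エ", "オ")
# Partial, hypothetical CV table: consonant -> kana for a, i, u, e, o.
# Loanword spellings ティ, トゥ, ディ, ドゥ are used for ti, tu, di, du.
CV = {
    "k": ("カ", "キ", "ク", "ケ", "コ"), "g": ("ガ", "ギ", "グ", "ゲ", "ゴ"),
    "s": ("サ", "シ", "ス", "セ", "ソ"), "z": ("ザ", "ジ", "ズ", "ゼ", "ゾ"),
    "t": ("タ", "ティ", "トゥ", "テ", "ト"), "d": ("ダ", "ディ", "ドゥ", "デ", "ド"),
    "n": ("ナ", "ニ", "ヌ", "ネ", "ノ"), "h": ("ハ", "ヒ", "フ", "ヘ", "ホ"),
    "b": ("バ", "ビ", "ブ", "ベ", "ボ"), "p": ("パ", "ピ", "プ", "ペ", "ポ"),
    "m": ("マ", "ミ", "ム", "メ", "モ"), "r": ("ラ", "リ", "ル", "レ", "ロ"),
}
SMALL_GLIDE = {"a": "ャ", "u": "ュ", "o": "ョ"}


def romaji_to_katakana(romaji):
    s = romaji + "##"               # sentinels: lookahead never runs off the end
    out, i, last_vowel = [], 0, None
    while s[i] != "#":
        c, nxt = s[i], s[i + 1]
        if c in VOWELS:
            # A repeated vowel is a long vowel: write the prolongation mark.
            out.append("ー" if c == last_vowel else VOWEL_KANA[VOWELS.index(c)])
            last_vowel, i = c, i + 1
        elif c == "N":
            out.append("ン")                        # moraic nasal
            last_vowel, i = None, i + 1
        elif c == nxt and c in CV and c != "n":
            out.append("ッ")                        # first half of a geminate
            last_vowel, i = None, i + 1
        elif c in CV and nxt == "j" and s[i + 2] in SMALL_GLIDE:
            v = s[i + 2]
            out.append(CV[c][1] + SMALL_GLIDE[v])  # e.g. k+j+a -> キャ
            last_vowel, i = v, i + 3
        elif c in CV and nxt in VOWELS:
            out.append(CV[c][VOWELS.index(nxt)])
            last_vowel, i = nxt, i + 2
        elif c == "n":
            out.append("ン")                        # n before a consonant or at the end
            last_vowel, i = None, i + 1
        else:
            raise ValueError(f"cannot syllabify {romaji!r} at position {i}")
    return "".join(out)


for word in ("oriNpikku", "torakku", "katto", "kado", "kaato", "kjatto"):
    print(word, romaji_to_katakana(word))
# prints each word with its kana, e.g. "kjatto キャット"
```

Checking geminates before CV pairs lets the doubled consonant in katto produce ッ and then カ/ト in turn; the trailing "##" sentinels keep every lookahead inside the string without explicit bounds checks.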
With the inclusion of primary stress, and access to orthographic representations for predicting vowel quality in reduced syllables, it should be possible to obtain near-optimal prediction of romanji or kana forms, allowing for the degree of indeterminacy present in loanword dictionary entries. Precisely how English orthographic information can be incorporated as required is a problem that we have not yet addressed. For the data set of our study (post-war borrowings of general lexical items), it is clear that English spelling plays a strictly subordinate role to the phonetic form as perceived by the Japanese listener. There is typically much disagreement amongst Japanese lexicographers on the romanji or kana representations of English place names, probably because lack of exposure to the spoken form promotes greater reliance on the (highly irregular) English orthographic representation.</Paragraph> <Paragraph position="14"> With respect to the theoretical goal of the study, an anonymous reviewer made the following astute observation: "If the intent is to 'investigate the learning mechanisms required for phonological parsing in loanword formation,' then training on a corpus of loanwords is a surprising choice, since it is usually assumed that loanword phonology is not learned separately but is a side effect of having trained on the internal phonology of the target language". In other words, there may be an irreconcilable conflict between the engineering and scientific goals of the study. The most direct approach to the problem from an engineering perspective is to train the neural network to perform the mapping between English phonemic and Japanese romanji representations. 
But this seems clearly the wrong approach from the perspective of psycholinguistic modeling of loanword formation, where we postulate an initial stage of segmental phonetic mapping, subsequently constrained by the word prosody of the borrowing language.</Paragraph> <Paragraph position="15"> It is certainly true that our NN modeling has simplified at least some of the processing that we hypothesise takes place, first of all by accepting (American) English phonemic representations as input, where our model postulates quasi-phonemic segmental representations in the borrowing language. We have, in other words, fudged a level of phonetic-to-phonemic segmental mapping. This has had a demonstrable but minor impact on the accuracy of the network.</Paragraph> <Paragraph position="16"> However, the more serious objection, that the network should have been trained on native Japanese words, may be addressed in the following way. From the perspective of constraint satisfaction, we require an input that highly over-generates with respect to the target forms, but which nevertheless contains all the segmental features to which the output should strive to be faithful. We have not attempted to simulate directly the generative capacity of GEN (the candidate-generation component of Optimality Theory), but have rather provided a mechanism which ensures that segmental faithfulness is met, within the over-generative capacities of the three-segment window of the NN architecture. In other words, we argue that by training on the English-Japanese mapping, we have indirectly or approximately simulated the over-generative capacity of GEN, while providing the network with all (and only) the segmental phonetic input to which it is required to be faithful. 
It is not clear to us how this could be accomplished by training exclusively within the target language.</Paragraph> <Paragraph position="17"> However, even if this argument carries weight, the second theoretical leg of this study remains to be undertaken: the construction of an analytical (rule-based) competitor to the NN model, and the systematic testing of both models against the intuitions of native Japanese speakers. We are grateful for the assistance of Chiharu Tsuratani in providing native speaker assessments of the NN responses; this testing needs to be pursued more systematically in further investigations.</Paragraph> </Section> </Section> </Paper>