File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/06/w06-2003_evalu.xml

Size: 8,829 bytes

Last Modified: 2025-10-06 13:59:49

<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2003">
  <Title>Induction of Cross-Language Affix and Letter Sequence Correspondence</Title>
  <Section position="6" start_page="20" end_page="22" type="evalu">
    <SectionTitle>
5 Results
</SectionTitle>
    <Paragraph position="0"> We have run the algorithm on several language pairs that use affixal morphology and the Latin alphabet: English vs. Spanish, Portuguese and Italian, and Spanish vs. Portuguese. All of them are related both historically and through borrowing (to varying degrees), so we expect a relatively large number of correspondence phenomena.</Paragraph>
    <Paragraph position="1"> Testing results for one of these pairs, English-Spanish, are presented in this section.</Paragraph>
    <Paragraph position="2"> The input word pair set was created from a bilingual dictionary (Freelang04) by taking all translations of single English words to single Spanish words, generating about 13,000 word pairs.</Paragraph>
    <Paragraph position="3"> Individual letter mapping. The cost matrix after EM convergence (25 iterations) exhibits the following phenomena (e:s (c) denotes that the final cost of replacing the English letter e by the Spanish letter s is c): (1) English letters mostly map to identical Spanish letters, apart from letters that Spanish does not make use of, like k and w; (2) some English vowels map frequently to some Spanish vowels: y maps almost exclusively to i (0.01), e:a (0.47) is highly productive, e:o (0.98), i:e (0.97); (3) some English consonants map to different Spanish ones: t:c (0.89) (due to an affix, -tion:-cion); m:n (0.44) is highly frequent; b:v (0.80); x:j (0.78), x:s (0.94); w always maps to v; j:y (0.11); (4) h usually disappears, h:NULL (0.13); and (5) inserted Spanish letters include the vowels o, e, a and i, in that order, where o overwhelms the others. The English o maps exclusively to the Spanish o and not to other vowels.</Paragraph>
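As a rough illustration of how a letter cost table of this kind can be used, the sketch below plugs the costs quoted above into a standard weighted edit distance. This is a generic dynamic program, not necessarily the procedure used in the paper. Two values are assumptions that the text does not state: identical letters are given cost 0, and every pair not quoted above is given a default cost of 1.0.

```python
# Costs quoted in the text: (English letter, Spanish letter) -> cost.
# None stands for NULL, i.e. the English letter is dropped.
REPORTED = {
    ("y", "i"): 0.01, ("e", "a"): 0.47, ("e", "o"): 0.98, ("i", "e"): 0.97,
    ("t", "c"): 0.89, ("m", "n"): 0.44, ("b", "v"): 0.80, ("x", "j"): 0.78,
    ("x", "s"): 0.94, ("j", "y"): 0.11, ("h", None): 0.13,
}
DEFAULT = 1.0  # assumption: cost of any substitution, insertion or deletion not quoted


def cost(e, s):
    """Cost of mapping English letter e to Spanish letter s (None means NULL)."""
    if e is not None and e == s:
        return 0.0  # assumption: identical letters are free
    return REPORTED.get((e, s), DEFAULT)


def weighted_distance(english, spanish):
    """Cheapest total cost of turning `english` into `spanish`."""
    n, m = len(english), len(spanish)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # delete every English letter
        d[i][0] = d[i - 1][0] + cost(english[i - 1], None)
    for j in range(1, m + 1):          # insert every Spanish letter
        d[0][j] = d[0][j - 1] + cost(None, spanish[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + cost(english[i - 1], spanish[j - 1]),  # substitute or match
                d[i - 1][j] + cost(english[i - 1], None),                # delete
                d[i][j - 1] + cost(None, spanish[j - 1]),                # insert
            )
    return d[n][m]


print(weighted_distance("harpist", "arpista"))
```

Under these assumptions the cheap h:NULL entry makes dropping the initial h of harpist far less costly than any substitution, which matches the observation that h usually disappears.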
    <Paragraph position="4"> Affixes. Table 1 shows some of the conspicuous affix pairs discovered by the algorithm. We show both the number of witnesses and of squares.</Paragraph>
    <Paragraph position="5"> The table shows many interesting correspondence phenomena. However, discussing those in depth from a linguistic point of view is outside the scope of this paper. Some notes: (1) some of the most frequent affix pairs are not that close orthographically: -ity:-idad, -ness:- (nouns), -ate:-ar (verbs), -ly:-mente (adverbs), -al:-o (adjectives), so they will not necessarily be found using ordinary edit distance methods; (2) some affixes are ranked high both with and without a letter that they favor when attaching to a stem: -ation:-acion, -ate:-ar; (3) some English suffixes map strongly to several Spanish ones: -er:-o, -er:-ador. Recall that the table cannot include inflectional affixes, since our input was taken from a bilingual dictionary, not from a text corpus.</Paragraph>
    <Paragraph position="6"> Letter sequences. Table 2 shows some nice pairings, stemming from all three expected phenomena: st-:est- (due to phonology), ph:f, th:t, ll:l (due to orthography), and tion:cion, tia:cia (due to morphology: affixes located in the middle of words). Such affix and letter sequence pairing results can clearly be useful for English speakers learning Spanish (and vice versa), for remembering words by associating them with known ones, for avoiding spelling mistakes, and for analyzing previously unseen words.</Paragraph>
    <Paragraph position="7"> Evaluation. An unsupervised learning model can be evaluated on the strength of the phenomena that it discovers, on its predictive power for unseen data, or by comparing its data analysis results with results obtained using other means. We have performed all three evaluations.</Paragraph>
    <Paragraph position="8"> For evaluating the discovered phenomena, a repository of known phenomena is needed. The only such repositories of which we are aware are language learning texts. Unfortunately, the phenomena these present are limited to the few most conspicuous pairs (e.g., -ly:-mente, -ity:-idad, ph:f), all of which are easily discovered by our model. The next best thing is studies that present data of a single language. We took the affix information given in a recent, highly detailed, corpus-based English grammar (Biber99), and compared it manually to ours. Of the 35 most productive affixes, our model finds 27. Careful study of the word pair list showed that the remaining 8 (-ment, -ship, -age, -ful, -less, -en, dis-, mis-) indeed do not map to Spanish ones frequently. Note that some of those are extremely frequent inside English yet do not correspond significantly with any Spanish affix.</Paragraph>
    <Paragraph position="9"> As a second test, we took a comprehensive English-Spanish dictionary (Collins), selected 10 pages at random (out of 680), studied them, and listed the prominent word form phenomena (85).</Paragraph>
    <Paragraph position="10"> All but one (the verbal suffix in seduce:seducir) were found by our model.</Paragraph>
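The two recall figures implied by these counts can be worked out directly. The snippet below is plain arithmetic on the numbers quoted above; the percentages themselves are not stated in the text, and the function and variable names are only illustrative.

```python
def recall(found, total):
    """Fraction of the known phenomena that the model recovered."""
    return found / total


# Biber99 comparison: 27 of the 35 most productive English affixes were found.
biber_recall = recall(27, 35)

# Collins sample: all but one of the 85 listed phenomena were found.
collins_recall = recall(85 - 1, 85)

print(f"Biber99: {biber_recall:.1%}, Collins sample: {collins_recall:.1%}")
```

The first figure is best read as a lower bound on usefulness, since the 8 missing English affixes are reported not to map frequently to Spanish ones.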
    <Paragraph position="11"> The numbers reported above for the two tests are recall numbers. To evaluate affix precision, we have manually graded the top 100 affix pairs (as sorted at the end of stage 2 of the algorithm). Eight of those were clearly not affixes; however, 3 of the 8 (-t:-te, -t:-to, -ve:-vo) were important phonological phenomena that should indeed appear in our final model. Of the remaining 92, 15 were valid but 'duplicates' in the sense of being sub-strings of other affixes (e.g., -ly:-mente, -ly:-emente). In the next 50 pairs, only 6 were clearly not affixes. Note that by their very definition, we should not expect the number of frequent derivational affixes to be very large, so there is not much point in looking further down the list.</Paragraph>
    <Paragraph position="12"> Nonetheless, inspection of the rest of the list reveals that it is not dominated by noise but by duplicates, with many specialized, less frequent affixes (e.g., -graphy:-grafia) being discovered.</Paragraph>
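The affix precision counts above can be tallied in a few lines. Everything below is derived arithmetically from the quoted counts; the labels "strict" and "lenient" are introduced here for illustration and are not terms used in the paper.

```python
# Counts reported in the text for the top 100 ranked affix pairs.
TOP = 100
NOT_AFFIXES = 8      # clearly not affixes
PHONOLOGICAL = 3     # of those 8, important phonological phenomena
DUPLICATES = 15      # valid, but sub-strings of other affixes

# Counts reported for the next 50 ranked pairs.
NEXT = 50
NEXT_NOT_AFFIXES = 6


def share(part, whole):
    """Fraction of `whole` made up by `part`."""
    return part / whole


valid = TOP - NOT_AFFIXES                        # valid affix pairs in the top 100
strict = share(valid, TOP)                       # count affixes only
lenient = share(valid + PHONOLOGICAL, TOP)       # also count the phonological pairs
distinct = valid - DUPLICATES                    # valid pairs that are not duplicates
next_block = share(NEXT - NEXT_NOT_AFFIXES, NEXT)

print(f"strict={strict:.2f} lenient={lenient:.2f} "
      f"distinct={distinct} next50={next_block:.2f}")
```

Which of these figures is the most appropriate precision measure depends on whether duplicates and phonological pairs are counted as useful output, a choice the text leaves open.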
    <Paragraph position="13"> Regarding letter sequences, precision was very high: of the 38 different pairs discovered, only one (hr:r) was not regular, and there were 11 duplicates. Recall was impressive, but harder to verify due to the lack of standards. We found only one (not very frequent) pair that was not discovered (-sp:-esp).</Paragraph>
    <Paragraph position="14"> To evaluate the model on its data analysis capability, we took out 100 word pairs at random, trained the model without them, analyzed them using the final cost function, and compared the analyses with prominent phenomena noted manually (again, we had to grade manually due to the lack of a gold standard). The model identified those prominent phenomena (including a total lack thereof) in 91 of the pairs. Notable failures included the pairs superscribe : sobrescribir and coded : codificado, where none of the prefixes and suffixes were identified. Some successful examples are listed below (affixes are denoted by [], sequences by &lt;&gt;, and insertions by _: or :_):
installation : instalacion. &lt;ll:l&gt;, [ation:acion]
volution : circonvolucion. _:c, _:i, _:r, _:c, _:o, _:n, [tion:cion]
intelligibility : inteligibilidad. [in:in], &lt;ll:l&gt;, [ity:idad]
sapper : zapador. &lt;s:z&gt;, &lt;pp:p&gt;, [er:ador]
harpist : arpista. &lt;h:_&gt;, [ist:ista]
pathologist : patologo. &lt;th:t&gt;, [ist:o]
elongate : prolongar. [te:r]
industrialize : industrializar. [in:in], &lt;ial&gt;, [e:ar]
demographic : demografico. &lt;ph:f&gt;, [ic:ico]
gynecological : ginecologico. &lt;yn:in&gt;, [ical:ico]
peeled : pelado. [ed:ado]
The third and final evaluation method is to compare the model's results with results obtained using other means. We are not aware of any data bank in which cross-language affix or letter sequence correspondences are explicitly tagged, so we used a relatively simple algorithm as a baseline. We invoked the squares method for each language independently, ending up with affix candidates. For every word pair E:S, if E contains an affix candidate C and S contains an affix candidate D, we increment the count of the candidate affix pair C:D. Finally, we sort the candidates according to their count.</Paragraph>
    <Paragraph position="15"> Baseline recall is obviously as good as that of our algorithm (the baseline produces a superset), but its precision is so poor as to render the method useless: out of the first 100 pairs, only 19 were affixes, the rest being made up of noise and badly segmented 'duplicates'.</Paragraph>
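The counting step of the baseline can be sketched as follows. The squares method itself is not reproduced here: the per-language affix candidates are simply passed in as lists. The affix notation ('-ity' for a suffix, 'in-' for a prefix), the reading of "contains" as a prefix or suffix match, and the toy word pairs are all assumptions made for this illustration.

```python
from collections import Counter


def contains(word, affix):
    """True if `word` carries `affix`, written '-ity' (suffix) or 'in-' (prefix)."""
    if affix.startswith("-"):
        return word.endswith(affix[1:])
    return word.startswith(affix[:-1])


def baseline_pairs(word_pairs, english_affixes, spanish_affixes):
    """Count every co-occurring candidate pair C:D and sort by count."""
    counts = Counter()
    for e_word, s_word in word_pairs:
        for c in english_affixes:
            if not contains(e_word, c):
                continue
            for d in spanish_affixes:
                if contains(s_word, d):
                    counts[(c, d)] += 1
    return counts.most_common()


# Toy input built from word pairs quoted in this section.
pairs = [
    ("installation", "instalacion"),
    ("intelligibility", "inteligibilidad"),
    ("sapper", "zapador"),
    ("peeled", "pelado"),
]
english = ["-ation", "-ity", "-er", "-ed", "in-"]
spanish = ["-acion", "-idad", "-ador", "-ado", "in-"]

for (c, d), n in baseline_pairs(pairs, english, spanish):
    print(f"{c}:{d} {n}")
```

Even on this tiny input the counts include spurious pairs such as -ation:in-, because any two candidates that happen to occur in the same word pair are credited. This is consistent with the low baseline precision reported above.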
    <Paragraph position="16"> In summary, the results are good, but gold standards are needed for a more consistent evaluation of different cross-language word form algorithms. Results for the other language pairs were overall good as well.</Paragraph>
  </Section>
</Paper>