<?xml version="1.0" standalone="yes"?> <Paper uid="E91-1019"> <Title>AUTOMATIC LEARNING OF WORD TRANSDUCERS FROM EXAMPLES</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> Several tasks associated with electronic lexicons may be viewed as transductions between character strings. This may be the decomposition of words into morphemes in morphology or the grapheme-to-phoneme transcription in phonology. In the first case. one has for example to decompose the French word &quot;chronom~trage&quot; into the sequence of affixes &quot;chrono+m~tre+er/age&quot;. In the second, &quot;abstenlr&quot; should be translated into &quot;abstoniR&quot;,or &quot;apstoniR&quot; I. Most of the proposed methods in the IThese two tasks are in fact closely related in that {I) the correct phoneme transcription may mirror an underlying morphological structure, like for &quot;asoc/a/&quot; whose phonemic form is &quot;asos jal&quot; rather than &quot;azosjal&quot; due to the decomposition &quot;a+soclal&quot;, and (2) the surface form of a derived word may depend on the pronunclaUon of its component morphemes, llke for &quot;d~+harnacher&quot; which results in &quot;d~harnacher&quot; and not &quot;d~sharnachet&quot;.</Paragraph> <Paragraph position="1"> domain (Catach 1984; Danlos et al. 1986; Koskenniemi 1983; Laporte 1988; Ritchle et al. 1987, Tufts 1989; V6ronls 1988) are based on the availability of local rules whose combination, either through direct interpretation or by being compiled, form the target transducer.</Paragraph> <Paragraph position="2"> Although these methods make it possible - at least in theory - to design suitable transducers, provided that the rule descrlpUon language has the right expressive power, they are complex to use because of the difficulty of writing down rules. Moreover, for a given rule language, there may not exist an algorithm for compiling rules into a form better suited to the translation process. Lastly, in numerous cases, the translation procedures are improperly deterministic as shown by the example of&quot;abstcnlf so that it is not possible to consider several competing hypotheses in parallel not to speak of ranking them according to some certainty factor.</Paragraph> <Paragraph position="3"> We have designed a program which allows to construct transducers without retaming the above shortcomings. It is no longer necessary to write down translation rules since the transducer is obtained as the result of an automatic learning over a set of examples. The transducer is represented into the language of probabillstic finite state automata (Markov models\] so that its use is straightforward. Lastly, tt produces results which are assigned a probability and makes it possible to llst them by decreasing order of likelihood.</Paragraph> <Paragraph position="4"> After stating the problem of character strings translation and defining the few - 107 central notions of markovian learnJng, this paper describes their adaptation to the word translation problem in the learning and translating phases. This adaptation is illustrated through two applications: morphological analysis and grapheme-to-phoneme transcription.</Paragraph> </Section> class="xml-element"></Paper>