<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0109"> <Title>Multilingual Noise-Robust Supervised Morphological Analysis using the WordFrame Model</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Previous Work </SectionTitle> <Paragraph position="0"> The development of the WordFrame model was motivated by work originally presented in Yarowsky and Wicentowski (2000). In that work, a suite of unsupervised learning algorithms and a supervised morphological learner were co-trained to achieve high accuracies for English and Spanish verb inflections. The supervised learner employed a naïve approach to morphology and was capable of learning only word-final stem changes between inflections and roots. This &quot;end-of-string model&quot; of morphology was used again in Yarowsky et al. (2001), where it was applied to English, French and Czech. (More complete details of the end-of-string model are presented in Section 3.3.1.) Though simplistic, the end-of-string model is robust to noise, which is especially important when co-training with low-accuracy unsupervised learners. However, the end-of-string model relied heavily upon externally provided, noise-free lists of affixes in order to correctly align inflections to roots. The WordFrame model allows, but does not require, such affix lists, thereby eliminating the need for direct human supervision.</Paragraph> <Paragraph position="1"> Much previous work has addressed the automatic acquisition of such affix lists, most recently the generative models built by Snover and Brent (2001), which are able to identify suffixes in English and Polish. Schone and Jurafsky (2001) use latent semantic analysis to find prefixes, suffixes and circumfixes in German, Dutch and English. 
Baroni (2003) treats morphology as a data compression problem to find English prefixes.</Paragraph> <Paragraph position="2"> Goldsmith (2001) uses minimum description length to successfully find paradigmatic classes of suffixes in a number of European languages, including Dutch and Russian, though the approach has been less successful in handling prefixation.</Paragraph> <Paragraph position="3"> The Boas project (Oflazer and Nirenburg, 1999; Hakkani-Tür et al., 2000; Oflazer et al., 2001) has produced excellent results in bootstrapping a morphological analyzer, but it relies on direct human supervision to produce two-level rules (Koskenniemi, 1983), which are then compiled into a finite state machine.</Paragraph> </Section> </Paper>