<?xml version="1.0" standalone="yes"?> <Paper uid="P94-1013"> <Title>DECISION LISTS FOR LEXICAL AMBIGUITY RESOLUTION: Application to Accent Restoration in Spanish and French</Title> <Section position="5" start_page="88" end_page="89" type="relat"> <SectionTitle> PREVIOUS WORK </SectionTitle> <Paragraph position="0"> The problem of accent restoration in text has received minimal coverage in the literature, especially in English, despite its many interesting aspects. Most work in this area appears to be done in the form of in-house or commercial software, so for the most part the problem and its potential solutions are without comprehensive published analysis. The best treatment I've discovered is from Fernand Marty (1986, 1992), who for more than a decade has been painstakingly crafting a system which includes accent restoration as part of a comprehensive system of syntactic, morphological and phonetic analysis, with an intended application in French text-to-speech synthesis. He incorporates information extracted from several French dictionaries and uses basic collocational and syntactic evidence in hand-built rules and heuristics. While the scope and complexity of this effort are remarkable, this paper will focus on a solution to the problem which requires considerably less effort to implement.</Paragraph> <Paragraph position="1"> The scope of work in lexical ambiguity resolution is very large. Thus, in the interest of space, discussion will focus on the direct historic precursors and sources of inspiration for the approach presented here. The central tradition from which it emerges is that of the Bayesian classifier (Mosteller and Wallace, 1964). This was expanded upon by Gale et al. (1992), and in a class-based variant by Yarowsky (1992). Decision trees (Brown, 1991) have been usefully applied to word-sense ambiguities, and HMM part-of-speech taggers (Jelinek 1985, Church 1988, Merialdo 1990) have addressed the syntactic ambiguities presented here. 
Hearst (1991) presented an effective approach to modeling local contextual evidence, while Resnik (1993) gave a classic treatment of the use of word classes in selectional constraints. An algorithm for combining syntactic and semantic evidence in lexical ambiguity resolution has been realized by Chang et al. (1992). A particularly successful algorithm for integrating a wide diversity of evidence types using error-driven learning was presented by Brill (1993). While it has been applied primarily to syntactic problems, it shows tremendous promise for equally impressive results in the area of semantic ambiguity resolution. [Footnote 2] Such a tool would be particularly useful for typing Spanish or French on Anglo-centric computer keyboards, where entering accents and other diacritic marks every few keystrokes can be laborious.</Paragraph> <Paragraph position="2"> The formal model of decision lists was presented in (Rivest, 1987). I have restricted feature conjuncts to a much narrower complexity than allowed in the original model, namely to word and class trigrams. The current approach was initially presented in (Sproat et al., 1992), applied to the problem of homograph resolution in text-to-speech synthesis. The algorithm achieved 97% mean accuracy on a disambiguation task involving a sample of 13 homographs 3.</Paragraph> </Section> </Paper>