<?xml version="1.0" standalone="yes"?>
<Paper uid="P94-1013">
  <Title>DECISION LISTS FOR LEXICAL AMBIGUITY RESOLUTION: Application to Accent Restoration in Spanish and French</Title>
  <Section position="7" start_page="92" end_page="93" type="evalu">
    <SectionTitle>
EVALUATION
</SectionTitle>
    <Paragraph position="0"> Because we have only stripped accents artificially for testing purposes, and the &amp;quot;correct&amp;quot; patterns exist on-line in the original corpus, we can evaluate performance objectively and automatically. This contrasts with other classification tasks such as word-sense disambiguation and part-of-speech tagging, where at some point human judgements are required. Regrettably, however, there are errors in the original corpus, which can be quite substantial depending on the type of accent. For example, in the Spanish data, accents over the i (1) are frequently omitted; in a sample test 3.7% of the appropriate i accents were missing. Thus the following results must be interpreted as agreement rates with the corpus accent pattern; the true percent correct may be several percentage points higher.</Paragraph>
    <Paragraph position="1"> The following table gives a breakdown of the different types of Spanish accent ambiguities, their relative frequency in the training corpus, and the algorithm's performance on each: 1deg  As observed before, the prior probabilities in favor of the most common accent pattern are highly skewed, so one does reasonably well at this task by always using the most common pattern. But the error rate is still  roughly 1 per every 75 words, which is unacceptably high. This algorithm reduces that error rate by over 65%. However, to get a better picture of the algorithm's performance, the following table gives a breakdown of results for a random set of the most problematic cases - words exhibiting the largest absolute number of the non-majority accent patterns. Collectively they constitute the most common potential sources of error.</Paragraph>
    <Paragraph position="2">  Evaluation is based on the corpora described in the algorithm's Step 2. In all experiments, 4/5 of the data was used for training and the remaining 1/5 held out for testing. More accurate measures of algorithm performance were obtained by repeating each experiment 5 times, using a different 1/5 of the data for each test, and averaging the results. Note that in every experiment, results were measured on independent test data not seen in the training phase.</Paragraph>
    <Paragraph position="3"> It should be emphasized that the actual percent correct is higher than these agreement figures, due to errors in the original corpus. The relatively low agreement rate on words with accented i's (1) is a result of this. To study this discrepancy further, a human judge fluent in Spanish determined whether the corpus or decision list algorithm was correct in two cases of disagreement. For the ambiguity case of mi/ml, the corpus was incorrect in 46% of the disputed tokens. For the ambiguity anuncio/anunciS, the corpus was incorrect in 56% of the disputed tokens. I hope to obtain a more reliable source of test material. However, it does appear that in some cases the system's precision may rival that of the AP Newswire's Spanish writers and translators.</Paragraph>
  </Section>
class="xml-element"></Paper>