<?xml version="1.0" standalone="yes"?>
<Paper uid="X96-1053">
  <Title>MITRE: DESCRIPTION OF THE ALEMBIC SYSTEM AS USED IN MET</Title>
  <Section position="4" start_page="461" end_page="461" type="evalu">
    <SectionTitle>
RESULTS
</SectionTitle>
    <Paragraph position="0"> The preliminary nature of the MET task precludes formulating a full assessment of our system's performance. Nevertheless, we are pleased with our early results. Alembic either exceeded or came near matching its performance on the English name-tagging task in MUC-6. The chart in Fig. 2 shows the relative rankings of the four languages (solid bars indicate training, and shaded ones formal testing).</Paragraph>
    <Paragraph position="1"> These results show gaps between training and testing performance, especially in the two Asian languages. Part of these differences can be attributed to inconsistencies that were eventually detected in the final test data. This may account for much of the 10% training-to-testing gap in Chinese. Indeed, on a held-out development test set, Chinese performance was virtually identical to that on the development training set; the learning procedure had thus acquired a very predictive model of the development data overall. However, since the tagging conventions on the formal test set were not wholly consistent with those in the training set, the performance of the model could only be expected to decrease in the final evaluation. For Japanese, a similar problem arose because refinements to the guidelines over the course of MET development were not reflected in the development data set. Since our Japanese developer could not actually read most of the Japanese material, he could only interpret changes to the guidelines insofar as they were incorporated in the training set.</Paragraph>
    <Paragraph position="2"> As the guidelines and training set drifted further apart, this led increasingly to the same inconsistencies we experienced with Chinese.</Paragraph>
    <Paragraph position="3"> We should not let these error analyses obscure Alembic's achievements, however. The system garnered commendable scores on all three languages, despite its developers having at best passing fluency in these languages, and in one case no knowledge of the language at all. We think this success is due to several factors.</Paragraph>
    <Paragraph position="4"> First, the inherent speed of the system (25,000-30,000 words per minute) enables a rapid-evaluation methodology. For manual engineering, this allows changes in the model to be implemented and tested efficiently. Second, Alembic supports the developer through a growing suite of tools, chief among them the phrase rule learner. Finally, we owe the bulk of the system's success to the underlying framework with its emphasis on sequences of simple rules.</Paragraph>
  </Section>
</Paper>