<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2116">
  <Title>Empirical Acquisition of Differentiating Relations from Definitions</Title>
  <Section position="5" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> The evaluation discussed here assesses the quality of the information that would be added to the lexicons with respect to relation disambiguation, which is the focus of the research. An application-oriented evaluation is discussed in (O'Hara, forthcoming), showing how using the extracted information improves word-sense disambiguation.</Paragraph>
    <Paragraph position="1"> All the definitions from WordNet 1.7.1 were run through the differentia-extraction process. This involved 111,223 synsets, of which 10,810 had preprocessing or parse-related errors leading to no relations being extracted. Table 1 shows the frequency of the relations in the output from the differentia extraction process. The most common relation used is Theme, which occurs four times as much compared to the annotations. It is usually annotated as the sense for 'of,' which also occurs with roles Source, Category, Ground, Agent, Characteristic,andExperiencer. Some of these represent subtle distinctions, so it is likely that the difference in the text genre is causing the classifier to use the default more often.</Paragraph>
    <Paragraph position="2"> Four human judges were recruited to evaluate random samples of the relations that were extracted. To allow for inter-coder reliability analysis, each evaluator evaluated some samples that were also evaluated by the others, half as part of a training phase and half after training. In addition, they also evaluated a few samples that were manually corrected beforehand. This provides a baseline against which the uncorrected results can be measured against. Because the research only addresses relations indicated by prepositional phrases, the evaluation is restricted to these cases. Specifically, the judges rate the assignment of relations to the prepositional phrases on a scale from 1 to 5, with 5 being an exact match.</Paragraph>
    <Paragraph position="3"> The evaluation is based on averaging the assess- null tracted relationships. 25 relationships were each evaluated by 4 judges. Mean gives the mean of the assessment ratings (from 1 to 5). Score gives ratings relative to scale from 0 to 1.</Paragraph>
    <Paragraph position="4"> ment scores over the relationships. Table 2 shows the results from this evaluation, including the manually corrected as well as the uncorrected subsets of the relationships. For the corrected output, the mean assessment value was 3.225, which translates into an overall score of 0.60. For the uncorrected system output, the mean assessment value was 3.033, which translates into an overall score of 0.58. Although the absolute score is not high, the system's output is generally acceptable, as the score for the uncorrected set of relationships is close to that of the manually corrected set.</Paragraph>
  </Section>
class="xml-element"></Paper>