<?xml version="1.0" standalone="yes"?>
<Paper uid="M91-1004">
  <Title>MUC-3 LINGUISTIC PHENOMENA TEST EXPERIMENT</Title>
  <Section position="10" start_page="35" end_page="43" type="evalu">
    <SectionTitle>
RESULTS
</SectionTitle>
    <Paragraph position="0">The recall and precision scores for the appositive tests appear in Table 1. Table 2 contains the scores based on a single measure calculated by multiplying recall by precision.</Paragraph>
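A minimal sketch of the single measure, assuming scores on a 0 to 1 scale: the combined score in Table 2 is recall multiplied by precision. The count-based recall and precision helpers assume MUC-style definitions (credited answers over the slot fills in the answer key for recall, credited answers over the fills the system produced for precision, with half credit for partial matches). Those definitions, the `SlotCounts` type, and the function names are illustrative assumptions, not details stated in this section.

```python
from dataclasses import dataclass


@dataclass
class SlotCounts:
    """Hypothetical per-test tallies of filled template slots."""
    correct: int   # responses that match the answer key
    partial: int   # responses that partially match the answer key
    possible: int  # slot fills in the answer key
    actual: int    # slot fills the system produced


def recall(c: SlotCounts) -> float:
    # Assumed MUC-style convention: partial matches earn half credit.
    return (c.correct + 0.5 * c.partial) / c.possible if c.possible else 0.0


def precision(c: SlotCounts) -> float:
    # Same credited total, divided by what the system actually produced.
    return (c.correct + 0.5 * c.partial) / c.actual if c.actual else 0.0


def single_measure(r: float, p: float) -> float:
    """The combined score described in the text: recall times precision."""
    return r * p


# Toy numbers for illustration only; not taken from Table 1 or Table 2.
counts = SlotCounts(correct=30, partial=10, possible=80, actual=50)
r, p = recall(counts), precision(counts)
print(f"recall={r:.3f} precision={p:.3f} single={single_measure(r, p):.4f}")
```

Because the product is near zero whenever either score is low, it favors only systems that do reasonably well on both. It is not the same as the F-measure, which takes a weighted harmonic mean of recall and precision.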
    <Section position="1" start_page="35" end_page="43" type="sub_section">
      <SectionTitle>
Analysis of Results
</SectionTitle>
      <Paragraph position="0">Hypothesis 1 asserts that the apposition results are independent of the overall performance of the systems. To determine the validity of Hypothesis 1, scatter plots were made of recall versus precision for the overall scores of a test run under comparable conditions (Figure 1), for the appositive scores for phrases (Figure 2), and for the appositive scores for sentences (Figure 3). Comparing Figures 1 and 2 shows that the scores for apposition are markedly different from the overall scores.</Paragraph>
      <Paragraph position="1">The performance of the systems on apposition is largely independent of their overall scores. The same conclusion can be drawn for the appositive scores for sentences by comparing Figures 1 and 3.</Paragraph>
      <Paragraph position="2">The recall versus precision scores for appositive phrases show that performance on them differs from the overall performance.</Paragraph>
      <Paragraph position="3">The recall versus precision scores for appositive sentences are more like the scores for phrases than like the overall scores.</Paragraph>
      <Paragraph position="4">The scatter plots for appositives scored from phrases and from sentences, in Figures 2 and 3 respectively, are more comparable to each other than to the overall scores, suggesting that information from sentences could provide a valid test of performance on a phenomenon.</Paragraph>
      <Paragraph position="5">Further analysis, illustrated in Figures 4 and 5, shows that the scores for appositives and for sentences containing appositives parallel each other for both recall and precision. These parallels support the view that material from sentences containing a phenomenon can be used to test that phenomenon, and they also indicate that we may be isolating the phenomenon.</Paragraph>
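The comparisons above are made by eye from the scatter plots. As a hedged, swapped-in alternative (not the authors' method), a Pearson correlation across systems can put a number on them: a weak correlation between overall and appositive scores would be consistent with Hypothesis 1, and a strong one between the phrase and sentence scores would match the parallels in Figures 4 and 5. The per-system scores below are made-up placeholders, not values from Table 1.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length lists of per-system scores."""
    n = len(xs)
    if n != len(ys) or 2 > n:
        raise ValueError("need two equal-length lists covering at least two systems")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Placeholder recall scores for five hypothetical systems; not the paper's data.
overall = [0.45, 0.30, 0.52, 0.20, 0.38]
appositive_phrases = [0.50, 0.48, 0.37, 0.38, 0.27]
appositive_sentences = [0.47, 0.46, 0.35, 0.37, 0.30]

print("overall vs phrases:  ", round(pearson(overall, appositive_phrases), 2))
print("phrases vs sentences:", round(pearson(appositive_phrases, appositive_sentences), 2))
```

With only a handful of participating systems, a single coefficient is a rough summary at best and can be dominated by one outlying system, so it complements the scatter plots rather than replacing them.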
      <Paragraph position="6">Hypothesis 2 asserts that the systems will score higher on the simpler appositives than on the more complex ones. As shown in Figure 6, recall scores are remarkably higher for the easy appositives than for the harder ones. Figure 7 shows a less clear trend for the precision scores. The single measure of recall times precision, however, shows an unmistakable trend of systems scoring higher on the easier appositives. These results give us confidence that we are isolating the phenomenon of apposition.</Paragraph>
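A trend such as the one described for Hypothesis 2 can be checked for consistency by counting, system by system, which subset scored higher. The sketch below assumes each subset's scores are stored as a mapping from a system name to its single-measure score; the function name `paired_wins` and the placeholder scores are hypothetical. Applied to preposed versus postposed appositives, the same count would be expected to come out mixed, since neither type was consistently easier.

```python
def paired_wins(a: dict[str, float], b: dict[str, float]) -> tuple[int, int, int]:
    """Count systems that score higher on subset a, higher on subset b, or tie.

    Only systems that appear in both mappings are compared.
    """
    a_higher = b_higher = ties = 0
    for system in sorted(set(a).intersection(b)):
        diff = a[system] - b[system]
        if diff > 0:
            a_higher += 1
        elif diff == 0:
            ties += 1
        else:
            b_higher += 1
    return a_higher, b_higher, ties


# Placeholder single-measure scores for four hypothetical systems; not the paper's data.
easy = {"sys_a": 0.30, "sys_b": 0.22, "sys_c": 0.41, "sys_d": 0.15}
hard = {"sys_a": 0.18, "sys_b": 0.20, "sys_c": 0.25, "sys_d": 0.09}

easy_higher, hard_higher, tied = paired_wins(easy, hard)
print(f"easy higher: {easy_higher}, hard higher: {hard_higher}, tied: {tied}")
# prints: easy higher: 4, hard higher: 0, tied: 0
```

A sign count ignores the size of each difference; with enough systems, a paired sign test or a Wilcoxon signed-rank test would add a measure of statistical significance.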
      <Paragraph position="8">The data bore out our inability to predict whether postposed or preposed appositives would score higher.</Paragraph>
      <Paragraph position="9">Hypothesis 3 was borne out in that the systems did score differently on the two types of appositives. There was no clear trend in the results as to which kind of apposition was easier. The recall, precision, and single measure scores are shown in Figures 9 through 11. These results match the prediction, providing further evidence that the phenomenon of apposition is being isolated. It would be interesting to examine how each system processes the two types of appositives to see why the systems score as they do.</Paragraph>
      <Paragraph position="10">Hypothesis 4 predicts that the systems will score higher on the messages containing simple sentences in place of the appositives. Two sites volunteered to run this part of the test, and the results for both contradicted the hypothesis. Their scores are shown in Table 3 alongside their scores for the messages containing the appositioned phrases. On further analysis, it was found that the introduction of the simple sentences made the task more complex in both cases. Apparently, the appositioned noun phrases convey the information more simply than a separate sentence that contains a copula and requires reference resolution.</Paragraph>
      <Paragraph position="11">For various reasons, the systems tended not to use the information in the separate sentence, so their recall scores are lower; their precision scores are somewhat affected. The results show an explainable effect on the scores, lending further credence to the claim that the apposition phenomenon is being isolated.</Paragraph>
    </Section>
  </Section>
  <Section position="11" start_page="43" end_page="43" type="evalu">
    <SectionTitle>
VOLUNTARY
</SectionTitle>
    <Paragraph position="0">The messages without apposition and the messages with apposition show an effect of modifying the appositioned noun phrases.</Paragraph>
  </Section>
</Paper>