<?xml version="1.0" standalone="yes"?> <Paper uid="M91-1004"> <Title>MUC-3 LINGUISTIC PHENOMENA TEST EXPERIMENT</Title> <Section position="10" start_page="35" end_page="43" type="evalu"> <SectionTitle> RESULTS </SectionTitle> <Paragraph position="0"> The recall and precision scores for the appositive tests appear in Table 1. Table 2 contains the scores for a single measure, calculated by multiplying recall by precision.</Paragraph> <Section position="1" start_page="35" end_page="43" type="sub_section"> <SectionTitle> Analysis of Results </SectionTitle> <Paragraph position="0"> Hypothesis 1 asserts that the apposition results are independent of the overall performance of the systems. To test Hypothesis 1, scatter plots of recall versus precision were made for the overall scores from a test run under comparable conditions (Figure 1), for the appositive scores for phrases (Figure 2), and for the appositive scores for sentences (Figure 3). Comparing Figures 1 and 2 shows that the scores for apposition are markedly different from the overall scores.</Paragraph> <Paragraph position="1"> The performance of the systems on apposition is largely independent of their overall scores.
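The comparison of Figures 1 and 2 was made visually. As a rough quantitative complement (an assumption of this sketch, not the method used in the test), one could compute the Pearson correlation between each system's overall score and its appositive score. The function name `pearson` and both score lists are hypothetical, and the numbers are made up rather than taken from the MUC-3 results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("score lists must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-system recall scores (percent); NOT data from this test.
overall_recall = [40, 35, 30, 25, 20]
appositive_recall = [30, 45, 20, 40, 25]

r = pearson(overall_recall, appositive_recall)
print(f"r = {r:.2f}")  # about 0.23 for these made-up numbers
```

With only a handful of systems such a coefficient is a noisy indicator, and the same computation could be repeated for precision. A value near zero would match the conclusion that performance on apposition is largely independent of overall scores.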
The same conclusion can be drawn for the appositive scores for sentences by comparing Figures 1 and 3.</Paragraph> <Paragraph position="2"> The recall versus precision scores for appositive phrases show that the performance is different from the overall performance.</Paragraph> <Paragraph position="3"> The recall versus precision scores for appositive sentences are more like the scores for phrases than like the overall scores.</Paragraph> <Paragraph position="4"> The scatter plots for appositives scored from phrases and from sentences (Figures 2 and 3, respectively) are more comparable to each other than to the overall scores, suggesting that using information from sentences could be a valid way to test performance on a phenomenon.</Paragraph> <Paragraph position="5"> Further analysis, illustrated in Figures 4 and 5, shows that the scores for appositives and for sentences containing appositives parallel each other for both recall and precision. These parallels affirm that material from sentences containing a phenomenon can be used to test that phenomenon, and they also indicate that we may be isolating the phenomenon.</Paragraph> <Paragraph position="6"> Hypothesis 2 asserts that the systems will score higher on the simpler appositives than on the more complex ones. As shown in Figure 6, recall scores are remarkably higher for the easy appositives than for the harder ones. Figure 7 shows a less clear trend for the precision scores. The single measure of recall times precision, however, shows an unmistakable trend of systems scoring higher for the easier appositives.
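The single measure is simply the product of recall and precision. A minimal sketch of computing it per system and comparing easy and hard appositives, using made-up recall and precision values expressed as fractions (not the scores reported in this test); the names `single_measure` and `easy_minus_hard` are hypothetical.

```python
def single_measure(recall, precision):
    """Single score used in this test: recall multiplied by precision."""
    return recall * precision


def easy_minus_hard(easy, hard):
    """Per-system difference in the single measure (easy minus hard appositives)."""
    return {
        system: single_measure(*easy[system]) - single_measure(*hard[system])
        for system in easy
    }


# Hypothetical (recall, precision) pairs per system, as fractions; NOT MUC-3 data.
easy = {"sys_a": (0.50, 0.60), "sys_b": (0.40, 0.55), "sys_c": (0.30, 0.70)}
hard = {"sys_a": (0.30, 0.62), "sys_b": (0.25, 0.50), "sys_c": (0.20, 0.65)}

diffs = easy_minus_hard(easy, hard)
for system, diff in sorted(diffs.items()):
    print(f"{system}: {diff:+.3f}")
# Precision is slightly higher on the hard items for sys_a, yet every
# system still scores higher on the easy items under the single measure.
print("easier scores higher for all systems:", all(d > 0 for d in diffs.values()))
```

Because it is a product, the single measure is high only when recall and precision are both reasonably high, so a large recall gap can dominate a small or inconsistent precision difference.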
These results give us confidence that we are isolating the phenomenon of apposition.</Paragraph> <Paragraph position="7"> easier appositions than for the harder ones.</Paragraph> <Paragraph position="8"> The inability to predict whether postposed or preposed appositives would score higher was actually supported by the data.</Paragraph> <Paragraph position="9"> Hypothesis 3 was borne out in that the systems did score differently on the two types of appositives. There was no clear trend in the results as to which kind of apposition was easier. The recall, precision, and single measure scores are shown in Figures 9 through 11. Notice that the results were as predicted, providing further evidence that the phenomenon of apposition is being isolated. It would be interesting to examine how each system processes the two types of appositives to see why their scores are as they are. appositives shows that the systems score differently on the two but neither is consistently easier.</Paragraph> <Paragraph position="10"> Hypothesis 4 predicts that the systems will score higher for the messages containing simple sentences in place of the appositives. Two sites volunteered to run this part of the test, and both contradicted the hypothesis. Their scores are shown in Table 3 alongside their scores for the messages containing the appositioned phrases. On further analysis, it was found that the introduction of the simple sentences made the task more complex in both cases. Apparently, the appositioned noun phrases convey the information more simply than a separate sentence that contains a copula and requires reference resolution.</Paragraph> <Paragraph position="11"> The systems, for various reasons, tended not to use the information in the separate sentence. The recall scores are thus lower. The precision scores are somewhat affected.
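The effect on recall and precision can be illustrated with a simplified, hypothetical scorer. It assumes recall is the number of correct fills divided by the number of fills in the answer key, and precision is the number of correct fills divided by the number of fills the system produced; the real scoring distinguished further categories, such as partial matches, which this sketch ignores. The `SlotCounts` class, the `drop_fills` helper, and all counts are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotCounts:
    possible: int  # fills in the answer key
    actual: int    # fills the system produced
    correct: int   # produced fills that match the key

    def recall(self):
        return self.correct / self.possible

    def precision(self):
        return self.correct / self.actual


def drop_fills(counts, lost_correct, lost_incorrect):
    """Hypothetical helper: the system no longer produces some fills it used to
    produce, lost_correct of them correct and lost_incorrect of them wrong."""
    return SlotCounts(
        possible=counts.possible,  # the answer key is unchanged
        actual=counts.actual - lost_correct - lost_incorrect,
        correct=counts.correct - lost_correct,
    )


# Made-up counts for one system; NOT data from Table 3.
with_apposition = SlotCounts(possible=100, actual=60, correct=40)
with_sentence = drop_fills(with_apposition, lost_correct=10, lost_incorrect=2)

for label, counts in [("apposition", with_apposition), ("separate sentence", with_sentence)]:
    print(f"{label}: recall {counts.recall():.2f}, precision {counts.precision():.2f}")
```

Here recall falls by ten points while precision moves by only a few, because ignoring information shrinks both the numerator and the denominator of precision but only the numerator of recall.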
The results show an explainable effect on the scores, lending further credence to the claim that the apposition phenomenon is being isolated.</Paragraph> </Section> </Section> <Section position="11" start_page="43" end_page="43" type="evalu"> <SectionTitle> VOLUNTARY </SectionTitle> <Paragraph position="0"> The messages without apposition and the messages with apposition show an effect of modifying the appositioned noun phrases.</Paragraph> </Section> </Paper>