File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/02/c02-1027_evalu.xml
Size: 2,927 bytes
Last Modified: 2025-10-06 13:58:47
<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1027"> <Title>Shallow language processing architecture for Bulgarian</Title> <Section position="5" start_page="0" end_page="0" type="evalu"> <SectionTitle> 4 Evaluation of the anaphora resolution module </SectionTitle> <Paragraph position="0"> The precision of anaphora resolution, measured on a corpus of software manuals containing 221 anaphors, is 75.0%. Given that the anaphora resolution system operates in a fully automatic mode, this result can be considered very satisfactory. It should be noted that some of the errors arise from inaccuracies in the pre-processing modules, such as clause segmentation and NP extraction (see Table 3).</Paragraph> <Paragraph position="1"> We also evaluated the anaphora resolution system on the genre of tourist texts. As expected, the success rate dropped to 68.1%, which can still be regarded as a very good result, given that neither manual pre-editing of the input text nor any post-editing of the output of the pre-processing tools was undertaken. The main reason for the decline in performance is that some of the original indicators of the knowledge-poor approach, such as term preference, immediate reference and sequential instructions, are genre-specific.</Paragraph> <Paragraph position="2"> The software manuals corpus featured 221 anaphoric third person pronouns, whereas the tourist texts contained 116 such pronouns. For our evaluation we used the measures success rate, critical success rate and non-trivial success rate (Mitkov, 2001). Success rate is the ratio SR = AC/A, where AC is the number of correctly resolved anaphors and A is the number of all anaphors. Critical success rate is the success rate for the anaphors which have more than one candidate for antecedent after the gender and number agreement filter is applied.
Non-trivial success rate is calculated for those anaphors which have more than one candidate for antecedent before the gender and number agreement filter is applied. We also compared our approach with the typical baseline model, Baseline most recent, which takes as antecedent the most recent NP matching the anaphor in gender and number. The results are shown in Table 1.</Paragraph> <Paragraph position="3"> These results show that the performance of LINGUA in anaphora resolution is comparable to that of MARS (Orasan et al., 2000). An optimised version of the indicator weights scored a success rate of 69.8% on the tourist guide texts, thus yielding an improvement of 6.1%.</Paragraph> <Paragraph position="4"> Table 2 illustrates the complexity of the evaluation data by providing simple quantifying measures such as the average number of candidates per anaphor, the average distance from the anaphor to the antecedent in terms of sentences, clauses and intervening NPs, the number of intrasentential as opposed to intersentential anaphors, etc.</Paragraph> </Section> </Paper>