File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/04/c04-1074_intro.xml
Size: 1,527 bytes
Last Modified: 2025-10-06 14:02:08
<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1074"> <Title>Optimizing Algorithms for Pronoun Resolution</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Evaluation Corpus </SectionTitle> <Paragraph position="0"> We chose the NEGRA treebank, which contains about 350,000 tokens of German newspaper text, as our evaluation base. The same corpus was also processed with a finite-state parser that performs at 80% dependency f-score (Schiehlen, 2003).</Paragraph> <Paragraph position="1"> All personal pronouns (PPER), possessive pronouns (PPOSAT), and demonstrative pronouns (PDS) in NEGRA were annotated in a format geared to the MUC-7 guidelines (MUC-7, 1997). Proper names were annotated automatically by a named entity recognizer. In a small portion of the corpus (6.7%), all coreference links were annotated. The size of the annotated data (3,115 personal pronouns, 2,198 possessive pronouns, 928 demonstrative pronouns) thus compares favourably with the size of the evaluation data in other proposals: 619 German pronouns in Strube and Hahn (1999), 2,477 English pronouns in Ge et al. (1998), and about 5,400 English coreferential expressions in Ng and Cardie (2002).</Paragraph> <Paragraph position="2"> In the experiments, systems only looked for single NP antecedents. Hence, propositional or predicative antecedents (8.4% of the annotated pronouns) and split antecedents (0.2%) were inaccessible, which reduced the optimal success rate to 91.4% (100% - 8.4% - 0.2%).</Paragraph> </Section> </Paper>