<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0210">
  <Title>Discourse Annotation and Semantic Annotation in the GNOME Corpus</Title>
  <Section position="9" start_page="0" end_page="0" type="concl">
    <SectionTitle>
7 Discussions and Conclusion
</SectionTitle>
    <Paragraph position="0"> Corpus consistency The main lesson learned from this effort is that actually using a corpus is the best way both to ensure its correctness and to learn which types of information are most useful.</Paragraph>
    <Paragraph position="1"> Thematic Roles One attribute on which we were not able to reach acceptable agreement was the thematic role of an NP, which has been argued to be a better indicator of salience than grammatical function (Sidner, 1979; Stevenson et al., 1994); the agreement value in this case was k = .35. Other groups, however, have shown that this can be done, e.g., in FrameNet (Baker et al., 1998) and more recently in PropBank (Kingsbury and Palmer, 2002).</Paragraph>
    <Paragraph position="2"> Planned Revisions of the Scheme A number of aspects of the annotation scheme used for the corpus could be improved. An obvious improvement would be to directly annotate predicates with their WordNet senses instead of annotating ONTO and animacy. We started doing this for the annotation of modifiers (Cheng et al., 2001), and developed an interface to WordNet, but this came too late to redo the whole corpus. Of the attributes, COUNT and GENERIC were the most difficult to annotate; further tests with these attributes could be useful.</Paragraph>
    <Paragraph position="3"> Automatic annotation A substantial part of the annotation work required for GNOME could now (and should) be done automatically or semi-automatically. This includes, most obviously, the identification of sentences and NPs, already done automatically in the VENEX corpus (Poesio, 2004b); in addition, at least grammatical function, animacy, and countability could be automatically annotated in preliminary form with existing techniques, and then corrected by hand. We also plan to use the corpus to bootstrap techniques for automatic identification of uniqueness and gender.</Paragraph>
  </Section>
</Paper>