<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-2038">
  <Title>Syntax annotation for the GENIA corpus</Title>
  <Section position="7" start_page="224" end_page="224" type="concl">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> A subset of the GENIA corpus has been annotated for syntactic (tree) structure. An inter-annotator agreement test indicated that the annotation can be done stably by linguists without much knowledge of biology, provided that proper guidelines are established for linguistic phenomena particular to scientific research abstracts. We have prepared the 500-abstract corpus in both XML and PTB formats and made it publicly available as &quot;the GENIA Treebank beta version&quot; (GTBbeta). We are further cleaning up the 500-abstract set, and at the same time, initial annotation of the remaining abstracts is under way, so that the full GENIA set of 2000 abstracts will be annotated with tree structure.</Paragraph>
    <Paragraph position="1"> For parsers to be useful for information extraction, they have to establish a mapping between syntactic structure and the more semantic predicate-argument structure, and between linguistic predicate-argument structures and the factual relations to be extracted. Annotating several kinds of information on the same set of texts can help establish these mappings. For the factual relations, we are annotating relations between proteins and genes in cooperation with a group of biologists.</Paragraph>
    <Paragraph position="2"> For predicate-argument annotation, we are investigating the use of the parse results of the Enju parser.</Paragraph>
  </Section>
</Paper>