<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1706">
  <Title>XML-Based NLP Tools for Analysing and Annotating Medical Language</Title>
  <Section position="5" start_page="1" end_page="1" type="concl">
    <SectionTitle>
5 Conclusions
</SectionTitle>
    <Paragraph position="0"> We have performed a number of different NLP tasks on the OHSUMED corpus of MEDLINE abstracts, ranging from low-level tokenisation through shallow parsing to deep syntactic and semantic analysis. We have used XML as our processing paradigm, and we believe that without the core XML tools the task would have become extremely hard. Furthermore, we have built fully automatic pipelines and have not resorted to hand-coding at any point, so our output annotations are completely reproducible and our resources are reusable on new data. Our approach of building a firm foundation of low-level tokenisation has proved invaluable for a variety of higher-level tasks.</Paragraph>
    <Paragraph position="1"> The XML-annotated OHSUMED corpus that has resulted from our project will be useful for a number of different tasks in the biomedical domain. For this reason we are developing a web-site from which many of our resources (including the pipelines described in this paper) are available: http://www.ltg.ed.ac.uk/disp/. In addition, we provide various marked-up and tokenised versions of OHSUMED, including the output of the parsers described here.</Paragraph>
  </Section>
</Paper>