<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-2007">
  <Title>A Preliminary Look into the Use of Named Entity Information for Bioscience Text Tokenization</Title>
  <Section position="7" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> This paper has introduced a system that normalizes bioscience and health articles by learning from features surrounding punctuation that may need to be removed during normalization. The system performed significantly better than the baseline system.</Paragraph>
    <Paragraph position="1"> Analysis of the system's performance on named entity data from the GENIA corpus showed that named entities seemed to be more difficult to normalize than the surrounding non-named-entity text. This finding led to the creation of a second normalization system trained on named entity data, which showed significant improvement over the first system when tested on named entities. This improvement suggests that a system that recognizes named entities in parallel with normalization would be useful.</Paragraph>
  </Section>
</Paper>