<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3325">
  <Title>The Difficulties of Taxonomic Name Extraction and a Solution</Title>
  <Section position="7" start_page="132" end_page="132" type="concl">
    <SectionTitle>
6 Conclusions
</SectionTitle>
    <Paragraph position="0"> This paper has reported on our experiences with the automatic extraction of taxonomic names from English text documents. This task is essential for modern biology. A peculiarity of taxonomic name extraction is a shortage of training data. This is one reason why deploying established NER techniques has turned out to be infeasible, at least without adaptations. A taxonomic-name extractor must circumvent that shortage. Our experience has been that designing regular expressions that generate training data directly from the documents is feasible in the context of taxonomic name extraction. A combined approach, in which individual techniques are carefully tuned and arranged in the right order, has turned out to be superior to other potential solutions with regard to precision, recall, and number of user interactions. Finally, it seems promising to use document and term frequencies as additional evidence. The idea is that both are low for taxonomic names.</Paragraph>
  </Section>
</Paper>