<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0432">
  <Title>Named Entity Recognition Using a Character-based Probabilistic Approach</Title>
  <Section position="7" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> We have presented a very simple system that uses only internal and contextual character-level evidence. This highly language-independent model performs well on both seen and unseen tokens despite using only the supervised training data. The incorporation of trie-based estimates into an HMM framework allows the optimal tag sequence to be found for each sentence.</Paragraph>
    <Paragraph position="1"> We have also shown that case information can be restored with high accuracy using simple machine learning techniques, and that this restoration is beneficial to named entity recognition. We would expect most NER systems to benefit from this recapitalisation process, especially in domains without accurate case information, such as transcribed text or all-caps newswire.</Paragraph>
    <Paragraph position="2"> Trie-based classification yields probability estimates that are highly suitable for use as features in a further machine learning process. This approach has the advantages of being highly language-independent and of requiring fewer features than traditional orthographic feature representations.</Paragraph>
  </Section>
</Paper>