<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1309"> <Title>Protein Name Tagging for Biomedical Annotation in Text</Title> <Section position="6" start_page="1" end_page="1" type="concl"> <SectionTitle> 5 Conclusion </SectionTitle> <Paragraph position="0"> This paper describes a method for finding protein names by chunking at the morpheme level, which leads to better recognition of protein name boundaries. To this end, we propose a morphological analysis whose core technologies originate in the processing of non-segmented languages. Using a large dataset (1600 abstracts for training and 400 abstracts for testing in GENIA 3.01), we obtain an F-score of 70 points for protein molecule names and 75 points for protein names including molecules, families, domains, etc. These results are comparable to previous approaches in the literature. We focus on protein names as a case study.</Paragraph> <Paragraph position="1"> However, given an annotated corpus of similar size and quality, the same approach can be applied to other bio-entities such as gene names.</Paragraph> </Section> </Paper>