File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/04/w04-1209_intro.xml
Size: 2,548 bytes
Last Modified: 2025-10-06 14:02:40
<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1209"> <Title>Support Vector Machine Approach to Extracting Gene References into Function from Biological Documents</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The Text Retrieval Conference (TREC) has been dedicated to information retrieval and information extraction for years. TREC 2003 introduced a new track, the Genomics Track (Hersh and Bhupatiraju, 2003), to address information retrieval and information extraction issues in the biomedical domain. For the information extraction part, the goal was to automatically reproduce the Gene Reference into Function (GeneRIF) resource in the LocusLink database (Pruitt et al., 2000). A GeneRIF associated with a gene is a sentence describing the function of that gene, and is currently generated manually.</Paragraph> <Paragraph position="1"> This paper presents our post-conference work on the information extraction task (i.e., the secondary task). In the official runs, our system (Hou et al., 2003) adopted several weighting schemes (described in Section 3.2) to deal with this problem. However, we failed to beat the simple baseline approach, which always picks the title of a publication as the candidate GeneRIF. Bhalotia et al. (2003) converted this task into a binary classification problem and trained a Naive Bayes classifier with kernels, achieving 53.04% for CD. In their work, the title and the last sentence of an abstract were concatenated, and features were then extracted from the resulting string. Jelier et al. (2003) observed the distribution of target GeneRIFs over 9 sentence positions and converted this task into a 9-class classification problem, attaining 57.83% for CD.</Paragraph> <Paragraph position="2"> Both studies indicated that sentence position is of great importance.
We therefore modified our system to incorporate position information with the help of SVMs, and we also compared the capability of SVMs with that of Naive Bayes on this problem.</Paragraph> <Paragraph position="3"> The rest of this paper is organized as follows.</Paragraph> <Paragraph position="4"> Section 2 presents the architecture of our extraction procedure. The basic idea and the experimental methods of this study are introduced in Section 3.</Paragraph> <Paragraph position="5"> Section 4 presents and discusses the results. Finally, Section 5 gives concluding remarks and lists some future work.</Paragraph> </Section> </Paper>