File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/00/a00-1026_abstr.xml

Size: 4,479 bytes

Last Modified: 2025-10-06 13:41:33

<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-1026">
  <Title>Extracting Molecular Binding Relationships from Biomedical Text</Title>
  <Section position="1" start_page="0" end_page="188" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> ARBITER is a Prolog program that extracts assertions about macromolecular binding relationships from biomedical text. We describe the domain knowledge and the underspecified linguistic analyses that support the identification of these predications. After discussing a formal evaluation of ARBITER, we report on its application to 491,000 MEDLINE abstracts, during which almost 25,000 binding relationships suitable for entry into a database of macromolecular function were extracted.</Paragraph>
    <Paragraph position="1"> Introduction. Far more scientific information exists in the literature than in any structured database. Convenient access to this information could significantly benefit research activities in various fields. The emerging technology of information extraction (Appelt and Israel 1997, Hearst 1999) provides a means of gaining access to this information. In this paper we report on a project to extract biomolecular data from biomedical text.</Paragraph>
    <Paragraph position="2"> We concentrate on molecular binding affinity, which provides a strong indication of macromolecular function and is a core phenomenon in molecular biology. Our ultimate goal is to automatically construct a database of binding relationships asserted in MEDLINE citations.</Paragraph>
    <Paragraph position="3"> The National Library of Medicine's MEDLINE textual database is an online repository of more than 10 million citations from the biomedical literature. All citations contain the title of the corresponding article along with other bibliographic information. In addition, a large number of citations contain author-supplied abstracts. Initial studies indicate that there are approximately 500,000 MEDLINE citations relevant to molecular binding affinity.</Paragraph>
    <Paragraph position="4"> Our decision to apply information extraction technology to binding relationships was guided not only by the biological importance of this phenomenon but also by the relatively straightforward syntactic cuing of binding predications in text. The inflectional forms of a single verb, bind, indicate this relationship in the vast majority of cases, and our initial work is limited to these instances. For example, our goal in this project is to extract the binding predications in (2) from the text in (1).</Paragraph>
    <Paragraph position="5"> (1) CC chemokine receptor 1 (CCR1) is expressed in neutrophils, monocytes, lymphocytes, and eosinophils, and binds the leukocyte chemoattractant and hematopoiesis regulator macrophage inflammatory protein (MIP)-1 alpha, as well as several related CC chemokines.</Paragraph>
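The syntactic cue described above, the inflectional forms of the verb bind, can be illustrated with a minimal sketch. It is not ARBITER, which is a Prolog program relying on a partial parser and a domain knowledge source; it only flags candidate sentences. The list of verb forms, the name `find_bind_cues`, and the abridged version of example (1) are assumptions made for illustration.

```python
import re

# Assumed list of inflectional forms of the verb "bind";
# the source text does not enumerate them.
BIND_FORMS = re.compile(r"\b(?:bind|binds|binding|bound)\b", re.IGNORECASE)


def find_bind_cues(sentence):
    """Return (offset, matched form) for each form of "bind" in the sentence."""
    return [(m.start(), m.group(0).lower()) for m in BIND_FORMS.finditer(sentence)]


# Abridged from example (1); shortened here for readability.
sentence = (
    "CC chemokine receptor 1 (CCR1) is expressed in neutrophils and binds "
    "macrophage inflammatory protein (MIP)-1 alpha."
)
cues = find_bind_cues(sentence)
```

A match only marks a sentence as a candidate; identifying the two arguments of the predication would still require the syntactic and domain analysis the paper describes.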
    <Paragraph position="6"> Considerable interest in information extraction has concentrated on identifying named entities in text pertaining to current events (for example, Wacholder et al. 1997, Voorhees and Harman 1998, and MUC-7); however, several recent efforts have been directed at biomolecular data (Blaschke et al. 1999, Craven and Kumlien 1999, and Rindflesch et al. 2000, for example). The overall goal is to transform the information encoded in text into a more readily accessible format, typically a template with slots named for the participants in the scenario of interest. The template for molecular binding can be thought of as a simple predication with predicate &quot;bind&quot; and two arguments which participate (symmetrically) in the relationship: BINDS(&lt;X&gt;, &lt;Y&gt;).</Paragraph>
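One way to picture the symmetric two-argument template is the following minimal sketch. The class `Binds`, its `key` method, and the helper `deduplicate` are hypothetical names introduced here; the source does not say how ARBITER stores its predications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binds:
    """Hypothetical container for one BINDS(X, Y) predication."""
    x: str
    y: str

    def key(self):
        # Order-independent key: BINDS(X, Y) and BINDS(Y, X)
        # describe the same symmetric relationship.
        return frozenset((self.x, self.y))


def deduplicate(predications):
    """Keep the first predication seen for each unordered argument pair."""
    seen = set()
    unique = []
    for p in predications:
        if p.key() not in seen:
            seen.add(p.key())
            unique.append(p)
    return unique
```

Treating the argument pair as unordered means a database built from such records would not store the same relationship twice merely because two abstracts name the participants in opposite order.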
    <Paragraph position="7"> Various strategies, both linguistic and statistical, have been used in information extraction efforts. We introduce a Prolog program called ARBITER (Assess and Retrieve Binding Terminology) that takes advantage of an existing domain knowledge source and relies on syntactic cues provided by a partial parser in order to identify and extract binding relations from text. We discuss the syntactic processing used and then report on a formal evaluation of ARBITER against a test collection of 116 MEDLINE citations in which the binding relations were marked by hand. Finally, we provide a brief overview of the results of applying ARBITER to the 500,000 MEDLINE citations discussing molecular binding affinity.</Paragraph>
  </Section>
</Paper>