<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3310">
  <Title>Exploring Text and Image Features to Classify Images in Bioscience Literature</Title>
  <Section position="3" start_page="73" end_page="74" type="intro">
    <SectionTitle>
2 Image Taxonomy
</SectionTitle>
    <Paragraph position="0"> We downloaded 17,000 PNAS full-text articles (years 1995-2004) from PubMed Central; together they contain 88,225 images. We manually examined the images and defined an image taxonomy (shown in Table 1) based on feedback from physicians. The categories were chosen to balance the coherence of content within each category against the complexity of the taxonomy.</Paragraph>
    <Paragraph position="1"> For example, in this experiment we keep images of biological objects (e.g., cells, tissues, and organs) in a single category to avoid over-decomposing the categories and leaving individual categories with insufficient data. We therefore stress principled approaches to feature extraction and classifier design. The same fusion classification framework can be applied to cases where each category is further refined to include subclasses.</Paragraph>
  </Section>
</Paper>