<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1055">
  <Title>Classifying Semantic Relations in Bioscience Texts</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Data and Features
</SectionTitle>
    <Paragraph position="0"> For our experiments, the text was obtained from MEDLINE 2001. An annotator with biology expertise considered the titles and abstracts separately and labeled the sentences (both roles and relations) based solely on the content of the individual sentences. Seven possible types of relationships between TREATMENT and DISEASE were identified. Table 1 shows, for each relation, its definition, one example sentence and the number of sentences found containing it.</Paragraph>
    <Paragraph position="1"> We used a large domain-specific lexical hierarchy (MeSH, Medical Subject Headings) to map words into semantic categories. There are about 19,000 unique terms in MeSH and 15 main sub-hierarchies, each corresponding to a major branch of medical ontology; e.g., tree A corresponds to Anatomy, tree C to Disease, and so on. As an example, the word migraine maps to the term C10.228, that is, C (a disease), C10 (Nervous System Diseases), C10.228 (Central Nervous System Diseases). When there are multiple MeSH terms for one word, we simply choose the first one. These semantic features are shown to be very useful for our tasks (see Section 4.3). Rosario et al. (2002) demonstrate the usefulness of MeSH for the classification of the semantic relationships between nouns in noun compounds.</Paragraph>
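The mapping from a word to its MeSH levels can be sketched roughly as follows. This is only an illustration under stated assumptions: the one-entry lexicon TOY_MESH and the helper names mesh_id and ancestors are hypothetical and do not come from the paper; only the migraine example (C10.228) and the rule of taking the first listed term are from the text.

```python
# Hypothetical toy lexicon: the real word-to-MeSH table is not given in the text.
TOY_MESH = {"migraine": ["C10.228"]}


def mesh_id(word, lexicon=TOY_MESH):
    """Return the first MeSH tree number listed for a word, or None if unmapped."""
    ids = lexicon.get(word.lower())
    return ids[0] if ids else None


def ancestors(tree_number):
    """Expand a tree number into its levels, most general first.

    "C10.228" becomes ["C", "C10", "C10.228"]: the tree letter, the
    top-level category, then one entry per dotted component.
    """
    parts = tree_number.split(".")
    top = parts[0]
    levels = [top[0]]
    if top != top[0]:
        levels.append(top)
    for i in range(1, len(parts)):
        levels.append(".".join(parts[: i + 1]))
    return levels


if __name__ == "__main__":
    print(ancestors(mesh_id("migraine")))
```

Each prefix in the returned list could serve as a progressively more general semantic category for the word.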
    <Paragraph position="2"> The results reported in this paper were obtained with the following features: the word itself, its part of speech from the Brill tagger (Brill, 1995), the phrase constituent the word belongs to, obtained by flattening the output of a parser (Collins, 1996), and the word's MeSH ID (if available). In addition, we identified the sub-hierarchies of MeSH that tend to correspond to treatments and diseases, and converted these into a tri-valued attribute indicating one of: disease, treatment or neither. Finally, we included orthographic features such as 'is the word a number', 'only part of the word is a number', 'first letter is capitalized', 'all letters are capitalized'. In Section 4.3 we analyze the impact of these features.</Paragraph>
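A per-token feature record along these lines might be assembled as in the sketch below. It is not the authors' implementation. The function and key names are hypothetical, the part of speech and phrase constituent are assumed to be supplied by external tools, and the choice of trees D and E as treatment sub-hierarchies is an illustrative guess, since the text only states that tree C corresponds to Disease.

```python
# Assumption: the text does not list which MeSH trees count as treatments.
# Tree C is Disease per the text; D and E are illustrative guesses only.
DISEASE_TREES = {"C"}
TREATMENT_TREES = {"D", "E"}


def semantic_type(mesh_id):
    """Tri-valued attribute: 'disease', 'treatment' or 'neither'."""
    if not mesh_id:
        return "neither"
    tree = mesh_id[0]
    if tree in DISEASE_TREES:
        return "disease"
    if tree in TREATMENT_TREES:
        return "treatment"
    return "neither"


def orthographic_features(word):
    """Four orthographic flags; only all-digit strings count as numbers here."""
    has_digit = any(ch.isdigit() for ch in word)
    all_digits = word.isdigit()
    return {
        "is_number": all_digits,
        "part_number": has_digit and not all_digits,
        "init_cap": word[:1].isupper(),
        "all_caps": word.isupper(),
    }


def token_features(word, pos, chunk, mesh_id=None):
    """Combine word, POS tag, phrase constituent, MeSH ID and derived features."""
    feats = {
        "word": word,
        "pos": pos,
        "chunk": chunk,
        "mesh_id": mesh_id,
        "sem_type": semantic_type(mesh_id),
    }
    feats.update(orthographic_features(word))
    return feats


if __name__ == "__main__":
    print(token_features("migraine", "NN", "NP", "C10.228"))
```

Note that under this simplification a decimal such as 3.5 is flagged as only partly numeric; a fuller version would need its own definition of "number".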
  </Section>
</Paper>