File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/04/p04-1055_intro.xml
Size: 4,188 bytes
Last Modified: 2025-10-06 14:02:22
<?xml version="1.0" standalone="yes"?> <Paper uid="P04-1055"> <Title>Classifying Semantic Relations in Bioscience Texts</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Related work </SectionTitle> <Paragraph position="0"> While there is much work on role extraction, very little work has been done on relationship recognition. Moreover, many papers that claim to be doing relationship recognition in reality address the task of role extraction: (usually two) entities are extracted and the relationship is implied by the co-occurrence of these entities or by the presence of some linguistic expression. These linguistic patterns could in principle distinguish between different relations, but instead are usually used to identify examples of one relation. In the related work for statistical models there has been, to the best of our knowledge, no attempt to distinguish between different relations that can occur between the same semantic entities.</Paragraph> <Paragraph position="1"> In Agichtein and Gravano (2000) the goal is to extract pairs such as (Microsoft, Redmond), where Redmond is the location of the organization Microsoft. Their technique generates and evaluates lexical patterns that are indicative of the relation.</Paragraph> <Paragraph position="2"> Only the relation location-of is tackled and the entities are assumed given.</Paragraph> <Paragraph position="3"> In Zelenko et al. (2002), the task is to extract the relationships person-affiliation and organization-location. The classification (done with Support Vector Machine and Voted Perceptron algorithms) is between positive and negative sentences, where the positive sentences contain the two entities.</Paragraph> <Paragraph position="4"> In the bioscience NLP literature there are also efforts to extract entities and relations. 
In Ray and Craven (2001), Hidden Markov Models are applied to MEDLINE text to extract the entities PROTEINS and LOCATIONS in the relationship subcellular-location and the entities GENE and DISORDER in the relationship disorder-association. The authors acknowledge that the task of extracting relations is different from the task of extracting entities. Nevertheless, they consider positive examples to be all the sentences that simply contain the entities, rather than analyzing which relations hold between these entities. In Craven (1999), the problem tackled is relationship extraction from MEDLINE for the relation subcellular-location. The authors treat it as a text classification problem and propose and compare two classifiers: a Naive Bayes classifier and a relational learning algorithm. This is a two-way classification, and again there is no mention of whether the co-occurrence of the entities actually represents the target relation.</Paragraph> <Paragraph position="5"> Pustejovsky et al. (2002) use a rule-based system to extract entities in the inhibit-relation. Their experiments use sentences that contain verbal and nominal forms of the stem inhibit. Thus the actual task performed is the extraction of entities that are connected by some form of the stem inhibit, which, by explicitly requiring the occurrence of this word, is not the same as finding all sentences that talk about inhibiting actions. Similarly, Rindflesch et al. (1999) identify noun phrases surrounding forms of the stem bind which signify entities that can enter into molecular binding relationships. In Srinivasan and Rindflesch (2002) MeSH term co-occurrences within MEDLINE articles are used to attempt to infer relationships between different concepts, including diseases and drugs.</Paragraph> <Paragraph position="6"> In the bioscience domain the work on relation classification is primarily done through hand-built rules. Feldman et al. 
(2002) use hand-built rules that make use of syntactic and lexical features and semantic constraints to find relations between genes, proteins, drugs and diseases. The GENIES system (Friedman et al., 2001) uses a hand-built semantic grammar along with hand-derived syntactic and semantic constraints, and recognizes a wide range of relationships between biological molecules.</Paragraph> </Section> </Paper>