<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1804"> <Title>Processing and Language</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Zhu et al. (2004) noted that creating a model of the dynamics of molecular interaction networks offers enormous potential for understanding systems biology. Existing work has led to the development of databases and ontologies which provide classifications and annotations based on a gene product's function, location, structure and so on, as for example in PANTHER (Thomas et al. 2003), a library of protein families and subfamilies indexed by function, and the Gene Ontology Annotation (GOA) (Camon et al. 2003).</Paragraph> <Paragraph position="1"> Further progress requires a robust formal ontology of structures, locations, functions and processes, linked together via relations such as is_part_of, is_located_at, is_realized_by, and so forth. As a step along this road, we provide a methodology for deriving and representing association rules between the entities present within the separate ontologies of the Gene Ontology (Gene Ontology Consortium, 2001).</Paragraph> <Paragraph position="2"> Such rules will be able to situate a biological process in relation to a cellular location and to an agent. They will be able to relate lower-granularity molecular functions to higher-granularity biological processes, and establish other sorts of relations between entities in different parts of GO.</Paragraph> <Paragraph position="3"> A preliminary study in this area (Burgun et al. 2004) combines ontological, lexical and statistical principles. 
Their study provides association rules on a selected set of 23 gene products that were potentially involved in enterocyte differentiation and that showed similar levels of expression.</Paragraph> <Paragraph position="4"> Clelland and Oinn provide commonly annotated terms based on the CluSTr database (Kriventseva et al. 2001), which has recently been incorporated into the QuickGO browser. Association rules have been used for mining gene expression data by Creighton and Hanash (2003). Ogren et al. (2004) studied the compositional nature of the GO terms and described the dependencies among them.</Paragraph> <Paragraph position="5"> Our investigation draws on the fact that terms from GO's separate ontologies are often used to annotate the same gene or gene product. We draw on the TIGR database to establish the corresponding patterns of association between terms in GO when taken in its entirety.</Paragraph> <Paragraph position="6"> In what follows we describe the results of this [CompuTerm 2004 - 3rd International Workshop on Computational Terminology 31] annotations, pertaining to the 41,502 distinct gene products present within GOA and focusing on the TIGR database within the February 2004 edition of GO. These associations were mined to establish association links between GO terms using standard statistical database techniques based on the so-called Apriori algorithm and using a part-of-speech tagger. The discovered links were then analysed on the basis of methods drawn from foundational ontology.</Paragraph> </Section> </Paper>