<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3105">
  <Title>Mining MEDLINE: Postulating a Beneficial Role for Curcumin Longa in Retinal Diseases</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Open Discovery
</SectionTitle>
    <Paragraph position="0"> Our open discovery approach is founded on the notion of topic profiles. A topic is any subject of interest such as treatment of hypertension or ATM gene. A profile is essentially a representation of a topic that is derived from the text collection being mined. For MEDLINE our topic profiles are vectors of weighted Medical Subject Headings (MeSH). These terms belong to a controlled vocabulary and are manually assigned to each MEDLINE document by trained indexers. Given a topic of interest, our algorithm first retrieves relevant MEDLINE documents.</Paragraph>
    <Paragraph position="1"> MeSH terms are then extracted from these documents and their weights are calculated. These weighted terms form the profile vector for the topic. We discuss the method for calculating weights shortly.</Paragraph>
    <Paragraph position="2"> We also exploit the fact that MeSH terms have been classified using 134 UMLS (Unified Medical Language System)1 semantic types, for example Cell Function and Sign or Symptom. Each MeSH term is assigned one or more semantic types. For example, interferon type II falls within both the Immunologic Factor and Pharmacologic Substance semantic types. More generally, semantic types represent 'categories' that have been used to classify the MeSH metadata. Semantic types are useful because, depending on the nature of the discovery goals, we may adopt a particular view, i.e., we may restrict the discovery process to consider only MeSH terms that belong to certain semantic types. In these cases the topic profiles are restricted to MeSH terms belonging to the semantic types specified by the view.</Paragraph>
    <Paragraph position="3"> We calculate term weights for the MeSH terms. Term weights are a slight modification of the commonly used TF*IDF scores. Since a MeSH term typically occurs once in a MEDLINE record, here TFi (term frequency) equals the number of documents in which the MeSH term ti occurs within the retrieved document set. IDFi (inverse document frequency) is log(N/TFi). N is the number of documents retrieved for the topic. Weights are normalized as shown below for term ti. This vector of weighted MeSH terms forms the topic profile.</Paragraph>
    <Paragraph position="5"> where vi = TFi * log(N/TFi) and there are r terms in the profile.</Paragraph>
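    <Paragraph> The weighting scheme can be illustrated with a minimal Python sketch. It computes TFi as the number of retrieved documents containing term ti and vi = TFi * log(N/TFi), as defined in the text. The normalization step is assumed here to be division by the Euclidean length of the vector (v1, ..., vr); this is a common choice, but it is an assumption of this sketch, not a statement of the authors' formula. The function name build_profile and its arguments allowed_types and semantic_types (a view over semantic types) are hypothetical.

```python
import math
from collections import Counter


def build_profile(docs, allowed_types=None, semantic_types=None):
    """Build a normalized MeSH profile from a retrieved document set.

    docs: list of sets of MeSH terms, one set per retrieved document.
    allowed_types: optional set of semantic types defining a view.
    semantic_types: dict mapping a MeSH term to its set of semantic types.
    """
    n_docs = len(docs)
    # TF_i: number of retrieved documents in which term t_i occurs.
    tf = Counter(term for doc in docs for term in set(doc))
    if allowed_types is not None:
        # Keep only terms with at least one semantic type in the view.
        tf = Counter({
            term: count for term, count in tf.items()
            if allowed_types.intersection(semantic_types.get(term, set()))
        })
    # v_i = TF_i * log(N / TF_i)
    raw = {term: count * math.log(n_docs / count) for term, count in tf.items()}
    # Assumed normalization: divide by the Euclidean length of the raw vector.
    length = math.sqrt(sum(v * v for v in raw.values()))
    if length == 0:
        return {term: 0.0 for term in raw}
    return {term: v / length for term, v in raw.items()}
```

Note that under this definition a term occurring in every retrieved document receives weight zero, since log(N/TFi) is zero when TFi equals N.</Paragraph>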
    <Paragraph position="6"> Algorithm: Figure 2 outlines our open discovery algorithm, which follows the framework shown in Figure 1. We begin by building the A topic profile restricted to ST-B semantic types. Note that all MEDLINE searches are conducted automatically via the PubMed interface2. We then automatically select M MeSH terms for each ST-B semantic type from this A profile and call these the B terms. Next, profiles are built for each of these B terms, limited to another selected set of semantic types, ST-C. These B profiles are analysed in combination to select an initial pool of candidate C terms. These candidate terms are then checked for novelty in the context of the starting A topic. When the algorithm terminates, the user is provided a final list of ranked, novel C terms. The higher the rank, the greater the estimated confidence in the potential connection with the A topic.</Paragraph>
    <Paragraph position="7"> At this point the rest of the process depends almost entirely on the user. (This is also the case in other implementations of the open discovery process, e.g., Lindsay &amp; Gordon, 1999; Weeber et al., 2001.) It is up to the user to select A-C pairs of interest and explore the literature for supporting evidence.</Paragraph>
    <Paragraph position="8"> The role of ST-B and ST-C in the algorithm is to apply reasonable constraints to the problem and shape the path of the discovery process. Similarly, parameter M may be used to focus the discovery process. The higher this number, the broader the scope through which one looks for novel C topics. It takes experience to come up with reasonable values for these parameters, but we already see some patterns emerge in the MEDLINE mining literature. For example, when looking for substances likely to influence a disease, several researchers have used functional semantic types such as Cell Function and Molecular Dysfunction for selecting intermediate pathways (e.g., Weeber et al., 2001). Experiments varying these semantic types have been described in our previous work (Srinivasan, 2004). Unique aspects of our algorithm, in comparison to open discovery methods explored by others, include the fact that our weighting scheme identifies interesting and relevant B terms at high ranks. Also, C terms are assessed by combining the evidence on their connection to the different intermediate B terms.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Open Discovery with Turmeric
</SectionTitle>
    <Paragraph position="0"> Our interest in curcumin was sparked by the fact that this spice is widely used in Asia and is highly regarded for its curative and analgesic properties. Its uses include the treatment of burns, stomach ulcers and ailments, and various skin diseases. Curcumin is also used as an antiseptic, to alleviate symptoms of the common cold, and as a depilatory. A number of MEDLINE records have reported on the anti-cancer and anti-inflammatory properties of curcumin (12680238, 12678737, 12676044)3.</Paragraph>
    <Paragraph position="1"> Our open discovery goal is to determine whether there are novel disease contexts in which curcumin could prove beneficial, and to propose evidence-based hypotheses that can be experimentally verified.</Paragraph>
    <Paragraph position="2"> We executed our open discovery algorithm with curcumin as the starting topic (A). The specific PubMed search conducted was turmeric OR curcumin OR curcuma (done on November 15, 2003). A total of 1,175 PubMed documents were retrieved. As Figure 3 shows, the majority of these publications (1,043, 89%) are relatively recent, being published in 1990 or later. This indicates a surge in scientific interest in the health effects of this spice, which has long been valued in Asia for its medicinal properties.</Paragraph>
    <Paragraph position="3"> Input from user: (1) an A topic of interest, (2) a set of UMLS semantic types (ST-B) for selecting B terms and a set (ST-C) for selecting C terms. Parameter: M.</Paragraph>
    <Paragraph position="4"> * Step 1: Conduct an appropriate PubMed search for topic A, and build its MeSH profile limited to the semantic types in ST-B.</Paragraph>
    <Paragraph position="5"> Call this profile AP.</Paragraph>
    <Paragraph position="6"> * Step 2: For each semantic type in ST-B, select the M top ranking MeSH terms from AP.</Paragraph>
    <Paragraph position="7"> Remove duplicate terms if any. These are designated the B terms (B1, B2, B3, etc.).</Paragraph>
    <Paragraph position="8"> * Step 3: Conduct an independent PubMed search for each B term and build its profile limited to the semantic types ST-C. Call these profiles BP1, BP2, BP3, etc.</Paragraph>
    <Paragraph position="9"> * Step 4: Compute a final combined profile where the combined weight of a MeSH term is the sum of its weights in BP1, BP2, BP3, etc. Call this initial profile CP.</Paragraph>
    <Paragraph position="10"> * Step 5: For each term t in CP, if a MEDLINE search on topic A AND t returns nonzero results, eliminate t from CP.</Paragraph>
    <Paragraph position="11"> Output: For each semantic type in ST-C, output the MeSH terms in CP ranked by combined weight.</Paragraph>
    <Paragraph position="12"> These are the C terms organized by semantic type and ranked by estimated potential.</Paragraph>
    <Paragraph position="13"> 3 Numbers within parentheses such as these refer to PubMed record ids. The reader may enter these directly into the PubMed interface to retrieve the corresponding records.</Paragraph>
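    <Paragraph> Steps 2, 4 and 5 of the algorithm in Figure 2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, profiles are plain dictionaries mapping MeSH terms to weights, and the PubMed search of step 5 is represented by a caller-supplied function hit_count that returns the number of documents retrieved for "A AND t". For brevity the final ranking is a single list and is not grouped by ST-C semantic type.

```python
from collections import defaultdict


def select_b_terms(a_profile, semantic_types, st_b, m):
    """Step 2: top-m terms of the A profile per ST-B type, duplicates removed."""
    b_terms = []
    for st in st_b:
        candidates = [t for t in a_profile if st in semantic_types.get(t, set())]
        candidates.sort(key=lambda t: a_profile[t], reverse=True)
        for term in candidates[:m]:
            if term not in b_terms:
                b_terms.append(term)
    return b_terms


def combine_profiles(b_profiles):
    """Step 4: the combined weight of a term is the sum of its weights in each BP."""
    combined = defaultdict(float)
    for profile in b_profiles:
        for term, weight in profile.items():
            combined[term] += weight
    return dict(combined)


def rank_novel_c_terms(combined, hit_count):
    """Step 5 and output: drop terms with nonzero hits, rank the rest by weight."""
    novel = [t for t in combined if hit_count(t) == 0]
    return sorted(novel, key=lambda t: combined[t], reverse=True)
```

Passing the search in as a function keeps the sketch independent of any particular PubMed client, so the selection and ranking logic can be exercised with made-up counts.</Paragraph>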
    <Paragraph position="14"> We limited ST-B to the three semantic types Gene or Genome; Enzyme; and Amino Acid, Peptide or Protein.</Paragraph>
    <Paragraph position="15"> We restricted ST-C to Disease or Syndrome and Neoplastic Process4 and set M (the parameter specifying the number of B terms to select) to 10. These semantic types are appropriate since we are looking for biochemical and genetic connections between turmeric and novel diseases. Table 1 shows the top 10 selected MeSH terms from each ST-B type (step 2). We can observe from the table that some of the terms appear in more than one semantic type. (This is possible since a term may be assigned to more than 1 semantic type in the UMLS). However, we remove duplicates in step 2. Also, some terms are very specific such as Protein Kinase C while others are broad representing families such as DNA-Binding Proteins and Isoenzymes. At present we do not distinguish between B terms using specificity. Our plan is to examine this aspect in future research.</Paragraph>
    <Paragraph position="16"> The B terms listed in Table 1 are the top ten terms that were retrieved from a search of the literature for the semantic types Gene or Genome; Enzyme; and Amino Acid, Peptide or Protein. The biochemical effects of curcumin become apparent upon conducting a search of the literature for curcumin and any of these terms. Curcumin, for example, has a strong down-regulatory effect on c-Jun NH2-terminal kinase (JNK) (14627502, 12859962, 11370761, 12097302) resulting in the arrest of cell proliferation (14627502) in prostate tumor cells (12853969) and induction of apoptosis (12859962). Curcumin inhibits NF-kappaB (12714587) leading to the suppression of cell proliferation and the induction of apoptosis in multiple myeloma (12393461) and ovary cancer cells (12520734). TGF-beta1-induced IL-6, which has been implicated in the malignant progression of prostate cancers, was severely impeded by curcumin through inhibition of c-Jun (matches with Genes, jun in the table), JNK (an instance of MAPK in the table) or AP-1 (12853969).</Paragraph>
    <Paragraph position="17"> The curcumin open discovery process terminated with a ranked list of diseases. Table 2 shows the top 5 entries5. One observation made at this point was that the type of automated search conducted in step 5 of the algorithm to check for novelty is insufficient. At present, the search involves only the particular MeSH term intersected with the A topic. We do not yet automatically consider synonyms of the MeSH term. For example, for the last entry in the table, although Ischemic Attack Transient AND (turmeric OR curcumin OR curcuma) retrieved 0 documents, the search Ischemia AND (turmeric OR curcumin OR curcuma) retrieved 17 documents. Hence this entry is unlikely to be immediately interesting to the user. However, the top two entries did not retrieve any document even after searching with different synonyms. Testes is also unlikely to be interesting since a curcumin search intersected with sperm retrieved many documents. Considering retrieval set size alone is insufficient. For instance, curcumin intersected with thyroid retrieved 5 documents. However, these appear to be peripheral to curcumin's effect on thyroid neoplasms, focusing more on aspects such as hypothyroidism and toxicity. Automating query expansion using synonyms will be the subject of further research. At this point the user may select entries and peruse the appropriate literature further to (a) determine the nature of the relationship between curcumin and the diseases (as the substance under study could be beneficial or harmful) and (b) assess the quality of the background knowledge that may be used to guide further study of curcumin and the disease. This manual phase may be guided by the specific B term-based pathways connecting the selected disease with curcumin. Table 3 lists the B terms that were automatically identified as connecting curcumin and 'Retina'.</Paragraph>
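    <Paragraph> The novelty check and the synonym issue described above can be made concrete with a small Python sketch. It only builds the query strings and applies the zero-hit test; the function names, the synonyms argument and the caller-supplied hit_count function are hypothetical, and the source of synonyms is left open, since the text describes automated synonym expansion as future work.

```python
def novelty_queries(mesh_term, a_query, synonyms=()):
    """Build the query strings used to check a candidate C term for novelty.

    The first query intersects the MeSH term itself with the A topic, as in
    step 5; the remaining queries do the same for caller-supplied synonyms.
    """
    terms = [mesh_term] + [s for s in synonyms if s != mesh_term]
    return [f"{term} AND ({a_query})" for term in terms]


def is_novel(mesh_term, a_query, hit_count, synonyms=()):
    """A term counts as novel only if every query variant retrieves zero documents."""
    queries = novelty_queries(mesh_term, a_query, synonyms)
    return all(hit_count(q) == 0 for q in queries)
```

With the A query turmeric OR curcumin OR curcuma and the counts reported above (0 documents for Ischemic Attack Transient, 17 for Ischemia), the term passes the check when searched alone but fails once Ischemia is supplied as a synonym.</Paragraph>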
    <Paragraph position="18"> In the next section we present such an analysis for 'Retina'. That is, we (the second author) examine the literature to determine if retinal diseases may be a good context in which a bioscientist may study curcumin. Our analysis indicates that indeed there is good evidence supporting the hypothesis of a beneficial role for turmeric in the context of diabetic retinopathies, ocular inflammation and glaucoma. Analysis of the other highly-ranked diseases is left for future work.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Turmeric - Retinal Diseases Connection
</SectionTitle>
    <Paragraph position="0"> The procedure followed up to this point is 'term-centric'.</Paragraph>
    <Paragraph position="1"> That is, we automatically identify statistically interesting B terms and then generate a ranked list of C terms. We now present further analysis on the connection between retinal diseases and curcumin. In some cases reading the title and abstracts of select records provided sufficient information. In addition the full text of the document was available. Our strategy was to examine publications for biochemical or molecular biology mechanisms. In particular, we were interested in ascertaining whether any of the genes noted earlier were also involved in the pathophysiology of these retinal disorders. We focused on the genes as the critical links that connect the agent curcumin to the disorders.</Paragraph>
    <Paragraph position="2"> Analysis: The user's goal is to identify biochemical pathways potentially connecting retinal diseases and curcumin. Retinal diseases could result from complications due to diabetes, or of infection and inflammation of the retina.</Paragraph>
    <Paragraph position="3"> Diabetic retinopathy is a leading cause of blindness.</Paragraph>
    <Paragraph position="4"> An early sign of the disease is the adhesion of leukocytes to the vessels of the retina, endothelial cell injury, and the breakdown of the blood-retina barrier (12000720). Even acute intensive insulin therapy constitutes an additional risk factor for diabetic retinopathy, due to insulin-induced hypoxia and an associated acceleration in the blood-retina barrier breakdown (11901189). Glaucoma is the second most common cause of blindness in the world (8695555) and is caused by mutations in a number of genes on chromosomes 1 and 10 as well as in other loci on chromosomes 2, 3, 8, and 7. While several diseases have one or a few genetic loci that control disease progression and familial transmission, it is often the case that a variety of genes may be involved in their pathophysiology. Following is a brief survey of some of the genes that may be involved in the process of tissue injury or inflammation and regulation of cell division. Control of the immune process and of the inflammatory response is important in combating infection and autoimmune diseases. Regulation of cell division, particularly programmed cell death, is critical in diverse diseases such as cancer and tissue regeneration, e.g. retinal injury and diseases. Regulation of the activity of such genes could provide strategies for therapeutic intervention using curcumin.</Paragraph>
    <Paragraph position="5"> In diabetes and during inflammation, periods of hypoxia, i.e. low oxygen concentration, occur in various tissues and organs. At such times an early cellular response results in the elevated expression of interleukin-1beta (IL-1 beta) and cyclooxygenase 2 (COX-2) genes (11527948, 14507857, 11821258) which in turn stimulate new blood vessel growth leading to retinopathy (12821538, 12601017). Similarly, the expression of COX-2 was associated with the development of glaucoma (9441697). Treatment with COX-2 inhibitors suppressed blood-retinal barrier breakdown and had an antiangiogenic effect, i.e. they prevented the growth of new blood vessels and thus had a protective effect on the retina (12821538, 11980873).</Paragraph>
    <Paragraph position="6"> Another gene, tumor necrosis factor alpha (TNF-alpha), was elevated during the early stages of diabetic retinopathy and inflammation (11821258, 12706995, 11161842). Anti-TNF-alpha treatment reduced leukocyte adhesion to blood vessels of the eye and vascular leakage (12714660) indicating a potential therapeutic effect for such a treatment to reduce ocular inflammation. Activation of TNF-alpha and other genes may also lead to the pathophysiology of glaucoma (10975909, 10815159).</Paragraph>
    <Paragraph position="7"> The family of mitogen-activated protein kinases (MAPK) is another group of genes that has an important role in retinal disease. These include extracellular signal-regulated kinases (ERK), c-Jun amino(N)-terminal kinase (JNK), and p38. One of these, ERK, was induced in glaucoma (12824248). Often inflammatory responses include the induction of apoptosis, or programmed cell death. The involvement of JNK in inducing apoptosis was demonstrated in prostate cancer (12859962, 12663665) and retinal cells (12270637). There is also a link to TNF-alpha (discussed above) which was shown to activate phosphorylation of ERKs, p38, and JNK MAPK in human chondrocytes (12878172).</Paragraph>
    <Paragraph position="8"> IL-1beta activation, induced by the presence of retinal holes, a key feature of diabetic retinopathy, is also reported to result in the activation of a number of the MAPK genes ERK, JNK, and p38 (12824248). These conditions in turn exacerbate the disease process in that they result in proliferative and migratory cells accumulating in the wounded retina (12500176). Inhibitors of MAPK and phosphatidylinositol 3-kinase (PI3) inhibited retinal pigment epithelial cell proliferation (12782163).</Paragraph>
    <Paragraph position="9"> The breakdown in the blood-retina barrier is also suppressed by inhibitors of p38 MAPK and PI3 (11901189).</Paragraph>
    <Paragraph position="10"> Changes in the levels of the gene NF-kappaB are an early cellular response to inflammation. Activation of TNF-alpha (discussed above) is followed by increased transcription of NF-kappaB which in turn stimulates ERK, p38, and JNK MAPK (12878172). Also, activation of NF-kappaB subsequently stimulated COX-2 and matrix metalloproteinase-9 expression (12807725).</Paragraph>
    <Paragraph position="11"> Curcumin was shown to be effective in inhibiting cell proliferation of tumorigenic and non-tumorigenic breast cancer cells (12527329) and other tumor cells (12680238). As described previously, the gene COX-2 is involved in early inflammatory diabetic retinopathy (11821258). Curcumin was able to suppress COX-2 in a dose-related manner (12844482) and neutralized the effect of IL-1 beta, possibly through its effect on p38 and COX-2 and JNK (12957788). Curcumin is also a known inhibitor of JNK (12957788, 12854631, 12582006, 12130649, 12105223, 9674701) and a suppressor of NF-kappaB activation (11753638, 11506818, 12878172, 12825130). For example, it suppressed the induction of NF-kappaB and its dependent genes by cigarette smoke (12807725), in alcoholic liver disease (12388178) and in cultured endothelial cells (12368225).</Paragraph>
    <Paragraph position="12"> Having shown that these genes, in particular, IL-1beta, COX-2, TNF-alpha, JNK, ERK, NF-kappaB, etc., are involved in retinopathy and in regulating cell proliferation and leukocyte attachment and the breakdown of the blood-retina barrier, and having established that curcumin is capable of inhibiting the activity of these genes, we hypothesize that curcumin may have therapeutic value in preventing or ameliorating a number of retinal pathologies. Our approach has focused on specific genes, in particular to provide clues regarding the relevant biochemical pathways. In some cases the evidence is gathered in the context of other diseases such as alcoholic liver disease with the idea that similar evidence may be found for retinal diseases. In summary, it seems likely that curcumin, taken in the diet or applied topically, could prove beneficial in cases of diabetic retinopathies, retinal injury, ocular inflammation and glaucoma.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Related Research
</SectionTitle>
    <Paragraph position="0"> Text mining, i.e., uncovering information that may lead to hypotheses, has attracted the attention of many researchers (e.g., Andrade &amp; Valencia, 1998; Gordon &amp; Lindsay, 1996; Masys et al., 2001; Smalheiser &amp; Swanson, 1996a; Smalheiser &amp; Swanson, 1996b; Srinivasan &amp; Wedemeyer, 2003; Srinivasan, 2004; Swanson, 1986; Swanson, 1988; Swanson et al., 2001; Weeber, 2000). Examples of recent text mining applications include automatically identifying viruses that may be used as bioweapons (Swanson et al., 2001), proposing therapeutic uses for thalidomide (Weeber, 2003) and finding functional connections between genes (Chaussabel &amp; Sher, 2002; Shatkay et al., 2000).</Paragraph>
    <Paragraph position="1"> A major emphasis in text mining research has been to directly exploit co-occurrence relationships in MEDLINE. For example, Jenssen et al. (2001) generate a co-occurrence based gene network called PubGene from MEDLINE for 13,712 named human genes. Each of PubGene's 139,756 links is weighted by the number of times the genes co-occur. Wilkinson and Huberman6 identify communities of genes. Starting with a co-occurrence based gene network for a particular disease domain, communities are identified by repeatedly removing edges of highest betweenness (number of shortest paths traversing the edge). Applying this to the domain of colorectal cancer, they are able to identify interesting hypotheses linking genes that were, for example, in the same community but had no edge between them.</Paragraph>
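    <Paragraph> The idea of a co-occurrence based gene network can be illustrated with a few lines of Python. This is a generic sketch, not the PubGene implementation: the function name is hypothetical, each record is assumed to be already reduced to the set of gene names it mentions, and an edge weight is taken to be the number of records in which both genes appear.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(records):
    """Count, for each unordered gene pair, the records that mention both genes."""
    edges = Counter()
    for genes in records:
        # Sort so that each pair has one canonical (alphabetical) orientation.
        for pair in combinations(sorted(set(genes)), 2):
            edges[pair] += 1
    return edges
```

The resulting weighted edge list is the kind of input on which an edge-removal procedure such as the betweenness-based one described above would operate.</Paragraph>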
    <Paragraph position="2"> Our research is based on the open discovery framework proposed by Swanson. As indicated before, Swanson and Smalheiser made several discoveries using their open and closed discovery methods (Swanson, 1986; Swanson, 1988; Swanson et al., 2001; Smalheiser &amp; Swanson, 1996a; Smalheiser &amp; Swanson, 1996b), that were later validated by bioscientists. These discoveries together offer a testbed of examples that are being used by other researchers to develop their own discovery algorithms (Gordon &amp; Lindsay, 1996; Lindsay &amp; Gordon, 1999; Srinivasan, 2004; Weeber et al., 2001).</Paragraph>
    <Paragraph position="3"> One characteristic that may be useful in distinguishing between text mining efforts is the extent to which they are problem or sub domain specific. For example, PubGene is directly targeted towards bioinformatics researchers. In contrast, implementations such as ours that derive from the open discovery framework are not problem specific. These may be used for a variety of goals, as for example by geneticists involved in understanding the results of microarray experiments and by epidemiologists searching for links between viruses and specific populations. We believe that the next generation of text mining systems will be judged not only by their effectiveness but also by their flexibility in application.</Paragraph>
    <Paragraph position="4"> 6 Wilkinson, D., &amp; Huberman, B. A. A method for finding communities of related genes. http://citeseer.nj.nec.com/546592.html.</Paragraph>
  </Section>
</Paper>