<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0704">
  <Title>Situated Question Answering in the Clinical Domain: Selecting the Best Drug Treatment for Diseases</Title>
  <Section position="4" start_page="24" end_page="26" type="metho">
    <SectionTitle>
3 EBM and Clinical QA
</SectionTitle>
    <Paragraph position="0"> Evidence-based medicine not only supplies a process model for situating question answering capabilities, but also provides a framework for codifying the knowledge involved in retrieving answers.</Paragraph>
    <Paragraph position="1"> This section describes how the EBM paradigm provides the basis of the semantic domain model for our question answering system.</Paragraph>
    <Paragraph position="2"> Evidence-based medicine offers three facets of the clinical domain, that, when taken together, describe a model for addressing complex clinical information needs. The first facet, shown in Table 1 (left column), describes the four main tasks that physicians engage in. The second facet pertains to the structure of a well-built clinical question. Richardson et al. (1995) identify four key elements, as shown in Table 1 (middle column). These four elements are often referenced with a mnemonic PICO, which stands for Patient/Problem, Intervention, Comparison, and Outcome. Finally, the third facet serves as a tool for appraising the strength of evidence, i.e., how much confidence should a physician have in the results? For this work, we adopted a system with three levels of recommendations, as shown in Table 1 (right column).</Paragraph>
    <Paragraph position="3"> By integrating these three perspectives of evidence-based medicine, we conceptualize clinical question answering as &amp;quot;semantic unification&amp;quot; between information needs expressed in a  Clinical Tasks PICO Elements Strength of Evidence Therapy: Selecting effective treatments for patients, taking into account other factors such as risk and cost.</Paragraph>
    <Paragraph position="4"> Diagnosis: Selecting and interpreting diagnostic tests, while considering their precision, accuracy, acceptability, cost, and safety.</Paragraph>
    <Paragraph position="5"> Prognosis: Estimating the patient's likely course with time and anticipating likely complications.</Paragraph>
    <Paragraph position="6"> Etiology: Identifying the causes for a patient's disease.</Paragraph>
    <Paragraph position="7"> Patient/Problem: What is the primary problem or disease? What are the characteristics of the patient (e.g., age, gender, co-existing conditions, etc.)? Intervention: What is the main intervention (e.g., diagnostic test, medication, therapeutic procedure, etc.)? Comparison: What is the main intervention compared to (e.g., no intervention, another drug, another therapeutic procedure, a placebo, etc.)? Outcome: What is the effect of the intervention (e.g., symptoms relieved or eliminated, cost reduced, etc.)? A-level evidence is based on consistent, good quality patient-oriented evidence presented in systematic reviews, randomized controlled clinical trials, cohort studies, and metaanalyses. null B-level evidence is inconsistent, limited quality patient-oriented evidence in the same types of studies.</Paragraph>
    <Paragraph position="8"> C-level evidence is based on diseaseorientedevidenceorstudieslessrigor- null ous than randomized controlled clinical trials, cohort studies, systematic  PICO-based knowledge structure and correspondingstructuresextractedfromMEDLINEabstracts. null Naturally, this matching process should be sensitivetotheclinicaltaskandthestrengthofevidence null of the retrieved abstracts. As conceived, clinical question answering is a knowledge-intensive endeavor that requires automatic identification of PICO elements from MEDLINE abstracts.</Paragraph>
    <Paragraph position="9"> Ideally, a clinical question answering system should be capable of directly performing this semantic match on abstracts, but the size of the MEDLINE database (over 16 million citations) makes this approach currently unfeasible. As an alternative, we rely on PubMed,1 a boolean search engine provided by the National Library of Medicine, to retrieve an initial set of results that we then postprocess in greater detail--this is the standard two-stage architecture commonly-employed by many question answering systems (Hirschman and Gaizauskas, 2001). The complete architecture of our system is shown in Figure 1. The query formulation module converts the clinical question into a PubMed search query, identifies the clinical task, and extracts the appropriate PICO elements. PubMed returns an initial list of MEDLINE citations, which is analyzed by the knowledge extractor to identify clinically-relevant elements. These elements serve as input to the semantic matcher, and are compared to corresponding elements extracted from the question. Citations are then scored and the top ranking ones are returned as answers.</Paragraph>
    <Paragraph position="10">  swering system.</Paragraph>
    <Paragraph position="11"> Althoughwehaveoutlinedageneralframework for clinical question answering, the space of all possible patient care questions is immense, and attempts to develop a comprehensive system is beyond the scope of this paper. Instead, we focus on a subset of therapy questions: specifically, questions of the form &amp;quot;What is the best drug treatment for X?&amp;quot;, where X can be any disease. We have chosentotacklethisclassofquestionsbecausestudies null of physicians' question-asking behavior in natural settings have revealed that this question type occurs frequently (Ely et al., 1999). By leveraging the natural distribution of clinical questions, we canmakethegreatestimpactwiththeleastamount  of development effort. For this class of questions, we have implemented a working system with the architecture described in Figure 1. The next three sections detail each module.</Paragraph>
  </Section>
  <Section position="5" start_page="26" end_page="26" type="metho">
    <SectionTitle>
4 Query Formulator
</SectionTitle>
    <Paragraph position="0"> Since our system only handles one question type, the query formulator is relatively simple: the task is known in advance to be therapy and the ProblemPICOelementisthediseaseaskedaboutinthe null clinicalquestion. Inordertofacilitatethesemantic matchingprocess,weemployMetaMap(Aronson, 2001) to identify the concept in the UMLS ontology that corresponds to the disease; UMLS also provides alternative names and other expansions.</Paragraph>
    <Paragraph position="1"> The query formulator also generates a query to PubMed, the National Library of Medicine's boolean search engine for MEDLINE. As an example,thefollowingqueryisissuedtoretrievehits null for the disease &amp;quot;meningitis&amp;quot;:</Paragraph>
    <Paragraph position="3"> In order to get the best possible set of initial citations, we employ MeSH (Medical Subject Headings) terms when available. MeSH terms are controlled vocabulary concepts assigned manually by trained medical librarians in the indexing process (based on the full text of the article), and encode a substantial amount of knowledge about the contents of the citation. PubMed allows searches on MeSH headings, which usually yield highly accurate results. In addition, we limit retrieved citations to those that have the MeSH heading &amp;quot;drug therapy&amp;quot;and those that describe a clinical trial (another metadata field). By default, PubMed orders citations chronologically in reverse.</Paragraph>
  </Section>
  <Section position="6" start_page="26" end_page="26" type="metho">
    <SectionTitle>
5 Knowledge Extractor
</SectionTitle>
    <Paragraph position="0"> The knowledge extraction module provides the basic frame elements used in the semantic matching process, described in the next section. We employ previously-implemented components (Demner-Fushman and Lin, 2005) that identify PICO elements within a MEDLINE citation using a combination of knowledge-based and statistical machine-learning techniques. Of the four PICO elements prescribed by evidence-based medicine practitioners, only the Problem and Outcome elements are relevant for this application (there are no Interventions and Comparisons for our question type). The Problem is the main disease under consideration in an abstract, and outcomes are statements that assert clinical findings, e.g., efficacy of a drug or a comparison between two drugs. The ability to precisely identify these clinically-relevant elements provides the foundation for semantic question answering capabilities.</Paragraph>
  </Section>
  <Section position="7" start_page="26" end_page="27" type="metho">
    <SectionTitle>
6 Semantic Matcher
</SectionTitle>
    <Paragraph position="0"> Evidence-based medicine identifies three different sets of factors that must be taken into account when assessing citation relevance. These considerationsarecomputationallyoperationalizedinthe null semantic matcher, which takes as input elements identified by the knowledge extractor and scores the relevance of each PubMed citation with respect to the question. After matching, the top-scoring abstracts are presented to the physician as answers. The individual score of a citation is comprised of three components:</Paragraph>
    <Paragraph position="2"> By codifying the principles of evidence-based medicine, our semantic matcher attempts to satisfy information needs through conceptual analysis, as opposed to simple keyword matching. In the following subsections, we describe each of these components in detail.</Paragraph>
    <Section position="1" start_page="26" end_page="27" type="sub_section">
      <SectionTitle>
6.1 PICO Matching
</SectionTitle>
      <Paragraph position="0"> The score of an abstract based on PICO elements, SPICO, is broken up into two separate scores:</Paragraph>
      <Paragraph position="2"> The first component in the above equation, Sproblem, reflects a match between the primary problem in the query frame and the primary problem identified in the abstract. A score of 1 is given if the problems match exactly, based on their unique UMLS concept id (as provided by MetaMap).</Paragraph>
      <Paragraph position="3"> Matching based on concept ids addresses the issue ofterminologicalvariation. Failinganexactmatch of concept ids, a partial string match is given a score of 0.5. If the primary problem in the query has no overlap with the primary problem from the abstract, a score of[?]1 is given.</Paragraph>
      <Paragraph position="4"> The outcome-based score Soutcome is the value assigned to the highest-scoring outcome sentence,  as determined by the knowledge extractor. Since the desired outcome (i.e., improve the patient's condition) is implicit in the clinical question, our system only considers the inherent quality of outcome statements in the abstract. Given a match on the primary problem, most clinical outcomes are likely to be of interest to the physician.</Paragraph>
      <Paragraph position="5"> For the drug treatment scenario, there is no intervention or comparison, and so these elements do not contribute to the semantic matching.</Paragraph>
    </Section>
    <Section position="2" start_page="27" end_page="27" type="sub_section">
      <SectionTitle>
6.2 Strength of Evidence
</SectionTitle>
      <Paragraph position="0"> The relevance score of a citation based on the strength of evidence is calculated as follows:</Paragraph>
      <Paragraph position="2"> Citations published in core and high-impact journals such as Journal of the American Medical Association (JAMA) get a score of 0.6 for Sjournal, and 0 otherwise. In terms of the study type, Sstudy, clinical trials receive a score of 0.5; observational studies, 0.3; all non-clinical publications, [?]1.5; and 0 otherwise. The study type is directly encoded as metadata in a MEDLINE citation.</Paragraph>
      <Paragraph position="3"> Finally, recency factors into the strength of evidence score according to the formula below: Sdate = (yearpublication [?]yearcurrent)/100 (4) A mild penalty decreases the score of a citation proportionally to the time difference between the date of the search and the date of publication.</Paragraph>
    </Section>
    <Section position="3" start_page="27" end_page="27" type="sub_section">
      <SectionTitle>
6.3 MeSH Matching
</SectionTitle>
      <Paragraph position="0"> The final component of the EBM score reflects task-specificconsiderations,andiscomputedfrom MeSH terms associated with each citation:</Paragraph>
      <Paragraph position="2"> The function a(t) maps MeSH terms to positive scores for positive indicators, negative scores for negative indicators, or zero otherwise.</Paragraph>
      <Paragraph position="3"> Negative indicators include MeSH headings associated with genomics, such as &amp;quot;genetics&amp;quot; and &amp;quot;cell physiology&amp;quot;. Positive indicators for therapy were derived from the clinical query filters used in PubMed searches (Haynes et al., 1994); examples include &amp;quot;drug administration routes&amp;quot; and any of its children in the MeSH hierarchy. A score of +-1 is giveniftheMeSHdescriptororqualifierismarked as the main theme of the article (indicated via the star notation by indexers), and+-0.5 otherwise.</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="27" end_page="28" type="metho">
    <SectionTitle>
7 Evaluation Methodology
</SectionTitle>
    <Paragraph position="0"> Clinical Evidence (CE) is a periodic report created by the British Medical Journal (BMJ) Publishing Group that summarizes the best treatments for a few dozen diseases at the time of publication. We were able to mine the June 2004 edition to create a test collection to evaluate our system.</Paragraph>
    <Paragraph position="1"> Note that the existence of such secondary sources does not obviate the need for clinical question answering because they are perpetually falling out of date due to rapid advances in medicine. Furthermore, such reports are currently created by highlyexperienced physicians, which is an expensive and time-consuming process. From CE, we randomly extracted thirty diseases, creating a development set of five questions and a test set of twenty-five questions. Some examples include: acute asthma, chronic prostatitis, community acquired pneumonia, and erectile dysfunction.</Paragraph>
    <Paragraph position="2"> We conducted two evaluations--one automatic and one manual--that compare the original PubMed hits and the output of our semantic matcher. The first evaluation is based on ROUGE, a commonly-used summarization metric that computes the unigram overlap between a particular text and one or more reference texts.2 The treatment overview for each disease in CE is accompanied by a number of citations (used in writing the overview itself)--the abstract texts of these cited articles serve as our references. We adopt this approach because medical journals require abstracts that provide factual information summarizing the main points of the studies. We assume that the closer an abstract is to these reference abstracts (as measured by ROUGE-1 precision), the more relevant it is. On average, each disease overview contains 48.4 citations; however, we were only able to gather abstracts of those that were contained in MEDLINE (34.7 citations per disease, min 8, max 100). For evaluation purposes, we restricted abstracts under consideration to those that were published before our edition of CE. To quantify the performance of our system, we computed the average ROUGE score over the top one, three, five, and ten hits of our EBM and baseline systems.</Paragraph>
    <Paragraph position="3"> To supplement our automatic evaluation, we also conducted a double-blind manual evaluation  The EBM column represents performance of our complete domain model. PICO, SoE, and MeSH represent performance of each component. (* denotes n.s., triangle denotes sig. at 0.95, trianglesolid denotes sig. at 0.99) PubMed results EBM-reranked results Effect of vitamin A supplementation on childhood morbidity and mortality.</Paragraph>
    <Paragraph position="4"> Intrathecal chemotherapy in carcinomatous meningitis from breast cancer.</Paragraph>
    <Paragraph position="5"> Isolated leptomeningeal carcinomatosis (carcinomatous meningitis)aftertaxane-inducedmajorremissioninpatients with advanced breast cancer.</Paragraph>
    <Paragraph position="6"> A comparison of ceftriaxone and cefuroxime for the treatment of bacterial meningitis in children.</Paragraph>
    <Paragraph position="7"> Randomised comparison of chloramphenicol, ampicillin, cefotaxime, and ceftriaxone for childhood bacterial meningitis. null The beneficial effects of early dexamethasone administration in infants and children with bacterial meningitis.  meningitis?&amp;quot;, before and after applying our semantic reranking algorithm. of the system. The top five citations from both the original PubMed results and the output of our semantic matcher were gathered, blinded, and randomized (see Table 3 for an example of top results obtained by PubMed and our system). The first author of this paper, who is a medical doctor, manually evaluated the abstracts. Since the sources of the abstracts were hidden, judgments were guaranteed to be impartial. All abstracts were evaluated on a four point scale: not relevant, marginally relevant, relevant, and highly relevant, which corresponds to a score of zero to three.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML