<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0508">
  <Title>Answering Questions in the Genomics Domain</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> One of the core problems in exploiting scientific papers in research and clinical settings is that the knowledge that they contain is not easily accessible. Although various resources which attempt to consolidate such knowledge are being created (e.g. UMLS, SWISS-PROT, OMIM, GeneOntology, GenBank, LocusLink), the amount of information available keeps growing exponentially (Stapley and Benoit, 2000).</Paragraph>
    <Paragraph position="1"> There is accordingly a pressing need for intelligent systems capable of accessing that information in an efficient and user-friendly way. Question Answering systems aim at providing a focused way to access the information contained in a document collection. Research on Question Answering has been spurred in recent years, in particular by the Question Answering track of the Text REtrieval Conference (TREC-QA) competitions (Voorhees, 2001). The TREC-QA competitions focus on open-domain systems, i.e. systems that can (potentially) answer any generic question.</Paragraph>
    <Paragraph position="2"> As these competitions are based on large volumes of text, the competing systems (normally) resort to a relatively shallow text analysis. In contrast, a question answering system working on a restricted domain can take advantage of the formatting and style conventions in the text, and can make use of the specific domain-dependent terminology and of full parsing.</Paragraph>
    <Paragraph position="3"> In many restricted domains, including technical documentation and research papers, terminology plays a pivotal role. This is in fact one of the major differences between restricted domains and open-domain texts. While Named Entities play a major role in open-domain systems, in technical documentation and research papers they play a secondary role; a far greater role is played by domain terminology. Terminology is a major obstacle for processing research papers and at the same time a key access path to the knowledge encoded in those papers. Terminology provides the means to name and access domain-specific concepts and objects.</Paragraph>
    <Paragraph position="4"> Restricted domains present the additional problem of &quot;domain navigation&quot;. Users of the system cannot always be expected to be completely familiar with the domain terminology. Unfamiliarity with domain terminology might lead to questions which contain imperfect formulations of domain terms. It therefore becomes essential to be able to detect terminological variants and exploit the relations between terms (like synonymy, meronymy, antonymy). The process of variation is well investigated in terminological research (Daille et al., 1996). In the biomedical domain, an example of a system that deals with terminological variants (also called &quot;aliases&quot;) is described by Pustejovsky et al. (2002).</Paragraph>
    <Paragraph position="5"> In the rest of this paper we first briefly describe our existing Question Answering system, ExtrAns (section 2). We then detail the specific problems encountered in the new domain and the steps that we have taken to solve them (section 3). We conclude the paper with an overview of related research (section 4).</Paragraph>
  </Section>
</Paper>