<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1604">
  <Title>Exploiting Paraphrases in a Question Answering System</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
6. Inference.
</SectionTitle>
    <Paragraph position="0"> The stapler costs $10. / The price of the stapler is $10.</Paragraph>
    <Paragraph position="1"> Where is Thimphu located? / Thimphu is capital of what country? Of course, combinations of the different types are possible, e.g. Oswald killed Kennedy / Kennedy was assassinated by Oswald is a combination of (1) and (2).</Paragraph>
    <Paragraph position="2"> Different types of knowledge and different linguistic resources are needed to deal with each of the above types. While type (1) can be dealt with using a resource such as WordNet (Fellbaum, 1998), type (2) needs effective parsing and mapping of syntactic structures into a common deeper structure, possibly using a repository of nominalisations like NOMLEX (Meyers et al., 1998). More complex approaches are needed for the other types, up to type (6), where generic world knowledge is required, for instance to know that being a capital of a country implies being located in that country [1]. Such world knowledge could be expressed in the form of axioms, like the following: (X costs Y) iff (the price of X is Y). [Footnote 1: Note that the reverse is not true, and therefore this is not a perfect paraphrase.]</Paragraph>
    <Paragraph position="3"> In this paper we focus on the role of paraphrases in a Question Answering (QA) system targeted at technical manuals. Technical documentation is characterised by vast amounts of domain-specific terminology, which needs to be exploited for providing intelligent access to the information contained in the manuals (Rinaldi et al., 2002b). The approach taken by QA systems is to allow a user to ask a query (formulated in natural language) and have the system search a background collection of documents in order to locate an answer. The field of Question Answering has flourished in recent years [2], in part due to the QA track of the TREC competitions (Voorhees and Harman, 2001). These competitions evaluate systems over a common data set, allowing developers to benchmark performance in relation to other competitors.</Paragraph>
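    As an illustration of the axiom given above, (X costs Y) iff (the price of X is Y), the following minimal sketch maps both surface forms to a single predicate so that they can be compared. It is not the mechanism used by the system described in this paper: the regular expressions, the function name `normalise` and the predicate label `price` are all hypothetical, and a real system would operate on parsed structures rather than raw strings.

```python
import re

# Hypothetical surface patterns for the two sides of the axiom
# (X costs Y) iff (the price of X is Y). Group 1 is X, group 2 is Y.
PATTERNS = [
    re.compile(r"^(?:the )?(.+?) costs (.+?)\.?$", re.IGNORECASE),
    re.compile(r"^the price of (?:the )?(.+?) is (.+?)\.?$", re.IGNORECASE),
]


def normalise(sentence):
    """Return ('price', X, Y) if the sentence matches either pattern, else None."""
    text = sentence.strip()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            # Both sides of the axiom are reduced to the same illustrative predicate.
            return ("price", match.group(1).lower(), match.group(2))
    return None


if __name__ == "__main__":
    print(normalise("The stapler costs $10."))
    print(normalise("The price of the stapler is $10."))
```

    Two sentences count as paraphrases under this sketch when `normalise` returns the same tuple for both. String patterns are used only to keep the example short.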
    <Paragraph position="4"> It is a common assumption that technical terminology is subject to strict controls and cannot vary within a given editing process. However, this assumption proves all too often to be incorrect. Unless editors are making use of a terminology control system that forces them to use a specific version of a term, they will naturally tend to use various paraphrases to refer to the intended domain concept. Besides, in a query a user could use an arbitrary paraphrase of the target term, which might be one of those used in the manual itself or might be a novel one.</Paragraph>
    <Paragraph position="5"> We describe some potential solutions to this problem, taking our Question Answering system as an example. We show which benefits our paraphrase-based approach brings to the system. So far two different domains have been targeted by the system.</Paragraph>
    <Paragraph position="6"> An initial application aims at answering questions about the Unix man pages (Mollá et al., 2000a; Mollá et al., 2000b). A more complex application targets the Aircraft Maintenance Manual (AMM) of the Airbus A320 (Rinaldi et al., 2002b). Recently we have started new work, using the Linux HOWTOs as a new target domain.</Paragraph>
    <Paragraph position="7"> In dealing with these domains we have identified two major obstacles for a QA system, which we can summarise as follows:</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
The Parsing Problem
The Paraphrase Problem
</SectionTitle>
      <Paragraph position="0"> The Parsing Problem consists in the increased difficulty of parsing text in a technical domain due to domain-specific sublanguage. Various types of multi-word expressions characterise these domains, in particular referring to specific concepts like tools, parts or procedures. [Footnote 2: Although early work in AI already touched upon the topic, e.g. (Woods, 1977).]</Paragraph>
      <Paragraph position="1"> These multi-word expressions might include lexical items which are either unknown to a generic lexicon (e.g. coax cable) or have a specific meaning unique to this domain. Abbreviations and acronyms are another common source of inconsistencies. In such cases the parser might fail to identify the compound as a phrase and consequently fail to parse the sentence including such items. Alternatively the parser might attempt to 'guess' their lexical category (in the set of open class categories), leading to an exponential growth of the number of possible syntactic parses. Not only can the internal structure of the compound be multi-way ambiguous, but even the boundaries of the compounds might be difficult to detect, and the parser might try odd combinations of the tokens belonging to the compounds with neighbouring tokens.</Paragraph>
      <Paragraph position="2"> The Paraphrase Problem resides in the imperfect knowledge of users of the system, who cannot be expected to be completely familiar with the domain terminology. Even experienced users, who know the domain very well, might not remember the exact wording of a compound and use a paraphrase to refer to the underlying domain concept. Besides, even in the manual itself, unless the editors have been forced to use some strict terminology control system, various paraphrases of the same compound will appear, and they need to be identified as co-referent.</Paragraph>
      <Paragraph position="3"> However, it is not enough to identify all paraphrases within the manual: novel paraphrases might be created by the users each time they query the system.</Paragraph>
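      The need to treat different surface forms of one compound as co-referent, both in the manual and in unseen queries, can be illustrated with a deliberately crude sketch: each term is reduced to an order-insensitive key of its content words, and terms that share a key are grouped. This is an assumption made for illustration only and is not the technique used by the system described in this paper; the stop list and the function names `variant_key` and `group_variants` are hypothetical.

```python
from collections import defaultdict

# Hypothetical stop list; a real system would need a more careful one.
FUNCTION_WORDS = {"of", "the", "for", "a", "an"}


def variant_key(term):
    """Order-insensitive key: lowercase, split hyphens, drop function words, sort."""
    tokens = term.lower().replace("-", " ").split()
    return tuple(sorted(t for t in tokens if t not in FUNCTION_WORDS))


def group_variants(terms):
    """Group surface forms that share a key, treating each group as one concept."""
    groups = defaultdict(set)
    for term in terms:
        groups[variant_key(term)].add(term)
    return dict(groups)


if __name__ == "__main__":
    manual_terms = ["coax cable", "coax-cable", "cable for the coax"]
    print(group_variants(manual_terms))
```

      Because the key is computed from the surface form alone, a novel wording in a query can be looked up with the same `variant_key` call without having been seen in the manual. The sketch covers only reordering, hyphenation and function words; it does not handle synonyms or morphological variation.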
      <Paragraph position="4"> In the rest of this paper we first describe our Question Answering system (in Section 2) and briefly show how we solved the first of the two problems described above. Then, in Section 3, we show in detail how the system is capable of coping with the Paraphrase Problem. Finally, in Section 4, we discuss some related work.</Paragraph>
    </Section>
  </Section>
</Paper>