<?xml version="1.0" standalone="yes"?> <Paper uid="W06-3003"> <Title>Modeling Reference Interviews as a Basis for Improving Automatic QA Systems</Title> <Section position="5" start_page="18" end_page="22" type="metho"> <SectionTitle> 3 System Capabilities </SectionTitle> <Paragraph position="0"> In order to increase the contribution of users to our question answering system, we expanded our traditional domain-independent QA system by adding new capabilities that support system-user interaction.</Paragraph> <Section position="1" start_page="18" end_page="19" type="sub_section"> <SectionTitle> 3.1 Domain Independent QA </SectionTitle> <Paragraph position="0"> Our traditional domain-independent QA capability functions in two stages: a first information retrieval stage that selects a set of candidate documents, and a second stage that finds answers within the filtered set. The answer finding process draws on models of question types and document-based knowledge to seek answers without additional feedback from the user. Drawing on the modeling of questions as they interact with the domain representation, the system returns answers of variable length on the fly, in keeping with the nature of the question: factoid questions may be answered with a short answer, but complex questions often require longer answers. In addition, because our QA projects were based on closed collections, which may not provide enough redundancy to allow short answers to be returned, the variable-length answer capability also assists in finding answers to factoid questions. The QA system provides answers in the form of short answers, sentences, and answer-providing passages, as well as links to the full answer-providing documents. The user can provide relevance feedback by selecting the full documents that offer the best information. Using this feedback, the system can reformulate the question and look for a better set of documents from which to find an answer to the question.</Paragraph> <Paragraph position="1"> Multiple answers can be returned, giving the user a more complete picture of the information held within the collection.</Paragraph> <Paragraph position="2"> One of our first tactics to assist in both question and domain modeling for specific user needs was to develop tools for Subject Matter Experts (SMEs) to tailor our QA systems to a particular domain. Of particular interest to the interactive QA community are the Query Template Builder (QTB) and the Knowledge Base Builder (KBB).</Paragraph> <Paragraph position="3"> Both tools allow a priori alterations to question and domain modeling for a community, but are not sensitive to particular users. The interactive QA system, in turn, permits question- and user-specific tailoring of system behavior because it allows subject matter experts to change the way the system understands their need at the time of the search.</Paragraph> <Paragraph position="4"> The Query Template Builder (QTB) allows a subject matter expert to fine-tune a question representation by adding or removing stopwords on a question-by-question basis, adding or masking expansions, or changing the answer focus. The QTB displays a list of Question-Answer types, allows the addition of new Answer Types, and allows users to select the expected answer type for specific questions. For example, the subject matter expert may want to specify for particular &quot;who&quot; questions whether the expected answer type is &quot;person&quot; or &quot;organization&quot;. 
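To illustrate how such an expert-supplied answer-type override might be represented, the following minimal Python sketch maps question patterns to expected answer types; the template format, default mappings, and function names are illustrative assumptions, not the actual QTB data format.

    # Default mapping from question words to expected answer types (illustrative).
    DEFAULT_ANSWER_TYPES = {"who": "person", "when": "date", "where": "location"}

    # Overrides a subject matter expert might enter through a template builder,
    # keyed by a simple question-prefix pattern (hypothetical format).
    TEMPLATE_OVERRIDES = {
        "who makes": "organization",
        "who manufactures": "organization",
    }

    def expected_answer_type(question):
        """Return the expected answer type, letting expert overrides win."""
        q = question.lower().rstrip("?").strip()
        for prefix, answer_type in TEMPLATE_OVERRIDES.items():
            if q.startswith(prefix):
                return answer_type
        first_word = q.split()[0] if q.split() else ""
        return DEFAULT_ANSWER_TYPES.get(first_word, "any")

    print(expected_answer_type("Who piloted the first space shuttle?"))  # person
    print(expected_answer_type("Who makes Mountain Dew?"))               # organization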
The QTB enables organizations to identify questions for which they want human intervention and to build specialized term expansion sets for terms in the collection.</Paragraph> <Paragraph position="5"> They can also adjust the stop word list, and refine and build the Frequently or Previously Asked Question (FAQ/PAQ) collection.</Paragraph> <Paragraph position="6"> Knowledge Base Builder (KBB) is a suite of tools developed for both commercial and government customers. It allows the users to view and extract terminology that resides in their document collections. It provides useful statistics about the corpus that may indicate portions that require attention in customization. It collects frequent / important terms with categorizations to enable ontology building (semi-automatic, permitting human review), term collocation for use in identifying which sense of a word is used in the collection for use in term expansion and categorization review. KBB allows companies to tailor the QA system to the domain vocabulary and important concept types for their market. Users are able to customize their QA applications through human-assisted automatic procedures.</Paragraph> <Paragraph position="7"> The Knowledge Bases built with the tools are primarily lexical semantic taxonomic resources.</Paragraph> <Paragraph position="8"> These are used by the system in creating frame representations of the text. Using automatically harvested data, customers can review and alter categorization of names and entities and expand the underlying category taxonomy to the domain of interest. For example, in the NASA QA system, experts added categories like &quot;material&quot;, &quot;fuel&quot;, &quot;spacecraft&quot; and &quot;RLV&quot;, (Reusable Launch Vehicles). They also could specify that &quot;RLV&quot; is a subcategory of &quot;spacecraft&quot; and that space shuttles like &quot;Atlantis&quot; have category &quot;RLV&quot;. The KBB works in tandem with the QTB, where the user can find terms in either documents or example queries</Paragraph> </Section> <Section position="2" start_page="19" end_page="20" type="sub_section"> <SectionTitle> 3.2 Interactive QA Development </SectionTitle> <Paragraph position="0"> In our current NASA phase, developed for undergraduate aerospace engineering students to quickly find information in the course of their studies on reusable launch vehicles, the user can view immediate results, thus bypassing the Reference Interviewer, or they may take the opportunity to utilize its increased functionality and interact with the QA system. The capabilities we have developed, represented by modules added to the system, fall into two groups. Group One includes capabilities that draw on direct interaction with the user to clarify what is being asked and that address terminological issues. It includes Spell Checking, Expansion Clarification, and Answer Type Verification. Answers change dynamically as the user provides more input about what was meant. Group Two capabilities are dependent upon, and expand upon the user's history of interaction with the system and include User Profile, Session Tracking, Reference Resolution, Question Similarity and User Frustration Recognition modules. These gather knowledge about the user, help provide co-reference resolution within an extended dialogue, and monitor the level of frustration a user is experiencing.</Paragraph> <Paragraph position="1"> The capabilities are explained in greater detail below. 
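As a concrete illustration of the kind of category taxonomy described above, the following minimal Python sketch encodes the NASA example; the data structures and function are illustrative assumptions, not the KBB's actual representation.

    # Illustrative subcategory and instance relations drawn from the NASA example.
    SUBCATEGORY_OF = {
        "RLV": "spacecraft",
    }

    INSTANCE_OF = {
        "Atlantis": "RLV",
    }

    def categories_for(term):
        """Return the term's category followed by all ancestor categories."""
        chain = []
        category = INSTANCE_OF.get(term, SUBCATEGORY_OF.get(term))
        while category is not None:
            chain.append(category)
            category = SUBCATEGORY_OF.get(category)
        return chain

    print(categories_for("Atlantis"))  # ['RLV', 'spacecraft']

In a taxonomy of this kind, a question about spacecraft can also be matched against entities categorized under its subcategories.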
<Section position="2" start_page="19" end_page="20" type="sub_section"> <SectionTitle> 3.2 Interactive QA Development </SectionTitle> <Paragraph position="0"> In our current NASA phase, developed so that undergraduate aerospace engineering students can quickly find information on reusable launch vehicles in the course of their studies, the user can view immediate results, thus bypassing the Reference Interviewer, or can take the opportunity to use its increased functionality and interact with the QA system. The capabilities we have developed, represented by modules added to the system, fall into two groups. Group One includes capabilities that draw on direct interaction with the user to clarify what is being asked and that address terminological issues. It includes Spell Checking, Expansion Clarification, and Answer Type Verification. Answers change dynamically as the user provides more input about what was meant. Group Two capabilities depend upon, and expand upon, the user's history of interaction with the system; they include the User Profile, Session Tracking, Reference Resolution, Question Similarity, and User Frustration Recognition modules. These gather knowledge about the user, help provide co-reference resolution within an extended dialogue, and monitor the level of frustration a user is experiencing.</Paragraph> <Paragraph position="1"> The capabilities are explained in greater detail below. Figure 1 captures the NASA system process and flow.</Paragraph> <Paragraph position="2"> Group One: In this group of interactive capabilities, after the user asks a query, answers are returned as in a typical system. If the answers presented are not satisfactory, the system will embark on a series of interactive steps (described below) in which alternative spellings, answer types, clarifications, and expansions are suggested. The user can choose from the system's suggestions or type in their own. The system will then revise the query and return a new set of answers. If those answers are not satisfactory, the user can continue interacting with the system until appropriate answers are found.</Paragraph> <Paragraph position="3"> Spell checking: Terms not found in the index of the document collection are displayed as potentially misspelled words. In this preliminary phase, spelling is checked and users have the opportunity to select correct and/or alternative spellings.</Paragraph> <Paragraph position="4"> Answer Type verification: The interactive QA system displays the type of answer that the system is looking for in order to answer the question. For example, for the question Who piloted the first space shuttle?, the answer type is 'person', and the system will limit the search for candidate short answers in the collection to those that are a person's name. The user can either accept the system's understanding of the question or reject the type it suggests. This is particularly useful for semantically ambiguous questions such as &quot;Who makes Mountain Dew?&quot;, where the system might interpret the question as needing a person, but the questioner actually wants the name of a company.</Paragraph> <Paragraph position="5"> Expansion: This capability allows users to review the possible relevant terms (synonyms and group members) that could enhance the question-answering process. The user can select or deselect expansion terms according to whether they express the intent of the question. For example, if the user asks How will aerobraking change the orbit size?, then the system can bring back the following expansions for &quot;aerobraking&quot;: By aerobraking do you mean the following: 1) aeroassist, 2) aerocapture, 3) aeromaneuvering, 4) interplanetary transfer orbits, or 5) transfer orbits. Acronym Clarification: For abbreviations or acronyms within a query, the full explications known by the system for the term can be displayed back to the user. The clarifications implemented are limited a priori to those that are relevant to the domain. In the aerospace domain, for example, if the question were What is used for the TPS of the RLV?, the clarifications of TPS would be thermal protection system, thermal protection subsystem, test preparation sheet, or twisted pair shielded, and the clarification of RLV would be reusable launch vehicle. The appropriate clarifications can be selected to assist in improving the search. For a more generic domain, the system would offer broader choices. For example, if the user types in the question What educational programs does the AIAA offer?, then the system might return: By AIAA, do you mean (a) American Institute of Aeronautics and Astronautics, ...?</Paragraph> </Section>
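A minimal Python sketch of the acronym clarification step follows; the expansion table is taken from the TPS/RLV example above, while the function name and lookup scheme are assumptions for illustration rather than the system's actual implementation.

    # Domain-limited acronym expansions (taken from the aerospace examples above).
    ACRONYMS = {
        "TPS": ["thermal protection system", "thermal protection subsystem",
                "test preparation sheet", "twisted pair shielded"],
        "RLV": ["reusable launch vehicle"],
    }

    def clarification_prompts(question):
        """Collect the known expansions for each acronym found in the question."""
        tokens = question.replace("?", "").split()
        return {t: ACRONYMS[t] for t in tokens if t in ACRONYMS}

    for acronym, options in clarification_prompts("What is used for the TPS of the RLV?").items():
        print("By " + acronym + " do you mean: " + ", ".join(options) + "?")

The selected expansions would then be folded back into the query representation before the next round of answer finding.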
<Section position="3" start_page="20" end_page="22" type="sub_section"> <SectionTitle> User Profile </SectionTitle> <Paragraph position="0"> The User Profile keeps track of more permanent information about the user. The profile includes a small standard set of user attributes, such as the user's name and/or research interests.</Paragraph> <Paragraph position="1"> In our commercially funded work, selected information gleaned from the question about the user was also captured in the profile. For example, if a user asks &quot;How much protein should my husband be getting every day?&quot;, the fact that the user is married can be added to the profile for future marketing, or to open a new line of dialogue asking the user's name or age. This information is then made available as context information for the QA system to resolve references that users make to themselves and their own attributes.</Paragraph> <Paragraph position="2"> For the NASA question-answering capability, to assist students in organizing their questions and results, there is an area for users to save their searches as standing queries, along with the results of searching (Davidson, 2006). This information, representing topics and areas of interest, can help to focus answer finding for new questions the user asks.</Paragraph> <Paragraph position="3"> Not yet implemented, but of interest, is the ability to save information such as a user's preferences (format, reliability, sources) that could be used as filters in the answer-finding process. Reference Resolution: A basic feature of an interactive QA system is the requirement to understand the user's questions and responsive answers as one session. The sequence of questions and answers forms a natural language dialogue between the user and the system. This necessitates NLP processing at the discourse level, a primary task of which is to resolve references across the session. Building on previous work in this area done for the Context Track of TREC 2001 (Harabagiu et al., 2001) and additional work (Chai and Jin, 2004) suggesting that discourse structures are needed to understand the question/answer sequence, we have developed a session-based reference resolution capability. In a dialogue, the user naturally includes referring phrases that require several types of resolution.</Paragraph> <Paragraph position="4"> The simplest case is that of referring pronouns, where the user is asking a follow-up question, for example: Q1: When did Madonna enter the music business? A1: Madonna's first album, Madonna, came out in 1983 and since then she's had a string of hits, been a major influence in the music industry and become an international icon.</Paragraph> <Paragraph position="5"> Q2: When did she first move to NYC? In this question sequence, the second question contains a pronoun, &quot;she&quot;, that refers to the person &quot;Madonna&quot; mentioned both in the previous question and its answer. Reference resolution would transform the question into &quot;When did Madonna first move to NYC?&quot; Another type of referring phrase is the definite common noun phrase, as in the follow-up question Does this company have other assistance programs? Here the definite noun phrase &quot;this company&quot; refers to &quot;Glaxo-Wellcome, Inc.&quot; in the previous answer, thus transforming the question to &quot;Does Glaxo-Wellcome, Inc. have other assistance programs?&quot; Currently, we capture a log of the question/answer interaction, and the reference resolution capability resolves any references in the current question that it can, using linguistic techniques on the discourse of the current session. 
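The following short Python sketch illustrates, in greatly simplified form, the kind of session-based pronoun substitution described above; the entity-spotting heuristic and function names are illustrative assumptions and much cruder than the feature-based algorithm described next.

    import re

    SESSION_ENTITIES = []  # names seen in earlier questions and answers, newest last
    WH_WORDS = {"When", "Who", "Where", "What", "Why", "How", "Does", "Did"}

    def note_entities(text):
        # Naive heuristic: treat capitalized tokens (minus question words) as names.
        for token in re.findall(r"\b[A-Z][a-z]+\b", text):
            if token not in WH_WORDS:
                SESSION_ENTITIES.append(token)

    def resolve(question):
        """Replace a third-person pronoun with the most recent session entity."""
        if not SESSION_ENTITIES:
            return question
        antecedent = SESSION_ENTITIES[-1]
        return re.sub(r"\b(she|he|they)\b", antecedent, question, flags=re.IGNORECASE)

    note_entities("When did Madonna enter the music business?")
    print(resolve("When did she first move to NYC?"))
    # -> When did Madonna first move to NYC?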
This session-based resolution is almost the same as the narrative coreference resolution used in documents, with the added need to understand first- and second-person pronouns from the dialogue context. The coreference resolution algorithm is based on standard linguistic discourse processing techniques in which referring phrases and candidate resolvents are analyzed along a set of features that typically includes gender, animacy, number, person, and the distance between the referring phrase and the candidate resolvent.</Paragraph> <Paragraph position="6"> Question Similarity: Question Similarity is the task of identifying when two or more questions are related. Previous studies on information retrieval (Boydell et al., 2005; Balfe and Smyth, 2005) have shown that using previously asked questions to enhance the current question is often useful for improving results among like-minded users.</Paragraph> <Paragraph position="7"> Identifying related questions is useful for finding matches to Frequently Asked Questions (FAQs) and Previously Asked Questions (PAQs), as well as for detecting when a user is failing to find adequate answers and may be getting frustrated.</Paragraph> <Paragraph position="8"> Furthermore, similar questions can be used during the reference interview process to present questions that other users with similar information needs have used, along with any answers that they considered useful.</Paragraph> <Paragraph position="9"> CNLP's question similarity capability comprises a suite of algorithms designed to identify when two or more questions are related. The system works by analyzing each query using our Language-to-Logic (L2L) module to identify and weight keywords in the query and provide expansions and clarifications, as well as to determine the focus of the question and the type of answer the user is expecting (Liddy et al., 2003). We then compute a series of similarity measures on two or more L2L queries. Our measures adopt a variety of approaches, including several based on keywords in the query: cosine similarity, keyword string matching, expansion analysis, and spelling variations. In addition, two measures are based on the representation of the whole query: answer type and answer frame analysis. An answer frame is our representation of the meaningful extractions contained in the query, along with metadata about where they occur and any other extractions that they relate to in the query.</Paragraph> <Paragraph position="10"> The system then combines the weighted scores of two or more of these measures to determine a composite score for the two queries, giving more weight to measures that testing has determined to be more useful for a particular task. We have utilized our question similarity module for two main tasks. For FAQ/PAQ (call it XAQ) matching, we use question similarity to compare the incoming question with our database of XAQs. Through empirical testing, we determined a threshold above which we consider two questions to be similar.</Paragraph>
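To make the composite-scoring idea concrete, here is a hedged Python sketch; the individual measures, weights, and threshold shown are illustrative stand-ins, not the values or the L2L representations actually used by the system.

    import math
    from collections import Counter

    def cosine_similarity(q1, q2):
        """Bag-of-words cosine similarity between two questions."""
        v1, v2 = Counter(q1.lower().split()), Counter(q2.lower().split())
        dot = sum(v1[t] * v2[t] for t in set(v1) & set(v2))
        norm = math.sqrt(sum(c * c for c in v1.values())) * math.sqrt(sum(c * c for c in v2.values()))
        return dot / norm if norm else 0.0

    def answer_type_match(q1, q2):
        # Stand-in for the answer-type measure: 1.0 if both questions open with
        # the same question word, else 0.0.
        return 1.0 if q1.lower().split()[0] == q2.lower().split()[0] else 0.0

    MEASURES = [(cosine_similarity, 0.7), (answer_type_match, 0.3)]  # illustrative weights
    THRESHOLD = 0.75  # would be chosen empirically, as described above

    def is_similar(q1, q2):
        """Weighted composite of the measures, compared against the threshold."""
        composite = sum(weight * measure(q1, q2) for measure, weight in MEASURES)
        return composite >= THRESHOLD

    print(is_similar("What is used for the TPS of the RLV?",
                     "What is used for the thermal protection system of the reusable launch vehicle?"))  # True

In the real system, term expansion would also let abbreviated and spelled-out forms such as &quot;TPS&quot; and &quot;thermal protection system&quot; contribute to the match, which the purely lexical sketch above cannot do.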
<Paragraph position="11"> Our other use of question similarity is in the area of frustration detection. The goal of frustration detection is to identify the signs a user may be giving that they are not finding relevant answers, so that the system can intervene before the user leaves the system and offer alternatives, such as similar questions from other users that have been successful.</Paragraph> </Section> </Section> <Section position="6" start_page="22" end_page="22" type="metho"> <SectionTitle> 4 Implementations </SectionTitle> <Paragraph position="0"> The refinements to our Question Answering system and the addition of interactive elements have been implemented in three different but related working systems, one of which is strictly an enhanced IR system. None of the three incorporates all of these capabilities. In our work for MySentient, Ltd., we developed the session-based reference resolution capability, implemented the variable-length and multiple-answer capability, modified our processing to facilitate the building of a user profile, added FAQ/PAQ capability, and applied our Question Similarity capability to both FAQ/PAQ matching and frustration detection. A related project, funded by Syracuse Research Corporation, extended the user tools capability to include a User Interface for the KBB and basic processing technology. Our NASA project has seen several phases. As the project progressed, we added the relevant developed capabilities for improved performance. In the current phase, we are implementing the capabilities that draw on user choice.</Paragraph> </Section> </Paper>