<?xml version="1.0" standalone="yes"?> <Paper uid="N03-1007"> <Title>Performance QA System Using Lexico-Semantic Pattern Matching and Shallow NLP&quot;, Proceedings of</Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 Clarification dialogues in Question </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Answering Question Answering Systems aim to determine an </SectionTitle> <Paragraph position="0"> answer to a question by searching for a response in a collection of documents (see Voorhees 2002 for an overview of current systems). In order to achieve this (see for example Harabagiu et al. 2002), systems narrow down the search by using information retrieval techniques to select a subset of documents, or paragraphs within documents, containing keywords from the question and a concept which corresponds to the correct question type (e.g. a question starting with the word &quot;Who?&quot; would require an answer containing a person). The exact answer sentence is then sought by either attempting to unify the answer semantically with the question, through some kind of logical transformation (e.g. Moldovan and Rus 2001) or by some form of pattern matching (e.g. Soubbotin 2002; Harabagiu et al. 1999).</Paragraph> <Paragraph position="1"> Often, though, a single question is not enough to meet user's goals and an elaboration or clarification dialogue is required, i.e. a dialogue with the user which would enable the answering system to refine its understanding of the questioner's needs (for reasons of space we shall not investigate here the difference between elaboration dialogues, clarification dialogues and coherent topical subdialogues and we shall hence refer to this type of dialogue simply as &quot;clarification dialogue&quot;, noting that this may not be entirely satisfactory from a theoretical linguistic point of view). While a number of researchers have looked at clarification dialogue from a theoretical point of view (e.g. Ginzburg 1998; Ginzburg and Sag 2000; van Beek at al. 1993), or from the point of view of task oriented dialogue within a narrow domain (e.g.</Paragraph> <Paragraph position="2"> Ardissono and Sestero 1996), we are not aware of any work on clarification dialogue for open domain question answering systems such as the ones presented at the TREC workshops, apart from the experiments carried out for the (subsequently abandoned) &quot;context&quot; task in the TREC-10 QA workshop (Voorhees 2002; Harabagiu et al. 2002). Here we seek to partially address this problem by looking at some particular aspect of clarification dialogues in the context of open domain question answering. In particular, we examine the problem of recognizing that a clarification dialogue is occurring, i.e. how to recognize that the current question under consideration is part of a previous series (i.e.</Paragraph> <Paragraph position="3"> clarifying previous questions) or the start of a new series; we then show how the recognition that a clarification dialogue is occurring can simplify the problem of answer retrieval.</Paragraph> </Section> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 The TREC Context Experiments </SectionTitle> <Paragraph position="0"> The TREC-2001 QA track included a &quot;context&quot; task which aimed at testing systems' ability to track context through a series of questions (Voorhees 2002). 
In other words, systems were required to respond correctly to a kind of clarification dialogue in which a full understanding of a question depended on an understanding of previous questions. In order to test the ability to answer such questions correctly, a total of 42 questions were prepared by NIST staff, divided into 10 series of related questions which therefore constituted a type of clarification dialogue; the series varied in length between 3 and 8 questions, with an average of 4 questions per dialogue. These clarification dialogues were, however, presented to the question answering systems already classified, and hence systems did not need to recognize that clarification was actually taking place. Consequently, systems that simply looked for an answer in the subset of documents retrieved for the first question in a series performed well without any understanding of the fact that the questions constituted a coherent series.</Paragraph>
<Paragraph position="1"> In a more realistic approach, systems would not be informed in advance of the start and end of a series of clarification questions and would not be able to use this information to limit the subset of documents in which an answer is to be sought.</Paragraph>
<Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3 Analysis of the TREC context questions </SectionTitle>
<Paragraph position="0"> We manually analysed the TREC context question collection in order to establish which features could be used to determine the start and end of a question series, with the following conclusions:
* Pronouns and possessive adjectives: questions such as "When was it born?", which followed "What was the first transgenic mammal?", referred to some previously mentioned object through a pronoun ("it"). The use of personal pronouns ("he", "it", ...) and possessive adjectives ("his", "her", ...) which did not have any referent in the question under consideration was therefore considered an indication of a clarification question.</Paragraph>
<Paragraph position="1"> * Absence of verbs: questions such as "On what body of water?" clearly referred to some previous question or answer.</Paragraph>
<Paragraph position="2"> * Repetition of proper nouns: the question series starting with "What type of vessel was the modern Varyag?" had a follow-up question "How long was the Varyag?", where the repetition of the proper noun indicates that the same subject matter is under investigation.</Paragraph>
<Paragraph position="3"> * Importance of semantic relations: the first question series started with the question "Which museum in Florence was damaged by a major bomb explosion?"; follow-up questions included "How many people were killed?" and "How much explosive was used?", where there is a clear semantic relation between the "explosion" of the initial question and the "killing" and "explosive" of the following questions. Questions belonging to a series were "about" the same subject, and this aboutness could be seen in the use of semantically related words.</Paragraph>
</Section> </Section>
<Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Experiments in Clarification Dialogue Recognition </SectionTitle>
<Paragraph position="0"> It was therefore speculated that an algorithm which made use of these features would successfully recognize the occurrence of clarification dialogue.</Paragraph>
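As a rough illustration of how such features might be extracted in practice, the Python fragment below uses an off-the-shelf part-of-speech tagger. It is a sketch under our own assumptions (NLTK's tokenizer and Penn Treebank tags, fixed pronoun lists), not a description of the authors' implementation; in particular, checking that a pronoun truly lacks a referent within the question would require anaphora resolution, which is simplified here to mere pronoun presence.

```python
# Illustrative sketch only: extracting the surface features discussed in the
# analysis above (pronouns/possessives, absence of verbs, proper nouns).
# The tagger, tag set and word lists are assumptions, not the paper's tools.

import nltk  # requires the standard NLTK tokenizer and POS-tagger models

PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}
POSSESSIVES = {"his", "her", "hers", "its", "their", "theirs"}

def question_features(question: str) -> dict:
    tokens = nltk.word_tokenize(question)
    tagged = nltk.pos_tag(tokens)            # Penn Treebank tags
    lowered = [w.lower() for w in tokens]
    return {
        # presence of a pronoun/possessive stands in for "no in-question referent"
        "anaphoric": any(w in PRONOUNS | POSSESSIVES for w in lowered),
        "has_verb": any(tag.startswith("VB") for _, tag in tagged),
        "proper_nouns": {w for w, tag in tagged if tag in ("NNP", "NNPS")},
    }

# e.g. question_features("When was it born?") flags an anaphoric pronoun,
# while question_features("On what body of water?") finds no verb.
```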
<Paragraph position="1"> Given that the only available data was the collection of "context" questions used in TREC-10, it was felt necessary to collect further data in order to test our algorithm rigorously. This was necessary both because of the small number of questions in the TREC data and because there was no guarantee that an algorithm built for this dataset would perform well on "real" user questions. A collection of 253 questions was therefore put together by asking potential users to seek information on a particular topic by asking a prototype question answering system a series of questions, with "cue" questions derived from the TREC question collection given as starting points for the dialogues.</Paragraph>
<Paragraph position="2"> These questions made up 24 clarification dialogues, varying in length from 3 to 23 questions, with an average length of 12 questions (the data is available from the main author upon request).</Paragraph>
<Paragraph position="3"> The differences between the TREC "context" collection and the new collection are summarized in the following table:
Collection   Groups   Qs    Av. len   Max   Min
TREC         10       41    4         8     4
New          24       253   12        23    3
The questions were recorded and manually tagged to mark the occurrence of clarification dialogue.</Paragraph>
<Paragraph position="4"> The questions thus collected were then fed into a system implementing the algorithm, with no indication as to where a clarification dialogue occurred. The system then attempted to recognize the occurrence of a clarification dialogue. Finally, the results given by the system were compared to the manually assigned clarification dialogue tags. In particular, the algorithm was evaluated for its capacity to:
* recognize a new series of questions, i.e. to tell that the current question is not a clarification of any previous question (indicated by New in the results table);
* recognize that the current question is clarifying a previous question (indicated by Clarification in the table).</Paragraph>
</Section>
<Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Clarification Recognition Algorithm </SectionTitle>
<Paragraph position="0"> Our approach to clarification dialogue recognition looks at certain features of the question currently under consideration (e.g. pronouns and proper nouns) and compares the meaning of the current question with the meanings of previous questions to determine whether they are "about" the same matter.</Paragraph>
<Paragraph position="1"> Given a question q0 and n previously asked questions q-1..q-n, we define a function Clarification_Question which is true if a question is considered a clarification of a previously asked question. In the light of empirical work such as (Ginzburg 1998), which indicates that questioners do not usually refer back to questions which are very distant, we only considered the 10 most recent previous questions.</Paragraph>
<Paragraph position="2"> A question is deemed to be a clarification of a previous question if:
1. There are direct references to nouns mentioned in the previous n questions through the use of pronouns (he, she, it, ...) or possessive adjectives (his, her, its, ...) which have no referent in the current question; or
2. The question does not contain any verbs; or
3. There are explicit references to proper and common nouns mentioned in the previous n questions, i.e.
repetitions which refer to an identical object; or there is a strong sentence similarity between the current question and the previously asked questions.</Paragraph>
<Paragraph position="4"> Formally, Clarification_Question(q0) is true if:
1. q0 has pronoun or possessive adjective references to q-1..q-n; or
2. q0 does not contain any verbs; or
3. q0 has repetition of common or proper nouns in q-1..q-n, or q0 has a strong semantic similarity to some q ∈ {q-1, ..., q-n}.</Paragraph>
</Section>
<Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 6 Sentence Similarity Metric </SectionTitle>
<Paragraph position="0"> A major part of our clarification dialogue recognition algorithm is the sentence similarity metric, which looks at the similarity in meaning between the current question and previous questions. WordNet (Miller 1999; Fellbaum 1998), a lexical database which organizes words into synsets (sets of synonymous words) and specifies a number of relationships, such as hypernymy, synonymy and meronymy, which can exist between the synsets in the lexicon, has been shown to be fruitful in the calculation of semantic similarity. One approach has been to determine similarity by calculating the length of the path of relations connecting the words which constitute sentences (see for example Green 1997 and Hirst and St-Onge 1998); different approaches have been proposed (for an evaluation see Budanitsky and Hirst 2001), either using all WordNet relations (Budanitsky and Hirst 2001) or only is-a relations (Resnik 1995; Jiang and Conrath 1997; Mihalcea and Moldovan 1999). Miller (1999), Harabagiu et al. (2002) and De Boni and Manandhar (2002) found WordNet glosses, considered as micro-contexts, to be useful in determining conceptual similarity. Lee et al. (2002) have applied conceptual similarity to the Question Answering task, giving an answer A a score dependent on the number of matching terms in A and the question.</Paragraph>
<Paragraph position="1"> Our sentence similarity measure followed on from these ideas, adding part-of-speech information, compound noun information and word frequency information to the use of WordNet relations.</Paragraph>
<Paragraph position="2"> In particular, sentence similarity was considered as a function which takes as arguments a sentence s1 and a second sentence s2 and returns a value representing the semantic relevance of s1 in respect of s2 in the context of knowledge B, i.e. relevance(s1, s2, B), where relevance(s1, s, B) < relevance(s2, s, B) represents the fact that s1 is less relevant than s2 in respect to the sentence s and the context B. In our experiments, B was taken to be the set of semantic relations given by WordNet. Clearly, the use of a different knowledge base would give different results, depending on its completeness and correctness.</Paragraph>
<Paragraph position="5"> In order to calculate the semantic similarity between a sentence s1 and another sentence s2, s1 and s2 were considered as sets P and Q of word stems. The similarity between each word in the question and each word in the answer was then calculated, and the sum of the closest matches gave the overall similarity. In other words, given two sets Q and P, where Q = {qw1, qw2, ..., qwn} and P = {pw1, pw2, ..., pwm}, the similarity between Q and P is given by
sim(Q, P) = Σ_{p=1..n} max_m similarity(qwp, pwm)
The function similarity(w1, w2) maps the stems of the two words w1 and w2 to a value representing how semantically related the two words are; similarity(wi, wj) < similarity(wi, wk) represents the fact that the word wj is less semantically related than wk in respect to the word wi.</Paragraph>
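To make the decision procedure concrete, the following sketch combines the three conditions above with the sentence-level aggregation just defined (the sum of the best word-level matches). The question representation, the function names and the similarity threshold are assumptions made for illustration and are not taken from the paper's implementation; the word-level similarity function, whose WordNet-based weighting is described below, is simply passed in as a parameter.

```python
# Sketch of Clarification_Question and of sim(Q, P) as the sum, over each stem
# of Q, of the closest match found in P. Names, the pre-computed question
# representation and the threshold are illustrative assumptions.

from typing import Callable, List

WINDOW = 10  # only the 10 most recent previous questions are inspected

def sentence_similarity(q_stems: List[str], p_stems: List[str],
                        word_similarity: Callable[[str, str], float]) -> float:
    """Sum, over each stem of Q, of the closest match found in P."""
    return sum(max((word_similarity(qw, pw) for pw in p_stems), default=0.0)
               for qw in q_stems)

def clarification_question(current: dict, history: List[dict],
                           word_similarity: Callable[[str, str], float],
                           threshold: float = 2.0) -> bool:
    """True if `current` is judged to clarify one of the recent questions.

    Each question is assumed to be pre-processed into a dict with:
      'stems'      - content-word stems (stop-words removed)
      'nouns'      - set of common and proper nouns
      'has_verb'   - whether the question contains a verb
      'unresolved' - pronouns/possessive adjectives with no referent
                     inside the question itself
    """
    if not history:                      # first question: start of a new series
        return False
    recent = history[-WINDOW:]

    if current["unresolved"]:            # condition 1: dangling anaphora
        return True
    if not current["has_verb"]:          # condition 2: no verbs at all
        return True
    for prev in recent:                  # condition 3: repetition or similarity
        if current["nouns"] & prev["nouns"]:
            return True
        if sentence_similarity(current["stems"], prev["stems"],
                               word_similarity) >= threshold:
            return True
    return False
```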
<Paragraph position="6"> In particular, similarity = 0 if two words are not at all semantically related, and similarity = 1 if the words are the same: similarity(w1, w2) = h, where 0 ≤ h ≤ 1. More precisely, similarity(w1, w2) = 0 if w1 ∈ ST or w2 ∈ ST, where ST is a set containing a number of stop-words (e.g. "the", "a", "to") which are too common to be usefully employed in estimating semantic similarity. In all other cases, h is calculated as follows: the words w1 and w2 are compared using all the available WordNet relationships (is-a, satellite, similar, pertains, meronym, entails, etc.), with an additional relationship, "same-as", which indicates that two words are identical. Each relationship is given a weighting indicating how closely related two words are, with a "same-as" relationship indicating the closest relationship, followed by synonym relationships, hypernym, hyponym, then satellite, meronym, pertains and entails.</Paragraph>
<Paragraph position="7"> So, for example, given the question "Who went to the mountains yesterday?" and the second question "Did Fred walk to the big mountain and then to mount Pleasant?", Q would be the set {who, go, to, the, mountain, yesterday} and P would be the set {did, Fred, walk, to, the, big, mountain, and, then, to, mount, Pleasant}.</Paragraph>
<Paragraph position="8"> In order to calculate similarity the algorithm would consider each word in turn. "Who" would be ignored, as it is a common word and hence part of the list of stop-words. "Go" would be related to "walk" by an is-a relationship and receive a score h1. "To" and "the" would be found in the list of stop-words and ignored.</Paragraph>
<Paragraph position="9"> "Mountain" would be considered most similar to "mountain" (a same-as relationship) and receive a score h2; "mount" would be in a synonym relationship with "mountain" and give a lower score, so it is ignored. "Yesterday" would receive a score of 0, as there are no semantically related words in P. The similarity measure of Q in respect to P would therefore be given by h1 + h2.</Paragraph>
<Paragraph position="10"> In order to improve the performance of the similarity measure, additional information was considered in addition to simple word matching (see De Boni and Manandhar 2003 for a complete discussion):
* Compound noun information. The motivation behind this is similar to the reason for using chunking information, i.e. the fact that the word "United" in "United States" should not be considered similar to "United" as in "Manchester United". As opposed to chunking, however, when using compound noun information the compound is considered a single word rather than a group of words: chunking and compound noun information may therefore be combined, as in "[the [United States] official team]".</Paragraph>
<Paragraph position="11"> * Proper noun information. The intuition behind this is that titles (of books, films, etc.)
should not be confused with the "normal" use of the same words: "blue lagoon" as in the sentence "the film Blue Lagoon was rather strange" should not be considered as similar to the same words in the sentence "they swam in the blue lagoon" as it is to the same words in the sentence "I enjoyed Blue Lagoon when I was younger".</Paragraph>
<Paragraph position="12"> * Word frequency information. This is a step beyond the use of stop-words, following the intuition that the more common a word is, the less useful it is in determining similarity between sentences. So, given the sentences "metatheoretical reasoning is common in philosophy" and "metatheoretical arguments are common in philosophy", the word "metatheoretical" should be considered more important in determining relevance than the words "common", "philosophy" and "is", as it is much rarer and therefore less likely to be found in irrelevant sentences. Given that the questions examined were generic queries which did not necessarily refer to a specific set of documents, the word frequency for individual words was taken to be the frequency given in the British National Corpus (see BNCFreq 2003). The top 100 words, making up 43% of the English language, were then used as stop-words and were not used in calculating semantic similarity.</Paragraph>
</Section> </Paper>
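For completeness, here is a possible realisation of the word-level similarity function assumed in the sketches above, written against NLTK's WordNet interface. Only the ordering of the relations (same-as closest, then synonymy, then is-a, then the weaker satellite, meronym and entailment relations) is taken from Section 6; the numeric weights and the abbreviated stop-word list are invented for illustration, whereas the paper derives its stop-words from the 100 most frequent words of the British National Corpus.

```python
# Word-level similarity sketch. The numeric weights and the short stop-word
# list are assumptions; only the ordering of relations follows the text.

from nltk.corpus import wordnet as wn  # requires the NLTK WordNet data

STOP_WORDS = {"the", "a", "an", "to", "of", "in", "on", "is", "was", "and",
              "did", "then", "who", "what", "when", "how"}

def word_similarity(w1: str, w2: str) -> float:
    """Return h with 0 <= h <= 1; 0 for stop-words or unrelated words."""
    w1, w2 = w1.lower(), w2.lower()
    if w1 in STOP_WORDS or w2 in STOP_WORDS:
        return 0.0
    if w1 == w2:
        return 1.0                        # "same-as": identical words
    syns1, syns2 = wn.synsets(w1), wn.synsets(w2)
    if not syns1 or not syns2:
        return 0.0
    if set(syns1) & set(syns2):
        return 0.9                        # synonyms: the words share a synset
    # is-a (hypernym/hyponym) between any pair of senses takes precedence
    # over the weaker relations
    for s1 in syns1:
        for s2 in syns2:
            if s2 in s1.hypernyms() or s1 in s2.hypernyms():
                return 0.7
    for s1 in syns1:
        for s2 in syns2:
            weaker = (s1.similar_tos() + s1.part_meronyms()
                      + s1.member_meronyms() + s1.entailments())
            if s2 in weaker:
                return 0.5                # satellite / meronym / entailment
    return 0.0

# With this weighting, "mountain" vs "mountain" scores 1.0 (same-as), and
# "go" vs "walk" is typically related through an is-a link, mirroring the
# worked example in Section 6.
```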