<?xml version="1.0" standalone="yes"?> <Paper uid="W01-1202"> <Title>MAYA: A Fast Question-answering System Based On A Predictive Answer Indexer*</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Previous Work </SectionTitle> <Paragraph position="0"> Current QA approaches can be classified into two groups: text-snippet extraction systems and noun-phrase extraction systems (also called closed-class QA) (Vicedo and Ferrandex, 2000).</Paragraph> <Paragraph position="1"> The text-snippet extraction approaches are based on locating and extracting the sentences or paragraphs most relevant to the query, on the assumption that this text will probably contain the correct answer. These approaches were the most commonly used by participants in the last TREC QA Track (Ferret et al., 1999; Hull, 1999; Moldovan et al., 1999; Prager et al., 1999; Srihari and Li, 1999). ExtrAns (Berri et al., 1998) is a representative text-snippet extraction QA system.</Paragraph> <Paragraph position="2"> The system locates the phrases in a document from which a user can infer an answer. However, it is difficult to port the system to other domains because it uses syntactic and semantic information that covers only a very limited domain (Vicedo and Ferrandex, 2000).</Paragraph> <Paragraph position="3"> The noun-phrase extraction approaches are based on finding concrete information, mainly noun phrases, requested by users' closed-class questions. A closed-class question is a question stated in natural language that assumes some definite answer typified by a noun phrase rather than a procedural answer. MURAX (Kupiec, 1993) is one of the noun-phrase extraction systems. MURAX uses modules for shallow linguistic analysis: a Part-Of-Speech (POS) tagger and a finite-state recognizer for matching lexico-syntactic patterns. The finite-state recognizer determines the user's expectation and filters out various answer hypotheses. For example, the answers to questions beginning with the word Who are likely to be people's names. Some QA systems participating in the Text REtrieval Conference (TREC) use shallow linguistic knowledge and start from approaches similar to those used in MURAX (Hull, 1999; Vicedo and Ferrandex, 2000). These QA systems use specialized shallow parsers to identify the asking point (who, what, when, where, etc.). However, they have long response times because they apply rules to every sentence containing answer candidates and score each answer at retrieval time.</Paragraph> <Paragraph position="4"> MAYA uses shallow linguistic information such as a POS tagger, a lexico-syntactic parser similar to the finite-state recognizer in MURAX, and a Named Entity (NE) recognizer based on dictionaries. However, MAYA returns answer phrases in a very short time compared with these previous systems because it extracts answer candidates and scores each answer with pre-defined rules at indexing time.</Paragraph> </Section> <Section position="4" start_page="0" end_page="21" type="metho"> <SectionTitle> 3 MAYA Q/A Approach </SectionTitle> <Paragraph position="0"> MAYA has been designed as a separate component that interfaces with a traditional IR system. In other words, it can run without an IR system. It consists of two engines: an indexing engine and a searching engine.</Paragraph> <Paragraph position="1"> The indexing engine first extracts all answer candidates from the collected documents.
For answer extraction, it uses an NE recognizer based on dictionaries and finite-state automata. It then gives scores to the terms that surround each candidate and stores each candidate, together with the surrounding terms and their scores, in an index database (DB). For example, if n surrounding terms affect a candidate, n candidate-term pairs are stored in the DB with n scores. As shown in Figure 1, the indexing engine keeps separate index DBs that are classified into pre-defined semantic categories (i.e., users' asking points or question types).</Paragraph> <Paragraph position="2"> The searching engine identifies a user's asking point and selects the index DB that includes the answer candidates for his/her query.</Paragraph> <Paragraph position="3"> Then, it calculates similarities between the terms of the query and the terms surrounding the candidates. The similarities are based on the p-Norm model (Salton et al., 1983). Next, it ranks the candidates according to the similarities.</Paragraph> <Paragraph position="5"> Figure 2 shows the overall architecture of MAYA combined with a traditional IR system. As shown in Figure 2, the total system has two index DBs. One is for the IR system, which retrieves relevant documents, and the other is for MAYA, which extracts relevant answer phrases.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Predictive Answer Indexing </SectionTitle> <Paragraph position="0"> The answer indexing phase can be separated into two stages: answer finding and term scoring. For answer finding, we classify users' asking points into 14 semantic categories: person, country, address, organization, telephone number, email address, homepage Uniform Resource Locator (URL), the number of people, physical number, the number of abstract things, rate, price, date, and time. We think that these 14 semantic categories are the ones most frequently asked about in general IR systems. To extract answer candidates belonging to each category from documents, the indexing engine uses a POS tagger and an NE recognizer. The NE recognizer makes use of two dictionaries and a pattern matcher. One of the dictionaries, called the PLO dictionary (487,782 entries), contains the names of people, countries, cities, and organizations. The other, called the unit dictionary (430 entries), contains the units of length (e.g. cm, m, km), the units of weight (e.g. mg, g, kg), and others. After looking up the dictionaries, the NE recognizer assigns a semantic category to each answer candidate after disambiguation using POS tagging. For example, the NE recognizer extracts 4 answer candidates annotated with 4 semantic categories from a Korean example sentence (glossed in part as &quot;... the size of the storage for free email service to 6 mega-bytes.&quot;): Yahoo Korea belongs to organization, Jinsup Yeom is a person, www.yahoo.co.kr is a homepage URL, and 6 mega-bytes is a physical number. Complex lexical candidates such as www.yahoo.co.kr are extracted by the pattern matcher. The pattern matcher extracts formed answers such as telephone numbers, email addresses, and homepage URLs. The patterns are described as regular expressions. For example, a homepage URL must satisfy a regular expression of the kind sketched below.</Paragraph>
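<Paragraph> The paper does not reproduce the regular expressions or the dictionary contents, so the following is only a minimal illustrative sketch of a dictionary- and pattern-based candidate finder of the kind described above; the patterns, dictionary entries, and function names are assumptions, not MAYA's actual resources.

import re

# Hypothetical regular expressions for the formed answer categories mentioned
# above (telephone number, email address, homepage URL); MAYA's actual patterns
# are not shown in the paper.
PATTERNS = {
    "homepage URL":     re.compile(r"(https?://)?www\.[\w.-]+\.[a-z]{2,}", re.I),
    "email address":    re.compile(r"[\w.%-]+@[\w.-]+\.[a-z]{2,}", re.I),
    "telephone number": re.compile(r"\d{2,3}-\d{3,4}-\d{4}"),
}

# Toy stand-ins for the PLO dictionary (487,782 entries) and the unit dictionary (430 entries).
PLO_DICT = {"Yahoo Korea": "organization", "Jinsup Yeom": "person"}
UNIT_DICT = {"mega-bytes": "physical number", "km": "physical number"}

def find_candidates(sentence):
    """Return (candidate, semantic category) pairs found in one sentence."""
    candidates = []
    # 1) Formed answers via the pattern matcher.
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(sentence):
            candidates.append((match.group(), category))
    # 2) Dictionary lookup for names of people, countries, cities, and organizations.
    for entry, category in PLO_DICT.items():
        if entry in sentence:
            candidates.append((entry, category))
    # 3) Numbers followed by a known unit.
    for unit, category in UNIT_DICT.items():
        for match in re.finditer(r"\d+(\.\d+)?\s*" + re.escape(unit), sentence):
            candidates.append((match.group(), category))
    return candidates

print(find_candidates("Yahoo Korea (www.yahoo.co.kr, CEO Jinsup Yeom) increased "
                      "the free email storage to 6 mega-bytes."))

A real implementation would, as described above, also disambiguate dictionary hits with POS tags before assigning a category. </Paragraph>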
<Paragraph position="4"> In the next stage, the indexing engine gives scores to the content words that occur with each answer candidate within a context window. The maximum size of the context window is three sentences: the previous sentence, the current sentence, and the next sentence. The window size can be changed dynamically. When the indexing engine decides the window size, it checks whether the neighboring sentences have anaphors or lexical chains.</Paragraph> <Paragraph position="6"> If both the previous and the next sentences have anaphors or lexical chains of the current sentence, the window size is 3. If the next sentence has anaphors or lexical chains of the current sentence and the current sentence does not have anaphors or lexical chains of the previous sentence, the indexing engine sets the window size to 2. If neither neighboring sentence has anaphors or lexical chains, the window size is 1. Figure 3 shows an example in which the window size is adjusted.</Paragraph> <Paragraph position="7"> The scores of the content words indicate how strongly each content word influences an answer candidate. For example, when www.yahoo.co.kr is an answer candidate in the above sample sentence, a content word such as Yahoo Korea is a strong clue to www.yahoo.co.kr. We call this score a term score. The indexing engine assigns term scores to content words according to the five scoring features described below.</Paragraph> <Paragraph position="10"> POS: the part-of-speech of a content word. The indexing engine gives 2 points to each content word annotated with a proper-noun tag and 1 point to each content word annotated with another tag such as a common noun or a number. For example, Yahoo Korea obtains 2 points, and service obtains 1 point.</Paragraph> <Paragraph position="11"> Grammatical Role: the subcategorized functions of the main verb in a sentence. The indexing engine gives 4 points to a topic word, 3 points to a subject, 2 points to an object, and 1 point to the rest. The grammatical roles can be decided by case markers such as un/nun, i/ga, and ul/lul, since Korean is a language with well-developed morphemic markers. For example, Yahoo Korea obtains 3 points because it is a subject, and service obtains 2 points because it is an object in the above sample sentence.</Paragraph> <Paragraph position="13"> Lexical Chain: re-occurring words in adjacent sentences. The indexing engine gives 2 points to each word that forms a lexical chain and 1 point to others. For example, if the next sentence of the above sample sentence is the one glossed in part as &quot;... of the service can use the free storage of 6 mega-bytes for email.&quot;, service obtains 2 points.</Paragraph> <Paragraph position="16"> Distance: the distance between the sentence containing a target content word and the sentence containing an answer candidate. The indexing engine gives 2 points to each content word in the sentence that includes the answer candidate and 1 point to others. For example, the content words of the above sample sentence each obtain 2 points because they are in the sentence that includes the answer candidate, www.yahoo.co.kr.</Paragraph> <Paragraph position="20"> Apposition: the IS-A relation between a content word and an answer candidate. The indexing engine extracts appositive terms using syntactic information such as explicit IS-A relations, pre-modification, and post-modification. For example, Yahoo Korea is in a pre-modification relation with www.yahoo.co.kr in the above sample sentence. The indexing engine gives 2 points to each appositive word and 1 point to others.</Paragraph>
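<Paragraph> The five features above jointly determine a term score for every content word in the window; Equations 1 and 2 below give the exact combination. Purely as an illustration, the sketch below computes such a score. The feature predicates, the mapping of the weighting factors A-E to the features in their listed order, and the weight values themselves are assumptions; the paper states only the preference order E > C > B > A > D.

import math

# Assumed weights respecting E > C > B > A > D, with (A, B, C, D, E) taken to be
# (POS, grammatical role, lexical chain, distance, apposition) in the listed order.
WEIGHTS = {"pos": 1.0, "role": 2.0, "chain": 3.0, "distance": 0.5, "apposition": 4.0}

def feature_points(term):
    """Raw feature points for one content word; term is a dict of precomputed flags."""
    return {
        "pos": 2 if term["is_proper_noun"] else 1,
        "role": {"topic": 4, "subject": 3, "object": 2}.get(term["role"], 1),
        "chain": 2 if term["in_lexical_chain"] else 1,
        "distance": 2 if term["same_sentence_as_candidate"] else 1,
        "apposition": 2 if term["is_appositive"] else 1,
    }

def term_score(term):
    """Weighted sum of the five feature points (Equation 1)."""
    points = feature_points(term)
    return sum(WEIGHTS[name] * value for name, value in points.items())

def normalized_score(ts, max_ts_in_window, n_candidates_with_term, n_candidates_in_category):
    """TF-IDF-style normalization in the spirit of Equation 2."""
    return (ts / max_ts_in_window) * math.log(n_candidates_in_category / n_candidates_with_term)
</Paragraph>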
<Paragraph position="21"> The indexing engine adds up the weighted scores of the five features, as shown in Equation 1: ts_i = A·f_i1 + B·f_i2 + C·f_i3 + D·f_i4 + E·f_i5 (1). Here ts_i is the term score of the ith term, and f_ij is the score of the jth feature for the ith term. A, B, C, D, and E are weighting factors that rank the five features according to preference; the indexing engine uses the ranking order E > C > B > A > D. The weighted term scores are normalized, as shown in Equation 2: ts'_ij = (ts_ij / Max_ts_j) · log(N / n) (2). Equation 2 is similar to the TF·IDF equation (Fox, 1983). In Equation 2, ts_ij is the term score of the ith term in the context window that is relevant to the jth answer candidate, and Max_ts_j is the maximum value among the term scores in that context window. n is the number of answer candidates that are affected by the ith term, and N is the number of answer candidates of the same semantic category. The indexing engine saves the normalized term scores, together with the position information of the relevant answer candidate, in the DB. The position information includes a document number and the distance between the beginning of the document and the answer candidate. As a result, the indexing engine creates 14 DBs that correspond to the 14 semantic categories. We call them answer DBs.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Lexico-syntactic Query Processing </SectionTitle> <Paragraph position="0"> In the query processing stage, the searching engine takes a user's question and converts it into a suitable form, using a semantic dictionary called the query dictionary. The query dictionary contains the semantic markers of words, and query words are converted into semantic markers before pattern matching. For example, the Korean query glossed as &quot;Who is the CEO of Yahoo Korea?&quot; is translated into a sequence of semantic markers, lexical forms, and POS tags glossed as &quot;%who auxiliary-verb %person preposition Yahoo Korea symbol&quot;; %person and %who are the semantic markers. Content words that are not in the query dictionary keep their lexical forms, and functional words (e.g. auxiliary verbs, prepositions) are converted into their POS tags. After the conversion, the searching engine matches the converted query against one of 88 lexico-syntactic patterns and classifies the query into one of the 14 semantic categories. When two or more patterns match the query, the searching engine returns the first matched category.</Paragraph> <Paragraph position="1"> Figure 4 shows lexico-syntactic patterns for the person category; the above sample query matches the first pattern in Figure 4.</Paragraph>
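<Paragraph> The 88 lexico-syntactic patterns and the query dictionary are not listed in the paper, so the sketch below only illustrates the general mechanism described above (dictionary-driven conversion to semantic markers followed by ordered pattern matching) with made-up markers and patterns; the conversion of functional words to POS tags is omitted for brevity.

import re

# Hypothetical query dictionary: surface word (or its base form) to semantic marker.
QUERY_DICT = {"who": "%who", "CEO": "%person", "president": "%person", "year": "%date"}

# Hypothetical lexico-syntactic patterns, tried in order; the first match wins,
# mirroring the behaviour described above when several patterns match.
PATTERNS = [
    (re.compile(r"%who\b.*%person\b"), "person"),
    (re.compile(r"%person\b.*%who\b"), "person"),
    (re.compile(r"%date\b"), "date"),
]

def convert(query_tokens):
    """Replace dictionary words with semantic markers; other content words keep their forms."""
    return " ".join(QUERY_DICT.get(token, token) for token in query_tokens)

def classify(query_tokens):
    converted = convert(query_tokens)
    for pattern, category in PATTERNS:
        if pattern.search(converted):
            return category
    return None

print(classify(["who", "is", "the", "CEO", "of", "Yahoo", "Korea"]))  # person
</Paragraph>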
<Paragraph position="2"> After classifying the query into a semantic category, the searching engine calculates the term scores of the content words in the query.</Paragraph> <Paragraph position="3"> As shown in Rule 1, the term scores are computed by heuristic rules, and they range between 0 and 1. Using the heuristic rules, the searching engine gives high scores to the content words that carry the user's intention. For example, when a user inputs the query &quot;In what year was Yahoo founded?&quot;, he/she wants to know only the year, rather than the founder or the URL of Yahoo. So the QA searching engine gives a higher score to year than to Yahoo, in contrast to an ordinary IR searching engine.</Paragraph> <Paragraph position="4"> 1. The last content word in a sentence receives a high score. For example, CEO in &quot;The CEO of Yahoo?&quot; receives a high score.</Paragraph> <Paragraph position="5"> 2. The content words that immediately follow specific interrogatives such as which and what receive high scores. For example, mountain in &quot;Which mountain is the highest?&quot; receives a high score.</Paragraph> <Paragraph position="6"> 3. The content words that follow specific prepositions such as about receive low scores, and the content words that precede them receive high scores. For example, the score of article in &quot;the article about China&quot; is lower than that of China.</Paragraph> <Paragraph position="7"> Rule 1. Heuristic rules for scoring query terms</Paragraph> </Section> <Section position="3" start_page="0" end_page="21" type="sub_section"> <SectionTitle> 3.3 Answer Scoring and Ranking </SectionTitle> <Paragraph position="0"> The searching engine calculates the similarities between the query and the answer candidates, and ranks the candidates according to the similarities. To compute the similarities, it uses the AND operation of the well-known p-Norm model (Salton et al., 1983), as shown in Equation 3: sim_AND(A, Q) = 1 - ( (q_1^p (1 - a_1)^p + q_2^p (1 - a_2)^p + ... + q_n^p (1 - a_n)^p) / (q_1^p + q_2^p + ... + q_n^p) )^(1/p) (3). In Equation 3, A is an answer candidate, and a_i is the score of the ith term in the context window of the answer candidate; a_i is stored in the answer DB. q_i is the score of the ith term in the query, and p is the P-value of the p-Norm model.</Paragraph> <Paragraph position="3"> The answer scoring and ranking phase takes a relatively short time because the indexing engine has already calculated the scores of the terms that affect each answer candidate. In other words, the searching engine simply accumulates the weights of co-occurring terms, as in Equation 3, and then ranks the answer candidates according to the similarities. The method for answer scoring is similar to the method for document scoring in traditional IR engines; however, MAYA differs in that it indexes, retrieves, and ranks answer candidates rather than documents.</Paragraph>
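<Paragraph> As a concrete illustration of the scoring step, the sketch below evaluates the p-Norm AND similarity of Equation 3 for one answer candidate, assuming the per-term scores a_i have already been looked up in the answer DB; the function name, the choice p = 2, and the example values are illustrative only.

def pnorm_and_similarity(query_scores, answer_scores, p=2.0):
    """p-Norm AND similarity (Salton et al., 1983) between a query and one candidate.

    query_scores[i]  is the score q_i of the ith query term (from the heuristic rules);
    answer_scores[i] is the score a_i of the same term around the candidate, taken from
    the answer DB (0.0 if the term does not occur in the candidate's context window).
    """
    numerator = sum((q ** p) * ((1.0 - a) ** p) for q, a in zip(query_scores, answer_scores))
    denominator = sum(q ** p for q in query_scores)
    return 1.0 - (numerator / denominator) ** (1.0 / p)

# Toy example: two query terms; the candidate's window strongly supports the first one.
print(pnorm_and_similarity([1.0, 0.6], [0.9, 0.2]))
</Paragraph>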
<Paragraph position="4"> We can easily combine MAYA with a traditional IR system because MAYA has been designed as a separate component that interfaces with the IR system. We implemented an IR system based on TF·IDF weights and the p-Norm model (Lee et al., 1999).</Paragraph> <Paragraph position="5"> To improve the precision rate of the IR system, we combine MAYA with it.</Paragraph> <Paragraph position="6"> The total system merges the outputs of MAYA with the outputs of the IR system. MAYA can produce multiple similarity values per document if two or more answer candidates occur within a document, whereas the IR system produces one similarity value per document. Therefore, the total system adds the similarity value of the IR system to the maximum similarity value of MAYA, as shown in Equation 4.</Paragraph> <Paragraph position="7"> Sim(D, Q) = α · IRsim(D, Q) + β · max_i QAsim_d(A_i, Q) (4). In Equation 4, QAsim_d(A_i, Q) is the similarity value between query Q and the ith answer candidate A_i in document d, and IRsim(D, Q) is the similarity value between query Q and document D. α and β are weighting factors; we set α and β to 0.3 and 0.7, respectively.</Paragraph> <Paragraph position="8"> The total system ranks the retrieved documents by the combined similarity values and shows the sentences that include answer candidates in those documents.</Paragraph> </Section> </Section> </Paper>