<?xml version="1.0" standalone="yes"?> <Paper uid="W06-3006"> <Title>Answering questions of Information Access Dialogue (IAD) task using ellipsis handling of follow-up questions</Title> <Section position="3" start_page="41" end_page="45" type="metho"> <SectionTitle> 2 Ellipsis handling </SectionTitle> <Paragraph position="0"> In this section, we explain what kinds of ellipsis patterns occur in the follow-up questions of a question series and how we resolve each ellipsis so that the completed question can be passed to the core QA system.</Paragraph> <Section position="1" start_page="41" end_page="43" type="sub_section"> <SectionTitle> 2.1 Ellipsis in questions </SectionTitle> <Paragraph position="0"> We analyzed 319 questions (46 sets) that were used in subtask 3 of QAC1 and QAC2 and classified the ellipsis patterns into three types as follows: Replacing with pronoun: In this pattern, a pronoun is used in a follow-up question, and this pronoun refers to an element or the answer of the previous question.</Paragraph> <Paragraph position="1"> Ex1-1 a0a2a1a4a3a6a5a8a7a10a9a12a11a2a13a15a14a17a16a19a18a12a20a2a21a23a22 (Who is the president of America?) Ex1-2 a24a26a25a28a27a17a29a31a30a33a32a35a34 a7a12a14a12a36a38a37a8a18a38a20a39a21a40a22 (When did it become independent?) In the above example, the pronoun a24a4a25 (it) of question Ex1-2 refers to the word a0a31a1a15a3a6a5 (America) of question Ex1-1. The question Ex1-2 should be</Paragraph> <Paragraph position="3"> (When did America become independent?) in a completed form.</Paragraph> <Paragraph position="4"> Ex2-1 a0a2a1a4a3a6a5a8a7a10a9a12a11a2a13a15a14a17a16a19a18a12a20a2a21a23a22 (Who is the president of America?) Ex2-2 a47 a7a39a48a35a49a12a50a41a14a52a51 a25 a18a12a20a2a21a23a22 (Where is his birth place?) In the above example, the pronoun a47 (his) of question Ex2-2 refers to the answer word a53a55a54a17a56a31a57 (J. Bush) of question Ex2-1. The question Ex2-2 should be a53a40a54a58a56a35a57 a7a23a48a59a49a43a50a26a14a55a51 a25 a18a17a20a38a21a43a22 (Where is J. 
Bush's birth place?) in a completed form.</Paragraph> <Paragraph position="5"> Ellipsis of an obligatory case element of verb: In this pattern, an obligatory case element of the verb in the follow-up question is omitted, and the omitted element refers to an element or the answer of the previous question. An example of this pattern is as follows: Ex3-1 a0a2a1a4a3a6a5a8a7a10a9a12a11a2a13a15a14a17a16a19a18a12a20a2a21a23a22 (Who is the president of America?) Ex3-2 a36a38a37a46a60a31a61 a32a43a62a63a32a28a34 a21a23a22 (When did φ inaugurate?) In the above example, the verb a60a63a61a41a20a4a64 (inaugurate) has two obligatory cases, agent and goal, and the elements of both cases are omitted. The element of agent is the answer of Ex3-1, and the element of goal is a9a19a11a19a13 (the President) of Ex3-1. Therefore, Ex3-2 should be (the answer of Ex3-1) a14a38a36a40a37a46a9a23a11a23a13a15a65a35a60a12a61 a32a28a62a55a32 a21a46a22 (When was (the answer of Ex3-1) inaugurated as the President?).</Paragraph> <Paragraph position="6"> Ellipsis of a modifier or modificand: This pattern covers ellipsis of a modifier or a modificand. When there is a modification relation between two words of a question, one of them (the modifying element or the modified element) may also stand in a modification relation with an element of the next question but be omitted there. We call the modifying element the modifier and the modified element the modificand. The following example shows ellipsis of a modifier.</Paragraph> <Paragraph position="7"> Ex4-1 a0a2a1a4a3a6a5a8a7a10a9a12a11a2a13a15a14a17a16a19a18a12a20a2a21a23a22 (Who is the president of America?) Ex4-2 a66a10a67a2a68a15a69 a14a28a16a19a18a12a20a2a21a23a22 (Who is a minister of state?) In the above example, the word a0a23a1a55a3a58a5 (America) is the modifier of a9a39a11a8a13 (the president) in the question Ex4-1. The word a0a19a1a4a3a70a5 (America) also modifies a66a43a67a2a68a52a69 (a minister of state) of Ex4-2 but is omitted there. 
The question Ex4-2 should be a0a12a1a15a3a58a5a23a7 a66a45a67a38a68a26a69 a14a71a16a8a18a46a20a23a21a72a22 (Who is a minister of state of America?).</Paragraph> <Paragraph position="8"> The following example shows ellipsis of a modificand. Ex5-1 a0a2a1a4a3a6a5a8a7a10a9a12a11a2a13a15a14a17a16a19a18a12a20a2a21a23a22 (Who is the president of America?)</Paragraph> <Paragraph position="10"> Ex5-2 (Who is φ of France?) In this example, the word a9a40a11a46a13 (the president) is the modificand of the word a0a39a1a77a3a59a5 (America) in the question Ex5-1. In the question Ex5-2, the word a73a8a74a55a75a35a76 (France) should modify the word a9 a11a2a13 (the president), which is omitted in the question Ex5-2. Then the question Ex5-2 should be a73 (Who is the president of France?) in a completed form. We now describe the ellipsis resolution method for these three patterns. For the first pattern, we replace the pronoun with the word it refers to. For the second pattern, we try to fill the obligatory cases of the verb. For the third pattern, we take a word from the previous question based on co-occurrence frequency. We assume that the antecedent of an elliptical question exists in the question that appears just before it, so "the previous question" means the immediately preceding question in our method. The process is as follows: Step1 Estimate the pattern of ellipsis: When a follow-up question has a pronoun, it is a case of the first pattern. When a follow-up question has a verb with an omitted case element, it is a case of the second pattern. When a follow-up question has neither a pronoun nor such a verb, it is a case of the third pattern.</Paragraph> <Paragraph position="11"> Step2 Estimate the kind of the omitted word: Step2a When the ellipsis pattern is the first pattern: Estimate the kind of word to which the pronoun refers. When the pronoun directly indicates the kind of word (ex: a47 : he), rely on it. 
If the pronoun does not directly indicate the kind of word (ex: a24 a7 : its + noun), use the kind of the word that appears immediately after the pronoun.</Paragraph> <Paragraph position="12"> Step2b When the ellipsis pattern is the second pattern: Estimate the obligatory case frame of the verb of the follow-up question. Then, estimate the omitted element of the case frame and the type of that element.</Paragraph> <Paragraph position="13"> Step2c When the ellipsis pattern is the third pattern: Get a noun X that appears with the Japanese particle a14 (ha) in the follow-up question.</Paragraph> <Paragraph position="14"> When a compound noun appears with a14 (ha), the last word is assumed to be X. Then, collect words that are modifiers or modificands of X from the corpus. If one of the collected words appears in the previous question, take over that word and skip Step3. Otherwise, estimate the kind of word that is suitable as a modifier (or modificand) of X: estimate the kinds of the collected modifiers and modificands, and adopt the kind with the highest frequency.</Paragraph> <Paragraph position="15"> Step3 Decide which word to take over from the previous question: Estimate the answer type of the previous question and the kind of each word used in the previous question, from rear to front. When a word has a kind that fits the estimate of Step2, take that word over to the follow-up question.</Paragraph> <Paragraph position="16"> We used the thesauruses of the EDR dictionary to estimate the kind of words, the obligatory case frames of verbs, and the omitted elements of case frames, and to collect modifiers and modificands of a word. Details are as follows: Estimation of word type: We used the EDR Japanese Word Dictionary and the EDR Concept Dictionary. The Japanese Word Dictionary records Japanese words and their detailed concepts as Concept Codes, and the Concept Dictionary records each Concept Code and its upper concept. We look up a word in the Japanese Word Dictionary and get its detailed concept code. Then, we generalize the type of the word using the concept codes of the Concept Dictionary. 
For example, the concept code of the word a78a12a79 (company) is 3ce735, which means "a group of people combined together for business or trade". We check its upper concepts using the Concept Dictionary: the upper concept of 3ce735 is 4449f5, the upper concept of 4449f5 is 30f74c, and so on. Finally, we obtain the word type of 3ce735 as 3aa912, which means "agent (self-functioning entity)". Therefore, we can estimate that the type of the word a78a23a79 (company) is an agent.</Paragraph> <Paragraph position="17"> Estimation of obligatory case frame of verb and omitted element: We use the EDR Japanese Cooccurrence Dictionary to estimate omitted case elements. The Japanese Cooccurrence Dictionary contains verb case frames and, for each case, a concept code with a Japanese particle. We check the obligatory case frame and the omitted element as follows. First, we look up the verb in the Japanese Cooccurrence Dictionary and get its case frame, concept codes and particle information. Then we recognize the omitted case element from the particle information and estimate the word type of the omitted element.</Paragraph> <Paragraph position="18"> For example, according to the Japanese Cooccurrence Dictionary, the verb a60a15a61a4a20a80a64 (inaugurate) has two cases, agent (30f6b0) and goal. Because Ex3-2 contains no element that fills either case, we estimate that agent and goal are omitted. Then, we estimate the kind of each omitted element in the same way as in "Estimation of word type".</Paragraph> <Paragraph position="19"> Collection of modifier and modificand</Paragraph> </Section> <Section position="2" start_page="43" end_page="44" type="sub_section"> <SectionTitle> Japanese Cooccurrence Dictionary contains </SectionTitle> <Paragraph position="0"> Japanese co-occurrence data of various modifications. We use the co-occurrence data to collect modifiers or modificands of a word X. Details are as follows: 1. Search the Japanese Cooccurrence Dictionary for the patterns X a7 (no) noun (noun of X) and noun a7 (no) X (X of noun). 2. 
When Y appears in the Y a7 (no) X (X of Y) pattern, we can estimate Y as a modifier of X. 3. When Y appears in the X a7 (no) Y (Y of X) pattern, we can estimate Y as a modificand of X. In the following, we show how the above examples of ellipsis are handled.</Paragraph> <Paragraph position="1"> (When did America become independent?) In the above example, Ex1-2 has the pronoun a24a39a25 (it), so we classified the ellipsis pattern of Ex1-2 as the first pattern. According to the information of the pronoun, a24a55a25 (it) refers to an organization or a location. The word a0a19a1a4a3a59a5 (America) has the information of location, but the word a9a31a11a31a13 (the president) is neither an organization nor a location. We can therefore estimate that the pronoun a24a26a25 (it) of Ex1-2 refers to the word a0a8a1 (America). (When was (answer of Ex3-1) inaugurated?) In the above example, Ex3-2 has the verb a60a39a61a63a20 a64 (inaugurate), so we classified the ellipsis pattern of Ex3-2 as the second pattern. The word a60a19a61a26a20 a64 (inaugurate) has two obligatory cases: agent (human) and goal (managerial position). Ex3-2 has no word that is suitable for the obligatory cases of a60a31a61a26a20a41a64 (inaugurate). Therefore we estimate that the agent and the goal are omitted. Then, we estimate the answer type of Ex3-1 and the kind of each word of Ex3-1. The answer type of Ex3-1 is human, so it is suitable for the agent. (Note: Exm-n' denotes the completed form of question Exm-n.) The kind of a9a39a11a2a13 (the president) is managerial position, so it is suitable for the goal. Finally, we take the answer of Ex3-1 and a9a39a11a2a13 (the president) of Ex3-1 over to Ex3-2. (Who is a minister of state of America?) In the above example, Ex4-2 has neither a pronoun nor a verb, so we classified the ellipsis pattern of Ex4-2 as the third pattern. Then we search for the noun</Paragraph> <Paragraph position="3"> noun (noun of a minister) pattern in the Japanese Cooccurrence Dictionary. In the Japanese Cooccurrence Dictionary, we can find the a0a12a1a41a3a42a5a23a7 a66a45a67a23a68a63a69 (a minister of America) pattern. 
a0a8a1a41a3a6a5 (America) is used in Ex4-1, so we take over a0a26a1a83a3a45a5 (America).</Paragraph> </Section> <Section position="3" start_page="44" end_page="45" type="sub_section"> <SectionTitle> 3.1 Evaluation method </SectionTitle> <Paragraph position="0"> We evaluated our QA system only on ellipsis handling. The following example shows question sets of the Formal Run and the Reference Run. In Qm-n, m and n indicate the series ID and the question number that we assigned, and Rm-n indicates the question that corresponds to Qm-n.</Paragraph> <Paragraph position="1"> In the IAD task, one series of questions consists of the first question and several follow-up questions that contain ellipsis. In our current implementation, we assume that the antecedent of an elliptical question exists in the question just before it. For example, the antecedent of Q1-2 is a84a8a85a108a86a80a87a10a88a15a89a8a88 (Mt. Fuji radar) of Q1-1. The antecedent of Q1-4 is actually a84a31a85 a86a77a87a71a88a39a89a40a88 (Mt. Fuji radar) of Q1-1; however, if Q1-3 is completed correctly (as R1-3), a84a72a85 a86a77a87a71a88a26a89a40a88 (Mt. Fuji radar) exists in Q1-3. Therefore, we prepared evaluation data of 310 question pairs from the QAC test set. One pair consists of a question of the Reference Run and a question of the Formal Run.</Paragraph> <Paragraph position="2"> For example, R1-1 and Q1-2 form one pair of the evaluation data, and R1-3 and Q1-4 form another. We evaluated our method using this data. Correctness was judged by a human. When the system must take over the answer of the previous question, we used &lt;ANS&gt;, which indicates the answer of the previous question.</Paragraph> </Section> <Section position="4" start_page="45" end_page="45" type="sub_section"> <SectionTitle> 3.2 Results </SectionTitle> <Paragraph position="0"> Our system completed 52 of the 310 questions correctly. 28 of the 52 successful cases were handled by the ellipsis handling method proposed in the previous QAC evaluation. 
Our previous approach is based on topic presentation in question sentences.</Paragraph> <Paragraph position="1"> If there is an ellipsis in a question, we use the topic information of the previous question. Topic presentation is detected by the Japanese particle a14 (ha). The other 24 cases were completed by the approach described above. The details are as follows: * Replacing with pronoun: The system classified 88 of the 310 questions as this pattern. All 88 classifications were correct. 12 of the 88 questions were completed correctly. * Ellipsis of an obligatory case element of verb: The system classified 158 of the 310 questions as this pattern. 105 of the 158 classifications were correct.</Paragraph> <Paragraph position="2"> 8 of the 105 questions were completed correctly.</Paragraph> <Paragraph position="3"> * Ellipsis of a modifier or modificand: The system classified 64 of the 310 questions as this pattern. 44 of the 64 classifications were correct. 4 of the 44 questions were completed correctly.</Paragraph> <Paragraph position="4"> Major failure cases and their numbers, which are indicated with dots, are as follows: Failure of classification of ellipsis pattern</Paragraph> </Section> </Section> <Section position="4" start_page="45" end_page="47" type="metho"> <SectionTitle> 4 Discussion </SectionTitle> <Paragraph position="0"> Our system worked well for some elliptical questions, as described in the previous section. In the following, we show some examples and the details of the major failure analysis results.</Paragraph> <Paragraph position="1"> fill up its obligatory cases because every obligatory case of this verb had already been filled. It is necessary to handle delexical verbs such as a36a15a64 , a102a15a64 , a36a108a94 and so on as stop words.</Paragraph> <Paragraph position="2"> In addition, there were several questions in which all obligatory cases of the verb had already been filled. In this case, it is necessary to apply another approach. 
In the example name who attended opening event in the first day?) , some additional information for "opening event" is omitted. Moreover, there were some verbs that had no case information in the EDR dictionary. It would be helpful to check co-occurrence with such a word in the previous question.</Paragraph> <Paragraph position="3"> 2. Morphological analysis failure: The expression a24a15a25 a18 (sokode) in a question sentence was recognized as a single conjunction a24a55a25 a18 (then), although it should have been analyzed as a24a55a25 (soko: there) + a18 (de: at). If the morphological analyzer had analyzed it correctly, our algorithm could have handled the ellipsis correctly.</Paragraph> <Paragraph position="4"> 3. Lack of rules for pronoun: In the expression a25 a7a35a123a63a124 a76a40a125a8a88a12a56a15a126a71a75 (this space station) of a question sentence, no ellipsis handling rule for the pronoun a25 a7 (this) was implemented, so our method could not handle this case. It is necessary to extend our algorithm to this case.</Paragraph> <Paragraph position="5"> In the above example (q1 is the first question and q2 is the follow-up question), the system checks the obligatory case elements of the verb a139a144a143 (write) of question q1. The verb a139a145a143 has three obligatory cases, agent, object and goal, according to the EDR dictionary. The system estimated that every obligatory case element was omitted and checked a127a2a128a26a129a26a130a19a131 (Ms. Sawako Agawa), a132a8a133a17a76a41a134a23a88 (TV caster) and a132a39a133a72a76 a134a23a88 (TV caster), respectively. However, the object case of the verb a139a146a143 was actually a68a39a140a8a141a8a142 (long novel) of question q2. In this question, this element was modified by the verb a139a147a143 (write), so the system failed to recognize that the object was already filled. As a result, our algorithm tried to fill this object case with a132a39a133a17a76a15a134a23a88 (TV caster). In the above example, q3 is the first question and q4 is the follow-up question. The question q4 is replaced with q4' using ellipsis handling. 
In this case, the system took the wrong modifier a112a70a148a63a149a63a150a4a151 (Nikko Toshogu) for a158 a159a41a74a2a159a161a117 (highlight). This is caused by a lack of co-occurrence information in the EDR Japanese Cooccurrence Dictionary, because these words are proper nouns that are not frequently used. To handle such cases, it is necessary to use co-occurrence information obtained from a large corpus. 6. Passive verb expression: In our current implementation, our system has no rule to handle passive verbs. In the case of passive voice, it is necessary to check other case elements for ellipsis handling.</Paragraph> <Paragraph position="6"> 7. Multiple candidates: In the above example, q5 is the first question and q6 is the follow-up question. The question q6 is replaced with q6' using the ellipsis handling rules. The system replaced a47 (his) of q6 with the answer of q5, because a47 (his) refers to a human, the answer type of q5 is human, and the answer of q5 was the nearest word suitable for a47 (his). However, a47 (his) actually referred to a162 a3 a75a41a163 a165a82a167a10a168a23a115 (Mr. Colin Powell). In this case, a162 a3 a75a164a163a166a165a81a167a10a168a23a115 (Mr. Colin Powell) was the topic of q5, so a162 a3 a75a55a163a175a165a77a167a35a168a72a115 (Mr. Colin Powell) would be a better candidate than the answer of q5. Topic information handling should be incorporated into our algorithm.</Paragraph> </Section> <Section position="5" start_page="47" end_page="47" type="metho"> <SectionTitle> 5 Conclusion </SectionTitle> <Paragraph position="0"> In this paper, we have presented an ellipsis handling method for follow-up questions in the IAD task. We classified the ellipsis patterns of question sentences into three types and proposed an ellipsis handling algorithm for each type. In the evaluation using Formal Run and Reference Run data, there were several cases in which our algorithm could not handle the ellipsis correctly. 
According to the analysis of the evaluation results, the main reason for the low performance was a lack of word information for recognizing referential elements. If our system can recognize word meanings correctly, some of these errors will not occur and ellipsis handling will work better.</Paragraph> <Paragraph position="1"> We have already improved our ellipsis handling method with recognition of the target question. In the evaluation of QAC3, our system searches for the elliptical element in the previous question. However, we have not yet tested this new algorithm using the test collection.</Paragraph> <Paragraph position="2"> In future work, we will test this algorithm and apply it to other QA applications.</Paragraph> </Section></Paper>