<?xml version="1.0" standalone="yes"?> <Paper uid="J97-1006"> <Title>Smith and Gordon Human-Computer Dialogue</Title> <Section position="9" start_page="155" end_page="165" type="evalu"> <SectionTitle> 6. Results </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="155" end_page="155" type="sub_section"> <SectionTitle> 6.1 Data Inclusion and Statistical Analysis </SectionTitle> <Paragraph position="0"> Subjects attempted a total of 141 dialogues, of which 118 or 84% were completed successfully. 9 The average speech rate by subjects was 2.9 sentences per minute, and the average task completion time for successful dialogues was 6.5 minutes. The system had an average response time of 8.1 seconds during the formal experiment. Later, a faster parsing algorithm was implemented and the system was ported to a SPARC II workstation from the Sun 4 used during the experiment. During test dialogues using the enhanced system, average response time was 2.2 seconds.</Paragraph> <Paragraph position="1"> In general, differences in user behavior depending on the level of computer initiative were observed. When the computer operated in declarative mode--yielding the initiative to human users, who could then take advantage of their acquired expertise--the dialogues: * were completed faster (4.5 minutes versus 8.5 minutes).</Paragraph> <Paragraph position="2"> * had fewer user-utterances per dialogue (10.7 versus 27.6).</Paragraph> <Paragraph position="3"> * had users speaking longer utterances (63% of the user-utterances were multiword versus 40% in directive mode).</Paragraph> <Paragraph position="4"> While users given the initiative in the final session were somewhat more efficient at completing the dialogues than users given the initiative in the second session (completing dialogues approximately 1.5 minutes faster and speaking on average 2.7 fewer utterances), the large standard deviations, which ranged from 50% to 90% of the associated sample means, and the small number of subjects tested indicate that we should use caution in generalizing from our results.</Paragraph> <Paragraph position="5"> Unless explicitly noted, the results on human subjects' linguistic behavior that will be reported throughout this section are based only on the 118 dialogues that were successfully completed. While the 23 incomplete dialogues also contain interesting phenomena, we chose to focus the analysis on the completed dialogues, as they represent the linguistic record of successful interactions with the system. In reality, there are only slight differences in the results when the unsuccessful dialogues are included. Furthermore, a valid statistical analysis could only be performed on the completed dialogues. Reporting data values from only the successful dialogues maintains consistency with the reported statistical values.</Paragraph> </Section> <Section position="2" start_page="155" end_page="158" type="sub_section"> <SectionTitle> 6.2 Utterance Classification into Subdialogues </SectionTitle> <Paragraph position="0"> 6.2.1 Hypotheses. For users to take the initiative in the task domain, they must have some expertise in the domain. Once this expertise is gained, and the computer yields task control to the human user, it is expected that users will exploit the situation to restrict the dialogue to specific issues of interest.
Presumably, such users have substantial knowledge about the general behavior of the circuit, how to determine when 9 Due to time constraints, not all subjects were able to attempt all possible dialogues. Only three of the eight subjects successfully completed all possible dialogues. Of the 23 dialogues not completed, 22 were terminated prematurely due to excessive time being spent on the dialogue. Misunderstandings due to misrecognition were the cause in 13 of these failures. Misunderstandings due to inadequate grammar coverage occurred in 3 of the failures. In 4 of the failures, the subject misconnected a wire. In one failure there was confusion by the subject about when the circuit was working, and in another failure there were problems with the system software. A hardware failure caused termination of the final dialogue. it is working, and the basic nature of repairs, but will need some assistance with diagnosing specific problems. Consequently, we would expect the following differences between modes for users who are able to take the initiative: * Introduction Subdialogue: The number of utterances will change little, since problem introduction seems independent of initiative.</Paragraph> <Paragraph position="1"> * Assessment Subdialogue: The number of utterances will be reduced slightly in declarative mode, as users who take the initiative may exploit their control of the dialogue to carry out some preliminary steps without verbal interaction.</Paragraph> <Paragraph position="2"> * Diagnosis Subdialogue: The number of utterances will change little, since all users presumably need the computer's assistance in problem diagnosis.</Paragraph> <Paragraph position="3"> * Repair Subdialogue: The change should be dependent on the task domain. If the repair process is basically the same once the error is diagnosed, few utterances will be required as repairs can be done without discussion. If the repair process is highly dependent on the type of error (e.g., debugging a program), even the skilled user may require significant advice from the system. For our domain, we expect a reduction in the number of utterances spoken in declarative mode, since the repair process (adding a wire) is similar across the different problem types.</Paragraph> <Paragraph position="4"> * Test Subdialogue: The number of utterances is significantly reduced (i.e., users who take the initiative can verify the circuit behavior without dialogue).</Paragraph> <Paragraph position="5"> 6.2.2 Overall Averages. Table 2 shows the average and relative number of utterances spoken per dialogue in each of the main task subdialogues. The reported data combine both computer and user utterances. Note that virtually no utterances were ever spoken during the Repair phase of declarative mode dialogues. This is because the repair process was always the addition of a missing wire to the circuit, a process that users quickly became able to do without explicit guidance. However, since not many utterances were spoken in the Repair phase of the directive mode dialogues either, the major source of the reduction in the absolute number of utterances spoken per dialogue occurred in the Assessment, Diagnosis, and Test phases, especially the Test phase. Although we originally expected little change in the number of utterances as a function of initiative for the Diagnosis phase, the large increase in the number of utterances spoken for that phase for problem 6, during directive mode interactions had a major impact on the overall averages. 
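The per-phase averages reported in Table 2 are simple to derive once every utterance is labeled with its subdialogue phase. The following is a minimal sketch of that computation; the record format (each dialogue as a list of (phase, speaker) pairs) and the toy data are assumptions made for illustration, not the format of the actual transcripts.

    # Sketch: average utterances per subdialogue phase, combining computer and
    # user utterances as in Table 2. Phases: I(ntroduction), A(ssessment),
    # D(iagnosis), R(epair), T(est).
    from collections import Counter

    PHASES = ["I", "A", "D", "R", "T"]

    def phase_averages(dialogues):
        """dialogues: list of dialogues, each a list of (phase, speaker) pairs."""
        totals = Counter()
        for dialogue in dialogues:
            totals.update(phase for phase, _speaker in dialogue)
        return {phase: totals[phase] / len(dialogues) for phase in PHASES}

    # Hypothetical toy data: one directive-mode dialogue.
    directive = [[("I", "C"), ("A", "C"), ("A", "U"), ("D", "C"), ("D", "U"),
                  ("R", "C"), ("T", "C"), ("T", "U")]]
    print(phase_averages(directive))  # {'I': 1.0, 'A': 2.0, 'D': 2.0, 'R': 1.0, 'T': 2.0}

Averaging such counts separately for directive-mode and declarative-mode dialogues gives the figures discussed in this subsection.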
Excluding problem 6, the average number of utterances spoken in the Diagnosis phase was 9.4 in directive mode and 7.2 in declarative mode.</Paragraph> <Paragraph position="6"> Although the experimental sessions are balanced (Section 4.2), we must distinguish between the first five problems of each session, where there was a single missing wire in each problem, and problems 6 through 8 in each session, which have two missing wires. Not all subjects completed the same number of dialogues for problems 6 through 8 in the two experimental sessions. Consequently, including them in the computation of the average number of utterances spoken in a given subdialogue phase would distort the averages used in a statistical analysis. Therefore, we apply the statistical technique of analysis of variance (ANOVA) to the data from the first five problems of each session, the single-missing-wire problems. This represents a total of 60 completed dialogues. A 2 × 4 design (mode × subdialogue phase) was used (the Introduction phase was omitted). Table 3 summarizes the results of the statistical analysis. The analysis was conducted using the averages by subjects as well as by items (problems). The individual main effects showed very strong statistical significance under both forms of analysis, while the interaction effect of mode and subdialogue phase also appears to be statistically significant, but not quite as strongly as the main effects individually. We now turn our attention to the order effect. Did the order in which subjects were given the initiative affect their performance? We balanced the assignment of problems according to type, such that problem k of both sessions 2 and 3 was the same type of problem. Furthermore, we balanced the subjects also. Half the subjects used the system when it was operating in directive mode for session 2 while the other half used the system when it was operating in declarative mode for session 2. The mode was, of course, reversed for session 3 for both groups. One of our claims has been that as users gain experience and are given the initiative by the system, they will take advantage of that. We might expect, then, that subjects given the initiative in session 3 would behave differently than subjects given the initiative in session 2. Furthermore, we might expect difficulties for subjects given the initiative in session 2 who then had to work with the system in directive mode in session 3. What do we find in the results? We conducted a paired t-test on the paired differences in the average number of utterances spoken per dialogue between the two modes, as a function of the problem number. Computing this test statistic for the two subdialogue phases in the domain where we would expect additional experience to have the most effect, Assessment and Diagnosis, yields the following results. For the Assessment phase, the test statistic is 0.854 with a corresponding p value of 0.42 for 7 degrees of freedom. For the Diagnosis phase, the test statistic is 0.556 with a corresponding p value of 0.60. Consequently, we do not find that the order in which a subject was given the initiative has a significant effect on the number of utterances spoken in a given subdialogue phase. We do not find this result surprising because: Some expertise was gained during the preliminary training session, so some subjects were ready to be given initiative in session 2.
In fact, the two subjects who struggled with using declarative mode in session 2 only contribute 5 of the 48 declarative mode data points used in computing the averages.</Paragraph> <Paragraph position="7"> Some subjects, as part of their expertise, developed a somewhat ritualistic style of interaction with the machine, which may have lengthened their interactions.</Paragraph> </Section> <Section position="3" start_page="158" end_page="159" type="sub_section"> <SectionTitle> 6.3 User Initiation of Subdialogue Transitions </SectionTitle> <Paragraph position="0"> When the computer has total control of the dialogue, in directive mode, it is expected that the computer will initiate the transitions between subdialogues. How will this change when the computer operates in declarative mode and control is given back to the user? While user control means the user's goals have priority, it does not necessarily mean the user will initiate every transition from one subdialogue to the next. The user controls the dialogue but still requires computer assistance. Consequently, it is expected that the computer will still initiate many of the transitions to the Assessment and Diagnosis phases in order to provide assistance in these areas, but that the user will be able to transition to other subdialogues as deemed appropriate. In particular, it is expected that the user will initiate most of the transitions to the final Test phase for confirming circuit behavior, since an experienced user would have learned how the circuit should function.</Paragraph> <Paragraph position="1"> These hypotheses are generally supported by the results in Table 4. When the computer had the initiative (the directive mode dialogues), very few subdialogue transitions were ever initiated by the user other than to the final Test phase when the repair would cause the circuit to begin to function normally. When the computer yielded the initiative (the declarative mode dialogues), users initiated the transition 12 For example, the value of 12 for problem 3 in the Assessment phase for subjects who operated in declarative mode in session 2 and directive mode in session 3 is obtained by subtracting the declarative mode average for the number of Assessment utterances spoken per dialogue, 9, from the directive mode average, 21. This value would be paired with the value 8 (18 - 10) also for problem 3 in the Assessment phase, but for subjects who operated in directive mode in session 2 and declarative mode in session 3.</Paragraph> <Paragraph position="2"> to the final stage of the dialogue almost every time. In the intermediate stages, the computer still initiated most subdialogues, but users occasionally felt compelled to cause a change to a different phase. This rarely happened when the computer had the initiative. Not counting the Introduction, which had to be initiated by the computer, only 9% of all subdialogues in directive mode were initiated by the user while 37% of the subdialogues in declarative mode were user-initiated.</Paragraph> </Section> <Section position="4" start_page="159" end_page="160" type="sub_section"> <SectionTitle> 6.4 General Subdialogue Transitions </SectionTitle> <Paragraph position="0"> As described in Section 5.2, the natural course of transition from subdialogue to sub-dialogue is described by the following regular expression: I+A+(D+R*T+)nF where n represents the number of individual repairs in the problem (i.e., number of missing wires in our domain). 
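Read with the superscript restored, the model is I+ A+ (D+ R* T+)^n F: one or more Introduction utterances, one or more Assessment utterances, then, for each of the n repairs, a Diagnosis-Repair-Test cycle in which the Repair discussion may be empty, followed by the finished state F. A minimal sketch of checking a dialogue against this model is given below; encoding each utterance's phase as a single letter (with a final F marking completion) is an assumed representation used only for illustration.

    # Sketch: does a dialogue's phase sequence follow I+ A+ (D+ R* T+)^n F?
    # Each character is the phase label of one utterance, plus a final F
    # for dialogue completion (an assumed encoding).
    import re

    def follows_model(phase_string, n_repairs):
        pattern = r"I+A+(?:D+R*T+){%d}F" % n_repairs
        return re.fullmatch(pattern, phase_string) is not None

    print(follows_model("IAADDRTTF", 1))      # True: one repair cycle
    print(follows_model("IAADDTTDDRTTF", 2))  # True: first repair made silently (R*)
    print(follows_model("IAADATDDRTTF", 1))   # False: Diagnosis back to Assessment

Dialogues rejected by such a check are the ones containing the "unusual" transitions discussed below.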
If every dialogue followed this model, then we would expect to see all transitions out of the Introduction phase go to the Assessment phase, all transitions out of the Assessment phase go to the Diagnosis phase, and all transitions out of the Repair phase go to the Test phase. However, with the potential for miscommunication as well as the potential for users to exploit their expertise and control of the dialogue to skip discussion of some task steps, it is highly unlikely that the actual results will follow the idealized model. Where might we see differences? Table 5 shows the actual breakdown in percentages. The row value represents the initial subdialogue phase and the column represents the new subdialogue. The F column represents the finished state (i.e., dialogue completion). For example, the percentage of all transitions out of the Diagnosis phase that went to the Assessment phase is 18.8% in directive mode and 38.8% in declarative mode. The X entries along the main diagonal represent impossible exit transitions (i.e., there cannot be a transition from Diagnosis to Diagnosis). The &quot;--&quot; entries represent values of less than 5%. 13 If the dialogues follow the transition model, then the largest entries should be in the values in the diagonal just above the main diagonal. The resulting largest entry in each row is noted in boldface.</Paragraph> <Paragraph position="1"> For the most part, the percentages are consistent with the model, especially in the early phase transitions and in the transitions out of the Test subdialogue. Based on the relative number of completed dialogues that required the repair of two missing wires (17 in directive mode, 21 in declarative mode), the expected percentage of transitions from Test-to-Diagnosis would be 22.7% in directive mode and 25.9% in declarative mode. TM The actual values of 24.7% and 24.1% compare favorably with the expected results. The large relative difference in percentages for transitions from Diagnosis to either Repair or Test in the two modes is also expected, given that users who take the initiative can make the repair themselves without discussing it with the computer. The transition percentages that are most surprising are the Diagnosis-to-Assessment transitions in both modes and the Test-to-Repair transitions in directive mode. The Diagnosis-to-Assessment transitions are indicative of attempts at error correction. That is, at some point during Diagnosis either the computer or the user becomes suspicious of the initial problem assessment and consequently moves back to Assessment to be sure that the erroneous circuit behavior is properly understood. The Test-to-Repair transition is common when the user makes the repair without mentioning it. That is, the user has prematurely moved from Repair to Test without notifying tile computer that the repair has actually been made. In directive mode dialogues, the computer will require verbal verification of the repair before transitioning to the Test phase.</Paragraph> <Paragraph position="2"> In general, 64% of the dialogues in directive mode have no &quot;unusual&quot; transitions (where we define unusual as a transition not described by our model). In contrast, only 33% of the declarative mode dialogues had no unusual transitions, again demonstrating how users felt free to skip steps without discussion. 
This particularly increased as users gained more experience, with only 26% of the 35 declarative dialogues of the final session containing no unusual transitions.</Paragraph> </Section> <Section position="5" start_page="160" end_page="163" type="sub_section"> <SectionTitle> 6.5 Task Control versus Linguistic Control </SectionTitle> <Paragraph position="0"> As described in Section 2.1, our view of initiative concerns which participant's task goals currently have priority. Walker and Whittaker's (1990) study of mixed-initiative dialogue used a notion of control based on linguistic goals as specified in the control rules first presented in Section 2.3 and repeated below. These rules are a function of 14 These Test-to-Diagnosis transitions occur because after repairing one of the missing wires, the Test phase would show that the circuit is still not working due to the other missing wire, causing a transition back to the Diagnosis phase to discover the other problem.</Paragraph> <Paragraph position="1"> Computational Linguistics Volume 23, Number 1 the classification of the linguistic goal of the current utterance (Assertion, Command, Question, or Prompt) and reflect the status of initiative after the utterance was made.</Paragraph> <Paragraph position="3"> Assertion: The speaker has the initiative unless the utterance is a response to a Question.</Paragraph> <Paragraph position="4"> Command: The speaker has the initiative.</Paragraph> <Paragraph position="5"> Question: The speaker has the initiative unless the utterance is a response to a question or command.</Paragraph> <Paragraph position="6"> Prompt: The hearer has the initiative.</Paragraph> <Paragraph position="7"> We analyzed our dialogues using this notion of control with one modification-assertions that were a continuation of the current topic left the initiative unchanged. Consider the following dialogue excerpt: C: The LED is supposed to be displaying an alternately flashing one and seven.</Paragraph> <Paragraph position="8"> U: The LED is off.</Paragraph> <Paragraph position="9"> C: The power is on when the switch is up.</Paragraph> <Paragraph position="10"> U: The switch is up.</Paragraph> <Paragraph position="11"> C: The switch is connecting to the battery when there is a wire between connectors 111 and 120.</Paragraph> <Paragraph position="12"> In both cases the user's assertions continue the topic introduced by the computer and do not cause a change of control. Contrast this with the following: C: The LED is supposed to be displaying an alternately flashing one and seven.</Paragraph> <Paragraph position="13"> U: There is no wire between connector eight four and connector nine nine.</Paragraph> <Paragraph position="14"> C: There is supposed to be a wire between connector 84 and connector 99.</Paragraph> <Paragraph position="15"> In this case the user's assertion does change control, as it is a change of topic. Our rule modification reflects this issue.</Paragraph> <Paragraph position="16"> 6.5.1 Hypotheses. The two primary measures reported by Walker and Whittaker are average number of utterances between control shifts and percent of total utterances controlled by the computer. Their results for task-oriented dialogues about constructing a water pump showed that experts had control of the dialogue about 90% of the time. 
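The control rules above, together with the topic-continuation modification, can be restated procedurally. The sketch below assigns a controller after each utterance and computes the two measures just introduced: percent of utterances controlled by the computer and average number of utterances between control shifts. The utterance encoding (speaker, linguistic-goal type, what it responds to, whether it continues the current topic) and the assumption that the computer holds control at the start are illustrative choices, not the paper's annotation scheme.

    # Sketch of the linguistic control rules, with the modification that an
    # assertion continuing the current topic leaves control unchanged.
    from dataclasses import dataclass

    @dataclass
    class Utterance:
        speaker: str                   # "C" (computer) or "U" (user)
        kind: str                      # "assertion", "command", "question", "prompt"
        responds_to: str = ""          # "", "question", or "command"
        continues_topic: bool = False  # assertion that continues the current topic

    def track_control(utterances, initial="C"):
        """Return the controlling participant after each utterance."""
        controller, history = initial, []
        for u in utterances:
            if u.kind == "assertion":
                # Speaker takes control unless answering a question or merely
                # continuing the current topic (the modification used here).
                if u.responds_to != "question" and not u.continues_topic:
                    controller = u.speaker
            elif u.kind == "command":
                controller = u.speaker
            elif u.kind == "question":
                if u.responds_to not in ("question", "command"):
                    controller = u.speaker
            elif u.kind == "prompt":
                controller = "U" if u.speaker == "C" else "C"
            history.append(controller)
        return history

    def control_measures(history):
        shifts = sum(1 for a, b in zip(history, history[1:]) if a != b)
        percent_computer = 100.0 * history.count("C") / len(history)
        utterances_per_shift = len(history) / shifts if shifts else float("inf")
        return percent_computer, utterances_per_shift

Applied to annotated sessions from each mode, control_measures yields the quantities compared in the results that follow.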
In contrast, their results for advisory dialogues where clients talked to an expert over the phone to obtain assistance in diagnosing and repairing various software faults showed that experts had control only about 50% of the time. While our problem domain is more similar to the advisory dialogues, the nature of our dialogues is more similar to the task-oriented dialogues, as the task of circuit repair is being completed concurrently with the dialogue. Therefore, we expect the computer to show strong linguistic control when it has task initiative. Conversely, when users control the task initiative, we expect more assertions by the user concerning the user's own task goals, rather than direct responses to computer questions or commands. Nevertheless, because the computer is the ultimate expert, we still expect it to respond with assertions of facts designed to assist the user that take a linguistic form that would be classified as continuing or regaining linguistic control (e.g., &quot;The power is on when the switch is up,&quot; from the first excerpt). The net effect should be that user task control in declarative mode will lead to more frequent linguistic control shifts, although the computer will still have overall control of most utterances.</Paragraph> <Paragraph position="17"> While users held linguistic control only 14.3% of the time in declarative mode, this is much more often than in directive mode. Correspondingly, the average number of utterances between control shifts is reduced by a factor of almost 4.8. A detailed examination shows that 79% of the 248 control shifts were caused either by the user attempting to correct a computer misunderstanding (Section 6.6.2) or by the user initiating a task topic change by asserting new task information. These types of control shifts occurred once every 4.4 user-utterances in declarative mode, but only once every 32.0 user-utterances in directive mode. The remaining control shifts were due to requests for repetition of the previous utterance or requests for other information. Table 7 presents the mean difference in the average number of utterances between control shifts for each of the balanced problems. Thus, the value 21.2 for problem 1 means that the difference in the average number of utterances between control shifts was greater by 21.2 utterances in directive mode over declarative mode. These results show that there is a relationship between our notion of task control and the Whittaker and Stenton (1988) notion of linguistic control evaluated by Walker and Whittaker (1990)--namely, that as users exploit their task expertise, linguistic control shifts occur much more frequently. This result may prove useful as a possible cue for when the system needs to release task initiative to the user during a mixed-initiative dialogue--as linguistic control shifts begin to occur more frequently, it may be an indicator that a user is gaining experience and can take more overall control of the dialogue. Further development and testing of this hypothesis are needed.</Paragraph> </Section> <Section position="6" start_page="163" end_page="164" type="sub_section"> <SectionTitle> 6.6 The Impact of Miscommunication </SectionTitle> <Paragraph position="0"> One important phenomenon of interactive dialogue that has recently begun to receive attention in the computational linguistics community is the handling of miscommunication (e.g., McRoy and Hirst [1995], Brennan and Hulteen [1995], and Lambert and Carberry [1992]).
In the Circuit Fix-It Shop the computer misunderstood user-utterances 18.5% of the time. The primary cause of these misunderstandings was the misrecognition of the words spoken by the user--only 50% of the user's utterances were correctly recognized word for word. Consequently, misunderstanding occurred more often in declarative mode (24.7% of user-utterances) than in directive mode (15.0% of user-utterances). This is due to the fact that, on average, users spoke longer utterances in declarative mode. Speech recognition technology has improved dramatically since this system was tested, but the need for handling miscommunication is still relevant as users and designers will continually test the performance limits of available technology. Human-human communication frequently contains miscommunication, so we should expect it in human-computer dialogue as well. For the current system, how did miscommunication impact the dialogue structure? 6.6.1 Frequency of Experimenter Interaction. As mentioned in Section 4.4, when the computer made a serious misinterpretation the experimenter was allowed to tell the user about the computer's erroneous interpretation without telling the user what to do. Computer misinterpretation of the user's utterances due to misrecognition of words can cause confusion between the user and computer, and ultimately, failure of the dialogue. With the computer running in declarative mode, the experimenter chose to make such statements once every 8.5 user-utterances, but only once every 26.5 user-utterances in directive mode. Not all misrecognitions required experimenter interaction. 15 Notifying the user of a serious misrecognition leaves the responsibility with the user to try to correct the computer's misunderstanding. It is hypothesized that when the computer has yielded the initiative, users are more likely to attempt to redirect the computer's focus when an error situation occurs. Conversely, users will tend to give up trying to redirect the computer's attention when the computer has the initiative because the machine will proceed on its own line of reasoning, ignoring what it perceives as user interrupts even when these interrupts are actually attempts at resolving previous miscommunications. This is borne out by the results. Overall, while the computer was operating in directive mode, the user attempted to correct only 24% of the misunderstandings for which the user received notification. In contrast, while the computer was operating in declarative mode, the user attempted to correct 52% of the misunderstandings.</Paragraph> <Paragraph position="1"> 15 As reported in Smith and Gordon (1996), there were a total of 250 misunderstandings in declarative mode, 215 for which the experimenter was allowed to notify the user. The experimenter chose to intervene in 118 of these or 54% of the time. In contrast, there were a total of 276 misunderstandings in directive mode, 226 for which the experimenter was allowed to notify the user. In only 69 or 30.5% of these misunderstandings did the experimenter notify the user. The difference in the relative number of notifications is largely due to the fact that, in directive mode, the computer frequently ignored the statements it misunderstood, as the misunderstandings often were in conflict with the computer's current task goal. Consequently, it was unnecessary for the experimenter to notify the user about such misunderstandings since they would not cause a problem.
On the other hand, confusion between computer and user was much more likely in declarative mode because the computer would more frequently formulate a response based on its erroneous interpretation of the user's input. In these cases, there was a greater need for the experimenter to notify the user of the misunderstanding.</Paragraph> </Section> <Section position="7" start_page="164" end_page="164" type="sub_section"> <SectionTitle> Smith and Gordon Human-Computer Dialogue 6.7 Summary of Results </SectionTitle> <Paragraph position="0"> What general conclusions can we draw from this analysis? Based on the evaluation of the Circuit Fix-It Shop at two different levels of initiative, we have observed the following phenomena: * Directive mode dialogues tend to follow an orderly pattern consisting largely of computer-initiated subdialogue transitions, terse user responses, and predictable subdialogue transitions. However, the inflexibility of this mode is a severe drawback in the presence of user-correctable miscommunications.</Paragraph> <Paragraph position="1"> * Declarative mode dialogues are shorter but less orderly, consisting of more user-initiated subdialogue transitions. There is evidence that users are willing to modify their behavior as they gain expertise, provided the computer allows it. The ability to yield the initiative as users gain experience is essential if a dialogue system is to be useful in practical applications involving repeat users.</Paragraph> <Paragraph position="2"> * The small number of subjects and the design of the experiment make it difficult to observe differences within a given level of initiative as subjects gain additional expertise. Nevertheless, in a practical environment we believe the capacity to change initiative during a dialogue is essential for obtaining the most effective interaction between repeat users and a system. It is our conjecture that being able to vary initiative between dialogues is insufficient, but further study of this issue is needed.</Paragraph> <Paragraph position="3"> After reviewing other empirical studies in the next section, we will address the impact of these results on future research in Section 8.</Paragraph> <Paragraph position="4"> 7. Recent Empirical Studies Relevant to Human-Computer Mixed-Initiative</Paragraph> </Section> <Section position="8" start_page="164" end_page="165" type="sub_section"> <SectionTitle> Dialogue Structure </SectionTitle> <Paragraph position="0"> Danieli and Gerbino (1995) also look at dialogues with an implemented computer system. This system answers user queries about train schedules and services. The focus of the paper is on a few objective and several subjective performance measures of two interaction strategies similar to the directive and declarative modes described in this paper. Their paper concludes that the mode similar to our directive mode is more robust and more likely to succeed, but the mode similar to our declarative mode is faster and less frustrating to experienced users. The general performance results obtained during our testing of the Circuit Fix-It Shop (Section 6.1) lend support to their claim, as 88% of our attempted dialogues in directive mode were completed successfully, compared to 80% in declarative mode, and experimenter interaction (of any kind) occurred only once every 18 user-utterances in directive mode, but once every 6 user-utterances in declarative mode. 
While their dialogue control algorithms are not identical to ours, their results are complementary, as they show that performance differences as a function of the computer's level of control may be prevalent in database query interactions as well.</Paragraph> <Paragraph position="1"> Guinn (1996) reports on the utility of computer-computer dialogue simulations of the Collaborative Algorithm, an extension of our Missing Axiom Theory (Section 3.1) for modeling dialogue processing. Guinn has implemented the model and run extensive simulations of computer-computer dialogues in order to explore the dynamic setting of initiative as the dialogue ensues. The model attaches an initiative level to each task goal, and a competency evaluation, based on user model information, is used to decide who should be given the initiative for a given task goal. There is ongoing work in implementing and testing the Collaborative Algorithm in human-computer interactive environments.</Paragraph> </Section> </Section> </Paper>