<?xml version="1.0" standalone="yes"?> <Paper uid="P00-1020"> <Title>An Empirical Study of the Influence of Argument Conciseness on Argument Effectiveness</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 The evaluation framework </SectionTitle> <Paragraph position="0"> In order to evaluate different aspects of the argument generator, we have developed an evaluation framework based on the task efficacy evaluation method. This method allows the experimenter to evaluate a generation model by measuring the effects of its output on the user's behavior, beliefs and attitudes in the context of a task.</Paragraph> [Figure 4: The evaluation framework architecture] <Paragraph position="1"> Aiming at general results, we chose a basic and common task that has been extensively studied in decision analysis: the selection of a subset of preferred objects (e.g., houses) out of a set of possible alternatives. In the evaluation framework that we have developed, the user performs this task by using a computer environment (shown in Figure 5) that supports interactive data exploration and analysis (IDEA) (Roth, Chuah et al. 1997). The IDEA environment provides the user with a set of powerful visualization and direct manipulation techniques that facilitate the user's autonomous exploration of the set of alternatives and the selection of the preferred alternatives.</Paragraph> <Paragraph position="2"> Let us now examine how an argument generator can be evaluated in the context of the selection task, by going through the architecture of the evaluation framework.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 The evaluation framework architecture </SectionTitle> <Paragraph position="0"> Figure 4 shows the architecture of the evaluation framework. The framework consists of three main sub-systems: the IDEA system, a User Model Refiner and the Argument Generator. The framework assumes that a model of the user's preferences (an AMVF, an additive multiattribute value function) has been previously acquired from the user, to ensure a reliable initial model.</Paragraph> <Paragraph position="1"> At the outset, the user is assigned the task of selecting from the dataset the four most preferred alternatives and placing them in a Hot List (see Figure 5, upper right corner), ordered by preference. The IDEA system supports the user in this task (Figure 4 (1)). As the interaction unfolds, all user actions are monitored and collected in the User's Action History (Figure 4 (2a)). Whenever the user feels that the task is accomplished, the ordered list of preferred alternatives is saved as her Preliminary Decision (Figure 4 (2b)). After that, this list, the User's Action History and the initial Model of the User's Preferences are analysed by the User Model Refiner (Figure 4 (3)) to produce a Refined Model of the User's Preferences (Figure 4 (4)).</Paragraph> <Paragraph position="2"> At this point, the stage is set for argument generation. Given the Refined Model of the User's Preferences, the Argument Generator produces an evaluative argument tailored to the model (Figure 4 (5-6)), which is presented to the user by the IDEA system (Figure 4 (7)). The argument's goal is to introduce a new alternative (not included in the dataset initially presented to the user) and to persuade the user that the alternative is worth considering. The new alternative is designed on the fly to be preferable for the user, given her preference model.</Paragraph>
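To make the role of the preference model concrete, the sketch below shows how an additive multiattribute value function can score a candidate house as a weighted sum of component value functions over its attributes. The attribute names, weights, and component functions here are invented for illustration and are not the ones elicited in the experiment.

```python
# Illustrative sketch of an additive multiattribute value function (AMVF).
# The attribute names, weights, and component value functions below are
# invented; in the experiment the actual model is elicited from each user
# (Edwards and Barron 1994).

def v_distance(km):
    # component value function: closer is better, zero beyond 10 km
    return max(0.0, 1.0 - km / 10.0)

def v_size(sq_m):
    # component value function: larger is better, saturating at 200 square metres
    return min(1.0, sq_m / 200.0)

weights = {"distance": 0.6, "size": 0.4}          # elicited weights, summing to 1
value_functions = {"distance": v_distance, "size": v_size}

def amvf(house):
    """Overall value of a house: weighted sum of the component values of its attributes."""
    return sum(weights[a] * value_functions[a](house[a]) for a in weights)

new_house = {"distance": 2.0, "size": 120.0}      # a hypothetical new alternative
print(round(amvf(new_house), 3))                  # 0.72 for this invented house and model
```

Under such a model, a new alternative can be constructed so that its attribute values yield a high overall value for this particular user, which is what "designed on the fly to be preferable" amounts to.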
<Paragraph position="3"> All the information about the new alternative is also presented graphically. Once the argument is presented, the user may (a) decide immediately to introduce the new alternative into her Hot List, (b) decide to further explore the dataset, possibly making changes to the Hot List, including adding the new alternative to it, or (c) do nothing. Figure 5 shows the display at the end of the interaction, when the user, after reading the argument, has decided to introduce the new alternative in the first position of the Hot List (Figure 5, top right).</Paragraph> <Paragraph position="4"> Whenever the user decides to stop exploring and is satisfied with her final selections, measures related to argument effectiveness can be assessed (Figure 4 (8)). These measures are obtained either from the record of the user's interaction with the system or from user self-reports in a final questionnaire (see Figure 6 for an example of a self-report) and include: - Measures of behavioral intentions and attitude change: (a) whether or not the user adopts the new proposed alternative, (b) in which position in the Hot List she places it, and (c) how much she likes the new alternative and the other objects in the Hot List.</Paragraph> <Paragraph position="5"> - A measure of the user's confidence that she has selected the best alternatives for her from the set of alternatives. - A measure of argument effectiveness derived by explicitly questioning the user at the end of the interaction about the rationale for her decision (Olson and Zanna 1991). This can provide valuable information on which aspects of the argument were most influential (i.e., better understood and accepted by the user).</Paragraph> <Paragraph position="6"> - An additional measure of argument effectiveness is to explicitly ask the user at the end of the interaction to judge the argument with respect to several dimensions of quality, such as content, organization and writing style. However, judgements along these dimensions are clearly weaker measures than evaluations of actual behavioural and attitudinal changes (Olson and Zanna 1991).</Paragraph> <Paragraph position="7"> To summarize, the evaluation framework just described supports users in performing a realistic task at their own pace by interacting with an IDEA system. In the context of this task, an evaluative argument is generated, and measurements related to its effectiveness can be performed.</Paragraph> <Paragraph position="8"> We now discuss an experiment that we have performed within the evaluation framework.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 The Experiment </SectionTitle> <Paragraph position="0"> The argument generator has been designed to facilitate testing the effectiveness of different aspects of the generation process. The experimenter can easily control whether the generator tailors the argument to the current user, the degree of conciseness of the argument (by varying k, as explained in Section 2.3), and which microplanning tasks the generator performs. In the experiment described here, we focused on studying the influence of argument conciseness on argument effectiveness. A parallel experiment on the influence of tailoring is described elsewhere.</Paragraph>
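As a purely illustrative sketch of the three factors mentioned above (tailoring, the conciseness parameter k, and the microplanning tasks), one run of the generator might be configured as follows; the class, field, and task names are hypothetical, since the paper does not describe the generator's actual programming interface.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical configuration object for one run of the argument generator.
# The paper states that tailoring, conciseness (k), and the microplanning
# tasks are under the experimenter's control, but does not give this interface.
@dataclass
class GeneratorConfig:
    tailor_to_user: bool = True        # tailor the argument to the refined user model
    k: float = -0.3                    # conciseness parameter (Section 2.3); higher k means fewer evidence items
    microplanning_tasks: List[str] = field(
        default_factory=lambda: ["aggregation", "pronominalization"])  # invented task names

# The two argument-bearing conditions of the experiment described next
tailored_concise = GeneratorConfig(k=-0.3)
tailored_verbose = GeneratorConfig(k=-1.0)
```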
<Paragraph position="1"> We followed a between-subjects design with three experimental conditions: No-Argument - subjects are simply informed that a new house has come on the market.</Paragraph> <Paragraph position="2"> Tailored-Concise - subjects are presented with an evaluation of the new house tailored to their preferences and at a level of conciseness that we hypothesize to be optimal. To start our investigation, we assume that an effective argument (in our domain) should contain slightly more than half of the available evidence. By running the generator with different values of k on the user models of the pilot subjects, we found that this corresponds to k=-0.3. In fact, with k=-0.3 the arguments contained on average 10 pieces of evidence out of the 19 available.</Paragraph> <Paragraph position="3"> Tailored-Verbose - subjects are presented with an evaluation of the new house tailored to their preferences, but at a level of conciseness that we hypothesize to be too low (k=-1, which, in our analysis of the pilot subjects, corresponds on average to 16 pieces of evidence out of the possible 19).</Paragraph> <Paragraph position="4"> In all three conditions, all the information about the new house is also presented graphically, so that no information is hidden from the subject.</Paragraph>
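The paper's rule for mapping k to a subset of the available evidence is given in its Section 2.3, which is not part of this extract. Purely as an illustration under that caveat, the sketch below shows how a cutoff controlled by k could keep roughly half of 19 scored evidence items at k = -0.3 and most of them at k = -1; the scores and the cutoff formula are invented.

```python
import statistics

# Hypothetical illustration: keep the evidence items whose worth exceeds a
# cutoff derived from k. The paper's actual selection rule (Section 2.3) is
# not reproduced in this extract; the scores below are invented.
def select_evidence(worth_by_item, k):
    scores = list(worth_by_item.values())
    cutoff = statistics.mean(scores) + k * statistics.stdev(scores)
    return [item for item, w in worth_by_item.items() if w >= cutoff]

worth = {f"evidence_{i}": w for i, w in enumerate(
    [0.9, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45, 0.4,
     0.35, 0.3, 0.28, 0.25, 0.2, 0.18, 0.15, 0.1, 0.05])}

print(len(select_evidence(worth, k=-0.3)))  # 10 of the 19 invented items pass this cutoff
print(len(select_evidence(worth, k=-1.0)))  # 16 of the 19 invented items pass this cutoff
```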
<Paragraph position="5"> Our hypotheses on the outcomes of the experiment are summarized in Figure 7. We expect arguments generated for the Tailored-Concise condition to be more effective than arguments generated for the Tailored-Verbose condition. We also expect the Tailored-Concise condition to be somewhat better than the No-Argument condition, but to a lesser extent, because subjects, in the absence of any argument, may spend more time further exploring the dataset, thus reaching a more informed and balanced decision. Finally, we do not have strong hypotheses on comparisons of argument effectiveness between the No-Argument and Tailored-Verbose conditions.</Paragraph> <Paragraph position="6"> The experiment is organized in two phases. In the first phase, the subject fills out a questionnaire on the Web. The questionnaire implements a method from decision theory to acquire an AMVF model of the subject's preferences (Edwards and Barron 1994). In the second phase of the experiment, to control for possible confounding variables (including the subject's argumentativeness (Infante and Rancer 1982), need for cognition (Cacioppo, Petty et al. 1983), intelligence and self-esteem), the subject is randomly assigned to one of the three conditions.</Paragraph> <Paragraph position="7"> Then, the subject interacts with the evaluation framework, and at the end of the interaction measures of argument effectiveness are collected, as described in Section 3.1.</Paragraph> <Paragraph position="8"> After running the experiment with 8 pilot subjects to refine and improve the experimental procedure, we ran a formal experiment involving 30 subjects, 10 in each experimental condition.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Experiment Results </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.1 A precise measure of satisfaction </SectionTitle> <Paragraph position="0"> According to the literature on persuasion, the most important measures of argument effectiveness are those of behavioral intentions and attitude change. As explained in Section 3.1, in our framework such measures include (a) whether or not the user adopts the new proposed alternative, (b) in which position in the Hot List she places it, and (c) how much she likes the proposed new alternative and the other objects in the Hot List. Measures (a) and (b) are obtained from the record of the user's interaction with the system, whereas the measures in (c) are obtained from user self-reports.</Paragraph> <Paragraph position="1"> A closer analysis of the above measures indicates that the measures in (c) are simply a more precise version of measures (a) and (b). In fact, not only do they assess the same information as measures (a) and (b), namely a preference ranking among the new alternative and the objects in the Hot List, but they also offer two additional critical advantages, discussed below. (If the subject does not adopt the new house, she is asked to express her satisfaction with the new house in an additional self-report.)</Paragraph> <Paragraph position="2"> (i) Self-reports allow a subject to express differences in satisfaction more precisely than a ranking does. For instance, in the self-report shown in Figure 8, the subject was able to specify that the first house in the Hot List was only one space (unit of satisfaction) better than the house following it in the ranking, while the third house was two spaces better than the house following it.</Paragraph> <Paragraph position="3"> (ii) Self-reports do not force subjects to express a total order between the houses. For instance, in Figure 8 the subject was able to express that the second and the third house in the Hot List were equally good for her.</Paragraph> <Paragraph position="4"> Furthermore, measures of satisfaction obtained through self-reports can be combined into a single, statistically sound measure that concisely expresses how much the subject liked the new house with respect to the other houses in the Hot List. This measure is the z-score of the subject's self-reported satisfaction with the new house, computed with respect to the self-reported satisfaction with the houses in the Hot List. A z-score is the normalized distance, in standard deviation units, of a measure x_i from the mean of a population X. Formally, for x_i ∈ X: z-score(x_i, X) = [x_i − μ(X)] / σ(X), where μ(X) is the mean and σ(X) the standard deviation of X. For instance, the satisfaction z-score for the new house can be computed from the sample self-reports shown in Figure 8.</Paragraph>
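The self-report values behind the worked example in the paper (Figure 8) are not reproduced in this extract, so the sketch below computes a satisfaction z-score from invented ratings, just to make the formula concrete; following the text, the population is taken to be the houses already in the Hot List.

```python
import statistics

def z_score(x_i, population):
    """Normalized distance of x_i from the mean of the population, in standard deviation units."""
    return (x_i - statistics.mean(population)) / statistics.stdev(population)

# Invented satisfaction ratings on a 9-point self-report scale
# (1 = bad choice, 9 = good choice); the actual Figure 8 values are not
# given in this extract.
hot_list_satisfaction = [8, 6, 6, 4]   # the four houses already in the Hot List
new_house_satisfaction = 7

print(round(z_score(new_house_satisfaction, hot_list_satisfaction), 2))  # 0.61 for these invented ratings
```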
<Paragraph position="6"> The satisfaction z-score precisely and concisely integrates all the measures of behavioral intentions and attitude change. We have used satisfaction z-scores as our primary measure of argument effectiveness.</Paragraph> </Section> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 5.2 Results </SectionTitle> <Paragraph position="0"> As shown in Figure 9, the satisfaction z-scores obtained in the experiment confirmed our hypotheses. Arguments generated for the Tailored-Concise condition were significantly more effective than arguments generated for the Tailored-Verbose condition. The Tailored-Concise condition was also significantly better than the No-Argument condition, but to a lesser extent. Logs of the interactions suggest that this happened because subjects in the No-Argument condition spent significantly more time further exploring the dataset. Finally, there was no significant difference in argument effectiveness between the No-Argument and Tailored-Verbose conditions.</Paragraph> [Sample self-report item: "(a) How would you judge the houses in your Hot List? The more you like the house, the closer you should put a cross to 'good choice'." bad choice :__:__:__:__:X:__:__:__:__: good choice] [Figure 9: Results for satisfaction z-scores. The average z-scores for the three conditions are shown in the grey boxes and the p-values are reported beside the links.] <Paragraph position="1"> With respect to the other measures of argument effectiveness mentioned in Section 3.1, we have not found any significant differences among the experimental conditions.</Paragraph> </Section> <Section position="8" start_page="0" end_page="0" type="metho"> <SectionTitle> 6 Conclusions and Future Work </SectionTitle> <Paragraph position="0"> Argumentation theory indicates that effective arguments should be concise, presenting only pertinent and cogent information. However, argumentation theory does not tell us what the most effective degree of conciseness is. As a preliminary attempt to answer this question for evaluative arguments, we have compared in a formal experiment the effectiveness of arguments generated by our argument generator at two different levels of conciseness. The experimental results show that arguments generated at the more concise level are significantly better than arguments generated at the more verbose level. However, further experiments are needed to determine the optimal level of conciseness.</Paragraph> </Section> <Section position="9" start_page="0" end_page="0" type="metho"> <SectionTitle> Acknowledgements </SectionTitle> <Paragraph position="0"> Our thanks go to the members of the Autobrief project: S. Roth, N. Green, S. Kerpedjiev and J. Mattis. We also thank C. Conati for comments on drafts of this paper. This work was supported by grant number DAA-1593K0005 from the</Paragraph> </Section> </Paper>