<?xml version="1.0" standalone="yes"?> <Paper uid="P05-1007"> <Title>experiments in natural language generation for intelligent tutoring systems</Title> <Section position="7" start_page="55" end_page="56" type="concl"> <SectionTitle> 4 Discussion and conclusions </SectionTitle> <Paragraph position="0"> Our work touches on three issues: aggregation, evaluation of NLG systems, and the role of NL interfaces for ITSs.</Paragraph> <Paragraph position="1"> In much work on aggregation (Huang and Fiedler, 1996; Horacek, 2002), aggregation rules and heuristics are shown to be plausible, but are not based on any hard evidence. Even where corpus work is used (Dalianis, 1996; Harvey and Carberry, 1998; Shaw, 2002), the results are not completely convincing, because we do not know for certain the content to be communicated from which these texts supposedly have been aggregated. Therefore, positing empirically based rules is guesswork at best. Our data collection attempts to provide a more solid empirical base for aggregation rules: we found that tutors exclude significant amounts of factual information and aggregate heavily on the basis of functionality. As a consequence, while some of our rules implement standard types of aggregation, such as conjunction via shared participants, we also introduced functional aggregation (cf. conceptual aggregation (Reape and Mellish, 1998)).</Paragraph> <Paragraph position="2"> As regards evaluation, NLG systems have been evaluated, e.g., by using human judges to assess the quality of the texts produced (Coch, 1996; Lester and Porter, 1997; Harvey and Carberry, 1998); by comparing the system's performance to that of humans (Yeh and Mellish, 1997); or through task efficacy measures, i.e., measuring how well the users of the system perform on the task at hand (Young, 1999; Carenini and Moore, 2000; Reiter et al., 2003).</Paragraph> <Paragraph position="3">
The latter kind of study generally contrasts different interventions, i.e., a baseline that does not use NLG and one or more variations obtained by parameterizing the NLG system. However, such evaluations do not focus on a specific component of the NLG process, as we did here for aggregation.</Paragraph> <Paragraph position="4"> Regarding the role of NL interfaces for ITSs, only very recently have the first few results become available, showing, first of all, that students do learn when interacting in NL with an ITS (Litman et al., 2004; Graesser et al., 2005). However, there are very few studies like ours that evaluate specific features of the NL interaction, e.g., (Litman et al., 2004). In our case, we did find that different features of the NL feedback impact learning. Although we contend that this effect is due to functional aggregation, the feedback in DIAG-NLP2 also changed along other dimensions, mainly using referents of indicators instead of indicators, and being more strongly directive in suggesting what to do next. Of course, we cannot argue that our best NL generator is equivalent to a human tutor: for example, dividing the numbers of ConsultRU and ConsultInd reported in Sec. 2.2 by the number of dialogues shows that students ask about 10 ConsultRUs and 1.5 ConsultInds per dialogue when interacting with a human, many fewer than those they pose to the ITSs (cf. Table 2). (Regrettably, we did not administer a PostTest to the students in the human data collection.) We further discuss the implications of our results for NL interfaces for ITSs in a companion paper (Di Eugenio et al., 2005).</Paragraph> <Paragraph position="5"> The DIAG project has come to a close. We are satisfied that we demonstrated that even NL feedback that is not overly sophisticated can make a difference; at the same time, the fact that DIAG-NLP2 has the best language and engenders the most learning prompts us to explore more complex language interactions.
We are pursuing these exciting directions in a new domain, that of basic data structures and algorithms.</Paragraph> <Paragraph position="6"> We are investigating what distinguishes expert from novice tutors, and we will implement our findings in an ITS that tutors in this domain.</Paragraph> <Paragraph position="7"> Acknowledgments. This work is supported by the Office of Naval Research (awards N00014-99-1-0930 and N00014-001-0640), and in part by the National Science Foundation (award IIS 0133123). We are grateful to CoGenTex Inc. for making EXEMPLARS and RealPro available to us.</Paragraph> </Section> </Paper>