<?xml version="1.0" standalone="yes"?> <Paper uid="E89-1039"> <Title>EMPIRICAL STUDIES OF DISCOURSE REPRESENTATIONS FOR NATURAL LANGUAGE INTERFACES</Title> <Section position="7" start_page="0" end_page="0" type="evalu"> <SectionTitle> RESULTS AND DISCUSSION </SectionTitle> <Paragraph position="0"> There are 1047 utterances in our corpus. Of these, 38% are Initiatives, 48% Responses, 10% Resp/Init, and 4% Clarifications. Tables 1 and 2 in the appendix summarize some of our results. 58% of the Initiatives are Context Independent, i.e. utterances that can be interpreted in isolation.</Paragraph> <Paragraph position="1"> However, of these about 10% are dialogue openings. This means that only 48% of the Initiatives within the dialogues can be interpreted in isolation. Context Dependencies The complete set of data concerning the number of context dependent utterances and the distribution of different types of context dependency is presented in the appendix. While we believe that the data presented here give a correct overall picture of the qualities of the language used in the dialogues, the previously mentioned caveat concerning the theory dependency of the data, especially as regards ellipsis and definite descriptions, should be kept in mind. For the same reasons, we will in this paper concentrate our discussion on the usage of pronouns in the dialogues. The number of Context Dependent utterances is 167, or 42%. Thus, when the users are given the opportunity to use connected discourse, they will, even when the response times (as in our case) occasionally seem slow.</Paragraph> <Paragraph position="2"> The most common forms of indexicality are ellipsis (64%) and definite descriptions (29%).</Paragraph> <Paragraph position="3"> The use of pronouns is relatively rare, only 16%.</Paragraph> <Paragraph position="4"> The limited use of pronouns is not something found exclusively in our corpus. 
Similar results were found by Guindon et al. (1987), where only 3% of the utterances contained any pronouns.</Paragraph> <Paragraph position="5"> While being too small an empirical base for any conclusive results, this does suggest that the use of pronouns is rare in typed man-computer dialogues in natural language. Some suggestions why this should be the case can be found in a study by Bosch (1988) on the use of pronouns in spoken dialogues. He argues for a division of the focus structure into two parts, explicit and implicit, and claims that &quot;explicit focus is typically, though not exclusively, accessed by means of unmarked referential expressions (typically de-accented anaphoric pronouns), while implicit focus is accessed only by marked devices, including accented pronouns&quot; (Bosch, 1988, p. 207). What is interesting with this analysis in the present context is that a para-linguistic cue (accent) is used to signal how the pronoun should be interpreted. Since this communicative device is absent in written dialogues, this could explain why the subjects refrain from using pronouns. We believe this to be an expression of a general principle for the use of pronouns. Since a pronoun underspecifies the referent compared to a definite description, there is every reason to believe that language users following Grice's (1975) cooperative principle should only use them when the listener/reader can effortlessly identify the intended referent. This is supported by data from Fraurud (1988), who analyzed the use of pronouns in three different types of unrestricted written Swedish text. She showed that for 91% of the 457 singular pronouns a very simple algorithm using only syntactical information could correctly identify the antecedent, which in 97.4% of the cases was found in the same or preceding sentence. 
Similar results have also been obtained by Hobbs (1978).</Paragraph> <Paragraph position="6"> We obtained results similar to those of Fraurud (1988) as regards the distance between the pronoun and its antecedent. All our antecedents were found in the immediate linguistic context, except for one problematic category, the pronoun man (one/you), which was excluded in her study and which often refers to some global context, e.g. C line:5:10 Does one read mechanics [Läser man mekanik].</Paragraph> <Paragraph position="7"> We will by no means conclude from this that it is a simple task to develop a computational discourse representation for handling pronouns. As pointed out by Shuster (1988), it is often unclear whether a pronoun refers to the whole or parts of a previously mentioned event or action. While this underspecification in most cases seems to present no problems for human dialogue participants, it certainly makes the computational management of such utterances a non-trivial task.</Paragraph> <Paragraph position="8"> Task structure The results concerning task structure are interesting. It is perhaps not too surprising that the task structure in a data base application is simple. Here one task is introduced, treated, finished, and dropped; and then another is introduced. A basically similar pattern is found in the advisory systems.</Paragraph> <Paragraph position="9"> The advisory-and-order systems, however, show a completely different picture. These systems are in an important sense more complicated, since two different types of actions can be performed: obtaining information or advice, and ordering. The collected dialogues show that these two tasks are executed in parallel, or rather that they are intertwined. The consequence is that we have two active tasks at the same time. 
For instance, in the HIFI simulations the interlocutors shift rapidly between discussing the ordered equipment, its total price, etc., and discussing technical information about available equipment.</Paragraph> <Paragraph position="10"> 7% of the initiatives are task shifts in this sense. The problem is that, while it presents no difficulty for the human reader to follow these task shifts, it is difficult to find any surface cues indicating them. The computational mechanisms for handling this type of dialogue will therefore presumably be more complex than for the other applications that we have studied. In our opinion this confirms Grosz' (1977) observation that there are different types of dialogues with different task structures. It also indicates that categories such as data base and expert systems are not always the most relevant when discussing application areas for NL-techniques.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> System initiatives </SectionTitle> <Paragraph position="0"> The system's linguistic behaviour seems to influence the language used by the user in an important sense. The utterance type Resp/Init reflects how often the system not only responds to an initiative, but also initiates a new information request. This is used more frequently in three simulations. This ought to result in the number of Context Dependent initiatives being lower than in the other dialogues, because the user has here already provided all the information needed. This hypothesis is corroborated in two of the three simulations (PUB and Wines). They have 17% and 29% context dependent initiatives, respectively, compared to the average of 42%. (We do not tag whether a response is context dependent or not.) 
The result is interesting, because it indicates that this is a way of 'forcing' the user to use a language which is computationally simpler to handle, without decreasing the habitability of the system, as measured in the post-experimental interviews.</Paragraph> <Paragraph position="1"> As mentioned above, this pattern is not found in the third system, the travel advisory system.</Paragraph> <Paragraph position="2"> This system belongs to the advisory-and-order class. We cannot at present explain this difference, but would still claim that the result obtained is interesting enough to deserve a thorough follow-up, since databases and advisory systems are presently the largest potential application areas for NLIs.</Paragraph> <Paragraph position="3"> Indirect speech acts Indirect speech acts (Searle, 1975) have been one of the active areas of research in computational linguistics. It can perhaps be of interest to note that there are only five indirect speech acts in our corpus, all of which use standardized expressions (Can you tell me ...? etc). Beun and Bunt (1987) found a higher frequency of indirect requests in their corpus of terminal dialogues (15%). However, this frequency was considerably lower than in their control condition of telephone dialogues (42%). Taken together, these results seem to support our belief that some of the reasons for using indirect means of expression do not exist in man-computer dialogues in natural language (cf. Dahlbäck and Jönsson, 1986).</Paragraph> <Paragraph position="4"> The lack of variation in the expression of indirect speech acts is perhaps not all that surprising when viewed in the light of psychological research on their use. 
Clark (1979) expanded Searle's (1975) analysis by distinguishing between convention of means and convention of forms for indirect speech acts; the former covers Searle's analysis in terms of felicity conditions and reasons for performing an action, the latter the fact that Can you open the window? is a conventional form for making an indirect request, whereas Is it possible for you to open the window? is not. Gibbs (1981, 1985) then demonstrated that what counts as a conventional form is dependent on the situational context in which it occurs. There are therefore, in our opinion, good reasons to believe that indirect speech acts can be handled by computational methods simpler than those developed by Perrault and co-workers, something which in fact seems compatible with the discussion in Perrault and Allen (1980). In conclusion, we believe that indirect speech acts are not as frequent in man-computer dialogues as in human dialogues, and that most of them use a small number of conventional forms, which suggests that computationally tractable and cost-effective means of handling them can be found. Task and dialogue structure When developing NL-technology, it is important to try to assess the applicability domain of a system. As mentioned above, the major dividing line between different classes of systems in our corpus seems not to be between database and expert (advisory) systems. But there are important differences between these and the third class used in this study, the advisory-and-order systems. In these cases more than one task can be performed, asking for information and giving an order. This means not only that the discourse representation needs to be more complicated, which in turn causes problems when trying to find the referent of referring expressions, but that it becomes necessary to understand the illocutionary force of the utterance. 
As was shown in the Planes system (Waltz 1978), when all the user can do with the system is to request information, all input can be treated as questions, thus simplifying the analysis of the input considerably. But this is of course not possible in these cases. The problem this causes becomes especially clear in dialogues where the user follows Grice's quantitative maxim as much as possible, something which occurs in some of our HiFi dialogues, where one- or two-word utterances are very common. From a communicative point of view this is a very natural strategy: if one is engaged in an information seeking dialogue sequence requesting information about the price of different tuners, there is no need to say anything more than the name of one of them, i.e. specify the referent, but taking the illocutionary force and the predicate to be given.</Paragraph> <Paragraph position="5"> And when one is satisfied with the information, and wants to order the last one, why say something more than order, i.e. only specify the illocutionary force? What makes this problematic is of course that in some cases what is ordered is not only the last mentioned item, but a number of them, namely the set defined by the last mentioned tuner, amplifier, turn-table and loudspeakers. But realizing this requires knowledge of what constitutes a HiFi set.</Paragraph> <Paragraph position="6"> Without pursuing the examples further, we wish to make two comments on this. The first is that delimiting the classes or subsets for which NL-technology with different capabilities is suitable seems to depend more on the task situation than on the computer technology of the background system. 
The second is that since the communicative behaviour described in the previous section can be seen to be in accordance with established theories of dialogue communication, and since it, in spite of the terseness of the utterances, seems to present no problems to the human dialogue participants, it seems somewhat strange to classify such utterances as ill-formed or in other ways deviant, something which is not uncommon. Chapanis (1981, p 106) claims that &quot;natural human communication is extremely unruly and often seems to follow few grammatical, syntactic and semantic rules&quot;. And Hauptman and Rudnicky (1987, p 21) take this to be supported by Grosz (1977) &quot;whose protocols show incomplete sentences, ungrammatical style, ellipsis, fragments and clarifying subdialogues&quot;. Perhaps these examples demonstrate an extreme form of the written language bias, but in our opinion any analysis showing that a large part of a communicative event breaks the rules of communication should lead to a questioning of the validity of the formulated rules. Perhaps present day analysis of the structure of language in dialogues (including our own) is too much influenced by the traditional linguistic analysis of isolated utterances, and a shift of perspective is required for a breakthrough in this area.</Paragraph> <Paragraph position="7"> A FINAL REMARK As can be seen in the tables in the appendix, there are differences between the different background systems; for instance, the use of pronouns in the PUB dialogues is as frequent as the use of ellipsis, while the Wines dialogues have no pronouns. There are also differences between different users, ranging from very condensed one-word phrases to small essays on two to three lines. This indicates that when designing an NLI for a specific application it is important to run simulations, preferably with the real end users (cf. Kelly 1983 and Good et al. 1984). 
We intend to proceed in that direction and develop a method for design and customization of NLIs based on Wizard of Oz experiments.</Paragraph> </Section> </Section> </Paper>