File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/97/w97-0502_abstr.xml
Size: 7,623 bytes
Last Modified: 2025-10-06 13:49:02
<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0502"> <Title>Langer S. and Hickey M. in preparation. Using Semantic Lexicons for Intelligent Message Retrieval in a Communication Aid. Submitted to Journal of Natural Language Engineering, special issue on Natural Language Processing for Communication</Title> <Section position="2" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> The aim of the WordKeys project is to enhance a communication aid with techniques based on research in text retrieval, in order to reduce the cognitive load normally associated with retrieving pre-stored messages in augmentative and alternative communication (AAC) systems. This paper highlights the differences between traditional information retrieval and the requirements for text retrieval in a communication aid. We then present the overall design of the retrieval-based communication aid, and describe the morphological analysis module used for indexing and the ranking algorithm in more detail. The system relies on a large lexicon for the automatic indexing of messages and for semantic query expansion. The lexicon is derived from the WordNet database and additionally includes frequency information. Currently, user trials are being carried out to determine the suitability of the approach for AAC.</Paragraph> <Paragraph position="1"> 1 Message retrieval for an AAC system Currently, there exist different types of communication aids for non-speaking people. Among the systems using natural language, we distinguish two different approaches. The communication strategy can be based on enhanced message composition, or the user can rely on a set of pre-stored messages, together with a selection procedure. 
It is the latter type of communication aid that will be discussed further here.</Paragraph> <Paragraph position="2"> A principal deficiency of the current generation of communication aids is the low rate of communication which can be achieved by users. Rates of between 2 and 25 words per minute are typical, which compares poorly to natural speech rates of 150 to 175 words per minute (Foulds, 1980); (Darragh and Witten, 1992). The low communication rate encourages neither the user of an aid to create messages nor a communication partner to maintain attention (Alm et al, 1993). For message selection systems, the low communication rate is partially caused by the fact that many systems rely on retrieval methods that put a high cognitive load on the user. In most systems, the user must remember an access route, or in some cases a code, in order to speak a message. The load placed on the user means that he or she is only able to select from a small number of different things to say.</Paragraph> <Paragraph position="3"> The user input needed to produce an utterance and the cognitive load on the user of a message-based communication aid can both be reduced through efficient message access. A novel way to achieve this is to use full text retrieval to access a message database. In contrast to most existing message-based systems, users of an AAC system based on text retrieval do not have to remember message numbers or any other code in order to select a message. They can select a conversational item from the database by entering one or several key words. Appropriate messages will be those containing these words or words related to the key words (Hickey and Page, 1993); (Hickey, 1995).</Paragraph> <Paragraph position="4"> At first glance, the implementation of a text retrieval system for AAC users might seem straightforward, as retrieval techniques have been investigated for decades. 
However, most algorithms suggested in the literature are designed for collections of larger documents, containing several hundred words.</Paragraph> <Paragraph position="5"> Little research has been dedicated to the investigation of full text retrieval of short messages such as those used in communication aids. Thus, techniques from information retrieval have to be modified considerably to be applicable to the messages communicated by AAC users, which typically contain no more than 20 words. In addition to the difference in length of the messages to be accessed, there is another constraint that affects communication aids to a much higher degree than standard text retrieval systems: the minimal input requirement. In standard text retrieval, queries of 5-10 words are regarded as short queries (Hearst, 1996). This is different for a communication aid. Users of these devices typically have a very low typing rate, and it is desirable that any message from the message database can be retrieved with only one key word, without the need for query refinement.</Paragraph> <Paragraph position="6"> The state of the art and the special requirements named above for a retrieval module in an AAC device suggest the use of enhanced full text retrieval using semantic expansion of queries. A system based on a query expansion technique has the capability of finding messages that contain words that are semantically related to the query words in addition to the messages that contain the query words themselves. Semantic query expansion is especially suited for communication aids, where minimal input and high recall are the key factors. Research in text retrieval suggests that the use of electronic semantic lexicons merits further investigation, both for query expansion and for overcoming problems of word sense ambiguity (Richardson and Smeaton, 1995). 
For short texts in particular, research on image caption retrieval has shown that the recall rate can be considerably higher if suitable methods of calculating semantic distances between query words and message words are used (Smeaton and Quigley, 1996); (Guglielmo and Rowe, 1996).</Paragraph> <Paragraph position="7"> The measurement of semantic distance can be based on semantic relationships between words. The relationship encoded in many dictionaries and thesauri is synonymy, and often some hypernyms are also included. Both kinds of links are relevant for message retrieval. It has been shown that apart from synonyms, which have been used for query expansion for decades, hyponymic links should be considered for text retrieval purposes (Richardson and Smeaton, 1995). The usefulness of hyponymic links has also been evaluated for WordKeys (Langer and Hickey, in preparation). The usefulness of other links, such as meronymy, has yet to be confirmed.</Paragraph> <Paragraph position="8"> For semantic query expansion through semantically related words, a comprehensive electronic dictionary containing extensive semantic information is needed. Research in electronic lexicography has been very intense in recent years, and many large dictionaries are being built for different languages. Few of those dictionaries, however, are publicly available, and few of those available are suitable for retrieval of unrestricted text. The semantic database WordNet (Miller et al, 1990) has already been successfully used for information retrieval purposes (Richardson and Smeaton, 1995); (Smeaton and Quigley, 1996), and has also been a source for the design of another lexical database for AAC systems, which, like the lexicon used for WordKeys, included additional frequency information (Zickus et al, 1995). 
The size and coverage of WordNet led to the decision to base the indexing module and the semantic expansion in the WordKeys system on this lexical database.</Paragraph> </Section></Paper>