File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/82/c82-2069_concl.xml
Size: 2,466 bytes
Last Modified: 2025-10-06 13:55:58
<?xml version="1.0" standalone="yes"?> <Paper uid="C82-2069"> <Title>TOPIC IDENTIFICATION TECHNIQUES FOR PREDICTIVE LANGUAGE ANALYSERS</Title> <Section position="3" start_page="0" end_page="0" type="concl"> <SectionTitle> 5. Conclusions </SectionTitle> <Paragraph position="0"> The technique described in this paper for the identification of the topic of a text section has a number of advantages over previous schemes. First, its use of information which will probably already be stored in the natural language processing system's lexicon has obvious advantages over schemes which require large, separate data-structures purely for topic identification, as well as for making the predictions associated with a topic. In practice, Scrabble uses a slightly doctored lexicon to improve efficiency, but the necessary work could be done by an automatic preprocessing of the lexicon.</Paragraph> <Paragraph position="1"> Second, the scheme described here can make use of nominals which suggest a candidate topic, and associated stereotypes, without complex manipulation of semantic information which is not useful for this purpose. The scheme of (DeJong 79), for example, would perform complex operations on semantic representations associated with &quot;pick&quot; before it processed the more useful word &quot;tuna&quot; if it processed the above example text.</Paragraph> <Paragraph position="2"> Third, the use of semantic primitive patterns has greater generality than techniques which set up direct links between words and bundles of predictions, as appeared to be done in early versions of the SAM program (Schank 75a).</Paragraph> <Paragraph position="3"> One final point. The technique for topic identification in this paper would not be practical either if it was very expensive to load stereotypes which turn out to be irrelevant, or if the cost of comparing the predictions of such stereotypes with the text representation was high. 
The Scrabble system, running under Cambridge LISP on an IBM 370/165, took 8770 milliseconds to analyse the example text above, of which 756 milliseconds was used by loading and activating the two irrelevant stereotypes and 103 milliseconds was spent comparing their predictions with the CD-representation of the text. The system design is such that these figures would not increase dramatically if more stereotypes were considered whilst processing the example.</Paragraph> </Section> </Paper>