<?xml version="1.0" standalone="yes"?> <Paper uid="C94-1072"> <Title>DISCO -- An HPSG-based NLP System and its Application for Appointment Scheduling -- Project Note --</Title> <Section position="4" start_page="436" end_page="437" type="metho"> <SectionTitle> 3 Linguistic Resources </SectionTitle> <Paragraph position="0"> The core of the linguistic resources consists of a two-level morphology with feature constraints, an HPSG-oriented grammar of German with integrated syntax and semantics, and a module for surface speech act recognition, all implemented in TDL.</Paragraph> <Paragraph position="1"> Morphology The component X2MorF, analyzing and generating word forms, is based on a two-level morphology which is extended by a word-formation grammar (described in TDL) for handling the concatenative parts of morphosyntax [15].</Paragraph> <Paragraph position="2"> Grammar The style of the grammar closely follows the spirit of HPSG, but also incorporates insights from other grammar frameworks (e.g. categorial grammar) and further extensions to the theory [12].</Paragraph> <Paragraph position="3"> The grammar distinguishes various types of linguistic objects, such as lexical entries, phrase structure schemata, lexical rules, multi-word lexemes etc., all of which are specified as typed feature structures. Lexical rules are defined as unary rules and applied at runtime. Multi-word lexemes are complex lexemes with a non-compositional semantics, such as fixed idiomatic expressions. HPSG principles and constraints are represented by inheritance links in the type lattice. 
The grammar covers a fair number of the standard constructions of German, and exhibits a more detailed coverage in some specific application-oriented areas.</Paragraph> <Paragraph position="4"> Semantics Feature structure descriptions of the semantic contribution of linguistic items are represented in TDL and are fully integrated into the grammar. Additionally, the TDL type system is used to encode and check sortal constraints as they occur in selectional restrictions. For further processing such as scope normalization and anaphora resolution, inferences and application-dependent interpretation, the (initial) TDL semantic descriptions are translated into NLL formulae.</Paragraph> <Paragraph position="5"> Speech Act Recognition and Dialogue The grammar provides a typed interface to a speech act recognition module based on HPSG representations of utterances. The assignments of illocutionary force take into account syntactic features, a marking of performative verbs and assignments of fixed illocutionary force to relevant idiomatic expressions.</Paragraph> <Paragraph position="6"> Recently, inference-based dialogue facilities using a quasi-modal logic for multiagent belief and goal attribution [5] have been added to the system. Incoming surface speech act structures are subjected to anaphora and reference resolution, translated into a frame-based action representation, and disambiguated using inferential context. The effects, including communicated beliefs and goals, of the first acceptable speech act interpretation are then asserted.</Paragraph> </Section> <Section position="5" start_page="437" end_page="437" type="metho"> <SectionTitle> 4 Processing components </SectionTitle> <Paragraph position="0"> Parser and generator provide the basic processing functionality needed for grammar development and sample applications. 
In addition to the separate modules for parsing and generation, we also experiment with a uniform reversible processing module based on generalized Earley deduction.</Paragraph> <Paragraph position="1"> Parser The parser is a bidirectional bottom-up chart parser which operates on a context-free backbone implicitly contained in the grammar [6]. The parser can be parameterized according to various processing strategies (e.g. breadth first, preference of certain rules etc.). Moreover, it is possible to specify the processing order for the daughters of individual rules.</Paragraph> <Paragraph position="2"> An elaborate statistics component supports the grammar developer in tuning these control strategies.</Paragraph> <Paragraph position="3"> In addition, the parser provides the facility to filter out useless tasks, i.e. tasks where a rule application can be predicted to fail by a cheaper mechanism than unification. There is a facility to precompute a filter automatically by determining the possible and impossible combinations of rules; some additional filtering information is hand-coded.</Paragraph> <Paragraph position="4"> The parser is implemented in an object-oriented manner to allow for different parser classes using different constraint solving mechanisms or different parser instances using different parsing strategies in the same system. With differing parameter settings, instances of the parser module are used in the X2MorF and surface speech act recognition modules as well.</Paragraph> <Paragraph position="5"> Generator Surface generation in DISCO is performed with the SeReal (Sentence Realization) system [4], which is based on the semantic-head-driven algorithm by Shieber et al. SeReal takes a TDL semantic sentence representation as its input and can deliver all derivations for the input admitted by the grammar. 
Efficient lexical access is achieved by having the lexicon indexed according to semantic predicates.</Paragraph> <Paragraph position="6"> Each index is associated with a small set of lemmata containing the semantic predicate. Using the same indexing scheme at run-time for lexical access allows us to restrict unification tests to a few lexical items.</Paragraph> <Paragraph position="7"> Subsumption-based methods for lexical access were considered too expensive for dealing with distributed disjunctions. The grammar used for generation is the same as the one used for parsing except for some compilation steps performed by SeReal that, among other things, introduce suitable information wherever 'semantically empty' items are referred to. Rule application is restricted by rule accessibility tables which are computed off-line.</Paragraph> </Section> <Section position="6" start_page="437" end_page="437" type="metho"> <SectionTitle> 5 Performance Modelling </SectionTitle> <Paragraph position="0"> In our search for methods that get us from the transparent and extensible competence grammar to efficient and robust performance systems we have been following several leads in parallel. We assume that methods for compilation, control and learning need to be investigated. The best combination of these methods will depend on the specific application. In the following, some initial results of our efforts are summarized. Acquisition of Sublanguages by EBL It is a matter of common experience that different domains make different demands on the grammar. This observation has given rise to the notion of sublanguage; efficient processing is achieved by the exploitation of restricted language use in well specified domains.</Paragraph> <Paragraph position="1"> In the DISCO system we have integrated such an approach based on Explanation-Based Learning (EBL) [14]. 
The idea is to automatically generalize the derivations of training instances created by normal parsing and to use these generalized derivations (called templates) in the run-time mode of the system. If a template can be instantiated for a new input, no further grammatical analysis is necessary. The approach is not restricted to the sentential level but can also be applied to arbitrary subsentential phrases, allowing it to interleave with normal processing.</Paragraph> <Paragraph position="2"> Intelligent Backtracking in Processing Disjunctions In [16] a method is outlined for controlling the order in which conjuncts and disjuncts are to be processed. The ordering of disjuncts is useful when the system is supposed to find only the best result(s), which is the case for any reasonably practical NL application. An extension of UDiNe has been implemented that exploits distributed disjunctions for preference-based backtracking.</Paragraph> <Paragraph position="3"> Compilation of HPSG into Lexicalized TAG [7] describes an approach for compiling HPSG into lexicalized feature-based TAG. Besides our hope to achieve more efficient processing, we want to gain a better understanding of the correlation between HPSG and TAG. The compilation algorithm has been implemented and covers almost all constructions contained in our HPSG grammar.</Paragraph> </Section> <Section position="7" start_page="437" end_page="438" type="metho"> <SectionTitle> 6 Environment </SectionTitle> <Paragraph position="0"> The DISCO DEVELOPMENT SHELL serves as the basic architectural platform for the integration of natural language components in the DISCO core system, as well as for the COSMA application system [13]. 
Following an object-oriented architectural model, we adopted a two-step approach: in the first step the architecture is developed independently of the specific components to be used and of a particular flow of control.</Paragraph> <Paragraph position="1"> In the second step the resulting 'frame system' is instantiated by the integration of existing components and by defining the particular flow of control between these components. Using an object-oriented design together with multiple inheritance has proven fruitful for the system's modifiability, extensibility and incremental usability.</Paragraph> <Paragraph position="2"> Several editing and visualization tools greatly facilitate the work of the grammar developer. The most prominent of them, FEGRAMED, provides the user with a fully interactive feature editor and viewer. There are many possibilities to customize the view onto a feature structure, such as hiding certain features or parts of a structure, specifying the feature order and many more. The large feature structures that arise in constraint-based formalisms make such a tool indispensable for grammar debugging. The main goals in the development of FEGRAMED were high portability and interfacing to different systems. Written in ANSI-C, it exists in Macintosh and OSF/Motif versions and is already used at several external sites.</Paragraph> <Paragraph position="3"> There exists a graphical chart display with mouse-sensitive chart nodes and edges directly linked to the feature viewer, thus making debugging much simpler.</Paragraph> <Paragraph position="4"> It also provides a view of the running parser and enables you to inspect the effects of the chosen parsing strategy visually. A browser for the TDL type system permits navigation through a type lattice and is coupled with the feature editor. 
There are other tools as well, e.g., a TDL2LaTeX utility, an EMACS TDL mode, global switches which affect the behaviour of the whole system etc.</Paragraph> <Paragraph position="5"> The diagnostics tool (DiTo) [11] containing close to 1500 annotated diagnostic sentences of German facilitates consistency maintenance and measuring of competence. The tool has been ported to several sites that participate in extending the test-sentence database.</Paragraph> <Paragraph position="6"> 7 Putting it to the Test Cooperative Schedule Management In building the COSMA prototype the DISCO core system has been successfully integrated into an application domain with both scientific interest and practical plausibility, viz. multi-agent appointment scheduling (see Figure 1). Understanding and sending messages in natural language is crucial for this application since it cannot be expected that all participants will have a COSMA system. The use of natural language also makes it easier for the owner of the system to monitor the progress of an appointment scheduling process. Each COSMA instance functions as a personal secretarial assistant providing the following services: (i) storage and organization of a personal appointment date-book; (ii) graphical display and manipulation of appointment data; and (iii) natural language understanding and generation in communication with other agents via electronic mail. 
The current scheduling functionality includes the arrangement of multiparticipant meetings (possibly with vague or underspecified details) as well as the modification and cancellation of appointments that are under arrangement or have already been committed to.</Paragraph> <Paragraph position="7"> Accordingly, the current COSMA architecture has three major components: a prototype appointment planner (developed by the DFKI project AKA-MOD) that keeps the calendar database, provides temporal resolution and drives the communication with other agents; a graphical user interface (developed inside the DISCO project) monitoring the calendar state and</Paragraph> </Section> <Section position="8" start_page="438" end_page="438" type="metho"> <SectionTitle> APPOINTMENT PLANNER LEVEL </SectionTitle> <Paragraph position="0"> plication to the COSMA scenario. The entire COSMA prototype has been built on top of the DISCO DEVELOPMENT SHELL as a monotonic extension to the core system.</Paragraph> <Paragraph position="1"> supporting the mouse- and menu-driven arrangement of new appointments and, finally, the DISCO core system (enriched with a set of application-specific modules) that provides the natural language and linguistic dialogue capabilities.</Paragraph> <Paragraph position="2"> Interface to the Core Engine The communication between the DISCO system and the appointment planner is modelled in a restricted appointment task interface language and roughly meets the internal representation of the appointment planner. To connect the two components, DISCO is enriched with a dedicated interface module that translates between the DISCO internal semantics representation language NLL and the appointment planner representation. 
The translation process (maintaining the substantial difference in expressive power between NLL and the restricted planner language) builds on ideas from current compiler technology with a limited set of domain- and application-specific inference rules [10].</Paragraph> <Paragraph position="3"> On its opposite end DISCO is hooked up to plain electronic mail facilities through a general-purpose e-mail interface that allows it to receive and send e-mail (and, in case of processing failures, to 'respool' messages to the user mailbox).</Paragraph> </Section> </Paper>