File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/92/c92-1008_intro.xml
Size: 7,126 bytes
Last Modified: 2025-10-06 14:05:11
<?xml version="1.0" standalone="yes"?> <Paper uid="C92-1008"> <Title>ON TEXT COHERENCE PARSING Udo Hahn Albert-Ludwigs-Universität Freiburg Linguistische Informatik / Computerlinguistik</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 INTRODUCTION </SectionTitle> <Paragraph position="0"> In recent years it has become increasingly apparent that dialog and text understanding systems must account for connectivity relations that extend over sentence boundaries. This has led to a large body of work dealing with various forms of cohesion-preserving language mechanisms, mainly in the field of anaphora, which contribute to connectivity among sentences.</Paragraph> <Paragraph position="1"> From the focus on these linguistic phenomena one might obtain a misleading picture of textual connectivity, viz. one that considers it basically as a 'flat', continuous stream of formally connected utterances lacking additional structure. Far less research has been devoted to the internal organization of cohesive utterances by mechanisms at a more global level of dialog/text architecture, the level of text coherence.</Paragraph> <Paragraph position="2"> Major computational approaches related to coherence</Paragraph> <Paragraph position="3"> aspects within a dialog processing framework are due to Reichman's [1978], McKeown's [1985] and Scha & Polanyi's [1988] formalizations of dialog grammars. Coherence criteria of written texts have been investigated in the context of 'Rhetorical Structure Theory' [Mann & Thompson 1988] and related extensions [e.g., Alterman 1982, Tucker, Nirenburg & Raskin 1986] of the original theory of coherence relations in discourse [Hobbs 1982]. 
A second major methodology which deals with the global structuring of written texts is the model of text macro propositions and superstructures [Kintsch & van Dijk 1978, van Dijk 1980], the latter sharing all relevant properties one generally attributes to story grammars [Rumelhart 1975]. The problem with this kind of methodology is that, unlike the coherence relation approach, the grammars which have been proposed so far are fairly idiosyncratic for each application domain (narratives, weather reports, etc.). Common to all these approaches is the requirement of a deep, propositionally guided understanding of the underlying discourse; in particular, a complete theory of its domain and an exhaustive specification of a natural language grammar must be supplied in order to guarantee proper operation of implemented systems. This might explain why, with only few exceptions, these models of text coherence have resisted further computational treatment as evidenced by operational systems. [Actes de COLING-92, Nantes, 23-28 août 1992]</Paragraph> <Paragraph position="4"> We here make an alternative and computationally more tractable proposal on how to deal with global text structures at the text coherence level. Its roots can be traced back to the seminal work of F. Daneš [1974], in which he informally developed the notion of thematic progression patterns, distinguishing between three prototypical patterns, viz. constant theme, continuous thematization of rhemes, and derived theme (see section 3). The model outlined in this paper starts from a thorough formalization of (one of) these notions and places it into the environment of a fully operational text parsing system whose design is mainly oriented towards the proper recognition of text cohesion and coherence phenomena. 
Pertinent reasons for our choice of a pattern-type model of text coherence are: (1) The text parser forms part of the text understanding system TOPIC. It operates in a real-world domain [Reimer & Hahn 1988], i.e. textual input is taken from a permanent stream of test reports in major German information technology magazines. As it seems that it will remain infeasible for a long time to come to provide exhaustive domain and grammar specifications for routinely operating text understanders, a particularly robust partial parsing approach capable of handling potential specification gaps has been adopted. These conditions obviously preclude the consideration of RST-style coherence relation computing as a text coherence analysis strategy, since relevant knowledge portions might be lacking for determining specific instances of coherence relations. Conversely, the coherence relation approach seems currently infeasible for the routine processing of large-scale text collections in real domains.</Paragraph> <Paragraph position="5"> (2) The description of coherence structures in terms of coherence relations or text macro propositions requires the availability of deep assertional knowledge from the application domain (A-box level specifications in Krypton terminology; cf. Brachman et al. [1985]). The TOPIC system, however, emphasizes the role of terminological knowledge of its domain, i.e. the description of prototypical properties and inference rules related to basic conceptual units of the domain (Krypton's T-box level knowledge). As TOPIC is rather weak with respect to full-blown assertional knowledge, coherence relation computing, however valuable it might be, is currently out of reach for this system. 
Fortunately, Daneš-type coherence patterns primarily refer to the level of terminological knowledge.</Paragraph> <Paragraph position="6"> (3) Prototypical patterns of thematic progression are fairly general and independent of the particular domains that expository texts deal with. Linguistic studies have collected empirical evidence for this claim through investigations of texts from diverse domains [Giora 1983a, Kurzon 1984]. This coincides with the generality of use of most coherence relations, but is in sharp contrast to the highly constrained and domain-dependent model of superstructures and story grammars. (4) Major thematic progression patterns are correlated with particular search styles and retrieval modes in full-text information systems. Hence, providing typed coherence operators inherently supports graphics-based user interactions with the TOPIC system in terms of advanced conceptual orientation and navigation tools for semantically guided text graph tours (see section 5.3).</Paragraph> <Paragraph position="7"> (5) The investigation of thematic progression patterns is of value in its own methodological right. They constitute a basic structural model of text macro organization as opposed to model-theoretic and plan/goal-based approaches (a distinction made by Pustejovsky [1987]).</Paragraph> <Paragraph position="8"> As such they might complement current text understanding methodologies whose emphasis, so far, has been on fairly knowledge-expensive assertional models (such as coherence relations and text macro propositions) or stereotyped text-semantical models (such as superstructures and story grammars).</Paragraph> </Section></Paper>