<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0104">
  <Title>O @ O O O O O O O O O O O @ O O O 0 O O O O O O O @ O 0 O @ O O O O @ O O O O O @ O @ O</Title>
  <Section position="4" start_page="33" end_page="35" type="metho">
    <SectionTitle>
3 Lexical Cohesion
</SectionTitle>
    <Paragraph position="0"> The heuristics encoded in COCKTAIL make light use of textual cohesion, i.e. the property of texts to "stick together" [3] by using related words.</Paragraph>
    <Paragraph position="1"> Both pronominal and nominal coreference resolution heuristics use cohesion cues indicated by term repetition, while nominal coreference also relies on semantic relations between anaphors and their antecedents.</Paragraph>
    <Paragraph position="2"> In addition, coreference chains are a form of textual cohesion, known as referential cohesion (cf. (Halliday and Hasan, 1976)).</Paragraph>
    <Paragraph position="3"> Until now, lexical cohesion, arising from semantic connections between words, was successfully used as the only form of textual cohesive structure, known as lexical chains. At present there are three methods of generating lexical chains. The first one, implemented in the TextTiling algorithm (Hearst, 1997), counts the frequencies of term repetitions and is an ideal, lightweight tool for segmenting texts. The second method adds knowledge from semantic dictionaries (e.g. Roget's Thesaurus in the work of (Morris and Hirst, 1991) or WordNet in the methods presented in (Barzilay and Elhadad, 1997), (Hirst and St-Onge, 1998)). Besides term repetition, this approach recognizes relations between text words that are connected in the dictionaries with predefined patterns. This method was applied to the generation of text summaries, the recognition of the intentional structure of texts and the detection of malapropisms. The third method is based on a path-finding algorithm detailed in (Harabagiu and Moldovan, 1998). This method creates a richer structure, useful for the abduction of coherence relations from the knowledge encoded in WordNet. [3] Definition introduced in (Halliday and Hasan, 1976) and (Morris and Hirst, 1991).</Paragraph>
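    <Paragraph> To make the first method concrete, the following is a minimal sketch of term-repetition counting only. It is not the TextTiling algorithm itself, which goes on to compare adjacent blocks of text in order to place segment boundaries. The function name repeated_terms, the regular-expression tokenizer and the stopwords parameter are assumptions made for illustration; none of them comes from the cited work.

```python
import re
from collections import Counter


def repeated_terms(text, stopwords=frozenset()):
    """Return every non-stopword term that occurs at least twice, with its count."""
    # Lowercase the text and keep alphabetic runs as tokens (a deliberately crude tokenizer).
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stopwords]
    counts = Counter(tokens)
    # Only terms seen two or more times count as repetitions.
    return {term: n for term, n in counts.items() if n >= 2}


if __name__ == "__main__":
    print(repeated_terms("the cat saw the other cat", stopwords={"the"}))
```

    The sketch does no stemming, so inflected forms such as "named" and "name" would be counted as different terms unless the input is normalized first.</Paragraph>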
    <Paragraph position="4"> Here we describe a new cohesion structure that (a) incorporates both lexical and referential cohesion and (b) produces a unique chain that contains not only single words, but also textual entities encompassing head-adjunct lists. We use the finite-state parses of FASTUS (Appelt et al., 1993) for recognizing these entities, but the method extends to any basic phrasal parser [4].</Paragraph>
    <Paragraph position="5"> We produce this novel cohesive structure to exploit the close relation between text cohesion and coherence. It is known (cf. (Harabagiu, 1999)) that cohesion, as a surface indicator of text coherence, can indicate the lexico-semantic knowledge upon which coherence is inferred. Our aim is to use this cohesive chain for producing axiomatic knowledge for CICERO, a TACITUS-like system that abducts coherence relations. TACITUS (Hobbs et al., 1993) is a successful abductive system when provided with extensive pragmatic and linguistic knowledge. CICERO is designed as a lightweight version of TACITUS that performs reliable abductions with minimal knowledge and effective searches. Translating all the lexical, morphological, syntactic and semantic ambiguities from texts would make the search intractable. Our solution for CICERO is to use a cohesive chain to create manageable knowledge upon which the abduction can be performed. Section 4 describes this knowledge and the operation of CICERO.</Paragraph>
    <Paragraph position="6"> Our cohesive chain is a linked structure consisting of three parts: (1) the connected text entity, (2) its incoming and outgoing pointers and (3) a lexico-semantic graph, containing paths of WordNet concepts and relations. The lexico-semantic structure is later translated into the axiomatic knowledge that supports coherence inference. To exemplify the cohesion chain, we use the following text, spanned by the coreference chains produced with COCKTAIL: [Toys R Us]1 named Michael Goldstein [chief executive officer]2, ending years of speculation about who will succeed [Charles Lazarus]3, [the [toy retailer]1's founder and chief architect]3. [Robert Nakasone]4, [former vice chairman and widely regarded as the other serious contender for [the top executive]2's job]4, was named president and chief operating officer, both new positions.</Paragraph>
    <Paragraph position="7"> The indexes indicate the four coreference chains.</Paragraph>
    <Paragraph position="8"> This text has only two repeating terms, the verb name and the noun executive, thus it generates little information with the TextTiling algorithm. The cohesion method detailed in (Barzilay and Elhadad, 1997) can detect one lexical chain: [chief executive officer, chairman, executive, president]. We would like to obtain richer lexico-semantic information, thus we build a cohesion chain that contains larger textual entities. To recognize the entities, we use the coreference chains and the following parse, produced by FASTUS: [...] #&lt;PHRASE(...): "president and chief operating officer, both new positions"&gt; [4] Such a parser operates on part-of-speech tagged text, with several noun and verb grouping rules.</Paragraph>
    <Paragraph position="9"> Textual entities are either basic phrases contained in the coreference chains or lists of phrases collected from the parse, by scanning for all NGs or NAME phrases directly connected to a verb phrase through Subject, Object or prepositional relations. For example, as the phrase "Toys R Us" is the antecedent from a coreference chain, its corresponding textual entity is: [Figure 1] The derivation of the lexico-semantic structure (LSS) follows the steps:
      1. for every relation r(w1, w2) from a TE: if (there is s1 a sense of w1 and s2 a sense of w2 such that the same relation r'(w3, w4) is found in a gloss of the hierarchies of w1^s1 or w2^s2) Add relation r' to LSS
      2. for every word w in a TE: if (there is a concept C in LSS such that there is a collocation [w C] in a gloss from the hierarchy(w)) Add w to LSS
      3. if (word w is already in LSS) Add new connection to w in LSS
      For example, in the first TE illustrated in Figure 1, we have the relation Object(name, CEO). We find an Object relation also in the gloss of appoint, the hypernym of sense 3 of verb name. The new Object relation connects verb assume with the synset {duty, responsibility, obligation}. A hypernym of CEO is manager, collocating with position in the gloss of managership. Noun position belongs to the hierarchy of duty, thus the new Object relation can be added to the LSS.</Paragraph>
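    <Paragraph> The three-part link of the cohesive chain (text entity, incoming and outgoing pointers, lexico-semantic structure) can be pictured with a small data structure. What follows is only a sketch under stated assumptions: the class ChainLink, its field names, the helper connect and the choice of storing the LSS as a mapping from a concept to its (relation, concept) connections are all hypothetical, and no WordNet lookup is performed. The two Object relations in the usage lines are the ones named in the example above; the entity strings are coreferent phrases from the sample text, not the full textual entities of Figure 1.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class ChainLink:
    """One link of the cohesive chain (hypothetical layout)."""
    entity: str                                    # (1) the connected text entity
    incoming: list = field(default_factory=list)   # (2) links that point to this one
    outgoing: list = field(default_factory=list)   # (2) links this one points to
    lss: dict = field(default_factory=dict)        # (3) concept -> set of (relation, concept)

    def add_relation(self, relation, source, target):
        """Record relation(source, target) in the LSS.

        A concept that is already present is not duplicated; it only gains
        a new connection, in the spirit of step 3 of the derivation.
        """
        self.lss.setdefault(source, set()).add((relation, target))
        self.lss.setdefault(target, set())


def connect(earlier, later):
    """Add an outgoing pointer on `earlier` and the matching incoming pointer on `later`."""
    earlier.outgoing.append(later)
    later.incoming.append(earlier)


if __name__ == "__main__":
    antecedent = ChainLink("Toys R Us")
    anaphor = ChainLink("the toy retailer")
    connect(antecedent, anaphor)
    antecedent.add_relation("Object", "name", "CEO")      # relation found in the TE
    antecedent.add_relation("Object", "assume", "duty")   # relation recovered from a gloss
    print(sorted(antecedent.lss))
```

    Keeping the pointers and the LSS on the same object mirrors the description of a single chain that carries both referential and lexical information, but other layouts (for example one shared graph for the whole chain) would serve equally well.</Paragraph>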
    <Paragraph position="11"/>
  </Section>
  <Section position="5" start_page="35" end_page="36" type="metho">
    <SectionTitle>
4 Text Coherence
</SectionTitle>
    <Paragraph position="0"> We base our consideration of textual coherence on the definitions introduced in (Hobbs, 1985). The formal definition of relations that capture the coherence between textual assertions is based on the relations between the states they infer, their changes and their logical connections. States, changes and logical connections can be retrieved from pragmatic knowledge, accessible in lexical knowledge bases like WordNet. The complex structure of our cohesion chains helps guide these inferences. For each textual unit, defined from the parse of the text, axiomatic knowledge is produced. The acquisition of axiomatic knowledge is cued by the concepts and relations from the LSS portion of the cohesion chain, and is mined from WordNet. CICERO, our system, adds to this knowledge axioms that feature the characteristics of every coherence relation. CICERO's job is to abduct the coherence structure of a text.</Paragraph>
    <Paragraph position="1"> To do so, it follows the steps:
      1. for every textual unit TUi
      2. Derive pragmatic knowledge for TUi
      3. for every pair (TUi, TUj), i ≠ j
      4. for every coherence relation Rk
      5. hypothesize Rk(TUi, TUj)
      6. Perform abduction Rk(TUi, TUj)
      7. Choose cheapest abduction
      For the text illustrated in Section 3, this procedure generates the coherence graph illustrated in [figure not recovered]. We exemplify the operation of CICERO on this text by presenting the way it derives the Elaboration relation between the textual unit from the first sentence that announces the nomination of Michael Goldstein (TUa) and the textual unit from the same sentence that deals with the succession of Charles Lazarus (TUb). First, CICERO generates the knowledge upon which the abductions can be performed. This knowledge is represented in axiomatic form, using the notation proposed in (Hobbs et al., 1993) and previously implemented in TACITUS. In this formalism each text unit represents an event or a state, and thus has a special variable e associated with it. Events are lexicalized by verbs, which are mapped into predicates verb(e,x,y), where x represents the subject of the event and y represents its object (in the case of intransitive verbs, y is not attached to a predicate, whereas in the case of bitransitive verbs, y is mapped into y1 and y2). Moreover, predicates from the text are related to other predicates, derived from a knowledge base. These relations are captured in first order predicate calculus. For example, the pragmatic knowledge used for the derivation of the Elaboration relation between TUa and TUb is: TUa:</Paragraph>
    <Paragraph position="3"> In the next step, all coherence relations are hypothesized, and the cost of their abduction is obtained. The appendix lists the LISP function created on the fly by CICERO that produces the abduction of the Elaboration relation. Because of the computational expense, an intermediary step simplifies the axiomatic knowledge. The appendix also lists the full abduction and its cost. CICERO is a system still under development, and we have not yet evaluated the precision of its results.</Paragraph>
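    <Paragraph> The control flow of the seven-step procedure in this section can be sketched as a pair of nested loops. This is an illustration, not CICERO's implementation (which the text says generates LISP functions on the fly): the function choose_relations and its two callback parameters derive_knowledge and abduction_cost are hypothetical stand-ins for the knowledge derivation and the abductive prover, and the sketch assumes that the cheapest abduction is chosen separately for each ordered pair of textual units, which the step listing does not state explicitly.

```python
from itertools import permutations


def choose_relations(units, relations, derive_knowledge, abduction_cost):
    """Return, for each ordered pair of units, the relation whose abduction is cheapest."""
    # Steps 1-2: derive pragmatic knowledge once per textual unit.
    knowledge = {unit: derive_knowledge(unit) for unit in units}
    chosen = {}
    # Step 3: every ordered pair (TUi, TUj) with i != j.
    for tu_i, tu_j in permutations(units, 2):
        # Steps 4-6: hypothesize each relation and obtain the cost of its abduction.
        costs = {
            rel: abduction_cost(rel, knowledge[tu_i], knowledge[tu_j])
            for rel in relations
        }
        # Step 7: keep the cheapest hypothesis (ties go to the first relation listed).
        best = min(costs, key=costs.get)
        chosen[(tu_i, tu_j)] = (best, costs[best])
    return chosen


if __name__ == "__main__":
    # Invented costs, used only to exercise the loop; they are not results from the paper.
    toy_costs = {("Elaboration", "TUa", "TUb"): 2.0}
    result = choose_relations(
        ["TUa", "TUb"],
        ["Elaboration", "OtherRelation"],
        derive_knowledge=lambda unit: unit,
        abduction_cost=lambda rel, a, b: toy_costs.get((rel, a, b), 10.0),
    )
    print(result[("TUa", "TUb")])
```

    Because every relation is tried for every ordered pair, the number of abductions grows with the square of the number of textual units times the number of relations, which is consistent with the text's remark that an intermediary simplification step is needed to contain the computational expense.</Paragraph>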
  </Section>
</Paper>