File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/98/p98-2202_intro.xml
Size: 3,183 bytes
Last Modified: 2025-10-06 14:06:39
<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2202"> <Title>DiMLex: A lexicon of discourse markers for text generation and understanding</Title> <Section position="3" start_page="0" end_page="1238" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Assuming that text can be formally described (and represented) by means of discourse relations holding between adjacent portions of text (e.g., [Mann, Thompson 1988]), we use the term discourse marker for those lexical items that (in addition to non-lexical means such as punctuation, aspectual and focus shifts, etc.) can signal the presence of a relation at the linguistic surface. Typically, a discourse relation is associated with a wide range of such markers; consider, for instance, the following variety of CONCESSIONS, which all express the same underlying propositional content. The words treated here as discourse markers are underlined.</Paragraph> <Paragraph position="1"> We were in SoHo; {nevertheless | nonetheless | however | still | yet}, we found a cheap bar.</Paragraph> <Paragraph position="2"> We were in SoHo, but we found a cheap bar anyway.</Paragraph> <Paragraph position="3"> Despite the fact that we were in SoHo, we found a cheap bar.</Paragraph> <Paragraph position="4"> Notwithstanding the fact that we were in SoHo, we found a cheap bar.</Paragraph> <Paragraph position="5"> Although we were in SoHo, we found a cheap bar.</Paragraph> <Paragraph position="6"> If one accepts these sentences as paraphrases, then the various discourse markers all need to be associated with the information that they signal a concessive relationship between the two propositions involved. Next, the fine-grained differences between similar markers need to be represented; one such difference is the degree of specificity: for example, but can mark a general CONTRAST or a more specific CONCESSION.
We believe that a dedicated discourse marker lexicon holding this kind of information can serve as a valuable resource for natural language processing. Our efforts in constructing that lexicon are described in Section 2.</Paragraph> <Paragraph position="7"> From the perspective of text generation, not all paraphrases listed above are equally felicitous in specific contexts. In order to choose the most appropriate variant, a generator needs knowledge about the fine-grained differences between similar markers for the same relation.</Paragraph> <Paragraph position="8"> Furthermore, it needs to account for the interactions between marker choice and other generation decisions and hence needs knowledge about the syntagmatic constraints associated with different markers. We will discuss this perspective in Section 3.</Paragraph> <Paragraph position="9"> From the perspective of text understanding, a sophisticated system should be able to derive the discourse relations holding between adjacent text spans, and also to notice the additional semantic and pragmatic implications stemming from the usage of a particular discourse marker.</Paragraph> <Paragraph position="10"> We will briefly characterize such applications in Section 4.</Paragraph> </Section> </Paper>