<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0402">
  <Title>Mining Discourse Markers for Chinese Textual Summarization</Title>
  <Section position="3" start_page="11" end_page="12" type="intro">
    <SectionTitle>
2 Chinese Discourse Markers
</SectionTitle>
    <Paragraph position="0"> Among all kinds of information that may be found in a piece of discourse, discourse markers (also known as discourse connectives, clue words (Reichman 1978; Siegel et al. 1994), or cue phrases (Grosz et al. 1986; Litman 1996)) are regarded as the major linguistic device available for a writer to structure a discourse. Discourse markers are expressions which signal a sequential relationship between the current basic message and the previous discourse. Schiffrin (1987) is concerned with elements which mark sequentially dependent units of discourse. She examines discourse markers in interview data, looking specifically at their distribution and their particular interpretation(s). She proposes that these markers typically serve three functions: (i) they index adjacent utterances to the speaker, the hearer, or both; (ii) they index adjacent utterances to prior and/or subsequent discourse; (iii) they work as contextual coordinates for utterances by locating them on one or more planes of her discourse model.</Paragraph>
    <Paragraph position="1"> Discourse markers also figure prominently in Chinese, which has a tendency to delay topic introduction (Kaplan 1996; Kirkpatrick 1993).</Paragraph>
    <Paragraph position="2"> Hinds (1982) and Kong (1998) also maintain that the Chinese tendency toward delayed topic introduction is heavily influenced by the qi cheng zhuan he canonical structure (a Chinese rhetorical pattern). In a study examining rhetorical structure in Chinese, Kirkpatrick (1993) found that several major patterns, favored and considered to be good style by native Chinese writers, are hinted at by Chinese discourse markers. Although the effect of discourse markers in other languages might not be very prominent, there is a strong need to study discourse markers in Chinese in order to capture the major associated rhetorical patterns in Chinese texts. While full semantic understanding of Chinese texts is obviously much more difficult to accomplish, using text mining techniques to identify discourse markers and their associated rhetorical structures in a sizeable Chinese corpus will certainly benefit language processing tasks such as summarization and knowledge extraction in Chinese.</Paragraph>
    <Paragraph position="3"> In Chinese, two distinct classes of discourse markers are useful for identification and interpretation of the discourse structure of a Chinese text: primary discourse markers and secondary discourse markers (T'sou et al. 1999).</Paragraph>
    <Paragraph position="4"> Discourse markers can be either words or phrases.</Paragraph>
    <Paragraph position="5"> Table 1 provides a sample listing of various</Paragraph>
    <Paragraph position="7"> rhetorical relations and examples considered in this research.</Paragraph>
    <Paragraph position="8">
ruguo 'if', name 'then'
zhiyou 'only if', cai 'only then'
yinwei 'because', suoyi 'therefore'
jiran 'given that', name 'then'
suiran 'although', danshi 'but'
jishi 'even if', rengran 'still'
chule 'except', jianzhi 'also'
huozhe 'or', huozhe 'or'
budan 'not only', erqie 'but also'

zong er yan zhi 'in one word'
shishi shang 'in fact'
liru 'for example'
tebie shi 'in particular'
dati er yan 'in general'
wulun ruhe 'anyway'
shouxian 'first', qici 'next'
huan ju hua shuo 'in other words'
zhengru 'just as'
nandao ('does it mean...')
kexi 'unfortunately'

Table 1: ... and Associated Rhetorical Relations in Chinese

It may be noted that our analysis of Chinese has yielded about 150 discourse markers, and that on average more than one third of the discourse segments in Chinese argumentative text (e.g. editorials) contain discourse markers. While primary discourse markers can be paired discontinuous constituents, with each marker attached to one of the two utterances or propositions, the secondary discourse markers tend to be unitary constituents only. In the case of primary discourse markers, it is quite common for one member of the pair to be deleted, unless it is kept for emphasis. The deletion of both discourse markers is also possible. The recovery process therefore faces a considerable challenge even when only one member of a pair of discourse markers has been deleted. Since these discourse markers have no unique lexical realization, there is also a need for disambiguation in a homocode problem. Moreover, primary discourse markers can also be classified as simple adverbials, as is the case in English:
(1) Even though a child, John is so tall that he has problem getting half-fare.</Paragraph>
    <Paragraph position="9"> (2) Even though a child, (because) John is tall, so he has problem getting half-fare. In (1), so is usually classified as an adverb within a sentence, but in (2) so is recognized as marking a change in message thrust at the discourse level.</Paragraph>
    <Paragraph position="10"> In a deeper linguistic analysis the two so's may be related, for they both refer to a situation involving excessive height with an implied consequence which may or may not be stated. In terms of the surface syntactic structure, so in (1) can occur in a simple (exclamatory) sentence (e.g. "John is so tall!"), but so in (2) must occur in the context of complex sentences. Our concern in this project is to identify so in the discourse sense as in (2), in contrast to so used as an adverb in the sentential sense as in (1). Similar difficulties are found in Chinese, as discussed in Section 7.</Paragraph>
  </Section>
</Paper>