<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0302">
  <Title>Identifying the Linguistic Correlates of Rhetorical Relations</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Within Rhetorical Structure Theory (RST) (Mann and Thompson 1986, 1988), the discourse structure of a text is represented by means of a hierarchical tree diagram in which contiguous text spans are related by labeled relations.</Paragraph>
    <Paragraph position="1"> Hierarchical structure results from the fact that each text span in a labeled relation may itself have a complex internal discourse structure.</Paragraph>
    <Paragraph position="2"> Traditionally, human analysts have constructed RST analyses for texts by employing tacit, subjective, intuitive judgments. RASTA (Corston-Oliver 1998a, 1998b), a discourse analysis component within the Microsoft English Grammar, automatically produces RST analyses of texts. To do so, it proceeds in three stages. In the first stage, RASTA identifies the clauses that function as terminal nodes in an RST analysis. In the second stage, RASTA examines all possible pairs of terminal nodes to determine which discourse relation, if any, might hold between the two nodes. In the third stage, RASTA combines the terminal nodes, according to the discourse relations hypothesized in the second stage, to form RST analyses of a complete text.</Paragraph>
    <Paragraph position="3"> This paper discusses the second stage of processing, during which RASTA identifies discourse relations. Whereas introspection is a viable strategy for human analysts, a computational discourse analysis system like RASTA requires explicit methods for identifying discourse relations. This paper therefore describes (section 2) the kinds of linguistic evidence that RASTA considers in positing discourse structure. Intuitively, cues to discourse relations are not all equally compelling. This intuition is reflected in the use of heuristic scores (section 3) to measure the plausibility of a relation. Section 5 describes in detail the cues used to identify the SEQUENCE relation and gives a worked example. For a more complete description of the workings of RASTA, the reader is referred to Corston-Oliver (1998b).</Paragraph>
    <Paragraph position="4"> The Microsoft English Grammar (MEG) is a broad-coverage grammar of English that performs a morphological analysis, a conventional syntactic constituent analysis and a logical form analysis (involving the normalization of syntactic alternations to yield a representation with the flavor of a predicate representation). Functional roles such as subject and object are identified and anaphoric references are resolved during linguistic analysis.</Paragraph>
    <Paragraph position="5"> To date, I have focused on the text of Encarta 96 (Microsoft Corporation 1995, henceforth Encarta), a general purpose electronic encyclopedia whose articles exhibit a variety of complex discourse structures. All examples in this paper are taken from Encarta. References given are to the titles of articles.</Paragraph>
  </Section>
</Paper>