<?xml version="1.0" standalone="yes"?> <Paper uid="P85-1039"> <Title>THE USE OF SYNTACTIC CLUES IN DISCOURSE PROCESSING</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> The role that syntactic structure should play in natural language processing has been a matter of debate in computational linguistics. While some researchers eschew syntactic processing as giving a poor return on the heavy investment of a parser (Schank and Riesbeck, 1981), others make syntactic representations the basis from which further work is done (Sager, 1981; Hirschman and Sager, 1982). Current syntax-based processors tend to work only within a narrow semantic domain, since they rely heavily on word co-occurrence patterns which hold only within texts from a particular sublanguage. Knowledge-based processors, on the other hand, can operate on a less restricted semantic field, but only if sufficient knowledge in the form of scripts, frames, and so forth, is built into the program.</Paragraph> <Paragraph position="1"> This paper describes a syntactic approach to natural language processing which is not bound to a narrow semantic field, and which requires little or no world knowledge. This approach has been demonstrated in a computer program called DUMP (Discourse Understanding Model Program), which relies solely on syntactic structure to create summaries of one particular genre of discourse--that of newspaper reports--and to label the kinds of information given in them (Decker, 1985). The process for creating these summaries differs substantially from the word-list and statistical methods used by other automatic abstractor programs (Borko and Bernier, 1975).
The DUMP program therefore depends on a predictable discourse genre or style, rather than a predictable sublanguage lexicon or body of world knowledge.</Paragraph> <Paragraph position="2"> DUMP was developed from a corpus of over 5800 words representing twenty-three news reports from three daily newspapers: the New York Times, the Boston Globe, and the Providence Journal/Evening Bulletin. With one exception, each story appeared in the upper right-hand column of the front page.</Paragraph> <Paragraph position="3"> The stories in the corpus were chosen randomly, and the only criterion for rejection was too large a percentage of quoted material. Only the first two hundred words or so of each story were included in the corpus in order to allow a greater sampling of reports. The discourse principles at work are fairly represented in an excerpt of this length.</Paragraph> <Paragraph position="4"> The input to the DUMP program consists of a list of hand-parsed sentences making up each story.</Paragraph> <Paragraph position="5"> Ideally, these parse trees should be the output of a parsing program. In fact, about one-third of the sentences were passed through the RUS parser (Woods, 1973). RUS experienced difficulty with some of these sentences for a number of reasons: the parser was operating without a semantic component, and arcs from nodes were ordered with the expectation of feedback from semantics; RUS lacked some rules for structures which appear with regularity in the news; it attempted to give all the parses of a sentence, where DUMP only required one, and that not necessarily the correct or complete one (about which more later); and DUMP's rules call for certain syntactic labels which are not ordinarily assigned by parsing programs (negative and adversative clauses, for example). However, it should be stressed that none of these difficulties represents parsing problems of theoretical import.
All could be resolved by extensions to existing components of the ATN and its dictionary.</Paragraph> </Section> </Paper>