<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2102">
  <Title>Towards a Syntactic Account of Punctuation</Title>
  <Section position="3" start_page="0" end_page="604" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Hitherto, the field of punctuation has been almost completely ignored within Natural Language Processing, with perhaps the single exception of the sentence-final full-stop (period). The reason for this non-treatment has been the lack of any coherent theory of punctuation on which a computational treatment could be based. As a result, most contemporary systems simply strip punctuation out of input text, and do not put any marks into generated texts.</Paragraph>
    <Paragraph position="1"> Intuitively, this seems very wrong, since punctuation is such an integral part of many written languages. If text in the real world (a newspaper, for example) were to appear without any punctuation marks, it would read as very stilted, ambiguous or infantile. It is therefore likely that any computational system that ignores these extra textual cues will suffer a degradation in performance, or at the very least a great restriction in the class of linguistic data it is able to process.</Paragraph>
    <Paragraph position="2"> Several studies have already shown the potential for using punctuation within NLP. Dale (1991) has shown the benefits of using punctuation in the fields of discourse structure and semantics, suggesting that it can be used to indicate degrees of rhetorical balance and aggregation between juxtaposed elements, and that in certain cases a punctuation mark can determine the rhetorical relations that hold between two elements.</Paragraph>
    <Paragraph position="3"> In the field of syntax, Jones (1994) has shown, by comparing the performance of a grammar that uses punctuation with one that does not, that for the more complex sentences of real language, parsing with a punctuated grammar yields around two orders of magnitude fewer parses than parsing with an unpunctuated grammar, and that the punctuated parses also better reflect the linguistic structure of the sentences. Briscoe and Carroll (1995) extend this work to show the real contribution that punctuation can make to the syntactic analysis of text. They also point out some fundamental problems with the approach adopted by Jones (1994).</Paragraph>
    <Paragraph position="4"> If, based on the conclusions of these studies, we are to include punctuation in NLP systems, it is necessary to have some theory upon which a treatment can be based. Thus far, the only account available is that of Nunberg (1990), which, although it provides a useful basis for a theory, is a little too vague to serve as the basis of any implementation. In addition, the basic implementation of Nunberg's punctuation linguistics seems untenable, certainly on a computational level, since it stipulates that punctuation phenomena should be treated on a separate level from the lexical words in the sentence (Jones, 1994). Nunberg's treatment of punctuation is also too prescriptive to account for, or permit, some phenomena that occur in real language (Jones, 1995).</Paragraph>
    <Paragraph position="5"> Therefore it is necessary to develop a new theory of punctuation that is suitable for computational implementation. Work has already been carried out on the variety of punctuation marks and their interaction (Jones, 1995). It shows that whilst the set of symbols that we conventionally regard as punctuation (point punctuation, quotation and parenthetical symbols) accounts for the majority of punctuation in the written language, and could therefore be implemented in a standardised way, there is another set of more unusual symbols. These usually carry a higher semantic content, tend to be specific to the corpus in which they occur, and are therefore less suited to a standardised treatment. This study also shows that the average number of punctuation symbols to be expected in a sentence of English is four, reinforcing the argument for including punctuation in language processing systems.</Paragraph>
    <Paragraph position="6"> The next step towards the development of a theory of punctuation is the study of the interaction of punctuation and the lexical items it separates, in particular the way that punctuation will integrate into grammars and syntax. The major problem of the evaluatory studies (Dale (1991), Jones (1994), and to a far lesser extent Briscoe &amp; Carroll (1995)) was that their coverage and use of punctuation was rather poor, being necessarily based on human intuitions and possible idiosyncrasies. What is needed, therefore, is a proper investigation into the syntactic roles that punctuation symbols can play, and a formalisation of these into instructions for the inclusion of punctuation in NLP grammars.</Paragraph>
  </Section>
</Paper>