<?xml version="1.0" standalone="yes"?>
<Paper uid="C82-1027">
  <Title>Genre Number of Percent of Number of Percent of Present Tense Occurrences Past Tense Occurrences</Title>
  <Section position="2" start_page="167" end_page="167" type="abstr">
    <SectionTitle>
MARKEDNESS AND FREQUENCY
</SectionTitle>
    <Paragraph position="0"> found the problem difficult, largely because there were very few studies available to him which gave information about the frequency of grammatical categories. Some examples which Greenberg collected showed the expected frequency distributions: singulars more frequent than plurals, or a greater frequency of so-called direct cases (nominative, accusative and vocative) than of oblique cases (genitive, dative, ablative, locative, instrumental, etc.) in a number of languages. In other instances, however, the evidence was either inconclusive or contrary to the putative markedness analysis. In Josselson's Russian word count (Josselson 1953), for example, the imperfective aspect of verbs (which is considered the unmarked member of the opposition) is slightly less frequent (46.9%) than the marked perfective (53.1%).</Paragraph>
    <Paragraph position="1"> The availability of new frequency data of grammatical categories now makes it possible to reopen the question and to see whether the results can shed light on the adequacy of the markedness hypothesis in this area. The main source for my analysis is the one-million-word Corpus of Present-day American English, assembled at Brown University in the 1960's and recently analyzed grammatically (cf. Francis and Kučera 1982). This data base does constitute a representative sample of printed American English, consisting of 500 samples of texts, each about 2,000 words long, drawn from a variety of sources ranging from newspapers to learned articles and fiction. The grammatical analysis of the corpus is based on a taxonomy of 87 grammatical classes or &quot;tags&quot; (including six syntactically useful punctuation tags), representing an expanded and refined system of word-classes, supplemented by major morphological information and some syntactic information.</Paragraph>
    <Paragraph position="2"> In some relatively straightforward cases, our English data clearly support the expected frequency correlation that is consistent with the markedness analysis.</Paragraph>
    <Paragraph position="3"> The prevalence of singular over plural forms in common nouns shows, in spite of some interesting stylistic differences, an overall frequency ratio similar to that between unmarked and marked lexical forms, namely about 3:1. The same is true of those case forms which are still formally marked in English: roughly the same 3:1 ratio holds for the nominative of personal pronouns (i.e. forms such as 'I, he, she, we, they') vs. objective forms (i.e. 'me, him, her, us, them').</Paragraph>
    <Paragraph position="4"> With regard to the more interesting cases, such as tense, the statistical evidence from the Brown Corpus offers both greater problems and greater insight. The adherents of the markedness hypothesis have often attempted to fit the notion of tense into the markedness framework. Tense in some Slavic languages, for example, has been generally analyzed into a binary opposition of the marked past vs. the unmarked nonpast, an analysis substantiated on the grounds that the present forms, too, serve a more general function than simply the localization of the activity as overlapping with the speech moment--such as that of the gnomic present, historical present, or &quot;programmed&quot; future. Greenberg (1966) assumes the same kind of analysis for Latin and Sanskrit and presents figures which show that the &quot;unmarked&quot; present in both of these languages is more frequent than the &quot;marked&quot; past and the future, although the differences between the present and past figures, in the Sanskrit sample, are really quite small.</Paragraph>
    <Paragraph position="5"> Let us assume, for a moment, that the past--nonpast relation has some appeal as a linguistic universal and assume that in English, too, the simple present is unmarked and the simple past is the marked member of this opposition. If one takes the frequency figures of the entire Brown Corpus into account, then the past tense predominates over the present; there are altogether 21,391 occurrences of the simple present vs. 26,172 occurrences of the simple past. The frequency data is thus the reverse of what one might have assumed under the markedness analysis.</Paragraph>
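The size of this reversal can be made concrete with a little arithmetic. The following is a minimal sketch that uses only the two whole-corpus counts quoted above; the function name tense_shares is an illustrative choice and does not come from the original analysis.

```python
# Counts of simple present and simple past quoted in the text for the whole corpus.
SIMPLE_PRESENT = 21391
SIMPLE_PAST = 26172


def tense_shares(present, past):
    """Return the share of each tense among the two simple tenses combined."""
    total = present + past
    return present / total, past / total


present_share, past_share = tense_shares(SIMPLE_PRESENT, SIMPLE_PAST)
print(f"simple present: {present_share:.1%}")
print(f"simple past:    {past_share:.1%}")
print(f"past-to-present ratio: {SIMPLE_PAST / SIMPLE_PRESENT:.2f} : 1")
```

The shares are computed over the two simple tenses only, so they say nothing about other verb forms in the corpus.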
    <Paragraph position="6"> But the fact of real interest is not this discrepancy alone but rather the fact that, in looking at the individual genres of writing represented in the data base, and counting the present vs. past tense occurrences separately for each genre category, a quite idiosyncratic pattern emerges, as the following table indicates.</Paragraph>
    <Paragraph position="7">  In six of the fifteen genre categories, the simple present is more frequent than the past; in nine, the opposite is true. The stylistic reasons for this tense distribution are fairly clear from the characteristics of the genres involved: the present tense prevails in the descriptive genres, while the past predominates greatly in what might be called the narrative genres. Interestingly enough, this tense distribution groups all imaginative prose (Genres K through R) with A. Press: Reportage, in this narrative category. The other two newspaper genres (B. Press: Editorial and C. Press: Reviews), on the other hand, are grouped together with the descriptive genres.</Paragraph>
    <Paragraph position="8"> The genre dependency of the present and past tense forms makes a meaningful statement about their possible markedness relation and their frequency a rather hopeless enterprise. The same difficulty, as it turns out, is not limited to the present/past opposition but encompasses all grammatical categories, including other tense and aspect forms, such as the perfect and the progressive. The chi-square statistical test returns a highly significant value when calculated for the distribution of these forms over the fifteen genres of the corpus. The chi-square for the perfect is 634.00, for the progressive aspect 806.70, for the simple present 840.35 and for the simple past 12,391.56. All these figures are highly significant, even at the 1% level of significance (at P = 0.01 for 14 degrees of freedom, the critical value of chi-square = 29.1). Consequently, the null hypothesis that the uneven distribution of grammatical forms among the genres of the corpus is due to chance has to be rejected.</Paragraph>
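The logic of the test described above can be sketched in a few lines. This is an illustration under stated assumptions, not a reconstruction of the original computation: the fifteen counts are invented, and the expected counts assume genres of equal size, whereas a real analysis would presumably scale the expectation to the size of each genre. Only the critical value of 29.1 is taken from the text.

```python
# Critical value quoted in the text: 1% level, 14 degrees of freedom.
CRITICAL_VALUE = 29.1


def chi_square_statistic(observed):
    """Pearson chi-square against a uniform expectation.

    Assumes, for illustration only, that all genres are the same size,
    so every genre is expected to hold an equal share of the occurrences.
    """
    expected = sum(observed) / len(observed)
    return sum((count - expected) ** 2 / expected for count in observed)


# Invented counts for fifteen genres; NOT the Brown Corpus figures.
hypothetical_counts = [120, 95, 100, 110, 105, 98, 102, 90, 115,
                       60, 55, 58, 62, 57, 53]

statistic = chi_square_statistic(hypothetical_counts)
degrees_of_freedom = len(hypothetical_counts) - 1
reject_null = statistic > CRITICAL_VALUE
print(f"chi-square = {statistic:.2f}, df = {degrees_of_freedom}, reject: {reject_null}")
```

With fifteen categories there are 14 degrees of freedom, which is why the critical value from the text applies to this toy data as well. A statistic above that value leads to rejecting the hypothesis that the unevenness across genres is due to chance.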
    <Paragraph position="9"> The apparent impossibility of determining a stable frequency of such grammatical forms as tense and aspect in English dooms the attempts to find a correlation</Paragraph>
  </Section>
</Paper>