<?xml version="1.0" standalone="yes"?>
<Paper uid="J79-1011">
  <Title>Association for Computational Linguistics</Title>
  <Section position="4" start_page="7" end_page="15" type="intro">
    <SectionTitle>
PROCEDURE
</SectionTitle>
    <Paragraph position="0"> Thirty-six syntactic structures were selected for consideration on the basis of being possible contributors to the constellations determining genre style. They were chosen from four main categories: sentence-type, including the range of possible interrogative patterns; focus phenomena of various sorts; elements of the main verb phrase; and a group of conjoined or embedded structures. This latter group included structures of noun-phrase modification, verbal complementation, sentence modification, and parallel element conjoining.</Paragraph>
    <Paragraph position="1"> Table 1 shows the syntactic structures whose frequency of occurrence was counted, along with the identification numbers assigned to each of these variables for use in subsequent tables and discussions. Note that the various structures are clearly not completely independent of each other. Extraposition, e.g., implies an embedded clause; a passive construction implies a transitive verb; conditional clauses may imply the past tense marker; contracted verbal forms imply auxiliaries; and emphatic do and other auxiliaries are mutually exclusive. However, none of these relationships (referred to here as grammatical co-occurrence restrictions), with the exception of the last, is reciprocal. Extraposition implies an embedded clause, but an embedded clause does not necessarily imply extraposition; a passive construction implies a transitive verb, but a transitive verb does not imply a passive construction, etc. Since these various elements are not completely redundant, they are able to operate at least semi-independently as possible syntactic indicators of style.</Paragraph>
    <Paragraph position="2"> Five genres were chosen for investigation on the basis of context of utterance, which permits identification by place of publication. The genres selected were: Learned Journals, Newspaper Reportage, Popular Journals, Government Documents, and Fiction.</Paragraph>
    <Paragraph position="3"> The actual data were drawn from the Brown University million-word English corpus, A Standard Sample of Present-Day Edited American English for Use with Digital Computers. This corpus consists of 500 samples of English-language texts published in the United States in 1961, each sample approximately 2,000 words long. This large number of relatively short samples minimizes the effect of any single author or topic, and the restrictions on date and place of publication control variables associated with provenance. A complete description of this corpus and its content may be found in Francis (1964) or in Kucera &amp; Francis (1967).</Paragraph>
    <Paragraph position="4"> A total sample of 500 sentences was drawn, 100 from each of the five genres. Each genre subset of 100 consisted of ten sentences from each of ten sentence-length blocks. Sentence length was measured in words, and the blocks are shown in Table 2. These block lengths were chosen to mirror roughly the distribution of sentence lengths in the entire corpus. A structured sample of this kind was drawn to prevent sentence length as such from acting as a variable, since sentence-length distribution was already known to differentiate among genres (Marckworth &amp; Bell, 1967), and also to guarantee that those syntactic devices which tend to be associated with greater sentence length would have equal opportunities to appear within each genre sample. Again, the emphasis in this study was on the sentence as the basic unit of analysis, and on the co-occurrence of syntactic structures within sentences. The reader should keep in mind, however, that a random sample of sentences would permit certain sentence lengths to dominate in specific genres, possibly obscuring the co-occurrence patterns of interest here but, nevertheless, reflecting another property which can clearly be said to characterize genre style. Each of the sample sentences was analyzed for the occurrence of the syntactic variables indicated in Table 1, and the number of occurrences of each structure was recorded. A discussion of the basis on which the syntactic elements were identified may be found in Marckworth (1973, pp. 44-48). The basic data for analysis thus consisted of 500 observations (sentences), with each observation scored on 36 variables and classified by genre and length. The subsequent analysis of these data was based primarily on discriminant functions, which were used to determine how the variables served to distinguish one genre from another.</Paragraph>
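    <Paragraph> The structured sampling design described above (five genres, ten sentence-length blocks, ten sentences per cell) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the procedure actually used in the study: the input format, the function names, and in particular the block boundaries in toy_block_of are hypothetical, since the real boundaries are those given in Table 2.

```python
import random
from collections import defaultdict

GENRES = ["Learned Journals", "Newspaper Reportage", "Popular Journals",
          "Government Documents", "Fiction"]
N_BLOCKS = 10   # ten sentence-length blocks (see Table 2)
PER_CELL = 10   # ten sentences per genre per block


def draw_structured_sample(sentences, block_of, rng):
    """Draw PER_CELL sentences for every (genre, length block) cell.

    sentences: iterable of (genre, text) pairs (hypothetical input format).
    block_of:  function mapping a word count to a block index 0..N_BLOCKS-1.
    rng:       a random.Random instance, so that the draw is reproducible.
    """
    # Group candidate sentences by genre and by length block (length in words).
    cells = defaultdict(list)
    for genre, text in sentences:
        cells[(genre, block_of(len(text.split())))].append(text)

    # Sample without replacement inside each cell.
    sample = {}
    for genre in GENRES:
        for block in range(N_BLOCKS):
            pool = cells[(genre, block)]
            if PER_CELL > len(pool):
                raise ValueError(f"too few sentences in cell {(genre, block)}")
            sample[(genre, block)] = rng.sample(pool, PER_CELL)
    return sample


def toy_block_of(n_words):
    # Placeholder boundaries (assumption): five-word blocks, capped at the
    # last block. Substitute the boundaries of Table 2 for real use.
    return max(0, min((n_words - 1) // 5, N_BLOCKS - 1))
```

    With 5 genres, 10 blocks, and 10 sentences per cell, the design yields 5 x 10 x 10 = 500 sentences, 100 per genre, so every genre contributes the same number of sentences at every length.</Paragraph>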
    <Paragraph position="5"> Basically, discriminant function analysis is the multivariate extension of the univariate F ratio which is used to distinguish among previously established groups. It represents, however, a considerable increase in both complexity and analytical power, since it focuses not only on the simple differences between groups on each variable, but also on the interrelationships among differences on the several variables considered simultaneously. It serves to maximize group differences by developing maximally efficient weights which, when applied to the original data, will yield the clearest distinctions among the groups being analyzed. The method of discriminant function analysis is discussed fully in Rulon et al. (1967, pp. 299-315). The frequency of occurrence of each of the variables within each of the genre categories is shown in Table 3.</Paragraph>
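    <Paragraph> The idea of "maximally efficient weights" can be stated compactly in standard textbook notation. The symbols below are assumptions introduced for illustration and are not taken from the source: W is the pooled within-groups sums-of-squares-and-cross-products matrix, B is the corresponding between-groups matrix, x is the vector of variable scores for one sentence, and v is a weight vector. Each discriminant function maximizes the ratio of between-group to within-group variation of the weighted score y, which leads to an eigenvalue problem:

```latex
\lambda = \frac{\mathbf{v}^{\top}\mathbf{B}\,\mathbf{v}}{\mathbf{v}^{\top}\mathbf{W}\,\mathbf{v}},
\qquad
(\mathbf{W}^{-1}\mathbf{B} - \lambda\,\mathbf{I})\,\mathbf{v} = \mathbf{0},
\qquad
y = \mathbf{v}^{\top}\mathbf{x}
```

    With a single variable the ratio reduces to the between-groups sum of squares over the within-groups sum of squares, which is proportional to the univariate F ratio. With g groups and p variables there are at most min(g - 1, p) such functions, so an analysis of five genres can produce at most four.</Paragraph>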
    <Paragraph position="6"> Eleven of the variables, indicated by an asterisk following  *Omitted from main analysis because of low frequency of occurrence in the total sample (&lt; 25).</Paragraph>
    <Paragraph position="7">  the total column, appeared in less than five percent of the sentences examined. Because of this low incidence, interpretation of these variables would be difficult and tenuous, so they were omitted from further analyses.</Paragraph>
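    <Paragraph> The omission rule stated above amounts to a simple threshold: five percent of the 500 sample sentences is 25. The following Python sketch shows that arithmetic and the resulting split. The function name and the input format are hypothetical, and the sketch assumes that incidence is counted as the number of sentences in which a structure occurs.

```python
def split_by_incidence(sentence_counts, n_sentences=500, min_percent=5):
    """Separate variables into retained and omitted by sentence incidence.

    sentence_counts: dict mapping a variable number to the number of sample
    sentences in which that structure occurs (hypothetical input format).
    """
    threshold = n_sentences * min_percent / 100   # 500 * 5 / 100 = 25
    retained = sorted(v for v, n in sentence_counts.items() if n >= threshold)
    omitted = sorted(v for v, n in sentence_counts.items() if threshold > n)
    return retained, omitted
```

    A variable found in exactly 25 sentences is retained under this reading, since only counts below 25 fall under five percent.</Paragraph>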
    <Paragraph position="8">  Two of the variables (6 and 23) could not be examined in this way because of zero incidence in some genres, but the data pattern for 23 (inclusion of direct discourse) suggests an obvious distinction between formal and informal non-fiction. Variable 6 (imperatives) would clearly distinguish between Newspaper Reportage and Popular Journals, which is not surprising in view of the number of how-to-do-it articles in the latter genre. These two variables (6 and 23) could be and were retained for the discriminant function analyses. The remaining variables (10, 52, 25, 30, 31, 32, and 34) showed no distinction as univariate indices on the four tests, but they were, nevertheless, also retained for the multivariate analysis since they could, when analyzed in conjunction with other variables, still provide information for genre distinction. This is because the simple univariate analyses discussed above do not take into account the possible intercorrelations (constellation effects) among the variables.</Paragraph>
    <Paragraph position="9"> The first multivariate analysis was a five-group discriminant function analysis, performed on the five genres. It indicated a clear differentiation of fiction from the four non-fiction genres (see Figure 1), on the basis of the eight syntactic variables listed in Table 4. These results demonstrate that sentences from all of the non-</Paragraph>
  </Section>
</Paper>