<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2184"> <Title>How Verb Subcategorization Frequencies Are Affected By Corpus Choice</Title> <Section position="3" start_page="1122" end_page="1123" type="intro"> <SectionTitle> 2 Methodology </SectionTitle>
<Paragraph position="0"> For the sentence production data, we used the numbers published in the original Connine et al. paper as well as the original data, which we were able to review thanks to the generosity of Charles Clifton.</Paragraph>
<Paragraph position="1"> The Connine data (CFJCF) consists of examples of 127 verbs, each classified as belonging to one of 15 subcategorization frames.</Paragraph>
<Paragraph position="2"> We added a 16th category for direct quotations, which appeared in the corpus data but not in the Connine data. Examples of each category, taken from the Brown Corpus, appear in Figure 1 below. There are approximately 14,000 verb tokens in the CFJCF data set.</Paragraph>
<Paragraph position="3"> For the BC, WSJ, and SWBD data, we counted subcategorizations using tgrep scripts run over the Penn Treebank parses of these corpora. We automatically extracted and categorized all examples of the 127 verbs used in the Connine study. We used the same verb subcategorization categories as the Connine study. There were approximately 21,000 relevant verb tokens in the Brown Corpus, 25,000 in the Wall Street Journal Corpus, and 10,000 in Switchboard. Unlike the Connine data, in which all verbs were equally represented, the frequency of each verb varied across the corpora. For each calculation where individual verb frequency could affect the outcome, we normalized for frequency and eliminated verbs with fewer than 50 examples. This left 77 of the 127 verbs in the Brown Corpus, 74 in the Wall Street Journal, and only 30 in Switchboard. This was not a problem with the Connine data, where most verbs had approximately 100 tokens.</Paragraph>
<Paragraph position="4"> 1 [O] Barbara asked, as they heard the front door close.
2 [PP] Guerrillas were racing [toward him].
3 [inf-S] Hank thanked them and promised [to observe the rules].
4 [inf-S]/PP/ Labor fights [to change its collar from blue to white].
5 [wh-S] I know now [why the students insisted that I go to Hiroshima even when I told them I didn't want to].</Paragraph>
<Paragraph position="5"> 6 [that-S] She promised [that she would soon take a few day's leave and visit the uncle she had never seen, on the island of Oyajima -- which was not very far from Yokosuka].
7 [verb-ing] But I couldn't help [thinking that Nadine and Wally were getting just what they deserved].
8 [perception complement] Far off, in the dusk, he heard [voices singing, muffled but strong].
9 [NP] The turtle immediately withdrew into its private council room to study [the phenomenon].
10 [NP][NP] The mayor of the town taught [them] [English and French].
11 [NP][PP] They bought [rustled cattle] [from the outlaw], kept him supplied with guns and ammunition, harbored his men in their houses.</Paragraph>
<Paragraph position="6"> 12 [NP][inf-S] She had assumed before then that one day he would ask [her] [to marry him].
13 [NP][wh-S] I asked [Wisman] [what would happen if he broke out the go codes and tried to start transmitting one].</Paragraph>
<Paragraph position="7"> 14 [NP][that-S] But, in departing, Lewis begged [Breasted] [that there be no liquor in the apartment at the Grosvenor on his return], and he took with him the first thirty galleys of Elmer Gantry.
15 [passive] A cold supper was ordered and a bottle of port.</Paragraph>
<Paragraph position="8"> 16 Quotes He writes [&quot;Confucius held that in times of stress, one should take short views - only up to</Paragraph>
<Paragraph position="10"> Figure 1 - Examples of each subcategorization frame from the Brown Corpus</Paragraph> </Section></Paper>
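The frequency filtering and normalization step can be sketched in Python. This is a minimal sketch, not the authors' tgrep pipeline: it assumes the per-verb frame counts have already been extracted into a nested dictionary, and it assumes that "normalizing for frequency" means converting each verb's counts into relative frequencies and then averaging them so that every surviving verb contributes equally, as in the Connine design. The 50-token threshold comes from the text; the function names (`filter_verbs`, `normalize_per_verb`, `frame_distribution`) are hypothetical.

```python
from collections import defaultdict

# Threshold from the text: verbs with fewer than 50 examples are eliminated.
MIN_TOKENS = 50


def filter_verbs(counts: dict[str, dict[str, int]],
                 min_tokens: int = MIN_TOKENS) -> dict[str, dict[str, int]]:
    """Keep only verbs whose total token count is at least min_tokens."""
    return {verb: frames for verb, frames in counts.items()
            if sum(frames.values()) >= min_tokens}


def normalize_per_verb(counts: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Convert each verb's raw frame counts into relative frequencies summing to 1."""
    result = {}
    for verb, frames in counts.items():
        total = sum(frames.values())
        if total:  # skip verbs with no tokens to avoid dividing by zero
            result[verb] = {frame: n / total for frame, n in frames.items()}
    return result


def frame_distribution(counts: dict[str, dict[str, int]],
                       min_tokens: int = MIN_TOKENS) -> dict[str, float]:
    """Average the per-verb distributions so every surviving verb weighs equally.

    This equal-weight averaging is an ASSUMED reading of "normalized for
    frequency"; the source does not spell out the exact computation.
    """
    per_verb = normalize_per_verb(filter_verbs(counts, min_tokens))
    if not per_verb:
        return {}
    sums: defaultdict[str, float] = defaultdict(float)
    for dist in per_verb.values():
        for frame, p in dist.items():
            sums[frame] += p
    return {frame: s / len(per_verb) for frame, s in sums.items()}


# Usage with made-up counts for illustration only:
toy = {
    "verb_a": {"NP": 60, "PP": 20, "that-S": 20},  # 100 tokens, kept
    "verb_b": {"NP": 10, "inf-S": 30},             # 40 tokens, dropped
    "verb_c": {"NP": 25, "PP": 25},                # exactly 50 tokens, kept
}
dist = frame_distribution(toy)
```

Averaging per-verb distributions, rather than pooling raw counts, keeps a few very frequent verbs from dominating the corpus-level frame profile; pooling raw counts would be the alternative if frequency effects were meant to be preserved.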