<?xml version="1.0" standalone="yes"?> <Paper uid="P93-1032"> <Title>AUTOMATIC ACQUISITION OF A LARGE SUBCATEGORIZATION DICTIONARY FROM CORPORA</Title> <Section position="7" start_page="237" end_page="240" type="evalu"> <SectionTitle> RESULTS </SectionTitle> <Paragraph position="0"> The program acquired a dictionary of 4900 subcategorizations for 3104 verbs (an average of 1.6 per verb). Post-editing would reduce this slightly: a few repeated typos made it in, such as acknowlege; a few oddities, such as the spelling garontee as a 'Cajun' pronunciation of guarantee; and a few mistakes by the tagger, which, for example, led it to regard lowlife as a verb several times. Nevertheless, this size already compares favorably with that of some production MT systems (for example, the English dictionary for Siemens' METAL system lists about 2500 verbs (Adriaens and de Braekeleer 1992)). In general, all the verbs for which subcategorization frames were determined are in Webster's (Gove 1977), the only noticed exceptions being certain instances of prefixing, such as overcook and repurchase, but a number of the verbs do not appear in the only dictionaries that list subcategorization frames, as their coverage of words tends to be more limited. Examples are fax, lambaste, skedaddle, sensationalize, and solemnize. The two basic measures of results are the information retrieval notions of recall and precision: how many of the subcategorization frames of the verbs were learned, and what percentage of the entries in the induced dictionary are correct?
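As a minimal illustration (not part of the original paper; the function names are my own), the two measures can be written out as follows, using the token counts reported in this section:

```python
# Illustrative sketch only: the two evaluation measures just defined.
# Function names are hypothetical, not the paper's code.

def recall(found: int, total_gold: int) -> float:
    """Fraction of the gold-standard frames that were learned."""
    return found / total_gold

def precision(correct: int, total_induced: int) -> float:
    """Fraction of induced dictionary entries that are correct."""
    return correct / total_induced

# E.g., a dictionary that covers 163 of 200 verb occurrences in a test
# text has a token recall of 163/200, i.e. approximately 82%.
token_recall = recall(163, 200)
```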
I have done some preliminary work to answer these questions.</Paragraph> <Paragraph position="1"> [Fig. 1: sample text from the test corpus, beginning 'In the mezzanine, a man came with two sons and one baseball glove, like so many others there, in case ...'; the subcategorization frame for each verb occurrence is shown on the second line, and the third line shows whether the resultant dictionary has the correct subcategorization for this occurrence (OK indicates that it does, while * indicates that it doesn't).] For recall, we might ask how many of the uses of verbs in a text are captured by our subcategorization dictionary. For two randomly selected pieces of text from other parts of the New York Times newswire, a portion of which is shown in Fig. 1, the acquired subcategorization dictionary listed the subcategorization frames that appeared for 163 out of 200 verbs. So the token recall rate is approximately 82%. This compares with a baseline accuracy of 32% that would result from always guessing TV (transitive verb), and a performance figure of 62% that would result from a system that correctly classified all TV and THAT verbs (the two most common types) but got everything else wrong.</Paragraph> <Paragraph position="2"> We can get a pessimistic lower bound on precision and recall by testing the acquired dictionary against some published dictionary.13 For this test, 40 verbs were selected (using a random number generator) from a list of 2000 common verbs.14 Table 2 gives, in a compressed format, the subcategorizations listed in the OALD (recoded where necessary according to my classification of subcategorizations) and those in the subcategorization dictionary acquired by my program. Next to each verb, listing just a subcategorization frame means that it appears in both the OALD and my subcategorization dictionary; a subcategorization frame preceded by a minus sign (-) means that the subcategorization frame appears only in the OALD; and a subcategorization frame preceded by a plus sign (+) indicates one listed only in my program's subcategorization dictionary (i.e., one that is probably wrong).15 The numbers are the number of cues that the program saw for each subcategorization frame (that is in the resulting subcategorization dictionary). 13The resulting figures will be considerably lower than the true precision and recall because the dictionary lists subcategorization frames that do not appear in the training corpus and vice versa. However, this is still a useful exercise to undertake, as one can attain a high token success rate by just being able to accurately detect the most common subcategorization frames. 14The number 2000 is arbitrary, but was chosen following the intuition that one wanted to test the program's performance on verbs of at least moderate frequency.</Paragraph> <Paragraph position="3"> 15The verb redesign does not appear in the OALD, so its subcategorization entry was determined by me, based on the entry in the OALD for design.</Paragraph> <Paragraph position="4"> Table 3 then summarizes the results from the previous table. Lower bounds for the precision and recall of my induced subcategorization dictionary are approximately 90% and 43% respectively (looking at types).</Paragraph> <Paragraph position="5"> The aim in choosing error bounds for the filtering procedure was to get a highly accurate dictionary at the expense of recall, and the lower-bound precision figure of 90% suggests that this goal was achieved. The lower bound for recall appears less satisfactory. There is room for further work here, but this does represent a pessimistic lower bound (recall the 82% token recall figure above). Many of the more obscure subcategorizations for less common verbs never appeared in the modest-sized learning corpus, so the model had no chance to master them.16 Further, the learning corpus may reflect language use more accurately than the dictionary. The OALD lists retire to NP and retire from NP as subcategorized PP complements, but not retire in NP.
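The compressed notation described above (unmarked = in both dictionaries, minus = OALD only, plus = induced dictionary only) lends itself to mechanical summarization. A hypothetical sketch with invented toy data, not the actual contents of Table 2:

```python
# Hypothetical sketch: summarizing entries written in the compressed
# notation of Table 2. An unmarked frame appears in both the OALD and
# the induced dictionary; "-" marks a frame only in the OALD; "+" marks
# a frame only in the induced dictionary. The data below is invented.

def summarize(entries):
    both = oald_only = induced_only = 0
    for frames in entries.values():
        for frame in frames:
            if frame.startswith('-'):
                oald_only += 1      # missed by the program
            elif frame.startswith('+'):
                induced_only += 1   # probably wrong
            else:
                both += 1           # correctly learned
    type_precision = both / (both + induced_only)  # lower bound
    type_recall = both / (both + oald_only)        # lower bound
    return type_precision, type_recall

toy = {
    'reiterate': ['THAT', '-TV'],
    'remark': ['THAT', '-P(on)', '-P(upon)', '-IV', '+IV'],
    'shed': ['TV', '-TV-P(on)'],
}
p, r = summarize(toy)  # p = 3/4, r = 3/8 on this toy data
```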
However, in the training corpus, the collocation retire in is much more frequent than retire to (or retire from). In the absence of differential error bounds, the program is always going to take such more frequent collocations as subcategorized.</Paragraph> <Paragraph position="6"> Actually, in this case, this seems to be the right result. While in can also be used to introduce a locative or temporal adjunct: (5) John retired from the army in 1945.</Paragraph> <Paragraph position="7"> if in is being used similarly to to, so that the two sentences in (6) are equivalent: (6) a. John retired to Malibu.</Paragraph> <Paragraph position="8"> b. John retired in Malibu.</Paragraph> <Paragraph position="9"> it seems that in should be regarded as a subcategorized complement of retire (and so the dictionary is incomplete).</Paragraph> <Paragraph position="10"> As a final example of the results, let us discuss verbs that subcategorize for from (cf. fn. 1 and Church and Hanks 1989). The acquired subcategorization dictionary lists a subcategorization involving from for 97 verbs. Of these, one is an outright mistake, and one is a verb that does not appear in the Cobuild dictionary (reshape). Of the rest, 64 are listed as occurring with from in Cobuild and 31 are not. While in some of these latter cases it could be argued that the occurrences of from are adjuncts rather than arguments, there are also some unquestionable omissions from the dictionary. For example, Cobuild does not list that forbid takes from-marked participial complements, but this is very well attested in the New York Times newswire, as the examples in (7) show: (7) a. The Constitution appears to forbid the general, as a former president who came to power through a coup, from taking office. b. Parents and teachers are forbidden from taking a lead in the project, and ... 16For example, agree about did not appear in the learning corpus (and only once in total in another two months of the New York Times newswire that I examined). While disagree about is common, agree about seems largely disused: people like to agree with people but disagree about topics.</Paragraph> <Paragraph position="11"> Table 2 (reconstructed excerpt, in the compressed format described above): flavor: TV:8, --TV-PP(with); heat: IV:12, TV:9, --TV-P(up), --P(up); leak: P(out):7, --IV, --P(in), --TV-P(to); lock: TV:16, TV-P(in):16, --IV, --P(), --TV-P(together), --TV-P(up), --TV-P(out), --TV-P(away); mean: THAT:280, TV:73, NPINF:57, INF:41, ING:35, --TV-PP(to), --POSSING, --TV-PP(as), --DTV, --TV-PP(for); occupy: TV:17, --TV-P(in), --TV-P(with); prod: TV:4, TV-P(into):3, --IV, --P(at), --NPINF; redesign: TV:8, --TV-P(for), --TV-P(as), --NPINF; reiterate: THAT:13, --TV; remark: THAT:7, --P(on), --P(upon), --IV, +IV:3; retire: IV:30, IV:9, --P(from), --P(to), --XCOMP, +P(in):38; shed: TV:8, --TV-P(on); sift: P(through):8, --TV, --TV-P(out); strive: INF:14, P(for):9, --P(after), --P(against), --P(with), --IV; tour: TV:9, IV:6, --P(in); troop: --IV, --P(), [TV: trooping the color]; wallow: P(in):2, --IV, --P(about), --P(around); water: TV:13, --IV, --TV-P(down), +THAT:6. Table 3 (excerpt): Recall (percent of OALD ones learned): 43%.</Paragraph> <Paragraph position="12"> Unfortunately, for several reasons the results presented here are not directly comparable with those of Brent's systems.17 However, they seem to represent at least a comparable level of performance.</Paragraph> </Section></Paper>