File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/04/c04-1141_evalu.xml
Size: 7,831 bytes
Last Modified: 2025-10-06 13:59:08
<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1141"> <Title>Collocation Extraction Based on Modifiability Statistics</Title> <Section position="6" start_page="5" end_page="5" type="evalu"> <SectionTitle> 4 Experimental Results and Discussion </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="5" end_page="5" type="sub_section"> <SectionTitle> 4.1 Precision and Recall for Collocation Extraction </SectionTitle> <Paragraph position="0"> In the first experiment, we incrementally examined the N highest-ranked candidates in the lists returned by each of the four measures we considered. For each percentage point of the list, precision was computed as the proportion of true positives among the N candidates returned up to that point. This yields the precision curves in Figure 1 and the corresponding values at selected list portions in the upper part of Table 2.</Paragraph> <Paragraph position="1"> First, we observe that all measures outperform the baseline by a wide margin and are thus all potentially useful measures of collocativity. Of the statistical measures, log-likelihood (the most complex one) performs worst, whereas t-test and frequency, which are almost indistinguishable, share the middle position, with frequency having a very slight edge at six rank points. This contrasts with the findings reported by Krenn and Evert (2001), which gave the t-test an edge.6 As can be clearly seen, however, our linguistic modifiability measure substantially outperforms all other measures at all points in the ranked list.</Paragraph> <Paragraph position="2"> Considering the top 1% (N = 86), the precision of modifiability is ten percentage points higher than that of t-test and frequency, and as much as 22 points higher than that of log-likelihood. 
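The evaluation procedure of this subsection can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the implementation used for the experiments: `top_n`, `precision_recall_at_percent`, `percent_needed_for_recall`, `paired_outcomes` and `mcnemar` are illustrative names; N is assumed to be the given percentage of the candidate list rounded to the nearest integer; and the pairing used for the McNemar test (a measure handles a candidate correctly if it places it in its top N exactly when it is a true collocation) is an assumption, since the text does not specify how outcomes were paired. The statistic uses the common continuity-corrected form, with the two-tailed p-value taken from the chi-square distribution with one degree of freedom.

```python
import math
from typing import List, Optional, Sequence, Set, Tuple


def top_n(ranked: Sequence[str], percent: int) -> int:
    """N for a given percentage of the ranked list (assumed: nearest integer, at least 1)."""
    return max(1, round(len(ranked) * percent / 100))


def precision_recall_at_percent(
    ranked: Sequence[str], gold: Set[str], percent: int
) -> Tuple[int, float, float]:
    """Return (N, precision, recall) for the N highest-ranked candidates."""
    n = top_n(ranked, percent)
    true_pos = sum(1 for cand in ranked[:n] if cand in gold)
    return n, true_pos / n, true_pos / len(gold)


def percent_needed_for_recall(
    ranked: Sequence[str], gold: Set[str], target: float = 0.9
) -> Optional[int]:
    """Smallest percentage point (1 to 100) at which recall reaches the target."""
    for percent in range(1, 101):
        _, _, recall = precision_recall_at_percent(ranked, gold, percent)
        if recall >= target:
            return percent
    return None  # the target recall is never reached


def paired_outcomes(
    ranked_a: Sequence[str], ranked_b: Sequence[str], gold: Set[str], percent: int
) -> Tuple[List[bool], List[bool]]:
    """Hypothetical pairing for one measure point: a measure handles a candidate
    correctly if it puts it in its top N exactly when it is a true collocation.
    Both rankings must order the same candidate set."""
    n = top_n(ranked_a, percent)
    top_a, top_b = set(ranked_a[:n]), set(ranked_b[:n])
    correct_a = [(cand in top_a) == (cand in gold) for cand in ranked_a]
    correct_b = [(cand in top_b) == (cand in gold) for cand in ranked_a]
    return correct_a, correct_b


def mcnemar(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> Tuple[float, float]:
    """Two-tailed McNemar test with continuity correction; returns (chi2, p)."""
    # Only discordant pairs count: b = A right and B wrong, c = A wrong and B right.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    if b + c == 0:
        return 0.0, 1.0  # the two measures never disagree
    chi2 = max(0, abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of the chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For the 8,644 candidate types used here, `top_n` gives N = 86 at 1% and N = 4,322 at 50%. At the 99% confidence level, a difference at a measure point counts as significant when the returned p-value is below 0.01.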
Until 50% (N = 4,322) of the ranked list has been considered, modifiability maintains a precision advantage of three to five percentage points over t-test and frequency. In the second half of the list, all curves and their associated values begin to converge towards the baseline.</Paragraph> <Paragraph position="3"> We also tested the significance of the differences in precision, both between modifiability and frequency and between modifiability and t-test.</Paragraph> <Paragraph position="4"> Because in both cases the ranked lists were drawn from the same set of candidates, viz. the 8,644 PP-verb candidate types, and hence constitute dependent samples, we applied the McNemar test (Sachs, 1984). We selected 100 measure points in the ranked list, one after each one-percent increment, and used a two-tailed test at the 99% confidence level. Table 3 lists the number of significant differences for 10, 50 and 100 measure points and shows that almost all of the differences are significant.</Paragraph> <Paragraph position="5"> Table 3: Number of significant differences at 10, 50 and 100 measure points when comparing modifiability with frequency and with t-test. Footnote 6: The reason why frequency performs even slightly better than t-test may well have to do with the size of our training corpus (114 million words). This, however, only underlines that large corpora are essential for collocation discovery.</Paragraph> <Paragraph position="7"> The recall curves in Figure 2 and the corresponding values in the lower part of Table 2 indicate what proportion of all true positives a particular measure has identified at a given portion of the ranked list. In this sense, recall is an even better indicator of a measure's performance. Again, the linguistically motivated collocation extraction algorithm outscores all others, by an even wider margin than for precision. When examining 20% (N = 1,729), 30% (N = 2,593) and 40% (N = 3,458) of the ranked list, modifiability identifies almost 60%, 70% and 80% of all true positives, respectively, holding a ten-percentage-point lead over t-test and frequency at each of these points.</Paragraph> <Paragraph position="8"> When 50% (N = 4,322) of the list is considered, this lead grows to eleven and twelve points over frequency and t-test, respectively.</Paragraph> <Paragraph position="9"> Even more strikingly, to identify 90% of all true positives, modifiability needs to examine only 55% (N = 4,754) of the ranked list. Frequency, on the other hand, needs to examine 75% (N = 6,483) and t-test as much as 85% (N = 7,347) of the ranked list to reach this level of recall.</Paragraph> </Section> <Section position="2" start_page="5" end_page="5" type="sub_section"> <SectionTitle> 4.2 Modifiability Revisited </SectionTitle> <Paragraph position="0"> The previous subsection showed that a collocation discovery measure which takes into account the linguistic property of limited modifiability performs significantly better than purely statistical measures with less linguistic grounding. Although limited modifiability is common wisdom about collocations, it has not yet been empirically evaluated. We therefore ran an experiment in which we took both the PNV triples classified as collocations and those classified as non-collocations and counted the number of distinct supplements of each triple (as introduced in Subsection 3.3). From these data, we set up a distribution of collocational and non-collocational PNV triples over the number of distinct supplements (cf.
Figure 3).</Paragraph> <Paragraph position="1"> As Figure 3 shows, not only is the proportion of collocational PNV triples with only one distinct supplement higher (36%) than that of non-collocational ones (20%), but with each additional supplement the collocational proportion curve also declines more steeply than its non-collocational counterpart. Moreover, the collocational curve already ends at 54 distinct supplements, whereas the non-collocational curve extends up to 520 distinct supplements. We are thus able to add some empirical grounding to the widespread textbook assumption that collocations have limited modifiability.</Paragraph> <Paragraph position="2"> A further observation from this experiment (one that is also built into our linguistic measure) is that some collocations do allow at least limited modification. Collocation acquisition is, of course, not an end in itself but aims at creating collocation lexicons for both language processing and generation (Smadja and McKeown, 1990). From this perspective, our linguistic modifiability measure yields a valuable by-product for the development of lexicons or collocational knowledge bases: a list of possible structural and lexical modifications associated with a particular collocational entry candidate. In our case, these modifications refer to the nominal group of the PP. We illustrate this point in Table 4 with two collocational PNV triples and some of their associated NP supplements together with their frequencies.</Paragraph> <Paragraph position="3"> As can be seen, both structural and lexical attributes of collocations can be obtained in this way. The structural information comes in the form of part-of-speech (POS) tags, so the possible prenominal POS types and their combinations can be used to describe a collocation's structural make-up. 
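The tallies behind Figure 3 and the modification lists illustrated in Table 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure used in the paper: each observation is assumed to pair a PNV triple with the prenominal material of its nominal group as (word, POS tag) pairs; the STTS-style tag ADJA and the function names `modification_profiles` and `supplement_distribution` are hypothetical; an unmodified noun is counted as the empty supplement; and supplements are treated as distinct when their word strings differ.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical data model: a PNV triple (preposition, noun, verb) and the
# prenominal material of its nominal group as (word, POS tag) pairs.
Triple = Tuple[str, str, str]
Supplement = Tuple[Tuple[str, str], ...]
Profile = Dict[str, Counter]


def modification_profiles(
    observations: Iterable[Tuple[Triple, Supplement]],
) -> Dict[Triple, Profile]:
    """Tally, per PNV triple, how often each lexical supplement and each
    POS pattern occurs. An unmodified noun yields the empty supplement ''."""
    profiles: Dict[Triple, Profile] = defaultdict(
        lambda: {"lexical": Counter(), "structural": Counter()}
    )
    for triple, supplement in observations:
        words = " ".join(word for word, _ in supplement)
        pattern = " ".join(tag for _, tag in supplement)
        profiles[triple]["lexical"][words] += 1
        profiles[triple]["structural"][pattern] += 1
    return dict(profiles)


def supplement_distribution(
    profiles: Dict[Triple, Profile], triples: Iterable[Triple]
) -> Dict[int, float]:
    """Proportion of the given triples that have k distinct supplements, per k
    (computed once for collocations and once for non-collocations)."""
    counts = Counter(len(profiles[t]["lexical"]) for t in triples)
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}
```

For instance, occurrences such as 'unter starken Druck geraten' and 'unter schweren Druck geraten' (a hypothetical rendering of the 'to get under pressure' triple) would add 'starken' and 'schweren' to the lexical counter of that triple and the pattern ADJA to its structural counter.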
From a lexical viewpoint, a collocation can be described by the lexical-semantic word classes used to modify it.7 As can be seen in Table 4 for the PNV triple meaning 'to get under pressure', the noun 'Druck' ('pressure') is often modified by adjectives from a particular semantic class, such as 'stark' ('strong'), 'schwer' ('heavy') and 'erheblich' ('considerable', 'grave').</Paragraph> </Section> </Section> </Paper>