<?xml version="1.0" standalone="yes"?>
<Paper uid="W96-0303">
  <Title>Controlling the application of lexical rules</Title>
  <Section position="4" start_page="11" end_page="14" type="metho">
    <SectionTitle>
3 Acquiring probabilities
</SectionTitle>
    <Paragraph position="0"> In order to implement the approach described, it is necessary to acquire probabilities for attested senses, and to derive appropriate estimates of lexical rule productivity. Probabilities of different word senses can be learned by a running analyzer, to the extent that lexical ambiguities are resolved either during processing or by an external oracle, and for limited domains this may well be the best approach. We are more interested in incorporating probabilities in a large, reusable, lexical knowledge base. Recent developments in corpus processing techniques have made this more feasible.</Paragraph>
    <Paragraph position="1"> For instance, work on word sense disambiguation in corpora (e.g. Resnik 1995), could lead to an estimate of frequencies for word senses in general, with rule-derived senses simply being a special case. Many lexical rules involve changes in subcategorization, and automatic techniques for extracting subcategorization from corpora (e.g. Briscoe and Carroll, 1995; Manning, 1993) could eventually be exploited to give frequency information.</Paragraph>
    <Paragraph position="2"> In some cases, a combination of large corpora and sense taxonomies can be used to provide a rough estimate of lexical rule productivity suitable for instantiating the formulae given in the  previous section. For example, we examined verbs derived from several classes of noun from the  90 million word written portion of the British National Corpus, using the wordlists compiled by Adam Kilgarriff. We looked at four classes of nouns: vehicles, dances, hitting weapons (e.g. club, whip) and decOrative coatings (e.g. lacquer, varnish). For the sake of this experiment, we assumed that these undergo four different lexical rules1: * vehicle -&gt; go using vehicle (Levin, 1992 : 51.4.1) * dance -&gt; perform dance ((Levin : 51.5) * hitting weapon -&gt; hit with weapon (subclass of Levin : 18.3) * paint-like substance -&gt; apply paint-like substance (Levin : 24)  The first problem is isolating the nouns which can be input to the lexical rules. For the purposes of dei-iving a productivity measurement for the rule as a whole, it does not matter much if the set is incomplete, as long as there are no systematic differences in productivity between the included and the excluded cases. There are several potential sources for semantically coherent noun classes. The list of vehicle nouns was derived from a taxonomy constructed semi-automatically from Longman Dictionary of Contemporary English (Procter, 1978), as described by Copestake (1990). This taxonomy only included land vehicles, not boats or airplanes. The other three classes were derived manually from a combination of Roget's and WordNet, since the relevant taxonomies were not available. For the 'hitting weapon' and 'paint-like substance' classes, this involved combining several Roget categories and WordNet synsets. We excluded entries made up of more than one word, such as square dance and also pruned the set of nouns to exclude cases where a non-derived verb form would confuse the results (e.g. taxi).</Paragraph>
    <Paragraph position="3"> Initially we :used the automatically assigned part of speech tags to identify verbs, but these gave a large number of false positives, because of errors in the tagging process. Therefore we looked instead for forms ending in -ed and -ing which had been tagged as verbs. There is still the potential for false positives if an adjectival -ed form (e.g. bearded) was mistakenly tagged as a verb, but this did not appear to cause a problem for these experiments. Only considering inflected forms means that we are systematically underestimating frequencies, but since the main aim is to acquire the correct relative ordering of lexical rules, this is not too problematic. Figure 4 shows some raw frequencies of noun and verb (-ed, and -ing form) from the BNC. We also show frequencies of the -er nominal, which we assume is derived from the verb form. For comparison, we show whether the word is found in the Cambridge International Dictionary of English (CIDE), a modern learner's dictionary. A more sophisticated system for acquisition of accurate frequencies for each word would have to be capable of sense-disambiguation. For example, according to Figure 4, distemper was found as a noun 37 times, but many of these uses actually referred to the disease, rather than the paint.</Paragraph>
    <Paragraph position="4"> We assumed that a unique conversion rule applied to each noun and calculated the productivities of the lexical rules as the ratio of the number of words for Which verbs were found over the total number of words in the class which were found in the corpus. The results are summarized in  forms were genuine. This resulted in one putative example of the conversion rule being discarded: trailered and trailering were found in one section of the corpus, but turned out to refer to getting lit is irrelevant here exactly how these rules axe to be formalized, though see references in Levin (1992) and also Kiparsky (1996). It is also not essential to our approach that these rules be treated as distinct, from the viewpoint of their representation as typed feature structures, since it would be possible to attach probabilities to subrules which only differed in the semantic type of their input.</Paragraph>
    <Paragraph position="6"> a horse into a trailer, rather than transporting by trailer. In other words, trailer here is being regarded as a container or location, rather than as a vehicle. Manual checking of the rare derived forms is not particularly time-consuming, so a semi-automatic approach, where high frequency forms which are found in an MRD are assumed to be genuine, but where low frequency examples are manually checked, should be adequate.</Paragraph>
    <Paragraph position="7"> As expected, some very frequent nouns such as car and vehicle had no corresponding verbs. Of course we could hypothesize that verb formation is preempted by synonymy (e.g. by drive). But, whatever the cause, blocking is allowed for automatically by the approach proposed here, since the probability calculated for unseen entries of high frequency words will be very low (see section 4). Similarly, it should not be necessary to explicitly encode the fact that the conversion rule does not apply to an already derived form such as primer.</Paragraph>
    <Paragraph position="8"> Even with a 90 million word corpus, some words occurred very infrequently, and others which were found in Roget's and/or WordNet were absent completely. For example calcimine is defined in WordNet as a type of water-based paint, and is also found in Roget's, but does not occur in the BNC. Even the relative estimates for productivity of rules will be inaccurate if there is a systematic difference between the frequency of words in one input class as compared to another, since infrequently occurring words are less likely to have attested derived forms. We discuss modifications to the formulae which would allow for this in the next section. This effect might have accounted for the relatively 10w productivity observed for the dance rule. However, there might also be phonological effects since many dance names are taken from languages other than English. The results for productivity are only strictly comparable within a particular corpus. It should be apparent from the frequencies that large corpora are needed to find instances of some words.</Paragraph>
  </Section>
  <Section position="5" start_page="14" end_page="16" type="metho">
    <SectionTitle>
4 Utilizing probabilistic lexical rules
</SectionTitle>
    <Paragraph position="0"> The majority of implemented NLP systems have either simply listed derived forms and extended senses, or treated them using lexical rules as redundancy statements. In the introductory section, we argued that this approach cannot be correct in principle, because of the problem of nonce senses. But it is also demonstrably inadequate, at least for systems which are not limited to a narrow domain. In an experiment with a wide-coverage parsing system (Alvey NL Tools, ANLT) Briscoe and Carroll (1993) observed that half of the parse failures were caused by inaccurate subcategorization information in the lexicon. The ANLT lexicon was derived semi-automatically from a machine readable dictionary (LDOCE), and although the COMLEX syntax dictionary (Grishman et al., 1994), which was derived with much greater amounts of human effort, has a slightly better performance, the difference is not great. Automatic acquisition of information from corpora is a partial answer to this problem, and one which is in many respects complementary to the approach assumed here, but successful acquisition of a broad-coverage lexicon from a really large corpus would lead to a similar problem of massive ambiguity as we see in the case of productive lexical rules. Control of syntactic ambiguity by the use of lexical and other probabilities has been demonstrated by several authors (e.g. Black et al., 1993; Schabes, 1992; Resnik, 1992), but the difficulty of acquisition means that the validity of utilizing lexical probabilities of the type assumed here has not yet been demonstrated on a large scale.</Paragraph>
    <Paragraph position="1"> This approach fits in most naturally with systems where probabilistic information is incorporated systematically. However it could be useful with more traditional systems. Different applications could utilize probabilistic information in different ways. For word choice in generation, it would be appropriate to take the highest-probability suitable entry, and, if none are attested, to construct a phrase, rather than apply a semi-productive lexical rule to produce a nonce form. For  analysis, the most likely rules can be applied first, in the case of known senses, and since nonce senses are (by definition) rarer, rules will be applied productively only when this fails. This improves on the control principle suggested in Copestake (1992), that lexical rules should only be applied if no interpretation was applicable which did not involve a lexical rule, since it allows for cases such as turkey, where the derived (meat) use is more frequent than the non-derived (animal) use in the corpora which we have examined. The two other control effects suggested in Copestake (1992) are both also superseded by the current proposal. One of these was to allow for blocking, which is discussed below. The other was that more specific lexical rules should be preferred over' more general ones. We would expect that, in general, the more specialized rule will be more productive, as a natural consequence of applying to a smaller class, but the earlier proposal would have had the undesirable consequence that this was a fixed consequence, which could not be adjusted for cases where the generalization did not hold. Thus the grammar writer was, in effect, required to consider both competence and performance when stipulating a rule.</Paragraph>
    <Paragraph position="2"> The general claim we make here is that if we assume that speakers choose well-attested high-frequency forms to realize particular senses and listeners choose well-attested high-frequency senses when faced with ambiguity, then much of the 'semi-productivity' of lexical rules is predicted.</Paragraph>
    <Paragraph position="3"> Blocking can be treated as a special case of this principle: if speakers use higher frequency forms to convey a given meaning, an extended meaning will not become conventionalized if a common synonym exists. This means that we do not have to stipulate a separate blocking principle in interpretation, since the blocked senses will not be attested or will have a very low frequency. And in generation, we assume that higher probability forms are preferred as a way of conveying a given meaning. Practically, this has considerable advantages over the earlier proposal, that blocking should be detected by looking for synonyms, since the the state of the art in acquisition and representation of lexical semantic information makes it difficult to detect synonymy accurately. We can assume, for example, that a verbal use of car will not be postulated by a generator, because it is unattested, and will only be possible for an analyzer when forced by context. It is necessary to allow for the possibility of unblocking, because of examples such as the following: (3) a There were five thousand extremely loud people on the floor eager to tear into roast cow with both hands and wash it down with bourbon whiskey.</Paragraph>
    <Paragraph position="4"> (Tom Wolfe, 1979. The Right StuJ~) b In the case of at least one county primary school ...they were offered (with perfect timing) saute potatoes, carrots, runner beans and roast cow.</Paragraph>
    <Paragraph position="5"> (Guardian newspaper, May 16th 1990, in a story about mad cow disease.) However, this is not the complete story, since we have not accounted formally for the extra implicatures that the use of a blocked form conveys, nor have we allowed for the generation of blocked forms (apart from in the circumstances where the generator's lexicon omits the synonym). Both these problems require an account of the interface with pragmatics, though the latter is perhaps not serious for computational applications, since we are unlikely to want to generate blocked forms. The treatment proposed here is one of many possible schemes for estimating the productivity of lexical rules and integrating these estimates with the estimation of the probabilities of unseen entries for given word forms. Other more complex schemes could be developed, which,, for example, took account of the average probability of the output of a lexical rule. This might be necessary, for example, to model the relative frequencies of -er vs -ee suffixation, since although the latter</Paragraph>
  </Section>
  <Section position="6" start_page="16" end_page="16" type="metho">
    <SectionTitle>
I
</SectionTitle>
    <Paragraph position="0"> is more productive (by Baayen and Lieber's (1991) definition), tokens of the former are more frequent overall (Barker, 1996). However, we will assume the simple approach here, since acquiring the average probability of lexical rule output raises some additional difficulties, and we currently have no evidence that the more complex approach is justified, given that our main aim is to rank unseen senses by plausibility. Another problem, mentioned above, is the need to ensure that classes have comparable frequency distributions. This could matter if there were competing lexical rules, defined on different but overlapping classes, and if one class has a very high percentage of low frequency words compared to the other, the estimate of its productivity will tend to be lower. The * productivity figure could be adjusted to allow for item frequency within classes, but we will not discuss this further here.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML