<?xml version="1.0" standalone="yes"?> <Paper uid="J87-3002"> <Title>LARGE LEXICONS FOR NATURAL LANGUAGE PROCESSING: UTILISING THE GRAMMAR CODING SYSTEM OF LDOCE</Title> <Section position="7" start_page="0" end_page="31" type="evalu"> <SectionTitle> 6 EVALUATION </SectionTitle> <Paragraph position="0"> Computational Linguistics, Volume 13, Numbers 3-4, July-December 1987 211 Bran Boguraev and Ted Briscoe Large Lexicons for Natural Language Processing. marry v 1 [T1; I0] to take (a person) in marriage: He married late in life / never married. (fig.) She married money (= a rich man) 2 [T1] (of a priest or official) to perform the ceremony of marriage for (2 people): An old friend married them 3 [T1 (to)] to cause to take in marriage: She wants to marry her daughter to ........ The utility of the work reported above rests ultimately on the accuracy of the lexical entries which can be derived from the LDOCE tape. We have not attempted a systematic analysis of the entries which would result if the decompacting and grammar code transformation programs were applied to the entire dictionary. In Section 3 we outlined some of the errors in the grammar codes which are problematic for the decompacting stage. However, mistakes or omissions in the assignment of grammar codes represent a more serious problem. While inconsistencies or errors in the application of the grammar coding system in some cases can be rectified by the gradual refinement of the decompacting program, it is not possible to correct errors of omission or assignment automatically. 
On the basis of unsystematic evaluation, using the programs to dynamically produce entries for the PATR-II parsing system, a number of errors of this type have emerged.</Paragraph> <Paragraph position="1"> For example, the LDOCE definitions and associated code fields in Figure 15 demonstrate that upset(3) needs it + D5, which would correspond to its use with a noun phrase and a sentential complement; suppose(2) is missing optional &quot;to be&quot; for the X1 and X7 codes listed; help(1) needs a T3 code since it does not always require a direct object as well as an infinitive complement; and detest needs a V4 code because it can take a direct object as well as a gerund complement.</Paragraph> <Paragraph position="2"> It is difficult to quantify the extent of this problem on the basis of enumeration of examples of this type.</Paragraph> <Paragraph position="3"> Therefore, we have undertaken a limited test of both the accuracy of the assignment of the LDOCE codes in the source dictionary and the reliability of the more ambitious (and potentially controversial) aspects of the grammar code transformation rules. It is not clear, in particular, that the rules for computing semantic types for verbs are well enough motivated linguistically or that the LDOCE lexicographers were sensitive enough to the different transformational potential of the various classes of verbs to make a rule such as ours for Object Raising viable.</Paragraph> <Paragraph position="4"> We tested the classification of verbs into semantic types using a verb list of 139 pre-classified items drawn from the lists published in Rosenbaum (1967) and Stockwell et al. (1973). 
Figure 16 gives the number of verbs classified under each category by these authors and the number successfully classified into the same categories by the system.</Paragraph> <Paragraph position="5"> The overall error rate of the system was 14%; however, as the table illustrates, the rules discussed above classify verbs into Subject Raising, Subject Equi and Object Equi very successfully. The two Subject Raising verbs which were not so classified by the system were come about and turn out. Come about is coded I5 in LDOCE, but is not given any word qualifier; turn out is not given any I5 code. These are clear examples of omissions on the part of the Longman lexicographers, rather than of failure of the rule. Similarly, trust is not recognised as an Object Equi verb, because its dictionary entry does not contain a V3 code; this must be an omission, given the grammaticality of (6) I trust him to do the job.</Paragraph> <Paragraph position="6"> persuade v 1 [T1 (of); D5] to cause to feel certain; CONVINCE: She was not persuaded of the truth of his statement 2 [T1 (into, out of); V3] to cause to do something by reasoning, arguing, begging, etc.: try to persuade him to let us go with him. | Nothing would persuade him. sense-no: 1 arg1: [pred: persuade sense-no: 2 arg1: [ref: uther sense-no: 1] arg2: [ref: gwen sense-no: 1] arg3: [pred: marry sense-no: 2 arg1: [ref: gwen sense-no: 1] arg2: [ref: cornwall sense-no: 1]]]]]] Figure 14</Paragraph> <Paragraph position="7"> Prefer is misclassified as Object Raising, rather than as Object Equi, because the relevant code field contains a T5 code, as well as a V3 code. The T5 code is marked as 'rare', and the occurrence of prefer with a tensed sentential complement, as opposed to with an infinitive, is certainly marginal: upset ... 3 [T1] to cause to worry, not to be calm, etc.: ........ suppose ... 2 [T5a,b; V3 often pass.; X1,7,9] to believe: I suppose that's true. 
| I supposed him to be a workman, but he was in fact a thief. | He was commonly supposed (to be) foolish ........</Paragraph> <Paragraph position="8"> help ... 1 [T1; I0; V3, (esp. AmE) 2] to do part of the work (for someone); be of use to (someone in doing something); AID, ASSIST: Could you help me up (the ...)? | The ... helps him (to) ... | Your ... helps a lot. | Can I help (with your work)? ........</Paragraph> <Paragraph position="9"> detest ... [T1,4] to hate with very strong feeling: I detest people who deceive and tell lies. | I detest shooting and killing .........</Paragraph> <Paragraph position="10"> (7) I prefer that he come on Monday.</Paragraph> <Paragraph position="11"> (8) ?I prefer that he marries Julie.</Paragraph> <Paragraph position="12"> This example also highlights a deficiency in the LDOCE coding system since prefer occurs much more naturally with a sentential complement if it collocates with a modal such as &quot;would&quot;. This deficiency is rectified in the verb classification system employed by Jackendoff and Grimshaw (1985) in the Brandeis verb catalogue.</Paragraph> <Paragraph position="13"> The main source of error is the misclassification of Object Raising verbs as Object Equi verbs. Arguably, these errors also derive mostly from errors in the dictionary, rather than from a defect of the rule. 66% of the Object Raising verbs were misclassified as Object Equi verbs, because the cooccurrence of the T5 and V (2, 3, or 4) codes in the same code fields, as predicted by the Object Raising rule above, was not confirmed by LDOCE. All 14 misclassified verbs contain V codes and 10 of these also contain T5 codes. However, the Longman lexicographers typically define two different word senses, one of which is marked (perhaps among other codes) T5 and the other with a V code. 
Analysis of these word senses suggests that this approach is justified in three cases, but unmotivated in five; for example, hear (1),(2) (justified) vs. acknowledge (1),(2) (unjustified) (see Figure 17). The other four cases we interpreted as unmotivated were show, suspect, know and confess. In the case of consider(2) (Figure 18), there is a clear omission of a T5 code, as demonstrated by the grammaticality of (9) I consider that it is a great honour to be here. Similarly, expect is not given a V3 code under sense 1 (Figure 19); however, the grammaticality of (10) I expect him to pass the exam with the relevant interpretation suggests that it should be assigned a V3 code. Alternatively, sense 5, which is assigned a V3 code, seems suspiciously similar to sense 1.</Paragraph> <Paragraph position="14"> acknowledge ... 1 [T1,4,5 (to)] to agree to the truth of; recognise the fact or existence (of): ... | They acknowledged having been defeated 2 [T1 (as); X (to be) 1,7] to recognise, accept, or admit (as): He was acknowledged to be the best player. | He was acknowledged as their leader. | They acknowledged themselves (to be) defeated ........</Paragraph> <Paragraph position="15"> hear ... 1 [Wv6; T1; V2,4; I0] to receive and understand (sounds) by using the ears: I can't hear very well. | I heard him say so. | I can hear someone knocking 2 [Wv6; T1,5a] to be told or informed: I heard that he was ill - compare HEAR ABOUT, HEAR FROM, HEAR OF ........ Figure 17</Paragraph> <Paragraph position="16"> The four verbs which are misclassified as Object Equi and which do not have T5 codes anywhere in their entries are elect, love, represent and require. None of these verbs take sentential complements and therefore they appear to be counterexamples to our Object Raising rule. In addition, Moulin et al. 
(1985) note that our Object Raising rule would assign mean to this category incorrectly. Mean is assigned both a V3 and a T5 code in the code field associated with sense 2 (i.e. &quot;intend&quot;); however, when it is used in this sense it must be treated as an Object Equi verb.</Paragraph> <Paragraph position="17"> This small experiment demonstrates a number of points. Firstly, it seems reasonable to conclude that the assignment of individual codes to verbs is on the whole relatively accurate in LDOCE. Of the 139 verbs tested, we found code omissions in only 10 cases. Secondly, though, when we consider the interaction between the assignments of codes and word sense classification, LDOCE appears less reliable. This is the primary source of error in the case of the Object Raising rule. Thirdly, it seems clear that the Object Raising rule is straining the limits of what can be reliably extracted from the LDOCE coding system. Ideally, to distinguish between raising and equi verbs, a number of syntactic criteria should be employed (Perlmutter and Soames, 1979:460ff.). However, only two of these criteria are explicit in the coding system.</Paragraph> <Paragraph position="18"> On the basis of the results obtained, we explored the possibility of modifying the Object Raising rule to take account of the cooccurrence of T5 and T5a codes and V or X codes within a homograph, rather than within a word sense. An exhaustive search of the dictionary produced 24 verbs coded in this fashion. Ten of these were listed as Object Raising verbs in the published lists used in the above experiment. Five more verbs were classified as Equi in the published lists. Of the remaining nine verbs which did not appear in the published lists, three were clearly Object Raising, one was clearly Equi, a further two were probably Object Raising, and the last three were very difficult to classify. 
This demonstrates that modifying the Object Raising rule in this fashion would result in the misclassification of some Equi verbs. In fact, the list is sufficiently small that this set of verbs is probably best coded by hand.</Paragraph> <Paragraph position="19"> As a final test, we ran the rules for determining the semantic type of verbs over all the 7,965 verb entries in LDOCE. There are 719 verb senses which are coded in the dictionary as having the potential for predicate complementation. Of these, 5 were classified as Subject Raising, 53 as Object Raising, 377 as Subject Equi, and 326 as Object Equi by our rules. 42 of the Equi verbs are ambiguous between Subject and Object Equi under the same sense; in the transformation program this ambiguity is resolved by selecting the type appropriate for each individual code. For example, a code which translates consider ... 2 [Wv5; X (to be) 1,7; V3] to regard as; think of in a stated way: I consider you a fool (= I regard you as a fool). | I consider it ... honour to be ... | He said he considered me (to be) ... to be a ... worker. | The Shetland Islands are usually considered a part of Scotland ......... expect ... 1 [T3,5a,b] to think (that something will happen): I expect (that) he'll pass the examination. | He expects to ... the examination | &quot;Will she come soon?&quot; &quot;I expect so.&quot; ........ 5 [V3] to believe, hope and think (that someone will do something): The officer expected his men to do their duty in the coming battle .......</Paragraph> <Paragraph position="20"> of verbs together with the relevant LDOCE sense number are listed in the appendix. An exhaustive analysis of the 54 verbs classified as Object Raising revealed two further errors of inclusion in this set: order(6) should be Object Equi and promise(1) should be Subject Equi. 
The 42 verbs which the transformation program treats as ambiguous Equi verbs appear to be somewhat heterogeneous; some, such as want(1) and ask(2), are cases of 'Super-Equi' control verbs where the control relations are determined contextually, whilst others, particularly the phrasal verbs, appear to be better classified as Object Raising. Allow(1) and permit(1) appear here incorrectly because they are coded [T4] to capture examples such as (11) They do not allow/permit smoking in their house.</Paragraph> <Paragraph position="21"> In this example the subject of the progressive complement is not controlled by the matrix subject. Again, since the list is small, this set of verbs should probably be coded by hand.</Paragraph> </Section> </Paper>