File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/99/w99-0503_metho.xml

Size: 18,095 bytes

Last Modified: 2025-10-06 14:15:32

<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0503">
  <Title>merlo(c)lettres unlge ch</Title>
  <Section position="4" start_page="16" end_page="16" type="metho">
    <SectionTitle>
3 Frequency Distributions of the
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="16" end_page="16" type="sub_section">
      <SectionTitle>
Features
</SectionTitle>
      <Paragraph position="0"> We assume that currently available large corpora are a reasonable approximation to language (Pullum, 1996). Using a combined corpus of 65 million words, we measured the relative frequency distributions of the four linguistic features (VBD/VBN, active/passive, intransitive/transitive, causative/non-causative) over a sample of verbs from the three lexical semantic classes.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="16" end_page="17" type="metho">
    <SectionTitle>
3.1 Materials
</SectionTitle>
    <Paragraph position="0"> We chose a set of 20 verbs from each class, based primarily on the classification of verbs in (Levin, 1993) (see Appendix). The unergatives are manner of motion verbs. The unaccusatives are verbs of change of state. The object-drop verbs are unspecified object alternation verbs. The verbs were selected from Levin's classes based on their absolute frequency. Furthermore, they do not generally show marked departures from the intended verb sense in the corpus. (Though note that there are only 19 unaccusatives because ripped, which was initially counted in the unaccusatives, was then excluded from the analysis as it occurred mostly in a different usage in the corpus, as a verb plus particle.) Most of the verbs can occur in the transitive and in the passive. Each verb presents the same form in the simple past and in the past participle. In order to simplify the counting procedure, we made the assumption that counts on this single verb form would approximate the distribution of the features across all forms of the verb. Most counts were performed on the tagged version of the Brown Corpus and on the portion of the Wall Street Journal distributed by the ACL/DCI (years 1987, 1988, 1989), a combined corpus in excess of 65 million words, with the exception of causativity, which was counted only for the 1988 year of the WSJ, a corpus of 29 million words.</Paragraph>
  </Section>
  <Section position="6" start_page="17" end_page="17" type="metho">
    <SectionTitle>
3.2 Method
</SectionTitle>
    <Paragraph position="0"> We counted the occurrences of each verb token in a transitive or intransitive use (INTR), in an active or passive use (ACT), in a past participle or simple past use (VBD), and in a causative or non-causative use (CAUS). More precisely, features were counted as follows. INTR: a verb occurrence was counted as transitive if immediately followed by a nominal group; else it was counted as intransitive. ACT: main verbs (tagged VBD) were counted as active; participles (tagged VBN) were counted as active if the closest preceding auxiliary was have, and as passive if the closest preceding auxiliary was be. VBD: occurrences tagged VBD were simple past; VBN were past participle. (Each of the above three counts was normalized over all occurrences of the verb, yielding a single relative frequency measure for each verb for that feature.) CAUS: the causative feature was approximated by the following steps. First, for each verb, all cooccurring subjects and objects were extracted from a parsed corpus (Collins, 1997). Then the proportion of overlap between the two multisets of nouns was calculated, meant to capture the causative alternation, where the subject of the intransitive can occur as the object of the transitive. We define overlap as the largest multiset of elements belonging to both the subject and the object multisets, e.g., {a,a,a,b} ∩ {a} = {a,a,a}. The proportion is the ratio between the overlap and the sum of the subject and object multisets. (For example, for the sample sets above, the ratio would be 3/5, or 60%.) All raw and normalized corpus data are available from the authors, and more detail concerning data collection can be found in (Stevenson and Merlo, 1999).</Paragraph>
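The CAUS computation above can be sketched in Python. This is a minimal sketch, not the authors' code; in particular, the reading of "largest multiset of elements belonging to both" is chosen so that it reproduces the worked example in the text ({a,a,a,b} ∩ {a} = {a,a,a}, giving a ratio of 3/5).

```python
from collections import Counter

def causative_overlap(subjects, objects):
    """Approximate the CAUS feature for one verb: the proportion of
    overlap between the multiset of its subject nouns and the multiset
    of its object nouns.

    For each noun type seen in both positions, we keep the larger of
    its two counts, so that {a,a,a,b} vs. {a} yields overlap {a,a,a}.
    """
    subj, obj = Counter(subjects), Counter(objects)
    shared = subj.keys() & obj.keys()
    overlap = sum(max(subj[n], obj[n]) for n in shared)
    # Denominator: the sum of the sizes of the two multisets.
    total = sum(subj.values()) + sum(obj.values())
    return overlap / total if total else 0.0
```

On the paper's example, `causative_overlap(["a", "a", "a", "b"], ["a"])` gives 3/5 = 0.6, matching the 60% in the text.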
  </Section>
  <Section position="7" start_page="17" end_page="19" type="metho">
    <SectionTitle>
4 Experiments in Verb Classification
</SectionTitle>
    <Paragraph position="0"> The frequency distributions of the verb alternation features yield a vector for each verb that represents the relative frequency values for the verb on each dimension; the set of 59 vectors constitutes the data for our machine learning experiments.
Template: [verb, VBD, ACT, INTR, CAUS, class]
Example: [opened, 79, 91, 31, 16, unacc]
Our goal was to determine whether automatic classification techniques could determine the class of a verb from the distributional properties represented in this vector. In related work (Stevenson and Merlo, 1999) we describe initial unsupervised and supervised learning experiments on this data, and discuss the contribution of the four different features (the frequency distributions) to accuracy in verb classification. In this paper, we extend the work in several ways. First, we report further analysis of replications of our initial supervised learning results. Next, we demonstrate similar performance using different training methods and learning algorithms, indicating that the performance is independent of the particular learning approach. Furthermore, these additional experiments allow us to evaluate the performance separately on each of the three verb classes. Finally, based on this evaluation, we suggest a new feature to better distinguish the thematic properties of the classes, and present experimental results showing that its use improves our original accuracy rate.</Paragraph>
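The per-verb template can be built directly from raw counts. A minimal sketch (the field order follows the template above; the normalization step, turning raw counts into relative frequencies scaled to percentages, is our reading of the counting procedure in Section 3.2, and the counts for 'opened' below are hypothetical):

```python
def verb_vector(verb, counts, cls):
    """Build a [verb, VBD, ACT, INTR, CAUS, class] record.

    `counts` maps each feature name to a (feature_count, total) pair;
    each feature value is the relative frequency, expressed here as a
    rounded percentage as in the example vector.
    """
    features = ["VBD", "ACT", "INTR", "CAUS"]
    vec = [verb]
    for f in features:
        num, den = counts[f]
        vec.append(round(100 * num / den) if den else 0)
    vec.append(cls)
    return vec
```

With hypothetical counts, `verb_vector("opened", {"VBD": (79, 100), "ACT": (91, 100), "INTR": (31, 100), "CAUS": (16, 100)}, "unacc")` returns `["opened", 79, 91, 31, 16, "unacc"]`.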
    <Section position="1" start_page="17" end_page="18" type="sub_section">
      <SectionTitle>
4.1 Initial Experiments
</SectionTitle>
      <Paragraph position="0"> Imtial experiments were carried out using a decrsron tree induction algorithm, the C5 0 system avadable from http//www rulequest corn/ (Qumlan, 1992), to automatmally create a classfficatron program flora a training set of verb vectois with known classfficatron 2 In our earhei experiments ~e ran \[0-fold cross-vahdatrons repeated 10 times hele ~e repeat the ctoss-vahdatrons 50 tmles, and the numbeis tepolted are averages over all the tuns 3 Table 1 shows the results of our experiments on the four features we counted m the corpora (x BD ACT, INTR, CAUS), as well as all three-feature subsets of those four The basehne (chance) performance m th~s task rs 33 8%, since thele are 59 ~ectors and ~The s~stem generates both declsmn trees aml rule sets for use m classfficatmn Since the d~fferencc m petformance between the t~o zs ne~er s~gmficant ~xe repoKt here Jab the results using the extracted rules The rules provide a confidence level foz each classfficatmn ~ hmh Is unavailable with the decmon tree data structure 3A 10-fold cross-vahdatmn means that the s~stem randomly d~vldes the data into 10 parts, and runs 10 t~mes on a different 90%-tralmng-data/10%-test_data spht, ymldmg an average accuracy and standard enor Th~s procedure is then repeated for 50 different random dlvlsmns of the_ data and accurac3 and standard eIror are agam averaged across the 50 runs  two most common classes--of 20 verbs each--to all cases would ymld 20 out of 59 correct, or 33 8% ) As seen m the table, classrficatmn based on the four features performs at 63 7%, or 30% over chance The true mean of the sample cross-vahdatlons lies wd, hm plus or remus two standard errors of the reported mean (dr=49, t=2 01, p&lt; 05) In all cases, the range is plus or mmus I0 or 12, yreldmg a very natrow predrcted accuracy range Furthermore, we performed t-tests comparing the results of the 50 crossvahdatmns for each of the different feature subsets All pairs were srgmficantly 
different (p&lt; 05) except for the results using all four features (first row m the table) and those excluding ACT (second row m the table) We conclude that all features except ACT contribute posrtlvely to classrficatmn performance, and that ACT does not degrade performance In our rephcatrons, then, we focus on all four features 4 2 Rephcatmn with Different Training and</Paragraph>
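The procedure in footnote 3 and the chance baseline can be sketched with the standard library. C5.0 itself is proprietary, so no classifier is shown here; the sketch only covers the random 10-part division and the majority-class baseline, which reproduces the 20/59 figure in the text.

```python
import random
from collections import Counter

def ten_fold_parts(n, rng):
    """Randomly divide indices 0..n-1 into 10 parts (footnote 3);
    each part serves once as the 10% test split."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::10] for i in range(10)]

def majority_baseline(labels):
    """Chance performance: always guess the most common class."""
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels)

# 20 unergatives, 19 unaccusatives, 20 object-drops = 59 verbs.
labels = ["unerg"] * 20 + ["unacc"] * 19 + ["objdrop"] * 20
baseline = majority_baseline(labels)  # 20/59, i.e. the 33.8% baseline
```

Repeating `ten_fold_parts` for 50 different random seeds and averaging the per-division accuracies mirrors the reported averaging over 50 cross-validations.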
    </Section>
    <Section position="2" start_page="18" end_page="19" type="sub_section">
      <SectionTitle>
4.2 Replication with Different Training and Learning Methods
</SectionTitle>
      <Paragraph position="0"> There are conceptual and practical reasons for investigating the performance of other training approaches and learning algorithms applied to our verb distribution data. Conceptually, it is desirable to know whether a particular learning algorithm or training technique affects the level of performance. Practically, different methods enable us to evaluate more easily the performance of the classification method within each verb class. (When we run repeated cross-validations with the C5.0 system, we don't have access to the accuracy rate for each class; the system only outputs an overall mean error rate.) To preview, we find that the different training and learning methods we tried all gave similar performance to our original results, and in addition allowed us to evaluate the accuracy within each verb class. In one set of experiments, we used the same C5.0 system, but employed a training and testing methodology that used a single hold-out case. We held out a single verb vector, trained on the remaining 58 cases, then tested the resulting classifier on the single hold-out case, and recorded the correct and assigned classes for that verb. This was then repeated for each of the 59 verbs. This approach yields both an overall accuracy rate (when the results are averaged across all 59 trials), as well as providing the data necessary for determining accuracy for each verb class (because we have the classification of each verb when it is the test case). The results are presented in Table 2. The overall accuracy is a little less than that achieved with the 10-fold cross-validation methodology (61.0% versus 63.7%). However, we can see clearly now that the unergative verbs are classified with much greater accuracy (75%), while the unaccusative and object-drop verbs are classified with much lower accuracy (57.9% and 50%, respectively). The distributional features we have appear to be much better at distinguishing unergatives than unaccusative or object-drop verbs.
To test this directly under our original training assumptions, we ran two different experiments, using 10-fold cross-validation repeated 10 times. The first experiment tested the ability of the classifier to distinguish between unergatives and the other two verb types, without having to distinguish between the latter two. The data included the 20 unergative verbs and a random sample of 10 unaccusative and 10 object-drop verbs; 10 different random samples were selected to form 10 such data sets. In these data sets, the verbs were labeled as unergative or &amp;quot;other&amp;quot;. The baseline (chance) classification accuracy for this data is 50%; the mean accuracy achieved across all data sets was 78.5% (standard error 0.8%), a sizable improvement over chance. The second experiment was intended to determine how well the classifier can distinguish unaccusative from object-drop verbs. The data consisted of one set that included all the unaccusative and object-drop verbs, with no unergatives. Because there are only 19 unaccusative verbs, the baseline accuracy rate is 51% (20/39); here the classifier achieved an accuracy only slightly above chance, at 58.3% (standard error 1.8%). These results, summarized in Table 3, clearly confirm the higher accuracy of classifying unergative verbs with the current feature set. This pattern of results was repeated under a very different type of learning algorithm as well. We performed a set of neural network experiments, using NeuroSolutions 3.0 (see http://www.nd.com), and report here on the networks that achieve the best performance on our data. These are principal components analysis and automatic feature map networks, which are essentially feed-forward perceptrons with pre-processing units that transform the existing features into a more useful format. In our tests, both methods performed best overall when there were no hidden layer units, and the networks were trained for 1000 epochs. The mean accuracy rates of 10-fold cross-validations with these
parameter settings are summarized in Table 4. Again, the overall percentage accuracy is in the low sixties, with better performance on the unergatives than on the other two verb classes; the difference was particularly striking with the PCA networks. This overall pattern doesn't change with further training; in fact, training up to 10,000 epochs resulted in very low accuracy (of 45%) for either unaccusatives, object-drops, or both. To summarize, following a different training approach with C5.0 (the single hold-out method), and applying very different learning approaches (two kinds of neural networks), resulted in similar overall performance to our original C5.0 results. This indicates that the accuracy achieved is at least somewhat independent of specific learning or training techniques. Moreover, these different methods, along with experiments directly testing unergative versus unaccusative/object-drop classification, allow us to examine more closely where the resulting classifiers have the most serious problems. In all cases, the accuracy is best for unergatives, and the accuracy of unaccusatives, object-drops, or both, is degraded. If this performance is indeed a reliable indication of the inherent discriminability of the distributional data, then we must examine more closely the properties of the data itself to understand (and potentially improve) the performance.</Paragraph>
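The single hold-out procedure described above can be sketched as follows. Since C5.0 is not freely available, a hypothetical 1-nearest-neighbour rule stands in for the classifier; the point of the sketch is only to show how per-class accuracies fall out of the 59 trials.

```python
from collections import defaultdict

def nearest_neighbour(train, test_vec):
    """Hypothetical stand-in classifier: label of the closest
    training vector under squared Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda vl: dist(vl[0], test_vec))
    return label

def hold_one_out(data):
    """Hold out each (vector, class) pair in turn, train on the rest,
    and accumulate correct/total counts per verb class."""
    per_class = defaultdict(lambda: [0, 0])  # class -> [correct, total]
    for i, (vec, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        guess = nearest_neighbour(train, vec)
        per_class[label][0] += int(guess == label)
        per_class[label][1] += 1
    return {c: ok / n for c, (ok, n) in per_class.items()}
```

Averaging the per-class totals over all trials gives the overall accuracy, while the per-class ratios give the kind of breakdown reported in Table 2.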
    </Section>
  </Section>
  <Section position="8" start_page="19" end_page="20" type="metho">
    <SectionTitle>
4.3 Discriminating Unaccusative and
Object-Drop Verbs
</SectionTitle>
    <Paragraph position="0"> To understand why the data discriminates unergatives reasonably well, but not unaccusatives and object-drops, we need to directly test the discriminability of the features across the classes. We do so by using t-tests to compare the values of the different features (VBD, ACT, INTR, CAUS) for unergative and unaccusative verbs, unergative and object-drop verbs, and unaccusative and object-drop verbs. In each case, the t-test gives the likelihood that the two sets of values (e.g., the VBD feature values for unergatives and for unaccusatives) are drawn from different populations. Table 5 shows that all sets of features are significantly different for unergative and unaccusative verbs, and for unergative and object-drop verbs. However, only INTR and CAUS are significantly different for unaccusative and object-drop verbs, indicating that we need additional features that have different values across these two classes. In Section 2.1, we noted the differing semantic role assignments for the verb classes, and hypothesized that these differences would affect the expression of syntactic features that are countable in a corpus. For example, the CAUS feature approximates semantic role information by encoding the overlap between nouns that can occur in the subject and object positions of a causative verb. Here we suggest another feature, that of animacy of subject, that is intended to distinguish nouns that receive an Agent role from those that receive a Theme role. Recall that object-drop verbs assign Agent to their subject in both the transitive and intransitive alternations, while unaccusatives assign Agent to their subject only in the transitive, and Theme in the intransitive. We expect then that object-drop verbs will occur more often with an animate subject. Note again that we are making use of frequency distributions: the claim is not that only Agents can be animate, but rather that nouns that receive the Agent role will more often be animate than nouns
that receive the Theme role. A problem with a feature like animacy is that it requires either manual determination of the animacy of extracted subjects, or reference to an on-line resource such as WordNet for determining animacy. To approximate animacy with a feature that can be extracted automatically, and without reference to a resource external to the corpus, we instead count pronouns (other than it) in subject position. The assumption is that the words I, we, you, she, he, and they most often refer to animate entities. The values for the new feature, PRO, were determined by automatically extracting all subject/verb tuples including our 59 example verbs (from the WSJ88 parsed corpus), and computing the ratio of occurrences of pronouns to all subjects. We again apply t-tests to our new data to determine whether the sets of PRO values differ across the verb classes. Interestingly, we find that the PRO values for unaccusative verbs (the only class to assign Theme role to the subject in one of its alternations) are significantly different from those for both unergative and object-drop verbs (p&lt;.05). Moreover, the PRO values for unergative and object-drop verbs (whose subjects are Agents in both alternations) are not significantly different. This pattern confirms the ability of the feature to capture the thematic distinction between unaccusative verbs and the other two classes. Table 6 shows the result of applying C5.0 (10-fold cross-validation repeated 50 times) to the three-way classification task using the PRO feature in conjunction with the four previous features. Accuracy improves to over 70%, a reduction in the error rate of almost 20% due to this single new feature. Moreover, classifying the unaccusative and object-drop verbs using the new feature in conjunction with the previous four leads to accuracy of over 68% (compared to 58% without PRO). We conclude that this feature is important in distinguishing unaccusative and object-drop verbs, and likely contributes to the improvement in the three-way
classification because of this. Future work will examine the performance within the verb classes of this new set of features to see whether accuracy has also improved for unergative verbs.</Paragraph>
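The PRO extraction described above can be sketched directly. The input here is a list of hypothetical (subject, verb) pairs rather than the authors' parsed-corpus tuple format; the pronoun list is the one given in the text, excluding it.

```python
from collections import Counter

# Pronouns the text treats as (most often) animate; 'it' is excluded.
ANIMATE_PRONOUNS = {"i", "we", "you", "she", "he", "they"}

def pro_feature(subject_verb_pairs):
    """PRO feature per verb: ratio of (non-'it') pronoun subjects
    to all subjects observed with that verb."""
    pron, total = Counter(), Counter()
    for subject, verb in subject_verb_pairs:
        total[verb] += 1
        if subject.lower() in ANIMATE_PRONOUNS:
            pron[verb] += 1
    return {v: pron[v] / total[v] for v in total}
```

For example, `pro_feature([("She", "played"), ("the dog", "played"), ("it", "opened"), ("they", "opened")])` gives a PRO value of 0.5 for each verb: it in subject position is not counted as an animate pronoun.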
  </Section>
</Paper>