<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1601"> <Title>Unsupervised Discovery of a Statistical Verb Lexicon</Title> <Section position="8" start_page="6" end_page="7" type="evalu"> <SectionTitle> 6 Results </SectionTitle>
<Paragraph position="0"> The semantic role labeling results are summarized in Table 3. Our performance on the identification task is high precision but low recall, as one would expect from a rule-based system. The recall errors stem from constituents that PropBank considers role fillers but that the extraction rules do not identify as dependents (such as those external to the verb phrase). The precision errors stem from dependents that the rules find but that PropBank does not mark (such as the expletive &quot;it&quot;).</Paragraph>
<Paragraph position="1"> In the classification task, we compare our system to an informed baseline, which labels each dependent with a role that is a deterministic function of its syntactic relation: the relation subj is mapped to ARG0; the relations np#1, cl#1, xcl#1, and acomp#1 are mapped to ARG1; and all other dependents are mapped to ARGM.</Paragraph>
<Paragraph position="2"> Our best system, trained with 1000 verb instances per verb type (where available), achieves an F1 of 0.897 on the coarse-role classification task on the test set (or 0.783 on the combined identification and classification task), compared with an F1 of 0.856 for the baseline (or 0.747 on the combined task), thus reducing the relative error by 28.5%.</Paragraph>
<Paragraph position="3"> Similarly, this system reduces the error on the coarse-role task on the development set by 35%.</Paragraph>
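As a concrete illustration of the figures above, the following minimal Python sketch (ours, not part of the paper's implementation; the relation strings simply follow the names used in the text) shows the informed baseline's deterministic relation-to-role mapping and recomputes the reported relative error reduction from the F1 scores.

```python
# Minimal sketch, assuming relation names as written in the text (e.g. "subj", "np#1");
# this is an illustration, not the authors' actual code.

# Syntactic relations that the informed baseline maps to ARG1.
CORE_ARG1_RELATIONS = {"np#1", "cl#1", "xcl#1", "acomp#1"}

def baseline_role(relation: str) -> str:
    """Deterministic baseline: label a dependent from its syntactic relation alone."""
    if relation == "subj":
        return "ARG0"
    if relation in CORE_ARG1_RELATIONS:
        return "ARG1"
    return "ARGM"  # all other dependents

def relative_error_reduction(f1_baseline: float, f1_system: float) -> float:
    """Fraction of the baseline's error (1 - F1) removed by the system."""
    return (f1_system - f1_baseline) / (1.0 - f1_baseline)

if __name__ == "__main__":
    assert baseline_role("subj") == "ARG0"
    assert baseline_role("np#1") == "ARG1"
    assert baseline_role("prep to") == "ARGM"  # oblique dependents fall through to ARGM
    # Coarse-role classification, test set: baseline F1 0.856 vs. best system F1 0.897.
    print(f"{relative_error_reduction(0.856, 0.897):.1%}")  # -> 28.5%
```

Applied to the combined-task scores (0.747 vs. 0.783), the same formula gives roughly a 14% relative reduction.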
<Paragraph position="4"> To get a better sense of what is and is not being learned by the model, we compare the performance of individual verbs in the baseline system and in our best learned system. For this analysis, we restrict attention to verbs with at least 10 evaluation examples, to yield a reliable estimate of performance. Of these, 27 verbs have increased F1, 17 are unchanged, and 8 have decreased F1. We show the learned linkings for the 5 most and least improved verbs in Tables 4 and 5; to conserve space in those tables, ARG0 is abbreviated as 0, and prep to is abbreviated as to.</Paragraph>
<Paragraph position="5"> The improvement in the verb give comes from the model's learning the ditransitive alternation.</Paragraph>
<Paragraph position="6"> The improvements in work, pay, and look stem from the model's recognition that the oblique dependents are generated by a core semantic role.</Paragraph>
<Paragraph position="7"> Unfortunately, in some cases the model lumps different roles together, so the gains are not as large as they could be. The reason for this conservatism is the relatively high level of smoothing in the word distribution relative to the linking distribution. These smoothing parameters, set to optimize performance on the development set, prevent errors of spurious role formation on other verbs.</Paragraph>
<Paragraph position="8"> The improvement in the verb rise stems from the model correctly assigning separate roles to the amount risen, the source, and the destination.</Paragraph>
<Paragraph position="10"> The poor performance on the verb close stems from its idiosyncratic usage in the WSJ corpus; a typical use is &quot;In national trading, SFE shares closed yesterday at 31.25 cents a share, up 6.25 cents&quot; (wsj 0229). Our unsupervised system finds that the best explanation of this frequent use pattern is to give special roles to the temporal (yesterday), locative (at 31.25 cents), and manner (in trading) modifiers, none of which are recognized as roles by PropBank. The decrease in performance on leave stems from the model's inability to distinguish between the verb's two common senses (left Mary with the gift vs. left Mary alone), and from the fact that PropBank tags Mary as ARG1 in the first instance but as ARG2 (beneficiary) in the second. The errors in make and help result from the fact that in a phrase like make them unhappy, the Penn Treebank wraps them unhappy in a single S, so that our rules show only a single dependent following the verb: a complement clause (cl#1) with head word unhappy. Unfortunately, our system labels this clause ARG1 (complement clauses following the verb are usually ARG1), but PropBank labels it ARG2. The errors in the verb follow also stem from a sense confusion: the second followed the first vs. he followed the principles.</Paragraph>
</Section> </Paper>