<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1112"> <Title>by NSF award #IRI-9618797 STIMULATE: Generating Coherent Summaries of On-Line Documents: Combining</Title> <Section position="4" start_page="681" end_page="683" type="metho"> <SectionTitle> 3 Event Profile: WordNet and EVCA </SectionTitle> <Paragraph position="0"> Since our first intuition of the data suggested that articles with a preponderance of verbs of a certain semantic type might reveal aspects of document type, we tested the hypothesis that verbs could be used as a predictor in providing an event profile. We developed two algorithms to: (1) explore WordNet (WN-Verber) to cluster related verbs and build a set of verb chains in a document, much as Morris and Hirst (1991) used Roget's Thesaurus or as Hirst and St. Onge (1998) used WordNet to build noun chains; (2) classify verbs according to a semantic classification system, in this case using Levin's (1993) English Verb Classes and Alternations (EVCA-Verber) as a basis. For source material, we used the manually-parsed Linguistic Data Consortium's Wall Street Journal (WSJ) corpus, from which we extracted main verbs and complements of communication verbs to test the algorithms.
[Table: ... type from the Wall Street Journal (main and selected subordinate verbs, n = 10,295).
Verb Type | Sample Verbs | %
communication | say, announce, ... | 20%
support | have, get, go, ... | 30%
remainder | abuse, claim, offer, ... | 50%]</Paragraph> <Paragraph position="1"> Using WordNet. Our first technique was to use WordNet to build links between verbs and to provide a semantic profile of the document. WordNet is a general lexical resource in which words are organized into synonym sets, each representing one underlying lexical concept (Miller et al. 1990). These synonym sets - or synsets - are connected by different semantic relationships such as hypernymy (e.g. plunging is a way of descending), synonymy, antonymy, and others (see Fellbaum 1990). 
The determination of relatedness via taxonomic relations has a rich history (see Resnik 1993 for a review). The premise is that words with similar meanings will be located relatively close to each other in the hierarchy. Figure 1 shows the verbs cite and post, which are related via a common ancestor inform, ..., let know.</Paragraph> <Paragraph position="2"> The WN-Verber tool. We used the hypernym relationship in WordNet because of its high coverage. We counted the number of edges needed to find a common ancestor for a pair of verbs. Given the hierarchical structure of WordNet, the lower the edge count, in principle, the closer the verbs are semantically. Because WordNet</Paragraph> <Paragraph position="3"> allows individual words (via synsets) to be the descendant of possibly more than one ancestor, two words can often be related by more than one common ancestor via different paths, possibly with the same relationship (grandparent and grandparent), or with different relations (grandparent and uncle).</Paragraph> <Paragraph position="4"> Results from WN-Verber. We ran all articles longer than 10 sentences in the WSJ corpus (1236 articles) through WN-Verber. Output showed that several verbs - e.g. go, take, and say - participate in a very large percentage of the high-frequency synsets (approximately 30%).</Paragraph> <Paragraph position="5"> This is due to the width of the verb forest in WordNet (see Fellbaum 1990); top level verb synsets tend to have a large number of descendants which are arranged in fewer generations, resulting in a flat and bushy tree structure. For example, a top level verb synset, inform, ..., give information, let know has over 40 children, whereas a similar top level noun synset, entity, only has 15 children. As a result, using fewer than two levels resulted in groupings that were too limited to aggregate verbs effectively. 
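As an illustration of the edge counting described for WN-Verber, the following minimal Python sketch climbs hypernym links from two verbs and reports the ancestors they share, together with the number of edges from each verb. It is not the WN-Verber implementation: the toy hierarchy, the names TOY_HYPERNYMS, ancestors_within and common_ancestors, and the breadth-first search are all assumptions made for this example, and the links shown are invented rather than taken from WordNet. Parents are stored as lists so that a synset may have more than one ancestor.

```python
from collections import deque

# Toy hypernym graph: each synset name maps to its direct hypernyms (parents).
# These links are invented for illustration and are not actual WordNet data.
TOY_HYPERNYMS = {
    "cite": ["mention"],
    "mention": ["inform"],
    "post": ["announce"],
    "announce": ["inform"],
    "inform": [],
    "plunge": ["descend"],
    "descend": [],
}


def ancestors_within(synset, max_edges, hypernyms=TOY_HYPERNYMS):
    """Map every synset reachable by at most max_edges hypernym links
    (including the start synset at distance 0) to its edge count."""
    distances = {synset: 0}
    queue = deque([synset])
    while queue:
        current = queue.popleft()
        if distances[current] == max_edges:
            continue  # do not climb past the allowed number of edges
        for parent in hypernyms.get(current, []):
            if parent not in distances:  # first visit in BFS is the shortest path
                distances[parent] = distances[current] + 1
                queue.append(parent)
    return distances


def common_ancestors(verb_a, verb_b, max_edges=2, hypernyms=TOY_HYPERNYMS):
    """Return (ancestor, edges_from_a, edges_from_b) triples, closest first."""
    up_a = ancestors_within(verb_a, max_edges, hypernyms)
    up_b = ancestors_within(verb_b, max_edges, hypernyms)
    shared = [(name, up_a[name], up_b[name]) for name in up_a if name in up_b]
    return sorted(shared, key=lambda item: (item[1] + item[2], item[0]))


print(common_ancestors("cite", "post"))
print(common_ancestors("cite", "plunge"))
```

In this toy graph, cite and post meet at inform two edges above each verb, while cite and plunge share no ancestor within the bound. Lowering max_edges to 1 removes the cite/post link, which mirrors the observation that too few levels yield groupings that are too limited.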
Thus, for our system, we allowed up to two edges to intervene between a common ancestor synset and each of the verbs' respective synsets, as in Figure 1.</Paragraph> <Paragraph position="7"> our system.</Paragraph> <Paragraph position="8"> In addition to the problem of the flat nature of the verb hierarchy, our results from WN-Verber are degraded by ambiguity; similar effects have been reported for nouns. Verbs with differences in high versus low frequency senses caused certain verbs to be incorrectly related; for example, have and drop are related by the synset meaning &quot;to give birth&quot; although this sense of drop is rare in WSJ.</Paragraph> <Paragraph position="9"> The results of WN-Verber in Table 2 reflect the effects of bushiness and ambiguity. The five most frequent synsets are given in column 1; column 2 shows some typical verbs which participate in the clustering; column 3 shows the type of article which tends to contain these synsets. Most articles (864/1236 = 70%) end up in the top five nodes. This illustrates the ineffectiveness of these most frequent WordNet synsets in discriminating between article types.</Paragraph> <Paragraph position="10">
Synset | Sample verbs | Article types
Act (interact, act together, ...) | have, relate, give, tell | announcements, editorials, features
Communicate (communicate, intercommunicate, ...) | give, get, inform, tell | announcements, editorials, features, poems
Change (change) | have, modify, take | poems, editorials, announcements, features
Alter (alter, change) | convert, make, get | announcements, poems, editorials
Inform (inform, round on, ...) | inform, explain, describe | announcements, poems, features
</Paragraph> </Section> <Section position="5" start_page="683" end_page="684" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="683" end_page="684" type="sub_section"> <SectionTitle> Evaluation using Kendall's Tau. 
</SectionTitle> <Paragraph position="0"> We sought independent confirmation to assess the correlation between two variables' rank for WN-Verber results. To evaluate the effects of one synset's frequency on another, we used Kendall's tau (τ) rank order statistic (Kendall 1970). For example, was it the case that verbs under the synset act tend not to occur with verbs under the synset think? If so, do articles with this property fit a particular profile? In our results, we have information about synset frequency, where each of the 1236 articles in the corpus constitutes a sample. Table 3 shows the results of calculating Kendall's τ, with considerations for ranking ties, for all (10 choose 2) = 45 pairing combinations of the top 10 most frequently occurring synsets. Correlations can range from -1.0, reflecting inverse correlation, to +1.0, showing direct correlation, i.e. the presence of one class increases as the presence of the correlated verb class increases. A τ value of 0 would show that the two variables' values are independent of each other.</Paragraph> <Paragraph position="1"> Results show a significant positive correlation between the synsets. The range of correlation is from .850 between the communication verb synset (give, get, inform, ...) and the act verb synset (have, relate, give, ...) to .238 between the think verb synset (plan, study, give, ...) and the change state verb synset (fall, come, close, ...).</Paragraph> <Paragraph position="2"> These correlations show that frequent synsets do not behave independently of each other and thus confirm that the WordNet results are not an effective way to achieve document discrimination. Although the WordNet results were not discriminatory, we were still convinced that our initial hypothesis on the role of verbs in determining event profile was worth pursuing.</Paragraph> <Paragraph position="3"> We believe that these results are a by-product of lexical ambiguity and of the richness of the WordNet hierarchy. 
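To make the rank statistic used in this evaluation concrete, the following minimal Python sketch computes Kendall's tau with a tie correction over per-article frequencies of two synsets. The text states only that ties were taken into account; the tau-b variant used here, the function name kendall_tau_b, and the frequency lists are assumptions made for this example, and the numbers are invented rather than drawn from the WSJ corpus.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b for two equal-length sequences, with a tie correction."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from every count
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1  # both variables order the pair the same way
        else:
            discordant += 1  # the variables order the pair in opposite ways
    untied_x = concordant + discordant + ties_y_only  # pairs not tied on x
    untied_y = concordant + discordant + ties_x_only  # pairs not tied on y
    if untied_x == 0 or untied_y == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / sqrt(untied_x * untied_y)


# Hypothetical per-article frequencies of two synsets (one value per article).
act_freq = [5, 3, 8, 1, 4]
communicate_freq = [6, 2, 9, 1, 5]
print(kendall_tau_b(act_freq, communicate_freq))

# Number of distinct pairs among ten synsets.
print(len(list(combinations(range(10), 2))))
```

A value near +1.0 means the two synsets rise and fall together across articles, which is the pattern reported for the frequent WordNet synsets; the second print confirms the count of 45 pairings for ten synsets.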
We thus decided to pursue a new approach to test our hypothesis, one which turned out to provide us with clearer and more robust results.</Paragraph> <Paragraph position="4"> [Table 3: ... synsets. Column headers: act, com, chng, alter, infm, exps, thnk, judg, trnf.]</Paragraph> <Paragraph position="5"> Utilizing EVCA. A different approach to test the hypothesis was to use another semantic categorization method; we chose the semantic classes of Levin's EVCA as a basis for our next analysis. [Footnote 3: Strictly speaking, our classification is based on EVCA. Although many of our classes are precisely defined in terms of EVCA tests, we did impose some extensions. For example, support verbs are not an EVCA category.] Levin's seminal work is based on the time-honored observation that verbs which participate in similar syntactic alternations tend to share semantic properties. Thus, the behavior of a verb with respect to the expression and interpretation of its arguments can be said to be, in large part, determined by its meaning. Levin has meticulously set out a list of syntactic tests (about 100 in all), which predict membership in no fewer than 48 classes, each of which is divided into numerous sub-classes. The rigor and thoroughness of Levin's study permitted us to encode our algorithm, EVCA-Verber, on a sub-set</Paragraph> <Paragraph position="6"> of the EVCA classes, ones which were frequent in our corpus. First, we manually categorized the 100 most frequent verbs, as well as 50 additional verbs, which together cover 56% of the verbs by token in the corpus. We subjected each verb to a set of strict linguistic tests, as shown in Table 4, and verified primary verb usage against the corpus.</Paragraph> <Paragraph position="7"> (1) Does this involve a transfer of ideas? 
(2) X verbed &quot;something.&quot; (1) *&quot;X verbed without moving&quot;.</Paragraph> <Paragraph position="8"> (1) &quot;They verbed to join forces.&quot; (2) involves more than one participant.</Paragraph> <Paragraph position="9"> (1) &quot;They verbed (over) the issue.&quot; (2) indicates conflicting views.</Paragraph> <Paragraph position="10"> (3) involves more than one participant.</Paragraph> <Paragraph position="11"> (1) X verbed Y (to happen/happened).</Paragraph> <Paragraph position="12"> (2) X brings about a change in Y.</Paragraph> <Paragraph position="13"> Results from EVCA-Verber. To compare article types and emphasize their differences, we selected articles that had the highest percentage of a particular verb class from each of the ten verb classes; we chose five articles from each EVCA class, yielding a total of 50 articles for analysis from the full set of 1236 articles. We observed that each class discriminated between different article types as shown in Table 5. In contrast to Table 2, the article types are well discriminated by verb class. For example, a concentration of communication class verbs (say, report, announce, ...) indicated that the article type was a general announcement of short or medium length, or a longer feature article with many opinions in the text. Articles high in motion verbs were also announcements, but differed from the communication ones in that they were commonly postings of company earnings reaching a new high or dropping from last quarter. Agreement and argument verbs appeared in many of the same articles, involving issues of some controversy.</Paragraph> <Paragraph position="14"> However, we noted that articles with agreement verbs were a superset of the argument ones in that, in our corpus, argument verbs did not appear in articles concerning joint ventures and mergers. 
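The selection step described at the start of these results, choosing the five articles with the highest percentage of each verb class, can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say what the percentage is taken over, so the sketch assumes the share of an article's classified verb tokens; the names ARTICLES, class_share and top_articles are hypothetical, and the counts are invented.

```python
# Hypothetical per-article counts of verb tokens by class (invented numbers).
ARTICLES = {
    "a1": {"communication": 8, "motion": 1, "agreement": 1},
    "a2": {"communication": 2, "motion": 6, "agreement": 2},
    "a3": {"communication": 5, "motion": 5},
    "a4": {"communication": 1, "motion": 1, "agreement": 8},
}


def class_share(verb_counts, verb_class):
    """Fraction of an article's classified verb tokens that belong to verb_class."""
    total = sum(verb_counts.values())
    if total == 0:
        return 0.0
    return verb_counts.get(verb_class, 0) / total


def top_articles(articles, verb_class, k=5):
    """Return the ids of the k articles with the highest share of verb_class.

    Ties are broken by article id so that the result is deterministic.
    """
    ranked = sorted(
        articles,
        key=lambda article_id: (-class_share(articles[article_id], verb_class), article_id),
    )
    return ranked[:k]


print(top_articles(ARTICLES, "communication", k=2))
print(top_articles(ARTICLES, "motion", k=2))
```

Running top_articles once per verb class with k=5 over the full article set would give the kind of 50-article sample the text describes for ten classes.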
Articles marked by causative class verbs tended to be a bit longer, possibly reflecting prose on both the cause and effect of a particular action. We also used EVCA-Verber to investigate articles marked by the absence of members of each verb class, such as articles lacking any verbs in the motion verb class. However, we found that absence of a verb class was not discriminatory.</Paragraph> <Paragraph position="15"> Evaluation of EVCA verb classes. To strengthen the observation that articles dominated by verbs of one class reflect distinct article types, we verified that the verb classes behaved independently of each other. Correlations for EVCA classes are shown in Table 6. These show a markedly lower level of correlation between verb classes than the results for WordNet synsets, the range being from .265 between motion and aspectual verbs to -.026 for motion verbs and agreement verbs. These low values of τ for pairs of verb classes reflect the independence of the classes. For example, the communication and experience verb classes are weakly correlated; this, we surmise, may be due to the different ways opinions can be expressed, i.e. as factual quotes using communication class verbs or as beliefs using experience class verbs.</Paragraph> <Paragraph position="16"> [Table 6 column headers: comun, motion, agree, argue, exp, aspect, cause.]</Paragraph> </Section> </Section> </Paper>