<?xml version="1.0" standalone="yes"?> <Paper uid="P06-4004"> <Title>Valido: a Visual Tool for Validating Sense Annotations</Title> <Section position="4" start_page="0" end_page="13" type="metho"> <SectionTitle> 2 Semantic Interconnections </SectionTitle> <Paragraph position="0"> Semantic graphs are a notation developed to represent knowledge explicitly as a set of conceptual entities and their interrelationships.</Paragraph> <Paragraph position="1"> Fields such as the analysis of lexical text cohesion (Morris and Hirst, 1991), word sense disambiguation (Agirre and Rigau, 1996; Mihalcea and Moldovan, 2001), and ontology learning (Navigli and Velardi, 2005) have benefited from the availability of wide-coverage computational lexicons such as WordNet (Fellbaum, 1998), as well as semantically annotated corpora such as SemCor (Miller et al., 1993).</Paragraph> <Paragraph position="2"> Recently, a knowledge-based algorithm for Word Sense Disambiguation, called Structural Semantic Interconnections (SSI) (Navigli and Velardi, 2004), has been shown to provide insight into the choice of word senses by offering structural justifications in terms of semantic graphs.</Paragraph> <Paragraph position="3"> SSI exploits an extensive lexical knowledge base, built upon the WordNet lexicon and enriched with collocation information representing semantic relatedness between sense pairs. Collocations are acquired from existing resources (like the Oxford Collocations, the Longman Language Activator, collocation web sites, etc.). Each collocation is mapped to the WordNet sense inventory in a semi-automatic manner and transformed into a relatedness edge (Navigli and Velardi, 2005).</Paragraph> <Paragraph position="4"> Given a word context $C = \{w_1, \ldots, w_k\}$, SSI builds a graph $G = (V, E)$ such that $V = \bigcup_{i=1}^{k} \mathrm{Senses}_{WN}(w_i)$ and $(s, s') \in E$ if there is at least one semantic interconnection between $s$ and $s'$ in the lexical knowledge base.</Paragraph> <Paragraph position="6"> 
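The graph construction can be illustrated with a small sketch. It assumes a toy sense inventory and a precomputed set of sense pairs known to be interconnected; the names build_sense_graph, senses_of and interconnected are hypothetical and stand in for WordNet lookups and for the search over the lexical knowledge base, neither of which is reproduced here. Treating edges as undirected is also an assumption.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def build_sense_graph(
    context: Sequence[str],
    senses_of: Dict[str, List[str]],
    interconnected: Set[FrozenSet[str]],
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Return (V, E) for a word context.

    V is the union of the senses of every word in the context.
    E holds a pair (s, t) whenever the toy knowledge base records
    at least one interconnection between s and t.
    """
    vertices: Set[str] = set()
    for word in context:
        vertices.update(senses_of.get(word, []))

    edges: Set[Tuple[str, str]] = set()
    # Sorting makes each undirected pair appear once, in a stable order.
    for s, t in combinations(sorted(vertices), 2):
        if frozenset((s, t)) in interconnected:
            edges.add((s, t))
    return vertices, edges


# Toy data (hypothetical sense labels, not real WordNet senses).
senses_of = {"a": ["a#1", "a#2"], "b": ["b#1"]}
interconnected = {frozenset(("a#1", "b#1"))}
V, E = build_sense_graph(["a", "b"], senses_of, interconnected)
```

In this toy run V contains all three senses, while E contains only the pair for which an interconnection is recorded.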
A semantic interconnection pattern is a relevant sequence of edges selected according to a manually created context-free grammar, i.e. a path connecting a pair of word senses, possibly including a number of intermediate concepts. The grammar consists of a small number of rules, inspired by the notion of lexical chains (Morris and Hirst, 1991). An excerpt of the context-free grammar encoding semantic interconnection patterns for the WordNet lexicon is reported in Table 1. For the full set of interconnections the reader can refer to Navigli and Velardi (2004).</Paragraph> <Paragraph position="7"> SSI performs disambiguation iteratively, maintaining a set $\mathcal{C}$ of senses as a semantic context. Initially, $\mathcal{C} = V$ (the entire set of senses of the words in $C$). At each step, for each sense $s$ in $\mathcal{C}$, the algorithm calculates a score of the degree of connectivity between $s$ and the other senses in $\mathcal{C}$: $$\mathrm{Score}(s) = \frac{\sum_{s' \in \mathcal{C} \setminus \{s\}} \sum_{i \in IC(s, s')} \frac{1}{\mathrm{length}(i)}}{\sum_{s' \in \mathcal{C} \setminus \{s\}} |IC(s, s')|}$$</Paragraph> <Paragraph position="9"> where $IC(s, s')$ is the set of interconnections between senses $s$ and $s'$. The contribution of a single interconnection $i$ is given by the reciprocal of its length $\mathrm{length}(i)$, calculated as the number of edges connecting its ends. The overall degree of connectivity is then normalized by the number of contributing interconnections. The highest ranking sense $s$ of word $w$ is chosen and the senses of $w$ are removed from the semantic context $\mathcal{C}$. The algorithm terminates when either $\mathcal{C} = \emptyset$ or there is no sense whose score exceeds a fixed threshold.</Paragraph> </Section> <Section position="5" start_page="13" end_page="14" type="metho"> <SectionTitle> 3 The Tool: Valido </SectionTitle> <Paragraph position="0"> Based on SSI, we developed a visual tool, Valido, to support the validator in the difficult task of assessing the quality and suitability of sense annotations.</Paragraph> <Paragraph position="2"> Table 1: An excerpt of the context-free grammar for the recognition of semantic interconnections.</Paragraph> <Paragraph position="3"> 
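Since the tool relies on SSI to justify sense choices, the connectivity score described in Section 2 can be made concrete with a short sketch. It only covers the scoring of one sense against a semantic context, not the full iterative procedure. The function name connectivity_score and the dictionary of interconnection lengths are hypothetical, and returning 0.0 when a sense has no interconnections is an assumption made here to avoid a division by zero.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical store: for each unordered sense pair, the lengths (edge counts)
# of the interconnections found between the two senses.
Interconnections = Dict[FrozenSet[str], List[int]]


def connectivity_score(s: str, context: Iterable[str], ic: Interconnections) -> float:
    """Average reciprocal length of the interconnections between s and the context."""
    lengths: List[int] = []
    for other in context:
        if other != s:
            lengths.extend(ic.get(frozenset((s, other)), []))
    if not lengths:
        return 0.0  # assumption: an unconnected sense scores zero
    # Each interconnection contributes 1 / length; the sum is normalized
    # by the number of contributing interconnections.
    return sum(1.0 / n for n in lengths) / len(lengths)


# Toy data (hypothetical sense labels and path lengths).
ic = {
    frozenset(("x#1", "y#1")): [1, 2],
    frozenset(("x#2", "y#1")): [4],
}
context = {"x#1", "x#2", "y#1"}
# One selection step: pick the highest ranking sense in the context.
best = max(sorted(context), key=lambda s: connectivity_score(s, context, ic))
```

With these toy lengths, sense x#1 scores (1/1 + 1/2) / 2 = 0.75 and is ranked highest.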
The tool takes as input a corpus of documents whose sentences were previously tagged by one or more annotators with word senses from the WordNet inventory. The corpus can be supplied in XML format, as specified on the initial page.</Paragraph> <Paragraph position="4"> The user can browse the sentences and, where the annotators disagree, adjudicate one choice over the others. To assist the user in the validation task, the tool highlights each word in a sentence in one of three colors: green for words with full agreement, red for words where no agreement can be found, and orange for words to which a validation policy can be applied.</Paragraph> <Paragraph position="5"> A validation policy is a strategy for suggesting a default sense choice to the validator in case of disagreement. Initially, the validator can choose one of four validation policies to be applied to those words with disagreement on which sense to assign: ($\alpha$) majority voting: if there exists a sense $s \in S_A$ (the set of senses chosen by the annotators in $A$) such that $\frac{|\{a \in A \mid a \text{ annotated } w \text{ with } s\}|}{|A|} \geq \frac{1}{2}$, $s$ is proposed as the preferred sense for $w$; ($\beta$) majority voting + SSI: the same as the previous policy, with the addition that if there exists no sense chosen by a majority of annotators, SSI is applied to $w$, and the sense chosen by the algorithm, if any, is proposed to the validator; ($\gamma$) SSI: the SSI algorithm is applied to $w$, and the chosen sense, if any, is proposed to the validator; ($\delta$) no validation: $w$ is left untagged.</Paragraph> <Paragraph position="6"> Notice that for policies ($\beta$) and ($\gamma$) Valido applies the SSI algorithm to $w$ in the context of its sentence, taking into account for disambiguation only the senses in $S_A$ (i.e. the set of senses chosen by the annotators). 
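The four policies (majority voting, majority voting + SSI, SSI, and no validation) can be sketched as a small dispatch function. This is an illustration under stated assumptions, not the tool's code: the policy names as strings, the functions majority_sense and propose, and the ssi callback (a stand-in for the SSI algorithm restricted to the senses chosen by the annotators) are all hypothetical. The majority threshold is read as "at least half of the annotators", and ties are broken by first occurrence, which is also an assumption.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence

# Hypothetical stand-in for SSI: receives the candidate senses and
# returns the chosen one, or None if no sense is selected.
Disambiguator = Callable[[List[str]], Optional[str]]


def majority_sense(annotations: Sequence[str]) -> Optional[str]:
    """Return a sense chosen by at least half of the annotators, if any."""
    if not annotations:
        return None
    sense, votes = Counter(annotations).most_common(1)[0]
    # 2 * votes >= |A| is the integer form of votes / |A| >= 1/2.
    return sense if 2 * votes >= len(annotations) else None


def propose(
    policy: str,
    annotations: Sequence[str],
    ssi: Optional[Disambiguator] = None,
) -> Optional[str]:
    """Suggest a default sense for one word, given one sense per annotator."""
    # SSI only considers the senses actually chosen by the annotators.
    candidates = sorted(set(annotations))
    if policy == "majority":
        return majority_sense(annotations)
    if policy == "majority+ssi":
        sense = majority_sense(annotations)
        if sense is None and ssi is not None:
            sense = ssi(candidates)
        return sense
    if policy == "ssi":
        return ssi(candidates) if ssi is not None else None
    if policy == "none":
        return None  # the word is left untagged
    raise ValueError("unknown policy: " + policy)
```

For example, propose("majority", ["s1", "s2", "s2"]) suggests "s2", while three annotators choosing three different senses yield no suggestion unless an SSI-like callback is supplied under the "majority+ssi" policy.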
In general, given a set of words with disagreement $W$, SSI is applied to $W$ using as a fixed context the agreed senses chosen for the remaining words of the sentence, i.e. those not in $W$.</Paragraph> <Paragraph position="7"> Also note that the suggestion of a sense choice, marked in orange based on the validation policy, is just a proposal and can be freely modified by the validator, as explained hereafter.</Paragraph> <Paragraph position="8"> Before starting the interface, the validator can also choose whether to add a virtual annotator $a_{SSI}$ to the set of annotators $A$. This virtual annotator tags each word $w$ in the sentence with the sense chosen by applying the SSI algorithm to that sentence. As a result, the selected validation policy will be applied to the new set of annotators $A' = A \cup \{a_{SSI}\}$. This is especially useful when $|A| = 1$ (e.g. in the automatic application of a single word sense disambiguation system), that is, when validation policies would otherwise be of no use.</Paragraph> <Paragraph position="9"> Figure 1 illustrates the interface of the tool: the top pane shows the sentence at hand, marked with colors as explained above. The main pane shows the semantic interconnections between senses for which either there is full agreement or the chosen validation policy can be applied. When the user clicks on a word $w$, the left pane reports the sense inventory for $w$, including information about the hypernym, definition and usage for each sense of $w$. The validator can then click on a sense and see how the semantic graph shown in the main pane changes after the selection, possibly resulting in a different number and strength of semantic interconnection patterns supporting that sense choice. 
For each sense in the left pane, the annotators in $A$ who favoured that choice are listed (for instance, in the figure annotator #1 chose sense #1 of street, while annotator #2 as well as SSI chose sense #2).</Paragraph> <Paragraph position="10"> If the validator decides that a certain word sense is more convincing based on its semantic graph, (s)he can select that sense as the final choice by clicking on the validate button at the top of the left pane. To validate the present sense choices of all the disagreed words at once, (s)he can press the validate all button in the top pane. As a result, the present selection of senses will be chosen as the final configuration for the entire sentence at hand.</Paragraph> <Paragraph position="11"> In the top pane, an icon beside each disagreed word shows the validation status of the word: a question mark indicates that the disagreement has not yet been resolved, while a checkmark indicates that the validator has resolved the disagreement.</Paragraph> </Section> </Paper>