<?xml version="1.0" standalone="yes"?> <Paper uid="P03-2029"> <Title>Word Sense Disambiguation Using Pairwise Alignment</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Our Method </SectionTitle> <Paragraph position="0"> Our method combines features of both the association approach and the selectional restriction approach. It can be applied to various sentence types because it handles both local (direct) dependencies and whole-sentence dependencies. The method consists of the following steps. Step 1. Parse the input sentence with a syntactic parser (see Section 4), and find all paths from the root to the leaves of the resulting dependency tree.</Paragraph> <Paragraph position="1"> Step 2. Compare the paths from Step 1 with the prototype paths prepared for each sense of the target word.</Paragraph> <Paragraph position="2"> Step 3. For each sense, sum the similarities between each prototype path and the input paths.</Paragraph> <Paragraph position="3"> Step 4. Select the sense with the maximum sum.</Paragraph> <Paragraph position="4"> We describe the method in detail below. We consider the paths from the root to the leaves of a dependency tree. For example, consider the sentence &quot;we consider a path in a graph&quot;. Its dependency structure has three leaves and consequently three root-to-leaf paths: (consider, SUB, we), (consider, OBJ, path, a) and (consider, OBJ, path, in, graph, a). The elements &quot;SUB&quot; and &quot;OBJ&quot; are added automatically by rules in order to distinguish clearly between verb-subject and verb-object relations. We believe that such word sequences serve as a good clue for WSD, and we regard the set of sequences obtained from an input sentence as the context of the target word.</Paragraph> <Paragraph position="5"> The general intuition for WSD is that words with similar contexts have the same sense (Charniak, 1993; Lin, 1997). 
That is, once we prepare prototype sequences for each sense, we can determine the sense of the target word as the one with the most similar prototype set. We measure the similarity between a set of prototype sequences $T$ and a set of sequences from the input sentence $T'$. Let $T$ and $T'$ have the sets of sequences $P_T = \{p_1, \ldots\}$ and $P_{T'} = \{p'_1, \ldots\}$, respectively, where $p_i$ and $p'_j$ are sequences of words.</Paragraph> <Paragraph position="7"> Figure 1: Prototype sequences for the verb &quot;fire&quot;.
fire: go off or discharge
  fire, SUB, person
  fire, OBJ, [weapon, rocket]
  fire, [on, upon, at], physical object
  fire, *, load, [into, with], weapon
  fire, *, set up, OBJ, weapon
fire: terminate the employment
  fire, SUB, company
  fire, OBJ, [person, people, staff]
  fire, from, organization
  fire, *, hire
  fire, *, job
</Paragraph> <Paragraph position="9"> We define the similarity between $T$ and $T'$ by equation (1).</Paragraph> <Paragraph position="11"> In equation (1), the term taking the pair $(p_i, p'_j)$ is the alignment score between the sequences $p_i$ and $p'_j$, defined in the next section, and $f_i$ is a weight function specific to the sequence $p_i$, defined by equation (2).</Paragraph> <Paragraph position="13"> In equation (2), $u_i$ and $v_i$ are arbitrary constants and $t_i$ is an arbitrary threshold.</Paragraph> <Paragraph position="14"> Using equation (1), we can estimate the similarity between the context of a target word and a prototype context, and determine the sense of the target word by selecting the prototype with the maximum similarity. An example of the prototype sequences for the verb &quot;fire&quot; is shown in Figure 1. A prototype sequence is written like a regular expression. At present, we construct the sequences by hand. The basic policy for constructing prototypes is to observe the features common to the dependency trees in which the target word is used in the same sense. 
We have some ideas for obtaining prototypes automatically.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Pairwise Alignment </SectionTitle> <Paragraph position="0"> We apply pairwise alignment to measure the similarity between sequences.</Paragraph> <Paragraph position="1"> Pairwise alignment is now widely used in molecular biology research as a basic method for measuring the similarity between proteins or DNA sequences (Mitaku and Kanehisa, 1995).</Paragraph> <Paragraph position="4"> There are several ways to find a pairwise alignment, such as methods based on dynamic programming (DP) and methods based on finite-state automata (Durbin et al., 1998). We use the method based on a DP matrix, as in Figure 2, which shows the pairwise alignment between the sequences $p$ = (worked, at, composition, the) and $p'$ = (is, make, at, home) as an example.</Paragraph> <Paragraph position="5"> In the matrix, a vertical or horizontal transition means a gap and is assigned a gap score. A diagonal transition means a substitution and is assigned a score based on the similarity between the two words corresponding to that point in the matrix. At each node, a value is calculated from the values already calculated at its three preceding nodes.</Paragraph> <Paragraph position="7"> The gap terms substitute $w'_j$ and $w_i$, respectively, with a gap (-) and return the gap score. The substitution term $\mathrm{subst}(w_i, w'_j)$ is scored using the synsets of the two words on the WordNet hierarchy (Miller et al., 1990). For simplicity, we define $\mathrm{subst}(w, w')$ based on the semantic distance (Stetina and Nagao, 1998),</Paragraph> <Paragraph position="11"> where $sd(s_i, s'_j)$ is the semantic distance between two synsets $s_i$ and $s'_j$.</Paragraph> <Paragraph position="13"> Because $0 \le sd(s_i, s'_j) \le 1$, we have $-1 \le \mathrm{subst}(w, w') \le 1$. 
The score for substituting identical words is $1$, and the score for two words with no common ancestor in the hierarchy is $-1$. We simply define the gap score as $-1$.</Paragraph> </Section> </Paper>
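<!-- Illustrative sketch, not part of the original paper. The DP matrix described in Section 3 and the sense selection of Steps 3 and 4 can be sketched as follows. Several points are assumptions made only for illustration: the recurrence is the standard global-alignment (Needleman-Wunsch) form, which matches the description of gap and substitution transitions; the substitution score is a placeholder that returns 1 for identical words and -1 otherwise, instead of the WordNet-based score; each prototype path is scored against its best-matching input path; the weight function f_i is taken to be the identity; and the bracket and wildcard notation of prototype sequences is not interpreted. The names alignment_score, select_sense and exact_match_subst are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

GAP_SCORE = -1.0  # the text defines the gap score as -1


def exact_match_subst(w: str, w_prime: str) -> float:
    """Placeholder substitution score (assumption): 1 if identical, else -1.

    The paper derives this score from a WordNet semantic distance instead.
    """
    return 1.0 if w == w_prime else -1.0


def alignment_score(
    p: Sequence[str],
    p_prime: Sequence[str],
    subst: Callable[[str, str], float] = exact_match_subst,
    gap: float = GAP_SCORE,
) -> float:
    """Global alignment score of two word sequences via a DP matrix."""
    rows, cols = len(p) + 1, len(p_prime) + 1
    f = [[0.0] * cols for _ in range(rows)]
    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        f[i][0] = f[i - 1][0] + gap
    for j in range(1, cols):
        f[0][j] = f[0][j - 1] + gap
    # Each node is computed from its three preceding nodes.
    for i in range(1, rows):
        for j in range(1, cols):
            f[i][j] = max(
                f[i - 1][j - 1] + subst(p[i - 1], p_prime[j - 1]),  # diagonal
                f[i - 1][j] + gap,  # vertical: gap
                f[i][j - 1] + gap,  # horizontal: gap
            )
    return f[-1][-1]


def select_sense(
    prototypes: Dict[str, List[Sequence[str]]],
    input_paths: List[Sequence[str]],
    subst: Callable[[str, str], float] = exact_match_subst,
) -> str:
    """Return the sense whose prototype paths give the largest summed score.

    Assumption: each prototype path contributes its best alignment score
    over all input paths, with an identity weight function.
    """
    def total(paths: List[Sequence[str]]) -> float:
        return sum(
            max(alignment_score(p, q, subst) for q in input_paths)
            for p in paths
        )

    return max(prototypes, key=lambda sense: total(prototypes[sense]))
```

A WordNet-based substitution function can be passed through the subst parameter in place of the placeholder; the rest of the sketch is unchanged by that choice. -->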