<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1045">
  <Title>Selection of Effective Contextual Information for Automatic Synonym Acquisition</Title>
  <Section position="4" start_page="353" end_page="354" type="metho">
    <SectionTitle>
2 Contextual Information
</SectionTitle>
    <Paragraph position="0"> In this study, we focused on three kinds of contextual information: dependency between words, sentence co-occurrence, and proximity, that is, co-occurrence with other words in a window, details of which are provided the following sections.</Paragraph>
    <Section position="1" start_page="353" end_page="354" type="sub_section">
      <SectionTitle>
2.1 Dependency
</SectionTitle>
      <Paragraph position="0"> The first category of the contextual information we employed is the dependency between words in a sentence, which we suppose is most commonly used for synonym acquisition as the context of words. The dependency here includes predicate-argument structure such as subjects and objects of verbs, and modifications of nouns. As the extraction of accurate and comprehensive grammatical relations is in itself a difficult task, the so- null phisticated parser RASP Toolkit (Briscoe and Carroll, 2002) was utilized to extract this kind of word relations. RASP analyzes input sentences and provides wide variety of grammatical information such as POS tags, dependency structure, and parsed trees as output, among which we paid attention to dependency structure called grammatical relations (GRs) (Briscoe et al., 2002).</Paragraph>
      <Paragraph position="1"> GRs represent relationship among two or more words and are specified by the labels, which construct the hierarchy shown in Figure 1. In this hierarchy, the upper levels correspond to more general relations whereas the lower levels to more specific ones. Although the most general relationship in GRs is &amp;quot;dependent&amp;quot;, more specific labels are assigned whenever possible. The representation of the contextual information using GRs is as follows. Take the following sentence for example: Shipments have been relatively level since January, the Commerce Department noted.</Paragraph>
      <Paragraph position="2"> RASP outputs the extracted GRs as n-ary relations as follows:</Paragraph>
      <Paragraph position="4"/>
      <Paragraph position="6"> While most of GRs extracted by RASP are binary relations of head and dependent, there are some relations that contain additional slot or extra information regarding the relations, as shown &amp;quot;ncsubj&amp;quot; and &amp;quot;ncmod&amp;quot; in the above example. To obtain the final representation that we require for synonym acquisition, that is, the co-occurrence between words and their contexts, these relationships must be converted to binary relations, i.e., co-occurrence. We consider the concatenation of all the rest of the target word as context:  The slot for the target word is replaced by &amp;quot;*&amp;quot; in the context. Note that only the contexts for nouns are extracted because our purpose here is the automatic extraction of synonymous nouns.</Paragraph>
    </Section>
    <Section position="2" start_page="354" end_page="354" type="sub_section">
      <SectionTitle>
2.2 Sentence Co-occurrence
</SectionTitle>
      <Paragraph position="0"> As the second category of contextual information, we used the sentence co-occurrence, i.e., which sentence words appear in. Using this context is, in other words, essentially the same as featuring words with the sentences in which they occur.</Paragraph>
      <Paragraph position="1"> Treating single sentences as documents, this featuring corresponds to exploiting transposed term-document matrix in the information retrieval context, and the underlying assumption is that words that commonly appear in the similar documents or sentences are considered semantically similar.</Paragraph>
    </Section>
    <Section position="3" start_page="354" end_page="354" type="sub_section">
      <SectionTitle>
2.3 Proximity
</SectionTitle>
      <Paragraph position="0"> The third category of contextual information, proximity, utilizes tokens that appear in the vicinity of the target word in a sentence. The basic assumption here is that the more similar the distribution of proceeding and succeeding words of the target words are, the more similar meaning these two words possess, and its effectiveness has been previously shown (Macro Baroni and Sabrina Bisi, 2004). To capture the word proximity, we consider a window with a certain radius, and treat the label of the word and its position within the window as context. The contexts for the previous example sentence, when the window radius is 3, are then:  Note that the proximity includes tokens such as punctuation marks as context, because we suppose they offer useful contextual information as well.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="354" end_page="354" type="metho">
    <SectionTitle>
3 Synonym Acquisition Method
</SectionTitle>
    <Paragraph position="0"> The purpose of the current study is to investigate the impact of the contextual information selection, not the language model itself, we employed one of the most commonly used method: vector space model (VSM) and tf*idf weighting scheme. In this framework, each word is represented as a vector in a vector space, whose dimensions correspond to contexts. The elements of the vectors given by tf*idf are the co-occurrence frequencies of words and contexts, weighted by normalized idf. That is, denoting the number of distinct words and contexts as N and M, respectively,</Paragraph>
    <Paragraph position="2"> where tf(wi,cj) is the co-occurrence frequency of word wi and context cj. idf(cj) is given by</Paragraph>
    <Paragraph position="4"> where df(cj) is the number of distinct words that co-occur with context cj.</Paragraph>
    <Paragraph position="5"> Although VSM and tf*idf are naive and simple compared to other language models like LSI and PLSI, they have been shown effective enough for the purpose (Hagiwara et al., 2005). The similarity between two words are then calculated as the cosine value of two corresponding vectors.</Paragraph>
  </Section>
  <Section position="6" start_page="354" end_page="356" type="metho">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> This section describes the evaluation methods we employed for automatic synonym acquisition. The evaluation is to measure how similar the obtained similarities are to the &amp;quot;true&amp;quot; similarities. We firstly prepared the reference similarities from the existing thesaurus WordNet as described in Section 4.1,  and by comparing the reference and obtained similarities, two evaluation measures, discrimination rate and correlation coefficient, are calculated automatically as described in Sections 4.2 and 4.3.</Paragraph>
    <Section position="1" start_page="355" end_page="355" type="sub_section">
      <SectionTitle>
4.1 Reference similarity calculation using
WordNet
</SectionTitle>
      <Paragraph position="0"> As the basis for automatic evaluation methods, the reference similarity, which is the answer value that similarity of a certain pair of words &amp;quot;should take,&amp;quot; is required. We obtained the reference similarity using the calculation based on thesaurus tree structure (Nagao, 1996). This calculation method requires no other resources such as corpus, thus it is simple to implement and widely used.</Paragraph>
      <Paragraph position="1"> The similarity between word sense wi and word sense vj is obtained using tree structure as follows.</Paragraph>
      <Paragraph position="2"> Let the depth1 of node wi be di, the depth of node vj be dj, and the maximum depth of the common ancestors of both nodes be ddca. The similarity between wi and vj is then calculated as</Paragraph>
      <Paragraph position="4"> which takes the value between 0.0 and 1.0.</Paragraph>
      <Paragraph position="5"> Figure 2 shows the example of calculating the similarity between the word senses &amp;quot;hill&amp;quot; and &amp;quot;coast.&amp;quot; The number on the side of each word sense represents the word's depth. From this tree structure, the similarity is obtained:</Paragraph>
      <Paragraph position="7"> The similarity between word w with senses w1,...,wn and word v with senses v1,...,vm is defined as the maximum similarity between all the pairs of word senses: sim(w,v) = maxi,j sim(wi,vj), (5) whose idea came from Lin's method (Lin, 1998).</Paragraph>
    </Section>
    <Section position="2" start_page="355" end_page="356" type="sub_section">
      <SectionTitle>
4.2 Discrimination Rate
</SectionTitle>
      <Paragraph position="0"> The following two sections describe two evaluation measures based on the reference similarity.</Paragraph>
      <Paragraph position="1"> The first one is discrimination rate (DR). DR, originally proposed by Kojima et al. (2004), is the rate  (percentage) of pairs (w1,w2) whose degree of association between two words w1,w2 is successfully discriminated by the similarity derived by the method under evaluation. Kojima et al. dealt with three-level discrimination of a pair of words, that is, highly related (synonyms or nearly synonymous), moderately related (a certain degree of association), and unrelated (irrelevant). However, we omitted the moderately related level and limited the discrimination to two-level: high or none, because of the difficulty of preparing a test set that consists of moderately related pairs.</Paragraph>
      <Paragraph position="2"> The calculation of DR follows these steps: first, two test sets, one of which consists of highly related word pairs and the other of unrelated ones, are prepared, as shown in Figure 3. The similarity between w1 and w2 is then calculated for each pair (w1,w2) in both test sets via the method under evaluation, and the pair is labeled highly related when similarity exceeds a given threshold t and unrelated when the similarity is lower than t.</Paragraph>
      <Paragraph position="3"> The number of pairs labeled highly related in the highly related test set and unrelated in the unrelated test set are denoted na and nb, respectively.</Paragraph>
      <Paragraph position="5"> where Na and Nb are the numbers of pairs in highly related and unrelated test sets, respectively.</Paragraph>
      <Paragraph position="6"> Since DR changes depending on threshold t, maximum value is adopted by varying t.</Paragraph>
      <Paragraph position="7"> We used the reference similarity to create these two test sets. Firstly, Np = 100,000 pairs of words are randomly created using the target vocabulary set for synonym acquisition. Proper nouns are omitted from the choice here because of their high ambiguity. The two testsets are then created extracting n = 2,000 most related (with high reference similarity) and unrelated (with low reference similarity) pairs.</Paragraph>
    </Section>
    <Section position="3" start_page="356" end_page="356" type="sub_section">
      <SectionTitle>
4.3 Correlation coefficient
</SectionTitle>
      <Paragraph position="0"> The second evaluation measure is correlation co-efficient (CC) between the obtained similarity and the reference similarity. The higher CC value is, the more similar the obtained similarities are to WordNet, thus more accurate the synonym acquisition result is.</Paragraph>
      <Paragraph position="1"> The value of CC is calculated as follows. Let the set of the sample pairs be Ps, the sequence of the reference similarities calculated for the pairs in Ps be r = (r1,r2,...,rn), the corresponding sequence of the target similarity to be evaluated</Paragraph>
      <Paragraph position="3"> where -r,-s,sr, and ss represent the average of r and s and the standard deviation of r and s, respectively. The set of the sample pairs Ps is created in a similar way to the preparation of highly related test set used in DR calculation, except that we employed Np = 4,000,n = 2,000 to avoid extreme nonuniformity.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML