<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1610">
  <Title>Optimizing Synonym Extraction Using Monolingual and Bilingual Resources</Title>
  <Section position="3" start_page="0" end_page="213" type="metho">
    <SectionTitle>
2 Our Approach
</SectionTitle>
    <Paragraph position="0"> Instead of using only one kind of resource, we combine both monolingual and bilingual resources for synonym extraction.</Paragraph>
    <Paragraph position="1"> The resources include a monolingual dictionary, an English-Chinese bilingual corpus, and a large corpus of monolingual documents. We first propose three methods to extract synonyms from these three resources; in particular, a novel method is proposed to increase the coverage of the extracted synonyms using the bilingual corpus. Next, we develop an ensemble method to combine the individual extractors. The advantage of our approach is that it combines the merits of the individual extractors, improving both the precision and the recall of the extracted synonyms.</Paragraph>
    <Section position="1" start_page="0" end_page="21" type="sub_section">
      <SectionTitle>
2.1 Synonym Extraction with a Monolin-
gual Dictionary
</SectionTitle>
      <Paragraph position="0"> This section proposes a method to extract synonyms from a monolingual dictionary. In a monolingual dictionary, each entry is defined by other words and may also be used in the definitions of other words. For a word in the dictionary, the words used to define it are called hubs, and the words whose definitions include this word are called authorities, as in (Blondel and Sennelart, 2002). We use the hubs and authorities of a word to represent its meaning. The assumption behind this method is that two words are similar if they have common hubs and authorities. In this paper, we only use content words as hubs and authorities.</Paragraph>
      <Paragraph position="1"> We take these hubs and authorities as features of a word. The vector constructed with them is referred to as the feature vector of a word. The similarity between two words is calculated through their feature vectors with the cosine measure as shown in Equation (1).</Paragraph>
      <Paragraph position="3"> sim(w1, w2) = cos(F1, F2) = (F1 · F2) / (||F1|| · ||F2||)   (1)</Paragraph>
      <Paragraph position="5"> Fi is the feature vector of wi; vij = 1 if the word wij is a hub or an authority of the word wi; otherwise, vij = 0.</Paragraph>
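As an illustrative sketch (not the paper's actual implementation), the similarity in Equation (1) reduces to set overlap when the feature vectors are binary; the function name and the hub/authority sets below are invented for illustration:

```python
from math import sqrt

def cosine_binary(features1, features2):
    # Cosine similarity between two binary feature vectors, each
    # represented as the set of hub/authority words of a dictionary entry.
    if not features1 or not features2:
        return 0.0
    overlap = len(features1 & features2)
    return overlap / (sqrt(len(features1)) * sqrt(len(features2)))

# Invented hub/authority sets for two hypothetical entries.
f_car = {"vehicle", "wheel", "engine", "road"}
f_automobile = {"vehicle", "engine", "passenger", "road"}
print(round(cosine_binary(f_car, f_automobile), 2))  # 3 shared features -> 0.75
```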
    </Section>
    <Section position="2" start_page="21" end_page="21" type="sub_section">
      <SectionTitle>
2.2 Synonym Extraction with a Bilingual
Corpus
</SectionTitle>
      <Paragraph position="0"> This section proposes a novel method to extract synonyms from a bilingual corpus. It uses the translations of a word to express its meaning. The assumption of this method is that two words are synonymous if their translations are similar.</Paragraph>
      <Paragraph position="1"> Given an English word, we get its translations with an English-Chinese bilingual dictionary. Each translation is assigned a translation probability, which is trained on an English-Chinese bilingual corpus based on the result of word alignment. The aligner uses the model described in (Wang et al., 2001). In order to deal with the problem of data sparseness, we apply a simple smoothing by adding 0.5 to the count of each translation pair, as in (2).</Paragraph>
      <Paragraph position="3"> p(c | e) = (count(c, e) + 0.5) / (count(e) + 0.5 · |trans_c|)   (2) where count(c, e) represents the co-occurrence frequency of the Chinese word c and the English word e in the sentence pairs.</Paragraph>
      <Paragraph position="4"> count(e) represents the frequency of the English word e occurring in the bilingual corpus.</Paragraph>
      <Paragraph position="5"> |trans_c| represents the number of Chinese translations for the given English word e.</Paragraph>
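The add-0.5 smoothing of Equation (2) can be sketched as follows; the function name and the toy counts are assumptions for illustration, not the paper's implementation:

```python
def translation_probs(pair_counts, e_count, translations):
    # Add-0.5 smoothed p(c|e) for one English word e (Equation 2).
    # pair_counts maps a Chinese translation c to count(c, e);
    # e_count is count(e); translations lists the candidate
    # Chinese translations of e.
    denom = e_count + 0.5 * len(translations)
    return {c: (pair_counts.get(c, 0) + 0.5) / denom for c in translations}

# Invented counts for a hypothetical English word with three candidates.
probs = translation_probs({"c1": 7, "c2": 2}, e_count=10,
                          translations=["c1", "c2", "c3"])
print({c: round(p, 4) for c, p in probs.items()})
```

Note that unseen translation pairs (here "c3") still receive a small nonzero probability, which is the point of the smoothing.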
      <Paragraph position="6"> The translations and the translation probabilities of a word are used to construct its feature vector. The similarity of two words is estimated through their feature vectors with the cosine measure as shown in (3).</Paragraph>
      <Paragraph position="8"> sim(w1, w2) = cos(F1, F2) = (F1 · F2) / (||F1|| · ||F2||)   (3)</Paragraph>
      <Paragraph position="10"> Fi is the feature vector of wi; cij is the j-th Chinese translation of the word wi; pij is the probability that the word wi is translated into cij. For example, the feature vectors of the two words &amp;quot;abandon&amp;quot; and &amp;quot;forsake&amp;quot; are: forsake: &lt;(c1, 0.1333), (c2, 0.1333), (c3, 0.0667), (c4, 0.0667), (c5, 0.0667), ...&gt; abandon: &lt;(c1, 0.3018), (c2, 0.1126), (c4, 0.0405), (c6, 0.0225), (c7, 0.0135), ...&gt; (c1, c2, ... stand for the Chinese translation characters).</Paragraph>
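Equation (3) amounts to a cosine over sparse translation-probability vectors. A minimal sketch, with the Chinese translations stood in by placeholder strings c1...c7 (the actual characters are assumptions, not shown here):

```python
from math import sqrt

def cosine_weighted(v1, v2):
    # Cosine similarity of two sparse translation-probability vectors,
    # each a dict mapping a Chinese translation to p(c|e) (Equation 3).
    dot = sum(p * v2[c] for c, p in v1.items() if c in v2)
    n1 = sqrt(sum(p * p for p in v1.values()))
    n2 = sqrt(sum(p * p for p in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Truncated example vectors; c1..c7 are placeholder translation ids.
forsake = {"c1": 0.1333, "c2": 0.1333, "c3": 0.0667, "c4": 0.0667, "c5": 0.0667}
abandon = {"c1": 0.3018, "c2": 0.1126, "c4": 0.0405, "c6": 0.0225, "c7": 0.0135}
print(round(cosine_weighted(forsake, abandon), 4))
```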
    </Section>
    <Section position="3" start_page="21" end_page="213" type="sub_section">
      <SectionTitle>
2.3 Synonym Extraction with a Monolin-
gual Corpus
</SectionTitle>
      <Paragraph position="0"> The context method described in Section 1 is also used for synonym extraction from large monolingual corpora of documents. This method relies on the assumption that synonymous words tend to occur in similar contexts. In this paper, we use the words that have dependency relationships with the investigated word as its contexts. The contexts are obtained by parsing the monolingual documents.</Paragraph>
      <Paragraph position="1"> The parsing results are represented by dependency triples, denoted as &lt;w1, Relation Type, w2&gt;. For example, the sentence &amp;quot;I declined the invitation&amp;quot; is transformed into three triples after parsing: &lt;decline, SUBJ, I&gt;, &lt;decline, OBJ, invitation&gt;, and &lt;invitation, DET, the&gt;. If we name &lt;Relation Type, w2&gt; an attribute of the word w1, the verb &amp;quot;decline&amp;quot; in the above sentence has two attributes: &lt;OBJ, invitation&gt; and &lt;SUBJ, I&gt;. Thus, the contexts of a word can be expressed using its attributes. In this case, two words are synonymous if they have similar attributes.</Paragraph>
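The attribute extraction described above can be sketched as follows, using the triples of the example sentence (the helper function is a hypothetical illustration, not the parser's API):

```python
def attributes(triples, word):
    # Collect the <RelationType, w2> attributes of `word` from
    # dependency triples of the form <w1, RelationType, w2>.
    return {(rel, w2) for w1, rel, w2 in triples if w1 == word}

# The parsed triples for "I declined the invitation".
triples = [("decline", "SUBJ", "I"),
           ("decline", "OBJ", "invitation"),
           ("invitation", "DET", "the")]
print(sorted(attributes(triples, "decline")))  # [('OBJ', 'invitation'), ('SUBJ', 'I')]
```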
      <Paragraph position="2"> We use a weighted version of the Dice measure to calculate the similarity of two words.</Paragraph>
      <Paragraph position="4"> sim(w1, w2) = Σ_{attk in A(w1) ∩ A(w2)} (W(w1, attk) + W(w2, attk)) / ( Σ_{atti in A(w1)} W(w1, atti) + Σ_{attj in A(w2)} W(w2, attj) )   (4) where atti, attj, attk stand for attributes of words. W(wi, attj) indicates the association strength between the attribute attj and the word wi. A(wi) denotes the attribute set of the word wi. The association strength between a word and its attributes is measured with weighted mutual information (WMI) (Fung and McKeown, 1997), as described in (5).</Paragraph>
      <Paragraph position="6"> W(wi, attj) = p(wi, attj) · log( p(wi, attj) / (p(wi) · p(attj)) )   (5) where, for attj = &lt;r, w&gt;, p(wi, attj) = count(wi, r, w) / N, p(wi) = count(wi, *, *) / N, and p(attj) = count(*, r, w) / N. count(*, r, w): frequency of the triples having dependency relation r with the word w.</Paragraph>
      <Paragraph position="7"> count(wi, *, *): frequency of the triples including the word wi.</Paragraph>
      <Paragraph position="8"> N: number of triples in the corpus. We use weighted mutual information instead of the point-wise mutual information in Lin (1998) because the latter tends to overestimate the association between two parts with low frequencies. Weighted mutual information mitigates this effect by weighting with p(wi, attj).</Paragraph>
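A sketch of Equations (4) and (5) under the assumption that triple counts are already available; the function names and all numbers below are invented for illustration:

```python
from math import log

def wmi(c_wa, c_w, c_a, n):
    # Weighted mutual information (Equation 5) from raw counts:
    # c_wa = count(wi, r, w), c_w = count(wi, *, *),
    # c_a = count(*, r, w), n = total number of triples.
    p_wa = c_wa / n
    return p_wa * log(p_wa / ((c_w / n) * (c_a / n)))

def weighted_dice(attrs1, attrs2):
    # Weighted Dice similarity (Equation 4); each dict maps an
    # attribute <r, w2> to its WMI association strength.
    common = set(attrs1) & set(attrs2)
    num = sum(attrs1[a] + attrs2[a] for a in common)
    den = sum(attrs1.values()) + sum(attrs2.values())
    return num / den if den else 0.0

# Invented attribute weights for two hypothetical verbs.
a1 = {("OBJ", "invitation"): 1.0, ("SUBJ", "I"): 0.5}
a2 = {("OBJ", "invitation"): 1.0, ("OBJ", "offer"): 0.5}
print(round(weighted_dice(a1, a2), 2))  # 2.0 / 3.0 -> 0.67
```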
    </Section>
    <Section position="4" start_page="213" end_page="213" type="sub_section">
      <SectionTitle>
2.4 Combining the Three Extractors
</SectionTitle>
      <Paragraph position="0"> In terms of combining the outputs of the different methods, the ensemble method is a good candidate.</Paragraph>
      <Paragraph position="1"> The ensemble method is originally a machine learning technique that combines the outputs of several classifiers to improve classification performance (Dietterich, 2000). It has been successfully used in many NLP tasks. For example, Curran (2002) showed that ensembles of individual extractors using different contexts in a monolingual corpus improve the performance of synonym extraction.</Paragraph>
      <Paragraph position="2"> In fact, we can consider the extractors in the previous sections as binary classifiers. Thus, we use the ensemble method to combine their outputs for synonym extraction. The method is described in Equation (6).</Paragraph>
      <Paragraph position="4"> sim(w1, w2) = a1·sim1(w1, w2) + a2·sim2(w1, w2) + a3·sim3(w1, w2)   (6) where simi (i = 1, 2, 3) is the similarity measure using one of the resources described in the previous sections.</Paragraph>
      <Paragraph position="6"> ai (with a1 + a2 + a3 = 1) is the weight for the individual extractor i.</Paragraph>
      <Paragraph position="7"> The reasons that we use the weighted ensemble method are as follows: (1) If the majority of the three extractors select the same word as a synonym of an investigated word, it tends to be a real synonym. This method ensures that such a word has a high similarity score, which improves the precision of the extracted synonyms. (2) This method can also improve the coverage of the extracted synonyms.</Paragraph>
      <Paragraph position="8"> This is because if the similarity score of a candidate with the investigated word is higher than a threshold, our method can select the candidate as a synonym even though it is only suggested by one extractor.</Paragraph>
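The weighted combination in Equation (6) is a simple dot product of the individual scores with the weights; the weight values below are illustrative assumptions, not the tuned weights used in the paper:

```python
def ensemble_sim(sims, weights):
    # Weighted ensemble of the individual similarity scores (Equation 6).
    # sims holds sim1..sim3 from the three extractors; weights are the ai,
    # assumed to sum to 1.
    return sum(a * s for a, s in zip(weights, sims))

# Dictionary, bilingual-corpus, and monolingual-corpus scores for one pair;
# note a candidate suggested by only one extractor can still pass a threshold.
score = ensemble_sim([0.8, 0.0, 0.6], [0.4, 0.3, 0.3])
print(round(score, 2))  # 0.4*0.8 + 0.3*0.0 + 0.3*0.6 = 0.5
```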
    </Section>
  </Section>
  <Section position="4" start_page="213" end_page="213" type="metho">
    <SectionTitle>
3 Implementation of Individual Extractors
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="213" end_page="213" type="sub_section">
      <SectionTitle>
Extractors
</SectionTitle>
      <Paragraph position="0"> For the extractor employing a monolingual dictionary, we use the same online dictionary as in (Blondel and Sennelart, 2002), named the Online Plain Text Dictionary. The dictionary consists of 27 HTML files, which are available from the web site http://www.gutenberg.net/. With the method described in Section 2.1, the result for the extracted synonyms is shown in Table 1 when the similarity threshold is set to 0.04. An example is shown as follows: acclimatize: (acclimate, 0.1481; habituate, 0.0976). The numbers in the example are the similarity scores between the words.</Paragraph>
      <Paragraph position="1">  For synonym extraction from the bilingual corpus, we use an English-Chinese lexicon, which includes 219,404 English words, with each source word having 3 translations on average. The word translation probabilities are estimated from a bilingual corpus that contains 170,025 pairs of Chinese-English sentences, including about 2.1 million English words and about 2.5 million Chinese words. With the method described in Section 2.2, we extracted synonyms as shown in Table 2 when the similarity threshold is set to 0.04.</Paragraph>
      <Paragraph position="2">  For synonym extraction from a monolingual corpus, we use the Wall Street Journal from 1987 to 1992, which is about 500 MB in size. In order to get the contexts of words, we parse the corpus with an English parser, NLPWIN [1]. From the parsing results, we extracted the following four types of dependency triples.</Paragraph>
      <Paragraph position="3">  (a) &lt;verb, OBJ, noun&gt;
 (b) &lt;verb, SUBJ, noun&gt;
 (c) &lt;noun, ATTRIB, adjective&gt;
 (d) &lt;verb, MODS, adjunct&gt;
 The statistics are shown in Table 3. Token means the total number of triples in the triple set, and type means a unique instance of a triple in the corpus. These triples are used as the contexts of words to calculate the similarity between words, as described in Section 2.3. The result is shown in Table 4 when the similarity threshold is set to 0.1. [1] The NLPWIN parser is developed at Microsoft Research. Its output can be a phrase structure parse tree or a logical form represented with dependency triples.</Paragraph>
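The token/type distinction used in the Table 3 statistics can be reproduced on a toy triple list (the triples below are invented):

```python
from collections import Counter

# token = total number of triples in the set;
# type = number of distinct triples.
triples = [("decline", "OBJ", "invitation"),
           ("decline", "OBJ", "invitation"),
           ("accept", "OBJ", "offer"),
           ("decline", "SUBJ", "I")]
counts = Counter(triples)
tokens = sum(counts.values())  # 4 tokens
types = len(counts)            # 3 types
print(tokens, types)
```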
    </Section>
  </Section>
</Paper>