<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2911">
  <Title>Applying Alternating Structure Optimization to Word Sense Disambiguation</Title>
  <Section position="15" start_page="81" end_page="83" type="concl">
    <SectionTitle>
BE
</SectionTitle>
    <Paragraph position="0"> each of which uses one of CUC4BVBNBZBVBNCBCACV. Then, we apply ASO to these auxiliary problems using the feature split and the problem partitioning described in Section 3.2.</Paragraph>
    <Paragraph position="1"> Note that the difference between the multi-task and semi-supervised configurations is the source of additional information: the multi-task configuration utilizes the label information of the training examples labeled for the other tasks, while the semi-supervised configuration exploits a large amount of unlabeled data.</Paragraph>
    <Paragraph position="2"> 3. Apply ASO to the auxiliary problems to obtain A2.
4. Using the joint linear model (2), train the final predictor by minimizing the empirical risk for fixed A2.</Paragraph>
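Steps 3 and 4 above can be sketched in code. This is a simplified, hypothetical illustration (the function names and the plain gradient-descent trainer are mine, and the paper's actual formulation includes per-problem weight vectors and regularization): the shared structure matrix (A2 in the text above) is taken from the SVD of the stacked auxiliary-predictor weights, and the final predictor is then trained on the augmented representation with that matrix held fixed.

```python
import numpy as np

def aso_structure(aux_weights, h):
    """Step 3 (simplified): given a (k x d) matrix whose rows are the
    weight vectors of k trained auxiliary linear predictors, return the
    (h x d) shared structure matrix, i.e. the top-h right singular
    vectors of the stacked weights."""
    _, _, vt = np.linalg.svd(aux_weights, full_matrices=False)
    return vt[:h]

def train_final_predictor(X, y, structure, epochs=100, lr=0.1):
    """Step 4 (simplified): train a joint linear model on the augmented
    representation [x; structure @ x] by logistic-loss gradient descent,
    keeping the structure matrix fixed. Returns the weights and the
    final scores on X."""
    Z = np.hstack([X, X @ structure.T])      # augmented features
    w = np.zeros(Z.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Z @ w)))   # predicted probabilities
        w -= lr * Z.T @ (p - y) / len(y)     # full-batch gradient step
    return w, Z @ w
```

The point of the shared SVD step is that information learned on the auxiliary problems is compressed into a low-dimensional subspace that the final predictor can reuse.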
    <Section position="1" start_page="82" end_page="82" type="sub_section">
      <SectionTitle>
4.2 Data and evaluation metric
</SectionTitle>
      <Paragraph position="0"> We conduct evaluations on four Senseval-3 lexical sample tasks (English, Catalan, Italian, and Spanish) using the official training/test splits. Data statistics are shown in Figure 9. On the Spanish, Catalan, and Italian data sets, we use part-of-speech information (as features) and unlabeled examples (for semi-supervised learning) provided by the organizers.</Paragraph>
      <Paragraph position="1"> Since the English data set was not provided with these additional resources, we use an in-house POS tagger trained on the Penn Treebank corpus, and extract 100K unlabeled examples from the Reuters-RCV1 corpus. For each language, the number of unlabeled examples is 5-15 times larger than that of the labeled training examples. We use syntactic relation features only for the English data set. As in Section 3, we report micro-averaged F measure.</Paragraph>
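The micro-averaged F measure pools counts over all test instances, across all target words, before computing precision and recall. A minimal pure-Python sketch (the helper name is mine; it assumes one predicted sense per instance, with `None` for unattempted instances):

```python
def micro_f1(gold, pred):
    """Micro-averaged F measure: pool correct/attempted/total counts
    over all test instances before computing precision and recall.
    pred[i] is None when the system attempts no answer."""
    attempted = sum(1 for p in pred if p is not None)
    correct = sum(1 for g, p in zip(gold, pred) if p is not None and g == p)
    precision = correct / attempted if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

When the system answers every instance, micro-averaged F reduces to plain accuracy over the pooled test set, which is why a single number can summarize performance across many target words.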
    </Section>
    <Section position="2" start_page="82" end_page="82" type="sub_section">
      <SectionTitle>
4.3 Baseline methods
</SectionTitle>
      <Paragraph position="0"> In addition to the standard single-task supervised configuration as in Section 3, we test the following method as an additional baseline.</Paragraph>
      <Paragraph position="1"> Output-based method The goal of our multi-task learning configuration is to benefit from having the labeled training examples of a number of words. An alternative to ASO for this purpose is to directly use the output values of classifiers trained for disambiguating the other words as features, which we call the 'output-based method' (cf. Florian et al. (2003)).</Paragraph>
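The output-based baseline can be sketched as follows. This is a hypothetical illustration (the names are mine, and a tiny perceptron stands in for whatever per-word classifiers a real system would use): one classifier is trained per other target word, and their raw output scores are appended to the target word's own feature vector.

```python
def train_perceptron(X, y, epochs=10):
    """Tiny linear classifier (labels in {-1, +1}); a stand-in for the
    per-word disambiguators whose outputs become features."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi))
            if yi * score > 0:        # correctly classified: no update
                continue
            w = [wj + yi * xj for wj, xj in zip(w, xi)]
    return w

def output_based_features(x, other_word_models):
    """Augment a target word's feature vector with the raw output
    values of classifiers trained for the other words."""
    outputs = [sum(wj * xj for wj, xj in zip(w, x)) for w in other_word_models]
    return list(x) + outputs
```

Because only the scalar outputs (not the learned parameters) are shared across words, this method transfers much less information than ASO's shared structure matrix, which is consistent with its weaker results below.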
      <Paragraph position="2"> We explore several variations similarly to Section 3.1 and report the ceiling performance.</Paragraph>
    </Section>
    <Section position="3" start_page="82" end_page="83" type="sub_section">
      <SectionTitle>
4.4 Evaluation results
</SectionTitle>
      <Paragraph position="0"> Figure 10 shows F-measure results on the four Senseval-3 data sets using the official training/test splits. Both ASO multi-task learning and semi-supervised learning improve performance over the single-task baseline on all the data sets. The best performance is achieved when we combine multi-task learning and semi-supervised learning by using all the corresponding structure matrices A2, produced by both multi-task and semi-supervised learning, in the final predictors. This combined configuration outperforms the single-task supervised baseline by up to 5.7%.
[Figure 9: data statistics (#words, #train, avg #sense per word, avg #train per sense) for each data set. On each data set, the number of test instances is about one half of that of the training instances.]</Paragraph>
      <Paragraph position="1"> Performance improvements over the supervised baseline are relatively small on English and Spanish. We conjecture that this is because the supervised performance is already close to the highest performance that automatic methods can achieve. On these two languages, our (and previous) systems outperform inter-annotator agreement, which is unusual but can be regarded as an indication that these tasks are difficult.</Paragraph>
      <Paragraph position="2"> The performance of the output-based method (baseline) is relatively low. This indicates that output values or proposed labels are not expressive enough to effectively integrate information from other predictors on this task. We conjecture that for this method to be effective, the problems need to be more closely related to each other, as in Florian et al. (2003)'s named entity experiments. A practical advantage of ASO multi-task learning over ASO semi-supervised learning is that it requires shorter computation time to produce similar performance. On the English data set, training for multi-task learning and semi-supervised learning takes 15 minutes and 92 minutes, respectively, on a Pentium-4 3.20GHz computer. The computation time mostly depends on the amount of data on which the auxiliary predictors are learned. Since our experiments use 5-15 times more unlabeled data than labeled training data, semi-supervised learning accordingly takes longer.</Paragraph>
      <Paragraph position="3"> Figure 10: Performance results on the Senseval-3 lexical sample test sets. Numbers in parentheses are performance gains compared with the single-task supervised baseline (italicized). [G04] Grozea (2004); [SGG04] Strapparava et al. (2004).

methods                                        English      Catalan      Italian      Spanish
ASO: multi-task learning                       73.8 (+0.8)  89.5 (+1.5)  63.2 (+4.9)  89.0 (+1.0)
ASO: semi-supervised learning                  73.5 (+0.5)  88.6 (+0.6)  62.4 (+4.1)  88.9 (+0.9)
ASO: multi-task+semi-supervised                74.1 (+1.1)  89.9 (+1.9)  64.0 (+5.7)  89.5 (+1.5)
baseline: output-based                         73.0 (0.0)   88.3 (+0.3)  58.0 (-0.3)  88.2 (+0.2)
baseline: single-task supervised learning      73.0         88.0         58.3         88.0
previous: SVM with LSA kernel [GGS05]          73.3         89.0         61.3         88.2
previous: Senseval-3 (2004) best systems       72.9 [G04]   85.2 [SGG04] 53.1 [SGG04] 84.2 [SGG04]
inter-annotator agreement                      67.3         93.1         89.0         85.3

GGS05 combined various kernels, which include the LSA kernel that exploits unlabeled data with global context features. Our implementation of the LSA kernel with our classifier (and our other features) also produced performance similar to that of GGS05. While the LSA kernel is closely related to a special case of the semi-supervised application of ASO (see the discussion of PCA in Ando and Zhang (2005a)), our approach here is more general in that we exploit not only unlabeled data and global context features but also the labeled examples of other target words and other types of features. G04 achieved high performance on English using regularized least squares with compensation for skewed class distributions. SGG04 is an early version of GGS05. Our methods rival or exceed these state-of-the-art systems on all the data sets.</Paragraph>
    </Section>
  </Section>
</Paper>