File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/03/w03-1504_evalu.xml

Size: 2,370 bytes

Last Modified: 2025-10-06 13:59:05

<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1504">
  <Title>Low-cost Named Entity Classification for Catalan: Exploiting Multilingual Resources and Unlabeled Data</Title>
  <Section position="7" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5.2 Results
</SectionTitle>
    <Paragraph position="0"> Table 4 shows accuracy by category for the multilingual model XL in comparison with the best models trained only on Catalan data, already presented in section 4. As can be seen in row XL, accuracy increases by almost 3 points compared to supervised learning for Catalan, CA. Whereas the improvement for the easiest categories (ORG and PER) is moderate, it is particularly significant for LOC and MIS, with gains of 7.5 and 5.3 points, respectively. The multilingual classifier has also been evaluated on the Spanish test set (see table 1). The supervised AdaBoost algorithm was used to learn a Spanish classifier from Spanish training data, which achieves 87.1% average accuracy. Interestingly, the multilingual classifier shows only a slight reduction, to 86.9%, which could be considered negligible, whereas performance for Catalan is boosted by almost 3 points.</Paragraph>
    <Paragraph position="1"> The two best-performing bootstrapping strategies for the Catalan-only case (CAa50a36a51a53a52a55a54 and CAa50a36a51a53a52a55a56) have also been applied to the multilingual classifier (XLa50a36a51a53a52a55a54 and XLa50a36a51a53a52a55a56). Table 3 presents the results for the first (the right-hand side of the table), while figure 2 depicts the process graphically. It can be observed that both strategies consistently outperform the baseline bilingual model XL, as shown by the boldface figures. In this case, XLa50a64a51a40a52a40a56, again starting from a lower accuracy point, proves more stable above the baseline. This is probably because the Catalan labelled examples introduced at iteration 0 from the unsupervised classifier do not have as large an impact on a bilingual model conditioned by Spanish data as they do in the CAa50a36a51a53a52a55a56 case. On the other hand, XLa50a36a51a53a52a55a54 achieves a higher peak (raising accuracy by up to 1.1 points over the multilingual baseline XL and 3.9 points over the model using only Catalan data, CA) before dropping below the baseline.</Paragraph>
  </Section>
</Paper>