<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1202"> <Title>Sense-Tagging Chinese Corpus</Title> <Section position="5" start_page="137" end_page="137" type="evalu"> <SectionTitle> 4 Experiments </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="137" end_page="137" type="sub_section"> <SectionTitle> 4.1 Test Materials </SectionTitle> <Paragraph position="0"> We sample documents of different categories from the ASBC corpus, including philosophy (10%), science (10%), society (35%), art (5%), life (20%) and literature (20%). The test corpus contains 35,921 words. Research associates tag this corpus manually. First, they mark up the ambiguous words by looking them up in the Cilin dictionary. Next, they tag the unknown words.</Paragraph> <Paragraph position="1"> For each unknown word, a list of candidate tags is proposed by looking up the mapping table. Because the mapping table may contain errors, the annotators assign the tag &quot;none&quot; when they cannot choose a solution from the proposed candidates. With the more restrictive method, 435 of the 1,979 unknown words are tagged &quot;none&quot;; with the less restrictive method, only 346 are. The tag mapper thus achieves a performance of approximately 82.52% (1,633 of 1,979 words, with the less restrictive method).</Paragraph> </Section> <Section position="2" start_page="137" end_page="137" type="sub_section"> <SectionTitle> 4.2 Tagging Ambiguous Words </SectionTitle> <Paragraph position="0"> Table 5 shows the performance of tagging ambiguous words using the MI measure defined in Section 3.1. In total, 11,101 words are tagged. The performance of tagging low-, middle-, and high-ambiguity words is 62.60%, 31.36%, and 27.00%, respectively. Table 6 shows that the performance improves, in particular for the middle- and high-ambiguity classes, when EM (defined in Section 3.1) is used. The overall performance increases from 49.55% to 52.85%.</Paragraph> <Paragraph position="1"> In the previous experiments, only one sense is reported for each word. 
Reporting more than one sense for middle- and high-ambiguity words improves the performance. Table 7 shows the results when the first 2 or 3 candidates are selected. Along the diagonal of this table, the performance for tagging low-ambiguity (2-4), middle-ambiguity (5-8), and high-ambiguity (>8) words is similar (63.98%, 60.92%, and 67.95%) when 1, 2, and 3 candidates are proposed, respectively. In this setting, 7,034 of 11,101 words are tagged correctly, i.e., the performance is 63.36%.</Paragraph> <Paragraph position="2"> In the next experiment, we adopt the middle categories (94 categories) rather than the small categories used above (1,428 categories). Table 8 shows that the overall performance improves by 11.05%. It also lists the results for combinations of first-n candidates and middle categories. With the middle categories and 1-3 proposed candidates, the performance for tagging low-, middle-, and high-ambiguity words is 71.02%, 73.88%, and 75.94%, respectively.</Paragraph> <Paragraph position="3"> In total, 8,033 of 11,101 words are tagged correctly, i.e., the performance is 72.36%.</Paragraph> </Section> <Section position="3" start_page="137" end_page="137" type="sub_section"> <SectionTitle> 4.3 Tagging Unknown Words </SectionTitle> <Paragraph position="0"> There are 1,979 unknown words in our test corpus, of which 1,663 have been tagged manually. In the experiments, we consider the effects of the training corpus and the mapping table.</Paragraph> <Paragraph position="1"> Table 9 shows the performance. M1 and P1 employ the more restrictive mapping table, while M2 and P2 adopt the less restrictive one.</Paragraph> <Paragraph position="2"> M1 and M2 use the training result of Section 3.1 (i.e., unambiguous words), while P1 and P2 use the training result of Section 3.2 (i.e., unambiguous and ambiguous words). In the baseline model, all 1,428 Cilin tags are candidates for each unknown word. The performance of this baseline is much worse. 
On average, the baseline precision is only 1.22%. M1 performs best because the more restrictive mapping table reduces the possibility of mapping errors. Table 9 also lists the performance for each category. As expected, tagging verbs is harder than tagging words of other categories. Next, we use POS information to improve the performance. POS narrows down the number of candidates, so the overall performance increases from 27.13% to 34.35%.</Paragraph> <Paragraph position="3"> In summary, we now evaluate the overall performance of the sense tagger on our sample data.</Paragraph> <Paragraph position="4"> Recall that there are 35,921 words in the test corpus. Excluding the stop words, which the sense tagger does not tag, 13,586 unambiguous words, 11,101 ambiguous words, and 1,633 unknown words remain for tagging. From Tables 6 and 9, 5,867 ambiguous words and 561 unknown words are tagged correctly. Counting all unambiguous words as correct, the sense tagger achieves a performance of 76.04%, i.e., (13,586 + 5,867 + 561)/(13,586 + 11,101 + 1,633) = 20,014/26,320.</Paragraph> </Section> </Section> </Paper>
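The summary figures in this section are ratios of correctly tagged words to tagged words. The Python sketch below recomputes them from the counts stated in the text, as a consistency check rather than the authors' evaluation code. It assumes, as the 76.04% figure implies, that every unambiguous word counts as correctly tagged, and it reads the 82.52% mapper figure as the share of unknown words not tagged "none" under the less restrictive method. The helper names `rate` and `overall_performance` are hypothetical.

```python
# A minimal sketch that recomputes the summary percentages of Section 4
# from the word counts stated in the text. Helper names are hypothetical.
# Assumption: unambiguous words (a single sense) always count as correct.


def rate(correct: int, total: int) -> float:
    """Share of correctly tagged words, as a percentage rounded to 2 decimals."""
    return round(100.0 * correct / total, 2)


# Words left for tagging once stop words are excluded (Section 4.3).
UNAMBIGUOUS = 13_586
AMBIGUOUS = 11_101
UNKNOWN = 1_633

# Correctly tagged counts stated in the text.
AMBIGUOUS_CORRECT = 5_867  # EM, one sense per word (Table 6)
UNKNOWN_CORRECT = 561      # with POS filtering (Table 9)


def overall_performance() -> float:
    """Overall accuracy over unambiguous, ambiguous, and unknown words."""
    correct = UNAMBIGUOUS + AMBIGUOUS_CORRECT + UNKNOWN_CORRECT
    total = UNAMBIGUOUS + AMBIGUOUS + UNKNOWN
    return rate(correct, total)


if __name__ == "__main__":
    # Assumed reading: mapper performance = unknown words not tagged "none"
    # under the less restrictive method (1,979 - 346 = 1,633 words).
    print("tag mapper:", rate(1_979 - 346, 1_979))                 # 82.52
    print("ambiguous, first-n:", rate(7_034, AMBIGUOUS))             # 63.36
    print("ambiguous, middle categories:", rate(8_033, AMBIGUOUS))   # 72.36
    print("ambiguous, EM:", rate(AMBIGUOUS_CORRECT, AMBIGUOUS))      # 52.85
    print("unknown, POS:", rate(UNKNOWN_CORRECT, UNKNOWN))           # 34.35
    print("overall:", overall_performance())                         # 76.04
```

Rounded to two decimals, each ratio reproduces the percentage reported in the text, which also confirms that the 5,867 correctly tagged words in the final summary are ambiguous words (52.85% of 11,101).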