<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2132"> <Title>A Multi-Neuro Tagger Using Variable Lengths of Contexts</Title> <Section position="6" start_page="804" end_page="805" type="evalu"> <SectionTitle> 5 Experimental Results </SectionTitle> <Paragraph position="0"> The Thai corpus used in the computer experiments contains 10,452 sentences, randomly divided into two sets: one with 8,322 sentences for training and another with 2,130 sentences for testing. The training and testing sets contain 22,311 and 6,717 ambiguous words, respectively, i.e., words that can serve as more than one POS.</Paragraph> <Paragraph position="1"> Because there are 47 types of POSs in Thai (Charoenporn et al., 1997), n in (6), (10), and (14) was set at 47. The single neuro-taggers are 3-layer neural networks whose input length, l(IPT) (= l + 1 + r), is set to 3-7 and whose size is p x 2p x n, where p = n x l(IPT). The multi-neuro tagger is constructed from five (i.e., m = 5) single neuro-taggers, SNT_i (i = 1, ..., 5), in which l(IPT_i) = 2 + i.</Paragraph> <Paragraph position="2"> Table 1 shows that, whether or not the information gain (IG) was used, the multi-neuro tagger has a correct rate of over 94%, which is higher than that of any of the single neuro-taggers. This indicates that with the multi-neuro tagger the length of the context need not be chosen empirically; it can be selected dynamically instead. If we focus on the single neuro-taggers with inputs longer than four, we can see that the taggers using information gain are superior to those without it. Note that the correct rates shown in the table were obtained by counting only the ambiguous words in the testing set. The correct rate of the multi-neuro tagger is 98.9% when all the words in the testing set (in which the ratio of ambiguous words was 0.19) are counted.
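The layer sizes above follow directly from the tag-set size and the context window. A minimal sketch (not the authors' code; the hidden layer of 2p units is an assumption read from the paper's size expression, and all names are illustrative):

```python
n = 47  # number of Thai POS types (Charoenporn et al., 1997)

def layer_sizes(l_ipt):
    """Input/hidden/output sizes of a 3-layer single neuro-tagger
    whose context window covers l(IPT) = l + 1 + r words.
    Each input word is encoded as an n-dimensional POS vector,
    so the input layer has p = n * l(IPT) units."""
    p = n * l_ipt
    return p, 2 * p, n  # assumed hidden layer of 2p units

# The five single neuro-taggers SNT_i with l(IPT_i) = 2 + i:
for i in range(1, 6):
    print("SNT%d" % i, layer_sizes(2 + i))
```

For SNT_1 (l(IPT) = 3) this gives a 141-282-47 network, and for SNT_5 (l(IPT) = 7) a 329-658-47 network.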
Moreover, although the overall performance is not improved much by adopting the information gains, training can be greatly sped up: it takes 1024 steps to train the first tagger, SNT1, when the information gains are not used, but only 664 steps when they are.</Paragraph> <Paragraph position="3"> Figure 4 shows learning (training) curves in different cases for the single neuro-tagger with six input elements. The thick line shows the case in which the tagger is trained using the trained weights of the tagger with five input elements as initial values. The thin line shows the case in which the tagger is trained independently. The dashed line shows the case in which the tagger is trained independently without using the information gain. From this figure, we can see that the training time can be greatly reduced by using the previous result and the information gain.</Paragraph> </Section> </Paper>
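The information gain used to weight the input elements is not defined in this section (it appears in the equations referenced earlier); the sketch below uses the standard entropy-reduction definition as an assumption, with hypothetical names, to show how such a weight for one context position could be computed from (feature value, tag) pairs:

```python
import math
from collections import Counter, defaultdict

def information_gain(samples):
    """Standard entropy-reduction IG of one input feature w.r.t. the
    POS tag: H(tag) minus the value-weighted conditional entropy.
    `samples` is a list of (feature_value, pos_tag) pairs.
    (A sketch; the paper's exact formulation may differ.)"""
    def entropy(tags):
        total = len(tags)
        counts = Counter(tags)
        return -sum(c / total * math.log2(c / total) for c in counts.values())

    tags = [t for _, t in samples]
    h_all = entropy(tags)

    by_value = defaultdict(list)
    for v, t in samples:
        by_value[v].append(t)
    h_cond = sum(len(ts) / len(samples) * entropy(ts)
                 for ts in by_value.values())
    return h_all - h_cond
```

A feature that fully determines the tag gets the maximal gain (the full tag entropy), while an uninformative feature gets a gain of zero; weighting input elements by such gains is what lets the less informative, distant context positions contribute less, which is consistent with the faster training reported above.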