<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1107"> <Title>Chinese Chunking with another Type of Spec</Title> <Section position="9" start_page="3" end_page="5" type="evalu"> <SectionTitle> 6 Data and Evaluation </SectionTitle> <Paragraph position="0"> The performance of chunking is commonly measured with three figures: precision (P), recall (R) and the F measure, as defined in CoNLL-2000.</Paragraph> <Paragraph position="1"> Besides these, we also use two other measurements to evaluate the performance of bracketing and labeling respectively: RCB (ratio of crossing brackets), the percentage of found brackets that cross the correct brackets; and LA (labeling accuracy), the percentage of found chunks that have the correct labels.</Paragraph> <Paragraph position="2"> RCB = No. of chunks that cross chunk boundaries / No. of chunks in test data</Paragraph> <Paragraph position="5"> The average length (ALen) of chunks of each type is the average number of tokens per chunk of that type. The overall average length is the average number of tokens per chunk. To make the figures more objective, outside tokens (including outside punctuation) are also taken into account, each counted as one chunk.</Paragraph> <Section position="1" start_page="3" end_page="5" type="sub_section"> <SectionTitle> 6.1 Chunking performance with our spec </SectionTitle> <Paragraph position="0"> Training and testing were done on the PK corpus.</Paragraph> <Paragraph position="1"> Table 3 shows the detailed information. We use the uni-gram of chunk POS rules as the baseline.</Paragraph> <Paragraph position="2"> Results are reported for the close test and the open test, where HMM and ten-fold TBL-based error correction (EC) are applied respectively.</Paragraph> <Paragraph position="3"> As can be seen, the performance of the open test does not drop much. 
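The metrics defined above (P, R, F, RCB, LA) can be computed directly from gold and predicted chunk spans. The following is an illustrative sketch, not the authors' implementation; it represents chunks as (start, end, label) triples with an exclusive end, and reads LA as the share of boundary-correct found chunks whose label is also correct, which is one interpretation of the definition given.

```python
# Illustrative sketch (not the authors' code) of the chunking metrics:
# P, R, F as in CoNLL-2000, plus RCB and LA as defined in the text.
# Chunks are (start, end, label) triples with `end` exclusive.

def crosses(a, b):
    """True if spans a and b cross (overlap without containment)."""
    (s1, e1), (s2, e2) = a[:2], b[:2]
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1

def evaluate(gold, pred):
    gold_set = set(gold)
    correct = sum(1 for c in pred if c in gold_set)   # span and label match
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    # RCB: fraction of found brackets that cross some correct bracket.
    crossed = sum(1 for c in pred if any(crosses(c, g) for g in gold))
    rcb = crossed / len(pred) if pred else 0.0
    # LA: among found chunks with correct boundaries, the share whose
    # label is also correct (one reading of the definition above).
    matched = [c for c in pred if any(c[:2] == g[:2] for g in gold)]
    la = (sum(1 for c in matched if c in gold_set) / len(matched)
          if matched else 0.0)
    return p, r, f, rcb, la
```
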
For the open test, HMM achieves a 6.9% F improvement and a 3.4% RCB reduction over the baseline; error correction adds another 2.7% F improvement and a 0.3% RCB reduction. Labeling accuracy is high even with the baseline, which indicates that the hard part of chunking is identifying the boundaries of each chunk.</Paragraph> <Paragraph position="4"> Table 5 shows the performance for each type of chunk. NP and VP amount to approximately 76% of all chunks, so their chunking performance dominates the overall performance. Although we extend VP and PP, their performance is much better than the overall figure. The performance of INDP reaches 99% even though its chunks are much longer than those of other types, because its surface evidence is clear and complete owing to its definition: the meta-data of a document, the descriptions inside a pair of parentheses, and certain fixed phrases that do not act as a syntactic constituent in a sentence. From the relatively lower performance of NP, which accounts for the largest portion of all chunks, we conclude that the hardest issue in Chinese chunking is identifying the boundaries of NPs.</Paragraph> <Paragraph position="5"> All chunking errors can be classified into four types: wrong labeling, under-combining, over-combining and overlapping. Table 6 lists the number and percentage of each type of error. Under-combining errors account for about half of all chunking errors; however, they are not a problem in certain applications, because they do not cross brackets, so there are still opportunities to combine such chunks later with additional knowledge. If we evaluate the chunking result without counting under-combining errors, the F score of the proposed chunker reaches 95.45%. For comparison, we also use other learning methods, MBL (Bosch and Buchholz, 2002), SVM (Kudoh and Matsumoto, 2001) and TBL, to build the chunker. 
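The four error types above can be distinguished mechanically by comparing each erroneous predicted chunk against the gold chunks. The following is our own illustrative sketch, not code from the paper; chunk representation and the "other" fallback for unrelated spans are assumptions.

```python
# Illustrative sketch (our own, not from the paper): classify a
# predicted chunk against the gold chunks into the four error types.
# Chunks are (start, end, label) triples with `end` exclusive.

def error_type(chunk, gold_chunks):
    s, e, label = chunk
    for gs, ge, glabel in gold_chunks:
        if (s, e) == (gs, ge):
            # Boundaries match: either correct or mislabeled.
            return "correct" if label == glabel else "wrong-labeling"
        if gs <= s and e <= ge:
            # Proper sub-span of a gold chunk; no bracket is crossed,
            # so it can still be combined later (under-combining).
            return "under-combining"
        if s <= gs and ge <= e:
            # Properly contains a gold chunk (over-combining).
            return "over-combining"
        if s < gs < e < ge or gs < s < ge < e:
            # Crossing brackets (overlapping).
            return "overlapping"
    return "other"  # assumed fallback for spans unrelated to any gold chunk
```
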
The features for MBL and SVM are the POS tags of the current word and of the two words to its left and right, plus the lexical forms of the current word and of one word to its left and right. TiMBL and SVM-light are used as the tools. For SVM, we convert the chunk marks from BIOES to BI, use a binary-class SVM to classify the chunk boundaries, and then apply some rules to identify the chunk labels. For TBL, the rule templates are all possible combinations of the features, and the initial state is that each word is a chunk. Table 7 shows the results. As seen, without error correction none of these models performs well, and our HMM achieves the best performance.</Paragraph> </Section> <Section position="2" start_page="5" end_page="5" type="sub_section"> <SectionTitle> 6.2 Further applications </SectionTitle> <Paragraph position="0"> The chunks under our spec (ALen is 1.38) are longer than under other Treebank-derived specs (ALen of S1 is 1.239) and closer to the constituents of a sentence. Several applications can thus benefit from this fact, such as: 1) Longest/full noun phrase identification.</Paragraph> <Paragraph position="1"> According to our statistics, owing to the inclusion of noun-noun compounds, 'a_n_n' and 'm_n_n' inside NPs, 65% of noun chunks are already the longest/full noun phrases, and another 22% could become the longest/full noun phrases with only one further combining step.</Paragraph> <Paragraph position="2"> 2) Predicate-verb identification.</Paragraph> <Paragraph position="3"> By extending the average length of VPs, the main verb (or predicate verb, also called the tensed verb in English) of a given sentence can be identified from certain surface evidence with relatively high accuracy. Under this definition, our statistics on the test set show that 84.88% of the main verbs are located in the first longest VP among all VPs in a sentence.</Paragraph> </Section> </Section> </Paper>