
<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0112">
  <Title>A Hybrid Approach to Chinese Base Noun Phrase Chunking</Title>
  <Section position="4" start_page="87" end_page="88" type="metho">
    <SectionTitle>
2 Task Description
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="87" end_page="87" type="sub_section">
      <SectionTitle>
2.1 Data Representation
</SectionTitle>
      <Paragraph position="0"> Ramshaw and Marcus (1995) gave mainly two kinds of base NPs representation - the open/close bracketing and IOB tagging. For example, a bracketed Chinese sentence, [ Wai Shang (foreign businessmen) Tou Zi (investment)] Cheng Wei (become) [ Zhong Guo (Chinese) Wai Mao (foreign trade)] [ Zhong Yao (important) Zeng Chang Dian (growth)] . The IOB tags are used to indicate the boundaries for each base NP where letter 'B' means the current word starts a base NP, 'I' for a word inside a base NP and 'O' for a word outside a NP chunk. In this case the tokens for the former sentence would be labeled as follows: Wai Shang /B Tou Zi /I Cheng Wei /V Zhong Guo /B Wai Mao /I Zhong Yao /B Zeng Chang Dian /O . /O Currently, most of the work on base NP identification employs the trainable, corpus-based algorithm, which makes full use of the tokens and corresponding POS tags to recognize the chunk segmentation of the test data. The SVM and CRF are two representative effective models widely used.</Paragraph>
    </Section>
    <Section position="2" start_page="87" end_page="88" type="sub_section">
      <SectionTitle>
2.2 Chunking with SVMs
</SectionTitle>
      <Paragraph position="0"> SVM is a machine learning algorithm for a linear binary classifier in order to maximize the margin of confidence of the classification on the training data set. According to the different requirements, distinctive kernel functions are employed to transfer non-linear problems into linear problems by mapping it to a higher dimension space.</Paragraph>
      <Paragraph position="1"> By transforming the training data into the form with IOB tags, we can view the base NP chunking problem as a multi-class classification problem. As SVMs are binary classifiers, we use the pairwise method to convert the multi-class problem into a set of binary class problem, thus the I/O/B classifier is reduced into 3 kinds of binary classifier -- I/O classifier, O/B classifier, B/I classifier.</Paragraph>
      <Paragraph position="2"> In our experiments, we choose TinySVM  (Kudo and Matsumoto, 2001) as the one of the baseline systems for our chunker. In order to construct the feature sets for training SVMs, all information available in the surrounding contexts, including tokens, POS tags and IOB tags. The tool YamCha makes it possible to add new features on your own. Therefore, in the training stage, we also add two new features according to the words. First, we give special tags to the noun words, especially the proper noun, as we find in the experiment the proper nouns sometimes bring on errors, such as base  NP &amp;quot;Si Chuan (Sichuan)/NR Pen Di (basin)/NN&amp;quot;, containing the proper noun &amp;quot;Si Chuan /NR&amp;quot;, could be mistaken for a single base NP &amp;quot;Pen Di /NN&amp;quot;; Second, some punctuations such as separating marks, contribute to the wrong chunking, because many Chinese compound noun phrases are connected by separating mark, and the ingredients in the sentence are a mixture of simple nouns and noun phrases, for example,  The part of base NP - &amp;quot;Zhong Guo /B She Hui /I Ke Xue Yuan /I&amp;quot; can be recognized as three independent base NPs --&amp;quot;Zhong Guo /B She Hui /B Ke Xue Yuan /B&amp;quot;. The kind of errors comes from the conjunction &amp;quot;He (and)&amp;quot; and the successive sequences of nouns, which contribute little to the chunker. More information d analyses will be provided in Section 4. an</Paragraph>
    </Section>
    <Section position="3" start_page="88" end_page="88" type="sub_section">
      <SectionTitle>
2.3 Conditional Random Fields
</SectionTitle>
      <Paragraph position="0"> Lafferty et al.( 2001) present the Conditional Random Fields for building probabilistic models to segment and label sequence data, which was used effectively for base NP chunking (Sha &amp; Pereira, 2003). Lafferty et al. (2001) point out that each of the random variable label sequences Y conditioned on the random observation sequence X. The joint distribution over the label sequence Y given X has the form</Paragraph>
      <Paragraph position="2"/>
      <Paragraph position="4"> are labels, x is an input sequence, i is an input position, ()Z x is a normalization factor; k l is the parameter to be estimated from training data. Then we use the maximum likelihood training, such as the log-likelihood to train CRF given</Paragraph>
      <Paragraph position="6"> can be computed using a variant of the forward-backward algorithm. We define a transition matrix as following:</Paragraph>
      <Paragraph position="8"> They also present a novel approach to model construction and feature selection in shallow parsing.</Paragraph>
      <Paragraph position="9"> We use the software CRF++  as our Chinese base NP chunker baseline software. The results of CRF are better than that of SVM, which is the same as the outcome of the English base NP chunking in (Sha &amp; Pereira, 2003). However, we find CRF products some errors on identifying long-range base NP, while SVM performs well in this aspect and the errors of SVM and CRF are of different types. In this case, we develop a combination approach to improve the results.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="88" end_page="90" type="metho">
    <SectionTitle>
3 Our Approach
</SectionTitle>
    <Paragraph position="0"> [?] (Tjong et al., 2000) pointed out that the performance of machine learning can be improved by combining the output of different systems, so they combined the results of different classifiers  and obtained good performance. Their combination system generated different classifiers by using different data labels and applied respective voting weights accordingly. (Kudo and Matsumoto 2001) designed a voting arrangement by applying cross validation and VC-bound and Leave-One-Out bound for the voting weights.</Paragraph>
    <Paragraph position="1"> The voting systems improve the accuracy, the choices of weights and the balance between different weights is based on experiences, which does not concern the inside features of the classification, without the guarantee of persuasive theoretical supports. Therefore, we developed a hybrid approach to combine the results of the SVM and CRF and utilize their advantages.</Paragraph>
    <Paragraph position="2"> (Simon, 2003) pointed out that the SVM guarantees a high generalization using very rich features from the sentences, even with a large and highdimension training data. CRF can build efficient and robust structure model of the labels, when one doesn't have prior knowledge about data.</Paragraph>
    <Paragraph position="3"> Figure 1 shows the preliminary chunking and pos-processing procedure in our experiments First of all, we use YamCha and CRF++ respectively to treat with the testing data. We got two original results from those chunkers, which use the exactly same data format; in this case we can compare the performance between CRF and SVM. After comparisons, we can figure out the same words with different IOB tags from the two former chunkers. Afterward, there exist two problems: how to pick out the IOB tags identified improperly and how to modify those wrong IOB tags.</Paragraph>
    <Paragraph position="4"> To solve the first question, we use the conditional probability from the CRF to help determine the wrong IOB tags. For each word of the testing data, the CRF chunker works out a conditional probability for each IOB tag and chooses the most probable tag for the output. We bring out the differences between the SVM and CRF, such as &amp;quot;Si Chuan (Sichuan)&amp;quot; in a base noun phrase is recognized as &amp;quot;I&amp;quot; and &amp;quot;O&amp;quot; respectively, and the distance between P(I |&amp;quot;Si Chuan &amp;quot;) and P(O |&amp;quot; Si Chuan &amp;quot;) is tiny. According to our experiment, about 80% of the differences between SVM and CRF share the same statistical characters, which indicate the correct answers are inundated by the noisy features in the classifier.</Paragraph>
    <Paragraph position="5">  Using the comparison between SVM and CRF we can check most of those errors. Then we could build some simple grammar rules to figure out the correct tags for the ambiguous words corresponding to the surrounding contexts. Then At the error pruning step, judging from the surrounding texts and the grammar rules, the base NP is corrected to the right form. We give 5 mainly representative grammar rules to explain how they work in the experiments.</Paragraph>
    <Paragraph position="6"> The first simple sample of grammar rules is just like &amp;quot;BNP - NR NN&amp;quot;, used to solve the proper noun problems. Take the &amp;quot; Si Chuan (Sichuan)/NR/B Pen Di (basin)/NN/I&amp;quot; for example, the comparison finds out the base NP recognized as &amp;quot;Si Chuan (Sichuan)/NR/I Pen Di (basin)/NN/B&amp;quot;. Second, with respect to the base NP connecting with separating mark and conjunction words, two rules &amp;quot;BNP - BNP CC (BNP  |Noun), BNP -BNP PU (BNP  |Noun)&amp;quot; is used to figure out those errors; Third, with analyzing our experiment results, the CRF and SVM chunker recognize differently on the determinative, therefore the rule &amp;quot;BNP - JJ BNP&amp;quot;, our combination methods figure out new BNP tags from the preliminary results according to this rule. Finally, the most complex situation is the determination of the Base NPs composed of series of nouns, especially the proper nouns. With figuring out the maximum length of this kind of noun phrase, we highlight the proper nouns and then separate the complex noun phrase to base noun phrases, and according to the our experiments, this  method could solve close to 75% of the ambiguity in the errors from complex noun phrases. Totally, the rules could solve about 63% of the found errors.</Paragraph>
  </Section>
  <Section position="6" start_page="90" end_page="91" type="metho">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> The CoNLL 2000 provided the software  to convert Penn English Treebank II into the IOB tags form. We use the Penn Chinese Treebank 5.0  , which is improved and involved with more POS tags, segmentation and syntactic bracketing. As the sentences in the Treebank are longer and related to more complicated structures, we modify the software with robust heuristics to cope with those new features of the Chinese Treebank and generate the training and testing data sets from the Treebank. Afterward we also make some manual adjustments to the final data.</Paragraph>
    <Paragraph position="1"> In our experiments, the SVM chunker uses a polynomial kernel with degree 2; the cost per unit violation of the margin, C=1; and tolerance of the termination criterion, 0.01e = . In the base NPs chunking task, the evaluation metrics for base NP chunking include precision P, recall R and the F b . Usually we refer to the F b as the creditable metric.</Paragraph>
    <Paragraph position="2">  All the experiments were performed on a Linux system with 3.2 GHz Pentium 4 and 2G memory. The total size of the Penn Chinese Treebank words is 13 MB, including about 500,000 Chinese words. The quantity of training corpus amounts to 300,000 Chinese words. Each word contains two Chinese characters in average. We mainly use five kinds of corpus, whose sizes include 30000, 40000, 50000, 60000 and 70,000 words. The corpus with an even larger size is improper according to the training corpus  From Figure 2, we can see that the results from CRF are better than that from SVM and the error-pruning performs the best. Our hybrid error-pruning method achieves an obvious improvement F-scores by combining the outcome from SVM and CRF classifiers. The test F-scores are decreasing when the sizes of corpus increase. The best performance with F-score of 89.27% is achieved by using a test corpus of 30k words. We get about 1.0% increase of F-score after using the hybrid approach. The F-score is higher than F-score 87.75% of Chinese base NP chunking systems using the Maximum Entropy method in (Zhou et al., 2003),. Which used the smaller 3 MB Penn Chinese Treebank II as the corpus. The Chinese Base NP chunkers are not superior to those for English. Zhang and Ando (2005) produce the best English base NP accuracy with F-score of 94.39+ (0.79), which is superior to our best results. The previous work mostly considered base NP chunking as the classification problem without special attention to the lexical information and syntactic dependence of words. On the other hand, we add some grammar rules to strength the syntactic dependence between the words. However, the syntactic structure derived from Chinese is much more flexible and complex than that from English. First, some Chinese words contain abundant meanings or play different syntactic roles. For example, &amp;quot;Qi Zhong (among which)/NN Zhong Qing (Chongqing)/NR Di Qu (district)/NN&amp;quot; is recognized as a base NP. Actu- null ally the Chinese word &amp;quot;Qi Zhong /NN (among)&amp;quot; refers to the content in the previous sentence and &amp;quot;Qi Zhong (thereinto)&amp;quot; sometimes used as an adverb. Second, how to deal with the conjunctions is a major problem, especially the words &amp;quot;Yu (and)&amp;quot; can appear in the preposition structure &amp;quot;Yu ...... Xiang Guan (relate to)&amp;quot;, which makes it difficult to judge those types of differences. Third, the chunkers can not handle with compact sequence data of chunks with name entities and new words (especially the transliterated words) satisfactorily, such as &amp;quot;Zhong Guo ( China ) /NR Hong Shi Zi Hui ( Red Cross ) /NRMing Yu ( Honorary ) /NN Hui Chang (Chairman ) /NN Jiang Ze Min ( Jiang Ze-min ) /NR&amp;quot; As it points above, the English name entities sequences are connected with the conjunction such as &amp;quot;of, and, in&amp;quot;. While in Chinese there are no such connection words for name entities sequences. Therefore when we use the statistical methods, those kinds of sequential chunks contribute slightly to the feature selection and classifier training, and are treated as the useless noise in the training data. In the testing section, it is close the separating margin and hardly determined to be in the right category. What's more, some other factors such as Idiomatic and specialized expressions also account for the errors. 
By highlighting those kinds of words and using some rules which emphasize on those proper words, we use our error-pruning methods and useful grammar rules to correct about 60% errors.</Paragraph>
  </Section>
</Paper>