<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1025"> <Title>A Maximum Entropy Chinese Character-Based Parser</Title> <Section position="5" start_page="0" end_page="0" type="intro"> <SectionTitle> 4 Experiments </SectionTitle> <Paragraph position="0"> All experiments reported here are conducted on the latest LDC release of the Chinese Treebank (CTB), which consists of about … words. Word parse trees are converted to character trees using the procedure described in Section 2. All traces and functional tags are stripped in training and testing. Two results are reported for the character-based parsers: the F-measure of word segmentation and the F-measure of constituent labels. Formally, let W_R(i) and W_T(i) be the number of words in the i-th reference sentence and its parser output, respectively, and let C(i) be the number of common words in the i-th sentence of the test set; then the word segmentation F-measure is</Paragraph> <Paragraph position="1"> F_word = 2 * sum_i C(i) / (sum_i W_R(i) + sum_i W_T(i)).</Paragraph> <Paragraph position="2"> The F-measure of constituent labels is computed similarly:</Paragraph> <Paragraph position="3"> F_label = 2 * sum_i C(i) / (sum_i N_R(i) + sum_i N_T(i)),</Paragraph> <Paragraph position="4"> where N_R(i) and N_T(i) are the number of constituents in the i-th reference parse tree and parser output, respectively, and C(i) is the number of common constituents. Chunk-level labels converted from POS tags (e.g., NR, NN and VV in (1)) are included in computing label F-measures for character-based parsers.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Impact of Training Data </SectionTitle> <Paragraph position="0"> The first question we have is whether CTB is large enough, in the sense that performance saturates.</Paragraph> <Paragraph position="1"> The first set of experiments is intended to answer this question.
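As a concrete illustration of the corpus-level segmentation F-measure defined above, here is a minimal Python sketch. The function and variable names are our own, and the matching criterion (a word is "common" when it covers the same character span in both segmentations) is a standard convention assumed here, not quoted from the paper.

```python
def segmentation_f_measure(reference, hypothesis):
    """Corpus-level word-segmentation F-measure.

    reference, hypothesis: lists of segmented sentences, each a list
    of word strings.  A word counts as common when it spans the same
    character offsets in both segmentations (assumed convention).
    """
    def spans(words):
        # Convert a word sequence into a set of (start, end) char offsets.
        out, start = set(), 0
        for w in words:
            out.add((start, start + len(w)))
            start += len(w)
        return out

    n_ref = n_hyp = n_common = 0
    for ref, hyp in zip(reference, hypothesis):
        r, h = spans(ref), spans(hyp)
        n_ref += len(r)
        n_hyp += len(h)
        n_common += len(r & h)  # words agreeing on both boundaries
    # F = 2 * sum_i C(i) / (sum_i W_R(i) + sum_i W_T(i))
    return 2.0 * n_common / (n_ref + n_hyp)
```

The constituent-label F-measure is computed the same way, with (label, span) pairs in place of word spans.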
In these experiments, the first 90% of CTB is used as the training set and the remaining 10% as the test set. We start with 10% of the training set and increase the training set each time by 10%. Only language-independent features are used in these experiments. Figure 1 shows the word segmentation F-measure and label F-measure versus the amount of training data. As can be seen, the F-measures of both word segmentation and constituent labels increase monotonically as the amount of training data increases.</Paragraph> <Paragraph position="2"> If all training data is used, the word segmentation F-measure is …% and the label F-measure …%. These results show that language-independent features work fairly well, a major advantage of the data-driven statistical approach. The learning curve also shows that the current training size has not reached a saturation point, which indicates that there is room to improve our model by obtaining more training data.</Paragraph> <Paragraph position="3"> Figure 1: Word segmentation F-measure and parsing label F-measure vs. percentage of training data.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Effect of Lexical Features </SectionTitle> <Paragraph position="0"> In this section, we present the main parsing results.</Paragraph> <Paragraph position="1"> As it has not been long since the second release of CTB and there is no commonly agreed training and test set, we divide the entire corpus into 10 equal partitions and hold out each partition as a test set while the rest are used for training. For each training-test configuration, a baseline model is trained with only language-independent features. Baseline word segmentation and label F-measures are plotted with dotted lines in Figure 2. We then add the extra lexical features described in Section 3.1 to the model. Lexical questions are derived from a 58K-entry word list.
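The word-list questions can be sketched as follows in Python. The exact three questions of the paper's Table 5 are not reproduced in this excerpt, so this sketch assumes simple membership questions (does a window of characters starting or ending at the current position match a listed word?); the function name, feature strings, and data layout are all hypothetical.

```python
def lexical_features(chars, i, word_lists):
    """Hypothetical word-list lexical features for character position i.

    chars: the sentence as a list of characters.
    word_lists: dict mapping word length (e.g., 2..5) to a set of words
    from the external list, mirroring the length-based sub-lists.
    """
    feats = []
    for k, words in word_lists.items():
        # Does a listed k-character word start at position i?
        if "".join(chars[i:i + k]) in words:
            feats.append(f"starts_word_len{k}")
        # Does a listed k-character word end at position i?
        if i - k + 1 >= 0 and "".join(chars[i - k + 1:i + 1]) in words:
            feats.append(f"ends_word_len{k}")
    return feats
```

In a maximum entropy model, each returned string would become a binary feature paired with the candidate parser action at that position.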
The word list is broken into 4 sub-lists based on word length, ranging from 2 to 5 characters. Lexical features are computed by answering one of the three questions in Table 5. Intuitively, these questions should help the model identify word boundaries, which in turn ought to improve the parser. This is confirmed by the results shown in Figure 2, where the two solid lines represent results with the enhanced lexical questions. As can be seen, lexical questions significantly improve both word segmentation and parsing across all experiments. This is not surprising, as the lexical features derived from the word list are complementary to the language-independent features computed from the training sentences.</Paragraph> <Paragraph position="2"> Figure 2: Word segmentation and label F-measures vs. the experiment numbers. Lines with triangles: segmentation; lines with circles: label; dotted lines: language-independent features only; solid lines: plus lexical features.</Paragraph> <Paragraph position="3"> Another observation is that results vary greatly across experiment configurations: for the model trained with lexical features, the second experiment has a label F-measure of …% and a word-segmentation F-measure of …%, while the sixth experiment has a label F-measure of …% and a word-segmentation F-measure of …%. The large variances justify multiple experiment runs. To reduce the variances, we report numbers averaged over the 10 experiments in Table 6. Numbers in the row starting with WS are word-segmentation results, while numbers in the last row are F-measures of constituent labels. The second column contains average F-measures for the baseline model trained with only language-independent features. The third column contains F-measures for the model trained with the extra lexical features. The last column contains relative error reductions. The best average word-segmentation F-measure is …%. Table 6: Baseline: language-independent features. LexFeat: plus lexical features.
Numbers are averaged over the 10 experiments in Figure 2.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Effect of Syntactic Information on Word Segmentation </SectionTitle> <Paragraph position="0"> Since CTB provides us with full parse trees, we want to know how syntactic information affects word segmentation. To this end, we devise two sets of experiments: 1. We strip all POS tags and constituent labels from the Chinese Treebank and retain only word boundary information. To use the same maximum entropy parser, we represent each word boundary by a dummy constituent label W; for example, every word in the sample sentence (1) would be spanned by a W node. 2. We retain POS tags but strip all constituent labels above the POS level, so that the model sees POS information but not the full parse trees.</Paragraph> <Paragraph position="1"> With these two representations of CTB, we repeat the 10 experiments of Section 4.2 using the same lexical features. Word-segmentation results are plotted in Figure 3. The model trained with only word boundary information has the worst performance, which is not surprising, as we would expect information such as POS tags to help disambiguate word boundaries. What is surprising is that syntactic information beyond POS tags has little effect on word segmentation: there is practically no difference between the solid line (for the model trained with full parse trees) and the dashed line (for the model trained with POS information) in Figure 3. This result suggests that most ambiguities of Chinese word boundaries can be resolved at the lexical level, and that high-level syntactic information does not help word segmentation much in the current parser.</Paragraph> </Section> </Section> </Paper>