
<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1100">
  <Title>Morphology and Reranking for the Statistical Parsing of Spanish</Title>
  <Section position="6" start_page="798" end_page="801" type="evalu">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"> Our models were trained using a training set consisting of 80% of the data (2,801 sentence/tree pairs, 75,372 words) available to us in the 3LB treebank.</Paragraph>
    <Paragraph position="1"> We reserved the remaining 20% (692 sentences, 19,343 words) to use as unseen data in a test set.</Paragraph>
    <Paragraph position="2"> We selected these subsets with two criteria in mind: first, respecting the boundaries of the texts by placing articles in their entirety into either one subset or the other; and second, maintaining, in each subset, the same proportion of genres found in the original set of trees. During development, we used a cross- null are represented with a flat structure as in (a). For coordination involving a non-terminal X (X =s in the example), we insert new nodes X-CC1 and X-CC2 to form the structure in (b).</Paragraph>
    <Paragraph position="3"> validation approach on the training set to test different models. We divided the 2,800 training data trees into 14 different development data sets, where each of these data sets consisted of 2,600 training sentences and 200 development sentences. We took the average over the results of the 14 splits to gauge the effectiveness of the model being tested.</Paragraph>
    <Paragraph position="4"> To evaluate our models, we considered the recovery of labeled and unlabeled dependencies as well as labeled constituents. Unlabeled dependencies capture how the words in a sentence depend on one another. Formally, they are tuples {headchild index, modifier index}, where the indices indicate position in the sentence. Labeled dependencies include the labels of the modifier, headchild, and parent non-terminals as well. The root of the tree has a special dependency: {head index} in the unlabeled case and {TOP, headchild index, root non-terminal} in the labeled case. The labeled constituents in a tree are all of the non-terminals and, for each, the positions of the words it spans. We use the standard definitions of precision, recall, and F-measure.5 5When extracting dependencies, we replaced all non-punctuation POS labels with a generic label TAG to avoid conflating tagging errors with dependency errors. We also included the structural changes that we imposed during preprocessing. Results for constituent precision and recall were computed after we restored the trees to the original treebank structure.  dependencies, both scores are shown. Row 1 shows results on a baseline model containing almost no morphological information. The subsequent rows represent a subset of the models with which we experimented: n(P,N,V) uses number for pronouns, nouns, and verbs; n(A,D,N,P,V) uses number for adjectives, determiners, nouns, pronouns, and verbs; n(V) uses number for verbs; m(V) uses mode for verbs; t(V) uses tense for verbs; p(V) uses person for verbs; g(V) uses gender for verbs; the models in rows 9-12 are combinations of these models, and in row 13, n(A,D,N,V,P) combines with g(A,D,N,V,P), which uses gender for adjectives, determiners, nouns, verbs, and pronouns. The results of the best-performing model are in bold.  morphological model that scored highest during development. Row 3 gives the accuracy of the reranking approach, when applied to n-best output from the model in Row 2.</Paragraph>
    <Section position="1" start_page="799" end_page="800" type="sub_section">
      <SectionTitle>
5.1 The Effects of Morphology
</SectionTitle>
      <Paragraph position="0"> In our first experiments, we trained over 50 models, incorporating different morphological information into each in the way described in Section 3.1.</Paragraph>
      <Paragraph position="1"> Prior to running the parsers, we trained the POS tagger described in (Collins, 2002). The output from the tagger was used to assign a POS label for unknown words. We only attempted to parse sentences under 70 words in length.</Paragraph>
      <Paragraph position="2"> Table 3 describes some of the models we tried during development and gives results for each. Our baseline model, which we used to evaluate the effects of using morphology, was Model 1 (Collins, 1999) with a simple POS tagset containing almost no morphological information. The morphological models we show are meant to be representative of both the highest-scoring models and the performance of various morphological features. For instance, we found that, in general, gender had only a slight impact on the performance of the parser. Note that gender is not a morphological attribute of Spanish verbs, and that the inclusion of verbal features, particularly number, mode, and person, generated the strongest-performing models in our experiments.</Paragraph>
      <Paragraph position="3"> Table 4 shows the results of running two models on the test set: the baseline model and the best-performing morphological model from the development stage. This model uses the number and mode of verbs, as well as the number of adjectives, determiners, nouns, and pronouns.</Paragraph>
      <Paragraph position="4"> The results in Tables 3 and 4 show that adding some amount of morphological information to a parsing model is beneficial. We found, however, that adding more information does not always lead to improved performance (see, for example, rows 11 and 13 in Table 3). Presumably this is because the tagset grows too large.</Paragraph>
      <Paragraph position="5"> Table 5 takes a closer look at the performance  of the best-performing morphological model in the recovery of particular labeled dependencies. The breakdown shows the top 15 dependencies in the gold-standard trees across the entire training set.</Paragraph>
      <Paragraph position="6"> Collectively, these dependencies represent around 72% of the dependencies seen in this data.</Paragraph>
      <Paragraph position="7"> We see an extraordinary gain in the recovery of some of these dependencies when we add morphological information. Among these are the two involving postmodifiers to verbs. When examining the output of the morphological model, we found that much of this gain is due to the fact that there are two non-terminal labels used in the treebank that specify modal information of verbs they dominate (infinitivals and gerunds): with insufficient morphological information, the baseline parser was unable to distinguish regular verb phrases from these more specific verb phrases.</Paragraph>
      <Paragraph position="8"> Some dependencies are particularly difficult for the parser, such as that in which SBAR modifies a noun ({GRUP TAG SBAR R}). We found that around 20% of cases of this type in the training set involve structures like el proceso de negociones que (in English the process of negotiation that). This type of structure is inherently difficult to disambiguate. In Spanish, such structures may be more common than in English, since phrases involving nominal modifiers to nouns, like negotiation process, are always formed as noun + de + noun.</Paragraph>
    </Section>
    <Section position="2" start_page="800" end_page="800" type="sub_section">
      <SectionTitle>
5.2 Experiments with Reranking
</SectionTitle>
      <Paragraph position="0"> In the reranking experiments, we follow the procedure described in (Collins and Koo, 2005) for creation of a training set with n-best parses for each sentence. This method involves jack-knifing the data: the training set of 2,800 sentences was parsed in 200-sentence chunks by an n-best morphological parser trained on the remaining 2,600 sentences. This ensured that each sentence in the training data had n-best output from a baseline model that was not trained on that sentence. We used the optimal morphological model (n(A,D,N,V,P)+m(V)) to generate the n-best lists, and we used the feature set described in (Collins and Koo, 2005). The test results are given in Table 4.6 6Note that we also created development sets for development of the reranking approach, and for cross-validation of the single parameter C in approach of (Bartlett et al., 2004).  dencies (representing around 72% of all dependencies) in the gold-standard trees across all training data. The first column shows the type and subtype, where the subtype is specified as the 4-tuple {parent non-terminal, head non-terminal, modifier non-terminal, direction}; the second column shows the count for that subtype and the percent of the total that it represents (where the total is 62,372) . The model BL is the baseline, and M is the morphological model n(A,D,N,V,P)+m(V).</Paragraph>
    </Section>
    <Section position="3" start_page="800" end_page="801" type="sub_section">
      <SectionTitle>
5.3 Statistical Significance
</SectionTitle>
      <Paragraph position="0"> We tested the significance of the labeled precision and recall results in Table 4 using the sign test.</Paragraph>
      <Paragraph position="1"> When applying the sign test, for each sentence in the test data we calculate the sentence-level F1 constituent score for the two parses being compared.</Paragraph>
      <Paragraph position="2"> This indicates whether one model performs better on that sentence than the other model, or whether the two models perform equally well, information used by the sign test. All differences were found to be statistically significant at the level p = 0.01.7 7When comparing the baseline model to the morphological model on the 692 test sentences, F1 scores improved on 314 sentences, and became worse on 164 sentences. When comparing the baseline model to the reranked model, 358/157 sen-</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML