<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1065">
<Title>Disambiguation of Morphological Structure using a PCFG</Title>
<Section position="8" start_page="518" end_page="520" type="evalu">
<SectionTitle> 7 Results </SectionTitle>
<Paragraph position="0"> The parser was trained using the Inside-Outside algorithm. By default, (a) the rule probabilities were initialized non-uniformly as described in section 5, (b) training was based on tokens (i.e. the frequency of the training items was taken into account), and (c) all training iterations were lexicalized. Training was quite fast: one training iteration on 2.3 million word forms took about 10 minutes on a Pentium IV running at 3 GHz.</Paragraph>
<Paragraph position="1"> Figure 4 shows the exact match accuracy of the Viterbi parses as a function of the number of training iterations, which ranges from 0 (the initial, untrained model) to 15. For comparison, a baseline result is shown which was obtained by selecting the set of simplest analyses and choosing one of them at random. (In fact, we counted a word with n simplest analyses as 1/n correct instead of actually selecting one analysis at random, in order to avoid a dependency of the baseline result on the random number generation.) The baseline accuracy was 45.3%. The parsing accuracy of the default model jumps from a starting value of 41.8% for the untrained model (which is below the baseline) to 58.5% after a single training iteration. The peak performance of 65.4% is reached after 8 iterations. The average accuracy of the models obtained after 6-25 iterations is 65.1%.</Paragraph>
<Paragraph position="2"> Results obtained with type-based training, where each word receives the same weight regardless of its frequency, were virtually identical to those of the default model. If the parser training was started with a uniform initial model, however, the accuracy dropped by about 6 percentage points.</Paragraph>
<Paragraph position="3"> Figure 4 also shows that the performance of an unlexicalized PCFG is about 13% lower.</Paragraph>
<Paragraph position="4"> We also experimented with a combination of unlexicalized and lexicalized training. Lexicalized models have a huge number of parameters; there is therefore a large number of locally optimal parameter settings to which the unsupervised training can be attracted, and purely lexicalized training is likely to get stuck in a local optimum close to the starting point. Unlexicalized models, on the other hand, have fewer parameters, a smaller number of local optima, and a smoother search space. Unlexicalized training is therefore more likely to reach a globally (near-)optimal point and provides a better starting point for the lexicalized training.</Paragraph>
<Paragraph position="5"> Figure 5 shows that initial unlexicalized training indeed improves the accuracy of the parser. With one iteration of unlexicalized training (see "unlex 1" in figure 5), the accuracy increased by about 3%.</Paragraph>
<Paragraph position="6"> The maximum of 68.4% was reached after 4 iterations of lexicalized training. The results obtained with 2 iterations of unlexicalized training were very similar. With 3 iterations, the performance dropped almost to the level of the default model. It seems that some of the general preferences learned during unlexicalized training are so strong after three iterations that they cannot be overruled anymore by the lexeme-specific preferences learned in the lexicalized training.</Paragraph>
[Figure 5: Parsing accuracy with 1, 2, or 3 iterations of unlexicalized training, followed by lexicalized training]
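To make the training regime concrete, the following is a minimal Python sketch of the kind of EM reestimation step whose expectations the Inside-Outside algorithm computes, written here over explicitly enumerated candidate analyses. The grammar rules, stems, and initial probabilities are invented for illustration and are not the paper's actual grammar.

    import math
    from collections import defaultdict

    # Toy training data: each item is the list of candidate analyses a
    # morphological analyzer might return for one word form; an analysis
    # is represented as the tuple of grammar rules used in its derivation.
    # All rules and stems below are illustrative assumptions.
    ITEMS = [
        # "quelloffen": noun-based vs. verb-based first element
        [("A -> N A", "N -> quell", "A -> offen"),
         ("A -> V A", "V -> quell", "A -> offen")],
        # an unambiguous word that lends support to the N-compound rule
        [("A -> N A", "N -> erfolg", "A -> reich")],
    ]

    def lhs(rule):
        """Left-hand-side category of a rule such as 'A -> N A'."""
        return rule.split("->")[0].strip()

    def em_step(items, prob):
        """One EM reestimation step over enumerated analyses:
        E-step: weight each candidate analysis by its normalized probability.
        M-step: renormalize the expected rule counts per left-hand side."""
        counts = defaultdict(float)
        for analyses in items:
            scores = [math.prod(prob[r] for r in a) for a in analyses]
            total = sum(scores)
            for analysis, score in zip(analyses, scores):
                for rule in analysis:
                    counts[rule] += score / total
        lhs_totals = defaultdict(float)
        for rule, c in counts.items():
            lhs_totals[lhs(rule)] += c
        return {r: c / lhs_totals[lhs(r)] for r, c in counts.items()}

    # Non-uniform initialization (cf. section 5); token-based training
    # would simply repeat items according to their corpus frequency.
    prob = {"A -> N A": 0.4, "A -> V A": 0.2, "A -> offen": 0.2,
            "A -> reich": 0.2, "N -> quell": 0.6, "N -> erfolg": 0.4,
            "V -> quell": 1.0}
    for _ in range(8):  # the paper's peak accuracy came after 8 iterations
        prob = em_step(ITEMS, prob)
    print(f"P(A -> N A) = {prob['A -> N A']:.3f}")

On this view, an unlexicalized iteration corresponds to running the same step on a grammar whose rules do not mention individual stems, which is why it has fewer parameters and a smoother search space.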
<Paragraph position="7"> In order to assess the parsing results qualitatively, 100 parsing errors of the "unlex 2" version were randomly selected and inspected. It turned out that the parser always preferred right-branching structures over left-branching structures in complex compounds with three or more elements, which resulted in 57 errors on compounds whose correct structure is left-branching.</Paragraph>
<Paragraph position="8"> Grammars trained without the initial unlexicalized training showed no systematic preference for right-branching structures. In the test data, left-branching structures were twice as frequent as right-branching structures.</Paragraph>
<Paragraph position="9"> 29 disambiguation errors resulted from selecting the wrong stem although the structure of the analysis was otherwise correct. In the word Rechtskonstruktion (legal construction), for instance, the first element Rechts was derived from the adjective rechts (right) rather than the noun Recht (law). Similarly, the adjective quelloffen (open-source) was derived from the verb quellen (to swell) rather than the noun Quelle (source).</Paragraph>
<Paragraph position="10"> Six errors involved a combination of compounding and suffix derivation (e.g. the word Flugbegleiterin (stewardess)). The parser preferred the analysis where the derivation is applied first (Flug-Begleiterin, "flight attendant [female]"), whereas in the gold standard analysis, the compound is formed first (Flugbegleiter-in, "steward-ess").</Paragraph>
<Paragraph position="11"> In order to better understand the benefits of unlexicalized training, we also examined the differences between the best model obtained with one iteration of unlexicalized training (unlex1) and the best model obtained without unlexicalized training (default).</Paragraph>
<Paragraph position="12"> 30 cases involved left-branching vs. right-branching compounds. The unlex1 model showed a higher preference for right-branching structures than the default model, but also produced left-branching structures (unlike the unlex2 model). In 15 of the 30 cases, unlex1 correctly decided for a right-branching structure; in 13 cases, unlex1 wrongly proposed a right-branching structure. In two cases, unlex1 correctly predicted a left-branching structure where the default model predicted a right-branching structure.</Paragraph>
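Since several of these error classes hinge on the choice between left- and right-branching analyses, a small sketch may clarify the search space: the following Python function enumerates the binary-branching structures among which the parser must choose, using a simplified, illustrative segmentation of Flugbegleiterin. A three-element compound has exactly two such structures; in general, n elements allow Catalan(n-1) bracketings.

    def bracketings(parts):
        """Enumerate all binary-branching structures over compound elements."""
        if len(parts) == 1:
            return [parts[0]]
        trees = []
        for i in range(1, len(parts)):            # choose the top-level split
            for left in bracketings(parts[:i]):
                for right in bracketings(parts[i:]):
                    trees.append((left, right))
        return trees

    # Simplified segmentation, for illustration only.
    for tree in bracketings(["Flug", "begleiter", "in"]):
        print(tree)
    # ('Flug', ('begleiter', 'in'))   right-branching: Flug + Begleiterin
    # (('Flug', 'begleiter'), 'in')   left-branching: Flugbegleiter + -in (gold)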
<Paragraph position="13"> 32 differences were caused by lexical ambiguities. In 24 cases, only one stem was ambiguous: 15 times unlex1 was right (e.g. Moskaureise: "Moscow trip" [sg.] vs. "Moscow rice" [pl.]) and nine times the default model was right (e.g. Jodtabletten: "iodine pill" vs. "iodine tablet"). In 8 cases, two morphemes were involved in the ambiguity; in all of these cases, unlex1 generated the correct analysis (e.g. Sportraum: "sport room" vs. "Spor [a name] dream"). Nine ambiguities involved the length of verb prefixes; six times, unlex1 correctly decided for the longer prefix (e.g. gegenüber-stehen (to face) instead of gegen-überstehen (to "counter-survive")). In another experiment, we tested the parser on the first test data set (data1), where simplex words, part-of-speech ambiguities, frequent words, and repeated occurrences were not removed. The baseline accuracy on this data was 43.75%. Figure 6 shows the results obtained with different numbers of unlexicalized training iterations, analogous to figure 5. Here, strictly lexicalized training produced the best results.</Paragraph>
<Paragraph position="14"> The maximal accuracy of 58.59% was obtained after 7 iterations. In contrast to the experiments on data2, the accuracy decreased by more than 1.5% when the training was continued. As stated in the introduction, we think that part-of-speech ambiguities are better resolved by a part-of-speech tagger and that frequent words can be disambiguated manually.</Paragraph>
</Section>
</Paper>