<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2922"> <Title>Experiments with a Multilanguage Non-Projective Dependency Parser</Title> <Section position="9" start_page="168" end_page="169" type="concl"> <SectionTitle> 7 Error Analysis </SectionTitle> <Paragraph position="0"> The parser does not attempt to assign a dependency relation to the root. A simple correction of assigning a default value for each language gave an improvement in the LAS as shown in Table 1.</Paragraph> <Section position="1" start_page="168" end_page="169" type="sub_section"> <SectionTitle> 7.1 Portuguese </SectionTitle> <Paragraph position="0"> Out of the 45 dependency relations that the parser had to assign to a sentence, the largest number of errors occurred assigning N<PRED (62), ACC (46), PIV (43), CJT (40), N< (34), P< (30).</Paragraph> <Paragraph position="1"> The highest number of head error occurred at the CPOS tags PRP with 193 and V with 176. In particular just four prepositions (em, de, a, para) accounted for 120 head errors.</Paragraph> <Paragraph position="2"> Most of the errors occur near punctuations. Often this is due to the fact that commas introduce relative phrases or parenthetical phrases (e.g. &quot;o suspeito, de 38 anos, que trabalha&quot;), that produce diversions in the flow. Since the parser makes decisions analyzing only a window of tokens of a limited size, it gets confused in creating attachments. I tried to add some global context features, to be able to distinguish these cases, in particular, a count of the number of punctuation marks seen so far, whether punctuation is present between the focus words. None of them helped improving precision and were not used in the submitted runs.</Paragraph> </Section> <Section position="2" start_page="169" end_page="169" type="sub_section"> <SectionTitle> 7.2 Czech </SectionTitle> <Paragraph position="0"> Most current parsers for Czech do not perform well on Apos (apposition), Coord (coordination) and ExD (ellipses), but they are not very frequent. The largest number of errors occur on Obj (166), Adv (155), Sb (113), Atr (98). There is also often confusion among these: 33 times Obj instead of Adv, 32 Sb instead of Obj, 28 Atr instead of Adv.</Paragraph> <Paragraph position="1"> The high error rate of J (adjective) is expected, mainly due to coordination problems. The error of R (preposition) is also relatively high. Prepositions are problematic, but their error rate is higher than expected since they are, in terms of surface order, rather regular and close to the noun. It could be that the decision by the PDT to hang them as heads instead of children, causes a problem in attaching them. It seems that a post-processing may correct a significant portion of these errors.</Paragraph> <Paragraph position="2"> The labels ending with _Co, _Ap or _Pa are nodes who are members of the Coordination, Apposition or the Parenthetical relation, so it may be worth while omitting these suffixes in learning and restore them by post-processing.</Paragraph> <Paragraph position="3"> An experiment using as training corpus a subset consisting of just sentences which include non-projective relations achieved a LAS of 65.28 % and UAS of 76.20 %, using MBL.</Paragraph> <Paragraph position="4"> Acknowledgments. 
<Paragraph position="4"> Acknowledgments. Kiril Ribarov provided insightful comments on the results for Czech.</Paragraph>
<Paragraph position="5"> The following treebanks were used for training the parser: (Afonso et al., 2002; Atalay et al., 2003; Böhmová et al., 2003; Brants et al., 2002; Chen et al., 2003; Civit Torruella and Martí Antonín, 2002; Džeroski et al., 2006; Hajič et al., 2004; Kawata and Bartels, 2000; Kromann, 2003; Nilsson et al., 2005; Oflazer et al., 2003; Simov et al., 2005; van der Beek et al., 2002).</Paragraph>
</Section>
</Section>
</Paper>