<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2926">
<Title>A Pipeline Model for Bottom-Up Dependency Parsing</Title>
<Section position="6" start_page="188" end_page="189" type="evalu">
<SectionTitle> 4 Analysis and Discussion </SectionTitle>
<Paragraph position="0"> We observed that our UAS for Arabic is generally lower than for the other languages. A likely reason is that the Arabic sentences are very long: in the Arabic training data, 25% of the sentences have more than 50 words. Because our algorithm uses a pipeline model, a long sentence requires more predictions to complete, and more predictions in a pipeline model may result in more mistakes. We think that this explains our relatively low Arabic result. Moreover, our current system uses the same window size (2,4) for feature extraction in all languages. Changing the window size seems to be a reasonable step when the sentences are longer.</Paragraph>
<Paragraph position="1"> For Czech, one reason for our relatively low result is that we did not use the whole training corpus due to time limitations (see footnote 2). In fact, in our experiments on the development set, increasing the amount of training data gave a significantly higher result than the system trained on the smaller data set. The other problem is that Czech is one of the languages with many part-of-speech tags and dependency types, and the sentences in Czech are also relatively long.</Paragraph>
<Paragraph position="2"> Footnote 2: Training our system for most languages takes 30 minutes to 1 hour for both phases of labeling HEAD and DEPREL. It takes 6-7 hours for Czech with 50% of the training data. Table legend: LAS=Labeled Attachment Score, LAC=Label Accuracy, AV=Average score, and SD=standard deviation.</Paragraph>
<Paragraph position="3"> These facts make recognizing the HEAD and the dependency types more difficult.</Paragraph>
<Paragraph position="4"> Another interesting aspect is that we have not used the information about the syntactic and/or morphological features (FEATS) properly. For the languages for which FEATS is available, we have a larger gap compared with the top system.</Paragraph>
</Section>
</Paper>