<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2923">
  <Title>LingPars, a Linguistically Inspired, Language-Independent Machine Learner for Dependency Treebanks</Title>
  <Section position="6" start_page="173" end_page="174" type="evalu">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> Because of LingPars' strong focus on function tags, a separate analysis of attachment versus label per formance was thought to be of interest. Ill. 1 plots the latter (Y-axis) against the former (X-axis), with dot size symbolizing treebank size. In this evalua tion, a fixed training chunk size of 50,000 tokens11 was used, and tested on a different sample of 5,000 tokens (see also 5/50 evaluation in ill. 2). For most languages, function performance was better than attachment performance (3.2 percentage points on average, as opposed to 0.44 for the CoNLL sys tems overall), with dots above the hyphenated &amp;quot;di agonal of balance&amp;quot;. Interestingly, the graphics also makes it clear that performance was lower for small treebanks, despite the fact that training cor pus size had been limited in the experiment, possi bly indicating correlated differences in the balance between tag set size and treebank size.</Paragraph>
    <Paragraph position="1">  Ill. 2 keeps the information from ill. 1 (5/50-dep and 5/50-func), represented in the two lower lines, but adds performance for maximal training corpus size12 with (a) a randomly chosen test chunk of 5,000 tokens not included in the training corpus (5/all-5) and (b) a 20,000 token chunk from the training corpus (20/all). Languages were sorted ac 11Smaller for Slovene and Arabic (for these languages: largest possible) 12Due to deadline time constraints, an upper limit of 400,000 lines was forced on the biggest treebanks, when training for unknown test data, meaning that only 1/2 of the German data and 1/3 of the Czech data could be used.  cording to 20/all-func accuracy. As can be seen from the dips in the remaining (lower) curves, small training corpora (asterisk-marked languages) made it difficult for the parser (1) to match 20/all attachment performance on unknown data, and (2) to learn labels/functions in general (dips in all function curves, even 20/all). For the larger tree banks, the parser performed better (1-3 percentage points) for the full training set than for the 50,000 token training set.</Paragraph>
    <Paragraph position="2"> Illustration 2: Performance with different training cor pus sizes (upper 2 curves: Test data included)</Paragraph>
  </Section>
class="xml-element"></Paper>