<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2414">
  <Title>Memory-based semantic role labeling: Optimizing features, algorithm, and output</Title>
  <Section position="4" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
4 Results
</SectionTitle>
    <Paragraph position="0"> We started with a feature selection process with the features described in section 2. This experiment used a basic k-nn classifier without feature weighting, a nearest neighborhood of size 1, attenuated words, and output post-processing. We evaluated the effect of trigram output classes by performing an experiment with and without them. The feature selection experiment without tri-gram output classes selected 10 features and obtained an F =1 score of 46.3 on the development data set. The experiment that made use of combined classes selected 12 features and reached a score of 51.8.</Paragraph>
    <Paragraph position="1"> We decided to continue using trigram output classes.</Paragraph>
    <Paragraph position="2"> Subsequently, we optimized the parameters of our machine learner based on the features in the second experiment and performed another feature selection experiment with these parameters. The performance effects can be found in Table 1 (rows b and c). An additional parameter optimization step did not have a substantial effect (Table 1, row d).</Paragraph>
    <Paragraph position="3"> After training a stacked classifier while using the output of the best first stage learner, performance went up from 56.5 to 58.8. Additional feature selection and parameter optimization were useful at this level (F =1=60.9, see Table 1). Most of our other performance gain was obtained by a continued process of classifier stacking. Parameter optimization did not result in improved performance when stacking more than one classifier. Feature selection was useful for the third-stage classifier but not for the next one. Our final system obtained an F =1 score of 63.0 on the development data (Table 1) and 60.1 on the test set (Table 4).</Paragraph>
  </Section>
class="xml-element"></Paper>