<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0907">
  <Title>Making Sense of Japanese Relative Clause Constructions</Title>
  <Section position="6" start_page="0" end_page="85" type="evalu">
    <SectionTitle>
5 Evaluation
</SectionTitle>
    <Paragraph position="0"> In evaluation, we compare different clausal interpretation selection techniques. We then investigate the efficacy of different parameter partitions on disambiguation, and generate a learning curve.</Paragraph>
    <Paragraph position="1"> Evaluation was carried out by way of stratified 10-fold cross-validation throughout, using the C4.5 decision tree learner (Quinlan, 1993). As C4.5 induces a unique decision tree from the training data and then applies this to the test data, we are able to evaluate both training and test classification accuracy, i.e. the relative success of the decision tree in classifying the training data and the test data, respectively. The data used in evaluation is a set of 5143 RCC instances from the EDR corpus (EDR, 1995), of which 4.7% included cosubordinated relative clauses (i.e. the total number of unit relative clauses is 5408). Each RCC instance was manually annotated for its default interpretation, independent of sentential context. The 10 most frequent interpretations (out of 27) in this test set are presented below. Based on this distribution, we can derive a baseline accuracy of 64.0%, obtained by allocating the SUBJECT interpretation to every RCC input.</Paragraph>
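As a rough illustration (not the authors' code), the majority-class baseline and a stratified fold assignment can be sketched in Python; the label distribution below is invented for the example, and C4.5 itself is not reproduced:

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy of always predicting the most frequent interpretation."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

def stratified_folds(labels, k=10):
    """Assign each instance index to one of k folds, round-robin within
    each class, so every fold mirrors the overall label distribution."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    for indices in by_class.values():
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)
    return folds
```

For instance, a toy set with 64% SUBJECT labels yields a majority baseline of 0.64, matching the way the 64.0% figure above is derived from the corpus distribution.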
    <Paragraph position="2"> Footnote 2: Note that in (7), the SUBJECT interpretation is shared between a passive and an active clause. It is because the interpretational parallelism occurs at the grammatical relation level rather than the case-role level that we select grammatical relations for our argument case-slot gapping types.</Paragraph>
    <Section position="1" start_page="85" end_page="85" type="sub_section">
      <SectionTitle>
5.1 Evaluation of analytical disambiguation
</SectionTitle>
      <Paragraph position="0"> First, we evaluate analytical disambiguation by decomposing each RCC into its component cosubordinated RCCs and selecting the most plausible interpretation for each unit clause (UC). We compare: (a) a random selection baseline method (RandomUC); (b) a method where all feature vectors for the unit relative clause are logically AND'ed together (ANDUC); (c) a method where all feature vectors for the unit clause are logically OR'ed together (ORUC); and (d) the cascaded-heuristic method from Section 4.2 above (HeuristicUC). The results for the various methods are presented in Fig. 1. Note that 28.8% of clauses occurring in the data are associated with analytical ambiguity, and for the remainder, there is only one verb entry in the case frame dictionary.</Paragraph>
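The three vector-combination baselines can be sketched as operations over boolean feature vectors, one candidate vector per ambiguous analysis (a hypothetical sketch of RandomUC, ANDUC and ORUC; the cascaded HeuristicUC selection is not reproduced here):

```python
import random

def combine_analyses(vectors, mode):
    """Collapse the candidate boolean feature vectors for one unit clause
    into a single vector.
    mode "random": pick one candidate analysis at random (RandomUC-style).
    mode "and":    a feature is on only if on in every analysis (ANDUC-style).
    mode "or":     a feature is on if on in any analysis (ORUC-style)."""
    if mode == "random":
        return random.choice(vectors)
    if mode == "and":
        return [all(col) for col in zip(*vectors)]
    if mode == "or":
        return [any(col) for col in zip(*vectors)]
    raise ValueError(mode)
```

The AND variant discards any feature the analyses disagree on, which is consistent with its poor showing in Fig. 1, while the OR variant keeps the union of evidence.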
      <Paragraph position="1"> HeuristicUC outperforms the RandomUC baseline to a level of statistical significance, in both training and testing. ORUC lags behind HeuristicUC in testing in particular, but is vastly superior to ANDUC, which is marginally worse than RandomUC in both training and testing.</Paragraph>
      <Paragraph position="2"> Based on these results, we conclude that our system of cascaded heuristics (HeuristicUC) is the best of the tested methods and use this as our intra-clause disambiguation method in subsequent evaluation.</Paragraph>
    </Section>
    <Section position="2" start_page="85" end_page="85" type="sub_section">
      <SectionTitle>
5.2 Disambiguation via cosubordination
</SectionTitle>
      <Paragraph position="0"> Next, we test the cosubordination-based disambiguation techniques. The two core paradigms we consider are: (1) unit clause (UC) analysis, where each cosubordinated clause is considered independently, as in Section 5.1; and (2) clause-integrated (CI) analysis, where we actively use cosubordination in disambiguation.</Paragraph>
      <Paragraph position="1"> For unit clause analysis, we replicate the basic HeuristicUC methodology from above and also extend it by logically AND'ing together the case slot instantiation flags between unit clause feature vectors to maintain a consistently applicable case-role gapping analysis (Heuristic∧UC).</Paragraph>
      <Paragraph position="2"> For clause-integrated analysis, we apply the cascaded heuristics in intra-clausal analysis, then either logically OR or AND the component unit clause feature vectors together, producing methods ORCI and ANDCI, respectively.</Paragraph>
      <Paragraph position="3"> The training and test accuracies for the described methods over the full data set are given in Fig. 2. Heuristic∧UC (incorporating inter-clausal coordination of only case slot data) appears to offer a slight advantage over HeuristicUC, but the two clause-integrated analysis methods of ORCI and ANDCI are significantly superior in both testing and training.</Paragraph>
      <Paragraph position="4"> Overall, the best-performing method is ANDCI at a test accuracy of 88.9%.</Paragraph>
      <Paragraph position="5"> It is difficult to gauge the significance of the results given that cosubordinating RCCs account for only 4.7% of the total data. One reference point is the performance of the HeuristicUC method over only simple (non-cosubordinated) RCCs. This gives a training accuracy of 90.6% and a test accuracy of 89.3%, suggesting that we are actually doing slightly worse over cosubordinated RCCs than simple RCCs, but that we gain considerably from employing a clause-integrated approach relative to simple unit clause analysis.</Paragraph>
      <Paragraph position="6"> An absolute cap on performance for the original system can be obtained through non-deterministic evaluation, whereby the system is adjudged to be correct in the instance that the correct analysis is produced for any one unit clause analysis (out of the multiple analyses per clause). This produces an accuracy of 90.2%, which is presented as Upper Bound in Fig. 2. Given that all that the proposed method is doing is choosing between the different unit clause analyses, it cannot hope to better this.</Paragraph>
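The non-deterministic upper bound described above reduces to a simple oracle check, sketched here in Python (the function name and data are ours, not the authors'): an instance is scored correct if the gold interpretation appears anywhere in its set of unit-clause analyses.

```python
def oracle_accuracy(candidate_sets, gold):
    """Upper-bound ("oracle") accuracy: an instance counts as correct if
    the gold interpretation appears among ANY of its candidate analyses,
    regardless of which one a selection method would actually pick."""
    hits = sum(1 for cands, g in zip(candidate_sets, gold) if g in cands)
    return hits / len(gold)
```

Since ANDCI only ever chooses among the candidate analyses, its accuracy can approach but never exceed this oracle figure (90.2% in Fig. 2).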
      <Paragraph position="7"> Relative to the baseline and upper bound, the error reduction for the clause-integrated ANDCI method is 96.6%, a very strong result.</Paragraph>
    </Section>
    <Section position="3" start_page="85" end_page="85" type="sub_section">
      <SectionTitle>
5.3 Additional evaluation
</SectionTitle>
      <Paragraph position="0"> We further partitioned the parameter space and ran C4.5 over the different combinations thereof, using ANDCI. The particular parameter partitions we target are case slot instantiation flags (C: 11 features), head noun semantics (N: 14 features) and verb classes (V: 27 features).</Paragraph>
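Restricting the classifier to a partition (or a "+" combination of partitions) amounts to projecting each feature vector onto a subset of its columns. The sketch below assumes a hypothetical column layout in which the 11 C features, 14 N features and 27 V features occupy contiguous spans; the actual ordering in the authors' feature vectors is not specified:

```python
def project(vectors, partition, spans):
    """Restrict feature vectors to one parameter partition (e.g. "N") or a
    "+" combination (e.g. "C+N"), given a mapping from partition name to
    its (start, end) column span."""
    cols = []
    for name in partition.split("+"):
        start, end = spans[name]
        cols.extend(range(start, end))
    return [[v[c] for c in cols] for v in vectors]
```

With spans of 11, 14 and 27 columns for C, N and V, the full combination C+N+V simply recovers the original 52-column vectors, matching the observation that C+N+V is identical to ANDCI.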
      <Paragraph position="1"> The system results over the individual parameter partitions, and over the various combinations of case slot instantiation, head noun semantics and verb classes (e.g. N+V = head noun semantics and verb classes), are presented in Fig. 3. The value of head noun semantics is borne out by the high test accuracy for N of 76.0%. We can additionally see that case slot instantiation and verb class features provide approximately equivalent discriminatory power, both well above the absolute baseline of 64.0%. This is despite there being less than half as many case slot instantiation flags as verb classes, and is largely due to the direct correlation between case slot instantiation judgements and case-slot gapping analyses, which account for around 80% of all RCCs.</Paragraph>
      <Paragraph position="2"> The affinity between case slot instantiation judgements and the semantics of the head noun is evidenced in the strong performance of C+N, although even here, verb classes gain us an additional 5% of performance. (Note that C+N+V corresponds to the full parameter space, and is identical to ANDCI in Fig. 2.) Essentially what is occurring here is that selectional preferences between particular head noun semantics and certain case-slot/analysis types</Paragraph>
      <Paragraph position="3"> are incrementally enhanced as we add in the extra dimensions of case slot instantiation and verb classes. The orthogonality of the three dimensions is demonstrated by the incremental performance improvement as we add in extra parameter types.</Paragraph>
      <Paragraph position="4"> This finding provides evidence for our earlier claims about selection in RCCs being based on the combination of head noun semantics, verb classes and information about what case slots are vacant in the relative clause.</Paragraph>
      <Paragraph position="5"> To determine if the 90.2% upper bound on classification accuracy for the given experimental setup is due to limitations in the particular resources we are using or an inherent bound on the RCC interpretation task as defined herein, we performed a manual annotation task involving 4 annotators and 100 randomly-selected RCCs, taken from the 5143 RCCs used in this research. The mean agreement between the annotators was 90.0%, coinciding remarkably well with the 90.2% figure. This provides extra evidence for the success of the proposed method, and suggests that there is little room for improvement given the current task definition.</Paragraph>
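The mean inter-annotator agreement quoted above is the average over all annotator pairs of the proportion of instances on which the pair agrees; a minimal sketch (our own formulation, with invented labels) is:

```python
from itertools import combinations

def mean_pairwise_agreement(annotations):
    """annotations: one list of interpretation labels per annotator,
    aligned by instance. Returns the mean, over all annotator pairs, of
    the proportion of instances the pair labels identically."""
    scores = []
    for a, b in combinations(annotations, 2):
        scores.append(sum(x == y for x, y in zip(a, b)) / len(a))
    return sum(scores) / len(scores)
```

With 4 annotators this averages over 6 pairs; an agreement of 90.0% computed this way sits just under the 90.2% oracle bound, supporting the claim that the task definition itself caps attainable accuracy.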
    </Section>
  </Section>
</Paper>