<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-1224">
  <Title>Do Not Forget: Full Memory in Memory-Based Learning of Word Pronunciation *</Title>
  <Section position="7" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Discussion and future research
</SectionTitle>
    <Paragraph position="0"> As previous research has suggested (Daelemans, 1996; Daelemans, Van den Bosch, and Weijters, 1997a; Van den Bosch, 1997), keeping full memory in memory-based learning of word pronunciation strongly appears to yield optimal generalisation accuracy. The experiments in this paper show that optimi~tion of memory use in memory-based learning while preserving generalisation accuracy can only be performed by (i) replacing instance tokens by instance types with frequency information, and (ii) removing minority ambiguities. Both optimi~tions can be performed straightforwardly; minority ambiguities can be traced with less effort than by using class-prediction strength. Our implementation of IB1-I6 described in (Daelemans and Van den Bosch, 1992; Daelemans, Van den Bosch, and Weijters, 1997b) already makes use of this knowledge, albeit partially (it stores class distributions with letterwindow types).</Paragraph>
    <Paragraph position="1"> Our results also show that atypicality, non-typicality, and typicality (Zhang, 1992), and friendly-neighbourhood size are all estimates of exceptionality that indicate the importance of instance types for classification, rather than their removability. As far as these estimates of exeeptionality are viable, our results suggest that exceptions should be kept in memory and not be thrown away.</Paragraph>
    <Paragraph position="2"> van den Bosch and Daelemans 201 Memory-Based Learning of Word Pronunciation  numbers of edited instances, according to the tested exeeptionality criteria atypical, typical, boundary, small neighbourhood, low prediction strength, and random selection. Performances, denoted by points, are measured when 1%, 2%, 5%, and 10% of the most exceptional instance types ate edited. Lazy vs. eager; not stable vs. unstable F~om the results in this paper and those reported eatlier (Daelemans, Van den Bosch, and Weijters, 1997a; Van den Bosch, 1997), it appeats that no compromise can be made on memory-base learning in terms of abstraction by forgetting without losing generalisation accuracy. Consistently lower performances axe obtained with algorithms that forget by constructing decision trees or connectionist networks, or by editing instance types. Generalisation accuracy appears to be related to the dimension lazyeager leaxning; for the Gs task (and for many other language tasks, (Daelemans, Van den Bosch, and Weijtezs, 1997a)), it is demonstrated that memory-based lazy leatning leads to the best generalisation accuracies, Another explanation for the difference in performance between decision-tree, connectionist, and editing methods versus pure memory-based leaxning is that the former generally display high ~ar/ance, which is the portion of the generalisation error caused by the u~tabili~/of the learning algorithm (Breiman, 1996a). An algorithm is unstable when small perturbations in the learning material lead to large differences in induced models, and stable othezwise; pure memory-based learning algorithms axe said to be very stable, and decision-tree algorithms and conneetionist learning to be unstable (Breiman, 1996a). High variance is usually coupled with low bias, i.e., unstable leaxning algorithms with high vaziance tend to have few limitations in the fxeedom to approximate the task or function to be leaxned) (Bzeiman, 1996b). Breiman points out that often the opposite also holds: a stable classitiez with a low variance can display a high bias when it cannot represent data adequately in its available set of models, but it is not cleat whether or how this applies to pure memory-based leatning as in ml-IG; its success in representing the Gs data and other language tasks quite adequately would rather suggest that IB 1-I6 has both low vatiance and low bias.</Paragraph>
    <Paragraph position="3"> Apatt fzom the possibility that the lazy and eager leatning algorithms investigated here and in eatllez work do not have a strongly contrasting bias, we conjecture that the editing methods discussed here, and some specific decision-tree leaxning algorithms investigated eaxlier (i.e., IGTItEE (Daclemuns, Van den Bosch, and Weijters, 1997b), a decision tree learning algorithm that is an approximate optimisation of IBI-IG) have a slmilat vatia~lce to that of IB1-IG; they axe virtually as stable as ~I-IQ. We base this conjecture on the fact that the standard deviations of both decision-tree learning and memory-based learning trained and tested on the GS data axe not only very small (in the order of 1/10 percents), but also hatdiy different (cf. (Van den Bosch, 1997) for details and examples). Only counectionist networks trained with back-propagation and decision-tree leaxning with pruning display latger standard deviations when accuracies ate averaged over expervan den Bosch and Daelemans 202 Memory-Based Learning of Word Pronunciation  iments (Van den Bosch, 1997); the stable-unstable dimension might play a role there, but not in the difference between pure memory-based learning and edited memory-based learning.</Paragraph>
    <Paragraph position="4"> Future research The results of the present study suggest that the following questions be investigated in future research: null , The tested criteria for editing can be employed as instance weights as in EACH (Salzberg, 1990) and PEI3LS (Cost and Salzberg, 1993), rather than as criteria for instance removal.</Paragraph>
    <Paragraph position="5"> Instance weighting, preserving pure memory-based learning, may add relevant information to similarity matching, and may improve IB1-IG~s performance.</Paragraph>
    <Paragraph position="6"> . Different data sets of different sizes may contain different portions of atypical instances or minority ambiguities. Moreover, data sets may contain pure noise. While atypical or exceptional instances may (and do) return in test material, the chances of noise to return is relativdy minute. Our results generalise to data sets with approximately the characteristics of the Gs dataset. Although there are indications that data sets representing other language tasks indeed share some essential characteristics (e.g., memory-based learning is consistently the best-performlng algorithm), more investigation is needed to make these characteristics explicit.</Paragraph>
  </Section>
class="xml-element"></Paper>