<?xml version="1.0" standalone="yes"?> <Paper uid="W98-1224"> <Title>Do Not Forget: Full Memory in Memory-Based Learning of Word Pronunciation *</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Memory-based learning of classification tasks is a branch of supervised machine learning in which the learning phase consists simply of storing all encountered instances from a training set in memory (Aha, 1997). Memory-based learning algorithms do not invest effort during learning in abstracting from the tr-lnlng data, such as eager-learning (e.g., decision-tree algorithms, rule-induction, or connectionist-learning algorithms, (Qululan, 1993; Mitchell, 1997)) do. Rather, they defer investing effort until new instances axe presented. On being presented with an instance, a memory-based *This research was done in the context of the &quot;Induction of Linguistic Knowledge&quot; research programme, partially supported by the Foundation for Language Speech and Logic (TSL), which is funded by the Netherlands Organization for Scientific Research (NWO). Part of the first author's work was performed at the Department of Computer Science of the Unlversiteit Maastricht.</Paragraph> <Paragraph position="1"> learning algorithm searches for a best-matching instance, or, more generically, a set of the k best-matching instances in memory. Having found such a set of h best-matching instances, the algorithm takes the (majority) class with which the instances in the set axe labeled to be the class of the new instance. Pure memory-based learning algorithms implement the classic k-nearest neighbour algorithm (Cover and Hart, 1967; Devijver and Kittler, 1982; Aha, Kibler, and Albert, 1991); in different contexts, memory-based learning algorithms have also been named lazy, instance-based, exemplarbased, memory-based, case-based learning or reasoning (Stanfdl and Waltz, 1986; Kolodner, 1993; Aha, Kibler, and Albert, 1991; Aha, 1997)) Memory-based learning has been demonstrated to yield accurate models of various natural language tasks such as grapheme-phoneme conversion, word stress assignment, part-of-speech tagging, and PP-attachment (Daelemans, Van den Bosch, and Weijters, 1997a). For example, the memory-based learning algorithm ml-IG (Daelemans and Van den Bosch, 1992; Daclemans, Van den Bosch, and We~jters, 1997b), which extends the well-known ml algorithm (Aha, Kibler, and Albert, 1991) with an information-gain weighted similaxity mettic, has been demonstrated to perform adequately and, moreover, consistently and significantly better than eager-lea~'ning algorithms which do invest effort in abstraction during learning (e.g., decision-tree learning (Daelemans, Van den Bosch, and Weijters, 1997b; Quinlan, 1993), and connectionist learning (Rumelhart, Hinton, and Williams, 1986)) when trained and tested on a range of morpho-phonological tasks (e.g., morphological segmentation, grapheme-phoneme conversion, syllabitlcation, and word stress assignment) (Daelemans, Gillis, and Durieux, 1994; Van den Bosch, Daelemans, and We~jters, 1996; Van den Bosch, 1997). 
Thus, when learning NLP tasks, the abstraction occurring in decision trees (i.e., the explicit forgetting of information considered to be redundant) and in connectionist networks (i.e., a non-symbolic encoding and decoding in relatively small numbers of connection weights) both hamper accurate generalisation of the learned knowledge to new material.</Paragraph> <Paragraph position="2"> These findings appear to contrast with the general assumption behind eager learning, that data representing real-world classification tasks tends to contain (i) redundancy and (ii) exceptions: redundant data can be compressed, yielding smaller descriptions of the original data; some exceptions (e.g., low-frequency exceptions) can (or should) be discarded since they are expected to be bad predictors for classifying new (test) material. However, both redundancy and exceptionality cannot be computed trivially; heuristic functions are generally used to estimate them (e.g., functions from information theory (Quinlan, 1993)). The lower generalisation accuracies of both decision-tree and connectionist learning, compared to memory-based learning, on the above-mentioned NLP tasks, suggest that these heuristic estimates may not be the best choice for learning NLP tasks. It appears that in order to learn such tasks successfully, a learning algorithm should not forget (i.e., explicitly remove from memory) any information contained in the learning material: it should not abstract from the individual instances.</Paragraph> <Paragraph position="3"> An obvious type of abstraction that is not harmful for generalisation accuracy (but that is not always acknowledged in implementations of memory-based learning) is the straightforward abstraction from tokens to types with frequency information.</Paragraph> <Paragraph position="4"> In general, data sets representing natural language tasks, when large enough, tend to contain considerable numbers of duplicate sequences mapping to the same output or class. For example, in data representing word pronunciations, some sequences of letters, such as ing at the end of English words, occur hundreds of times, while each of the sequences is pronounced identically, viz. /ɪŋ/. Instead of storing all individual sequence tokens in memory, each set of identical tokens can be safely stored in memory as a single sequence type with frequency information, without loss of generalisation accuracy (Daelemans and Van den Bosch, 1992; Daelemans, Van den Bosch, and Weijters, 1997b). Thus, forgetting instance tokens and replacing them by instance types may lead to considerable computational optimisations of memory-based learning, since the memory that needs to be searched may become considerably smaller. Given the safe, performance-preserving optimisation of replacing sets of instance tokens by instance types with frequency information, a next step of investigation into optimising memory-based learning is to measure the effects of forgetting instance types on grounds of their exceptionality, the underlying idea being that the more exceptional a task instance type is, the more likely it is that it is a bad predictor for new instances.
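As a rough illustration of this token-to-type abstraction, the sketch below collapses duplicate (features, class) tokens into types carrying frequency counts; in a neighbour vote each type can then contribute its frequency rather than a single count, so behaviour is preserved while the searchable memory shrinks. The data layout and labels are hypothetical, not taken from the paper.

```python
# Sketch (assumed data layout, not the authors' implementation) of collapsing
# duplicate instance tokens into instance types with frequency information.
from collections import Counter

def compress(tokens):
    """Map each (features, class) token to an instance type with a frequency count."""
    return Counter(tokens)  # keys are instance types, values are token frequencies

# Hypothetical word-pronunciation windows: many "ing" tokens map to the same class.
tokens = [(("i", "n", "g"), "IN"), (("i", "n", "g"), "IN"),
          (("i", "n", "g"), "IN"), (("a", "n", "t"), "@nt")]
types = compress(tokens)
print(len(tokens), "tokens ->", len(types), "types")  # 4 tokens -> 2 types
print(types[(("i", "n", "g"), "IN")])                 # frequency 3
```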
Thus, exceptionality should in some way express the unsuitability of a task instance type to be a best match (nearest neighbour) to new instances: it would be unwise to copy its associated classification to best-matching new instances. In this paper, we investigate three criteria for estimating an instance type's exceptionality, and remove the instance types estimated to be the most exceptional by each of these criteria. The criteria investigated are 1. typicality of instance types; 2. class prediction strength of instance types; 3. friendly-neighbourhood size of instance types; and, as a baseline, 4. random selection of instance types.</Paragraph> <Paragraph position="5"> We base our experiments on a large data set of English word pronunciation. We briefly describe this data set, and the way it is converted into an instance base fit for memory-based learning, in Section 2. In Section 3 we describe the settings of our experiments and the memory-based learning algorithm IB1-IG with which the experiments are performed. We then turn to describing the notions of typicality, class-prediction strength, and friendly-neighbourhood size, and the functions to estimate them, in Section 4. Section 5 provides the experimental results. In Section 6, we discuss the obtained results and formulate our conclusions.</Paragraph> </Section> </Paper>