<?xml version="1.0" standalone="yes"?> <Paper uid="W01-0702"> <Title>Combining a self-organising map with memory-based learning</Title> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 A hybrid SOM/MBL classifier </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Labelled SOM and MBL (LSOMMBL) </SectionTitle> <Paragraph position="0"> A modified SOM, called the Labelled SOM, was developed. Training proceeds as follows: - Each training item has an associated label. Initially all the map units are unlabelled.</Paragraph> <Paragraph position="1"> - When an item is presented, the closest unit among those with the same label as the input and those that are unlabelled is chosen as the winner. If an unlabelled unit is chosen, it is labelled with the input's label.</Paragraph> <Paragraph position="2"> - The weights of neighbouring units are updated as in the standard SOM, provided they share the input's label or are unlabelled.</Paragraph> <Paragraph position="3"> - When training ends, all the training inputs are presented to the SOM and the winner for each training input is noted.
Unused units are discarded.</Paragraph> <Paragraph position="4"> Testing proceeds as follows: - When an input is presented, a winning unit is found for each category.</Paragraph> <Paragraph position="5"> - The closest match to the input is selected from the training items associated with each of the winning units.</Paragraph> <Paragraph position="6"> - The most frequent classification for that match is chosen as the output.</Paragraph> <Paragraph position="7"> The intention is that this finds the closest match within each category, and that the overall closest match is among them.</Paragraph> <Paragraph position="8"> Assuming each unit is equally likely to be chosen, the average number of comparisons here is C(M + N/M), where C is the number of categories, M is the number of units in the map and N is the number of training items. Choosing M = √N minimises this quantity.</Paragraph> <Paragraph position="10"> In the experiments the size of the map was therefore chosen to be close to √N. This system is referred to as LSOMMBL.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 SOM and MBL (SOMMBL) </SectionTitle> <Paragraph position="0"> In the experiments, a comparison with a standard SOM used in a similar manner was performed. Here the SOM is trained as normal on the training items.</Paragraph> <Paragraph position="1"> At the end of training, each item is presented to the SOM and the winning unit noted, as with the modified SOM above. Unused units are discarded as above.</Paragraph> <Paragraph position="2"> During testing, a novel item is presented to the SOM and the top C winners are chosen (i.e. the C closest map units), where C is the number of categories. The items associated with these winners are then compared with the novel item, the closest match is found, and the most frequent classification of that match is taken as before.
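The two testing procedures described above differ only in how the winning units are found; the shared memory-based step can be sketched as follows. This is an illustrative reconstruction in Python of the LSOMMBL-style lookup (one winner per category label, then a nearest-neighbour search over the items mapped to those winners), not the authors' implementation; all function and variable names are assumptions.

```python
# Sketch of the hybrid SOM/MBL lookup: per-category winners, then a
# memory-based nearest-neighbour step over the items mapped to them.
from collections import Counter
from math import dist

def classify(x, units, unit_labels, items_by_unit, class_counts):
    """Classify input vector x.

    units         : list of unit weight vectors
    unit_labels   : category label of each unit (parallel to units)
    items_by_unit : unit index -> training vectors mapped to that unit
    class_counts  : training vector (as a tuple) -> Counter of its
                    observed classifications ("classification frequencies")
    """
    # 1. Find the winning (closest) unit for each category label.
    winners = {}
    for i, (u, lab) in enumerate(zip(units, unit_labels)):
        d = dist(x, u)
        if lab not in winners or d < winners[lab][0]:
            winners[lab] = (d, i)
    # 2. Among the training items mapped to those winners, find the
    #    closest match to the input.
    candidates = [it for _, i in winners.values() for it in items_by_unit[i]]
    best = min(candidates, key=lambda it: dist(x, it))
    # 3. Return the most frequent classification recorded for that match.
    return class_counts[tuple(best)].most_common(1)[0][0]
```

The SOMMBL variant would replace step 1 with selecting the C closest units regardless of label; steps 2 and 3 are unchanged.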
This system is referred to as SOMMBL.</Paragraph> <Paragraph position="3"> 5 The task: Base NP chunking The task is base NP chunking on section 20 of the Wall Street Journal corpus, using sections 15 to 18 of the corpus as training data, as in (Ramshaw and Marcus, 1995). For each word in a sentence, the POS tag is presented to the system, which outputs whether the word is inside or outside a base NP, or on the boundary between two base NPs.</Paragraph> <Paragraph position="4"> Training items consist of the part-of-speech (POS) tag for the current word, varying amounts of left and right context (POS tags only), and the classification frequencies for that combination of tags. The tags were represented by a set of vectors, and two sets of vectors were used for comparison. One was an orthogonal set, with a vector of all zeroes for the &quot;empty&quot; tag used where the context extends beyond the beginning or end of a sentence. The other was a set of 25-dimensional vectors based on a representation of the words encoding the contexts in which each word appears in the WSJ corpus. The tag representations were obtained by averaging the representations of the words appearing with each tag. Details of the method used to generate the tag representations, known as lexical space, can be found in (Zavrel and Veenstra, 1996). Reilly (1998) found this representation beneficial when training a simple recurrent network on word prediction.</Paragraph> <Paragraph position="5"> The self-organising maps were trained as follows: - For maps with 100 or more units, training lasted 250 iterations; the neighbourhood started at a radius of 4 units, reducing by 1 unit every 50 iterations down to 0 (i.e. the point where only the winner's weights are modified).
The learning rate was constant at 0.1.</Paragraph> <Paragraph position="6"> - For the maps with 6 units, training lasted 90 iterations, with an initial neighbourhood radius of 2, reducing by one every 30 iterations.</Paragraph> <Paragraph position="7"> - For the maps with 30 units, training lasted 150 iterations, with an initial neighbourhood radius of 2, reducing by one every 50 iterations. A single training run is reported for each network, since the results did not vary significantly across runs.</Paragraph> <Paragraph position="8"> These map sizes were chosen to be close to the square root of the number of items in the training set. No attempt was made to investigate systematically whether these sizes would optimise the performance of the system; they were chosen purely to minimise the number of comparisons performed.</Paragraph> </Section> </Section> </Paper>