<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0607"> <Title>EBLA: A Perceptually Grounded Model of Language Acquisition</Title> <Section position="6" start_page="0" end_page="75" type="evalu"> <SectionTitle> 4 Evaluation </SectionTitle> <Paragraph position="0"> EBLA was evaluated using three criteria. First, overall success was measured by comparing the number of correct entity-lexeme mappings with the total number of entities detected. Second, acquisition speed was measured by comparing the average number of experiences needed to resolve a word with the total number of experiences processed. Third, descriptive accuracy was measured by presenting EBLA with new, unlabeled experiences and determining its ability to generate protolanguage descriptions based on prior experiences. The test sets for EBLA consisted of eight simple animations created using Macromedia Flash and 319 short digital videos. While the results for the animations were somewhat better than those for the videos, only the results for the larger and more complex video test set are presented here.</Paragraph> <Paragraph position="1"> Of the 319 videos, 226 were delivered to EBLA for evaluating lexical acquisition accuracy and speed, and 167 were delivered to EBLA for evaluating descriptive accuracy. Videos were removed from the full set of 319 because of problems with over- and undersegmentation in the vision processing system. Figure 4 demonstrates the types of problems encountered by EBLA's vision system. It shows the polygon tracings for three frames from a single video shot with the Garfield toy. 
The frame on the left was correctly segmented. The frame in the middle was undersegmented: the hand was merged into the background and essentially disappeared. The frame on the right was oversegmented: the Garfield toy was split into two objects.</Paragraph> <Section position="1" start_page="0" end_page="75" type="sub_section"> <SectionTitle> Videos </SectionTitle> <Paragraph position="0"> To measure acquisition speed and accuracy, the 226 videos were delivered to EBLA at random, ten times for each of nineteen different minimum standard deviation (s_min) values. The value of s_min used to match the attribute values to existing entities was varied from 5% to 95% in increments of 5%.</Paragraph> <Paragraph position="1"> Figure 5 shows the success rates for lexeme mappings for each of the nineteen s_min values. For s_min values of 5% and 10%, the acquisition success rate was only 76% and 85%, respectively. This can be attributed to the amount of variation in the entities for the videos: a stricter matching criterion results in more unmatched entities. For all of the other s_min values, the acquisition success rate was better than 90%, reaching 95.8% for an s_min value of 45%.</Paragraph> <Paragraph position="2"> For the lower values of s_min, there were very few incorrect descriptions, but many entities did not map to a known lexeme. As s_min was increased, the situation reversed, with almost every entity mapping to some lexeme, but many to the wrong lexeme. The most accurate descriptions were produced for an s_min value of 15%, where just over 65% of the entities were described correctly. These are reasonably good results considering how much any given entity varied from video to video, especially the object-object relation entities. For a full discussion of both the animation and video results for EBLA, see chapter 6 of Pangburn (2002). Figure 6 displays the average acquisition speed for the videos. 
It indicates that, for the first few videos, an average of over twenty experiences was needed to resolve all of the entity-lexeme mappings. After about seventy-five experiences had been processed, this average dropped to about five experiences, and after about 150 experiences, the average fell below one. To evaluate the descriptive accuracy of EBLA, 157 of the 167 best videos were randomly processed in acquisition mode and the remaining ten were processed in description mode. This scenario was run ten times for each of the same nineteen s_min values used to evaluate acquisition success. The results are shown in Table 2. It is important to note that, for a given s_min value, EBLA often returned multiple &quot;matching&quot; lexemes. When this happened, both the correct and incorrect lexemes were scored pro-rata.</Paragraph> </Section> </Section> </Paper>