<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0607">
  <Title>EBLA: A Perceptually Grounded Model of Language Acquisition</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 EBLA Model
</SectionTitle>
    <Paragraph position="0"> The EBLA Model (Pangburn 2002) operates by observing a series of &amp;quot;experiences&amp;quot; in the form of short movies. Each movie contains a single event such as an arm/hand picking up a ball, and takes the form of either an animation or an actual video. The model detects any significant objects in each movie and determines what, if any, relationships exist among those objects. This information is then stored so that repeatedly occurring  As part of each experience, EBLA receives a textual description of the event taking place. These descriptions are comprised of protolanguage such as &amp;quot;hand pickup ball.&amp;quot; To acquire this protolanguage, EBLA must correlate the lexical items in the descriptions to the objects and relations in each movie. Figure 1 provides a graphical representation of the method used by EBLA to process experiences.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Model Abstractions and Constraints
</SectionTitle>
      <Paragraph position="0"> The EBLA Model has been constrained in several ways.</Paragraph>
      <Paragraph position="1"> First, the model's perceptual capabilities are limited to a two-dimensional vision system that reduces objects to single color polygons.</Paragraph>
      <Paragraph position="2"> Second, the model has not been provided with any audio processing capabilities. Because of this, all experience descriptions presented to or generated by EBLA are textual.</Paragraph>
      <Paragraph position="3"> Third, the model only attempts to acquire a protolanguage of nouns and verbs. Thus, syntax, word order, punctuation, etc. do not apply. This conforms with early human language acquisition since children do not begin to use phrases and clauses until somewhere between eighteen and thirty-six months of age (Calvin and Bickerton 2001).</Paragraph>
      <Paragraph position="4"> The final constraint on EBLA is that it only operates in an unsupervised mode. This means that the model does not receive any sort of feedback regarding its accuracy. This is definitely a worst-case scenario since children receive frequent social mediation in all aspects of development.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Experiences Processed by the EBLA Model
</SectionTitle>
      <Paragraph position="0"> The experiences processed by the EBLA Model are based on simple spatial-motion events, and take the form of either animations or real videos. Each experience contains an arm/hand performing some simple action on a variety of objects. For the animations, the actions include pickup, putdown, touch, and slide, and the objects include a green ball and a red cube (see figure 2). For the real videos, the actions include push, pull, slide, touch, tipover, roll, pickup, putdown, drop, and tilt, and the objects include several colored bowls, rings, and cups, a green ball, a dark blue box, a blue glass vase, a red book, and an orange stuffed Garfield cat (see figure 3).</Paragraph>
      <Paragraph position="1"> hand pickup ball hand touch ball hand putdown cube Figure 2. Frames from Various Animations Processed by EBLA All of the videos were shot two to three times from both the left and right side of a makeshift stage. Angle of approach, grasp, and speed were varied at random. Multiple actions were performed on each object, but the actual object-event combinations varied somewhat based on what was feasible for each object. Dropping the glass vase, for example, seemed a bit risky.</Paragraph>
      <Paragraph position="2"> hand push vase hand roll ring hand touch garfield hand tipover cup hand pickup ball hand pull book</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Entity Recognition
</SectionTitle>
      <Paragraph position="0"> The EBLA Model has a basic perceptual system, which allows it to &amp;quot;see&amp;quot; the significant objects in each of its experiences. It identifies and places polygons around the objects in each video frame, using a variation of the mean shift analysis image segmentation algorithm (Comaniciu 2002). EBLA then calculates a set of static attribute values for each object and a set of dynamic attribute values for each object-object relation. The sets of attribute-value pairings are very similar to the linking feature structures (f-structs) used by Bailey (1997).</Paragraph>
      <Paragraph position="1"> Each unique set of average attribute values defines an entity, and is compared to the entities from prior experiences. In order to match existing entities with those in the current experience, the existing entity must have average values for all attributes within a single standard deviation (s) of the averages for the current entity.</Paragraph>
      <Paragraph position="2"> When this occurs, the current entity is merged with the existing entity, creating a more prototypical entity definition. Otherwise, a new entity definition is established.</Paragraph>
      <Paragraph position="3"> To prevent entity definitions from becoming too narrowly defined, a minimum standard deviation (s min ) is established as a percentage of each average attribute value. In essence, s min defines how much two entities must differ to be considered distinct, and thus can have a significant impact on the number of unique entities recognized by EBLA.</Paragraph>
      <Paragraph position="4"> Both the object and relation attributes for EBLA were determined experimentally based on data available from the computer vision system. To aid in the debugging and evaluation of EBLA as well as to restrict any assumptions about early perception in children, an effort was made to keep the attributes as simple as possible. The five object attributes and seven relation attributes  object horizontal coordinate of object's center of gravity relative to the width of a bounding rectangle around the object</Paragraph>
      <Paragraph position="6"> object vertical coordinate of object's center of gravity relative to the height of a bounding rectangle around the object contact relation Boolean value indicating if two objects are in contact with one another x-relation relation indicates whether one object is to the left of, on top of, or to the right of another object y-relation relation indicates whether one object is above, on top of, or below another object delta-x relation indicates whether the horizontal distance between two objects is increasing, decreasing, or unchanged delta-y relation indicates whether the vertical distance between two objects is increasing, decreasing, or unchanged x-travel relation indicates direction of horizontal travel for both objects y-travel relation indicates direction of vertical travel for  Because average attribute values are used to define entities, temporal ordering is not explicitly stored in EBLA. Rather, the selected relation attributes implicitly indicate how objects interact over time. For example, EBLA is able to distinguish between pickup and putdown entities using the average &amp;quot;delta-y&amp;quot; attribute value--for pickup, the vertical distance between the two objects involved is decreasing over the experience and for putdown, the vertical distance is increasing. Currently, object entities are defined using all of the object attributes, and relation entities are defined using all of the relation attributes. There is no mechanism to drop attributes that may not be relevant to a particular entity. For example, grayscale color value may not have anything to do with whether or not an object is a ball, but EBLA would likely create separate entities for a light-colored ball and a dark-colored ball.</Paragraph>
      <Paragraph position="7"> A variation of the model-merging algorithm employed by Bailey (1997) could be applied to drop attributes unrelated to the essence of a particular entity. Because EBLA currently uses a limited number of attributes, dropping any would likely lead to overgeneralization of entities, but with more attributes, it could be a very useful mechanism. Such a mechanism would also improve EBLA's viewpoint invariance. For example, when detecting a putdown object-object relation, EBLA is not affected by small to moderate changes in angle, distance, or objects involved, but is affected by the horizontal orientation. Dropping the &amp;quot;x-relation&amp;quot; and &amp;quot;x-travel&amp;quot; attributes from the putdown entity would remedy this.</Paragraph>
      <Paragraph position="8"> Work is underway to determine how to incorporate a 3D graphics engine into EBLA in order to build a more robust perceptual system. While this would obviously limit the realism, it would allow for the quick addition of attributes for size, volume, distance, texture, speed, acceleration, etc. Another option is to develop new attribute calculators for the current vision system such as those employed by Siskind (2000) to determine force dynamic properties.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 Lexical Acquisition
</SectionTitle>
      <Paragraph position="0"> Once EBLA has generated entities for the objects and object-object relations in each experience, its final task is to map those entities to the lexemes (words) in protolanguage descriptions of each experience. Protolanguage was chosen because it is the first type of language acquired by children. The particular variety of protolanguage used for the EBLA's experience descriptions has the following characteristics:  1. Word order is not important, although the descriptions provided to EBLA are generally in the format: subject-manipulation-object (e.g. &amp;quot;hand touch ball&amp;quot;).</Paragraph>
      <Paragraph position="1"> 2. Verbs paired with particles are combined into a single word (e.g. &amp;quot;pick up&amp;quot; becomes &amp;quot;pickup&amp;quot;). 3. Words are not case-sensitive (although there is an option in EBLA to change this).</Paragraph>
      <Paragraph position="2"> 4. Articles (e.g. &amp;quot;a,&amp;quot; &amp;quot;an,&amp;quot; &amp;quot;the&amp;quot;) can be added to de- null scriptions, but are generally uninterpretable by EBLA.</Paragraph>
      <Paragraph position="3"> It should be noted that EBLA is not explicitly coded to ignore articles, but since they are referentially ambiguous when considered as individual, unordered lexemes, EBLA is unable to map them to entities. Adding articles to the protolanguage descriptions generally slows down EBLA's average acquisition speed.</Paragraph>
      <Paragraph position="4"> In order to map the individual lexemes in the protolanguage descriptions to the entities in each experience, EBLA must overcome referential ambiguity. This is because EBLA operates in a bottom-up fashion and is not primed with any information about specific entities or lexemes. If the first experience encountered by EBLA is a hand sliding a box with the description &amp;quot;hand slide box,&amp;quot; it has no idea whether the lexeme &amp;quot;hand&amp;quot; refers to the hand object entity, the box object entity, or the slide relation entity. This same referential ambiguity exists for the &amp;quot;slide&amp;quot; and &amp;quot;box&amp;quot; lexemes. EBLA can only overcome this ambiguity by comparing and contrasting the current experience with future experiences. This process of resolving entity-lexeme mappings is a variation of the cross-situational learning employed by Siskind (1992; 1997).</Paragraph>
      <Paragraph position="5"> For each experience, two lists are created to hold all of the unresolved entities and lexemes. EBLA attempts to establish the correct mappings for these lists in three stages:  1. Lookup any known resolutions from prior experiences. null 2. Resolve any single remaining entity-lexeme pairings. null 3. Apply cross-situational learning, comparing unre null solved entities and lexemes across all prior experiences, repeating stage two after each new resolution.</Paragraph>
      <Paragraph position="6"> To perform the first stage of lexical resolution, EBLA reviews known entity-lexeme mappings from prior experiences. If any match both an entity and lexeme in the current experience, those pairings are removed from the unresolved entity and lexeme lists. The second stage operates on a simple process of elimination principal. If at any point during the resolution process both the unresolved entity and lexeme lists contain only a single entry, it is assumed that those entries map to one another. In addition, prior experiences are searched for the same entity-lexeme pairing and resolved if found. Since resolving mappings in prior experiences can generate additional instances of single unmapped pairings, the entire second stage is repeated until no new resolutions are made.</Paragraph>
      <Paragraph position="7"> The third and final stage of resolution is by far the most complex and involves a type of cross-situational inference. Basically, by comparing the unresolved entities and lexemes across all experiences in a pair wise fashion, EBLA can infer new mappings. If the cardinality of the intersection or difference between the unmapped entities and lexemes for a pair of experiences is one, then that intersection or difference defines a mapping. In more formal terms:  1. Let i and j be any two experiences, i [?] j.</Paragraph>
      <Paragraph position="8"> 2. Let E</Paragraph>
      <Paragraph position="10"> [?] unmapped entities for i and j respectively.</Paragraph>
      <Paragraph position="12"> [?] unmapped lexemes for i and j respectively. null</Paragraph>
      <Paragraph position="8"> To demonstrate how all three stages work together, consider the following example. If the model were exposed to an experience of a hand picking up a ball with the description &quot;hand pickup ball,&quot; followed by an experience of a hand picking up a box with the description &quot;hand pickup box,&quot; it could take the set differences discussed in stage three for the two experiences to resolve the &quot;ball&quot; lexeme to the ball entity and the &quot;box&quot; lexeme to the box entity. Assuming that these were the only two experiences presented to the model, it would not be able to resolve &quot;hand&quot; or &quot;pickup&quot; to the corresponding entities because of referential ambiguity. If the model were then exposed to a third experience of a hand putting down a ball with the description &quot;hand putdown ball,&quot; it could resolve all of the remaining mappings for all three experiences. Using the technique discussed in stage one, it could resolve &quot;ball&quot; based on known mappings from the prior experiences. It could then take the set intersection with the unmapped items in either of the first two experiences to resolve &quot;hand.&quot; This would leave a single unmapped pairing in each of the three experiences, which could be resolved using the process of elimination discussed in stage two. Note that taking the set difference rather than the intersection between the third and first or second experiences would have worked equally well to resolve &quot;hand pickup&quot; and &quot;hand putdown.&quot;</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>