<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0605">
  <Title>An Architecture for Word Learning using Bidirectional Multimodal Structural Alignment</Title>
  <Section position="3" start_page="1" end_page="2" type="intro">
    <SectionTitle>
2 General Architecture
</SectionTitle>
    <Paragraph position="0"> We propose an architecture to answer the following question: assuming that a new word is embedded in a phrase with other previously acquired words, how can we exploit this linguistic context to focus on the fragments of non-linguistic input most likely to correspond with the new word? The semantic principle of compositionality states that the meaning of any expression (such as a phrase) is a function of the meanings of its sub-expressions, where the particular function is determined by the method of composition. For example, the expression &quot;The quick fox jumped over the log&quot; can be considered a composition of the sub-expressions &quot;The quick fox&quot; and &quot;over the log&quot;, where syntactic composition with &quot;jumped&quot; is the method of composition. In other words, the semantics of this sentence can be expressed: jumped(SEMANTICS(&quot;The quick fox&quot;), SEMANTICS(&quot;over the log&quot;)). Recursive application of this principle reveals that the semantic value of an expression is a structured representation.</Paragraph>
    <Paragraph position="2"> [Figure 1 caption, partially recovered: structural alignment between two structures infers the correspondences C-1 and 2-B(D). Structural alignment between semantic representations will bring unknown words into correspondence with their probable semantics.]</Paragraph>
    <Paragraph position="5"> Intuitively, then, we can approach our bootstrapping problem by structural alignment (Gentner and Markman, 1997). Structural alignment is a process in which corresponding elements in two structured representations are identified by matching. Correspondence between non-matching elements is then implied by the structural constraints of the representations.</Paragraph>
    <Paragraph position="6"> For example, in Figure 1, structural alignment first matches A, E, and F between the two representations.</Paragraph>
    <Paragraph position="7"> Then, based on structural constraints, 1 is inferred to correspond with C, and 2 with B(D). In our architecture, structural alignment of the semantics of known words (and linguistic constituents formed thereof) with semantic structures observed in the non-linguistic domain will cause an alignment of unknown words with probable corresponding semantic fragments, thereby achieving our word learning goal of exploiting linguistic context to focus on fragments of the semantic input.</Paragraph>
    <Paragraph position="8"> The remainder of this section describes the representations and methods required by a system seeking to implement this general architecture.</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
2.1 Semantic Representation
</SectionTitle>
      <Paragraph position="0"> In order to perform structural alignment, the representation for the semantic domain must have several key properties:</Paragraph>
      <Paragraph position="1"> * The representation must be structural and symbolic, in order to allow alignment of symbols.</Paragraph>
      <Paragraph position="2"> * For inferences made from structural alignment to be valid, the representation must obey the principle of compositionality.</Paragraph>
      <Paragraph position="3"> * The representation should contain orthogonal elements (i.e., the same piece of semantics is not encoded in multiple symbols), so that there are canonical ways of expressing particular meanings.</Paragraph>
      <Paragraph position="4"> * Finally, the semantic representation must be lexicalized, meaning that the semantics of any linguistic phrase can be cleanly divided amongst the phrase's constituent words. Each word should receive a single connected semantic structure that does not share semantic symbols with any other word.</Paragraph>
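A lexicalized representation of this kind might look as follows. The word-to-fragment mapping and the atom encoding are invented for illustration; the sketch only shows how phrase semantics divides cleanly among disjoint per-word fragments.

```python
# Sketch of a lexicalized semantic representation: each word owns a single
# connected fragment, fragments do not share semantic symbols, and phrase
# semantics is the union of the fragments of its words.

word_semantics = {
    "fox":    {("instance", "x1", "FOX")},
    "jumped": {("event", "e1", "JUMP"), ("agent", "e1", "x1")},
    "log":    {("instance", "x2", "LOG")},
}

def phrase_semantics(words):
    """Compose phrase semantics as the union of per-word fragments."""
    out = set()
    for w in words:
        out |= word_semantics[w]
    return out

sem = phrase_semantics(["fox", "jumped", "log"])
# sem contains all four atoms; no atom is contributed by two words.
```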
    </Section>
    <Section position="2" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
2.2 Semantic Processing
</SectionTitle>
      <Paragraph position="0"> It is likely that the actual non-linguistic input modality will not be an appropriate structured symbolic representation. For example, the visual, aural, and kinesthetic modalities are non-symbolic. In any system dealing with such an input modality, it will be necessary to have modules that extract structured symbolic representations from the unstructured input.</Paragraph>
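Such an extraction module might, in its simplest form, discretize graded perceptual evidence into symbolic atoms. The detection format and threshold below are invented for illustration, not the paper's module.

```python
# Toy sketch of extracting a symbolic structure from unstructured input:
# numeric "visual" detections are thresholded into discrete symbols,
# yielding a structured representation that alignment can operate on.

def extract_symbols(detections, threshold=0.5):
    """Keep confident detections and emit (predicate, argument) atoms."""
    return {(label, arg) for label, arg, score in detections if score >= threshold}

raw = [("FOX", "obj1", 0.9), ("LOG", "obj2", 0.8), ("CAT", "obj3", 0.2)]
atoms = extract_symbols(raw)
# atoms == {("FOX", "obj1"), ("LOG", "obj2")}
```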
    </Section>
    <Section position="3" start_page="1" end_page="2" type="sub_section">
      <SectionTitle>
2.3 Linguistic Processing
</SectionTitle>
      <Paragraph position="0"> One challenge in performing structural alignment against language input is that the structured semantic representation of the linguistic input is implicit rather than explicit. Therefore, we need methods for parsing and an appropriate grammar. The grammar and parsing algorithms we choose must support several non-standard features.</Paragraph>
      <Paragraph position="1"> First, we expect to encounter word meanings which are unknown, so our selected techniques must support gaps in the parse. We also require a reversible grammar, so that, when presented with the meaning of an entire expression and the meaning of some of its subexpressions, we can infer the meaning of the remaining subexpressions.</Paragraph>
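The reversibility requirement can be made concrete under a simplifying assumption (not the paper's grammar): if phrase meaning were simply the union of word meanings, then the meaning of one unknown word falls out by subtracting the known words' contributions from the whole.

```python
# Minimal sketch of a reversible composition: given the meaning of the whole
# expression and the meanings of some sub-expressions, infer the remainder.

def infer_unknown(whole_meaning, known_meanings):
    """Invert set-union composition with respect to the missing part."""
    residue = set(whole_meaning)
    for m in known_meanings:
        residue -= m
    return residue

whole = {("event", "JUMP"), ("agent", "FOX"), ("path", "OVER-LOG")}
known = [{("agent", "FOX")}, {("path", "OVER-LOG")}]
gap = infer_unknown(whole, known)
# gap == {("event", "JUMP")} -- the candidate meaning for the unknown word
```

A real grammar composes by more than union, so its inverse is correspondingly harder, but the information flow is the same.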
      <Paragraph position="2"> Although it may not be strictly required, parsing techniques that use partial structural alignment are preferred. Words and phrases have many possible interpretations, and this problem is exacerbated by unknown words in the linguistic input. Since targets for the parse are available in the semantic input domain, it is advantageous to use these targets to guide the search through the space of possible linguistic interpretations. Increasing structural alignment between the parsed semantics and the input semantics could serve as such a guiding heuristic. As a side effect of using structural alignment as a parsing heuristic, we should expect the parser to manipulate partial semantic and syntactic structures throughout the parsing process, rather than generating semantics only after a complete syntactic parse tree has been built.</Paragraph>
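The guiding heuristic can be sketched as a scoring function over candidate partial interpretations. The candidate readings and the overlap-count score below are assumptions made for illustration.

```python
# Sketch of structural alignment as a parse-guiding heuristic: candidate
# partial interpretations are scored by their overlap with the observed
# non-linguistic semantics, and search expands the best-aligned one first.

def alignment_score(candidate_atoms, target_atoms):
    """Heuristic: number of semantic atoms shared with the target."""
    return len(candidate_atoms & target_atoms)

target = {("event", "JUMP"), ("agent", "FOX")}
candidates = {
    "jump-reading":  {("event", "JUMP"), ("agent", "FOX")},
    "sleep-reading": {("event", "SLEEP"), ("agent", "FOX")},
}
best = max(candidates, key=lambda k: alignment_score(candidates[k], target))
# best == "jump-reading"
```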
      <Paragraph position="3"> Mathematically, this is equivalent to saying that there will be cases where we know a, f, and x in the equality a = f(x, y), and we want to be able to infer the value of y. In order to do so, we must be able to compute the functional inverse of f with respect to y; that is, we want the function f^-1.</Paragraph>
      <Paragraph position="4"> Our learning-enabling structure is based on a semantic representation (Section 4) which is obtained by translating video inputs (Section 6). We then use a bidirectional search process to parse the linguistic input and to structurally align the linguistic semantics with the non-linguistic semantics (Section 5).</Paragraph>
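A worked instance of the inverse relation, with addition standing in for semantic composition (the choice of f is purely illustrative):

```python
# When a = f(x, y) and f is addition, the inverse of f with respect to
# its second argument recovers y from a and x.

def f(x, y):
    return x + y

def f_inverse_y(a, x):
    """Inverse of f with respect to its second argument y."""
    return a - x

a = f(3, 4)          # a == 7: we observe the whole
y = f_inverse_y(a, 3)  # knowing a and x, recover y
# y == 4
```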
    </Section>
    <Section position="4" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
2.4 Structural Alignment
</SectionTitle>
      <Paragraph position="0"> Gentner and Markman (1997) describe the requisite components of a structural alignment system as (1) methods for matching structural atoms, (2) methods for identifying sets of compatible atom matches (for example, ruling out cases in which two atoms in one structure map to the same atom in another structure), and (3) methods using atom matches to guide the matching of large portions of structure.</Paragraph>
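These three components can be sketched compactly. The atom encoding, the greedy compatibility filter, and the parent maps below are illustrative assumptions, not Gentner and Markman's actual procedure.

```python
# Compact sketch of the three components: (1) propose atom matches by
# label, (2) filter to a compatible (one-to-one) set so no atom maps
# twice, and (3) use the anchors to pair the enclosing structures.
from itertools import product

def propose_matches(atoms1, atoms2):
    """(1) Match atoms with identical labels."""
    return [(a, b) for a, b in product(atoms1, atoms2) if a[0] == b[0]]

def compatible(matches):
    """(2) Greedily keep matches so that no atom participates twice."""
    used1, used2, keep = set(), set(), []
    for a, b in matches:
        if a not in used1 and b not in used2:
            keep.append((a, b))
            used1.add(a)
            used2.add(b)
    return keep

def extend(matches, parent1, parent2):
    """(3) Infer parent correspondences from matched child atoms."""
    return {(parent1[a], parent2[b]) for a, b in matches
            if a in parent1 and b in parent2}

# Atoms are (label, id); parent maps give each atom's enclosing structure.
atoms1 = [("E", 1), ("F", 2)]
atoms2 = [("E", 10), ("F", 11)]
p1 = {("E", 1): "2", ("F", 2): "2"}
p2 = {("E", 10): "B(D)", ("F", 11): "B(D)"}

m = compatible(propose_matches(atoms1, atoms2))
parents = extend(m, p1, p2)
# parents == {("2", "B(D)")} -- the Figure 1 style inference
```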
    </Section>
  </Section>
</Paper>