<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-1050">
  <Title>Analysis of Scene Identification Ability of Associative Memory with Pictorial Dictionary</Title>
  <Section position="5" start_page="311" end_page="311" type="metho">
    <SectionTitle>
3 Representation and Processing Theory
</SectionTitle>
    <Paragraph position="0"> [Figure: WAVE encoding of an OPED scene — 11,711 words, 384 scenes; example weights: wall units 0.01, units 0.004, side 0.008, wall 0.01, bookshelf 0.725, row 0.7]</Paragraph>
    <Section position="1" start_page="311" end_page="311" type="sub_section">
      <SectionTitle>
3.1 Representation of OPED
</SectionTitle>
      <Paragraph position="0"> The Oxford Pictorial English Dictionary (OPED) has a very simple form of text and picture (Fig.3). In this example, the upper part is a picture of a living-room scene, and the lower part consists of the words for the corresponding parts, as follows:
1 wall units
2 side wall
3 bookshelf</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="311" end_page="312" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> OPED originally has a hierarchical structure of categorization (as in the left side of Fig.2), but we use its middle level (the shaded part in the figure), which is most easily interpretable.</Paragraph>
    <Paragraph position="1"> To provide the associative memory model for processing words and selecting scenes, we encode the OPED entries in the WAVE model as depicted in Fig.3. The weights between scene elements are automatically learned during the construction of the associative memory.</Paragraph>
    <Section position="1" start_page="311" end_page="312" type="sub_section">
      <SectionTitle>
3.2 Simplified Model of Associative Memory WAVE
</SectionTitle>
      <Paragraph position="0"> The aim of using associative memory for identification is to select the most likely scene based on incomplete word data from sentences. Ii and Ci are set to be elements of the input space SI and the scene space SC, respectively. In an ideal state, the appropriate scene Ci is uniquely indexed by association from a complete input vector: Ii → Ci.</Paragraph>
      <Paragraph position="1"> In the typical situation, however, the complete index is not provided, and we require a way of ranking competing scenes by defining a weighted activation value which depends on the partial input, or set of ambiguous words, as follows:</Paragraph>
      <Paragraph position="3"> where the weight of each component is given by the conditional probability value</Paragraph>
      <Paragraph position="5"> A maximum-likelihood scene is selected by a winner-take-all process:</Paragraph>
      <Paragraph position="7"> This type of associative memory has the following features: * Unlike correlative models (Amari S. and Maginu K., 1988), neither distortion of patterns nor spurious local-minimum solutions arise from memorizing other patterns.</Paragraph>
      <Paragraph position="8"> * Memory capacity is O(mn), compared to O(n^2) for the correlative model, where m is the average number of words per scene and n is the total number of possible words.</Paragraph>
      <Paragraph position="9"> * Unlike back-propagation learning algorithms, incremental learning is possible at any time in WAVE. A minimal code sketch of this recall scheme follows.</Paragraph>
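To make the scheme above concrete, here is a minimal Python sketch; it is not the authors' WAVE implementation, and it assumes the weights of Eq.(2) take the form P(scene | word), estimated here by uniform counting over the scenes containing each word. All names and data are illustrative.

```python
from collections import defaultdict

class SceneMemory:
    def __init__(self, scenes):
        # scenes: dict mapping scene name -> set of words appearing in it.
        # The inverted index gives O(mn) storage: m = average words per
        # scene, n = total number of words.
        self.word_to_scenes = defaultdict(set)
        for scene, words in scenes.items():
            for w in words:
                self.word_to_scenes[w].add(scene)

    def weight(self, word, scene):
        # Assumed form of Eq.(2): P(scene | word) by uniform counting
        # over the scenes that contain the word.
        holders = self.word_to_scenes.get(word, set())
        return 1.0 / len(holders) if scene in holders else 0.0

    def recall(self, words):
        # Weighted activation summed per scene, then winner-take-all.
        activation = defaultdict(float)
        for w in words:
            for scene in self.word_to_scenes.get(w, ()):
                activation[scene] += self.weight(w, scene)
        return max(activation, key=activation.get) if activation else None

memory = SceneMemory({
    "living room": {"wall units", "side wall", "bookshelf", "row"},
    "kitchen":     {"side wall", "sink", "stove"},
})
print(memory.recall({"bookshelf", "side wall"}))  # -> 'living room'
```

Because the weights are simple counts, new scenes can be indexed at any time, which is the incremental-learning property noted above.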
    </Section>
    <Section position="2" start_page="312" end_page="312" type="sub_section">
      <SectionTitle>
3.3 Recalling probability and estimation of required quantity of information
</SectionTitle>
      <Paragraph position="0"> The measure of scene selectivity reduces to the condition of whether the given words are unique to the scene. If all input words are common to plural scenes, they cannot determine the original scene uniquely. For example, the system cannot determine whether to choose category CA or CB only by seeing element 'b' in Fig.4. If 'a' or the set {a, b} is given, it is able to select CA. Here we estimate the selectivity by the ratio of successful cases to all possible cases as follows (n is the total number of elements, k is the number of elements related to each scene, and m is the total number of scenes; incomplete information is defined as a partial vector of s elements (0 &lt; s &lt; k)).</Paragraph>
      <Paragraph position="1"> The probability that s elements are shared simultaneously by two patterns is p(n, k, s) = kCs · n-kCk-s / nCk, where aCb denotes the binomial coefficient (the hypergeometric form). To extend this probability to generalized cases of m patterns, we use the number s of elements of the (partial) input vector. It can be estimated by counting the negative case where more than one pattern shares</Paragraph>
      <Paragraph position="3"> The results using this formula are shown in the next section.</Paragraph>
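Assuming the hypergeometric reading of p(n, k, s) reconstructed above, the selectivity can be checked numerically. The sketch below draws random k-element scene patterns and estimates, by Monte-Carlo sampling (exact enumeration being infeasible, as noted in the next section), the probability that s input elements identify a scene uniquely among m patterns; all function names are illustrative.

```python
import random
from math import comb

def p_shared(n, k, s):
    # Hypergeometric probability that a second random k-element pattern
    # shares exactly s elements with a fixed k-element pattern (n total).
    return comb(k, s) * comb(n - k, k - s) / comb(n, k)

def p_unique(n, k, m, s, trials=100):
    # Monte-Carlo estimate of the chance that s elements drawn from one
    # scene identify it uniquely among m random k-element scenes.
    universe = range(n)
    successes = 0
    for _ in range(trials):
        target = random.sample(universe, k)
        probe = set(random.sample(target, s))
        # Negative case: some other pattern also contains all s elements.
        ambiguous = any(probe <= set(random.sample(universe, k))
                        for _ in range(m - 1))
        successes += not ambiguous
    return successes / trials

# Parameters taken from the paper's analysis of OPED.
print(p_shared(11711, 184, 5))
print(p_unique(n=11711, k=184, m=384, s=5))
```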
    </Section>
    <Section position="3" start_page="312" end_page="312" type="sub_section">
      <SectionTitle>
3.4 Information Entropy
</SectionTitle>
      <Paragraph position="0"> As an alternative method of evaluating the spatial-scene information of OPED, we consider here self-information entropy and mutual-information entropy, following Shannon's information theory</Paragraph>
      <Paragraph position="2"> Fig.5 illustrates a talking scene. Although sentences involving many ambiguous words are handed from the speaker to the listener, the listener can disambiguate them with some kind of knowledge common to these people. Conversely, the listener can determine the scene by the handed sentences. The entropy of scene-selection ambiguity is reduced by the interaction. We can define a concept of self-information (SI) of the spatial-scene identification module as the entropy of ambiguous words or scenes. Assuming equal probability for scene selection with no handed word, the entropy of the spatial-scene identification can be calculated:</Paragraph>
      <Paragraph position="4"> After the identification, the meaning of each word can be selected according to a selection distribution function updated by the Bayesian rule.</Paragraph>
      <Paragraph position="6"> Each Pij is equal to Wij as in Eq.(2). &lt;&gt; represents the ensemble average over each xi.</Paragraph>
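A minimal sketch of the Bayesian-rule update of the selection distribution described above; the prior and the likelihood table are invented for illustration, and the update is simply posterior ∝ prior × P(word | scene), renormalized.

```python
def bayes_update(prior, likelihood, word):
    # One Bayesian-rule update of the scene-selection distribution.
    # prior: dict scene -> probability; likelihood: dict (scene, word) -> P(word | scene).
    posterior = {s: prior[s] * likelihood.get((s, word), 0.0) for s in prior}
    z = sum(posterior.values())
    return {s: p / z for s, p in posterior.items()} if z else prior

prior = {"living room": 0.5, "kitchen": 0.5}   # equal probability, no word yet
likelihood = {("living room", "bookshelf"): 0.7,
              ("kitchen", "bookshelf"): 0.1}
print(bayes_update(prior, likelihood, "bookshelf"))
# -> {'living room': 0.875, 'kitchen': 0.125}
```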
      <Paragraph position="7"> Mutual information can be interpreted as the contribution of additional words to identifying a scene, and consequently, the selectiveness of the target word or scene. In order to select a word meaning or scene from the possible space Y, the space C of all other words is considered in the calculation of conditional entropy (CE). Mutual-information entropy per word is calculated by the following formula:</Paragraph>
      <Paragraph position="9"> This can be interpreted as the reduction from a previous conditional entropy to the corresponding updated conditional entropy with additional words. We provide a theoretical estimation of the self-information of spatial scenes with the dictionary in Table 2.</Paragraph>
      <Paragraph position="10"> The result suggests that the module has spatial-scene identification ability with the presentation of only a few words. It also supports the result of the logical-summation algorithm shown in the next section.</Paragraph>
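As a numerical illustration of the entropy argument, assuming the 384 scenes are equally probable before any word is handed over (as the text does), the self-information of scene selection is log2(384) ≈ 8.6 bits, and the mutual information contributed by observed words is the reduction in conditional entropy:

```python
from math import log2

def entropy(dist):
    # Shannon entropy of a discrete distribution (dict scene -> prob).
    return -sum(p * log2(p) for p in dist.values() if p > 0)

num_scenes = 384
prior_entropy = log2(num_scenes)                   # uniform prior: ~8.58 bits
posterior = {"living room": 0.9, "kitchen": 0.1}   # illustrative, after a few words
print(prior_entropy)
print(prior_entropy - entropy(posterior))          # mutual information of the words
```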
    </Section>
  </Section>
  <Section position="7" start_page="312" end_page="315" type="metho">
    <SectionTitle>
4 Analyses of the identification module
</SectionTitle>
    <Paragraph position="0"> Here we present analyses of OPED and results of theoretical simulations. As formula (9) is too expensive to evaluate exactly (on the order of 11711! cases), we use a Monte-Carlo simulation to abstract its characteristics. The number of iterations in each case is 1,000. * Fig.6 (a) shows the distribution of the number of elements involved in each scene in OPED. It approximates a Gaussian distribution and has an average value of 184.2. This value is used in the theoretical simulations.</Paragraph>
    <Paragraph position="1"> * Fig.6 (b) shows the distribution of the number of scenes related to one element. The region where more than 100 scenes are related to one word corresponds to trivial words like 'a', 'the', 'of', 'that', 'to', 'in', 'and', 'for', 'with', 's'. Although we could ignore these words for an actual application, we use them for fairness.</Paragraph>
    <Paragraph position="2"> * Selection probability in the case where partial words of scenes are input to the associative memory is illustrated in Fig.7. The recall rate increases as the input vector (set of words) becomes more similar to the complete vector pattern. Only about five words are enough to identify each scene at a recognition rate of 90 percent. Compared to the average number of 184 words in each scene, this required number is sufficiently small. It demonstrates the good performance of the associative memory used in this module. Theoretical results of a random distribution model are also shown in Fig.7. The cause of the discrepancy between the experiment and the theory is described later. The dotted line 'EXACT' in the figure is a result using logical summation. The crossing point of the 'OPED' line and the 'EXACT' line is remarkable.</Paragraph>
    <Paragraph position="3"> The former has the advantage of producing expectations with relatively high probability (likelihood) from a small number of input words. With more additional words, though, the algorithm is defeated by the simple logical summation. As our architecture PDAI&amp;CD uses a dual phase of expectation and evaluation, we can get a maximum-likelihood solution satisfying constraints automatically. * Fig.8 shows the distribution of the number of elements contributing to identifying each scene uniquely.</Paragraph>
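The contrast between the two selection rules in Fig.7 can be sketched compactly. Below is a hedged illustration in Python with invented toy scene data: 'EXACT' logical summation keeps only the scenes containing every input word (equal weights, no variance), whereas the associative memory sketched earlier sums unequal conditional-probability weights, which is what gives it the early advantage described above.

```python
def logical_summation(scenes, words):
    # 'EXACT' rule: equal weight, no variance -- a scene survives only
    # if it contains every input word (a set-containment test).
    return [name for name, members in scenes.items() if words <= members]

scenes = {
    "living room": {"wall units", "side wall", "bookshelf", "row"},
    "kitchen":     {"side wall", "sink", "stove"},
}
print(logical_summation(scenes, {"side wall"}))               # ambiguous: both scenes
print(logical_summation(scenes, {"side wall", "bookshelf"}))  # unique: ['living room']
```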
    <Paragraph position="4"> * In order to clarify the discrepancy between the experimental and theoretical results, the number of elements overlapped in any two scenes is counted. As in Fig.9, the number of overlapping elements in the theoretical calculation is very small compared to the experiments with OPED. OPED-2 in the figure illustrates the same value without using trivial words like 'a', 'the', 'of', 'that', 'to', 'in', 'and', 'for', 'with', 's'. But the existence of these words cannot explain the whole discrepancy. This will be described in the next section in more detail.</Paragraph>
    <Paragraph position="5"> * As a further investigation to explain the discrepancy between 'EXACT' (logical summation) and 'OPED' (with our associative memory), the distribution of weight values is shown in Fig.10. The logical-summation method is achieved by a special algorithm similar to the associative memory; the only difference is that it uses an equal weight value without any variance. But in practice, the experimental result of 'OPED' in Fig.10 shows the existence of enormous variance in the distribution of weight values. Though the variance helps selectivity with a few words, it conversely disturbs expectivity with more than three words. Here we summarize the interpretation of the gaps among the theoretical expectation, the result of logical summation ('EXACT'), and the system ('OPED'): 1. Existence of trivial words in most of the scenes.</Paragraph>
    <Paragraph position="6"> 2. Variance of weight distribution.</Paragraph>
    <Paragraph position="7"> 3. Difference of characteristics between algorithms.</Paragraph>
    <Paragraph position="8"> * Abstracted results are summarized in Table 3. In this table, the number of registered words in the dictionary itself is different from the number of total words analyzed by our system. The discrepancy arises mainly from the fact that we decomposed compound words into simple words (e.g. 'research laboratory' into 'research' and 'laboratory').</Paragraph>
  </Section>
  <Section position="8" start_page="315" end_page="315" type="metho">
    <SectionTitle>
5 Summary
</SectionTitle>
    <Paragraph position="0"> We analyzed the selectivity of our 384 living scenes with many sets of words drawn from the 11,711 words used in the dictionary OPED. The average number of words in one scene is about 184. The probability of recalling correct scenes from partial input words differs from the theoretical simulation of random assignment constructed with the values of these parameters. Unlike random generation of arbitrary symbols, the semantics of natural language consists of highly correlated meanings of words. Although the theoretical simulation of the simplified model suggests a rough estimation of disambiguation requirements, we should analyze the dictionary itself, as in this paper.</Paragraph>
    <Paragraph position="1"> Another suggestive analysis uses Shannon's information, or entropy, which gives us more accurate information depending on the probability of each phenomenon. It shows how to estimate the amount of semantic ambiguity.</Paragraph>
    <Paragraph position="2"> Spatial-scene identification is one of the simplest kinds of context necessary to disambiguate the meaning of words, and it offers a new method for the future integration of natural language processing and visual pattern recognition.</Paragraph>
  </Section>
class="xml-element"></Paper>