<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-3015">
  <Title>Clavius: Bi-Directional Parsing for Generic Multimodal Interaction</Title>
  <Section position="4" start_page="85" end_page="87" type="metho">
    <SectionTitle>
2 The Algorithm
</SectionTitle>
    <Paragraph position="0"> CLAVIUS expands parses according to a best-first process in which newly expanded edges are ordered according to trainable criteria of multimodal language, as discussed in §3. Figure 3 shows a component breakdown of CLAVIUS's software architecture. The sections that follow explain the flow of information through this system from sensory input to semantic interpretation.</Paragraph>
    <Paragraph position="1"> [Figure 1: CLAVIUS's fundamental software components.]</Paragraph>
    <Section position="1" start_page="85" end_page="86" type="sub_section">
      <SectionTitle>
2.1 Lexica and Preprocessing
</SectionTitle>
      <Paragraph position="0"> Each unique input modality is asynchronously monitored by one of T TRACKERS, each sending an n-best list of lexical hypotheses to CLAVIUS as soon as any activity is detected. For example, a gesture tracker (see Figure 4a) parametrises a gesture's preparation, stroke/point, and retraction (McNeill, 1992), with values reflecting spatial positions and velocities of arm motion, whereas our speech tracker parametrises words with part-of-speech tags and prior probabilities (see Figure 4b). Although preprocessing is reduced to the identification of lexical tokens, this is more involved than simple lexicon lookup due to the modelling of complex signals.</Paragraph>
    </Section>
    <Section position="2" start_page="86" end_page="86" type="sub_section">
      <SectionTitle>
2.2 Data Structures
</SectionTitle>
      <Paragraph position="0"> All TRACKERS write their hypotheses directly to the first of three SUBSPACES that partition all partial parses in the search space. The first is the GENERALISER's subspace, Ks[G], which is monitored by the GENERALISER thread, the first part of the parser. All new parses are first written to Ks[G] before being moved to the SPECIFIER's active and inactive subspaces, Ks[SAct] and Ks[SInact], respectively. Subspaces are optimised for common operations by organising parses by their scores and grammatical categories into depth-balanced search trees having the heap property. The best partial parse in each subspace can therefore be found in O(1) amortised time.</Paragraph>
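The subspace organisation above can be illustrated as a priority queue keyed on parse score. The following is a minimal Python sketch (class and method names are hypothetical, and a binary heap stands in for the depth-balanced trees of the actual system); it preserves the essential property that the best parse is available in O(1) time:

```python
import heapq

class Subspace:
    """Sketch of a CLAVIUS subspace: partial parses ordered by score,
    with the best parse retrievable in O(1) time."""

    def __init__(self):
        self._heap = []  # max-heap simulated by negating scores

    def add(self, score, parse):
        heapq.heappush(self._heap, (-score, parse))

    def peek_best(self):
        # O(1): the heap root is the highest-scoring parse
        return self._heap[0][1] if self._heap else None

    def pop_best(self):
        # O(log n): remove and return the highest-scoring parse
        return heapq.heappop(self._heap)[1] if self._heap else None
```

A thread such as the GENERALISER would repeatedly call `peek_best` on its subspace and act on the result.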
    </Section>
    <Section position="3" start_page="86" end_page="86" type="sub_section">
      <SectionTitle>
2.3 Generalisation
</SectionTitle>
      <Paragraph position="0"> The GENERALISER monitors the best partial parse, Psg, in Ks[G], and creates new parses Psi for all grammar rules Gi having CATEGORY(Psg) on the right-hand side. Effectively, these new parses are instantiations of the relevant Gi, with one constituent unified to Psg. This provides the impetus towards sentence-level parses, as simplified in Algorithm 1 and exemplified in Figure 5. Naturally, if rule Gi has more than one constituent (c &gt; 1) of type CATEGORY(Psg), then c new parses are created, each with one of these being instantiated.</Paragraph>
      <Paragraph position="1"> Since the GENERALISER is activated as soon as input is added to Ks[G], the process is interactive (Tomita, 1985), and therefore incorporates the associated benefits of efficiency. This is contrasted with the all-paths bottom-up strategy in GEMINI (Dowding et al., 1993), which finds all admissible edges of the grammar.</Paragraph>
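The core generalisation step can be sketched as follows (the grammar and all names are hypothetical, not the paper's actual rules; Algorithm 1 itself is not reproduced here). Note how a rule with c matching constituents yields c new parses, each with a different slot instantiated:

```python
# Hypothetical grammar: (left-hand side, list of right-hand-side categories)
GRAMMAR = [
    ("NP", ["DET", "N"]),
    ("S",  ["NP", "VP"]),
    ("NP", ["NP", "CONJ", "NP"]),
]

def generalise(parse_category):
    """For every rule with the given category on its right-hand side,
    create one new partial parse per matching constituent slot."""
    new_parses = []
    for lhs, rhs in GRAMMAR:
        for slot, cat in enumerate(rhs):
            if cat == parse_category:
                # a new parse: this rule, with only `slot` instantiated
                new_parses.append((lhs, rhs, slot))
    return new_parses
```

For example, an NP parse instantiates the S rule once and the conjunctive NP rule twice (once per NP slot), giving three new partial parses.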
    </Section>
    <Section position="4" start_page="86" end_page="87" type="sub_section">
      <SectionTitle>
2.4 Specification
</SectionTitle>
      <Paragraph position="0"> The SPECIFIER thread provides the impetus towards complete coverage of the input, as simplified in Algorithm 2 (see Figure 6). It combines parses in its subspaces that have the same top-level grammar expansion but different instantiated constituents. The resulting parse merges the semantics of the two original graphs only if unification succeeds, providing a hard constraint against the combination of incongruous information. The result, Ps, of specification must be written to Ks[G], otherwise Ps could never appear on the RHS of another partial parse. We show how associated vulnerabilities are overcome in §3.2 and §3.4.</Paragraph>
      <Paragraph position="1"> Specification is commutative and will always provide more information than its constituent graphs if it does not fail, unlike the 'overlay'</Paragraph>
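A minimal sketch of specification over flat feature dictionaries, assuming a partial parse is represented as a (rule, instantiated-slot-set, semantics) triple. This is a simplification of Algorithm 2 with hypothetical names; the real system unifies full directed acyclic graphs:

```python
def unify(a, b):
    """Unify two flat semantic feature dictionaries; None on clash."""
    merged = dict(a)
    for key, val in b.items():
        if key in merged and merged[key] != val:
            return None  # incongruous information: unification fails
        merged[key] = val
    return merged

def specify(p1, p2):
    """Combine parses sharing a top-level rule but having different
    instantiated constituents; None if the combination is forbidden."""
    rule1, slots1, sem1 = p1
    rule2, slots2, sem2 = p2
    if rule1 != rule2 or slots1 & slots2:
        return None  # different expansions, or competing constituents
    sem = unify(sem1, sem2)
    if sem is None:
        return None  # hard constraint: semantics clash
    return (rule1, slots1 | slots2, sem)
```

Commutativity holds because dictionary unification succeeds or fails identically in either order, and a successful result always covers at least as many constituents as either input.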
    </Section>
    <Section position="5" start_page="87" end_page="87" type="sub_section">
      <SectionTitle>
2.5 Cognition
</SectionTitle>
      <Paragraph position="0"> The COGNITION thread monitors the best sentence-level hypothesis, PsB, in Ks[SInact], and terminates the search process once PsB has remained unchallenged by new competing parses for some period of time.</Paragraph>
      <Paragraph position="1"> Once found, COGNITION communicates PsB to the APPLICATION. Both COGNITION and the APPLICATION read state information from the MySQL WORLD database, as discussed in §3.5, though only the latter can modify it.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="87" end_page="88" type="metho">
    <SectionTitle>
3 Applying Domain-Centric Knowledge
</SectionTitle>
    <Paragraph position="0"> Upon being created, every partial parse is assigned a score approximating its likelihood of being part of an accepted multimodal sentence. The score of partial parse Ps, SCORE(Ps) =</Paragraph>
    <Paragraph position="1"> Σi ωi ki(Ps), </Paragraph>
    <Paragraph position="2"> is a weighted linear combination of independent scoring modules (KNOWLEDGE SOURCES). Each module presents a score function ki : Ps → ℜ[0..1] according to a unique criterion of multimodal language, weighted by ωi, also on ℜ[0..1]. Some modules provide 'hard constraints' that can outright forbid unification, returning ki = −∞ in those cases. A subset of the criteria we have explored is outlined below.</Paragraph>
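The weighted linear combination, including the hard-constraint veto, can be sketched directly (function and variable names are hypothetical):

```python
NEG_INF = float("-inf")

def score(parse, modules, weights):
    """Weighted linear combination of knowledge-source scores.
    A hard-constraint module may return -inf, vetoing the parse."""
    total = 0.0
    for k, w in zip(modules, weights):
        s = k(parse)
        if s == NEG_INF:
            return NEG_INF  # hard constraint: forbid the parse outright
        total += w * s
    return total
```

Because a single −∞ dominates the sum, hard constraints override any weighting, which is why their weights can be set to zero without effect.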
    <Section position="1" start_page="87" end_page="87" type="sub_section">
      <SectionTitle>
3.1 Temporal Alignment (k1)
</SectionTitle>
      <Paragraph position="0"> By modelling the timespans of parses as Gaussians - where μ and σ are determined by the midpoint and half the distance between the two endpoints, respectively - we can promote parses whose constituents are closely related in time with the symmetric Kullback-Leibler divergence,</Paragraph>
      <Paragraph position="2"> Therefore, k1 promotes more locally-structured parses and co-occurring multimodal utterances.</Paragraph>
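Under these definitions the divergence has a closed form for univariate Gaussians; a sketch (assuming non-degenerate timespans, so σ is never zero; names hypothetical):

```python
import math

def kl_gaussian(mu1, s1, mu2, s2):
    """KL divergence KL(N(mu1, s1^2) || N(mu2, s2^2)), 1-D Gaussians."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

def symmetric_kl(span1, span2):
    """Model each timespan (start, end) as a Gaussian with mu at the
    midpoint and sigma at half the span, then take the symmetric KL."""
    mu1, s1 = (span1[0] + span1[1]) / 2, (span1[1] - span1[0]) / 2
    mu2, s2 = (span2[0] + span2[1]) / 2, (span2[1] - span2[0]) / 2
    return kl_gaussian(mu1, s1, mu2, s2) + kl_gaussian(mu2, s2, mu1, s1)
```

Identical timespans give a divergence of zero, and the divergence grows as constituents drift apart in time, which is the behaviour k1 exploits.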
    </Section>
    <Section position="2" start_page="87" end_page="87" type="sub_section">
      <SectionTitle>
3.2 Ancestry Constraint (k2)
</SectionTitle>
      <Paragraph position="0"> A consequence of accepting n-best lexical hypotheses for each word is that we risk unifying parses that include two competing hypotheses.</Paragraph>
      <Paragraph position="1"> For example, if our speech TRACKER produces the hypotheses &amp;quot;horse&amp;quot; and &amp;quot;house&amp;quot; for ambiguous input, then k2 explicitly prohibits the parse &amp;quot;the horse and the house&amp;quot; by means of flags on lexical content.</Paragraph>
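One way to realise this check (a hypothetical sketch, not the paper's implementation) is to tag each lexeme with the input event it hypothesises, and refuse to combine parses containing different words for the same event:

```python
def ancestry_compatible(lexemes1, lexemes2):
    """Sketch of k2: each lexeme is a (word, input_event) pair. Two
    parses may not combine if they carry competing n-best hypotheses
    for the same input event."""
    hypotheses = {event: word for word, event in lexemes1}
    for word, event in lexemes2:
        if event in hypotheses and hypotheses[event] != word:
            return False  # competing hypotheses for one event
    return True
```

Here "horse" and "house" for the same speech event are incompatible, while lexemes drawn from distinct events combine freely.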
    </Section>
    <Section position="3" start_page="87" end_page="88" type="sub_section">
      <SectionTitle>
3.3 Probabilistic Grammars (k3)
</SectionTitle>
      <Paragraph position="0"> We emphasise more common grammatical constructions by augmenting each grammar rule with an associated probability, P(Gi),</Paragraph>
      <Paragraph position="1"> k3(Ps) = P(Gi), </Paragraph>
      <Paragraph position="2"> where Gi is the top-level expansion of Ps.</Paragraph>
      <Paragraph position="3"> Probabilities are trainable by maximum likelihood estimation on annotated data. Within the context of CLAVIUS , k3 promotes the processing of new input words and shallower parse trees.</Paragraph>
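The maximum-likelihood estimation step can be sketched as relative-frequency counting over annotated rule applications (a standard PCFG estimate; data and names hypothetical):

```python
from collections import Counter

def mle_rule_probabilities(annotated_rules):
    """MLE of P(Gi): the relative frequency of each rule among all
    observed expansions of the same left-hand side."""
    rule_counts = Counter(annotated_rules)
    lhs_counts = Counter(lhs for lhs, _ in annotated_rules)
    return {rule: count / lhs_counts[rule[0]]
            for rule, count in rule_counts.items()}
```

For instance, if "NP → DET N" occurs three times and "NP → N" once in the annotated data, the former receives probability 0.75.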
    </Section>
    <Section position="4" start_page="88" end_page="88" type="sub_section">
      <SectionTitle>
3.4 Information Content (k4), Coverage (k5)
</SectionTitle>
      <Paragraph position="0"> The k4 module partially orders parses by preferring those that maximise the joint entropy between the semantic variables of their constituent parses. Furthermore, we use a shifted sigmoid,</Paragraph>
      <Paragraph position="1"> k5(Ps) = 2/(1 + e^(−w(Ps))) − 1, </Paragraph>
      <Paragraph position="2"> to promote parses that maximise the number of 'words', w(Ps), in a parse. These two modules together are vital in choosing fully specified sentences.</Paragraph>
    </Section>
    <Section position="5" start_page="88" end_page="88" type="sub_section">
      <SectionTitle>
3.5 Functional Constraints (k6)
</SectionTitle>
      <Paragraph position="0"> Each grammar rule Gi can include constraint functions f : Ps → ℜ[0,1] parametrised by values in instantiated graphs. For example, the function T FOLLOWS(Ps1,Ps2) returns 1 if constituent Ps2 follows Ps1 in time, and −∞ otherwise, thus maintaining ordering constraints. Functions are dynamically loaded and executed during scoring.</Paragraph>
      <Paragraph position="1"> Since functions are embedded directly within parse graphs, their return values can be directly incorporated into those parses, allowing us to utilise data in the WORLD. For example, the function OBJECTAT(x,y,&amp;o) determines if an object exists at point (x,y), as determined by a pointing gesture, and writes the type of this object, o, to the graph, which can later further constrain the search.</Paragraph>
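The two example functions can be sketched as follows (these are hypothetical reimplementations under assumed representations: timespans as (start, end) pairs, and the WORLD as a list of object records; the real functions operate on parse graphs and the MySQL database):

```python
NEG_INF = float("-inf")

def t_follows(span1, span2):
    """Sketch of T FOLLOWS: 1 if the second constituent's timespan
    begins after the first ends, -inf otherwise (a hard constraint)."""
    return 1.0 if span2[0] >= span1[1] else NEG_INF

def object_at(world, x, y):
    """Sketch of OBJECTAT: look up the object at (x, y) in the WORLD
    and return its type, so it can be written back into the parse
    graph to constrain the remaining search; None if nothing is there."""
    for obj in world:
        if obj["x"] == x and obj["y"] == y:
            return obj["type"]
    return None
```

Returning −∞ from a constraint function vetoes the parse outright, consistent with the hard-constraint behaviour of the scoring modules.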
    </Section>
  </Section>
  <Section position="6" start_page="88" end_page="89" type="metho">
    <SectionTitle>
4 Early Results
</SectionTitle>
    <Paragraph position="0"> We have constructed a simple blocks-world experiment in which a user can move, colour, create, and delete geometric objects using speech and pointing gestures, with a grammar of 74 rules, 25 grammatical categories, and a 43-word vocabulary. Ten users were recorded interacting with this system, for a combined total of 2.5 hours of speech and gesture data and 2304 multimodal utterances. Our randomised data-collection mechanism was designed to explore the four command types equitably. Test subjects were given no indication as to the types of phrases we expected - rather, they were shown a collection of objects and asked to replicate it, given the four basic types of actions.</Paragraph>
    <Paragraph position="1"> Several aspects of the parser have been tested at this stage and are summarised below.</Paragraph>
    <Section position="1" start_page="88" end_page="88" type="sub_section">
      <SectionTitle>
4.1 Accuracy
</SectionTitle>
      <Paragraph position="0"> Table 1 shows three hand-tuned configurations of the module weights ωi (with ω2 = 0.0, since k2 is a hard constraint) and the precision achieved for each configuration Ωi on each of the four tasks, where precision is defined as the proportion of correctly executed sentences. These are compared against the CMU Sphinx-4 speech recogniser using the unimodal projection of the multimodal grammar. Here, conjunctive phrases such as &amp;quot;Put a sphere here and colour it yellow&amp;quot; are classified according to their first clause.</Paragraph>
      <Paragraph position="1"> Presently, assigning higher weights (&gt; 30%) to the coverage and probabilistic-grammar constraints appears to provide the best results. Creation and colouring tasks appeared to suffer most, due to missing or misunderstood head-noun modifiers (i.e., object colour). In these examples, CLAVIUS achieved from a 51.7% to a 62.5% relative error reduction over all tasks.</Paragraph>
    </Section>
    <Section position="2" start_page="88" end_page="89" type="sub_section">
      <SectionTitle>
4.2 Work Expenditure
</SectionTitle>
      <Paragraph position="0"> To test whether the best-first approach compensates for CLAVIUS's looser constraints (§1.2), a simple bottom-up multichart parser (§1.1) was constructed, and the average number of edges it produces on sentences of varying length was measured. Figure 8 compares this against the average number of edges produced by CLAVIUS on the same data. In particular, although CLAVIUS generally finds the parse it will ultimately accept relatively quickly ('CLAVIUS - found'), the COGNITION module delays its acceptance ('CLAVIUS - accepted') for a time. Further tuning will hopefully reduce this 'waiting period'.</Paragraph>
    </Section>
  </Section>
</Paper>