<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0614"> <Title>Grammatical analysis in the OVIS spoken-dialogue system</Title> <Section position="3" start_page="66" end_page="66" type="metho"> <SectionTitle> 2 A computational grammar for Dutch </SectionTitle> <Paragraph position="0"> In developing the OVIS grammar we have tried to combine the short-term goal of developing a grammar which meets the requirements imposed by the application (i.e. robust processing of the output of the speech recognizer, extensive coverage of locative phrases and temporal expressions, and the construction of fine-grained semantic representations) with the long-term goal of developing a general computational grammar which covers all the major constructions of Dutch.</Paragraph> <Paragraph position="1"> The grammar currently covers the majority of verbal subcategorization types (intransitives, transitives, verbs selecting a PP, and modal and auxiliary verbs), NP-syntax (including pre- and post-nominal modification, with the exception of relative clauses), PP-syntax, the distribution of VP-modifiers, various clausal types (declaratives, yes/no and WH-questions, and subordinate clauses), all temporal expressions and locative phrases relevant to the domain, and various typical spoken-language constructs. Due to restrictions imposed by the speech recognizer, the lexicon is relatively small (2000 word forms, most of which are names of stations and cities).</Paragraph> <Paragraph position="2"> From a linguistic perspective, the OVIS grammar can be characterized as a constraint-based grammar which makes heavy use of (multiple) inheritance. As the grammar assumes quite complex lexical signs, inheritance is essential for organizing the lexicon succinctly. However, we not only use inheritance at the level of the lexicon (a well-known approach to computational lexica), but have also structured the rule component using inheritance. 
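The inheritance-based organization of the lexicon can be pictured schematically as follows. This is only an illustrative sketch in Python, not the actual OVIS feature-structure machinery: the class names and features (Transitive, Modal, FiniteMixin) are invented for the example, and classes stand in for typed lexical signs whose shared constraints are stated once and inherited.

```python
# Hypothetical sketch of lexical-sign inheritance; all names are illustrative.

class Sign:
    head = None
    subcat = ()           # complements the sign still needs

class Verb(Sign):
    head = "verb"

class Transitive(Verb):
    subcat = ("np",)      # selects one NP complement

class Modal(Verb):
    subcat = ("vp",)      # selects a VP complement

class FiniteMixin:
    finite = True         # orthogonal dimension, mixed in via multiple inheritance

class FiniteTransitive(FiniteMixin, Transitive):
    pass

entry = FiniteTransitive()
print(entry.head, entry.subcat, entry.finite)   # verb ('np',) True
```

A lexical entry thus states only what is idiosyncratic to it; everything else comes from the classes it inherits from.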
An important restriction imposed by the grammar-parser interface is that rules must specify the category of their mothers and daughters; that is, each rule must specify the type of sign of its mother and daughters. (DCGs are called pure if they do not contain any calls to external Prolog predicates.)</Paragraph> <Paragraph position="3"> A consequence of this requirement is that general rule schemata, as used in Categorial Grammar and HPSG, cannot be used directly in the OVIS grammar. A rule which specifies that a head daughter may combine with a complement daughter if this complement unifies with the first element on the SUBCAT list of the head (i.e. a version of the categorial rule for functor-argument application) cannot be implemented directly, as it leaves the categories of the daughters and mother unspecified. Nevertheless, capturing generalizations of this type does seem desirable.</Paragraph> <Paragraph position="4"> We have therefore adopted an architecture for grammar rules similar to that of HPSG (Pollard and Sag, 1994), in which individual rules are classified in various structures, which are in turn defined in terms of general principles. For instance, the grammar currently contains several head-complement rules (which allow a verb, preposition, or determiner to combine with one or more complements). These rules need only specify category information and the relative order of head and complement(s). All other information associated with the rule (concerning the matching of head features, the instantiation of features used to code long-distance dependencies, and the semantic effect of the rule) follows from the fact that the rules are instances of the class head-complement structure. This class itself is defined in terms of general principles, such as the head-feature, valence, filler and semantics principles. 
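The division of labour between concrete rules and inherited principles can be sketched as follows. This is a hedged illustration, not the OVIS implementation: the principle classes and dictionary-based signs are invented for the example, and only mimic how a concrete rule fixes categories while the head-complement class supplies the shared constraints.

```python
# Illustrative only: principles as mixin classes, rules as subclasses.

class HeadFeaturePrinciple:
    def propagate(self, mother, head):
        mother["head"] = head["head"]            # mother shares head features

class ValencePrinciple:
    def saturate(self, mother, head, comp):
        # The complement must match the first element on the head's SUBCAT.
        assert head["subcat"] and head["subcat"][0] == comp["cat"]
        mother["subcat"] = head["subcat"][1:]

class HeadComplementStructure(HeadFeaturePrinciple, ValencePrinciple):
    mother_cat = head_cat = comp_cat = None      # fixed by each concrete rule
    def apply(self, head, comp):
        assert head["cat"] == self.head_cat and comp["cat"] == self.comp_cat
        mother = {"cat": self.mother_cat}
        self.propagate(mother, head)
        self.saturate(mother, head, comp)
        return mother

# A concrete rule states only the categories, as the interface requires:
class VPRule(HeadComplementStructure):
    mother_cat, head_cat, comp_cat = "vp", "v", "np"

v = {"cat": "v", "head": {"vform": "fin"}, "subcat": ["np"]}
np_sign = {"cat": "np", "head": {}, "subcat": []}
print(VPRule().apply(v, np_sign))
```

The concrete rule is thus parser-friendly (all categories specified) while the generalization lives in the class it instantiates.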
Other rules are defined in terms of the classes head-adjunct and head-filler structure, which in turn inherit from (a subset of) the general principles mentioned above.</Paragraph> <Paragraph position="5"> Thus, even though the grammar contains a relatively large number of rules (compared to lexicalist frameworks such as HPSG and CG), the redundancy in these rules is minimal.</Paragraph> <Paragraph position="6"> The resulting grammar has the interesting property that it combines the strong tendency towards lexicalism and general combinatoric rule schemata present in frameworks such as HPSG with relatively specific grammar rules that facilitate efficient processing.</Paragraph> </Section> <Section position="4" start_page="66" end_page="67" type="metho"> <SectionTitle> 3 Interaction with the dialogue manager </SectionTitle> <Paragraph position="0"> The semantic component of the grammar produces (simplified) Quasi-Logical Forms (Alshawi, 1992). These are linguistically motivated, domain-independent representations of the meaning of utterances. QLFs allow considerable underspecification. This is convenient in this application, because most ambiguities that arise, such as ambiguities of scope, do not need to be resolved. These QLFs are translated into domain-specific &quot;updates&quot; to be passed on to the dialogue manager (DM) for further processing. The DM keeps track of the information provided by the user by maintaining an information state or form. This form is a hierarchical structure, with slots and values for the origin and destination of a connection, for the time at which the user wants to arrive or depart, etc. The distinction between slots and values can be regarded as a special case of the ground and focus distinction (Vallduvi, 1990). Updates specify the ground and focus of the user utterances. For example, the utterance &quot;No, I don't want to travel to Leiden but to Abcoude&quot; yields the following update: userwants.travel. 
destination.</Paragraph> <Paragraph position="1"> ([# place.town.leiden] ; [! place.town.abcoude]) One important property of this representation is that it allows encoding of speech-act information. The &quot;#&quot; in the update means that the information between the square brackets (representing the focus of the user utterance) must be retracted, while the &quot;!&quot; denotes the corrected information.</Paragraph> </Section> <Section position="5" start_page="67" end_page="68" type="metho"> <SectionTitle> 4 Robust parsing </SectionTitle> <Paragraph position="0"> The input to the NLP module consists of word-graphs produced by the speech recognizer. A word-graph is a compact representation of all lists of words that the speech recognizer hypothesizes for a spoken utterance. The nodes of the graph represent points in time, and an edge between two nodes represents a word that may have been uttered between the corresponding points in time. Each edge is associated with an acoustic score, representing a measure of confidence that the word perceived there is the word that was actually uttered. These scores are negative logarithms of probabilities and therefore require addition, as opposed to multiplication, when two scores are combined.</Paragraph> <Paragraph position="1"> At an early stage, the word-graph is optimized to eliminate the epsilon transitions. Such transitions represent periods of time in which the speech recognizer hypothesizes that no words are uttered. After this optimization, the word-graph contains exactly one start node and one or more final nodes, each associated with a score representing a measure of confidence that the utterance ends at that point.</Paragraph> <Paragraph position="2"> In the ideal case, the parser will find one or more paths in a given word-graph that can be assigned an analysis according to the grammar, such that the paths cover the complete time span of the utterance, i.e. the paths lead from the start node to a final node. 
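The word-graph representation and the additive combination of acoustic scores can be made concrete with a small sketch. This is an invented toy example, not OVIS data: the Dutch words and probabilities are made up, and the point is only that summing negative log probabilities is equivalent to multiplying the underlying probabilities.

```python
import math

# Toy word-graph: nodes are points in time; each edge is
# (from_node, to_node, word, acoustic_score), where the score is a
# negative log probability (lower is better). All values are invented.
edges = [
    (0, 1, "ik",   -math.log(0.9)),
    (0, 1, "in",   -math.log(0.1)),   # competing hypothesis for the same span
    (1, 2, "wil",  -math.log(0.8)),
    (2, 3, "naar", -math.log(0.7)),
]

def path_score(path):
    """Combined acoustic score of a list of edges: a sum, because the
    scores are negative log probabilities."""
    return sum(score for _, _, _, score in path)

best = [edges[0], edges[2], edges[3]]          # "ik wil naar"
# Summing the -log scores equals -log of the product of probabilities:
print(math.isclose(path_score(best), -math.log(0.9 * 0.8 * 0.7)))   # True
```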
Each analysis gives rise to an update of the dialogue state. From that set of updates, one is then passed on to the dialogue manager.</Paragraph> <Paragraph position="3"> However, often no such paths can be found in the word-graph, due to: * errors made by the speech recognizer, * linguistic constructions not covered in the grammar, and * irregularities in the spoken utterance.</Paragraph> <Paragraph position="4"> Our solution is to allow recognition of paths in the word-graph that do not necessarily span the complete utterance. Each path should be an instance of some major category from the grammar, such as S, NP, PP, etc. In our application, this often comes down to categories such as &quot;temporal expression&quot; and &quot;locative phrase&quot;. Such paths will be called maximal projections. A list of maximal projections that do not pairwise overlap and that lie on a single path from the start node to a final node in the word-graph represents a reading of the utterance. The transitions between the maximal projections will be called skips.</Paragraph> <Paragraph position="5"> The optimal such list is computed according to criteria to be discussed below. The categories of the maximal projections in the list are then combined and the update for the complete utterance is computed. This last phase contains, among other things, some domain-specific linguistic knowledge dealing with expressions that may be ungrammatical in other domains; e.g. the utterance &quot;Amsterdam Rotterdam&quot; does not exemplify a general grammatical construction of Dutch, but in the particular domain of OVIS such an utterance occurs frequently, with the meaning &quot;departure from Amsterdam and arrival in Rotterdam&quot;.</Paragraph> <Paragraph position="6"> We will now describe the robust parsing module in more detail. The first phase that is needed is the application of a parsing algorithm which is such that: 1. 
grammaticality is investigated for all paths, not only for the complete paths from the first to a final node in the word-graph, and 2. grammaticality of those paths is investigated for each category from a fixed set.</Paragraph> <Paragraph position="7"> Almost any parsing technique, such as left-corner parsing, LR parsing, etc., can be adapted so that the first constraint above is satisfied; the second constraint is achieved by structuring the grammar such that the top category directly generates a number of grammatical categories.</Paragraph> <Paragraph position="8"> The second phase is the selection of the optimal list of maximal projections lying on a single path from the start node to a final node. At each node we visit, we compute a partial score consisting of a triple (S, P, A), where S is the number of transitions on the path not part of a maximal projection (the skips), P is the number of maximal projections, and A is the sum of the acoustic scores of all the transitions on the path, including those internal to maximal projections. We define the relation &lt; on triples such that (S1, P1, A1) &lt; (S2, P2, A2) if and only if: S1 &lt; S2; or S1 = S2 and P1 &lt; P2; or S1 = S2, P1 = P2 and A1 &lt; A2.</Paragraph> <Paragraph position="10"> In words, for determining which triple has the minimal score (i.e. is optimal), the number of skips has strictly the highest importance, then the number of projections, and then the acoustic scores.</Paragraph> <Paragraph position="11"> Our branch-and-bound algorithm maintains a priority queue, which contains pairs of the form (N, (S, P, A)), consisting of a node N and a triple (S, P, A) found at that node, or pairs of the form (N', (S, P, A)), with the same meaning except that N' is a final node whose acoustic score is incorporated into A. Popping an element from the queue yields a pair of which the second element is an optimal triple with regard to the relation defined above. 
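This lexicographic ordering on (S, P, A) triples is exactly what built-in tuple comparison gives in many languages; a minimal Python illustration (with invented scores):

```python
# Python compares tuples lexicographically, which matches the ordering
# on (skips, projections, acoustic score): fewer skips wins outright,
# ties are broken by fewer projections, then by the acoustic score.
t1 = (0, 2, 153.2)   # no skips, two maximal projections
t2 = (1, 1, 80.0)    # one skip: worse, despite the better acoustic score
t3 = (0, 2, 160.7)   # like t1 but acoustically worse

print(min(t1, t2, t3))   # (0, 2, 153.2)
```

This is also why a standard priority queue keyed on the triple suffices for the branch-and-bound search below.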
Initially, the queue contains just (N0, (0, 0, 0)), where N0 is the start node, and possibly (N0', (0, 0, A)), if N0 is also a final node with acoustic score A.</Paragraph> <Paragraph position="12"> A node N is marked as seen when a triple has been encountered at N that must be optimal with respect to all paths leading to N from the start node. The following is repeated until a final node is found with an optimal triple: 1. Pop an optimal element from the queue.</Paragraph> <Paragraph position="13"> 2. If it is of the final-node form (N', (S, P, A)), then return the path leading to that triple at that node, and halt.</Paragraph> <Paragraph position="14"> 3. Otherwise, let that element be (N, (S, P, A)). 4. If N was already marked as seen then abort this iteration and return to step 1.</Paragraph> <Paragraph position="15"> 5. Mark N as seen.</Paragraph> <Paragraph position="16"> 6. For each maximal projection from N to M with acoustic score A', enqueue (M, (S, P + 1, A + A')). If M is a final node with acoustic score A'', then furthermore enqueue (M', (S, P + 1, A + A' + A'')).</Paragraph> <Paragraph position="17"> 7. For each transition from N to M with acoustic score A', enqueue (M, (S + 1, P, A + A')). If M is a final node with acoustic score A'', then furthermore enqueue (M', (S + 1, P, A + A' + A'')). Besides S, P, and A, other factors can be taken into account as well, such as the semantic score, which is obtained by comparing the updates corresponding to maximal projections with the meaning of the question generated by the system prior to the user utterance.</Paragraph> <Paragraph position="18"> We are also experimenting with the bigram score. Bigrams attach a measure of likelihood to the occurrence of a word given a preceding word.</Paragraph> <Paragraph position="19"> Note that when bigrams are used, simply labelling nodes in the graph as seen is not a valid method to prevent recomputation of subpaths. 
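The steps above can be sketched as a short program. This is a hedged reconstruction, not the OVIS implementation: the input encoding (dictionaries mapping a node to its outgoing maximal projections and skip transitions, and a map of final nodes to their scores) and all data values are invented for illustration; a boolean flag stands in for the primed final-node pairs.

```python
import heapq

def best_reading(start, projections, skips, final):
    # Queue entries are ((S, P, A), node, is_final_pair, path); the heap
    # orders them by the (S, P, A) triple first, i.e. by the relation above.
    queue = [((0, 0, 0.0), start, False, [])]
    if start in final:                        # start node may itself be final
        heapq.heappush(queue, ((0, 0, final[start]), start, True, []))
    seen = set()
    while queue:
        (S, P, A), node, is_final, path = heapq.heappop(queue)   # step 1
        if is_final:                          # step 2: optimal complete reading
            return (S, P, A), path
        if node in seen:                      # step 4
            continue
        seen.add(node)                        # step 5
        for M, A1 in projections.get(node, []):   # step 6: maximal projections
            step = path + [("proj", node, M)]
            heapq.heappush(queue, ((S, P + 1, A + A1), M, False, step))
            if M in final:
                heapq.heappush(queue, ((S, P + 1, A + A1 + final[M]), M, True, step))
        for M, A1 in skips.get(node, []):         # step 7: skipped transitions
            step = path + [("skip", node, M)]
            heapq.heappush(queue, ((S + 1, P, A + A1), M, False, step))
            if M in final:
                heapq.heappush(queue, ((S + 1, P, A + A1 + final[M]), M, True, step))
    return None

# Tiny invented word-graph: node 0 reaches final node 2 either via one
# maximal projection or via two skipped word transitions.
projections = {0: [(2, 1.0)]}
skips = {0: [(1, 0.5)], 1: [(2, 0.7)]}
final = {2: 0.2}
print(best_reading(0, projections, skips, final))
```

The projection path wins here even though its acoustic score is worse, because skips dominate the ordering.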
The required adaptation to the basic branch-and-bound algorithm is not discussed here.</Paragraph> <Paragraph position="20"> Also, in the actual implementation the X best readings are produced, instead of a single best reading. This requires a generalization of the above procedure so that instead of using the label &quot;seen&quot;, we attach labels &quot;seen i times&quot; to each node, where 0 &lt; i &lt; X.</Paragraph> </Section> </Paper>