<?xml version="1.0" standalone="yes"?>
<Paper uid="C86-1140">
  <Title>THE ROLE OF SEMANTIC PROCESSING IN AN AUTOMATIC SPEECH UNDERSTANDING SYSTEM</Title>
  <Section position="3" start_page="0" end_page="597" type="metho">
    <SectionTitle>
3. Semantic reasoning in EVAR
</SectionTitle>
    <Paragraph position="0"> In our speech understanding system, the semantic analysis as defined above comprises the following tasks:  - resolution of lexical ambiguities - interpretation of constituents with respect to their semantic features - choice between alternative syntactic hypotheses and between alternative interpretations of constituents - revelation of semantic anomalies due to recognition errors - representation of the case structure - inference of expectations on the rest of the sentence.  These problems are solved by three fundamental operations of the semantics module: local interpretation by unification of semantic features, contextual interpretation by case frame analysis, and top-down hypotheses.</Paragraph>
    <Section position="1" start_page="0" end_page="596" type="sub_section">
      <SectionTitle>
3.1 Local interpretation of constituents
</SectionTitle>
      <Paragraph position="0"> One of the main tasks of the module consists in mapping syntactic structures (hypotheses) to caseframe instances. As this mapping essentially relies on semantic features, the features of a phrase have to be determined first. On the one hand, this means resolution of lexical ambiguities, on the other hand, this process supports the choice between alternative word and structural hypotheses. The principle is to reduce lexical ambiguities by selectionaI features of the phrase heads that constrain dependent words and phrases. To determine the features of a phrase, all meaning alternatives of its constituents are unified and tested for compatibility. The test yields a rating that is the higher, the more constituents are compatible with the nucleus class. Of all possible feature combinations, the one with the highest consistency is chosen. The semantic consistency rating of a group can also be regarded as a measure for the plausibility of a syntactic hypothesis. As low semantic ratings may result from grouping wrong word hypotheses, a search for alternative word and constituent hypotheses may be reasonable in an area with bad semantic consistency.</Paragraph>
      <Paragraph position="1"> The combinatoric constraints of words are expressed in the dictionary by the feature SELECTION. The system of semantic classes (features) is organized in a conceptual hierarchy, thus, with a given class selected by the phrase head all its subclasses are accepted as compatible. The system presently used consists of about 110 semantic features and is represented as a concept hierarchy in the network formalism.</Paragraph>
    </Section>
    <Section position="2" start_page="596" end_page="596" type="sub_section">
      <SectionTitle>
3.2 Contextual interpretation
</SectionTitle>
      <Paragraph position="0"> When constituents are locally interpreted, they are matched to the caseframes of some verbal groups in order to decide which constituents fit together and to represent their functional relationships. Usually there are different verb frames for a verb corresponding to its alternative meanings. The assumption is that the frame for the intended meaning will be the one that can fill most of its case slots.</Paragraph>
      <Paragraph position="1"> The mapping of a semantically interpreted phrase structure to a easeframe is accomplished by three different matching functions. The syntax module produces syntactic structure hypotheses that are represented as network instances. Due to competing and erroneous word hypotheses and structural ambiguities there will be competing syntactic structures as well. Every syntactic hypothesis has a score to reflect its reliability and importance. Depending on whether a complete and spanning sentence hypothesis could be found, one of two matching functions is selected: Frame Sentence Match takes a good scoring sentence hypothesis, the immediate constituents of which have already been interpreted, and tries to match them to cases in the alternative frames of tile head verb. Matching criteria are the constituent type that is required for a certain case and the selectional restrictions imposed by the verb.</Paragraph>
      <Paragraph position="2"> The second version (Frame Constituents Match) has been implemented in order to cope with only partially recognized sentence structures, ie. with isolated constituents. It is expected that complete (and completely recognized) sentences more likely tend to be the exception in spoken dialogue, and that it is advantageous to envolve semantic interpretation as soon as possible. In this case, the frames of the best scoring verbal groups are matched to the best scoring constituent hypotheses.</Paragraph>
      <Paragraph position="3"> For every successful configuration of a frame and filling constituents a frame instance is constructed with case attributes filled by the fitting constituents.</Paragraph>
      <Paragraph position="4"> The matching process yields plausibility scores for the embedding of constituents into all alternative caseframes that may represent different meanings of the (assumed) head verb.</Paragraph>
      <Paragraph position="5"> The score is a function of different factors: the number of obligatory slots that could be filled, reliability scores from the other modules, consistency ratings of the constituents, fulfilment of selectional restrictions, the relative length of the time intervall (in the speech signal) not covered by the hypothesis.</Paragraph>
      <Paragraph position="6"> The valency structure providing only a minimum framework for a sentence, a third interpretation function is needed to evaluate the functional relations of additional modifiers not constrained by valency. It mainly rests on the semantic properties of the 'functional words', that is prepositions and conjunctions, and of adverbs. Their semantic classes (eg. CAUSE, DIRECTION, SINCE) characterize the relation of prepositional and adverbial groups and subordinate clauses to the main clause.</Paragraph>
    </Section>
    <Section position="3" start_page="596" end_page="597" type="sub_section">
      <SectionTitle>
3.3 Top-down analysis
</SectionTitle>
      <Paragraph position="0"> Motivation The analysis so far can only be successful if a verb was uttered by the user that was also recognized with a satisfying certainty by the word recognition module. This is a very hard restriction for the user (to avoid for example elliptical constructions without an explicit articulation of a just mentioned verb) as also for the word recognition of the system.</Paragraph>
      <Paragraph position="1"> The special problem with spoken natural language is that you will never have the really uttered string of word hypotheses which covers the whole speech signal and is furtherlnore syntactic correct. On the other hand it is likely that with all the generated word hypotheses there would be many possibilities of chaining some of them to such a string. So the system will neither find out if a word was uttered that isn't known to it nor that an ellipsis was uttered. That could be found only in written language, for example by cmnmunicating with the user by a terminal. But analyzing spoken utterances in a dialogue there would always be wrong alternatives to the unknown or missing word or the missing syntactic constituent.</Paragraph>
      <Paragraph position="2"> This fact implies that it isn't possible to restrict the user to a certain range of speech, for example to formulate only complete sentences containing at least a subject and a verb. Whether any of such given restricting rules are violated is ahnost impossible to discover.</Paragraph>
      <Paragraph position="3"> Besides this 'technical' point of view our system should 'behave' like a normal human commnnication partner, ie. it should be able to handle all formulations that are normally used in an information dialogue between two human partners.</Paragraph>
      <Paragraph position="4"> Example: UI: When does the next train leave for Itamburg? SI: (there leaves one) At 12:15 hours.</Paragraph>
      <Paragraph position="5"> U2: And (is there another one) a little bit later? $2: That is the last (train to Hamburg) for today.</Paragraph>
      <Paragraph position="6"> Such elliptical sentence structures (in which not only the verb is possibly missing but also a noun group such as in $2) prevent unnecessary redundancy and effect the conversation becoming more natural and fluent.</Paragraph>
      <Paragraph position="7"> Top-down Hypotheses of Verbs In addition to the former described Frame Constituent Match, a kind of bottom-up analysis, a method is developed to analyze a spoken utterance without beginning with the verb of the sentence. Also this method is based on the valency theory (see above). Here we try to conclude from a set of constituent hypotheses produced by the syntax module to a set of possible verbframes containing slots for some of the found constituents which should not be competing with regard to the speech signal.</Paragraph>
      <Paragraph position="8"> Therefore it was necessary to organize the database containing the verbframes in a way that the actants (represented as attributes of the concept verb in a semantic network) of the verb (the concept) could be attained not only by seeking the verb and its information, but also in a direct way without knowing the affiliated concept.</Paragraph>
      <Paragraph position="9"> In German constituents have four selective features that can be used to restrict the number of the possible candidates for an attribute: the type of the constituent (for example noun group or prepositional group) - semantic class which the constituent can be an instance of - if the constituent is a prepositional or adverbial group the preposition respectively the semantic class of the preposition of the group - the case of the noun of the constituent (if any noun is present).</Paragraph>
      <Paragraph position="10"> For generating top-down hypotheses of verbs the last feature will not be used, because in German the endings which determine the case of a noun are all similar and so are the inflected word-forms of one lexeme. It is supposed (and partly shown by experiments) that the recognition and distinction of such word-forms is not reliable enough to base the further analysis on it. It would better serve for the verification of so far found syntactic and semantic hypotheses.</Paragraph>
      <Paragraph position="11">  &amp;quot;ankommenl 1&amp;quot; corresponds to &amp;quot;arrive&amp;quot; in the meaning of &amp;quot;The train arrives at Hamburg.&amp;quot; &amp;quot;umsteigenl 1&amp;quot; corresponds to &amp;quot;change&amp;quot; in the meaning of &amp;quot;I changed the train in Hamburg.&amp;quot; The prepositional group (PNG) &amp;quot;in Hamburg&amp;quot; can be interpreted as the LOCATION attribute of &amp;quot;ankommenl 1&amp;quot; or of &amp;quot;umsteigen I 1&amp;quot;.</Paragraph>
      <Paragraph position="12"> Another problem with the lexicon is that it mustn't contain lexemes for many applications in order to reduce the possibilities of 'correct' verbframes, Although the semantics module in EVAR should be independent of a specific task domain it is not realistic to permit always all meanings of the whole lexicon for the semantic analysis. Therefore it is intended to use for the first step of analysis only a part of the lexicon which is locally determined by the pragmatic module and the dialogue module, dependent on the dialogue context and the expectations for the next dialogue step. Both modules together have the 'knowledge' about the world, as far as it is needed, the specific domain and the linguistic and situative context of the dialogue.</Paragraph>
      <Paragraph position="13"> For the so far accomplished experiments two different verb lexicons were used. They were generated in a heuristic way limitating the whole range of our domain independent lexicon to a more or less restricted task domain. This was done prior to the analysis because up to now the pragmatic module is not realized. One of these lexicons contains only verbs that are used in our application 'Intercity Train 'Information', Other Top-down Hypotheses There are other possibilities too to generate top-down hypotheses in the semantics module: - We try to reduce the number of the word hypotheses by first seeking semantically compatible word groups (they need not to be adjacent, but must not be competing). With this method the head verb and also descriptions for the syntactic realization of its attributes can be predicted.</Paragraph>
      <Paragraph position="14"> - Another type oPS top-down hypotheses could be generated by seeking missing ie. not yet instantiated attributes of a verbframe, eg. &amp;quot;The train leaves )'or Hamburg.&amp;quot; - Sometimes the meaning of a sentence does not bear on the head verb but on a noun in that sentence, for example &amp;quot;Is there a good connection from Munich to Hamburg tomorrow morning.&amp;quot; In such cases it regards a nounframe instead of a verbframe assuming that the head verb is performative like &amp;quot;ask&amp;quot;, &amp;quot;excuse&amp;quot; and &amp;quot;must&amp;quot; or could be combined with nearly every noun like &amp;quot;have&amp;quot;, &amp;quot;be&amp;quot; and &amp;quot;become&amp;quot;. - There is always the possibility to limitate the range of the speech signal for the top-down hypotheses: They only have to be sought where the so far found hypotheses are not. In addition information about word order in German sentences could often be used to restrict the possible range for a certain sentence part further.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="597" end_page="597" type="metho">
    <SectionTitle>
4. Outlook
</SectionTitle>
    <Paragraph position="0"> Experiments with the so far implemented semantics module indicate that without considering the dialogue context the semantic analysis will produce too many hypotheses. Therefore it will be necessary to take account of it with the further developments by making pragmatic predictions about the following user utterances.</Paragraph>
    <Paragraph position="1"> With 'knowledge of the world', a special user model which describes all assumptions about the user and his intentions, and a memory about the course of the dialogue it is possible to predict the semantic and syntactic structure of the next user utterance, and also the words which can appear in tiffs structure.</Paragraph>
    <Paragraph position="2"> This research was supported by the German Ministry of Research and Technology B/riFt (in part by the joint project speech understanding in cooperation with Siemens AG, Muenchen).</Paragraph>
  </Section>
class="xml-element"></Paper>