<?xml version="1.0" standalone="yes"?> <Paper uid="P97-1021"> <Title>A DOP Model for Semantic Interpretation*</Title> <Section position="7" start_page="164" end_page="165" type="concl"> <SectionTitle> 5 Experiments on the OVIS tree-bank </SectionTitle> <Paragraph position="0"> The NWO Priority Programme &quot;Language and Speech Technology&quot; is a five-year research programme aiming at the development of advanced telephone-based information systems. Within this programme, the OVIS tree-bank is being created. Using a pilot version of the OVIS system, a large number of human-machine dialogs were collected and transcribed. Currently, 10,000 user utterances have received a full syntactic and semantic analysis. Regrettably, the tree-bank is not yet available to the public. More information on the tree-bank can be found at http://grid.let.rug.nl:4321/. The semantic domain of all dialogs is the Dutch railways schedule. The user utterances are mostly answers to questions such as: &quot;From where to where do you want to travel?&quot;, &quot;At what time do you want to arrive in Amsterdam?&quot;, &quot;Could you please repeat your destination?&quot;. The annotation method is robust and flexible, as we are dealing with real, spoken data containing many clearly ungrammatical utterances. For the annotation task, SEMTAGS is used: an annotation interface, written by Bonnema, offering all functionality needed for examining, evaluating, and editing syntactic and semantic analyses. SEMTAGS is mainly used for correcting the output of the DOP-parser.</Paragraph> <Paragraph position="1"> It incrementally builds a probabilistic model of corrected annotations, allowing it to quickly suggest alternative semantic analyses to the annotator. It took approximately 600 hours to annotate these 10,000 utterances (supervision included).</Paragraph> <Paragraph position="2"> Syntactic annotation of the tree-bank is conventional. 
There are 40 different syntactic categories in the OVIS tree-bank, which appear to cover the syntactic domain quite well. No grammar is used to determine the correct annotation; there is a small set of guidelines that has the degree of detail necessary to avoid an &quot;anything goes&quot; attitude in the annotator, but leaves room for his/her perception of the structure of an utterance. There is no conceptual division in the tree-bank between POS-tags and nonterminal categories.</Paragraph> <Paragraph position="3"> Figure 9 shows an example tree from the tree-bank. It is an analysis of the Dutch sentence: &quot;Ik (I) wil (want) niet (not) vandaag (today) maar (but) morgen (tomorrow) naar (to) Almere Buiten (Almere Buiten)&quot;. The analysis uses the formula schemata discussed in section 3.2, but here the interpretations of daughter-nodes are so-called &quot;update&quot; expressions, conforming to a frame structure, that are combined into an update of an information state. The complete interpretation of this utterance is: user.wants.(([#today];[!tomorrow]);destination.place.(town.almere;suffix.buiten)). The semantic formalism employed in the tree-bank is the topic of the next section.</Paragraph> <Section position="1" start_page="164" end_page="165" type="sub_section"> <SectionTitle> 5.1 The Semantic formalism </SectionTitle> <Paragraph position="0"> The semantic formalism used in the OVIS tree-bank is a frame semantics, defined in Veldhuijzen van Zanten (1996). In this section, we give a very short impression. The well-formedness and validity of an expression are decided on the basis of a type-lattice, called a frame structure. The interpretation of an utterance is an update of an information state. An information state is a representation of objects and the relations between them that complies with the frame structure. For OVIS, the various objects are related to concepts in the train travel domain. 
In updating an information state, the notion of a slot-value assignment is used. Every object can be a slot or a value. The slot-value assignments are defined in a way that corresponds closely to the linguistic notion of a ground-focus structure.</Paragraph> <Paragraph position="1"> The slot is part of the common ground; the value is new information. Added to the semantic formalism are pragmatic operators, corresponding to denial, confirmation, correction and assertion, that indicate the relation between the value in their scope and the information state.</Paragraph> <Paragraph position="2"> An update expression is a set of paths through the frame structure, enhanced with pragmatic operators that have scope over a certain part of a path. For the semantic DOP model, the semantic type of an expression φ is a pair of types (t1, t2). Given the type-lattice T of the frame structure, t1 is the lowest upper bound in T of the paths in φ, and t2 is the greatest lower bound in T of the paths in φ.</Paragraph> </Section> <Section position="2" start_page="165" end_page="165" type="sub_section"> <SectionTitle> 5.2 Experimental results </SectionTitle> <Paragraph position="0"> We performed a number of experiments, using a random division of the tree-bank data into test and training sets. No provisions were made for unknown words. The results reported here were obtained by randomly selecting 300 trees from the tree-bank. All utterances of length greater than one in this selection were used as testing material. We varied the size of the training-set and the maximal depth of the subtrees. The average length of the test sentences was 4.74 words. There was a constraint on the extraction of subtrees from the training-set trees: subtrees could have a maximum of two substitution-sites, and no more than three contiguous lexical nodes (experience has shown that such limitations improve probability estimations, while retaining the full power of DOP). 
Figures 10 and 11 show results using a training-set size of 8500 trees. The maximal depth of subtrees involved in the parsing process was varied from 1 to 5. Results in figure 11 concern a match with the total analysis in the test-set, whereas figure 10 shows success on just the resulting interpretation.</Paragraph> <Paragraph position="1"> Only exact matches with the trees and interpretations in the test-set were counted as successes. The experiments show that involving larger fragments in the parsing process generally leads to higher accuracy. For this domain, however, fragments of depth 5 appear to be too large and degrade the probability estimations. The results also confirm our earlier findings that semantic parsing is robust. Quite a few analysis trees that did not exactly match their counterparts in the test-set yielded a semantic interpretation that did match. Finally, figures 12 and 13 show results for differing training-set sizes, using subtrees of maximal depth 4.</Paragraph> </Section> </Section> </Paper>