<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2808"> <Title>Making Relative Sense: From Word-graphs to Semantic Frames</Title> <Section position="8" start_page="0" end_page="0" type="concl"> <SectionTitle> 7 Analysis and Concluding Remarks </SectionTitle> <Paragraph position="0"> In the cases of hypothesis and semantic disambiguation, the knowledge-driven system scores significantly above the baseline (22.85 and 11.28, respectively), as shown in Table 1.</Paragraph> <Paragraph position="1"> In the case of tagging the semantic relations, a baseline computation has (so far) been thwarted by the difficulty of deriving the set of markable-specific tagsets from the ontological model and the attribute-specific values found in the data. Nevertheless, the performance may be seen as especially encouraging in comparison to the case of sense disambiguation. Such comparisons might well be misleading, however, as the evaluation criteria define different views of the data. Most notably, this holds when examining the concept sets of the best SRHs as potentially existing disambiguated representations. While utterances for which these concept sets constitute the correct set can easily be imagined, such underlying potential utterances did not occur in the data set examined in the sense disambiguation evaluations.</Paragraph> <Paragraph position="2"> A more general consideration stems from the fact that both the knowledge store used and the coherence scoring method have been shown to perform quite robustly across a variety of tasks. Some of these tasks - not discussed herein - are executed by different processing components that employ the same underlying knowledge model but apply different operations, such as overlay, and have been reported elsewhere (Alexandersson and Becker, 2001; Gurevych et al., 2003b; Porzel et al., 2003b). 
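The coherence scoring referenced here can be illustrated with a minimal sketch, under the assumption (ours, not the paper's stated implementation) that the coherence of a concept set is approximated by the average pairwise path distance between its members in the ontological model; the toy ontology and all concept names below are purely illustrative:

```python
from itertools import combinations

# Toy ontology as an undirected graph: concept -> neighbouring concepts.
# All concepts and links here are hypothetical examples.
ONTOLOGY = {
    "Movie": ["Entertainment", "Broadcast"],
    "Broadcast": ["Movie", "Channel"],
    "Channel": ["Broadcast", "Device"],
    "Device": ["Channel", "Route"],
    "Route": ["Device", "Town"],
    "Town": ["Route"],
    "Entertainment": ["Movie"],
}

def path_length(a, b):
    """Breadth-first-search distance between two concepts in the ontology."""
    if a == b:
        return 0
    seen, frontier, dist = {a}, [a], 0
    while frontier:
        dist += 1
        nxt = []
        for node in frontier:
            for nb in ONTOLOGY.get(node, []):
                if nb == b:
                    return dist
                if nb not in seen:
                    seen.add(nb)
                    nxt.append(nb)
        frontier = nxt
    return float("inf")  # concepts not connected in the model

def coherence_score(concepts):
    """Average pairwise distance: lower means a more coherent concept set."""
    pairs = list(combinations(concepts, 2))
    if not pairs:
        return 0.0
    return sum(path_length(a, b) for a, b in pairs) / len(pairs)

# A semantically close concept set scores lower (better) than a scattered one.
print(coherence_score(["Movie", "Broadcast", "Channel"]))
print(coherence_score(["Movie", "Town"]))
```

A distance-based score of this kind also makes the bootstrapping idea concrete: concept sets that score as incoherent despite coming from correct utterances point to missing links in the ontological model.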
In this light, such evaluations could be used to single out an evaluation method for finding gaps and inconsistencies in the ontological model. While such a bootstrapping approach to ontology building could help avoid scaling-related decreases in the performance of knowledge-based approaches, our concern in this evaluation was also to enable additional examinations of the specific nature of the inaccuracies, by looking at the interdependencies between relation tagging and sense disambiguation.</Paragraph> <Paragraph position="3"> Several specific questions remain to be answered on a more methodological level as well. These concern ways of measuring task-specific perplexities or comparable baseline metrics to evaluate the specific contribution of the system described herein (or others) to the task of making sense of ASR output. Additionally, methods need to be found for arriving at aggregate measures of the difficulty of the combined task of sense disambiguation and relation tagging, and for evaluating the corresponding system performance. In future work we will seek to remedy this situation in order to arrive at two general measurements:
- a way of assessing the increases in natural language understanding difficulty that result from scaling NLU systems towards more conversational and multi-domain settings;
- a way of evaluating how well individual processing components cope with the effects of scaling on the aggregate challenge of finding suitable representations of spontaneous natural language utterances.</Paragraph> <Paragraph position="4"> In the light of scalability it is also important to point out that scaling such knowledge-based approaches comes with an associated cost in knowledge engineering, which is still by and large a manual process. 
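The task-specific perplexity mentioned above can be computed, under the standard information-theoretic definition, as 2 raised to the entropy of a task's tag distribution; a minimal sketch with illustrative counts (not drawn from the evaluation data):

```python
import math

def perplexity(tag_counts):
    """2**H for the empirical tag distribution, H = Shannon entropy in bits."""
    total = sum(tag_counts.values())
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in tag_counts.values() if c > 0)
    return 2 ** entropy

# A uniform four-way sense choice has perplexity 4 ...
print(perplexity({"s1": 10, "s2": 10, "s3": 10, "s4": 10}))  # -> 4.0
# ... while a skewed distribution makes the task effectively easier.
print(perplexity({"s1": 70, "s2": 10, "s3": 10, "s4": 10}))  # < 4.0
```

An aggregate measure for the combined task could then, for instance, multiply the per-task perplexities of sense disambiguation and relation tagging, though how best to combine them is exactly the open question raised above.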
Therefore, we see approaches that attempt to remove (or at least widen) the knowledge acquisition bottleneck as valuable complements to our approach; they may be especially relevant for designing a bootstrapping approach that involves automatic learning and evaluation cycles to create scalable knowledge sources and scalable approaches to natural language understanding.</Paragraph> </Section> </Paper>