<?xml version="1.0" standalone="yes"?> <Paper uid="W06-3505"> <Title>Scaling Natural Language Understanding via User-driven Ontology Learning Berenike Loos</Title> <Section position="4" start_page="34" end_page="35" type="metho"> <SectionTitle> 3 Natural Language versus Ontology </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="34" end_page="35" type="sub_section"> <SectionTitle> Learning </SectionTitle> <Paragraph position="0"> Before describing the actual ontology learning process, it is important to distinguish clearly between the two fields involved: natural language on the one hand and ontology learning on the other.</Paragraph> <Paragraph position="1"> The corpora from which knowledge is extracted should come from the internet, as this source provides the most up-to-date information. Natural language texts are rich in terms, which can be used as labels of concepts in the ontology, and rich in semantic relations, which can be used as ontological relations (also known as properties).</Paragraph> <Paragraph position="2"> Because the two areas work on similar topics but use different terminology, connecting them requires a distinction between the extraction of semantic information from natural language and the final process of integrating this knowledge into an ontology.</Paragraph> <Paragraph position="3"> from natural language text. On the left side relevant natural language terms are extracted. During a transformation process they are converted into labels of concepts and relations of an ontology. 
</Paragraph> </Section> </Section> <Section position="5" start_page="35" end_page="35" type="metho"> <SectionTitle> 4 Scaling NLU via User-driven Ontology </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="35" end_page="35" type="sub_section"> <SectionTitle> Learning </SectionTitle> <Paragraph position="0"> A user-driven ontology learning framework should be able to acquire knowledge at the run time of the NLU system. Therefore, terms which are not understood by the system have to be identified. In dialog systems these are all terms uttered or written by a user which are neither contained in the lexicon nor derivable by means of derivational or inflectional morphology. In the following I will refer to these terms as unknown terms2.</Paragraph> <Paragraph position="1"> When a user of an open-domain spoken dialog system makes an utterance, it regularly happens that one of its terms is not represented in the system's lexicon.</Paragraph> <Paragraph position="2"> Since it is assumed in this work that the meaning of terms is represented by means of a formal ontology, a user-driven ontology learning framework is needed to determine the corresponding concepts for these terms, e.g., via a search on topical corpora.</Paragraph> <Paragraph position="3"> For instance, a term such as Auerstein could be employed to query a search engine. By applying natural language patterns, as proposed by Hearst (1992), and statistical methods, as proposed by Faulhaber et al.</Paragraph> <Paragraph position="4"> (2006), possible hypernyms or sets of hypernym candidates of the term can be extracted. For these, a corresponding concept (or set of possible concepts) in the ontology employed by the dialog system needs to be found. Finally, the unknown term has to be inserted into the ontology as either an instance or a subclass of that concept. 
This process is described in greater detail in Section 5.4.</Paragraph> <Paragraph position="5"> It is important to point out that terms often have more than one meaning, and the intended one can only be determined by recourse to the context in which the term is uttered or found (Widdows, 2003), (Porzel et al., 2006).</Paragraph> <Paragraph position="6"> Therefore, information about this context needs to be added in order to make searching for the right hypernym feasible3 as shown in Section 5.3. For example, the term Lotus can refer to a flower, a specific type of car or, among many other real-world entities, a restaurant in Heidelberg. Therefore, a scal-</Paragraph> </Section> </Section> <Section position="6" start_page="35" end_page="38" type="metho"> <SectionTitle> 5 On-demand learning </SectionTitle> <Paragraph position="0"> From the cognitive point of view, learning only makes sense when it happens on demand. On demand means that it occurs on purpose and that activity is involved rather than passivity. As pointed out by Spitzer (2002), activity is necessary for learning in human beings. We cannot learn by drumming data into our brain by listening to cassettes while sleeping or by similar fruitless techniques. The reason for this is that we need active ways of structuring the data input into our brain. Furthermore, we try to learn only what we need to learn and are therefore quite economical with the &quot;storage space&quot; in our brain.</Paragraph> <Paragraph position="1"> It makes sense, and not only for humans, to learn simply what is needed and useful. Therefore, I propose that ontology learning, like any other learning, is only useful and, in the end, possible if it is situated and motivated by the given context and the user needs. 
This can entail learning missing concepts relevant to a domain or learning new concepts and instances which become necessary due to changes in a domain.</Paragraph> <Paragraph position="2"> However, the fundamental ontological commitments should be adhered to. So, for example, the decision between a revisionary and a descriptive ontology should be kept in the hands of the knowledge engineer, as well as the choice between a multiplicative and a reductionist modeling4. As soon as the basic structure is given, new knowledge can be integrated into this structure. Thus, for a reductionist ontology a concept such as Hotel should be appended only once, e.g. to an ontological concept such as PhysicalObject rather than NonPhysicalObject.</Paragraph> <Paragraph position="3"> In the following I will describe the various steps and components involved in on-demand ontology learning.</Paragraph> <Section position="1" start_page="36" end_page="36" type="sub_section"> <SectionTitle> 5.1 Unknown terms in dialog systems </SectionTitle> <Paragraph position="0"> If the dialog system works with spoken language, one can use the out-of-vocabulary (OOV) classification of the speech recognizer for all terms not found in the lexicon (Klakow et al., 2004). A solution for phoneme-based recognition is the establishment of corresponding best-rated grapheme-chain hypotheses (Gallwitz, 2002).</Paragraph> <Paragraph position="1"> Those can be used for a search on the internet. If the dialog system only works with written language, it is easier to identify terms which cannot be mapped to ontological concepts, at least if they are spelled correctly. To evaluate the framework itself adequately, it is useful to apply only correctly written terms for a search.</Paragraph> <Paragraph position="2"> Later on, in both cases - i.e. in spoken and written dialog systems - a ranking algorithm over the best, say three, hypotheses should be applied to find the most adequate term. 
Here methods like Google's &quot;Did you mean...&quot; for spelling errors could be used. 4 More information on these and other ontological choices is summarized in (Cimiano et al., 2004).</Paragraph> </Section> <Section position="2" start_page="36" end_page="36" type="sub_section"> <SectionTitle> 5.2 Language Understanding </SectionTitle> <Paragraph position="0"> All correctly recognized terms of the user utterance can be mapped to concepts with the help of an analysis component. Frequently, production systems (Engel, 2002), semantic chunkers (Bryant, 2004) or simple word-to-concept lexica (Gurevych et al., 2003) are employed for this task. Such lexica assign corresponding natural language terms to all concepts of an ontology. This is especially important for a later semantic disambiguation of the unknown term (Loos and Porzel, 2004). The concepts of the other terms of the utterance can help to evaluate results: when there is more than one concept proposal for an instance (i.e. on the linguistic side a proper noun like Auerstein) found in the word-to-concept lexicon, the semantic distance between each proposed concept and the other concepts of the user's question can be calculated5.</Paragraph> </Section> <Section position="3" start_page="36" end_page="38" type="sub_section"> <SectionTitle> 5.3 Linguistic and Extra-linguistic Context </SectionTitle> <Paragraph position="0"> Not only linguistic but also extra-linguistic context plays an important role in dialog systems. Thus, to understand the user in an open-domain dialog system it is important to know the extra-linguistic context of the utterances. If there is a context module or component in the system, it can give information on the discourse domain, time and location of the user. This information can be used to support a search on the internet. E.g. 
the location of the user is helpful when searching for, say, Auerstein, as in the context of the city of Heidelberg the term has a different meaning than in the context of another city (Bunt, 2000), (Porzel et al., 2006).</Paragraph> <Paragraph position="1"> Part of the context information can be represented by the ontology as well as patterns for grouping a number of objects, processes and parameters for one distinctive context (Loos and Porzel, 2005).</Paragraph> <Paragraph position="2"> 5.4 Finding the appropriate hypernym on the internet For this, the unknown term and, if available, an appropriate context term need to be used to search for possible hypernyms on the Web. As mentioned before, an example could be the unknown term Auerstein and the context term Heidelberg.</Paragraph> <Paragraph position="3"> 5 E.g. with the single-source shortest path algorithm of Dijkstra (Cormen et al., 2001).</Paragraph> <Paragraph position="4"> For searching the internet, different encyclopedias and search engines can be used and the corresponding results can be compared. After a distinction between different types of unknown terms, the search methods are described.</Paragraph> <Paragraph position="5"> Global versus local unknown terms: In the case of generally familiar proper nouns like stars, hotel chains or movies (so-called global unknown terms), a search on a topical encyclopedia can be quite successful. In the case of proper nouns common only in a certain region of a country, such as Auerstein (Restaurant), Bierbrezel (Pub) and Lux (Cinema), which are local unknown terms, a search in an encyclopedia is generally not fruitful. Therefore, one can search with the help of a search engine.</Paragraph> <Paragraph position="6"> As one cannot know the kind of unknown term beforehand, the encyclopedia search should be executed before the one using the search engine. If no results are produced, the latter will deliver them (hopefully). 
In case results are retrieved by the former, the latter can still be used to test those.</Paragraph> <Paragraph position="7"> Encyclopedia Search: The structure of encyclopedia entries is generally predefined. That means a program can know beforehand where to find the most suitable information. For finding hypernyms, the first sentence in the encyclopedia description is often found to be the most useful. To give an example from Wikipedia6, here is the first sentence for the search entry Michael Ballack: (1) Michael Ballack (born September 26, 1976 in Görlitz, then East Germany) IS A German football player.</Paragraph> <Paragraph position="8"> With the help of lexico-syntactic patterns, the hypernym can be extracted. These so-called Hearst patterns (Hearst, 1992) can be expected to occur frequently in lexicons for describing a term. In example 1 the pattern X is a Y would be matched and the hypernym football player of the term Michael Ballack could be extracted.</Paragraph> <Paragraph position="9"> 6 Wikipedia is a free encyclopedia, which is editable on the internet: http://www.wikipedia.org (last access: 26th January 2006).</Paragraph> <Paragraph position="10"> Title Search: Searching only in the titles of web pages might have the advantage that results can be generated relatively fast. This is important as real-time performance is an important usability factor in dialog systems. Even when the titles contain the hypernym, they can be expected not to consist of full sentences; Hearst patterns (Hearst, 1992) are, therefore, unlikely to be found. Alternatively, only the nouns in the title could be extracted and their occurrences counted. 
The noun most frequently found in all the titles could then be regarded as the most semantically connected term.</Paragraph> <Paragraph position="11"> To aid such frequency-based approaches, stemming and clustering algorithms can be applied to group similar terms.</Paragraph> <Paragraph position="12"> Page Search: For a page search, Hearst patterns as in the encyclopedia search can almost certainly be applied. In contrast to encyclopedia entries the recall of those patterns is not so high in the texts from the The text surrounding the unknown term is searched for nouns. As in the title search, the occurrences of nouns can then be counted. With the help of machine learning algorithms, text mining can be done to improve the results.</Paragraph> <Paragraph position="13"> 5.5 Mapping text to knowledge by term narrowing and widening As soon as an appropriate hypernym is found in a text, the corresponding concept name should be determined. For term narrowing, the term has to be stemmed to its most general form. For term widening, this form is used to find synonyms. Those are, in turn, used for searching ontological concept names in the ontology integration phase. 
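The narrowing and widening steps just described could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the framework's implementation: the plural-stripping rule stands in for a real stemmer, and the synonym table, the concept names and all function names are hypothetical.

```python
# Hypothetical resources: a tiny synonym table and a few ontology concept names.
SYNONYMS = {"restaurant": ["eatery", "diner"], "cinema": ["movie theater"]}
CONCEPT_NAMES = ["Restaurant", "Hotel", "MovieTheater"]


def narrow(term):
    """Reduce a term to a general form with a crude plural-stripping rule (an assumption)."""
    term = term.lower().strip()
    if term.endswith("ies"):
        return term[:-3] + "y"
    if term.endswith("s") and not term.endswith("ss"):
        return term[:-1]
    return term


def widen(stem):
    """Return the stem together with its known synonyms."""
    return [stem] + SYNONYMS.get(stem, [])


def find_concept(hypernym):
    """Return the first concept name matching the stem or one of its synonyms, else None."""
    by_key = {name.lower(): name for name in CONCEPT_NAMES}
    for candidate in widen(narrow(hypernym)):
        key = candidate.replace(" ", "")  # "movie theater" becomes "movietheater"
        if key in by_key:
            return by_key[key]
    return None
```

With these toy tables, a hypernym such as cinemas is narrowed to cinema and widened to movie theater, which then matches the hypothetical concept name MovieTheater. A real system would replace the rule and the table with a proper stemmer and a lexical resource.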
If the hypernym found is in a language other than the one used for the ontology, a translation of the terms has to take place as well.</Paragraph> </Section> <Section position="4" start_page="38" end_page="38" type="sub_section"> <SectionTitle> 5.6 Integration into an ontology </SectionTitle> <Paragraph position="0"> After the mapping phase, newly learned concepts, instances or relations can be integrated into any domain-independent or even foundational ontology.</Paragraph> <Paragraph position="1"> If no corresponding concept can be found, the next more general concept has to be determined by the techniques described above.</Paragraph> </Section> <Section position="5" start_page="38" end_page="38" type="sub_section"> <SectionTitle> 5.7 Evaluation </SectionTitle> <Paragraph position="0"> An evaluation of such a system can be divided into two types: one for the performance of the algorithms before the deployment of the system and one which can be performed by a user during the run time of the system.</Paragraph> <Paragraph position="1"> Methodological evaluation Before integrating the framework into a dialog system or any other NLU system, an evaluation of the methods and their results should take place. Therefore, a representative baseline has to be established and a gold standard (Grefenstette, 1994) created, depending on the task targeted by the evaluation. The ensuing steps in this type of evaluation are shown in Figure stances and relations into the system's ontology. Depending on the three steps, the most adequate baseline method or algorithm for each of them has to be identified. In step 1, for the extraction of hypernyms, neither a chance baseline nor a majority class baseline will do the job, because their performance would be too poor. Therefore, a well-established algorithm which, for example, applies a set of standard Hearst patterns (Hearst, 1992) would constitute a potential candidate. 
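A pattern-based baseline of this kind might look like the following sketch. It assumes a single X is a Y pattern written as a regular expression; the expression itself, the optional parenthetical after the term and the function name are illustrative choices, not taken from Hearst (1992) or from the framework.

```python
import re

# Illustrative "X is a Y" pattern: the term, an optional parenthetical,
# a copula, an article, and the hypernym phrase up to the next punctuation mark.
IS_A = re.compile(
    r"^(.+?)\s*(?:\([^)]*\))?\s+(?:is|was)\s+(?:an|a|the)\s+(.+?)[.,;]",
    re.IGNORECASE,
)


def extract_hypernym(sentence):
    """Return (term, hypernym phrase) for an 'X is a Y' sentence, or None if no match."""
    match = IS_A.search(sentence)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()
```

Applied to the Michael Ballack sentence from the encyclopedia example, this sketch returns the term Michael Ballack together with the whole phrase German football player. Reducing that phrase to the head noun football player would need further linguistic processing, such as part-of-speech tagging, which is left out here.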
For the mapping from text to knowledge (see step 2) the baseline could be established by standard stemming combined with string similarity metrics. In case of different source and goal languages, an additional machine translation step would also become necessary. For the baseline of ontology evaluation, a task-based framework as proposed by (Porzel and Malaka, 2005) could be employed.</Paragraph> <Paragraph position="2"> Evaluation by the user As soon as the framework is integrated into a dialog system, the only way to evaluate it is by enabling the user to browse the ontological additions at his or her leisure and to decide whether terms have been understood correctly or not. In case two or more hypernyms are scored with the same - or quite similar - weights, this approach could also be quite helpful. An obvious reason for this circumstance is that the term in question has more than one meaning in the same context. Here, only a further inquiry to the user can help to disambiguate the unknown term. In the Auerstein example a question like &quot;Did you mean the hotel or the restaurant?&quot; could be posed. Even though the system would show the user that it did not perfectly understand him/her, the user might be more cooperative and less annoyed than with a question like &quot;What did you mean?&quot;. The former question could also be posed by a person familiar with the place to disambiguate the question of someone in search of Auerstein; it would therefore mirror human-human dialogs, which in turn would lead to more natural human-computer dialogs.</Paragraph> </Section> </Section> </Paper>