<?xml version="1.0" standalone="yes"?> <Paper uid="W06-3505"> <Title>Scaling Natural Language Understanding via User-driven Ontology Learning Berenike Loos</Title> <Section position="3" start_page="33" end_page="34" type="relat"> <SectionTitle> 2 Related Work </SectionTitle> <Paragraph position="0"> The capability to acquire knowledge exactly when it is needed can be regarded as an important stepping stone towards scalable natural language understanding (NLU) systems. The necessity of scalability in NLU became increasingly obvious with open-domain dialog systems, as the knowledge base integrated into such systems can never be complete. Before the emergence of open-domain systems, more or less complete ontologies were modeled manually for the domain required by the NLU system; they were therefore not scalable to additional domains unless these were modeled in advance, either manually or by means of off-line ontology learning. Nonetheless, numerous off-line ontology learning frameworks exist that alleviate the manual knowledge-construction work of an ontology engineer (Maedche, 2002; Schutz and Buitelaar, 2005; Cimiano et al., 2005).</Paragraph> <Paragraph position="1"> Most of these frameworks apply hybrid methods to optimize their learning results.</Paragraph> <Paragraph position="2"> For example, the ontology population method OntoLearn (Navigli et al., 2004) is based on text mining and other machine learning techniques and starts with a generic ontology such as WordNet and a set of documents from a given domain. The result is a domain-extended and trimmed version of the initial ontology. 
To learn concepts, the system applies three phases: * First, a terminology extraction method, using shallow techniques ranging from stochastic methods to more sophisticated syntactic approaches, extracts a list of domain terms (mostly nouns and proper nouns) from a set of documents representative of a given domain.</Paragraph> <Paragraph position="3"> * Second, a semantic interpretation takes place, which makes use of compositional interpretation and structural semantic interconnections. * After these two phases, the initial ontology is extended and trimmed. With the help of the semantic interpretation of the terms, they can be organized into sub-trees and appended under the appropriate node of the initial ontology by applying linguistic rules.</Paragraph> <Paragraph position="4"> The text understanding system SYNDICATE (SYNthesis of DIstributed Knowledge Acquired from Texts) uses an integrated ontology learning module (Hahn and Marko, 2002). In this approach, new concepts are learned in the course of text understanding, which draws on two different sources of evidence, namely the prior knowledge of the texts' topic domain and the grammatical constructions in which unknown lexical items occur in the texts.</Paragraph> <Paragraph position="5"> In an incremental process, a given ontology is updated as new concepts are acquired from real-world texts. The acquisition process is centered on the linguistic and conceptual &quot;quality&quot; of the various forms of evidence underlying the generation and refinement of concept hypotheses. 
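A minimal sketch of such evidence-driven concept acquisition might look as follows (a hypothetical illustration only; the class names, evidence weights, and threshold are assumptions, not taken from SYNDICATE itself):

```python
# Hypothetical sketch: rank concept hypotheses by the quality of their
# supporting evidence and assimilate only the most credible ones.
from dataclasses import dataclass, field

@dataclass
class ConceptHypothesis:
    concept: str
    # each piece of evidence carries an assumed quality weight
    evidence: list = field(default_factory=list)  # list of (label, weight)

    def credibility(self) -> float:
        # credibility here is simply the summed evidence weight
        return sum(w for _, w in self.evidence)

def assimilate(hypotheses, knowledge_base, threshold=1.0):
    """Add only sufficiently credible hypotheses to the knowledge base."""
    ranked = sorted(hypotheses, key=lambda h: h.credibility(), reverse=True)
    for h in ranked:
        if h.credibility() >= threshold:
            knowledge_base.add(h.concept)
    return knowledge_base

kb = set()
hyps = [
    ConceptHypothesis("espresso_bar", [("apposition", 0.8), ("case_frame", 0.5)]),
    ConceptHypothesis("espresso_verb", [("weak_pattern", 0.2)]),
]
assimilate(hyps, kb)
print(kb)  # only the well-supported hypothesis is assimilated
```

The point of the sketch is the control flow, not the scoring: any quality metric over linguistic and conceptual evidence could replace the summed weights.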
On the basis of this quality of evidence, concept hypotheses are ranked according to their credibility, and the most credible ones are selected for assimilation into the domain knowledge base.</Paragraph> <Paragraph position="6"> The project Disciple (Stanescu et al., 2003) builds agents that can be trained initially by a subject matter expert and a knowledge engineer, much as an expert would teach an apprentice. A Disciple agent applies two different methods for ontology learning, i.e., exception-based and example-based ontology learning. The exception-based learning approach consists of four main phases: * First, candidate discovery takes place, in which the agent analyzes a rule together with its examples, its exceptions and the ontology, and finds the most plausible extensions of the ontology that may reduce or eliminate the rule's exceptions.</Paragraph> <Paragraph position="7"> * In the second phase, the expert interacts with the agent to select one of the proposed candidates.</Paragraph> <Paragraph position="8"> * Afterwards, the agent elicits the ontology extension knowledge from the expert, and finally rule refinement takes place, in which the agent updates the rule and eliminates its exceptions on the basis of the performed ontology extension.</Paragraph> <Paragraph position="9"> * When the subject matter expert has to specify a fact involving a new instance or a new feature during the agent teaching process, the example-based learning method is invoked. In this process, the agent tries, by means of various heuristics, to find example sentences containing the words adjacent to a new term. For instance, it may discover that X is a member of Y and consequently ask the expert; if the expert affirms this, the new term can be memorized. All of the approaches described above exhibit theoretical as well as practical (in the light of the task undertaken herein) shortcomings. 
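Before turning to these shortcomings, the example-based confirmation loop of Disciple can be sketched in miniature (a hypothetical illustration; the "X is a member of Y" pattern, the function names and the expert interface are assumptions, not part of Disciple):

```python
import re

# Hypothetical sketch of example-based learning: a surface pattern
# proposes a member-of relation for a new term, and the relation is
# memorized only after the subject matter expert confirms it.
MEMBER_PATTERN = re.compile(r"(\w+) is a member of (\w+)")

def learn_from_sentence(sentence, ontology, ask_expert):
    """Propose member-of facts from text; keep only expert-confirmed ones."""
    for x, y in MEMBER_PATTERN.findall(sentence):
        if ask_expert(f"Is {x} a member of {y}?"):
            ontology.setdefault(y, set()).add(x)
    return ontology

onto = {}
learn_from_sentence("Heidelberg is a member of Germany", onto,
                    ask_expert=lambda q: True)   # expert confirms
learn_from_sentence("foo is a member of bar", onto,
                    ask_expert=lambda q: False)  # expert rejects
print(onto)  # {'Germany': {'Heidelberg'}}
```

In an interactive setting, `ask_expert` would be replaced by an actual dialog with the subject matter expert rather than a fixed callback.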
The theoretical problems that have not been resolved satisfactorily by the works described above (as well as numerous others) are: * a clear separation of the linguistic and ontological subtasks involved in the overall ontology learning endeavor; * systematic ways and methods for evaluating the individual learning results; * rigorously defined baselines against which to evaluate the ensuing learning approaches.</Paragraph> <Paragraph position="10"> In the following I will describe how these issues can be addressed within the user-driven ontology learning framework proposed herein.</Paragraph> </Section> </Paper>