<?xml version="1.0" standalone="yes"?> <Paper uid="W91-0217"> <Title>PROPERTY NATURE: STRUCTURE: ORIGIN: STATE: TASTE: SMELL:</Title> <Section position="2" start_page="0" end_page="189" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> The paper describes the approach taken within the ESPRIT BRA project ACQUILEX towards: i) the acquisition of semantic information from several machine-readable dictionaries (in four languages), and ii) its representation in a common Lexical Knowledge Base. Knowledge extraction is guided by a) empirical observations and b) theoretical hypotheses. As for representation, we stress the convergence of a) and b) towards the possibility of organizing the information extracted from MRDs in the form of 'meaning types' or 'templates', where a common meta-language is used to encode conceptual and relational information. Examples taken from two Italian monolingual dictionaries and from LDOCE are given. Different uses of these templates (e.g. as guides in the semantic analysis of the definitions, as a structure for comparing, unifying, merging, integrating information coming from different sources and different languages, as a tool for correcting 'incoherences' in dictionaries, etc.) are described.</Paragraph> <Paragraph position="1"> Keywords: Computational Lexicography, Lexical Knowledge Base, Lexical Semantics. 1 Large computational lexicons and the notion of &quot;reusability&quot; Building large computational lexicons requires hundreds of thousands of words in order to process real texts and, moreover, requires semantic information to be made explicit for very large portions of the lexicon. To cope with this task, the notion of &quot;reusability&quot; has become central in the field of computational lexicography. 
This concept emerged at the Grosseto Workshop (1986) on &quot;Automating the Lexicon&quot;, sponsored by the EC (see Walker, Zampolli, Calzolari, forthcoming), where one of the recommendations was to design &quot;large reusable, multifunctional, precompetitive, multilingual linguistic resources&quot;.</Paragraph> <Paragraph position="2"> Reusable must be interpreted in two main senses: reusable_1: to exploit and reuse lexical information implicitly or explicitly present in preexisting lexical resources (MRDs, terminological DBs, textual corpora, etc.) as an aid to constructing large computational lexicons of the type reusable_2; reusable_2: to construct Computational Lexicons in such a way that various users (different NLP systems - in different theoretical frameworks and for different applications, but also human users such as lexicographers, linguists, common users) can extract - with appropriate interfaces - relevant lexical information.</Paragraph> <Paragraph position="3"> Current work on Computational Lexicons can be divided into two major types, each corresponding to one of the two meanings above: reusable_1 and reusable_2.</Paragraph> <Paragraph position="4"> ACQUILEX, an ESPRIT BRA project, can be seen as the prototype of the first of these main streams of research, linked with the notion 'reusable_1', while other projects, e.g. Eurotra-7, fall under the second sense of the 'reusability' concept. 2 MRDs as implicit Knowledge Bases ACQUILEX (see Boguraev et al. 1988) focuses its research effort on developing techniques and methodologies for utilising and interpreting existing machine-readable dictionaries (MRDs) to construct components for NLP systems. The main focus of the project is the extraction of lexical -- syntactic and semantic -- information from multiple machine-readable dictionaries in a multilingual context, with the overall goal of constructing a single multilingual lexical knowledge base (LKB). 
The dictionaries we are actually using in the project are: two monolingual English, two Italian, one Dutch, one Spanish, one bilingual Italian - English, one Dutch - English.</Paragraph> <Paragraph position="5"> The information extracted is not only the information which is already explicit in MRDs (word-lists, part-of-speech, etc.), but mainly the information which in MRDs is only implicitly present and not directly and immediately accessible (mostly semantic information, such as semantic taxonomies, other semantic relations, argument structures, etc.). In the final LKB prototype it will be possible to &quot;navigate&quot; within the lexicon with access also through concepts and semantic relations.</Paragraph> <Paragraph position="6"> This approach considers it possible to exploit procedurally the full range of semantic information implicitly contained in MRDs. The dictionary is therefore considered in this framework as a primary source of &quot;basic general knowledge&quot;, and the main objectives are word-sense acquisition and knowledge organization. The main sources of this information are natural language definitions. 
The reasons for their use can be found in the following aspects: i) the lexicographic tradition has exerted a (usually unconscious) control over the defining vocabulary (a control made fully explicit only in LDOCE) and the schemata of defining formulas; ii) the texts of definitions do not describe singular objects or events but &quot;typical&quot; ones; iii) lexicographers have translated the concepts in their mind into definitions, and we can try to move back along this path from definitions to concept acquisition; iv) the definitions incorporate a naive view of the semantic and world-knowledge information attached to lexical entries.</Paragraph> <Paragraph position="7"> The goal of ACQUILEX is the formalization of this basic general knowledge (which can also be considered as a prerequisite to domain-specific knowledge) in the form of concepts and semantic relations. The method is heuristic and mainly inductive, through progressive generalization from the common elements.</Paragraph> <Paragraph position="8"> The main themes of research connected to this goal of knowledge acquisition are the following: * the design of procedures for the extraction of superordinates from natural language definitions, for their disambiguation, and for the construction of taxonomies across the whole lexicon; * the design of procedures for the (linguistic and computational) analysis of natural language definitions with the aim of extracting all the implicit semantic information; * the study of ways of formally representing the semantic information which is extracted -- concepts, attributes, and relations between concepts -- e.g. 
in the form of 'typed feature structures'; * the study of how to link and unify taxonomies and conceptual or relational information coming from different sources, either monolingual or multilingual; * the design and implementation of basic software for the creation, access and processing of lexical databases and a lexical knowledge base.</Paragraph> <Paragraph position="9"> These research themes tackled within ACQUILEX are aimed at easing one of the major bottlenecks of natural language processing, i.e. the availability of &quot;large&quot; computational lexicons, with particular emphasis on also making semantic information explicit and accessible.</Paragraph> </Section></Paper>