<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1905"> <Title>RDF Instantiation of ISLE/MILE Lexical Entries</Title> <Section position="3" start_page="3" end_page="3" type="metho"> <SectionTitle> 2 The MILE Lexical Model </SectionTitle> <Paragraph position="0"> The MILE Lexical Model (MLM) consists of two primary components: a mono-lingual component and a multi-lingual component. The mono-lingual component comprises three layers: morphological, syntactic, and semantic. The overall architecture is shown in Figure 1.</Paragraph> <Paragraph position="1"> It should be noted that this architecture is analogous to other data models, including ER diagrams and various knowledge representation schemes.</Paragraph> <Paragraph position="2"> We have in fact produced a version of the prototype ISLE lexical entry in an XML format instantiating the proposed ISO pivot format (Ide and Romary, Vassar/LORIA internal document).</Paragraph> <Paragraph position="3"> Within each of the MLM layers, two types of objects are defined: 1. MILE Lexical Classes (MLC): the main building blocks of lexical entries. They formalize the basic lexical notions for each layer defined in the ISLE project (Calzolari et al. 2003). The MLM defines each class by specifying its attributes and the relations among them. Classes represent notions such as syntactic feature, syntactic phrase, predicate, semantic relation, synset, etc. Instances of MLCs are the MILE Data Categories (MDC). For instance, NP and VP are data category instances of the class &lt;Phrase&gt;, and SUBJ and OBJ are data category instances of the class &lt;Function&gt;. Each MDC is identified by a URI. MDCs can either be user-defined or reside in a shared repository.</Paragraph> <Paragraph position="4"> 2. Lexical operations: special lexical entities which allow users to state conditions and perform complex operations over lexical entries.
They will, for instance, allow lexicographers to establish multilingual conditions, link the slots within two different syntactic frames, link semantic arguments with syntactic slots, etc.</Paragraph> <Paragraph position="5"> The MLM is described with Entity-Relationship (E-R) diagrams defining the entities of the lexical model and the way they can be combined to design an actual lexical entry. As such, the MLM does not correspond to a specific lexical entry, but is rather an entry schema corresponding to a lexical metaentry. This means that different possible lexical entries can be designed as instances of the schema provided by the MLM. Instance entries might therefore differ in the type of information they include (e.g. morphological, syntactic, semantic, monolingual or multilingual, etc.), and in the depth of lexical description.</Paragraph> <Paragraph position="6"> Figure 2 depicts the MLM classes and relations for the syntactic layer (SynU for &quot;syntactic unit&quot;). Full definitions for the MLM can be found in the ISLE</Paragraph> </Section> <Section position="4" start_page="3" end_page="4" type="metho"> <SectionTitle> 3 RDF instantiation </SectionTitle> <Paragraph position="0"> We have created an RDF schema for the syntactic layer of the ISLE/MILE lexical entry and instantiated one entry in several alternative forms to explore its potential as a representation for lexical data that can be integrated into the Semantic Web. The following describes the various components.</Paragraph> <Paragraph position="1"> 3.1.1 RDF schema for ISLE lexical entries
An RDF schema defines classes of objects and their relations to other objects. It does not in itself comprise an instance of these objects, but simply specifies the properties and constraints applicable to objects that conform to it.</Paragraph> <Paragraph position="2"> The RDF schema for the syntactic layer of ISLE lexical entries can be accessed at http://www.cs.vassar.edu/~ide/rdf/isle-schemav.6.
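The class and property structure that this schema declares can be illustrated with a small, hypothetical Python sketch; it is not part of the ISLE schema and does not read RDF. It assumes a dictionary-based model in which the property declarations hasSynU and hasPhrase are stored as (domain, range) pairs, and Self and SlotRealization are recorded as subclasses of PhraseElement, each with a single parent for simplicity (RDFS itself allows a class several parents). The names SUBCLASS_OF, PROPERTIES, superclasses and property_applies are illustrative only. The sketch shows how a property declared on a parent class also applies to its subclasses.

```python
# Hypothetical, illustrative model of a few syntactic-layer classes and
# properties named in the ISLE RDF schema; this is not the schema itself.

# Child class -> parent class (one parent per class in this sketch).
SUBCLASS_OF = {
    "Self": "PhraseElement",
    "SlotRealization": "PhraseElement",
}

# Property -> (domain class, range class).
PROPERTIES = {
    "hasSynU": ("Entry", "SynU"),
    "hasPhrase": ("PhraseElement", "Phrase"),
}


def superclasses(cls):
    """Return cls followed by each of its ancestors, nearest first."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def property_applies(prop, subject_class, object_class):
    """True if prop may link an instance of subject_class to an instance of
    object_class, counting declarations inherited from parent classes."""
    domain, range_ = PROPERTIES[prop]
    return domain in superclasses(subject_class) and range_ in superclasses(object_class)


if __name__ == "__main__":
    print(property_applies("hasPhrase", "Self", "Phrase"))  # True: inherited from PhraseElement
    print(property_applies("hasSynU", "Self", "SynU"))      # False: hasSynU is declared on Entry
```

Here domain and range are treated as checks, in the spirit of validation against schema constraints; under plain RDFS semantics, rdfs:domain and rdfs:range instead license inferences about the classes of the resources a property links.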
The classes and relations (properties) defined in the schema correspond to the ER diagrams in Calzolari et al. (2003). The schema indicates that there is a class of objects called Entry; a property declaration indicates that the relation hasSynU holds between Entry objects and SynU objects. Note that classes can be defined as subclasses of other classes, in which case properties associated with the parent class are inherited. In the ISLE schema, for example, the objects Self and SlotRealization are defined as subclasses of PhraseElement, and the hasPhrase property holds between any object of type PhraseElement (including its subclasses) and objects of type Phrase.</Paragraph> <Paragraph position="3"> The ISLE RDF schema and entries have been validated using the ICS-FORTH Validating RDF Parser (VRP v2.1), which analyzes the syntax of a given RDF/XML file according to the RDF</Paragraph> <Section position="1" start_page="3" end_page="4" type="sub_section"> <SectionTitle> Model and Syntax Specification </SectionTitle> <Paragraph position="0"> and checks whether the statements contained in both RDF schemas and resource descriptions satisfy the semantic constraints derived by the RDF Schema description for &quot;eat&quot;, instantiated as RDF objects. The first is a &quot;full&quot; version in which all of the information is specified, including atomic values (strings) at the leaves of the tree structure. The other two versions, rather than specifying all information explicitly, rely on the existence of a Data Category Registry (DCR) in which pre-defined lexical objects are instantiated and may be included in the entry by direct reference.</Paragraph> <Paragraph position="1"> The potential to develop a Data Category Registry in which lexical objects are instantiated in RDF is one of the most important aspects of this approach for the creation of multi-lingual, reusable lexicons. It allows for the following: 1.
specification of a universally accessible, standard set of morphological, syntactic, and semantic information that can serve as a reference for lexicon creators; 2. a fully modular specification of lexical entities that enables use of all or parts of the lexical information in the repository, as desired or appropriate, to build more complex lexical information modules; 3. a template for data category description that lexicon creators can use to create their own data categories at any level of granularity; 4. a means to reuse lexical specifications in entries sharing common properties, thereby eliminating redundancy as well as providing a direct means to identify lexical entries or sub-entries with shared properties; 5. a universally accessible set of lexical information categories that may be used in applications or resources other than lexicons. Note that the existence of a repository of lexical objects, instantiated and specified at different levels of complexity, does not imply that these objects must be used by lexicon creators. Rather, it provides a set of &quot;off the shelf&quot; lexical objects which may either be used as is or serve as a departure point for the definition of new or modified categories.</Paragraph> <Paragraph position="2"> The examples in Appendix A provide a general idea of how a repository of RDF-instantiated lexical objects can be used. Sample repositories at three different levels of granularity, corresponding to the examples in Appendix A, are given in Appendix B: 1. a repository of enumerated classes for lexical objects at the lowest level of granularity; this comprises a definition of sets of possible values for various lexical objects. Any object of this type must be instantiated with one of the listed values.</Paragraph> <Paragraph position="3"> 2. a repository of phrase classes which instantiate common phrase types, e.g., NP, VP, etc.</Paragraph> <Paragraph position="4"> 3.
a repository of constructions containing instantiations of common syntactic constructions (e.g., for verbs which are both transitive and intransitive, as shown in the example).</Paragraph> <Paragraph position="5"> The example entries demonstrate three different possibilities for the use of information in the repositories: 1. Entry 1 uses only the enumerated classes in the LDCR for SynFeatureName and SynFeatureValue. Note that in this case, the LDCR only provides a closed list of possible values, from which the assigned value in the entry must be chosen.</Paragraph> <Paragraph position="6"> 2. Entry 2 refers to instances of phrase objects in the LDCR rather than including them in the entry; this makes it possible to reference a complex phrase (Vauxhave in the example) instead of spelling it out in the entry, and to reuse the same instance by reference in the same or other entries (this is done with NP in the example).</Paragraph> <Paragraph position="7"> 3. Entry 3 takes advantage of construction instances in the LDCR, thus eliminating the need for a full specification in the entry and, again, allowing for reuse in other entries.</Paragraph> </Section> </Section> <Section position="5" start_page="4" end_page="4" type="metho"> <SectionTitle> 5 Summary </SectionTitle> <Paragraph position="0"> This exercise is intended to exemplify how RDF may be used to instantiate lexical objects at various levels of granularity, which can be used and reused to create lexical entries within a single lexicon as well as across lexicons.
By relying on the emerging standardized technologies underlying the Semantic Web, we ensure universal accessibility and commonality.</Paragraph> <Paragraph position="1"> Ultimately, lexical objects defined in this way can be used not only for lexicons, but also in language processing and other applications.</Paragraph> <Paragraph position="2"> This example serves primarily as a proof of concept that may be refined and modified as we consider in more depth the exact RDF representation that would best serve the needs of lexicon creation. However, the potential of exploiting developments in the Semantic Web world for lexicon development should be clear. More importantly, by situating our work in the context of W3C standards, we are in step with the ISO TC37/SC4 vision of a Linguistic Annotation Framework that includes a Data Category Registry of the type we describe here.</Paragraph> </Section> </Paper>