<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1617">
  <Title>Exploiting OWL Ontologies in the Multilingual Generation of Object Descriptions</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> A strand of work in Natural Language Generation (NLG) has been devoted to the generation of textual descriptions of objects from symbolic information in ontologies and databases.</Paragraph>
    <Paragraph position="1"> An example of such work is ILEX [O'Donnell et al., 2001], which was demonstrated mostly in the museum domain, where it could produce personalised English descriptions of exhibits; the Power system [Dale et al., 1998] is another example from the same domain. More recently, the M-PIRO project [Isard et al., 2003] developed a multilingual extension of ILEX, which has been tested in a variety of domains, including museum exhibits and items for sale (footnote 1). A major problem in this and many other NLG subareas is the difficulty of obtaining source symbolic information in forms compatible with the requirements of the language generators. This issue has mainly been addressed so far by extracting source information from structured and semi-structured data [Dale et al., 1998], and by developing authoring tools that help in the creation of source information and domain-dependent linguistic resources. Such tools were developed, for example, in GIST [Power and Cavallotto, 1996], DRAFTER [Hartley and Paris, 1997], ITRI's WYSIWYM systems [Van Deemter and Power, 2003], and M-PIRO [Androutsopoulos et al., 2002].</Paragraph>
    <Paragraph position="2"> In recent years, considerable effort has been invested in the Semantic Web, which can be seen as an attempt to develop mechanisms that will allow computer applications to reason more easily about the semantics of the resources (documents, services, etc.) of the Web. A major target is the development of standard representation formalisms that will allow ontologies to be published on the Web and be shared by different computer applications. The emerging standard for specifying ontologies is OWL, an extension of RDF (footnote 2). In NLG systems that describe objects, pre-existing OWL ontologies can provide much of the required source information, reducing the authoring effort and providing a common standard representation to generate from (footnote 3). We discuss the role that OWL ontologies can play in M-PIRO's authoring process, and report on progress we made towards extending M-PIRO's authoring tool to support OWL. We argue that the benefit from using OWL would be greater if the ontologies included the domain-dependent linguistic resources and user modelling information that NLG systems need. This would allow content to be published on the Semantic Web in the form of OWL ontologies, with different NLG engines acting as browsers responsible for rendering the content in different natural languages and tailoring it to the interests and interaction history of the users. A challenge for the NLG community, then, is to agree upon standards on how linguistic resources and user modelling information should be embedded in OWL ontologies. [Footnote 1: M-PIRO was an IST project of the European Union. It ran from 2000 to 2003. Its partners were: the University of Edinburgh, ITC-irst, NCSR &quot;Demokritos&quot;, the National and Kapodistrian University of Athens, the Foundation of the Hellenic World, and System Simulation. This paper includes additional work, carried out at the Athens University of Economics and Business and NCSR &quot;Demokritos&quot;.]</Paragraph>
    <Paragraph position="3"> Section 2 below briefly introduces M-PIRO and its authoring tool. Section 3 then shows how M-PIRO's ontologies can be expressed in OWL, and presents facilities we have added to the authoring tool to export ontologies in OWL. Among other benefits, this allows machine-generated texts to be published on the Web along with the ontology they were generated from, and to be annotated with OWL entries that express their semantics in terms of the ontology, making the semantics accessible to computer applications. Section 4 subsequently discusses how existing OWL ontologies can be imported into the authoring tool, and the benefits that this brings. Our import facilities currently support only a subset of OWL; part of Section 4 is devoted to problems that remain to be solved. Section 5 focuses on the need to establish standards to embed linguistic resources and user modelling information in OWL ontologies, and how this would allow NLG engines to become the browsers of the Semantic Web. Section 6 concludes and summarises directions for future research.</Paragraph>
    <Paragraph position="4"> [...] domains, to modify the domain-dependent resources: the ontology, some language resources, and the end-user stereotypes.</Paragraph>
    <Paragraph position="5"> M-PIRO generates texts from an ontology that provides information on the entities of a domain (e.g., the statues and artists in a museum), the relationships between the entities (e.g., the association of statues with their artists), and the entities' attributes (e.g., their names or dimensions). Entities are not necessarily physical objects; they may be abstract concepts (e.g., historical periods). They are organized in a taxonomy of entity types, as illustrated in Figure 1, where 'exhibit' and 'historical-period' are basic entity types, i.e., they have no super-types. The 'exhibit' type is further subdivided into 'coin', 'statue', and 'vessel'. The latter has the sub-types 'amphora', 'kylix', and 'lekythos'. Each entity belongs to a particular type; e.g., 'exhibit22' belongs to 'kylix', and is, therefore, also a 'vessel' and an 'exhibit'. For simplicity, M-PIRO adopts single inheritance, i.e., a type may not have more than one parent, and an entity may not belong to more than one type (footnote 4). This introduces some problems when importing OWL ontologies; related discussion follows.</Paragraph>
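    <Paragraph> The single-inheritance taxonomy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout and the function names ancestors and types_of are hypothetical and are not taken from M-PIRO's implementation; only the type and entity names come from the text.

```python
# Hypothetical sketch; the data layout and function names are assumptions,
# not M-PIRO's actual data model.

# Each entity type maps to its single parent (None for a basic entity type).
PARENT_OF = {
    "exhibit": None,
    "historical-period": None,
    "coin": "exhibit",
    "statue": "exhibit",
    "vessel": "exhibit",
    "amphora": "vessel",
    "kylix": "vessel",
    "lekythos": "vessel",
}

# Each entity belongs to exactly one type.
TYPE_OF = {"exhibit22": "kylix"}


def ancestors(entity_type):
    """Return the chain of super-types, nearest first."""
    chain = []
    parent = PARENT_OF[entity_type]
    while parent is not None:
        chain.append(parent)
        parent = PARENT_OF[parent]
    return chain


def types_of(entity):
    """Return the entity's own type followed by all inherited types."""
    own = TYPE_OF[entity]
    return [own] + ancestors(own)


print(types_of("exhibit22"))  # ['kylix', 'vessel', 'exhibit']
```

Because every type has at most one parent, the chain of super-types is a simple list rather than a graph, which is the simplification that single inheritance buys.</Paragraph>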
    <Paragraph position="6"> Relationships are expressed using fields. It is possible to introduce new fields at any entity type, which then become available at all the entities of that type and its subtypes. In Figure 1, the fields 'painting-technique-used', 'painted-by', and 'potter-is' are introduced at the type 'vessel'. (The top right panel shows the fields of the type selected in the left panel.) Hence, all entities of type 'vessel' and its subtypes, i.e., 'amphora', 'kylix', and 'lekythos', carry these fields. Furthermore, entities of type 'vessel' inherit the fields 'creation-period', 'current-location', etc., up to 'references', which are introduced at the 'exhibit' type. (The 'images' field is used to associate images with entities.) The fillers of each field, i.e., the possible values, must be entities of a particular type. In Figure 1, the fillers of 'potter-is' are of type 'potter'; hence, the entities 'sotades' and 'aristos' are the only possible values. To represent that a particular 'vessel' entity was created during the classical period by 'aristos', one would fill in that entity's 'creation-period' with 'classical-period', and its 'potter-is' with 'aristos'. [Footnote 4: M-PIRO's core language generator actually supports some forms of multiple inheritance, but the authoring tool does not.]</Paragraph>
    <Paragraph position="7"> Figure 2 shows the fields of entity 'exhibit22', and the resulting English description. M-PIRO supports English, Greek, and Italian; descriptions can be generated in all three languages from the same ontology.</Paragraph>
    <Paragraph position="8"> The 'Many' column in Figure 1 is used to mark fields whose values are sets of fillers of the specified type. In the 'made-of' field, this allows the value to be a set of materials (e.g., gold and silver). It is, thus, possible to represent many-to-one (e.g., only one material per exhibit) and many-to-many relationships (many materials per exhibit), but not one-to-one relationships (e.g., a unique social security code per person).</Paragraph>
    <Paragraph position="9"> OWL, in contrast, supports one-to-one relationships.</Paragraph>
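    <Paragraph> The effect of the 'Many' column can be sketched as a boolean flag on a field declaration. The Field class and the check_value function below are hypothetical, the filler type 'material' is inferred from the example in the text, and treating 'potter-is' as single-valued is an assumption made only for illustration.

```python
from dataclasses import dataclass


# Hypothetical sketch; the class and function names are assumptions.
@dataclass(frozen=True)
class Field:
    name: str
    filler_type: str
    many: bool = False  # mirrors the 'Many' column of the authoring tool


def check_value(field, value):
    """Accept a set of fillers only when the field is marked 'many'."""
    if field.many:
        return isinstance(value, (set, frozenset))
    return isinstance(value, str)


made_of = Field("made-of", "material", many=True)
potter_is = Field("potter-is", "potter")  # assumed to be single-valued

print(check_value(made_of, {"gold", "silver"}))  # True
print(check_value(potter_is, "aristos"))  # True
print(check_value(potter_is, {"sotades", "aristos"}))  # False
```

Nothing in such a declaration prevents two different entities from sharing the same filler, which is why a flag of this kind can express many-to-one and many-to-many relationships but not one-to-one relationships.</Paragraph>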
    <Paragraph position="10"> Fields are also used to represent attributes of entities (e.g., their names or dimensions). Several built-in data-types are available ('string', 'number', 'date', etc.), and they are used to specify the possible values of attribute-denoting fields. The 'Many' column also applies to attributes. In Figure 1, the values of 'references' and 'exhibit-purpose' are strings. The two fields are intended to hold canned texts containing bibliographic references and descriptions of what a particular exhibit was used for; e.g., &quot;This statue honours the memory of Kroissos, a young man who died in battle&quot;. Information can be stored as canned text in string-valued fields when it is difficult to represent in symbolic form. The drawback is that canned texts have to be entered in all three languages.</Paragraph>
    <Paragraph position="11"> The authoring tool also allows the authors to specify user types, i.e., types of end-users the texts are intended for (e.g., 'average-adult', 'child'), and stereotypes. The latter assign, for each user type, values to parameters that control, for example, the length of the texts, or the extent to which aggregating clauses to form longer sentences is allowed. The stereotypes also specify how interesting each field is for each user type; this allows the system to tailor the content of the descriptions to the users' interests. M-PIRO employs additional personal user models, where it stores the interaction history of each particular end-user, allowing, for example, the system to generate comparisons to previously seen objects.</Paragraph>
    <Paragraph position="12"> M-PIRO uses systemic grammars, one for each language, to convert sentence specifications to surface text. The grammars can be used in a variety of object description applications without modifications, and, hence, can be treated as domain-independent for M-PIRO's purposes. However, a part of the lexicon that the grammars employ, known as the domain-dependent lexicon, has to be filled in by the authors when the system is ported to a new application. The domain-dependent lexicon contains entries for nouns and verbs; when moving to a new application, it is initially empty. The authors enter the base forms of the nouns and verbs they wish the system to use, and there are facilities to generate the other forms automatically. Noun entries are linked to entity types, to allow, for example, the system to generate referring noun phrases; in Figure 1, the entity type 'vessel' is associated with the lexicon entry 'vessel-noun' (see the area next to 'Edit nouns'). The entries are trilingual; e.g., 'vessel-noun' contains the nouns &quot;vessel&quot;, &quot;αγγείο&quot;, and &quot;vaso&quot; of the three languages. For each field and each language, the authors have to provide at least one micro-plan that specifies how the field can be expressed as a clause in that language. Following ILEX, the primary form of micro-plan in M-PIRO is the clause plan, where the author specifies the clause to be generated in abstract terms, by selecting the verb to be used (from the domain-dependent lexicon), the voice and tense of the resulting clause, etc. As with nouns, verb entries are trilingual; e.g., the 'paint-verb' entry of the clause plan of Figure 1 contains the base verb forms &quot;paint&quot;, &quot;ζωγραφίζω&quot;, and &quot;dipingere&quot;. By default, the entity that carries the field becomes the subject of the resulting clause, and the filler of the field the object.
The clause plan of Figure 1 leads to clauses like &quot;This vessel was painted by Eucharides&quot;. Appropriate referring expressions, e.g., &quot;Eucharides&quot;, &quot;a painter&quot;, &quot;him&quot;, are generated automatically. Alternatively, micro-plans can be specified as simplistic templates, i.e., sequences of canned strings and automatically generated referring expressions; see [Androutsopoulos et al., 2002] for details.</Paragraph>
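    <Paragraph> To make the notion of a clause plan concrete, the following Python sketch turns an abstract plan (verb, voice, tense) into an English clause. It is a toy illustration under stated assumptions: M-PIRO realises clauses with systemic grammars, whereas this sketch uses plain string concatenation, handles only English and the past tense, and takes the referring expressions as ready-made strings. The names ClausePlan, LEXICON, and realise are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical one-entry lexicon; in M-PIRO the non-base forms are generated
# by the authoring tool, here they are simply listed by hand.
LEXICON = {"paint-verb": {"past": "painted", "past_participle": "painted"}}


@dataclass(frozen=True)
class ClausePlan:
    verb: str   # key into the lexicon
    voice: str  # "active" or "passive"
    tense: str  # only "past" is handled in this sketch


def realise(plan, owner_np, filler_np):
    """Build a clause from the noun phrases of the field owner and filler."""
    if plan.tense != "past":
        raise ValueError("only the past tense is handled in this sketch")
    forms = LEXICON[plan.verb]
    if plan.voice == "passive":
        words = [owner_np, "was", forms["past_participle"], "by", filler_np]
    else:
        words = [owner_np, forms["past"], filler_np]
    text = " ".join(words)
    return text[0].upper() + text[1:] + "."


plan = ClausePlan(verb="paint-verb", voice="passive", tense="past")
print(realise(plan, "this vessel", "Eucharides"))
# This vessel was painted by Eucharides.
```

The point of the abstraction is that the author chooses only the verb, voice, and tense; the surface form is then produced mechanically, which is what allows the same plan structure to be reused across languages with language-specific lexicon entries.</Paragraph>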
    <Paragraph position="13"> Unlike ILEX, M-PIRO allows multiple micro-plans to be specified per field, and this allows greater variety in the generated texts. Furthermore, the user stereotypes can be used to indicate that particular micro-plans are more appropriate to particular user types, and this allows the system to tailor the expressions it produces. When planning the text, M-PIRO attempts to place clauses that convey more interesting fields towards the beginning of the text. It is also possible for the authors to specify particular orderings; otherwise, M-PIRO's text planner is domain-independent.</Paragraph>
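    <Paragraph> The interest-driven ordering mentioned above can be sketched as a sort over per-user-type interest scores. The scores below are invented for illustration, and the names INTEREST and order_fields are hypothetical; M-PIRO's actual text planner is more elaborate than a single sort.

```python
# Hypothetical stereotype data: how interesting each field is per user type.
# The numbers are invented for illustration only.
INTEREST = {
    "average-adult": {"creation-period": 3, "painted-by": 2, "current-location": 1},
    "child": {"creation-period": 1, "painted-by": 3, "current-location": 2},
}


def order_fields(fields, user_type):
    """Place fields that are more interesting to the user type first."""
    scores = INTEREST[user_type]
    # Fields without a score get 0; sorted() is stable, so ties keep
    # their original relative order.
    return sorted(fields, key=lambda name: scores.get(name, 0), reverse=True)


fields = ["current-location", "creation-period", "painted-by"]
print(order_fields(fields, "average-adult"))
# ['creation-period', 'painted-by', 'current-location']
print(order_fields(fields, "child"))
# ['painted-by', 'current-location', 'creation-period']
```

The same set of fields thus yields a different clause order for each user type, with no domain-specific logic in the ordering function itself.</Paragraph>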
  </Section>
</Paper>