<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1418">
  <Title>Optimising text quality in generation from relational databases</Title>
  <Section position="4" start_page="133" end_page="135" type="metho">
    <SectionTitle>
3 The Structure of a Relational Database
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="133" end_page="134" type="sub_section">
      <Paragraph position="0"> Databases vary widely in form, so we have assumed a fairly standard relational database format.</Paragraph>
    </Section>
    <Section position="2" start_page="134" end_page="134" type="sub_section">
      <SectionTitle>
3.1 Entity Files
</SectionTitle>
      <Paragraph position="0"> The database consists of a number of entity files, each file providing the records for a different entity type. Each record (row) in the entity file defines a unique entity. The columns define attributes of the entities. In a museum domain, we might have an entity file for museum artifacts, another for people involved with the artifacts (designers, owners, etc.), another for locations, etc. See figure 2 for a sample entity file for the Jewellery domain. Given the wide range of database formats available, ILEX assumes a tab-delimited format for database files.</Paragraph>
      <Paragraph position="1"> ILEX imposes two requirements on the entity files it uses: 1. Single field key: while relational databases often use multiple attributes to form a unique key (e.g., name and birthdate), ILEX requires that each entity have a unique identifier in a single attribute. This identifier must be under a field labelled ID.</Paragraph>
      <Paragraph position="2"> 2. Typing of entities: ILEX depends strongly on a type system. We require that each entity record provides a type for the entity in a field labelled Class.</Paragraph>
      <Paragraph position="3"> Some other attribute labels are reserved by the system, allowing ILEX to deal intelligently with them, including Name, Short-Name and Gender.</Paragraph>
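      <Paragraph> The two requirements above can be checked mechanically when an entity file is loaded. The following is a minimal sketch in Python, not ILEX code: the function name read_entity_file, the sample rows, and the assumption that the first line of the file holds the column labels are all illustrative.

```python
import csv
import io

REQUIRED_FIELDS = ("ID", "Class")  # the two fields every entity file must carry


def read_entity_file(text):
    """Parse a tab-delimited entity file into a dict keyed by the ID field.

    Hypothetical helper, not part of ILEX: the first line is assumed to hold
    the column labels, and each later line is one entity record.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [f for f in REQUIRED_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("entity file lacks required field(s): " + ", ".join(missing))
    entities = {}
    for record in reader:
        entity_id = record["ID"]
        if entity_id in entities:
            # The single-field key must identify exactly one entity.
            raise ValueError("duplicate entity ID: " + entity_id)
        entities[entity_id] = record
    return entities


# Invented sample data in the spirit of the Jewellery domain.
SAMPLE = (
    "ID\tClass\tDesigner\tDate\n"
    "J-001\tbrooch\tKing01\t1905\n"
    "J-002\tnecklace\tKing01\t1910\n"
)

entities = read_entity_file(SAMPLE)
print(entities["J-001"]["Class"])  # brooch
```

Rejecting a file at load time keeps the later processing stages free of checks for missing identifiers or types.</Paragraph>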
    </Section>
    <Section position="3" start_page="134" end_page="134" type="sub_section">
      <SectionTitle>
3.2 Link Files
</SectionTitle>
      <Paragraph position="0"> In some cases, an entity will have multiple fillers of an attribute; for instance, a jewellery piece may be made of any number of materials. Entity files, with fixed record structure, cannot handle such cases.</Paragraph>
      <Paragraph position="1"> The standard approach in relational databases is to provide a link file for each case where multiple fillers are possible. A link file consists of two columns only, one identifying the entity, the other identifying the filler (the name of the attribute is provided in the first line of the file, see figure 3).</Paragraph>
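      <Paragraph> A link file of this shape can be read into a mapping from each entity to its list of fillers. The sketch below is illustrative Python rather than ILEX code; the function name read_link_file, the sample rows, and the assumption that the first line contains only the attribute name are ours.

```python
from collections import defaultdict


def read_link_file(text):
    """Return (attribute_name, {entity_id: [fillers]}) for one link file.

    Assumed layout: line 1 holds only the attribute name; every later
    non-empty line is "entity-id TAB filler".
    """
    lines = [line for line in text.splitlines() if line.strip()]
    attribute = lines[0].strip()
    fillers = defaultdict(list)
    for line in lines[1:]:
        entity_id, filler = line.split("\t")
        fillers[entity_id].append(filler)
    return attribute, dict(fillers)


# Invented sample: one piece made of two materials, another of one.
SAMPLE_LINKS = "Material\nJ-001\tsilver\nJ-001\tenamel\nJ-002\tgold\n"

attribute, made_of = read_link_file(SAMPLE_LINKS)
print(attribute, made_of["J-001"])  # Material ['silver', 'enamel']
```
</Paragraph>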
      <Paragraph position="2"> We are aware that the above specification represents an impoverished view of relational databases.</Paragraph>
      <Paragraph position="3"> Many relational databases provide far more than simple entity and link files. However, by no means all relational databases provide more than this, so we have adopted the lowest common denominator.</Paragraph>
      <Paragraph position="4"> Most relational databases can be exported in a form which meets our requirements.</Paragraph>
    </Section>
    <Section position="4" start_page="134" end_page="134" type="sub_section">
      <SectionTitle>
3.3 Terminology
</SectionTitle>
      <Paragraph position="0"> In the following discussion, we will use the following  terminology: * Predicate: each column of an entity file defines a predicate. Class, Designer and Date are thus predicates introduced in figure 2. Each link file also defines a predicate.</Paragraph>
      <Paragraph position="1"> * Record: each row of an entity table provides the attributes of a single entity. The row is termed a record in database terminology.</Paragraph>
      <Paragraph position="2"> * Fact: each entry in a record defines what we call a fact about that entity.3 A fact consists of three parts: its predicate name and two arguments, namely the entity of the record and the filler of the slot.</Paragraph>
      <Paragraph position="3"> * ARG1: the first argument of a fact, the entity the fact is about.</Paragraph>
      <Paragraph position="4"> * ARG2: the second argument of a fact, the filler of the attribute for the entity.</Paragraph>
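      <Paragraph> The terminology above amounts to reading each record as a set of three-part facts. A minimal Python sketch follows; the Fact type, the function record_to_facts and the sample record are hypothetical, and skipping empty fillers is our assumption rather than documented ILEX behaviour.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    predicate: str  # column label (or link-file attribute name)
    arg1: str       # the entity the fact is about
    arg2: str       # the filler of the attribute for that entity


def record_to_facts(record, id_field="ID"):
    """Turn one entity record (column label to filler) into a list of facts.

    The ID column is skipped: it names the entity rather than stating a fact
    about it. Empty fillers are skipped too, which is an assumption here.
    """
    entity_id = record[id_field]
    return [
        Fact(predicate, entity_id, filler)
        for predicate, filler in record.items()
        if predicate != id_field and filler
    ]


record = {"ID": "J-001", "Class": "brooch", "Designer": "King01", "Date": "1905"}
facts = record_to_facts(record)
for fact in facts:
    print(fact)
```
</Paragraph>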
    </Section>
    <Section position="5" start_page="134" end_page="135" type="sub_section">
      <SectionTitle>
4 Specifying the Semantics of the Database
</SectionTitle>
      <Paragraph position="0"> A database itself says nothing about the nature of the contents of each field in the database. It might be a name, a date, a price, etc. Similarly for the field label: the field label names a relation between the entity represented by the record and the entity represented by the filler. However, without further specification, we do not know what this relationship entails, apart from the label itself, e.g., 'Designer'.</Paragraph>
      <Paragraph position="1"> Before we can begin to process a database intelligently, we need to define the 'semantics' of the database. This section will outline how this is done in the ILEX case. There has been some work on automatic acquisition of database semantics, such as in the construction of taxonomies of domain entity types (see Dale et al. (1998) for instance). However, it is difficult to perform this process reliably and in a domain-independent manner, so we have not attempted to in this case. The specification of domain semantics is still a manual process which has to be undertaken to link a database to the text generator.</Paragraph>
      <Paragraph position="2"> To use a database for generation, additional information of several kinds needs to be provided: 1. Taxonomic organisation: supplying of types for each database entity, and organisation of these types into taxonomies; 2. Taxonomic lexification: specifying how each domain type is lexified; 3. Data type of attribute fillers: telling the system to expect the filler of a record slot to be an entity-id, a string, a date, etc.</Paragraph>
      <Paragraph position="3"> 4. Domain type specification: specifying what domain type the slot filler can be assumed to be.</Paragraph>
      <Paragraph position="4"> Each of these aspects of domain specification will be briefly described below.</Paragraph>
      <Paragraph position="5"> 3 Excepting the first column, which provides the entity-id for the record.</Paragraph>
    </Section>
    <Section position="6" start_page="135" end_page="135" type="sub_section">
      <SectionTitle>
4.1 Taxonomic Organisation
</SectionTitle>
      <Paragraph position="0"> ILEX requires that the entities of the domain are organised under a domain taxonomy. The user defines a basic type (e.g., jewellery), and then defines the sub-types of the basic-type, and perhaps further subclassification. Figure 4 shows the lisp forms defining a basic type in the jewellery domain, and the sub-classification of this type. The basic type is also mapped onto a type (or set of types) in the concept ontology used for sentence generation, a version of Penman's Upper Model (Bateman, 1990). This allows the sentence generator to reason about the objects it expresses.</Paragraph>
      <Paragraph position="1"> Taxonomic organisation is important for several reasons, including among others: 1. Expressing Entities: each type can be related to lexical items to use to express that type (e.g., linking the type brooch to the lexical item for &quot;brooch&quot;). If no lexical item is defined for a type, a lexical item associated with some super-type can be used instead. Other aspects of the expression of entities may depend on the conceptual type, for instance pronominalisation, deixis (e.g., mass or count entities), etc.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="135" end_page="136" type="metho">
    <SectionTitle>
2. Supporting Inferences and Generalisations:
</SectionTitle>
    <Paragraph position="0"> ILEX allows the user to assert generalisations about types, e.g., that Arts and Crafts jewellery tends to be made using enamel (see section 5.4).</Paragraph>
    <Paragraph position="1"> The type hierarchy is used to check whether a particular generalisation is appropriate for any given instance.</Paragraph>
    <Paragraph position="2"> The earlier version of ILEX, Ilex2.0, allowed the full representational power of the Systemic formalism for representing domain taxonomies, including cross-classification, and multiple inheritance (both disjunctive and conjunctive). However, our experiences with non-linguists trying to define domain models showed us that the more scope for expression, the more direction was needed. We thus simplified the formalism, by requiring taxonomies to be simple, with no cross-classification or multiple inheritance. We felt that the minor loss of expressivity was well balanced by the gain in simplicity for domain developers.</Paragraph>
    <Section position="1" start_page="135" end_page="136" type="sub_section">
      <SectionTitle>
4.2 Type Lexification
</SectionTitle>
      <Paragraph position="0"> To express each database entity, it is essential to be able to map from its defined type, to a noun to use in a referring expression, e.g., this brooch.</Paragraph>
      <Paragraph position="1"> ILEX comes with a basic lexicon already provided, covering the commonly occurring words. Each entry defines the syntactic and morphological information required for sentence generation. For these items, the domain developer needs to provide a simple mapping from domain type to lexical item; for instance, the following lisp form specifies that the domain type location should be lexified by the lexical item whose id is location-noun: (lexify location location-noun) For those lexical items not already defined, the domain developer needs to provide in addition lexical item definitions for the nouns expressing the types in their domain. A typical entry has the form shown in figure 5.</Paragraph>
    </Section>
    <Section position="2" start_page="136" end_page="136" type="sub_section">
      <SectionTitle>
4.3 Data Type of Slot Fillers
</SectionTitle>
      <Paragraph position="0"> Each field in a database record contains a string of characters. It is not clear whether this string is an identifier for another domain entity, a string (e.g., someone's surname), a date, a number, a type in the type hierarchy, etc.</Paragraph>
      <Paragraph position="1"> ILEX requires, for each entity file, a statement as to how the field fillers should be interpreted. See figure 6 for an example.</Paragraph>
      <Paragraph position="2"> Some special filler types have been provided to facilitate the import of structured data types. This includes both :date and :dimension in the current example. Special code has been written to convert the fillers of these slots into ILEX objects. Other special filler types are being added as needed.</Paragraph>
    </Section>
    <Section position="3" start_page="136" end_page="136" type="sub_section">
      <SectionTitle>
4.4 Domain Type of Slot Fillers
</SectionTitle>
      <Paragraph position="0"> The def-predicate form allows the domain developer to state what type the fillers of a particular field should be. This not only allows for type checking, but also allows the type of an entity to be inferred if not otherwise provided. For instance, by asserting that fillers of the Place field should be of type city, the system can infer that &quot;London&quot; is a city even if</Paragraph>
    </Section>
    <Section position="4" start_page="136" end_page="136" type="sub_section">
      <SectionTitle>
4.5 Summary
</SectionTitle>
      <Paragraph position="0"> With just this much semantics specified, ILEX can generate very poor texts, but texts which convey the content of the database records. In the next section, we will outline the extensions to the domain semantics which are needed to improve the quality of the text produced by ILEX.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="136" end_page="138" type="metho">
    <SectionTitle>
5 Extending Domain Semantics for Improved Text Quality
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="136" end_page="136" type="sub_section">
      <Paragraph position="0"> So far we have discussed only the simplest level of domain semantics, which allows a fairly direct expression of domain information. ILEX allows the domain developer to provide additional domain semantics to improve the quality of the text.</Paragraph>
    </Section>
    <Section position="2" start_page="136" end_page="137" type="sub_section">
      <SectionTitle>
5.1 Expression of Facts
</SectionTitle>
      <Paragraph position="0"> Unless told otherwise, ILEX will express each fact in a simple regular form, such as The designer of this brooch is Jessie M. King, using a template form4: The &lt;predicate&gt; of &lt;entity-expression&gt; is &lt;filler-expression&gt;.</Paragraph>
      <Paragraph position="1"> However, a text consisting solely of clauses of this form is unnatural, and depends on the predicate label being appropriate to the task (labels like given-by will produce nonsense sentences).</Paragraph>
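      <Paragraph> The default template and its weakness can be illustrated with a few lines of Python. This is a sketch of the idea, not the ILEX implementation; the names DEFAULT_TEMPLATE and express_fact are hypothetical, and lower-casing the predicate label is our assumption.

```python
# The default pattern: "The (predicate) of (entity-expression) is (filler-expression)."
DEFAULT_TEMPLATE = "The {predicate} of {entity} is {filler}."


def express_fact(predicate, entity_expression, filler_expression):
    """Render one fact with the default template; the label is lower-cased."""
    return DEFAULT_TEMPLATE.format(
        predicate=predicate.lower(),
        entity=entity_expression,
        filler=filler_expression,
    )


print(express_fact("Designer", "this brooch", "Jessie M. King"))
# The designer of this brooch is Jessie M. King.

print(express_fact("given-by", "this brooch", "Jessie M. King"))
# The given-by of this brooch is Jessie M. King.
```

The second call shows why the template only works when the predicate label happens to be a noun that reads naturally in this frame.</Paragraph>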
      <Paragraph position="2"> To produce better text, ILEX can be told how to express facts. The domain developer can provide an optional slot to the def-predicate form as shown in figure 8. The expression specification first of all defines which verb to use in the expression. By default, the ARG1 element is mapped onto the Subject, and the ARG2 onto the Object. Default values are assumed for tense, modality, polarity, voice, finiteness, quantification, etc., unless otherwise specified. So, using the above expression specification, the Class fact of a jewel would be expressed by a clause like: This item is a brooch.</Paragraph>
      <Paragraph position="3"> To produce less standard expressions, we need to modify some of the defaults. A more complex expression specification is shown in figure 9, which would result in an expression such as: For further information, see Liberty Style Guide No. 326. 4 ILEX3.0 borrowed this use of a default expression template from the POWER system (Dale et al., 1998). In previous versions of ILEX, all facts were expressed by full NLG as explained below.</Paragraph>
      <Paragraph position="4">  The expression form is used to construct a partial syntactic specification, which is then completed using the sentence generation module of the WAG sentence generator (O'Donnell, 1996).</Paragraph>
      <Paragraph position="5"> With the level of domain semantics specified so far, ILEX is able to produce texts such as the two below, which provide an initial page describing database entity BUNDY01, and then a subsequent page when more information was requested (this from the Personnel domain (Nowson, 1999)): * Page 1: Alan Bundy is located in room F1, which is in South Bridge. He lectures a course called Advanced Automated Reasoning and is in the Institute for Representation and Reasoning.</Paragraph>
      <Paragraph position="6"> He is the Head of Division and is a professor.</Paragraph>
      <Paragraph position="7"> * Page 2: As already mentioned, Alan Bundy lectures Advanced Automated Reasoning. AAR is lectured to MSc and AI4.</Paragraph>
      <Paragraph position="8"> This expression specification form has been designed to limit the linguistic skills needed for domain developers working with the system. Given that the domain developers may be museum staff, not computational linguists, this is necessary. The notation nevertheless allows for a wide range of linguistic expressions if the full range of parameters is used.</Paragraph>
    </Section>
    <Section position="3" start_page="137" end_page="137" type="sub_section">
      <SectionTitle>
5.2 User Adaption
</SectionTitle>
      <Paragraph position="0"> To enable the system to adapt its content to the type of user, the domain developers can associate information with each predicate indicating the system's view of the predicate's interest, importance, etc., to the user. This information is added to the def-predicate form, as shown in figure 10.</Paragraph>
      <Paragraph position="1"> The user annotations allowed by ILEX include: 1. Interest: how interesting does the system judge the information to be to the user; 2. Importance: how important is it to the system that the user reads the information; 3. Assimilation: to what degree does the system judge the user to already know the information; 4. [...] system believe the user will absorb the information when presented (is one presentation enough?). This information influences what content will be expressed to a particular user, and in what order (more relevant on earlier pages). Information already assimilated will not be delivered, except when relevant for other purposes (e.g., when referring to the entity). If no annotations are provided, no user customisation will occur.</Paragraph>
      <Paragraph position="2"> The values in ILEX's user models have been set intuitively by the implementers. While ideally these values would be derived through user studies, our purpose was purely to test the adaptive mechanism, and demonstrate that it works. We leave the development of real user models for later work.</Paragraph>
      <Paragraph position="3"> ILEX has opted out of using adaptive user modelling, whereby the user model attributes are adapted as a result of observed user choices in the web interface. We leave this for future research.</Paragraph>
    </Section>
    <Section position="4" start_page="137" end_page="138" type="sub_section">
      <SectionTitle>
5.3 Comparisons
</SectionTitle>
      <Paragraph position="0"> When describing an object, it seems sometimes useful to compare it to similar articles already seen.</Paragraph>
      <Paragraph position="1"> With a small addition to the domain specification, ILEX can compare items (an extension by Maria Milosavljevic), as demonstrated in the following text: This item is also a brooch. Like the previous item, it was designed by King. However, it differs from the previous item in that it is made of gold and enamel, while the previous brooch was made of silver and enamel.</Paragraph>
      <Paragraph position="2"> For ILEX to properly compare two entities, it needs to know how the various attributes of the entity can be compared (nominal, ordinal, scalar, etc.). Again, information can be added to the def-predicate form for each predicate to define its scale of comparability. See Milosavljevic (1997) and (1999) for more detail. Figure 11 shows the additions for the Designer predicate. Comparisons introduce several RST relations to the text structure, including rst-contrast, rst-similarity and rst-whereas.</Paragraph>
    </Section>
    <Section position="5" start_page="138" end_page="138" type="sub_section">
      <SectionTitle>
5.4 Generalisations
</SectionTitle>
      <Paragraph position="0"> We found it useful to allow facts about general types of entities to be asserted, for instance, that Arts and Crafts jewellery tends to be made of enamel. These generalisations can then be used to improve the quality of text, producing object descriptions as in the following: This brooch is in the Arts and Crafts style.</Paragraph>
      <Paragraph position="1"> Arts and Crafts jewels tend to be made of enamel. However, this one is not.</Paragraph>
      <Paragraph position="2"> These generalisations are defined using defeasible implication - similar to the usual implication, but working in terms of few, many, or most rather than all or none. They are entered in a form derived from first order predicate calculus, for instance, see figure 12 which specifies that most Arts and Crafts jewellery uses enamel.</Paragraph>
      <Paragraph position="3"> ILEX finds each instance which matches the general type (in this case, instances of type jewellery which have Arts and Crafts in the Style role). If the fact about the generic object has a corresponding fact on the instantial object, an exemplification relation is asserted between the facts. Otherwise, a concession relation is asserted. See Knott et al.</Paragraph>
      <Paragraph position="4"> (1997) for more details on this procedure.</Paragraph>
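      <Paragraph> The matching procedure just described can be sketched as follows. This Python is an illustration under stated assumptions, not the procedure of Knott et al. (1997): the dictionary encoding of the generalisation, the helper names is_a and relation_for, and the sample entities are all hypothetical, and the quantifier (few, many, most) is left out.

```python
# Hypothetical single-inheritance taxonomy: each type maps to its one super-type.
SUPERTYPE = {"brooch": "jewellery", "jewellery": None}


def is_a(domain_type, ancestor):
    """True if domain_type equals ancestor or lies below it in the taxonomy."""
    while domain_type is not None:
        if domain_type == ancestor:
            return True
        domain_type = SUPERTYPE.get(domain_type)
    return False


# Illustrative encoding of "most Arts and Crafts jewellery uses enamel".
GENERALISATION = {
    "type": "jewellery",
    "condition": ("Style", "arts-and-crafts"),
    "generic_fact": ("Material", "enamel"),
}


def relation_for(entity, generalisation):
    """Return None if the entity does not match the general type,
    'exemplification' if it carries the generic fact, else 'concession'."""
    role, value = generalisation["condition"]
    if not is_a(entity["Class"], generalisation["type"]):
        return None
    if value not in entity.get(role, []):
        return None
    predicate, filler = generalisation["generic_fact"]
    if filler in entity.get(predicate, []):
        return "exemplification"
    return "concession"


with_enamel = {"Class": "brooch", "Style": ["arts-and-crafts"], "Material": ["silver", "enamel"]}
without_enamel = {"Class": "brooch", "Style": ["arts-and-crafts"], "Material": ["gold"]}

print(relation_for(with_enamel, GENERALISATION))     # exemplification
print(relation_for(without_enamel, GENERALISATION))  # concession
```

The concession case corresponds to the sample text above, where the brooch is in the Arts and Crafts style but is not made of enamel.</Paragraph>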
    </Section>
  </Section>
</Paper>