<?xml version="1.0" standalone="yes"?> <Paper uid="A97-1040"> <Title>Multilingual Generation and Summarization of Job Adverts: the TREE Project</Title> <Section position="3" start_page="269" end_page="269" type="metho"> <SectionTitle> 2 Overall design </SectionTitle> <Paragraph position="0"> The TREE system stores job ads in a partly language-independent schematic form, and is accessed by job-seeking users who can specify a number of parameters which are used to search the job database, and who can also customise the way the information retrieved is presented to them. A second type of user is the potential employer who provides job announcements to the system in the form of free text via an e-mail feed or, it is planned, via a form-filling interface (though we shall not discuss this latter input mode here).</Paragraph> <Paragraph position="1"> The initial prototype system currently implemented can store and retrieve job ads in three languages - English, Flemish and French - regardless of which of these three languages the job was originally drafted in.</Paragraph> <Paragraph position="2"> The system has four key components which are the subject of this paper. Telematics, HCI and certain other issues such as maintenance of the system (deleting old ads, user training, legality of texts in different countries) and the information retrieval aspects of the system will not be discussed in this paper.</Paragraph> <Paragraph position="3"> The four components which we discuss here are: (a) the schema data structure for storing the job ads, and the associated terminological and lexical databases; (b) the analysis module for converting job ads received into their schematic form; (c) the query interface to allow users to specify the range of job ads they wish to retrieve; and (d) the generator, which creates a customised selective summary of the job ads retrieved in HTML format. To a great extent, the design of each of these modules is not especially innovative. 
However, the integration of all these functions is, from a methodological point of view, a good example of how a variety of techniques can be combined into a real application with a real use in the real world.</Paragraph> </Section> <Section position="4" start_page="269" end_page="269" type="metho"> <SectionTitle> 3 Data Structures </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="269" end_page="269" type="sub_section"> <SectionTitle> 3.1 Job ad representation schema </SectionTitle> <Paragraph position="0"> Job ads are stored in the system in a &quot;schema&quot;, which is a typed feature structure consisting of named slots and fillers. The slots, some of which have a simple internal structure of their own, identify elements of the job ad. Many, though not all of the slots can be specified as part of the search, and all of them can be generated as part of the job summary.</Paragraph> <Paragraph position="1"> The fillers for the slots may be coded language-independent references to the terminological database, source-language strings which can nevertheless be translated on demand with reference to the &quot;lexicon&quot;, or literal strings which will not be translated at all. The stylised partial example of a filled schema in Figure 1 gives an impression of the data structure. 
The distinction between terms and items in the lexicon is discussed below, but we consider first the design and implementation of the schema database.</Paragraph> <Paragraph position="2"> Slot names are shown in CAPITALS, fillers in quote marks are stored as strings; other fillers are coded.</Paragraph> <Paragraph position="3"> JOB: waiter
JOBCODE: 92563
NUMBER_OF_JOBS: several
LOCATION: &quot;Urmston&quot;
WORKTIME: 2
SKILLS:EXPERIENCE: essential
APPLICATION:PHONE: 224 8619</Paragraph> </Section> </Section> <Section position="5" start_page="269" end_page="271" type="metho"> <SectionTitle> CONTACT NAME: &quot;Andrea&quot; </SectionTitle> <Paragraph position="0"> ORIGINAL_TEXT: &quot;Urgent!!! P/T Waiters required, Urmston area. Experience essential. Phone Andrea on 224 8619.&quot;
The main aim of the schema is to represent in a consistent way the information which the analysis module extracts from the job ads, which the query module searches, and from which the generation module produces text. Note that the example shown in Figure 1 is rather simplified for the purposes of illustration. The schema module provides a database of job schema instances (Onyshkevych, 1990). The analysis and design phases were conducted using the OMT (Rumbaugh, 1990) object-oriented methodology. 
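For illustration, the filled schema of Figure 1 can be pictured as a simple typed record. The following Python sketch is ours, not the system's implementation; the class and field names are invented, and the prototype is not described at this level of detail:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class JobSchema:
    # Coded fillers are language-neutral references into the term bank;
    # quoted fillers are source-language strings, translated on demand.
    job_code: int                      # "waiter" resolved to 92563 (term-bank code)
    number_of_jobs: str                # coded value, e.g. "several"
    worktime: int                      # coded value (2 taken here to mean part-time)
    location: Optional[str] = None     # literal string filler
    experience: Optional[str] = None   # coded value, e.g. "essential"
    phone: Optional[str] = None        # literal string, never translated
    contact_name: Optional[str] = None # literal string filler
    original_text: str = ""            # the raw ad, kept verbatim

ad = JobSchema(job_code=92563, number_of_jobs="several", worktime=2,
               location="Urmston", experience="essential",
               phone="224 8619", contact_name="Andrea",
               original_text="Urgent!!! P/T Waiters required, Urmston area. "
                             "Experience essential. Phone Andrea on 224 8619.")
```

Only the coded fields are language-neutral; the string fields carry source-language material that is translated, or passed through untouched, as described in sections 3.2 and 3.3.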
Since the system currently treats three languages (with the prospect of extension to more), we decided to codify in a language-neutral fashion the information extracted from the ads, converting equivalent linguistic terms into codes and vice versa via the analysis and generation modules described below.</Paragraph> <Section position="1" start_page="269" end_page="271" type="sub_section"> <SectionTitle> 3.2 Terminology </SectionTitle> <Paragraph position="0"> The terminology module has been designed with the general aim of supporting all the common functionalities shared by the analysis, generation and query modules and of supporting a language-independent term bank to permit multilingual handling of the schema database contents. We have focused on domain-specific terms and classifications, not covering generic language issues nor providing a general lexicon and thesaurus.</Paragraph> <Paragraph position="1"> Different kinds of domain-specific information can be found as slot fillers, depending on the intended meaning of schema slots. The most relevant information is obviously job types. Existing job classifications have been established for example by the European Commission's Employment Service (EURES, 1989), by the ILO (ILO, 1990) and several individual companies; each provides a hierarchical classification of jobs, specifying, for each term, a distinct code, a description of the job, one or more generic terms commonly used to refer to the specific job, and possibly a set of synonyms. The description of the job ranges, depending on the classification, from quite broad to highly detailed, sometimes highlighting differences existing in different countries (e.g. according to the EURES classification, a &quot;waiter&quot; in some EU states is also required to act as a barman, while in others this is not the case). 
Job classifications therefore provide at least three different kinds of information: * Definition of recognized job types, with a (more or less) precise definition of what the job is; chef is a recognized item, as well as pizza chef, while chef specializing in preparing hors d'oeuvres is not; classifications are obviously somewhat arbitrary, since whether a specific job is a recognized one or simply an &quot;unrecognized&quot; one depends on the level of granularity the classifier decides to use.</Paragraph> <Paragraph position="2"> * Classification of job types along ISA hierarchies (e.g. a wine waiter ISA type of waiter).</Paragraph> <Paragraph position="3"> * Linguistic information about commonly used terms and synonyms used in a given language (or more than one) to refer to the specific term.</Paragraph> <Paragraph position="4"> Accordingly, job classification terms are classified, coded (i.e. a distinct code identifying the term is associated with each term) and a list of standard &quot;names&quot; as well as recognized synonyms is associated with them. The classification and coding schema of VDAB, one of the end-user partners in the project, is used, but extensions deriving from other schemas could obviously be envisaged. Translation tables are provided for each term, containing the names used in the different languages. Alignments across different languages are kept whenever possible. Problems due to missing equivalent terms in different languages, or to slightly different meanings, are handled, at least in the first stage, simply by providing terms nearer in meaning. 
An example of some job titles is shown in Figure 2: the hierarchical nature of the titles, and also the existence of some synonyms, is suggested by the numbering scheme, and is more or less self-explanatory.</Paragraph> <Paragraph position="5"> 91205 chef de cuisine # chef-kok # chief cook
91236 cuisinier de regime # dieetkok # diet cook
91237 cuisinier de cantine # kok grootkeuken # canteen cook
91241 commis de cuisine # keukenhulp # kitchen assistant
91241 commis de cuisine # keukenpersoneel # kitchen staff
91241 commis de cuisine # keukenhulp # catering assistant
91241 aide-cuisinier # hulpkok # assistant cook
91260 second cuisinier # hulpkok # second chef
Codes are used as slot fillers in the schema database. This makes the schema neutral with respect to analysis, query and generation languages.</Paragraph> <Paragraph position="6"> For example, when searching for a job, the classification hierarchies inherent in the terminology database allow the user to express general search constraints (e.g. looking for a job as a chef), even though individual jobs are coded for specific types of chef (pastrycook, pizza chef etc.), and of course in different languages (e.g. Bakkersgast).</Paragraph> <Paragraph position="7"> Although the job titles themselves provide an obvious area of terminology, we handle various other areas of vocabulary in a similar way. There are two criteria for &quot;terminological status&quot; in our system, either of which is sufficient: (i) hierarchical structure, and (ii) standardization. An example of &quot;standardized vocabulary&quot; in our domain is words like full-time, part-time, which have an agreed meaning, or adjectives like essential as applied to requirements such as experience, or a driving licence. 
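A term bank along the lines of Figure 2 can be sketched as a small lookup table. The code below is illustrative only: the function names, and the reading of the code hierarchy as digit prefixes, are our assumptions rather than a description of the VDAB resource:

```python
# Rows mirror Figure 2: neutral code, then French, Dutch and English names.
# A code may occur in several rows, one per synonym.
TERM_BANK = [
    (91205, "chef de cuisine", "chef-kok", "chief cook"),
    (91236, "cuisinier de regime", "dieetkok", "diet cook"),
    (91241, "commis de cuisine", "keukenhulp", "kitchen assistant"),
    (91241, "commis de cuisine", "keukenpersoneel", "kitchen staff"),
]
LANG_COLUMN = {"fr": 1, "nl": 2, "en": 3}

def code_for(name, lang):
    """Map a job title in any supported language to its neutral code."""
    col = LANG_COLUMN[lang]
    for row in TERM_BANK:
        if row[col] == name:
            return row[0]
    return None  # unknown term: the caller falls back to the lexicon

def names_for(code, lang):
    """All names (standard term plus synonyms) for a code in one language."""
    col = LANG_COLUMN[lang]
    return [row[col] for row in TERM_BANK if row[0] == code]

def in_class(code, class_prefix):
    """Hierarchical search: a code belongs to a more general class when it
    shares that class's digit prefix (e.g. all 912xx cooking jobs)."""
    return str(code).startswith(class_prefix)
```

A query for a chef in general can thus be expanded to every code sharing the relevant prefix, whatever language the ad was originally drafted in.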
Of more interest perhaps is vocabulary which can be structured, since this provides us with an opportunity to allow more sophisticated searching of the database.</Paragraph> <Paragraph position="8"> One example is types of establishment, e.g. hotel, restaurant, cafe, pub etc. Although such terms do not necessarily figure in recognized terminological thesauri, it is obvious that some structure can be imposed on these terms, for example to enable a user who is looking for a job in an eating establishment to be presented with jobs in a variety of such places. Some hierarchies are trivially simple, for example full-time/part-time. A more interesting example is geographical location. Most job ads express the location of the work either explicitly or implicitly in the contact address. But often, these locations are the names of towns or districts, whereas a user might want to search for jobs in a wider area: a user looking for work in Flanders, for example, should be presented with jobs whose location is identified as Antwerp. This is not as simple as it seems, however, since the kind of &quot;knowledge&quot; implicated in this kind of search facility is (literally!) &quot;real-world knowledge&quot; rather than linguistic knowledge: short of coding an entire gazetteer on the off-chance that some place-name appeared in a job ad, we must rather rely on the user trials envisaged later in our project to identify the extent to which geographical information needs to be included in the system.</Paragraph> </Section> <Section position="2" start_page="271" end_page="271" type="sub_section"> <SectionTitle> 3.3 Lexicon </SectionTitle> <Paragraph position="0"> Not all the vocabulary that the system needs to recognize and handle can be structured in the way just described, so we recognize a second type of lexical resource which, for want of a better term, we call simply &quot;the lexicon&quot;. 
These are words which we often find in job ads, associated with specific slots, which we would like to translate if possible, but which do not have the status of terms, since they are neither structured nor standardized. Examples are adjectives used to describe suitable applicants (e.g. young, energetic, experienced), phrases describing the location (e.g. busy, near the seaside) or the employer (e.g. world-famous) and so on.</Paragraph> <Paragraph position="1"> Job ads that appear in newspapers and journals can be roughly classified according to their length (short, medium, long) with slightly different lexical and syntactic features accordingly (Alexa &amp; Bárcena, 1992), the details of which need not concern us here. Some of the phrases found in typical job ads serve to signal specific slots (e.g. EMPLOYER:NAME is seeking JOB-TITLE), but these linguistic items do not appear in the lexicon as such.</Paragraph> <Paragraph position="2"> Such elements are regarded as being properly part of the analysis and generation modules, and we describe below how they are handled there.</Paragraph> </Section> </Section> <Section position="6" start_page="271" end_page="272" type="metho"> <SectionTitle> 4 Analysis </SectionTitle> <Paragraph position="0"> The system design permits users offering jobs to submit job ads via an e-mail feed more or less without restrictions. The system converts these texts as far as possible into schematic representations which are then stored in the jobs database. The analysis technique that we have chosen to implement falls into the relatively new paradigm of analogy- or example-based processing. In the following paragraphs we explain the analysis process and discuss our reasons for preferring this over a more traditional string matching or parsing approach.</Paragraph> <Paragraph position="1"> The input that the TREE system will accept is partially structured, but with much scope for free-text input. 
One possible way of analysing this would be to employ a straightforward pattern-matching approach, searching for &quot;trigger phrases&quot; such as EMPLOYER:NAME is seeking JOB-TITLE, with special processors for analysing the slot-filler portions of the text. This simple approach has certain advantages over a more complex approach based on traditional phrase-structure parsing, especially since we are not particularly interested in phrase-structure as such.</Paragraph> <Paragraph position="2"> Furthermore, there is a clear requirement that our analysis technique be quite robust: since the input is not controlled in any way, our analysis procedure must be able to extract as much information as possible from the text, but seamlessly ignore - or at least allocate to the appropriate &quot;unanalysable input&quot; slot - the text which it cannot interpret.</Paragraph> <Paragraph position="3"> However, both these procedures can be identified as essentially &quot;rule-based&quot;, in the sense that the linguistic data used to match, whether fixed patterns or syntactic rules, must be explicitly listed in a kind of grammar; this implies a number of disadvantages, which we will mention shortly. An alternative is suggested by the paradigm of &quot;example-based&quot; processing (Jones, 1996), now becoming quite prevalent in MT (Sumita et al., 1990; Somers, 1993), though in fact the techniques are very much like those of the longer established paradigm of case-based reasoning.</Paragraph> <Section position="1" start_page="272" end_page="272" type="sub_section"> <SectionTitle> 4.1 A flexible approach </SectionTitle> <Paragraph position="0"> In the example-based approach, the &quot;patterns&quot; are listed in the form of model examples. Semi-fixed phrases are not identified as such, nor are there any explicit linguistic rules. 
Instead, a matcher matches new input against a database of already (correctly) analysed models, and interprets the new input on the basis of a best match (possibly out of several candidates); robustness is inherent in the system, since &quot;failure&quot; to analyse is relative.</Paragraph> <Paragraph position="1"> The main advantage of the example-based approach is that we do not need to decide beforehand what the linguistic patterns look like. To see how this works to our advantage, consider the following.</Paragraph> <Paragraph position="2"> Let us assume that our database of already analysed examples contains an ad which includes the following: Knowledge of Dutch an advantage, and which is linked to a schema with slots filled roughly as follows:
SKILLS:LANGUAGE:LANG: nl
SKILLS:LANGUAGE:REQ: &quot;an advantage&quot;
Now suppose we want to process ads containing the following texts: Knowledge of the English language needed. (1) Some knowledge of Spanish would be helpful. (2) Very good knowledge of English. (3) In the rule-based approach, we would probably have to have a &quot;rule&quot; which specifies the range of (redundant) modifiers (assuming our schema does not store explicitly the level of language skill specified), that fillers for the REQ slots can be a past-participle, a predicative adjective or a noun, and are optional, and so on. Such rules carry with them a lot of baggage, such as optional elements, alternatives, restrictions and so on. The biggest baggage is that someone has to write them.</Paragraph> <Paragraph position="3"> In the example-based approach, we do not need to be explicit about the structure of the stored example or the inputs. We need to recognize Dutch, English and Spanish as being names of languages, but these words have &quot;terminological status&quot; in our system. 
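As a toy illustration of this idea (and only an illustration: the real matcher, described in 4.2, works over tagged text with scored partial matches), slot filling by analogy with the stored example might look as follows in Python; all names here are invented:

```python
# The stored, correctly analysed model and the term bank it relies on.
LANGUAGE_TERMS = {"Dutch": "nl", "English": "en", "Spanish": "es"}
MODEL = {"text": "Knowledge of Dutch an advantage",
         "slots": {"SKILLS:LANGUAGE:LANG": "nl",
                   "SKILLS:LANGUAGE:REQ": "an advantage"}}

def analyse(words):
    """By analogy with MODEL: a word with terminological status (a language
    name) fills the LANG slot; the unmatched tail after it is taken to be
    the REQ clarification, even if the system cannot translate it."""
    slots = {}
    for i, raw in enumerate(words):
        word = raw.strip(".,!")
        if word in LANGUAGE_TERMS:
            slots["SKILLS:LANGUAGE:LANG"] = LANGUAGE_TERMS[word]
            tail = [t.strip(".,!") for t in words[i + 1:]
                    if t.strip(".,!") not in ("language", "")]
            if tail:
                slots["SKILLS:LANGUAGE:REQ"] = " ".join(tail)
            break
    return slots

print(analyse("Some knowledge of Spanish would be helpful.".split()))
```

On inputs (1)-(3) above, the language name is coded while the tail lands in REQ as an untranslated string, mirroring how unseen material degrades gracefully rather than blocking analysis.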
If the system does not know would be helpful, it will guess that it is a clarification of the language requirement, even if it may not be able to translate it. Furthermore, we can extend the &quot;knowledge&quot; of the system simply by adding more examples: if they contain &quot;new&quot; structures, the knowledge base is extended; if they mirror existing examples, the system still benefits since the evidence for one interpretation or another is thereby strengthened.</Paragraph> </Section> <Section position="2" start_page="272" end_page="272" type="sub_section"> <SectionTitle> 4.2 The matching algorithm </SectionTitle> <Paragraph position="0"> The matcher, which has been developed from one first used in the MEG project (Somers et al., 1994), processes the new text in a linear fashion, having first divided it into manageable portions, on the basis of punctuation, lay-out, formatting and so on.</Paragraph> <Paragraph position="1"> The input is tagged, using a standard tagger, e.g. (Brill, 1992). There is no need to train the tagger on our text type, because the actual tags do not matter, as long as tagging is consistent.</Paragraph> <Paragraph position="3"> The matching process then involves &quot;sliding&quot; one phrase past the other, identifying &quot;strong&quot; matches (word and tag) or &quot;weak&quot; (tag only) matches, and allowing for gaps in the match, in a method not unlike dynamic programming. The matches are then scored accordingly. The result is a set of possible matches linked to correctly filled schemas, so that even previously unseen words can normally be correctly assigned to the appropriate slot.</Paragraph> <Paragraph position="4"> The approach is not without its problems. For example, some slots and their fillers can be quite ambiguous: cf. moderate German required vs. 
tall German required (!), while other text portions serve a dual purpose, for example when the name of the employer also indicates the location. However, the possibility of on-line or e-mail feedback to the user submitting the job ad, plus the fact that the matcher is extremely flexible, means that the analysis module can degrade gracefully in the face of such problems.</Paragraph> </Section> </Section> <Section position="7" start_page="272" end_page="273" type="metho"> <SectionTitle> 5 Query engine </SectionTitle> <Paragraph position="0"> The query engine takes users' specifications of their employment interests to identify those job ads held in the database that match their specification. Input is provided from an HTML form consisting of a number of fields which correspond to job-schema object attributes (e.g. job-title, location etc.). Data entered for any given object attribute is then encoded in the same format used to encode job ad information. Since both (searchable) job ad information and query data are represented in a language-independent format, matches will be made regardless of the language in which the data was entered.</Paragraph> <Paragraph position="1"> Symbolic case-based reasoning techniques are used to quantify the extent to which users' queries match database objects, allowing the &quot;ranking&quot; of query results.</Paragraph> <Section position="1" start_page="272" end_page="272" type="sub_section"> <SectionTitle> 5.1 Encoding data </SectionTitle> <Paragraph position="0"> Input entered by the user must be encoded using the same method adopted by the analysis module.</Paragraph> <Paragraph position="1"> There are two means by which this can be achieved.</Paragraph> <Paragraph position="2"> One method is to restrict the options available to the user for any given field to a number of possible values for a given object attribute (i.e. provide the user with a Boolean choice). 
The alternative is to allow users to enter a string which is passed to the terminology module to retrieve the appropriate code. If the string does not return a code, it is considered invalid and the user is requested to enter an alternative.</Paragraph> </Section> <Section position="2" start_page="272" end_page="273" type="sub_section"> <SectionTitle> 5.2 Applying case-based reasoning </SectionTitle> <Paragraph position="0"> User-entered information is used to construct a job-schema object which can be considered as the user's &quot;ideal&quot; job. Symbolic case-based reasoning techniques are then applied to quantify the difference between the user's ideal job and jobs held within the database in order to identify those jobs most closely resembling the user's ideal job.</Paragraph> <Paragraph position="1"> The purpose of using case-based reasoning techniques is to quantify the difference (as a metric value) between any two instances of a job-schema object. That object must be capable of being defined by one or more parameters, with the further requirement that comparison operations upon any two parameter values must yield a numeric value reflecting the semantic difference between the values.</Paragraph> <Paragraph position="2"> Thus, objects can be seen as being located within an n-dimensional parameter space where n is the number of defining parameters of the object.</Paragraph> <Paragraph position="3"> The parameters which are used to define job ads for TREE are given by the job schema definition, described above. The distance between two values for a specific parameter will be dependent upon the method of encoding but any distance function δ for a given parameter must define the geometric distance between its two arguments (Salzberg &amp; Cost, 1993).</Paragraph> <Paragraph position="4"> That is: a value must have a distance of zero to itself (4), a positive distance to all other values (5), distances must be symmetric (6) and must obey the triangle inequality (7). 
A further proviso is added that the maximum difference between any two parameter values must be 1, which ensures that all parameters have an equivalent maximal difference (8).</Paragraph> <Paragraph position="5"> δ(a, a) = 0 (4)
δ(a, b) &gt; 0 for a ≠ b (5)
δ(a, b) = δ(b, a) (6)
δ(a, b) + δ(b, c) ≥ δ(a, c) (7)
δ(a, b) ≤ 1 (8)</Paragraph> <Paragraph position="6"> For example, a distance function for the job-title parameter (as represented by job-title codes illustrated in Figure 2) could be given by (9),</Paragraph> <Paragraph position="7"> δ(a, b) = f(|a − b|) / n (9)</Paragraph> <Paragraph position="8"> where a and b are job codes, f(x) returns the number of digits of its argument, and n is the number of digits in the job codes (i.e. n = 5). δ(a, b) evaluates to 1 if the job code arguments differ on the first digit, 0.8 if they differ on the second digit and so on. The job codes are hierarchically ordered so job-title codes that differ over the first digit will refer to greatly different jobs. As such we can see that this parameter distance function would reflect common-sense judgements on the associated job-titles.</Paragraph> <Paragraph position="9"> The total distance between any two job instances is simply the sum of the individual parameter distances and is given by (10),</Paragraph> <Paragraph position="10"> Δ(A, B) = δ1(a1, b1) + δ2(a2, b2) + ... + δN(aN, bN) (10)</Paragraph> <Paragraph position="11"> where Δ is the instance distance function, δi is the distance function for parameter i, N is the total number of parameters by which A and B are defined, and ai and bi are the values of parameter i for instances A and B respectively.</Paragraph> <Paragraph position="12"> Equation (10) provides a measure of the total distance between two instances by summing the distances between all the constituent parameters. Using (10) and a set of parameter distance functions that conform to the properties given as (4)-(8), it is possible to quantify the difference between any job-schema instance held in the database and the &quot;ideal&quot; job-schema object specified by the user. 
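The job-title distance just described (1 if the codes differ on the first digit, 0.8 on the second, and so on), together with the summed instance distance, can be written down directly. The sketch below is our code, not the system's; it assumes five-digit codes and skips unspecified parameters, since those match every value:

```python
def f(x):
    """Number of digits of x, the f of the text; f(0) is taken as 0 so that
    identical codes get distance zero."""
    return len(str(x)) if x else 0

def job_title_distance(a, b, n=5):
    """delta(a, b) = f(|a - b|) / n: codes differing on the first digit are
    maximally distant (1.0), on the second digit 0.8, and so on."""
    return f(abs(a - b)) / n

def instance_distance(ideal, candidate, distance_fns):
    """Total distance: sum the per-parameter distances, skipping parameters
    the user left unspecified in the ideal job."""
    return sum(fn(ideal[p], candidate[p])
               for p, fn in distance_fns.items()
               if ideal.get(p) is not None)
```

Ranking the retrieved jobs is then just a matter of sorting candidates by their distance to the user's ideal schema.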
Those parameters for which no value has been specified will exactly match every possible parameter value, and as such the database search is only constrained by those values which users enter.</Paragraph> <Paragraph position="13"> Since information on job ads is represented in a language-independent format, a search profile in one language will retrieve job ad information entered in any of the languages supported. Database queries are conducted by matching the &quot;ideal&quot; job as specified by the user against job-schemas held in the database.</Paragraph> <Paragraph position="14"> The matching process yields a numeric result representing the &quot;distance&quot; between two objects. Identified jobs can then be ranked according to how closely they resemble the user's ideal job. The results of a database query are then fed to the generation module for subsequent presentation in the language specified by the user.</Paragraph> <Paragraph position="15"> Future plans include increasing the number of fields over which the search can be conducted and permitting users to specify the relative importance of each parameter to the search. The query interface will also keep a record of user &quot;profiles&quot;, so that regular users can repeat a previous search the next time they use the system.</Paragraph> </Section> </Section> <Section position="8" start_page="273" end_page="274" type="metho"> <SectionTitle> 6 Generation </SectionTitle> <Paragraph position="0"> The purpose of the TREE generator module is to generate HTML documents in different languages from job database entries (i.e. filled or partially filled schemas), on demand. For several reasons, the approach to generation adopted in the TREE system can be termed &quot;integrated&quot;. First, it integrates canned text, templates, and grammar rules into a single grammar formalism. Second, it integrates conditions on the database with other categories in the bodies of grammar rules. 
Third, it integrates the generation of sentences and the generation of texts and hypertexts in a simple, seamless way. Finally, generation involves just one single, efficient process which is integrated in the sense that no intermediate structures are created during processing.</Paragraph> <Section position="1" start_page="273" end_page="274" type="sub_section"> <SectionTitle> 6.1 Formalism </SectionTitle> <Paragraph position="0"> In our integrated approach to generation, a grammar rule has the format (11), C0/S0 --&gt; SS1, ..., SSn # Conditions (11) where each SSi has the format Ci, the format Ci/Si, or the format [W1,...,Wn]. Here, Ci denotes a syntactic category, Si denotes a semantic value, and Wi a word. The slash symbol &quot;/&quot; is used to separate the syntax from the semantics. The symbol &quot;#&quot; separates the grammar body from a set of conditions on the database. If the set of conditions is empty, the symbol &quot;#&quot;, and what follows it, may simply be omitted.</Paragraph> <Paragraph position="1"> 6.2 Canned text, templates, or grammar? 
Suppose a system &quot;knows&quot; something, on which we want it to report; suppose it knows that both the Cafe Citrus and the Red Herring Restaurant want to hire chefs, facts which could be captured by the following (logical interface to the) job database: item(e1, x1, y1).</Paragraph> <Paragraph position="2"> job(y1, 91202).</Paragraph> <Paragraph position="3"> company(x1, 'Cafe Citrus').</Paragraph> <Paragraph position="4"> item(e2, x2, y2).</Paragraph> <Paragraph position="5"> job(y2, 91202).</Paragraph> <Paragraph position="6"> company(x2, 'Red Herring Restaurant').</Paragraph> <Paragraph position="7"> We can imagine setting up our system in such a way that when the system sees facts of this kind, a rule such as the following: s/E --&gt; ['Cafe Citrus', advertises, as, vacant, a, position, as, chef] # {item(E, X, Y), job(Y, 91202), company(X, 'Cafe Citrus')} will be triggered, and the system will produce the sentence Cafe Citrus advertises as vacant a position as chef. This is a canned-text approach. It is trivial to implement, but the disadvantage is, of course, that we would have to store one rule for each utterance that we would like our system to produce. As soon as a sentence must be produced several times with only slight alterations, a template-based approach is more appropriate. Let us modify the above rule as follows: s/E --&gt; pn/X^name(X, C), [advertises, as, vacant, a, position, as, chef]</Paragraph> <Paragraph position="9"> The following rule is needed to tell the system that it is allowed to realize the value of the feature &lt;company&gt; as the value itself (i.e. the value is the name of the company).</Paragraph> <Paragraph position="10"> pn/X^name(X, Name) --&gt; [Name].</Paragraph> <Paragraph position="11"> Thus, here too, given the above job database entry, the sentence Cafe Citrus advertises as vacant a position as chef can be generated. 
Furthermore, Red Herring Restaurant advertises as vacant a position as chef can be generated as well.</Paragraph> <Paragraph position="12"> It is not hard to see that the two rules above form the beginning of a grammar. Such a grammar may be further elaborated as follows:</Paragraph> <Paragraph position="14"> v/Y^X^E^item(E, X, Y) --&gt; [advertises, as, vacant, a, position, as]. Now, the above sentences, plus many other sentences, may be generated, given appropriate database entries.</Paragraph> <Paragraph position="15"> Our approach is based on the idea that canned-text approaches, template-based approaches and grammar-based approaches to natural language generation - while they are often contrasted - may in fact be regarded as different points on a scale, from the very specific to the very general. In a sense, templates are just generalized canned texts, and grammars are just generalized templates. Indeed, the possibility of combining these different modes of generation has recently been highlighted as one of the keys to efficient use of natural language generation techniques in practical applications (van Noord &amp; Neumann, 1996; Busemann, 1996).</Paragraph> </Section> <Section position="2" start_page="274" end_page="274" type="sub_section"> <SectionTitle> 6.3 Processing </SectionTitle> <Paragraph position="0"> Let us now indicate how the rules are meant to be used by the generator module. Traditionally, the process of generation is divided into two steps: generation of message structure from database records (what to say), and generation of sentences from message structures (how to say it). One way of characterizing the integrated approach to generation is to say that we go from database records to sentences in just one step. The process of computing what to say, and the process of computing how to say it, are, in the general case, interleaved processes. 
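To make this one-step, condition-driven picture concrete, the template example of 6.2 can be recast as a toy Python sketch; the fact encoding and helper names are ours, and the actual generator works directly over the rule formalism rather than over functions like these:

```python
# Facts mirror the (logical interface to the) job database of section 6.2.
FACTS = [("item", "e1", "x1", "y1"), ("job", "y1", 91202),
         ("company", "x1", "Cafe Citrus"),
         ("item", "e2", "x2", "y2"), ("job", "y2", 91202),
         ("company", "x2", "Red Herring Restaurant")]

def companies_offering(job_code):
    """The rule's condition set: X such that item(E, X, Y) and job(Y, code)."""
    jobs = {fact[1] for fact in FACTS
            if fact[0] == "job" and fact[2] == job_code}
    firms = [fact[2] for fact in FACTS
             if fact[0] == "item" and fact[3] in jobs]
    return [fact[2] for fact in FACTS
            if fact[0] == "company" and fact[1] in firms]

def realise(job_code, job_name):
    """The rule's body: a proper name (pn) followed by the canned tail,
    emitted once for every database entry satisfying the conditions."""
    return [f"{name} advertises as vacant a position as {job_name}"
            for name in companies_offering(job_code)]

print(realise(91202, "chef"))
```

Both database entries satisfy the same template rule, so the two sentences come out of a single rule without any intermediate message structure being built.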
The process of generating from a set of grammar rules, given a particular job database entry, will simply involve picking the rules the conditions of which (best) match the entry, and using them to generate a document.</Paragraph> </Section> <Section position="3" start_page="274" end_page="274" type="sub_section"> <SectionTitle> 6.4 Generating hypertext </SectionTitle> <Paragraph position="0"> The TREE system provides its output in the form of hypertext. This approach has several advantages: first, as argued by (Reiter &amp; Mellish, 1993), the generation of hypertext can obviate the need to perform high-level text structuring, such as assembling paragraphs into documents. &quot;The basic idea is to use hypertext mechanisms to enable users to dynamically select the paragraphs they wish to read, and therefore in essence perform their own high-level text planning&quot; (Reiter &amp; Mellish, 1993, p.3). Second, but related to the first point, the hypertext capabilities are also a mild form of tailoring to the needs of different users. Users are expected to explore only links containing information that they need.</Paragraph> <Paragraph position="1"> Hypertext is generated by means of rules that are very similar to the grammar rules described above, but are formulated on a meta-level with respect to sentence/text rules. HTML code &quot;wrappers&quot; can be simply generated around the text. It is fairly straightforward to extend the grammar to other HTML constructions, such as headers, styles, lists, and tables. Using such rules in combination with other rules enables us to produce simple HTML documents, or, if required, quite complex and deeply nested documents incorporating links to other ads, or buttons to expand information, or clarify terminology (e.g. to get a definition of an unfamiliar job title).</Paragraph> </Section> </Section> </Paper>