<?xml version="1.0" standalone="yes"?> <Paper uid="W98-1424"> <Title>The Multex generator and its environment: application and development 1</Title> <Section position="2" start_page="230" end_page="231" type="metho"> <SectionTitle> 3. The Meaning Base </SectionTitle> <Paragraph position="0"> Recent advances in Systemic Functional theory attempt to define the entire systemic linguistic model in terms of a relatively small set of theoretical concepts. This set of theoretical concepts, known as the systemic metalanguage, outlines the structure of a linguistic system and provides principles and methodologies for modularizing the linguistic resources, for analysing and interpreting language instances with grammar, and for modelling linguistic processes such as understanding and generation (cf. Matthiessen & Nesbitt 1996, Halliday & Matthiessen in press: Section 1.9, for the conception of the metalanguage). Figure 2 shows a simple taxonomy of the systemic metalanguage. The notion of metalanguage can be usefully applied to NLP systems for two reasons: (1) it provides a comprehensive and theory-motivated map of the resources available in a linguistic system, which resource developers can use to structure and develop fragments of linguistic resources and to reason about their properties; (2) linguistic processes can be defined with respect to the resources they draw on.</Paragraph> <Paragraph position="1"> The meaning base is the implementation of the systemic metalanguage plus the linguistic resources maintained by Multex. It is the linguistic engine of Multex. The metalanguage concepts are made operational by being implemented as Java classes. Access to linguistic resources and all reasoning about them are defined as methods in the classes representing the metalanguage concepts. 
A Java-based Meaning Base Application Programming Interface (MB API), which consists of around 60 metalanguage concepts and over 400 methods, is available for programmers to create NLP processes. In a sense, the systemic metalanguage is the protocol through which an NLP process talks to the meaning base. Linguistic resources are specified in a formalism called the Meaning Base Modelling Language (MBML). When the meaning base is loaded, the linguistic resources are compiled, optimised and stored as objects in the meaning base. Space does not permit us to provide the details of MBML, but we will present some examples.</Paragraph> <Paragraph position="2">
slot(disease :type disease)
slot(cause :type animal)
slot(range :type place)
slot(cases :type human :unify-with victim)
slot(fatality :type human :unify-with victim)
slot(medical-investigate :type investigate)
slot(trend :type disease-outbreak-profile)
ConstrueStrategy Brief-report() (
ideationObj { ID(?general-report) type(addition) slots(:nuclear ?report-incidence :satellite *.trend) }
ideationObj { ID(?report-incidence) type(addition) slots(:nuclear ?report-outbreak :satellite *.medical-investigate) }
This example defines a domain concept called communicable-disease-outbreak. In addition to specifying ISA relations and slots, one can define any number of construe strategies for a domain concept. A construe strategy is a set of parameterized semantic objects that construe a given domain situation as meaning in a specific communicative context. Let's consider another example:</Paragraph> <Paragraph position="4"> Figure 4 defines an interstratal mapping pattern. 
It encodes the following linguistic knowledge: for the register communicable disease report, the conjunctive relation cause-effect at the semantic level can be realized lexicogrammatically as &quot;X causes Y&quot; in English; the temporal and spatial circumstances of the cause event should be realized as the Time-loc and Space-loc functions of the &quot;X causes Y&quot; clause.</Paragraph> <Paragraph position="5"> Figure 5 defines a lexicogrammatical system outbreak. The From-above clause indicates that this system prototypically realizes the semantic concept break-out. The :map-from actor term specifies that the Qualifier function is mapped from the Actor slot of the break-out event (i.e. what breaks out).</Paragraph> <Paragraph position="6"> The meaning base has many other important features, e.g. management of multilingual and multimodal resources. In addition, each metalanguage concept is associated with a visualizer (although this is only partially implemented), which enables a meaning base visualization tool to be easily constructed.</Paragraph> </Section> <Section position="3" start_page="231" end_page="235" type="metho"> <SectionTitle> 4. The Text Planner </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="231" end_page="232" type="sub_section"> <SectionTitle> 4.1 The text planning architecture </SectionTitle> <Paragraph position="0"> The Multex text planner is structured into two layers: the plan control layer and the meaning agent layer. The plan control layer implements a general-purpose constraint-based planner, and the meaning agent layer maintains a number of processes specializing in creating certain kinds of meaning. 
This architecture is inspired by the work on meta-planning (Hayes-Roth & Hayes-Roth 1978, Stefik 1981).</Paragraph> <Paragraph position="1"> The plan control layer consists of the processes for: (1) creating goals to be solved by the meaning agents; (2) spawning, scheduling and starting meaning agents; (3) introspecting on the local plans generated by the meaning agents. Plan introspection includes deciding when the planning should stop, assimilating local plans into a global plan, and posting constraints entailed by sub-plans to the global plan.</Paragraph> <Paragraph position="2"> Here plan and sub-plan refer to a set of partially specified linguistic objects generated by meaning agents.</Paragraph> <Paragraph position="3"> The meaning agent layer consists of a number of meaning agents. A meaning agent is a self-contained process that creates a specific kind of meaning in the form of semantic and lexicogrammatical objects by instantiating resources in the meaning base. Table 4.1 summarises the meaning agents available in Multex as well as the meaning base resources they rely on.</Paragraph> <Paragraph position="5"> In fact, the meaning agents Construe and Realize alone suffice for the whole text generation process, because from a theoretical point of view, text generation consists of exactly two steps: mapping contextual situation to meaning and mapping meaning to wording. The first step is carried out by the Construe agent and the second by the Realize agent. The rest of the meaning agents provide additional functionality designed for specific applications.</Paragraph> <Paragraph position="6"> Moreover, a meaning agent has to implement a protocol, called the meaning agent protocol, in order to be administered by the plan control layer. 
The protocol includes methods for determining whether a goal has been achieved, for inferring more goals to achieve, for searching the meaning base for appropriate resources and for turning them into a plan, etc.</Paragraph> </Section> <Section position="2" start_page="232" end_page="235" type="sub_section"> <SectionTitle> 4.2 An example of text planning </SectionTitle> <Paragraph position="0"> Here we will give a brief example of text planning in Multex. The input to the text generator is provided as a meaning request, which is passed from the information consumer either as a stream or as an object. Figure 6 shows a meaning request in the form of a stream.</Paragraph> <Paragraph position="1"> The space limit does not permit us to give a detailed trace of the text planning process. We can only list some salient points in the generation process in Table 2.</Paragraph> <Paragraph position="3"> spawn a Realize meaning agent for each semantic object in the semantic network.</Paragraph> <Paragraph position="4"> The text planner performs a topological sort on the semantic network so that the less dependent nodes get realized first, e.g. the decision for realizing ?Report-outbreak is made earlier than the decision for realizing ?Disease-outbreak and ?cases-and-fatality. The realization pattern in Figure 4 is instantiated. An &quot;X causes Y&quot; clause is added to the partially generated text.</Paragraph> <Paragraph position="5"> Multex finally generates the following passage from the meaning request in Figure 6: &quot;An outbreak of ebola disease, which was caused by rat, in Kikwit, Zaire has led to 189 cases and 59 deaths in April 1995. The world health organization investigated the disease on 10 May 1995. Incidence of ebola disease increased in 1995.&quot; Multex's generation is robust. 
For example, all the slots in Figure 6, except the disease slot, can be totally or partially omitted, and Multex can still produce coherent text. If all optional slots are missing, Multex generates the text &quot;there is an outbreak of disease&quot;.</Paragraph> <Paragraph position="6"> 5. Multex working with production applications: HINTS Multex has been designed to work together with other NLP systems in an integrated system capable of various &quot;information processing&quot; tasks in addition to generation. One such integrated system is the HINTS system currently being developed by DSTO, Canberra, with contributions by the Systemic Meaning Modelling Group at Macquarie University and by the team working on the Fact Extractor at DSTO, Adelaide. (HINTS may be compared with MAGIC, a system capable of generating multimodal healthcare briefings [McKeown, Jordan & Allen, 1997]; but whereas MAGIC is intended to produce multimodal briefings about particular patients for &quot;time-pressured caregivers&quot;, HINTS is a resource for health officers who monitor communicable diseases around the world based on collected documents.) HINTS is a system developed to process information concerning communicable diseases; it has been designed to help health officers of various kinds cope with the fast flow of information and the daunting demand for regular reports and briefings. HINTS integrates a number of systems that it can call on for different kinds of information processing services. From the point of view of Multex, HINTS constitutes an information production, for which Multex provides a service in the form of multimodal communicable disease reports. In addition, Multex provides a resource that is used by other components of the HINTS system -- the meaning base.</Paragraph> <Paragraph position="7"> Let us describe HINTS first in terms of the general work flow and then discuss its significance for the Multex generator. 
Users interact with HINTS through friendly GUIs, all of which have been designed jointly with representative users. A user will start a HINTS session by retrieving documents according to a certain retrieval template -- at present, this is just a collection of key words. For example, the user might want to retrieve all documents that are concerned with (outbreaks of) Ebola in a certain region over a certain period of time. These documents are retrieved either from an existing collection of documents or from on-line sources via the Internet.</Paragraph> <Paragraph position="8"> Once the relevant documents have been retrieved, the user can ask HINTS for a summary. The summarizer that HINTS calls upon at present operates at fairly low levels of abstraction; it relies on aspects of the layout of a document (e.g. the subject header of e-mail messages), on paragraph initial placement, on conjunctive markers such as in summary, and the like. It does not engage in any lexicogrammatical or semantic processing of the texts.</Paragraph> <Paragraph position="9"> The user can also ask HINTS to extract &quot;facts&quot; from the collection of documents. HINTS uses the Fact Extractor (FE) developed by Peter Wallis and his team (e.g., Wallis & Chase, 1997). FE operates with a set of templates for extracting information about communicable diseases. These templates include dates, locations, cases and disease outbreaks. They consist of slots or roles that have to be filled by FE with values extracted from the collection of documents. They are all derived from Multex's meaning base and are represented within FE by means of regular expressions. 
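As a rough illustration of slot-based templates backed by regular expressions, the following minimal Python sketch fills a form from a sentence. It is only a sketch under stated assumptions: the slot names echo the MBML slots shown earlier (disease, range, cases, fatality), but the patterns, the function name fill_template and the sample sentence are hypothetical and are not taken from FE.

```python
import re

# Hypothetical extraction template. Slot names follow the MBML example in the
# text; the patterns are illustrative and are NOT the expressions used by FE.
TEMPLATE = {
    "disease": re.compile(r"outbreak of (\w+)", re.IGNORECASE),
    "range": re.compile(r"\bin ([A-Z]\w+(?:, [A-Z]\w+)*)"),
    "cases": re.compile(r"(\d+) cases"),
    "fatality": re.compile(r"(\d+) deaths"),
}


def fill_template(text, template=TEMPLATE):
    """Return a form mapping each slot to its first match, or None if absent."""
    form = {}
    for slot, pattern in template.items():
        match = pattern.search(text)
        form[slot] = match.group(1) if match else None
    return form


# Invented sample sentence, loosely modelled on the passage Multex generates.
report = "An outbreak of Ebola in Kikwit, Zaire has led to 189 cases and 59 deaths."
form = fill_template(report)
print(form)
```

Slots with no match are left as None rather than raising an error, which mirrors the observation above that Multex can still produce coherent text when slots other than disease are omitted.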
Once FE has extracted the relevant values, it fills in &quot;forms&quot; based on the templates, and if the user wants to generate reports based on the information extracted, a meaning request is generated from the templates and passed over to Multex.</Paragraph> <Paragraph position="10"> Multex then construes the information in terms of its domain model of communicable diseases. Since the templates used by FE are derived from the Multex meaning base, all the information they provide can be classified according to existing domain types. However, Multex will have to draw on domain knowledge to expand the information to the point where it can support generation. Once Multex has processed the information it receives from FE, it starts the incremental generation process sketched above. This will include not only decisions controlling Multex's generation process but also opportunities for the user to include quoted material from any of the documents that have been retrieved and to add his/her own text. The latter option means that users can add information that embodies a fair amount of interpretation. In the register of communicable disease reports, this information has an interpersonal orientation: either it represents the user's expert evaluation of the information produced automatically by Multex ('how common?, how likely?') or it represents the user's recommendation (actions that should be taken by health authorities based on the information produced by Multex). This is a case where the prototypical Construe meaning agent is not adequate; a VisualConstrue meaning agent is hence supplied to meet the additional demand of HINTS. When the user is satisfied that the generation process has finished, Multex produces a document in HTML format, which is handed over to a browser for display.</Paragraph> <Paragraph position="11"> As this brief description indicates, HINTS is an interesting, information-rich environment for exploring multimodal generation. 
In particular, it is worth noting that Multex receives information that has been extracted from written documents, but it produces presentations that may include charts and labelled maps. For example, Multex is able to retrieve a relevant map from the meaning base based on the spatial information in the meaning request and then label some hot spots on the map with the text fragments it produces. It is also worth noting that in the HINTS environment, the process of generation is very much a collaborative effort. The user exercises control over information sources and s/he can make decisions during the incremental generation process. This means that in this environment Multex functions as a writer's tool, and it can be used in preparing regular briefings or web pages. Further, although Multex's multilingual capability is not presently deployed in HINTS, Multex is able to generate the multimodal reports in languages other than English. For example, users could extract information from English documents and use Multex to generate a multimodal report in Chinese. This capability can be of considerable value, as is demonstrated by the TREE project (Somers et al., 1997): the TREE system can search the Internet for job ads and then summarize these in various languages.</Paragraph> </Section> </Section> </Paper>