<?xml version="1.0" standalone="yes"?> <Paper uid="C86-1061"> <Title>TAILORING IMPORTANCE EVALUATION TO READER'S GOALS: A CONTRIBUTION TO DESCRIPTIVE TEXT SUMMARIZATION</Title> <Section position="4" start_page="0" end_page="256" type="metho"> <SectionTitle> BASIC ARCHITECTURE OF THE EVALUATOR </SectionTitle> <Paragraph position="0"> The importance evaluator is one of the fundamental subsystems of SUSY, and it is specifically devoted to the task of ranking different parts of a text according to their importance (Fum, Guida, and Tasso, 1985b).</Paragraph> <Paragraph position="1"> Several reasons support the choice of implementing the importance evaluator by means of a rule-based approach (Waterman and Hayes-Roth, 1978). First of all, the knowledge involved in the process of importance evaluation is multiple and heterogeneous: linguistic knowledge (both structural and semantic), world knowledge (including both common sense and domain-specific knowledge), knowledge about the reader's goals, and meta-knowledge about how to use linguistic knowledge, world knowledge, and goals in the process of importance evaluation. Second, the concept of importance seems to escape a simple, explicit, algorithmic definition. 
A conceptual unit of a text can be considered important, for example, because it helps in understanding discourse coherence, or because it relates to the discourse topic or topic-focus articulation of discourse, or because it refers to semantically important concepts in the subject domain, or, finally, because it refers to a given reader's goal.</Paragraph> <Paragraph position="2"> A rule-based approach comprising a set of rules that can assign relative importance values to the different conceptual units of a text therefore seems more viable than a traditional deterministic solution, as it can supply all the conceptual and computational tools needed for taking into account, in a flexible and natural way, the variety of knowledge sources and processing activities that are involved in importance evaluation.</Paragraph> <Paragraph position="3"> The overall architecture of the evaluator is shown in Figure 1.</Paragraph> <Paragraph position="4"> Basically, it consists of the standard modules of a rule-based system (Waterman and Hayes-Roth, 1978) with the addition of a specialized module, namely the goal interpreter, devoted to taking the reader's goal into consideration.</Paragraph> <Paragraph position="5"> The evaluator receives as input the internal representation of a natural language text (supplied by another SUSY subsystem, namely the parser) expressed in the ELR (Extended Linear Representation) formalism (Fum, Guida, and Tasso, 1984), and an explicit representation of a goal to be taken into account for importance evaluation. (*) also with: Dipartimento dell'Educazione, Università di Trieste, Italy. (+) also with: Progetto di Intelligenza Artificiale, Dipartimento di Elettronica, Politecnico di Milano, Italy. (') also with: CISM - International Center for Mechanical Sciences, Udine, Italy. It produces as output a new 
representation called HPN (Hierarchical Propositional Network), where integer importance values are assigned to the basic conceptual units of the ELR (concepts and propositions), in such a way as to account for the different importance of the constituents of the text.</Paragraph> <Paragraph position="6"> Two main knowledge bases are available to the evaluator: - the importance rule base, which contains knowledge on the mechanisms that are supposed to be used by human readers in evaluating importance, expressed through IF-THEN rules; - the encyclopedia, which contains specific world knowledge on the subject domain (mostly of a structured, taxonomic, descriptive nature), represented through a network of frames.</Paragraph> <Paragraph position="7"> The IF-part of a rule contains conditions that are evaluated with respect to the current HPN (initially the ELR) contained in the working memory.</Paragraph> <Paragraph position="8"> The THEN-part specifies either an importance evaluation action or an action to be performed to further the analysis (e.g., a strategic choice concerning rule activation, a criterion to solve conflicting evaluations, the activation of a frame of the encyclopedia, etc.). Both parts of a rule may refer to frames of the encyclopedia.</Paragraph> <Paragraph position="9"> The importance evaluation action contained in the THEN-part of a rule usually takes the form of an assignment of an importance value to a concept or proposition of the ELR. Such an assignment may be absolute (e.g., w(X)=9) or relative (e.g., w(X)=w(X)-3). 
Successive assignments to a given concept or proposition are not executed directly; they are stored in a list together with the numbers of the rules from which they originate, and only at the end of the importance evaluation activity are they considered globally in order to obtain a unique importance value.</Paragraph> <Paragraph position="10"> The importance rule base includes several classes of rules which account for the different skills used in importance evaluation. Namely (Fum, Guida, and Tasso, 1985b): - referential-structural (RS) rules derive importance values from the structure of references among conceptual units of the text, - rhetoric-structural ('1S) rules derive importance relations from rhetoric predicates of the ELR, - structural-semantic (SS) rules rely on the analysis of specific structural features of the text that have a definite semantic role, such as ISA relations and macro-predicates of the ELR, - semantic-encyclopedic (SE) rules refer to world knowledge contained in the encyclopedia, - explicit evaluation (EE) rules take into account explicit statements concerning importance sometimes purposely inserted in the text by the author, and, finally, - metarules (MT) embody strategic knowledge that concerns reasoning about importance rules and their use.</Paragraph> <Paragraph position="11"> The encyclopedia is the second knowledge source employed by the evaluator, and it contains domain-specific knowledge in the form of frames. The frames of the encyclopedia embody, in addition to a header, two kinds of slots: - knowledge slots, which contain domain-specific knowledge, represented in a form homogeneous with the ELR language; - reference slots, containing pointers to other frames that deal with related topics in the subject domain.</Paragraph> <Paragraph position="12"> The operation of the evaluator obeys the basic recognize-act cycle of a rule-based system. 
More specifically, it is controlled by a forward chaining mechanism which continually updates the working memory, thus transforming it into the final HPN form. The matcher is responsible for recognizing ELR patterns in the working memory which satisfy the IF-part of importance rules. The IF-part of a rule may contain a specific reference to the goal interpreter when it is necessary to evaluate the relevance of a given concept or proposition (belonging to the ELR or to the encyclopedia) to the current goal. In such a case the matcher resorts to the goal interpreter, which is able to compute the required relevance degree, expressed through a real value in the range (0,1).</Paragraph> <Paragraph position="13"> When the conflict set has been identified, the conflict resolutor selects the unique rule whose THEN-part will later be executed. System operation ends when the conflict set is empty, i.e. when all available resources for importance evaluation have been used. Several strategies are utilized for performing conflict resolution, and they basically obey two paradigms, namely refraction and ordering (Brownston, Farrell, Kant, and Martin, 1985).</Paragraph> <Paragraph position="14"> Refraction implies that a rule cannot be executed more than once on the same data. A single rule can, however, be fired several times on different data during the process of importance evaluation. In fact, the ELR can contain several instances of the patterns conforming to the specific criteria of importance evaluation captured by a rule.</Paragraph> <Paragraph position="15"> Rule ordering implies that a weight (an integer value) is attached to each importance rule in such a way as to define an ordering relation among rules. During conflict resolution this ordering is used in two different ways: for selecting the rule with the highest weight, or for discarding rules below a given threshold. 
Weights are initially assigned statically and are later updated dynamically at run-time. The static ordering is provided when a rule is created, thus encoding general selection criteria regarding the priority of using some rules rather than others at the beginning of the importance evaluation process. Dynamic updating of weights later allows the evaluator to conform to different conflict resolution criteria, adapting its behavior to the actual course of the step-wise transformation of the ELR into the HPN. Null weights are utilized to prevent the possible unwanted execution of a rule. Each time the evaluator is utilized on a new text, the weights are reset to their original static values.</Paragraph> </Section> <Section position="5" start_page="256" end_page="257" type="metho"> <SectionTitle> ROLE AND REPRESENTATION OF GOALS </SectionTitle> <Paragraph position="0"> It is apparent that the reader's goals have a major role in evaluating the importance of a written text. Goals that exist a-priori in the reader's mind, i.e.</Paragraph> <Paragraph position="1"> before reading a text, can affect his judgemental activity in two quite distinct ways. First, the existence of goals can trigger an evaluation mechanism that tends to identify as important those parts of the text which are relevant to the current goal (goal-directed evaluation). As goal-directed evaluation strategies coexist with other strategies which are independent of the existence of goals, it is necessary that they fit together appropriately in such a way as to achieve a correct balance between goal-dependent and goal-independent judgements. 
Second, goals can have a major role in directing the retrieval and use of encyclopedic knowledge relevant to the current importance evaluation activity (selective focusing).</Paragraph> <Paragraph position="2"> In fact, a human reader generally utilizes a lot of specific world knowledge when evaluating the importance of a text, and the recall from long-term memory of the pieces of knowledge to be used in a given context is often triggered by his a-priori goals. In this case, goals do not directly contribute to the importance evaluation process, but can affect it in an indirect way through identification of pertinent world knowledge to be used by other goal-independent importance evaluation strategies.</Paragraph> <Paragraph position="3"> Using goals in importance evaluation poses two classes of problems to the system designer: - How to represent goals? - How to match goals with pieces of the ELR or encyclopedia for implementing the mechanisms of goal-directed evaluation and selective focusing? The former of the above points will be dealt with in the remainder of this section, while the latter will be the subject of the next section.</Paragraph> <Paragraph position="4"> Several kinds of reader's goals are possible with respect to their generality, level of abstraction, articulation of content, richness of details, etc. It is apparent that goals, according to their different nature, may range from a light emphasis of the reader's intentions to a quite specific query. The more precise and articulated the goals are, the more focused the attention is on well defined and specific objects, and, accordingly, the more appropriate and useful goal-dependent importance evaluation strategies become. 
Moreover, as goals become more and more specific and rich, importance evaluation tends to mingle with information retrieval and question-answering.</Paragraph> <Paragraph position="5"> As a basic design choice, we restrict our attention (at least for the moment) to classes of goals which are reasonably general and simple (but not necessarily explicit, univocal, clear, or easy to interpret!), in such a way as to keep the focus on importance evaluation without intermixing our model too much with different issues, such as information retrieval and question-answering.</Paragraph> <Paragraph position="6"> This decision has heavy implications for the design of the goal representation language. Although in principle nothing less than the full ELR formalism should be used for expressing goals, we restrict our attention to a largely simplified subset of it.</Paragraph> <Paragraph position="7"> Let us consider a goal vocabulary (GV) containing a collection of key-words relevant to a given subject domain, and assume as an adequate representation for a goal a propositional expression over GV made up using and, or, and not connectives. Note that the goal vocabulary GV may be redundant, i.e. it may contain several words that refer to partially overlapping concepts. Also it is implicitly assumed that the general topic of discourse is fixed and always tacitly understood: words of GV only specify a facet, a viewpoint or a detail of interest, but they cannot change or modify (e.g., through limitations or specifications) the topic of discourse. 
Moreover, it is assumed that the size of GV can be kept reasonably small, although each time a new interesting concept has to be included in the coverage of the goal representation language, GV must be enlarged accordingly.</Paragraph> </Section> <Section position="6" start_page="257" end_page="258" type="metho"> <SectionTitle> INTERPRETING GOALS </SectionTitle> <Paragraph position="0"> Having introduced a representation language to be used for specifying goals, we tackle in this section the problem of how to obtain goal-dependent importance evaluation.</Paragraph> <Paragraph position="1"> The first possibility that comes to mind consists in labeling a-priori each frame of the encyclopedia with words of the goal vocabulary GV and then matching the words used for expressing goals with such labels, taking appropriately into account the logical connectives and, or, and not. This solution is only possible for the encyclopedia, as it is quite impossible to label a-priori unknown pieces of ELR. It shares some basic features with the approach proposed by DeJong (1979), which assumes an a-priori definition of the concept of importance coded into an appropriate set of scripts. This solution, however, has several shortcomings: (a) it is rigid, as any change in GV requires the labeling of the encyclopedia to be changed accordingly; (b) it hides the reasons and criteria adopted for labeling the frames of the encyclopedia, thus preventing any further use of this information (for example, in generating justifications of the evaluation produced); (c) it makes the encyclopedia heavily dependent on the specific use of importance evaluation and on the particular goal vocabulary currently considered. 
A better solution, which can cover both cases of comparison with the encyclopedia and ELR and does not require any preliminary labeling procedure, is direct matching of words of GV appearing in the goal specification with frames of the encyclopedia or pieces of the ELR, taking appropriately into account the meaning of logical connectives. This possibility, however, would also be largely unsatisfactory. It does not allow taking into account, for example, the diversity of terminology, i.e. the fact that the same concept may be referred to by means of different words in the goal specification and in the piece of knowledge to be matched; it does not allow dealing with concepts at different levels of abstraction, and, more importantly, it does not allow expressing different degrees of relevance.</Paragraph> <Paragraph position="2"> The above analysis of the inadequacies of some preliminary design proposals allows stating the following requirements for the goal interpreter: - it should keep the goal vocabulary GV and the encyclopedia independent of each other, so that changes in the former do not affect the latter, and they can be designed and updated separately; - it should support an explicit representation of the conceptual connection between the goal vocabulary GV and the encyclopedia and ELR, so that the role of goals in importance evaluation can be easily controlled by the system designer and, if necessary, explained and justified to the user; - it should allow dealing with diversity of terminology, expression, context, and level of abstraction; - it should allow dealing with a full range of relevance degrees.</Paragraph> <Paragraph position="3"> We propose here a first step towards the design of a goal interpreter satisfying the above mentioned requirements. 
Such an interpreter takes as input a goal specification, expressed in the goal representation language, and a fragment of ELR taken from the internal representation of the text or from the frames of the encyclopedia. Its task is to compute the relevance degree of the ELR fragment according to the given goal. To this purpose, the goal interpreter utilizes a referential knowledge base, i.e., a semantic network whose nodes are either atomic concepts that represent basic items in the subject domain or definitional concepts, i.e., structures that are used in order to define the meaning of atomic concepts. The arcs of the network connect pairs of nodes linked together by some conceptual relation such as synonymy, antonymy, generalization, specification, definition, attribute, etc. Each arc is tagged by a label, indicating the conceptual relation linking the two concepts, and by a real number in the range (0,1) which represents the relation degree that characterizes the link between the two concepts. The referential knowledge base represents the main knowledge source utilized by the goal interpreter in evaluating the relevance degree of an ELR fragment to a given goal. General knowledge regarding the subject domain and, more specifically, concerning the discourse topic is thus wired into the referential knowledge base.</Paragraph> <Paragraph position="4"> The semantic network which constitutes the referential knowledge base is accessed starting from the goal and from the ELR fragment in parallel. At this point a bidirectional search process, aimed at finding a path connecting the two entry points, begins. The search is complicated by the fact that some nodes of the network are constituted by atomic concepts whereas other nodes are definitional. It is possible to proceed from a definitional node onwards if and only if all the concepts constituting the definition can be matched, directly or through intermediate nodes, with ELR expressions. 
The search process terminates when a path connecting the goal and the ELR fragment is found. An appropriate function taking into account the relation degrees of the arcs in the path is computed, and the result represents the relevance degree of the ELR fragment to the given goal. Whenever possible an optimum path (i.e. a path with the highest relevance degree) should be looked for, but such a search generally poses hard problems from a computational point of view.</Paragraph> </Section> <Section position="7" start_page="258" end_page="258" type="metho"> <SectionTitle> AN EXAMPLE </SectionTitle> <Paragraph position="0"> This section presents an example of operation of the goal interpreter. Let us consider the following fragment of text, taken from Christian (1983: 11): &quot;... The UNIX system is a moderately complex operating system. It is far simpler than the operating systems that run on maxicomputers, but it has much more capabilities than most operating systems that run on microcomputers. For example, the UNIX system allows several programs to run simultaneously....&quot; The purpose of this example is to show how the goal interpreter is able to identify that the last sentence of the text is to be considered important if evaluated with reference to the goal USE. By applying usual referential-structural rules, the concept UNIX is stressed as important since it is highly referenced in the text. The ELR representation of the last sentence [...] The rationale behind rule SE26 is that a sentence concerning an important concept is also considered important when its predicate is of kind PERFORM and its second argument (i.e., what is predicated about the important concept) is relevant to the current goal.</Paragraph> <Paragraph position="1"> The first two clauses of the IF-part of the rule match proposition 180, since ALLOW ISA PERFORM and w(UNIX)=high. As far as the third clause is concerned, a deeper analysis involving the goal interpreter is needed. 
More specifically, the relevance degree of proposition 190, which in turn also involves propositions 200, 210, and 220, has to be evaluated with reference to the goal USE. The portion of referential knowledge utilized by the goal interpreter in this specific case is shown in Figure 2. The network is entered through the word USE, corresponding to the goal, and the nodes RUN, PROGRAM, and SIMULTANEOUSLY. By moving through the network from both entries, the path drawn in bold lines in Figure 2 is identified. The definitional node corresponding to the MULTI-TASKING concept is entered from the ELR through multiple (namely, three) arcs. The overall relevance degree of the path is computed by multiplying the relation degrees of its arcs and the result 0.58 is obtained. It should be noted that, among the several arcs entering a definitional node, only the one with the lowest relation degree is considered for the computation. In this way, rule SE26 can be applied and, consequently, the importance value of proposition 180 is set to high.</Paragraph> </Section> </Paper>