File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/82/c82-1020_metho.xml
Size: 11,387 bytes
Last Modified: 2025-10-06 14:11:25
<?xml version="1.0" standalone="yes"?> <Paper uid="C82-1020"> <Title>NATURAL LANGUAGE ACCESS TO STRUCTURED TEXT</Title> <Section position="3" start_page="0" end_page="127" type="metho"> <SectionTitle> THE TEXT ACCESS COMPONENT </SectionTitle> <Paragraph position="0"> In the text access component, a user's request is translated into logical form by SRI's DIALOGIC system, described in another paper submitted to this conference [Grosz et al., 1982]. This logical expression is then turned over to the inferencing component DIANA [Hobbs, 1980], where various discourse problems are solved and a match with the Text Structure is sought.</Paragraph> <Paragraph position="1"> As an illustration of this process, consider the following example query: During what period is immunoprophylaxis appropriate following exposure to type B hepatitis? DIALOGIC translates the request into the following form:</Paragraph> <Paragraph position="3"> That is, during period ?X, the immunoprophylaxis I of X1 against Y, where I follows an exposure event of X2 to hepatitis B, is appropriate.</Paragraph> <Paragraph position="4"> Two kinds of discourse problems are exemplified here. First, there is the problem of determining implicit arguments. We are not told explicitly what 128 J.R. HOBBS et al.</Paragraph> <Paragraph position="5"> immunoprophylaxis is against, only what exposure was to. We need to draw the inference that exposure to something is typically followed by immunoprophylaxis against it. This problem must be solved if we are to retrieve the proper passages on immunization against hepatitis B virus (HBV) rather than some other agent. Similarly, we are not told explicitly that the one who was exposed is the one who will receive immunoprophylaxis, that is, that X1 and X2 are the same individual. The second discourse problem illustrated here is that of metonymy. One may talk about both exposure to HBV and exposure to type B hepatitis. 
In the first case we are talking about exposure to a virus, in the second exposure to a disease. The Text Structure is expressed in canonical predicates in a standardized form, and one of the standardizations is in the class of entities that can be the argument of a predicate. We must decide, for each predicate, the type of arguments it can take. For example, is one exposed to a virus or a disease? For various reasons, we have decided that one is exposed to a virus and not to a disease. Thus the inferencing procedures have to analyze the actual query into one involving exposure to the virus causing type B hepatitis, or to HBV. This coercion is done by accessing information in a knowledge base that &quot;expose&quot; requires a virus as its second argument, that type B hepatitis is caused by HBV, and that HBV is a virus.</Paragraph> <Paragraph position="6"> In order to match the request with the Text Structure, DIANA needs to translate the original request into the canonical predicates in which the Text Structure is expressed. For example, since &quot;immunoprophylaxis&quot; is not one of the canonical predicates, we need to use the axiom IMMUNOPROPHYLAXIS(i, p, v) iff IMMUNIZE(i, p, PROPHYLAXIS(v)) that is, i is an immunoprophylaxis event of p against v if and only if i is an immunization event of p for prophylaxis against v. The result is a translation into the canonical predicates &quot;immunize&quot; and &quot;prophylaxis&quot;, which are used in the summaries of the relevant passages in the Text Structure.</Paragraph> </Section> <Section position="4" start_page="127" end_page="127" type="metho"> <SectionTitle> GENERATING TEXT STRUCTURE </SectionTitle> <Paragraph position="0"> Our work on the automatic generation of the Text Structure is at a more preliminary stage. Automatic summarization is a central aspect of this effort. 
A certain amount of work has been done in artificial intelligence and psychology on the automatic construction of summaries, including work by Rumelhart [1975], Mandler and Johnson [1977], Schank and his colleagues [Schank et al., 1980], and Lehnert et al. [1981]. Most of this work has focused on narratives rather than expository discourse, however.</Paragraph> <Paragraph position="1"> There are two principal techniques that we have brought to bear on the problem.</Paragraph> <Paragraph position="2"> The most important involves a coherence analysis of the paragraph, in a manner described in detail in Hobbs [1976, 1978] and similar to work by Longacre [1976] and Grimes [1975].</Paragraph> <Paragraph position="3"> It can be argued that, in coherent discourse, one of a small number of coherence relations, such as parallel and elaboration, holds between successive segments of the text. The coherence relations can be defined in terms of the inferences that can be drawn from what is asserted by the segments being linked (called the assertions of the segments). Thus, very roughly, two sentences are parallel if their assertions make the same predications about similar entities.</Paragraph> <Paragraph position="4"> These coherence relations allow one to build up a tree-like coherence structure for the whole text recursively, as follows: The coherence relations are defined between segments. A clause (perhaps elliptical) is a segment. When some coherence relation holds between two segments, the two together constitute a composed segment, which can itself be related to other segments of the text.</Paragraph> <Paragraph position="5"> Since the coherence relations are defined in terms of the assertions of segments, we need to specify what the assertions of the composed segments are. For this purpose we use a number of heuristics. 
For example, if two sentences are</Paragraph> </Section> <Section position="5" start_page="127" end_page="127" type="metho"> <SectionTitle> NATURAL LANGUAGE ACCESS TO STRUCTURED TEXT 129 </SectionTitle> <Paragraph position="0"> parallel, it is because the same predication is made about similar entities. Then the assertion of the composed segment makes that same predication about the superset to which the similar entities belong. Thus, every node in the coherence structure has an assertion associated with it. Very frequently the assertion associated with the top node of the coherence structure of a passage can function as the summary of the passage.</Paragraph> <Paragraph position="1"> As an illustration of this technique, consider the following passage: (P1) Blood probably contains the highest concentration of hepatitis B virus of any tissue except liver. Semen, vaginal secretions, and menstrual blood contain the agent and are infective. Saliva has lower concentrations than blood, and even hepatitis B surface antigen may be detectable in no more than half of infected individuals. Urine contains low concentrations at any given time.</Paragraph> <Paragraph position="2"> After a grammatical analysis, the sentences in this passage can be aligned as in Figure 1 (footnote 1). Every clause considers some body material containing HBV in some concentration. They are thus linked by the parallel coherence relation, and the assertion (and the summary) of the passage is as follows:</Paragraph> </Section> <Section position="6" start_page="127" end_page="127" type="metho"> <SectionTitle> CONTAIN (BODY-MATERIAL, HBV, CONCENTRATION) </SectionTitle> <Paragraph position="0"> Many paragraphs we have analyzed in this way turn out to have a parallel structure, and thus their summaries can often be constructed in a similar manner.</Paragraph> <Paragraph position="1"> A second factor must also be taken into account in constructing the summarizations. 
In addition to containing summaries of individual passages, the Text Structure contains a representation of the hierarchical organization of the document as a whole, as well as other aspects of its overall structure. The place of an individual passage within the hierarchical organization constrains what can function as a summary of the passage. A summary must distinguish a passage from other passages at the same level in the hierarchy. Top-down considerations frequently lead us to refine a summary we arrive at solely by the bottom-up coherence analysis.</Paragraph> <Paragraph position="2"> As an example, consider the following passage: (P2) Generally blood donor quality is held high by avoiding commercial donors, persons with alcoholic cirrhosis, and those practicing illicit self-injection. Extremely careful selection of paid donors may provide safe blood sources in some instances.</Paragraph> <Paragraph position="3"> 1 This diagram is similar to the formats developed by Sager and her colleagues [Sager, 1981].</Paragraph> <Paragraph position="4"> A coherence analysis results in the structure shown in Figure 2. &quot;Selection&quot; contrasts with &quot;avoiding,&quot; so we can say that the second sentence expresses an exception to the first conjunct of the first sentence. Because the second sentence is hedged very heavily, the assertion of the composed segment is the assertion of the initial conjunct of the first sentence--&quot;avoid commercial donors.&quot; The three assertions of the first sentence stand in a parallel relation since they imply the same proposition about similar entities. They all imply (trivially) that certain classes of potential donors are to be avoided if blood quality is to be held high. Entities are similar if they share some common and reasonably specific property, that is, if they belong to some common and reasonably small superset. Our three classes of potential donors are similar in that they are all potential donors. 
The similarity would be stronger if there were some more specific property that characterized commercial donors, those with alcoholic cirrhosis, and illicit self-injectors, but there does not seem to be such a property. The most we can say seems to be that they are potential donors, and we arrive at the following assertion for the paragraph as a whole.</Paragraph> <Paragraph position="5"> AVOID (DONOR | CONDITION (DONOR)) However, such a summary fails to distinguish this paragraph from its siblings in the hierarchical structure of the HKB as a whole. The nodes most immediately dominating this section in the hierarchy of the HKB correspond to sections about the quality of blood products under varying conditions, with respect to the risk of hepatitis in transfusion. There are two broad classes of conditions that are discussed: first, conditions characterizing the donor, and second, conditions characterizing the type of blood product. Among the conditions characterizing the donor are a history of hepatitis, recent transfusions, and positive results on serologic tests, as well as the conditions described in the example. Thus, the structure of the summaries in the paragraphs should be something like that shown in Figure 3.</Paragraph> <Paragraph position="6"> It is therefore not sufficient for us to characterize the paragraph as being about avoiding potential donors exhibiting some condition. Thus, top-down considerations lead us to reject the summary we came up with solely by the bottom-up coherence analysis. We need something more specific, and the best we can do is simply to have a disjunction of properties as the condition characterizing the donors:</Paragraph> </Section> </Paper>