<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0915"> <Title>Interpreting Communicative Goals in Constrained Domains using Generation and Interactive Negotiation</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> YOU ARE ALLERGIC TO ASPIRIN in the section </SectionTitle> <Paragraph position="0"> about product warnings.</Paragraph> <Paragraph position="1"> The communicative goal warning about the risk of Reye's syndrome in children is expressed in a long and complex sentence: Children and teenagers should not use this medicine for chicken pox or flu symptoms before a doctor is consulted about Reye syndrome, a rare but serious illness reported to be associated with aspirin. Since no other communicative goal should be in competition with this one in this class of documents when Reye's syndrome is involved, its identification can be quite simple.2 This illustrates that the interpretation of communicative goals within documents in constrained domains may not always require a very fine-grained semantic analysis, and that some indicators can already be quite informative.</Paragraph> <Paragraph position="2"> In general, however, identifying communicative goals and comparing them to predefined communicative goals clearly requires high-level interpretation capabilities, which would normally be those of a domain expert. With document normalization as our target application, we have proposed an approach to extract the communicative content of documents in constrained domains automatically.</Paragraph> <Paragraph position="3"> Because we wanted to obtain a practical normalization system, we further defined an approach that allows a human expert to identify the correct communicative content of a document from the set of hypotheses produced automatically. 
This task should not be confused with text paraphrasing, for example for rewriting into a 2We do not claim that this is necessarily true in expert medical terms. Nonetheless, the normalization model that we used only considered this communicative goal involving</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Reye's Syndrome. Drug Interaction Precautions: </SectionTitle> <Paragraph position="0"> Do not take this product if you are taking a prescription drug for anticoagulation (thinning the blood), diabetes or gout unless directed by a doctor.</Paragraph> <Paragraph position="1"> Warnings: Children and teenagers should not use this medicine for chicken pox or flu symptoms before a doctor is consulted about Reye syndrome, a rare but serious illness reported to be associated with aspirin. Do not take this product if you have asthma, an allergy to aspirin, stomach problems (such as heartburn, upset stomach, or stomach pain) that persist or recur, ulcers or bleeding problems, or if ringing in the ears or a loss of hearing occurs, unless directed by a doctor. Do not take this product for pain for more than 10 days unless directed by a doctor. If pain persists or gets worse, if new symptoms occur, or if redness or swelling is present, consult a doctor because these could be signs of a serious condition. As with any drug, if you are pregnant or nursing a baby, seek the advice of a health professional before using this product. It is especially important not to use aspirin during the last 3 months of pregnancy unless specifically directed to do so by a doctor because it may cause problems in the unborn child or complications during delivery. Keep this and all drugs out of the reach of children. 
In case of accidental overdose, seek professional assistance or contact a poison control center immediately.</Paragraph> <Paragraph position="2"> Alcohol Warning: If you consume 3 or more alcoholic drinks every day, ask your doctor whether you should take aspirin or other pain relievers or fever reducers. Aspirin may cause stomach bleeding.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> WARNINGS </SectionTitle> <Paragraph position="0"> Product warnings. DO NOT TAKE THIS DRUG IF YOU ARE ALLERGIC TO ASPIRIN. Do not take this product for more than 10 days unless directed by a health professional. Consult your doctor if pain persists or gets worse.</Paragraph> <Paragraph position="1"> Alcohol. Do not take alcohol when you take this drug or ask your doctor for an alternative pain reducer. Particular conditions. A doctor should be consulted before taking this drug if you have any of the following conditions: - asthma - stomach problems - ulcers - bleeding problems Children and teenagers. CONSULT A DOCTOR BEFORE ADMINISTERING THIS PRODUCT TO A CHILD OR A TEENAGER, AS IT CAN INCREASE THE RISKS OF A SERIOUS ILLNESS CALLED REYE'S SYNDROME.</Paragraph> <Paragraph position="2"> Pregnancy. Consult a doctor before taking this drug if you are pregnant. Using aspirin during the last 3 months of pregnancy may cause problems to the unborn child or complications during delivery. Overdose. Stop taking this drug immediately and call a poison control center or a health professional if you have taken too much of this drug.</Paragraph> <Paragraph position="3"> controlled language (see e.g. (Nasr, 1996)). 
The main objective of our task is to identify which communicative goals from a given repertoire occur in a document, and to build a well-formed communicative structure that contains them.3 Because the speech acts conveying a communicative goal (such as one that says that a doctor should be consulted before taking a given drug in case of pregnancy) can be performed under a wide range of surface forms, text paraphrasing would have to transform very different surface forms into the same target normalized text.</Paragraph> <Paragraph position="4"> Through document normalization, we want to enforce 4 properties of document well-formedness that should be encodable into the normalization model used: + Well-formedness of the communicative structure of documents: sentences should be well articulated to form a coherent discourse.</Paragraph> <Paragraph position="5"> + Consistency of the communicative content: incompatible communicative goals should not coexist in the same document.</Paragraph> <Paragraph position="6"> + Completeness of the communicative content: communicative content imposed by some communicative goal must be present.</Paragraph> <Paragraph position="7"> + Comprehensibility and coherence of the language used: readers should be able to identify easily the communicative intentions across documents of the same class.</Paragraph> <Paragraph position="8"> Text paraphrasing into a controlled language at the level of the sentence would only enforce the last property, because, although controlled language rules can enforce some level of semantic well-formedness, they cannot guarantee the other three properties.</Paragraph> <Paragraph position="9"> 3It is true, however, that document normalization of a given document with very particular properties relative to a normalization model could be achieved by text paraphrasing at the level of the sentence, but this case is too specific for our purposes.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> 
<SectionTitle> 2 Automatic analysis of the </SectionTitle> <Paragraph position="0"> communicative content of a document in constrained domain Several approaches have already been tried for analyzing the content of documents in constrained domains; they vary in the amount of surface analysis of the text that they perform. One type of approach uses information extraction techniques such as pattern matching that rely on strong predictions on the content and attempt to fill templates derived from a model of the domain (e.g. (Blanchon, 2002)), thus not giving too much importance to syntactic structure. Another type of approach first performs a syntactic analysis of the text, from which semantic dependencies can be extracted. The system presented in (Brun and Hagège, 2003) derives normalized predicates encoding the meaning of documents from semantic dependencies found by a robust parser. This allows obtaining identical semantic interpretations for paraphrases such as ProductX is a colorless, non flammable liquid and ProductX is a liquid that has no colour and that does not burn easily.</Paragraph> <Paragraph position="1"> These approaches require an encoding of templates and extraction or normalization rules that may be difficult to build and to maintain. Furthermore, although they seem appropriate for extracting surface semantic information, interpreting communicative goals using these techniques may be more difficult. Indeed, communicative goals can be expressed with different surface texts carrying semantic differences that may not bear any significance for our purpose and may not always be considered as paraphrases. In the following examples from pain reducer leaflets, it may be acceptable for a particular normalization model to consider the following three sentences as carrying one and the same communicative goal: 1. This product should not be taken for more than 14 days without first consulting a health professional.</Paragraph> <Paragraph position="2"> 2. 
If pain persists after 14 days, consult your doctor before taking any more of this product.</Paragraph> <Paragraph position="3"> 3. If symptoms persist for 2 weeks, stop using this product and see a physician.</Paragraph> <Paragraph position="4"> In order to be able to identify communicative goals, we believe that it is important to consider them within a well-formed communicative structure. Therefore, we think that the central objects for analysis should be well-formed descriptions of document communicative content4, as it may be counterproductive to spend too much effort on the fine-grained analysis of surface text. If semantic dependencies can be expressed in these descriptions, then the space of possible contents will filter out incompatible communicative goals and thus disambiguate without always requiring a more fine-grained semantic analysis.</Paragraph> <Paragraph position="5"> We have proposed an approach for the deep content analysis of documents in constrained domains, fuzzy inverted generation (Max and Dymetman, 2002). Well-formed document content representations are produced for the class of the input document. From these representations, normalized texts are generated, and a score of semantic similarity taking into account common descriptors is computed between the normalized texts and the text of the input document. The underlying hypotheses are, as stated earlier, that considering well-formed content representations can restrict the space of the communicative goals to consider, and that the presence of informative textual indicators can help identify communicative goals.</Paragraph> <Paragraph position="6"> However, since the space of content representations is potentially huge, a heuristic search can be performed to find the candidate representations with the best global scores. 
Moreover, in order to better cover the space of possible texts, the generation of the text can be done non-deterministically, so that several texts will compete over the input document from the same content representation.</Paragraph> <Paragraph position="7"> Figure 3 shows how several texts produced from a content representation can span several documents from the space of possible texts. The content representation that corresponds to the text with the 4This is under the assumption that the input documents are semantically well-formed and complete, but if they are not then the model used can indicate for what reasons they are ill-formed, and document normalization can be used to correct those documents so that they become valid relative to the normalization model.</Paragraph> <Paragraph position="8"> highest similarity score with the input document is then considered to be the most likely candidate.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Interactive validation of the correct </SectionTitle> <Paragraph position="0"> communicative content Relying solely on information retrieval techniques to associate a normalized content representation with an input document is unfortunately unlikely to yield good results, even if linguistically-oriented techniques can improve accuracy (Arampatzis et al., 2000). We have advocated an interactive approach to text understanding (Dymetman et al., 2003) where the input text is used as a source of information to assist the user in re-authoring its content. Following fuzzy inverted generation, an interactive negotiation can take place between the system and its hypotheses (the candidate content representations) on the one hand, and a human expert on the other. A naive way would be to let the expert choose which hypothesis is correct based on the normalized text associated with each one of them. But this would be a tedious and error-prone process. 
Rather, underspecifications from analysis can be found by building a compact representation of the candidates, and then used to engage in negotiations over local interpretation issues.</Paragraph> <Paragraph position="1"> Interactive validation with generated texts has already been used in several domains: for example, (Blanchon, 1994) proposed disambiguation dialogues involving reformulations for dialogue-based machine translation; (Overmyer et al., 2001) proposed a text that can be used to inspect the domain object model automatically built from a text describing a software engineering domain model. In the following section, we introduce our implementation of a prototype system for interactive document normalization based on the two presented approaches.</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Interactive document normalization </SectionTitle> <Paragraph position="0"> system Systems implementing controlled document authoring (Hartley and Paris, 1997) are based on an interaction with an author who makes semantic choices that define the content of a document, from which multilingual textual versions can be produced. Therefore, these systems integrate resources that can be used to represent document content and to generate textual versions of the documents. The MDA system developed at XRCE (Dymetman et al., 2000; Brun et al., 2000) uses a formalism inspired by Definite Clause Grammars (Pereira and Warren, 1980) that encodes both the abstract semantic syntax of well-formed documents and the concrete syntax for the documents in several languages.5 MDA grammars contain the definition of semantic objects of a given semantic type, which are used to build typed abstract semantic trees. 
Importantly, the formalism can encode the three levels for a normalization model that we described in our introduction: semantic objects can be of any granularity and can thus be communicative goals; the communicative structure is described by the abstract semantic syntax, which can be used to express semantic dependencies across subtrees; and the text generated is entirely under control, so normalized texts can be associated with communicative goals.</Paragraph> <Paragraph position="1"> formalism of MDA for our implementation. The architecture of our normalization system is shown in Figure 4. Textual descriptors (WordNet synsets in our current implementation) are first extracted from the text of the input document to build the profile of the input document. The MDA grammar used was previously compiled offline in order to associate profiles with each semantic object and type described in the grammar. Fuzzy inverted generation is then performed from the profile of the document and the profiled grammar. Details on the implementation using MDA grammars have been described elsewhere (Max, 2003a; Max, 2003b).</Paragraph> <Paragraph position="2"> The set of abstract semantic trees extracted by fuzzy inverted generation is then used to build a compact representation (a factorized abstract semantic tree) for interactive negotiation with an expert. The output of this phase is a single abstract semantic tree, such as the one shown in Figure 5 that is used for interactive validation. 
The icon represents a semantic object that dominates a semantic subtree containing no underspecifications; the icon represents a semantic object that does not take part in any underspecification, but which dominates a subtree that contains at least one; the icon represents a semantic type that is underspecified, that is, for which at least two semantic objects are in competition; finally, the icon denotes semantic objects in competition, which are ordered for a given type by decreasing score of plausibility.</Paragraph> <Paragraph position="3"> The MDA grammar used for analysis can then be used to produce the text associated with this tree, which corresponds to the normalized version of the input document that was validated by the expert.</Paragraph> <Paragraph position="4"> The interface of our system displays an enumer- null compact representation. They are ordered by decreasing score, where the score can indicate the average score of the objects in competition, or the inverse of the average number of candidates per object in competition. Therefore, the expert can choose to resolve first underspecifications that contain likely objects, or underspecifications that involve few candidates so that the validation of an object will prune more candidates from the compact representations. Clicking on an underspecification in the list triggers a negotiation dialogue similar to that in Figure 6. The semantic type in that dialogue, which specifies how links are shown, is not supported by any evidence in the input document. The expert can, however, choose a value for it.</Paragraph> </Section> </Paper>