<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1417">
  <Title>Sydney, July 2006. © 2006 Association for Computational Linguistics. Generation of Biomedical Arguments for Lay Readers</Title>
  <Section position="5" start_page="114" end_page="116" type="metho">
    <SectionTitle>
3 Domain Model
</SectionTitle>
    <Paragraph position="0"> In a previous study of the corpus (Green, 2005), we identified a small set of categories (e.g. genotype, test result, symptom) with good inter-rater reliability that can be used to describe the biomedical content of a genetic counseling letter as a causal probabilistic network (Korb and Nicholson, 2004). A prototype domain model has been manually constructed covering representative genetic disorders using only these categories of variables.</Paragraph>
    <Paragraph position="1"> By restricting a domain model to these categories, the result should reflect the simplified conceptual model of genetics used by genetic counselors in communication with their lay clients; this facilitates generation since the generator will not have to distinguish what information in the domain model is appropriate to communicate to a lay audience.</Paragraph>
    <Paragraph position="2"> Another benefit of restricting a domain model in this way is that it reduces the knowledge acquisition effort of choosing variables and determining network topology; any genetic disorder in the scope of the coding scheme (over 4500 single-gene autosomal disorders) would be modeled in terms of a small number of variable types and a standard topology. Thus, it should be straightforward to semi-automatically construct a domain model covering many different genetic disorders.</Paragraph>
    <Paragraph position="3"> Figure 1 shows part of a domain model after it has been updated with information about a particular patient's case. The nodes labeled GJB2 (mother), GJB2 (father), GJB2 (child) are genotype variables, representing the mother's, father's, and child's GJB2 genotype, respectively. (A genotype is a pair of alleles of a gene; one allele is inherited from each parent. An individual who has two mutated alleles of the GJB2 gene usually experiences hearing loss.) The nodes labeled hearing loss (child) and non-syndromic (child) are variables representing the child's symptoms. The node labeled test result (child) is a variable representing the results of testing the child's GJB2 genotype.</Paragraph>
    <Paragraph position="4"> The most likely states of the variables are shown beside the nodes in Figure 1; T  and T   represent the time at which the (experts') belief is held, before or after the child's genetic test results are known, respectively. The information recorded in the network about this particular case is that the child was observed to have hearing loss and no features of a genetic syndrome; the preliminary diagnosis, i.e. before testing, was that the cause of hearing loss is having two mutated alleles of GJB2; the test results were negative, however; thus, the current diagnosis is some other (unspecified) autosomal recessively inherited genetic condition, represented by the genotype variable labeled other genotype (child). In addition, the parents are hypothesized to be carriers (i.e. to each have one mutated allele) of that genotype, represented by the variables labeled other genotype (mother), other genotype (father).</Paragraph>
    <Paragraph position="5"> Although a causal probabilistic network used to perform diagnosis or risk calculation would require specification of numeric probabilities, the role of the network in our system is to qualitatively model the reasoning that the medical experts have performed outside of the system. Also, we found that in the corpus numeric probabilities were provided only when citing epidemiological statistics or risks calculated according to Mendelian inheritance theory (which does not require Bayesian probability computation). Thus, instead of using numeric probabilities for domain reasoning, the domain model uses qualitative constraints based upon formal relations of qualitative influence, product synergy, and additive synergy (Druzdzel and Henrion, 1993).</Paragraph>
    <Paragraph position="6"> In addition to being adequate for natural language generation, this approach greatly reduces knowledge acquisition effort; it should be straightforward to semi-automatically acquire the qualitative constraints of a full-scale domain model, given the regularities in this domain and the use of a restricted set of variable types as described above. For example, qualitative constraints between genotypes of parents and child would be determined by whether a genotype follows an autosomal dominant or recessive inheritance pattern.</Paragraph>
    <Paragraph position="7"> We now describe some of the qualitative domain constraints. An influence relation holds between a node in a causal graph and its direct descendant. A has a positive qualitative influence on B, written S+(state(A,2), state(B,yes)). Each arc in Figure 1 implicitly represents an S+ relation.</Paragraph>
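    <Paragraph>The influence relations described above can be pictured as labeled arcs. The following is a minimal sketch, not the system's actual representation: the class name Influence, the helper direct_effects, and the state labels ("2", "yes", "positive") are assumptions made for illustration. Only the three arcs leaving the child's GJB2 genotype that the text names explicitly are listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Influence:
    """A positive qualitative influence (S+) from a node to a direct descendant."""
    cause: str
    cause_state: str   # threshold state of the cause (assumed label)
    effect: str
    effect_state: str  # state of the effect that becomes more likely (assumed label)


# Arcs from the child's GJB2 genotype; "2" stands for two mutated alleles (assumption).
ARCS = [
    Influence("GJB2 (child)", "2", "hearing loss (child)", "yes"),
    Influence("GJB2 (child)", "2", "non-syndromic (child)", "yes"),
    Influence("GJB2 (child)", "2", "test result (child)", "positive"),
]


def direct_effects(node: str) -> list[str]:
    """Return the direct descendants that `node` positively influences."""
    return [arc.effect for arc in ARCS if arc.cause == node]


print(direct_effects("GJB2 (child)"))
# ['hearing loss (child)', 'non-syndromic (child)', 'test result (child)']
```

Keeping the constraint purely qualitative means no conditional probability tables are needed; an arc records only the direction of the influence.</Paragraph>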
    <Paragraph position="8"> Product and additive synergy describe converging connections, i.e., the relation between a set of variables {A, B} and their direct descendant C in a graph. A and B have negative product synergy with respect to state V_C [...] makes it more likely that the state of C reaches V_C. This type of relationship characterizes mutually exclusive alternative diagnoses that could account for the same symptom; it also characterizes autosomal dominant inheritance, an inheritance pattern where inheriting one mutated allele of a genotype (from either parent) is usually sufficient to cause health problems. In Figure 1, the possible alternative causes of the symptoms are indicated by the X- annotations. On the other hand, autosomal recessive inheritance, an inheritance pattern where inheriting two mutated alleles (one from each parent) is usually necessary to cause health problems, is characterized by zero product synergy (X0) [...] makes it more likely that the state of C reaches V_C. For example, if the mother's, father's, and child's genotypes are represented by variables A, B, and C, respectively, then</Paragraph>
    <Paragraph position="10"> [...] represent the constraint that if the child's genotype C has two mutated alleles, then one mutated allele must have come from each parent. In Figure 1, the autosomal recessive inheritance pattern of GJB2 and of the other hypothesized genetic disorder is indicated by the X0 annotations.</Paragraph>
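    <Paragraph>The autosomal recessive constraint stated above (a child with two mutated alleles must have received one from each parent) can be checked by enumeration. This is a hedged sketch of the Mendelian reading of the constraint, not the formal definition of zero product synergy; the function names are hypothetical, genotypes are encoded as counts of mutated alleles (0, 1, or 2), and new mutations are assumed not to occur.

```python
from itertools import product


def possible_child_counts(mother: int, father: int) -> set[int]:
    """Mutated-allele counts a child can inherit, given each parent's count (0, 1 or 2)."""
    def can_pass(count: int) -> set[int]:
        # 0 mutated alleles: passes none; 2: always passes one; a carrier may pass either.
        return {0: {0}, 1: {0, 1}, 2: {1}}[count]
    return {m + f for m, f in product(can_pass(mother), can_pass(father))}


def parents_allowing_affected_child() -> list[tuple[int, int]]:
    """Parent genotype pairs for which a child with two mutated alleles is possible."""
    return [
        (m, f)
        for m, f in product(range(3), repeat=2)
        if 2 in possible_child_counts(m, f)
    ]


print(parents_allowing_affected_child())
# [(1, 1), (1, 2), (2, 1), (2, 2)]
```

Every pair in the result has at least one mutated allele in each parent, which is the constraint the paragraph describes.</Paragraph>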
    <Paragraph position="11"> Other qualitative constraints used in the domain model are based on negative qualitative influence [...] probability statements composed of variables used in the network, e.g., the frequency of hearing loss due to GJB2. This type of information can be used as backing in an argument (see section 5) but does not play a role in domain reasoning.</Paragraph>
  </Section>
  <Section position="6" start_page="116" end_page="116" type="metho">
    <SectionTitle>
4 Discourse Grammar
</SectionTitle>
    <Paragraph position="0"> A discourse grammar was written based upon our analysis of the corpus and a description of standard practice in genetic counseling (Baker et al., 2002).</Paragraph>
    <Paragraph position="1"> The current grammar is intended to cover letters on single-factor autosomal genetic disorders. Thanks to the regularities in this domain and in this genre, the grammar consists of a small number of rules.</Paragraph>
    <Paragraph position="2"> The starting rule of the grammar represents the main sections of a letter in their standard order: opening, referral, preliminary diagnosis, testing, final diagnosis, origin of genetic condition, inheritance implications, prognosis/treatment, and closing. One or more grammar rules describe each of these sections.</Paragraph>
    <Paragraph position="3"> Grammar rules may request the domain reasoner for case-specific information to be included in the letter. In addition, when the grammar provides a choice of rules, rule selection is based upon case-specific information provided by the domain reasoner. For example, one rule for reporting the final diagnosis handles cases in which the patient's test results confirm the preliminary diagnosis, and another rule those cases where the preliminary diagnosis has been disconfirmed by test results; the domain reasoner returns the information needed to choose between those two rules.</Paragraph>
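    <Paragraph>The starting rule and the choice between the two final-diagnosis rules might be sketched as follows. This is an illustration under stated assumptions: the rule names, the dictionary key test_confirms_preliminary_diagnosis, and the functions are hypothetical and are not taken from the system; only the section names and their order come from the text.

```python
# Main sections of a letter in their standard order, as listed in the text.
LETTER_SECTIONS = [
    "opening", "referral", "preliminary diagnosis", "testing", "final diagnosis",
    "origin of genetic condition", "inheritance implications",
    "prognosis/treatment", "closing",
]


def select_final_diagnosis_rule(case: dict) -> str:
    """Pick a final-diagnosis rule from case information supplied by the domain reasoner."""
    if case["test_confirms_preliminary_diagnosis"]:
        return "final_diagnosis_confirmed"      # hypothetical rule name
    return "final_diagnosis_disconfirmed"       # hypothetical rule name


def outline(case: dict) -> list[str]:
    """Expand the starting rule, substituting the selected final-diagnosis rule."""
    return [
        select_final_diagnosis_rule(case) if section == "final diagnosis" else section
        for section in LETTER_SECTIONS
    ]


# The case in section 3: the negative GJB2 test disconfirmed the preliminary diagnosis.
print(outline({"test_confirms_preliminary_diagnosis": False})[4])
# final_diagnosis_disconfirmed
```

In this sketch the grammar itself stays domain-independent; all case-specific knowledge enters through the dictionary returned by the domain reasoner.</Paragraph>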
    <Paragraph position="4"> The process described so far creates an initial outline of the information to be presented (in non-linguistic form), including various claims requiring an argument. Each of those claims is passed to the argument generator described in the next section.</Paragraph>
    <Paragraph position="5"> For example, the letter shown in Figure 2 contains seven claims labeled C  . The information returned by the argument generator is added to the outline, completing the structure that will be transformed by the linguistic realizer into text.</Paragraph>
  </Section>
  <Section position="7" start_page="116" end_page="117" type="metho">
    <SectionTitle>
5 Argument Generation
</SectionTitle>
    <Paragraph position="0"> Given a claim, the argument generator uses argument strategies to construct a normative argument for the claim from information provided by the domain reasoner. The strategies are non-domain-specific in the sense that they refer to formal properties of the qualitative causal probabilistic domain model rather than to genetics.</Paragraph>
    <Paragraph position="1"> According to Toulmin's model of normative argument structure (1998), an argument for a claim can be analyzed in terms of various functional components: the data, warrant, and backing. The data are the facts used to defend a claim. The warrant is a principle that licenses the claim given the data. An optional backing may be used to justify the warrant, e.g., by giving the facts upon which the warrant is based. To derive the argument strategies used in the system, we analyzed the arguments in the corpus in terms of Toulmin's model; the resulting strategies describe mappings from formal properties of the domain model to the data and warrant supporting a claim and to the backing of a warrant. Several strategies are paraphrased below for illustration.</Paragraph>
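    <Paragraph>Toulmin's components map naturally onto a small record type. The sketch below is an assumption about how such an argument could be stored, not the system's data structure; the class name Argument and the method outline are hypothetical. The example instance paraphrases the argument about the negative GJB2 test result given in section 6.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Argument:
    """Toulmin-style argument: data and warrant support a claim; backing is optional."""
    claim: str
    data: list[str]
    warrant: str
    backing: Optional[str] = None

    def outline(self) -> list[str]:
        """List the components in order, omitting an absent backing."""
        parts = [f"claim: {self.claim}"]
        parts += [f"data: {item}" for item in self.data]
        parts.append(f"warrant: {self.warrant}")
        if self.backing is not None:
            parts.append(f"backing: {self.backing}")
        return parts


example = Argument(
    claim="It is unlikely that the child's GJB2 genotype has two mutated alleles",
    data=["The child's GJB2 test results were negative"],
    warrant="Positive influence from the child's GJB2 genotype to the test result",
)
print(len(example.outline()))
# 3
```

Leaving backing empty by default mirrors its optional status: it would be filled in only when the warrant itself needs justification.</Paragraph>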
    <Paragraph position="2"> Strategy 1. Argument for belief in causal claim, based on effects: An argument for the claim that it is believed to some extent at time T [...]</Paragraph>
    <Paragraph position="4"> Strategy 2. Argument for decrease in belief to unlikely that state of causal variable is at or over threshold value, based on absence of predicted effect: An argument for the claim that there has been a decrease in belief, from time T [...]</Paragraph>
    <Paragraph position="6"> [...] ) hold, and the (newly acquired) data that it is unlikely that [...]sibility, based on effect. An argument for the claim that it is believed to some extent at time T [...]</Paragraph>
  </Section>
  <Section position="8" start_page="117" end_page="117" type="metho">
    <SectionTitle>
6 Example
</SectionTitle>
    <Paragraph position="0"> This section gives an example of discourse generation for the case in section 3. An outline created by application of the discourse grammar to the domain model in Figure 1 would contain, in addition to basic information about the case not requiring an argument, several claims requiring further support to be provided by the argument generator.</Paragraph>
    <Paragraph position="1"> First, the claim that it was believed, before testing, that the child's hearing loss could be due to having two mutated alleles of GJB2 would be supported by an argument constructed using Strategy 1. The data of the argument is the presupposition that the child has hearing loss and the additional finding that she has no syndromic features. The warrant is the positive influence relations (S+) linking the variable representing the child's GJB2 genotype to each of the two variables representing the child's symptoms. Note that if a reader questioned this argument, an interactive system could provide information on the source of the data or epidemiological statistics backing the warrant.</Paragraph>
    <Paragraph position="2"> Second, the claim that it is currently believed, after testing, that it is unlikely that the child's GJB2 genotype has two mutated alleles would be supported by an argument constructed using Strategy 2. The data of the argument is that the child's GJB2 test results were negative. The warrant is the positive influence relation (S+) from the child's GJB2 genotype to the child's GJB2 test results, which predicts that if the child had this mutation, then the test results would have been positive. If a reader questioned this argument, an interactive system could provide information on the source of the data or back the warrant by providing information about the rate of false negatives.</Paragraph>
    <Paragraph position="3"> Third, the claim that it is currently believed, after testing, that it is possible that the child has some other genetic condition that is responsible for her hearing loss would be supported by an argument constructed using Strategy 3. The data of the argument is that she has hearing loss and the current belief that GJB2 is not likely responsible. The warrant is the negative product synergy relation</Paragraph>
    <Paragraph position="5"> (X-) between the child's GJB2 genotype and another genotype to hearing loss. If a reader questioned this argument, an interactive system could provide information on the proportion of cases of hearing loss that are due to other genetic conditions as backing for the warrant.</Paragraph>
    <Paragraph position="6"> Fourth, the claim that it is currently believed, after testing, that it is possible that the parents are carriers (i.e., each has one mutated allele) of the unspecified genotype claimed to be responsible for the child's hearing loss would be supported by an argument constructed using Strategy 4. The data of the argument is the presupposition that the child has two mutated alleles of the other genotype. The warrant is the zero product synergy relation (X0) between the two parents' genotypes for this alternative to GJB2 and the child's genotype for this same alternative. If a reader questioned this argument, an interactive system could provide an explanation of the warrant, which is based on the theory of Mendelian inheritance; or it could provide the argument for the data, i.e., the belief that the child has two mutated alleles of the other genotype.</Paragraph>
    <Paragraph position="7"> Finally, the claim that it is currently believed, after testing, that, assuming they are both carriers, there is a 25% probability that each future child that the two parents have together will inherit two mutated alleles of the other genotype would be supported by an argument constructed by a strategy not shown in section 5. The data is the assumption that the parents are both carriers, and the warrant is the same zero product synergy relation</Paragraph>
    <Paragraph position="9"> [...] If a reader questioned this argument, an interactive system could provide an explanation of how the probabilities are determined by zero product synergy.</Paragraph>
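    <Paragraph>The 25% figure in the final claim follows from Mendelian inheritance and can be reproduced by counting equally likely allele combinations. The sketch below is a worked illustration only; the names CARRIER and probability_two_mutated are hypothetical, and it assumes each parent passes either of their two alleles with equal probability.

```python
from fractions import Fraction
from itertools import product

# A carrier has one mutated and one normal allele of the gene.
CARRIER = ("mutated", "normal")


def probability_two_mutated(mother=CARRIER, father=CARRIER) -> Fraction:
    """Probability that a child inherits a mutated allele from both parents."""
    outcomes = list(product(mother, father))  # one allele from each parent
    affected = [pair for pair in outcomes if pair == ("mutated", "mutated")]
    return Fraction(len(affected), len(outcomes))


print(probability_two_mutated())
# 1/4
```

Each child is an independent draw, which is why the claim is phrased per future child rather than for the family as a whole.</Paragraph>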
  </Section>
</Paper>