<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0202"> <Title>Learning to Identify Student Preconceptions from Text</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Text Assessment Rule Language </SectionTitle> <Paragraph position="0"> The language we use to describe assessment rules consists of several types of constraints that a text must satisfy in order to match a rule. The constraints are applied on a word-by-word basis to the text being tested.</Paragraph> <Paragraph position="1"> The most basic constraint is the term. A term is any string of alphanumeric characters (typically a single word).</Paragraph> <Paragraph position="2"> A term abstraction is defined as any regular expression that can be applied to and match a single word (i.e., one that contains no whitespace). However, we primarily use term abstractions to represent lists of words that are considered interchangeable for the purposes of pattern matching: any term that matches any of the words in a term abstraction matches the term abstraction. Term abstractions typically represent semantic classes that can usefully be grouped together, synonyms that students tend to use interchangeably, or words a teacher might substitute for a keyword in a question.</Paragraph> <Paragraph position="3"> Term abstractions are created manually.</Paragraph> <Paragraph position="4"> An ordering constraint is a requirement that two terms (or term abstractions) occur in a particular order. An ordering constraint can also carry an optional distance requirement, which limits the maximum number of intervening terms that may occur between the two required terms.</Paragraph> <Paragraph position="5"> Finally, any number of constraints can be combined in a conjunction.
The conjunction requires all of its constituent constraints to be met.</Paragraph> <Paragraph position="6"> For example, the requirement that &quot;fall&quot; come before a class of words indicating greater speed, such as &quot;faster&quot;, &quot;quicker&quot;, etc., with at most two intervening words (e.g. &quot;a lot&quot;), and that the string also contain the word &quot;gravity&quot;, would appear as follows.</Paragraph> <Paragraph position="7"> fall ≺_2 TA_fast ∧ gravity where ≺_2 is an ordering constraint requiring that its arguments occur in the specified order, with at most two words separating them, and TA_fast is a term abstraction covering the set of words &quot;faster&quot;, &quot;quicker&quot;, etc.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Relationship to Regular Expressions </SectionTitle> <Paragraph position="0"> The text assessment rule language corresponds to a subset of the regular expressions. Terms are translated into regular expressions in a straightforward manner, with the term followed by one or more non-word separator characters. A term abstraction is simply an alternation over a set of terms. Ordering constraints are achieved by concatenation. If a distance requirement is present, it can be represented with a regular expression matching a single, arbitrary word, repeated the appropriate number of times using the {min,max} convention for constrained repetition. The conversion of conjunctions requires a potentially exponential expansion in the size of the regular expression, as each possible ordering of the conjuncts must be represented as a separate alternative in an alternation. The rule shown above can be represented by the following regular expression.
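As a sketch of this translation (the helper names and word lists below are illustrative assumptions, not the paper's code), the rule fall ≺_2 TA_fast ∧ gravity can be compiled in Python roughly as:

```python
import re

WORD = r"\w+"            # a single arbitrary word
SEP = r"(?:\W+|$)"       # one or more separator characters, or end of text

def term(t):
    # a term: the word itself followed by a separator
    return r"\b" + t + SEP

def term_abstraction(words):
    # a term abstraction: an alternation over interchangeable words
    return r"\b(?:" + "|".join(words) + r")" + SEP

def gap(n):
    # at most n intervening words, via {min,max} constrained repetition
    return r"(?:%s%s){0,%d}" % (WORD, SEP, n)

ANY = r"(?:%s%s)*" % (WORD, SEP)     # any number of intervening words

TA_fast = term_abstraction(["faster", "quicker"])
core = term("fall") + gap(2) + TA_fast          # fall with TA_fast at most 2 words later

# Conjoining "gravity" forces the exponential expansion: every position
# it can occupy becomes a separate alternative in an alternation.
alternatives = [
    term("gravity") + ANY + core,                                # before "fall"
    core + ANY + term("gravity"),                                # after TA_fast
    term("fall") + gap(0) + term("gravity") + gap(1) + TA_fast,  # between, first
    term("fall") + gap(1) + term("gravity") + gap(0) + TA_fast,  # between, second
]
rule = re.compile("|".join(alternatives), re.IGNORECASE)
```

The four alternatives make the exponential blow-up concrete: each admissible placement of the conjoined term must be spelled out separately.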
</Paragraph> <Paragraph position="2"> That is, a regular expression matching strings in which &quot;fall&quot; appears before either &quot;faster&quot; or &quot;quicker&quot; with at most two intervening words, and in which &quot;gravity&quot; may appear before &quot;fall&quot;, after &quot;faster&quot; or &quot;quicker&quot;, or as one of the words between them.</Paragraph> <Paragraph position="3"> [Figure 1: The generalization lattice for the Text Assessment Rule Language. Note that this represents the rules that can be obtained by successively generalizing from an example with just two terms, and is only a portion of the entire generalization lattice. For example, Term1 ≺ Term3 is more general than the initial hypothesis and more specific than Term1 alone, but unordered with respect to all the hypotheses shown between those two.]</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Learning Text Assessment Rules </SectionTitle> <Paragraph position="0"> The text assessment rule learner is based on Mitchell's version spaces algorithm (Mitchell, 1982). In that framework, the set of all hypotheses consistent with the examples seen so far is represented and updated as new examples arrive. To represent the potentially large number of consistent hypotheses efficiently, the hypothesis space is organized into a lattice: a partial ordering over hypotheses, usually defined in terms of generality. This allows the set of all consistent hypotheses to be represented by storing just its boundaries, that is, the most general and most specific consistent hypotheses. Each time a new positive example is presented, any hypothesis in the specific boundary set that is inconsistent with that example is replaced by the most specific generalization that covers the new example. Conversely, a negative example causes hypotheses in the general boundary set to be minimally specialized to exclude the example.
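As a concrete miniature of this update (assuming, for brevity, hypotheses that are bare conjunctions of required words, with ordering and distance constraints omitted; this illustrates the mechanism, not the paper's implementation):

```python
def words(text):
    return frozenset(text.lower().split())

def covers(hyp, text):
    return hyp <= words(text)

def positive_update(S, G, example):
    # generalize inconsistent specific hypotheses minimally: drop the
    # terms that do not occur in the new positive example
    S = [h & words(example) for h in S]
    # prune general hypotheses that fail to cover the positive example
    G = [g for g in G if covers(g, example)]
    return S, G

def negative_update(S, G, example):
    # discard specific hypotheses that wrongly match the negative example
    S = [h for h in S if not covers(h, example)]
    # minimally specialize matching general hypotheses by adding one
    # required word licensed by the specific boundary
    G = ([g | {w} for g in G if covers(g, example)
          for h in S for w in h - words(example)]
         + [g for g in G if not covers(g, example)])
    return S, G

# seed with a first positive example
S = [words("heavy objects fall faster")]
G = [frozenset()]                 # the empty conjunction matches anything
S, G = positive_update(S, G, "heavy things fall faster")
S, G = negative_update(S, G, "light objects fall slowly")
```

After these two updates the specific boundary retains only the words common to both positive examples, while the general boundary has split into minimal specializations that each exclude the negative example.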
If the specific and general boundary sets ever cross, the version space is said to collapse. In order to implement this algorithm, a generalization hierarchy must be defined over the language being learned.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Generalization Hierarchy </SectionTitle> <Paragraph position="0"> The version spaces algorithm requires a partial order over hypotheses. The Text Assessment Rule Language generalization hierarchy is shown in figure 1.</Paragraph> <Paragraph position="1"> The figure shows the possible generalization steps that may be taken when an initial example consisting of two words is presented. If a subsequent example contains both words at a greater distance, the distance constraint may be relaxed. If the distance passes a fixed threshold, the distance constraint is removed completely. An example containing both words in the opposite order causes the ordering constraint to be replaced by a conjunction.</Paragraph> <Paragraph position="2"> Given a conjunction, examples containing only some of the conjuncts result in the removal of those that do not occur. If an example does not contain a term that appears in a rule, but does contain another term covered by the same term abstraction, the term in the rule is replaced with the term abstraction.</Paragraph> <Paragraph position="3"> The initial, most specific hypothesis that matches a given example is the conjunction of the pairwise ordering constraints over all pairs of words in the example. Starting from that initial hypothesis, the generalization process can traverse the partial lattice shown in figure 1 for each of these pairwise ordering constraints separately.</Paragraph> <Paragraph position="4"> Generalization of terms to term abstractions can also occur at any time. For example, &quot;A1 B C&quot; results in the hypothesis A1 ≺_0 B ∧ A1 ≺_1 C ∧ B ≺_0 C.
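The construction of this initial hypothesis can be sketched as follows, encoding each pairwise constraint a ≺_d b as a tuple (a, d, b), where d is the number of allowed intervening words (the tuple encoding is ours, chosen for illustration):

```python
from itertools import combinations

def initial_hypothesis(example):
    # most specific hypothesis: the conjunction of pairwise ordering
    # constraints over all pairs of words in the example
    ws = example.split()
    return {(a, j - i - 1, b)
            for (i, a), (j, b) in combinations(enumerate(ws), 2)}

def matches(hyp, text):
    # check a conjunction of ordering constraints against a text,
    # assuming for simplicity that each word occurs at most once
    pos = {w: i for i, w in enumerate(text.split())}
    return all(a in pos and b in pos
               and pos[a] < pos[b] <= pos[a] + d + 1
               for (a, d, b) in hyp)
```

By construction, the initial hypothesis matches the example it was built from but fails on any text that stretches or reorders the word pairs.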
If the next example is &quot;C A2 D B&quot; and A1 and A2 are both in the term abstraction TA_A, the result is the hypothesis C ∧ TA_A ≺_1 B. Thus the conversion of A1 to a term abstraction, the relaxation of the distance requirement between TA_A and B, and the removal of the ordering constraints on C all happen simultaneously.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Disjunctions and Negative Examples </SectionTitle> <Paragraph position="0"> The Text Assessment Rule Language is not disjunctive, but it is reasonable to expect that students may express the same concept in a variety of ways. For example, a student with an improper understanding of the law of gravity might state that a big block will fall faster than a small block, or that the small block will fall more slowly than the big block. Merely ignoring the order of &quot;big&quot; and &quot;small&quot;, or creating a term abstraction to match both &quot;fast&quot; and &quot;slow&quot;, will not work: the concept is essentially disjunctive. To handle this situation, we use a technique we call lazy disjunction. We maintain a list of version spaces, each one essentially representing one disjunct. When a new example is encountered, we attempt to add it to each version space in turn. If any version space can incorporate the example without collapsing, that version space is updated. If no such version space can be found, we create a new one and seed it with the example. Thus we only create disjunctions when no other form of generalization is available. This technique is similar to one used by Baltes (1992), who allows at most three disjuncts and starts generalizing after the third example.</Paragraph> <Paragraph position="1"> He uses a similarity metric to determine which disjunct to generalize for subsequent examples.</Paragraph> <Paragraph position="2"> One disadvantage of lazy disjunction is that it is order dependent.
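The lazy disjunction loop just described can be sketched as follows, over a deliberately simplified version space that keeps only the set of words common to a disjunct's examples and collapses when that set would become empty (a simplifying assumption made for brevity):

```python
def words(text):
    return frozenset(text.lower().split())

def lazy_disjunction(examples):
    disjuncts = []                    # one simplified version space each
    for ex in examples:
        for i, d in enumerate(disjuncts):
            if d & words(ex):         # can generalize without collapsing
                disjuncts[i] = d & words(ex)
                break
        else:                         # no disjunct fits: seed a new one
            disjuncts.append(words(ex))
    return disjuncts
```

A new disjunct is created only when an example shares nothing with every existing disjunct, mirroring the "only when no other form of generalization is available" policy.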
If two examples can be generalized, they will be. That generalization excludes from the resulting hypothesis, H, any terms that do not appear in both examples. A later example containing one of those terms may then fail to generalize with H, even though it shares terms with one of the examples that led to H. This order dependence can be problematic. Essentially, generalization continues until an example is seen that has no terms in common with the terms shared by all prior examples, since shared terms would allow further generalization. At that point, a new disjunct is created and the process continues. The result is that learned rules have disjuncts containing only one or two very common words.</Paragraph> <Paragraph position="3"> While we eliminate stop words in preprocessing, there remain common content words that appear in many examples but do not relate to the concept we are trying to learn. Examples that conceptually form separate disjuncts are united by these red herrings. Furthermore, examples that might lead to useful generalization can be separated into different disjuncts by coincidental similarities and dissimilarities. Our solution reduces over-generalization by using negative examples.</Paragraph> <Paragraph position="4"> Typically, the version space algorithm maintains specific and general boundary sets and updates the appropriate one depending on the class of the training example. However, because the open-ended text domain is essentially infinite, and our rule language does not directly allow either disjunction or negation, the general boundary set is unrepresentable (Hirsh, 1991). Instead, we use a variant of a method proposed by Hirsh (1992) and Hirsh et al. (1997) that maintains a list of negative examples in place of the general boundary set. Negative examples are stored explicitly. Members of the specific boundary set that match any negative example are discarded.
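Continuing the word-set miniature from above (again an illustration, not the paper's code), pruning the specific boundary set against an explicit negative-example list might look like:

```python
def words(text):
    return frozenset(text.lower().split())

def covers(hyp, text):
    return hyp <= words(text)

def consistent_specific(S, negatives):
    # discard members of the specific boundary set that match any
    # stored negative example; an empty result means the version
    # space has collapsed
    return [h for h in S if not any(covers(h, n) for n in negatives)]
```

A hypothesis built around a single common word is likely to match some stored negative example and so be discarded, which is exactly how the red-herring rules are ruled out.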
If no members remain in the specific boundary set, the version space has collapsed. Without negative examples, we often see rules containing a single, frequently occurring word, which precludes more useful generalization over disparate disjuncts. However, since common words are likely to appear in negative examples as well as positive ones, such red-herring rules are ruled out. Essentially, by lowering the bar for version space collapse, negative examples help reduce over-generalization.</Paragraph> <Paragraph position="5"> To classify a new example, it is first tested against the specific boundary set. If all the hypotheses classify it as positive, the example is classified as positive. Otherwise, an attempt is made to add the example to the version space, on the assumption that it is positive. If that causes the version space to collapse, the assumption is false and the example is classified as negative. Otherwise, the version space is unable to classify the example with certainty.</Paragraph> </Section> </Section> </Paper>