<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1027">
  <Title>Learning theories from text</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Domain Theory for Company Succession Events
</SectionTitle>
    <Paragraph position="0"> We found that the most successful method, given the absence of negative data, was to use WARMR to learn association rules from the positive data. As with all types of association rule learning, WARMR produces a huge number of rules, of varying degrees of coverage. We spent some time writing filters to narrow down the output to something useful. Such filters consist of constraints ruling out patterns that are definitely not useful, for example patterns containing a verb but no arguments or attributes. An example of such a restriction is provided below:  pattern_constraint(Patt):member(verb(_,E,_A,_,_),Patt), null</Paragraph>
    <Paragraph position="2"> \+constraint_on_attr(Patt,Attr)).</Paragraph>
    <Paragraph position="3"> If pattern constraint/1 succeeds for a pattern Patt, then Patt is discarded. Basically, this says that a rule isn't useful unless it contains a verb and one of its attributes that satisfies a certain constraint. A constraint might be of the following form: constraint_on_attr(Patt, Attr) :member(class(_,Attr), Patt).</Paragraph>
    <Paragraph position="4"> The above states that there should be a classification of the attribute Attr present in the rule. A useful pattern Patt will satisfy such constraints. Some of the filtered output, represented in a more readable form compatible with the examples above are as follows (note that the first argument of the verb/2 predicate refers to an event):</Paragraph>
    <Paragraph position="6"> While there are many other rules learned that are less informative than this, the samples given here are true generalisations about the type of events described in these texts: unremarkable, perhaps, but characteristic of the domain. It is noteworthy that some of them at least are very reminiscent of the kind of templates constructed for Information Extraction in this domain, suggesting a possible further use for the methods of theory induction described here.</Paragraph>
    <Paragraph position="7"> 4 Learning weighted finite state automata While this experiment was reasonably successful, in that we were able to induce plausible looking domain generalisations, the process of selecting these from the output of WARMR requires further supervision of the learning process. We therefore tried to devise a method of taking the output directly from WARMR and processing it in order to automatically produce domain knowledge. Presenting the data as weighted FSAs serves the twofold purpose of reducing the amount of rules output from WARMR, thanks to minimization techniques, while providing a more visualisable representation. Weighted FSAs can also be seen as a simple kind of probabilistic graphical model. We intend to go on to produce more complex models of this type like Bayesian Networks, which are easier to use in a more robust setting, e.g. for disambiguation purposes, than the traditional symbolic knowledge representation methods presupposed so far.</Paragraph>
    <Paragraph position="8"> Before explaining the conversion to FSAs we look in more detail at the representation of the WARMR output.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Representing WARMR Output
</SectionTitle>
    <Paragraph position="0"> Each of the numerous patterns resulting from WARMR consists of a list of frequently associated predicates, found in the flat quasi-logical forms of the input sentences. An example of such a pattern is provided by the following:</Paragraph>
    <Paragraph position="2"> 0.1463).</Paragraph>
    <Paragraph position="3"> The first argument of the predicate freq/3 shows the level of the algorithm at which the pattern/query was acquired (DeHaspe, 1998). The fact that the pattern was acquired at the sixth level means it was created during the sixth iteration of the algorithm trying to satisfy the constraints input as settings to the system. This pattern satisfied four constraints, two of them twice. The second argument of freq/3 is the query itself and the third is its frequency.</Paragraph>
    <Paragraph position="4"> What is meant by frequency of the query in this instance is the number of times it succeeds (i.e. the number of training examples it subsumes), divided by the number of training examples. To illustrate the meaning of such a pattern one needs to reconstruct the predicate-argument structures while maintaining the flat format. Thus, the above pattern is converted to the following: list(529,0.1463,[elect(A,B,C), cperson(C), succeed(D,C,E), cperson(E)]).</Paragraph>
    <Paragraph position="5"> It is now easier to understand the pattern as :'A person C who is elected succeeds a person E'. However, it is still not straightforward how one can evaluate the usefulness of such patterns or indeed how one can incorporate the information they carry into a system for disambiguation or reasoning. This problem is further aggravated by the large number of patterns produced. Even after employing filters to discard patterns of little use, for example ones containing a verb but no classification of its arguments, over 26,000 of them were obtained. This is because many of the patterns are overly general: the training set consists of only 372 verb predicates and a total of 436 clauses. Such overgeneration is a well known problem of data mining algorithms and requires sound criteria for filtering and evaluation. Most of the patterns generated are in fact variants of a much smaller group of patterns. The question then arises of how it is possible to merge them so as to obtain a small number of core patterns, representative of the knowledge obtained from the training set. Representing the patterns in a more compact format also facilitates evaluation either by a human expert or through incorporation into a pre-existing system to measure improvement in performance.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 FSA conversion
</SectionTitle>
    <Paragraph position="0"> Given the large amount of shared information in these outputs, we decided to try to represent it as a set of Finite State Automata, where each transition corresponds to a literal in the original clauses. Since all the literals in the raw output are simply conjoined, the interpretation of a transition is simply that if one literal is true, the next one is also likely to be true. Our aim was to be able to use standard FSA minimisation and determination algorithms (Aho et al., 1986),(Aho et al., 1974) to reduce the large set of overlapping clauses to something manageable and visualisable, and to be able to use the frequency information given by WARMR as the basis for the calculation of weights or probabilities on transitions.</Paragraph>
    <Paragraph position="1"> To convert our patterns into FSAs (and in particular recognizers), we used the package FSA Utilities (version FSA6.2.6.5)(van Noord, 2002), which includes modules for compiling regular expressions into automata (recognizers and transducers) by implementing different versions of minimisation and determinisation algorithms. The package also allows operations for manipulating automata and regular expressions such as composition, complementation etc. As the FSA Utilities modules apply to automata or their equivalent regular expressions, the task required converting the patterns into regular expressions. To do this we treat each literal as a symbol. This means each verb and attribute predicate with its respective arguments is taken to denote a single symbol. The literals are implicitly conjoined and thus ordering does not matter. Thus we chose to impose an ordering on patterns, whereby the main verb appears first, followed by predicates referring to its arguments. Any other verbs come next, followed by predicates describing their arguments.</Paragraph>
    <Paragraph position="2"> This ordering has the advantage over alphanumeric ordering that it allows filtering out alphabetic variants of patterns where the predicates referring to the arguments of a verb precede the verb and the variables are thus given different names which results in different literals. This ordering on patterns is useful as it allows common prefixes to be merged during minimisation. Since variable names play an important role in providing co-indexation between the argument of a verb and a property of that argument, designated by another predicate, terms such as 'elect(A,B,C)' and 'elect(D,E,F)' are considered to be different symbols. Thus a pattern like: list(768,0.07,[elect(A,B,C),cperson(C), chairman(C,D),old(C,E,F), of(D,G),ccompany(G)]).</Paragraph>
    <Paragraph position="3"> was converted to the regular expression:  'ccompany(G)']).</Paragraph>
    <Paragraph position="4"> The first argument of the macro/2 predicate is the name of the regular expression whereas the second argument states that the regular expression is a sequence of the symbols 'elect(A,B,C)','cperson(C)','chairman(C,D)' and so on. Finally, the entire WARMR output can be compiled into an FSA as the regular expression which is the union of all expressions named via an xnumber identifier. This is equivalent to saying that a pattern can be any of the xnumber patterns defined.</Paragraph>
    <Paragraph position="5"> We took all the patterns containing 'elect' as the main verb and transformed them to regular expressions, all of which started with 'elect(A,B,C)'. We then applied determinisation and minimisation to the union of these regular expressions. The result was an automaton of 350 states and 839 transitions, compared to an initial 2907 patterns.</Paragraph>
    <Paragraph position="6"> However, an automaton this size is still very hard to visualize. To circumvent this problem we made use of the properties of automata and decomposed the regular expressions into subexpressions that can then be conjoined to form the bigger picture.</Paragraph>
    <Paragraph position="7"> Patterns containing two and three verbs were written in separate files and each entry in the files was split into two or three different segments, so that each segment contained only one verb and predicates referring to its arguments. Therefore, an expression such as: macro(x774,[elect(A,B,C),cperson(C), resign(D,E,F),cperson(E), succeed(G,C,E)]).</Paragraph>
    <Paragraph position="8"> was transformed into: macro(x774a,['elect(A,B,C)', 'cperson(C)']).</Paragraph>
    <Paragraph position="9"> macro(x774b,['resign(D,E,F)', 'cperson(E)']).</Paragraph>
    <Paragraph position="10"> macro(x774c,['succeed(G,C,E)']).</Paragraph>
    <Paragraph position="11"> One can then define the automaton xpression1, consisting of the union of all first segment expressions, such as x774a, the automaton resign2, conisting of all expressions where resign is the second verb and succeed3. The previous can be combined to form the automata [xpression1,resign2] or [xpression1,resign2,succeed3] and so on. The automaton [xpression1,resign2] which represents 292 patterns, has 32 states and 105 transitions and is much more manageable.</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
7 Adding weights
</SectionTitle>
    <Paragraph position="0"> The FSA rules derived from the WARMR patterns would be of more interest if weights were assigned to each transition, indicating the likelihood of any specific path/pattern occurring. For this we needed to obtain weights, equivalent to probabilities for each predicate-argument term. Such information was not readily available to us. The only statistics we have correspond to the frequency of each entire pattern, which is defined as: Freq = number of times the pattern matched the training datanumber of examples in the training set We took this frequency measure as the probability of patterns consisting of single predicates (e.g. 'elect(A,B,C)', which is equivalent to 'B elects C') whereas the probabilities of all other pattern constituents have to be conditioned on the probabilities of terms preceding them. Thus, the probability of 'cperson(C)', given 'elect(A,B,C)' is defined by the following:</Paragraph>
    <Paragraph position="2"> where P('elect(A,B,C)',' cperson(C))' is the frequency of the pattern ['elect(A,B,C)',' cperson(C)'] and</Paragraph>
    <Paragraph position="4"> That is, the probability of P('elect(A,B,C)') is the sum of all the probabilities of the patterns that contain 'elect(A,B,C)' followed by another predicate. If such patterns didn't exist, in which case the sum would be equal to zero, the probability would be just the frequency of the pattern 'elect(A,B,C)'.</Paragraph>
    <Paragraph position="5"> In principle the frequency ratios described above are probabilities but in practice, because of the size of the dataset, they may not approximate real probabilities. Either way they are still valid quantities for comparing the likelihood of different paths in the FSA.</Paragraph>
    <Paragraph position="6"> Having computed the conditional probabilities/weights for all patterns and constituents, we normalized the distribution by dividing each probability in a distribution by the total sum of the probabilities. This was necessary in order to make up for discarded alphabetic variants of patterns.</Paragraph>
    <Paragraph position="7"> We then verified that the probabilities summed up to 1. To visualise some of the FSAs (weighted recognizers) we rounded the weights to the second decimal digits and performed determinization and minimization as before. Rules obtained can be found in Figures 1 and 2 (see figures on last page): The automaton of Figure 1 incorporates the following rules:  1. 'If a person C is elected, another person E has resigned and C succeeds E' 2. 'If a person C is elected director then another person F has resigned and C succeeds F' 3. 'If a person C is elected and another person E  pursues (other interests) C succeeds E' The automaton of Figure 2 provides for rules such as: 'If a person is elected chairman of a company E then C succeeds another person G'.</Paragraph>
    <Paragraph position="8"> At each stage, thanks to the weights, it is possible to see which permutation of the pattern is more likely.</Paragraph>
  </Section>
class="xml-element"></Paper>