<?xml version="1.0" standalone="yes"?>
<Paper uid="E99-1047">
<Title>Transformation-Based Learner</Title>
<Section position="3" start_page="0" end_page="0" type="metho">
<SectionTitle> 2 µ-TBL Rules &amp; Representations </SectionTitle>
<Paragraph position="0"> The point of departure for TBL is a tagged initial-state corpus and a correctly tagged training corpus. Assuming the part-of-speech tagging task, corpus data can be represented by means of three kinds of clauses:

wd(P,W) is true iff the word W is at position P in the corpus
tag(P,A) is true iff the word at position P in the corpus is tagged A
tag(A,B,P) is true iff the word at P is tagged A and the correct tag for the word at P is B

Although this representation may seem a bit redundant, it provides exactly the kind of indexing into the data that is needed (assuming a Prolog with first argument indexing; the µ-TBL systems are implemented in SICStus Prolog). A decent Prolog system can deal with millions of such clauses.

The object of TBL is to learn an ordered sequence of transformation rules. Such rules dictate when - based on the context - a word should have its tag changed. An example would be &quot;replace tag vb with nn if the word immediately to the left has the tag dt.&quot; Here is how this rule is represented in the µ-TBL rule/template formalism:

tag:vb>nn <- tag:dt@[-1].</Paragraph>
<Paragraph position="1"> Conditions may refer to different features, and complex conditions may be composed from simpler ones. For example, here is a rule saying &quot;replace tag rb with jj, if the current word is &quot;only&quot;, and if one of the previous two tags is dt&quot;:

tag:rb>jj <- wd:only@[0] &amp; tag:dt@[-1,-2].</Paragraph>
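<Paragraph> To make the representation and the rule notation concrete, here is a small hypothetical sketch (not taken from the paper): corpus facts in the wd/2, tag/2 and tag/3 format for an invented three-word fragment, together with a hand-written applicability check for the first rule above. The toy data and the predicate names applies_at/1, pos/1 and neg/1 are made up for illustration; in µ-TBL itself such code is generated from templates rather than written by hand.

% Toy corpus "the dog barks", with "dog" mis-tagged vb (correct tag nn).
% wd(P,W): word W occurs at position P.
wd(1, the).  wd(2, dog).  wd(3, barks).
% tag(P,A): the word at position P currently carries tag A.
tag(1, dt).  tag(2, vb).  tag(3, vb).
% tag(A,B,P): the word at P carries tag A; its correct tag is B.
tag(dt, dt, 1).  tag(vb, nn, 2).  tag(vb, vb, 3).

% applies_at(P): the rule tag:vb>nn <- tag:dt@[-1] is applicable at P,
% i.e. the word at P is tagged vb and the tag at P-1 is dt.
applies_at(P) :-
    tag(P, vb),
    P1 is P - 1,
    tag(P1, dt).

% pos(P)/neg(P): a positive instance corrects a tag (the correct tag is nn);
% a negative instance would spoil a tag that is already correctly vb.
pos(P) :- applies_at(P), tag(vb, nn, P).
neg(P) :- applies_at(P), tag(vb, vb, P).

% ?- pos(P) succeeds with P = 2, and neg(P) fails, so on this toy corpus
% the rule scores 1 - 0 = 1.
</Paragraph>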
<Paragraph position="2"> Rules that can be learned in TBL are instances of templates, such as &quot;replace tag A with B if the word immediately to the left has tag C&quot;, where A, B and C are variables:</Paragraph>
<Paragraph position="4"> Positive and negative instances of rules that are instances of this template can be generated by means of the following clauses:</Paragraph>
<Paragraph position="6"> Tied to each template is also a procedure that will apply rules that are instances of the template:</Paragraph>
<Paragraph position="8"/>
</Section>
<Section position="4" start_page="0" end_page="279" type="metho">
<SectionTitle> 3 The µ-TBL Template Compiler </SectionTitle>
<Paragraph position="0"> To write clauses such as the above by hand for large sets of templates would be tedious and prone to errors. Instead, Prolog's term expansion facility and a couple of DCG rules can be used to compile templates into Prolog code, as follows:</Paragraph>
<Paragraph position="2"/>
</Section>
<Section position="5" start_page="279" end_page="279" type="metho">
<SectionTitle> 4 The µ-TBL Lite Learner </SectionTitle>
<Paragraph position="0"> Given corpus data, compiled templates, and a value for Threshold, the predicate tbl/1 implements the µ-TBL main loop and writes a sequence of rules to the screen: The call to the setof-bagof combination generates a frequency listing of all positive instances of all templates, based on which the call to bestof/4 then selects the rule with the highest score. tbl/1 terminates if the score for that rule is less than the threshold; otherwise it applies the rule and goes on to learn more rules from there.</Paragraph>
<Paragraph position="1"> To compute the rule with the highest score, bestof/4 traverses the frequency listing, keeping track of a leading rule and its score. The score of a rule is calculated as the difference between the number of its positive instances and the number of its negative instances. When the list of rules is empty, or when the number of positive instances of the most frequent rule in what remains of the list is less than the leading rule's score, the leader is declared the winner. The following procedure implements the counting of negative instances in an efficient way:</Paragraph>
<Paragraph position="2"> The learner was benchmarked on a 250 MHz Sun Ultra Enterprise 3000, training on Swedish corpora of three different sizes, with 23 different tags, and the 26 templates that Brill uses in his context-rule learner. In each case, the accuracy of the resulting sequence of rules was measured on a test corpus consisting of 40k words, with an initial-state accuracy of 93.3%. The following table summarizes the results: For comparison, Brill's context-rule learner needs 90 minutes, 185 minutes, and 560 minutes, respectively, to train on these corpora, producing similar sequences of rules. Thus µ-TBL Lite is an order of magnitude faster than Brill's learner. The full µ-TBL system presented in (Lager, 1999) is even faster, uses less memory, and is in certain respects more general. Small is beautiful, however, and the light version may also have a greater pedagogical value. Both versions can be downloaded from http://www.ling.gu.se/~lager/mutbl.html.</Paragraph>
</Section>
</Paper>