<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2139">
  <Title>ABL: Alignment-Based Learning</Title>
  <Section position="4" start_page="0" end_page="961" type="metho">
    <SectionTitle>
2 Previous Work
</SectionTitle>
    <Paragraph position="0"> I,e;wning metl,o(ls can t)e grouped into suitorvised and unsut)ervised nmthods. Sul)ervised methods are initial|seal with structured input (i.e. stru(:ture(\] sent(m(:es for grannnar learning methods), while mlsut)ervised methods learn l)y using mlstru(:tured data only.</Paragraph>
    <Paragraph position="1"> In 1)ractice, SUl)ervised methods outpertbrm mlsut)ervised methods, since they can adapt their output based on the structured exami)les in the initial|sat|on t)hase whereas unSUl)ervised lnethods emmet. However, it is worthwhile to investigate mlsupcrvised gramlnar learning methods, since &amp;quot;the costs of annotation are prohibitively time and ext)ertise intensive, and the resulting corpora may 1)e too suscet)tible to restri(:tion to a particular domain, apt)lication, or genre&amp;quot;. (Kehler and Stolcke, 1.999) There have 1)een several approaches to the unsupervised learning of syntactic structures. We will give a short overview here.</Paragraph>
    <Paragraph position="2">  Memory based learifing (MBL) keeps track of possible contexts and assigns word types based on that information (Daelemans, 1995). Redington et al. (1998) present a method that bootstraps syntactic categories using distributional information and Magerman and Marcus (1990) describe a method that finds constituent boundaries using mutual information values of the part of speech n-grams within a sentence.</Paragraph>
    <Paragraph position="3"> Algorithms that use the minimmn description length (MDL) principle build grammars that describe the input sentences using the minimal nunfl)er of bits. This idea stems from intbrnmtion theory. Examples of these systems can be found in (Grfinwald, 1994) and (de Marcken, 1996).</Paragraph>
    <Paragraph position="4"> The system by Wolff (1982) pertbrms a heuristic search while creating and Inerging symbols directed by an evaluation function.</Paragraph>
    <Paragraph position="5"> Chen (1.995) presents a Bayesian grammar induction method, which is tbllowed by a post-pass using the inside-outside algorithm (Baker, 1979; Lari and Young, 1990).</Paragraph>
    <Paragraph position="6"> Most work described here cmmot learn complex structures such as recursion, while other systems only use limited context to find constituents. However, the two phases in ABL are closely related to some previous work.</Paragraph>
    <Paragraph position="7"> Tim alignment learning phase is etlb.ctively a compression technique comparat)le to MDL or Bayesian grammar induction methods. ABL remembers all possible constituents, building a search space. The selection h;arning phase searches this space, directed by a probabilistic evaluation function.</Paragraph>
  </Section>
  <Section position="5" start_page="961" end_page="961" type="metho">
    <SectionTitle>
3 Algorithm
</SectionTitle>
    <Paragraph position="0"> We will describe an algorithm that learns structure using a corpus of plain (mlstructured) sentences. It does not need a structured training set to initialize, all structural information is gathered from the unstructured sentences.</Paragraph>
    <Paragraph position="1"> The output of the algorithm is a labelled, bracketed version of the inlmt corpus. Although the algorithm does not generate a (context-fl'ee) grammar, it is trivial to deduce one from the structured corpus.</Paragraph>
    <Paragraph position="2"> The algorithm builds on Harris's idea (1951) that states that constituents of the same type can be replaced by each oth, er. Consider the sen-Wh, at is a family fare Wh, at is th, e payload of an African Swallow Wh, at is &amp; family fare)x Wh, at is (the payload of an African Swallow)x  fences as shown in figure 1. 2 The constituents a .family fare and the payload of an African Swallow both have the same syntactic type (they are both NPs), so they can be replaced by each other. This means that when the constituent in the first sentence is replaced by the constituent in the second sentence, the result is a wflid sentence in the language; it is the second sentence. The main goal of the algorithm is to establish that a family .fare and the payload of art, African Swallow are constituents and have the same type. This is done by reversing Harris's idea: 'i1&amp;quot; (a group o.f) words car-,, be; replaced by each other, they are constituents and h.ave th, e same type. So the algorithm now has to find groups of words that can be replaced by each other and after replacement still generate valid sentences.</Paragraph>
    <Paragraph position="3"> The algorithm consists of two steps:</Paragraph>
  </Section>
  <Section position="6" start_page="961" end_page="963" type="metho">
    <SectionTitle>
1. Alignment Learning
2. Selection Learning
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="961" end_page="962" type="sub_section">
      <SectionTitle>
3.1 Alignment Learning
</SectionTitle>
      <Paragraph position="0"> The model learns by comparing all sentences in the intmt corpus to each other in pairs. An overview of the algorithm can be tbund in figure 2.</Paragraph>
      <Paragraph position="1"> Aligning sentences results in &amp;quot;linking&amp;quot; identical words in the sentences. Adjacent linked words are then grouped. This process reveals  .f,'o,,,. Sa,,. F,'a,.ci.,'co (to Dallas). ./'rout (Dallas to) |San Francisco 02 (Sa,, l.o) Dallas 02 O, DaUas #o Sa,,. J';'a,.cisco)2 * \[;1&amp;quot;0 ~II, .fF()Ii't (San Francisco), to (Dallas)2 (Dalla.gj to (Sa,,.</Paragraph>
      <Paragraph position="2">  1;t1(; groul)S of identical words, 1)ut it also llIlC()vers the groups of distinct wor(ls in the sentences. In figure 1 What is is the identical part of the sentences and a fam, ily J'a~v, and the payload of an A./ricau, Swallow are the distinct l)arts. The distinct parts are interchangeable, so they are (tetermilmd to 1)e constituents o17 the same I;yl)e. We will now Cxl)lain the stel)s in the alignmen |learning i)hase in more de, tail.  q\[b find the identi(:al word grouI)S in |;he sentences, we use the edit; distan(:e algorithm by Wagner and Fischer (197d:), which finds the minimum nmnl)er of edit operations (insertion, (lelei;ion and sul)stii;ul;ion) l;o change one sente, nce into the other, ld(mti(:al wor(ts in the sent(races can 1)e t'(mnd at \])\]a(;es W\]l(~,l'e lie edit operation was al)plied.</Paragraph>
      <Paragraph position="3"> The insl;antia,tiol~ of the algoril;hm that fin(is l;}le longest COllllllOll Slll)S(}(\]ll(}ll(;( ~, ill two Selltences sometimes &amp;quot;links&amp;quot; words that are, too far apart, in figure 3 when: 1)esides the o(:cm'rences of.from,, the ocem:rences of San }4&amp;quot;au, ci.sco or Dallas are linked, this results in unintended constituents. We woukt r;d;her have the lnodel linking to, resulting in a sl;1&amp;quot;u(;I;llre with the 1101111 phrases groul)ed with the same type corre(:tly. Linking San Francisco or Dallas results i~l constituents that vary widely in size. This stems from the large distance between the linked urords in the tirsi; sentence mid in th(; s(:cond sentence. This type of alignlnent can t)e ruled out by biasing the cost fimction using distances between words.</Paragraph>
      <Paragraph position="4">  An edit distance algorithm links identical words in two sentences. When adjacent wor(ls are linked in l)oth sentences, they can l)e grouped. A groul) like this is a part of a senten(:e that can also be tbmM in the other sentence. (In figure 1, What is is a group like this.) The rest of the sentences can also be grouped. The words in these grout)s arm words that are distinct in the two sentences. When all of these groups fl:om sentence, one would 1)e relflaced by the respective groups of sentence two, sentence two is generated. (a family fare and th, c payload of an African Swallow art: of this type of group in figure 1.) Each pair of these distinct groups consists of possilfle constil;uents Of the same type. :~ As can be, seen in tigure 3, it is possible that empty groups can lm learned.</Paragraph>
      <Paragraph position="5"> a.l.a Existing Constituents At seine 1)oint it may be t)ossible that the model lem'ns a co11stituent that was already stored. This may hal)l)en when a new sentence is compared to a senlaen(;e in the partially structured corpus. In this case,, no new tyl)e, is intro(hu:ed~ lint the, consti|;ucnl; in l;he new sentence gel;s l;he same type of the constituent in the sentence in the partially structm:ed corpus.</Paragraph>
      <Paragraph position="6"> It may even t)e the case that a partially si;ructured sentence is compared to another partially sl;rtlctllre(1 selll;elR,e. This occm:s whel~ a s(:nfence that (;onl;ains some sl;ructure, which was learner1 1)y COlnl)aring to a sentelme in the part;\]ally structure(l (;Ol;pllS~ is (;Olllt)ar(~,(\] 1;o allother (t)art;ially stru(:ture(t) sente, n(:e. When the ('omparison of these two se, nl;ence, s yields a constituent thai: was ah:ea(ly t)resent in both senten(:es, the tyl)es of these constitueld;S are merged. All constituents of these types are ut)dated, so the, y have the same tyl)e.</Paragraph>
      <Paragraph position="7"> By merging tyl)es of constituents we make t;he assuml)tion that co\]lstil;uents in a (:ertain context can only have one tyl)e. In section 5.2 we discuss the, imt)li(:atiolls of this assmnpl;ion and propose an alternative at)t)roach.</Paragraph>
    </Section>
    <Section position="2" start_page="962" end_page="963" type="sub_section">
      <SectionTitle>
3.2 Selection Learning
</SectionTitle>
      <Paragraph position="0"> The first step in the algorithm may at some point generate COllstituents that overlap with other constituents, hi figure 4 Give me all flights .from Dallas to Boston receives two overlal)ping structures. One constituent is learned 3Since the alger||Inn does not know any (linguist;|c) llalIICS for the types, the alger|finn chooses natural numbers to denote different types.</Paragraph>
      <Paragraph position="1">  by comparing against Book Delta 128 firm Dallas to Boston and the other (overlapl)ing) constituent is tbund by aligning with Give me help on classes.</Paragraph>
      <Paragraph position="2"> The solution to this problem has to do with selecting the correct constituents (or at least the better constituents) out of the possible constitnents. Selecting constituents can be done in several dittbrent ways.</Paragraph>
      <Paragraph position="3"> ABL:incr Assume that the first constituent learned is the correct one. This means that when a new constituent overlaps with older constituents, it can 1)e ignored (i.e. they are not stored in the cortms).</Paragraph>
      <Paragraph position="4"> ABL:leaf The model corot)rites the probability of a constituent counting the nmnber of times the particular words of the constituent have occurred in the learned text as a constituent, normalized by the total number of constituents.</Paragraph>
      <Paragraph position="6"> where C is the entire set: of constituents.</Paragraph>
      <Paragraph position="7"> ABL:braneh In addition to the words of the sentence delimited by the constituent, the model computes the probability based on the part of the sentence delimited by the words of the constituent and its non-terminal (i.e.</Paragraph>
      <Paragraph position="8"> a normalised probability of ABL:leaf).</Paragraph>
      <Paragraph position="10"> The first method is non-probabilistic and may be applied every time a constituent is found that overlaps with a known constituent (i.e. while learning).</Paragraph>
      <Paragraph position="11"> The two other methods are probabilistic. The model computes the probability of the constituents and then uses that probability to select constituents with the highest probability. These methods are ~pplied afl;er the aligmnent learning phase, since more specific informatioil (in the form of 1)etter counts) can be found at that time.</Paragraph>
      <Paragraph position="12"> In section 4 we will ewfluate all three methods on the ATIS and OVIS corpus.</Paragraph>
      <Paragraph position="13">  Since more than just two constituents can overlap, all possible combinations of overlapping constitueni;s should be considered when com-Imting the best combination of constituents, which is the product of the probabilities of the separate constituents as in SCFGs (cf. (Booth, 1969)). A Viterbi style algorithm optimization (1967) is used to etficiently select the best combination of constituents.</Paragraph>
      <Paragraph position="14"> When conll)uting the t)r()t)ability of a combination of constituents, multiplying the separate probabilities of the constituents biases towards a low nnmber of constituents. Theretbre, we comtmte the probability of a set of constituents using a normalized version, the geometric mean 4, rather than its product. (Caraballo and Charniak, 1998)</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML