<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2089">
  <Title>Tagging and Chunking with Bigrams</Title>
  <Section position="3" start_page="615" end_page="616" type="intro">
    <SectionTitle>
2 General Description of our Integrated Approach to Tagging and Chunking
</SectionTitle>
    <Paragraph position="0"> Integrated approach to Tagging and Chunking We propose an integrated system (Figure 1) that combines different knowledge sources (lexical probabilities, LM for chunks and Contextual LM tbr the sentences) in order to obtain the corresponding sequence of POS tags and the shallow parsing (\[su WllC~W.~/c~ su\] W.~lC~ ... \[su W, lC,, su\]) from a certain input string (1'I:1,IY=.2, ...,I/l:n). Our system is a transducer composed by two levels: the upper one represents the Contextual LM for tile sentences, and the lower one modelize the chunks considered. The formalism that we have used in all levels are finite-state automata. To be exact, we have used models of bigrmns which are smoothed using the backoff technique (Katz, 1987) in order to achieve flfll coverage of the language. The bigrams LMs (bigram probabilities) was obtained by means of the SLM TOOLKIT (Clarksond and Ronsenfeld,  1997) from tile sequences of categories in the training set. Then, they have been rei)resented like finite-state automata.</Paragraph>
    <Section position="1" start_page="615" end_page="616" type="sub_section">
      <SectionTitle>
2.1 The Learning Phase
</SectionTitle>
      <Paragraph position="0"> The models have been estimated from labelled and bracketed corpora. The training set is composed by sentences like: \[su w,/c,w.,/c., su\] w~/c~ ... \[su ~,~:,~/c,~ su\] ./. where Wi are the words, Ci are part-of-speech tags and SU are tile chunks considered.</Paragraph>
      <Paragraph position="1"> Tile models learnt are: * Contextual LM: it is a smoothed bigram model learnt from tile sequences of part-of speech tags (Ci) and chunk descrit)tors (XU) present in the training corpus (see Figure 2a).</Paragraph>
      <Paragraph position="2"> * Models for the chunks: they are smoothed bi-gram models learnt fl'om the sequences of part-of-speech tags eorrest)onding to each chunk of the training corpus (see Figure 2b).</Paragraph>
      <Paragraph position="3"> * Lexical Probabilities: they are estilnated from the word frequencies, tile tag frequencies and the word per tag frequencies. A tag dictionary is used which is built from the full corpus which gives us the possible lexical categories (POS tags) for each word; this is equivalent to having an ideal morphological analyzer. The probabilities for each possible tag are assigned from this information taking into account the obtained statistics. Due to the fact that the word cannot have been seen at training, or it has only been seen in some of the possible categories, it is compulsory to apply a smoothing mechanism. In our case, if the word has not previously been seen~ the same probability is assigned to all the categories given by the dietionary; if it has been seen, but not in all the  (b) LM for Chunks</Paragraph>
      <Paragraph position="5"> categories, the smoothing called &amp;quot;add one&amp;quot; is applied. Afterwards, a renormalization process is carried out.</Paragraph>
      <Paragraph position="6"> Once the LMs have been learnt, a regular substitution of the lower model(s) into the upper one is made. In this way, we get a single Illtegrated LM which shows the possible concatenations of lexical tags and syntactical uuits, with their own transition probabilities which also include the lexical probabilities ms well (see Figure 2c). Not(', that the models in Figure 2 are not smoothed).</Paragraph>
    </Section>
    <Section position="2" start_page="616" end_page="616" type="sub_section">
      <SectionTitle>
2.2 The Decoding Process: Tagging and Parsing
</SectionTitle>
      <Paragraph position="0"> The tagging and shallow parsing process consists of finding out the sequence of states of maximum 1)robability on the Integrated LM tor an input sentence.</Paragraph>
      <Paragraph position="1"> Therefore, this sequence must be compatible with the contextual, syntactical and lexical constraints.</Paragraph>
      <Paragraph position="2"> This process can be carried out by Dynamic Progt'ammiitg using the Viterbi algorithm, which is conveniently modified to allow for (;ransitions between certain states of the autotnata without consmning any symbols (epsilon l;ransitious). A portion of the Dynamic Progranmfing trellis for a generic sentence using the Integrated LM shown in Figure 2c can be seen in Figure 3. The states of the automata that can be reached and that are compatible with the lexical constraints are marked with a black circle (i.e., fl'om the state Ck it is possible to reach the state Ci if the transition is in the automata and the lexical probability P(Wi\[Ci) is not null). Also, the transitions to initial and final states of the models for chunks (i.e., fl'om Ci to &lt; SU &gt;) are allowed; these states are marked in Figure 3 with a white circle and in this case no symbol is consumed. Ill all these cases, the transitions to initial and final produce transitions to their successors (the dotted lines in Figure 3) where now symbols must be consumed.</Paragraph>
      <Paragraph position="3"> Once the Dynamic Programing trellis is built, we can obtain the maximum probability path for the input sentence, and thus the best sequence of lexical tags and the best segmentation in chunks.</Paragraph>
      <Paragraph position="5"> based oil tile Integrated LM.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML