<?xml version="1.0" standalone="yes"?>
<Paper uid="E99-1016">
  <Title>Cascaded Markov Models</Title>
  <Section position="2" start_page="0" end_page="118" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Partial parsing, often referred to as chunking, is used as a pre-processing step before deep analysis or as shallow processing for applications like information retrieval, messsage extraction and text summarization. Chunking concentrates on constructs that can be recognized with a high degree of certainty. For several applications, this type of information with high accuracy is more valuable than deep analysis with lower accuracy.</Paragraph>
    <Paragraph position="1"> We will present a new approach to partial parsing that uses Markov Models. The presented models are extensions of the part-of-speech tagging technique and are capable of emitting structure. They utilize context-free grammar rules and add left-to-right transitional context information.</Paragraph>
    <Paragraph position="2"> This type of model is used to facilitate the syntactic annotation of the NEGRA corpus of German newspaper texts (Skut et al., 1997).</Paragraph>
    <Paragraph position="3"> Part-of-speech tagging is the assignment of syntactic categories (tags) to words that occur in the processed text. Among others, this task is efficiently solved with Markov Models. States of a Markov Model represent syntactic categories (or tuples of syntactic categories), and outputs represent words and punctuation (Church, 1988; DeRose, 1988, and others). This technique of statistical part-of-speech tagging operates very successfully, and usually accuracy rates between 96 and 97% are reported for new, unseen text.</Paragraph>
    <Paragraph position="4"> Brants et al. (1997) showed that the technique of statistical tagging can be shifted to the next level of syntactic processing and is capable of assigning grammatical functions. These are functions like subject, direct object, head, etc. They mark the function of a child node within its parent phrase.</Paragraph>
    <Paragraph position="5"> Figure 1 shows an example sentence and its structure. The terminal sequence is complemented by tags (Stuttgart-Tiibingen-Tagset, Thielen and Schiller, 1995). Non-terminal nodes are labeled with phrase categories, edges are labeled with grammatical functions (NEGRA tagset).</Paragraph>
    <Paragraph position="6"> In this paper, we will show that Markov Models are not restricted to the labeling task (i.e., the assignment of part-of-speech labels, phrase labels, or labels for grammatical functions), but are also capable of generating structural elements. We will use cascades of Markov Models. Starting with the part-of-speech layer, each layer of the resulting structure is represented by its own Markov Model. A lower layer passes its output as input to the next higher layer. The output of a layer can be ambiguous and it is complemented by a probability distribution for the alternatives.</Paragraph>
    <Paragraph position="7"> This type of parsing is inspired by finite state cascades which are presented by several authors.</Paragraph>
    <Paragraph position="8"> CASS (Abney, 1991; Abney, 1996) is a partial parser that recognizes non-recursive basic phrases (chunks) with finite state transducers. Each transducer emits a single best analysis (a longest match) that serves as input for the transducer at the next higher level. CASS needs a special grammar for which rules are manually coded. Each layer creates a particular subset of phrase types.</Paragraph>
    <Paragraph position="9"> FASTUS (Appelt et al., 1993) is heavily based on pattern matching. Each pattern is associated with one or more trigger words. It uses a series of non-deterministic finite-state transducers to build chunks; the output of one transducer is passed  parts-of-speech), non-terminal nodes (phrases) and edges (labeled with grammatical functions). as input to the next transducer. (Roche, 1994) uses the fix point of a finite-state transducer. The transducer is iteratively applied to its own output until it remains identical to the input. The method is successfully used for efficient processing with large grammars. (Cardie and Pierce, 1998) present an approach to chunking based on a mixture of finite state and context-free techniques.</Paragraph>
    <Paragraph position="10"> They use N P rules of a pruned treebank grammar.</Paragraph>
    <Paragraph position="11"> For processing, each point of a text is matched against the treebank rules and the longest match is chosen. Cascades of automata and transducers can also be found in speech processing, see e.g.</Paragraph>
    <Paragraph position="12"> (Pereira et al., 1994; Mohri, 1997).</Paragraph>
    <Paragraph position="13"> Contrary to finite-state transducers, Cascaded Markov Models exploit probabilities when processing layers of a syntactic structure. They do not generate longest matches but most-probable sequences. Furthermore, a higher layer sees different alternatives and their probabilities for the same span. It can choose a lower ranked alternative if it fits better into the context of the higher layer. An additional advantage is that Cascaded Markov Models do not need a &amp;quot;stratified&amp;quot; grammar (i.e., each layer encodes a disjoint subset of phrases). Instead the system can be immediately trained on existing treebank data.</Paragraph>
    <Paragraph position="14"> The rest of this paper is structured as follows.</Paragraph>
    <Paragraph position="15"> Section 2 addresses the encoding of parsing processes as Markov Models. Section 3 presents Cascaded Markov Models. Section 4 reports on the evaluation of Cascaded Markov Models using tree-bank data. Finally, section 5 will give conclusions.</Paragraph>
  </Section>
class="xml-element"></Paper>