<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1015">
  <Title>Combining Deep and Shallow Approaches in Parsing German</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In several areas of Natural Language Processing, a combination of different approaches has been found to give the best results. It is especially rewarding to combine deep and shallow systems, where the former guarantees interpretability and high precision and the latter provides robustness and high recall.</Paragraph>
    <Paragraph position="1"> This paper investigates such a combination consisting of an n-gram based shallow parser and a cascaded finite-state parser with hand-crafted grammar and morphological checking. (Although not everyone would agree that finite-state parsers constitute a 'deep' approach to parsing, they are still knowledge-based: they require grammar-writing effort and a complex linguistic lexicon, manage without training data, etc.) The respective strengths and weaknesses of these approaches are brought to light in an in-depth evaluation on a treebank of German newspaper texts (Skut et al., 1997) containing ca. 340,000 tokens in 19,546 sentences.</Paragraph>
    <Paragraph position="2"> The evaluation format chosen (dependency tuples) is used as the common denominator of the systems in building a hybrid parser with improved performance. An underspecification scheme allows the finite-state parser to produce partially ambiguous output. It is shown that the other parser can in most cases successfully disambiguate such information.</Paragraph>
    <Paragraph position="3"> Section 2 discusses the evaluation format adopted (dependency structures), its advantages, but also some of its controversial points. Section 3 formulates a classification problem on the basis of the evaluation format and applies a machine learner to it. Section 4 describes the architecture of the cascaded finite-state parser and its output in a novel underspecification format. Section 5 explores several combination strategies and tests them on several variants of the two base components. Section 6 provides an in-depth evaluation of the component systems and the hybrid parser. Section 7 concludes.</Paragraph>
  </Section>
</Paper>