<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-0904">
  <Title>Translating Treebank Annotation for Evaluation</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Previous Work
</SectionTitle>
    <Paragraph position="0"> The most relevant work in this context is the grammar extraction literature. Perhaps the earliest example is the approach of Charniak (Charniak, 1996), who simply extracted a context-free grammar by reading off the production rules implied by the trees in the Penn Treebank. Although this did not translate the treebank into a different formalism, it has led to work extracting grammars in other formalisms.</Paragraph>
    <Paragraph position="1"> Most of this work builds on the most obvious extension of Charniak's approach: extracting subtree-based grammars, e.g. the Data-Oriented Parsing (DOP) approach (Bod, 1995), Lexicalised Tree Adjoining Grammars (LTAGs), or more generally Lexicalised Tree Grammars (LTGs) (Neumann, 1998; Xia, 1999; Chen and Vijay-Shanker, 2000). Each approach splits the annotated trees in the treebank into a set of subtrees that define the grammar. These approaches nevertheless still work with the syntactic data in the same form in which it appears in the corpus.</Paragraph>
    <Paragraph position="2"> Krotov et al. (Krotov et al., 1998) follow a slightly different approach: they extract the grammar from the Penn Treebank as Charniak does, but then compact it. When a linguistically motivated compaction is used, the result is a smaller grammar of similar quality to the uncompacted one. However, the formalism remains unchanged. Similarly, Johnson (Johnson, 1998) modifies the labelling of the Penn Treebank, but remains within a CFG framework. Hockenmaier et al. (Hockenmaier et al., 2000), while to some extent following the approach of Xia (Xia, 1999) in which LTAGs are extracted, have pursued an alternative: extracting Combinatory Categorial Grammar (CCG) (Steedman, 1993; Wood, 1993) lexicons from the Penn Treebank. Here the treebank data is genuinely translated into another formalism: a top-down algorithm produces a complete CCG annotation of the corpus, and the lexicon is built by reading off the lexical assignments made in each tree. This is the work most closely related to the present research, especially since it translates into a formalism very closely related to CG.</Paragraph>
    <Paragraph position="3"> We have used the algorithm presented by Hockenmaier et al. (Hockenmaier et al., 2000) to build a top-down system against which to compare our data-driven system. Both algorithms are described in detail in Section 4.</Paragraph>
  </Section>
</Paper>