<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1205"> <Title>Sinica Treebank: Design Criteria, Annotation Guidelines, and On-line Interface</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> The Penn Treebank (Marcus et al. 1993) initiated a new paradigm in corpus-based research. The English Penn Treebank has enabled and motivated corpus and computational linguistic research based on information extractable from structurally annotated corpora. Recently, research has focused on two issues: first, when and how can a structurally annotated corpus of language X be built? Second, what information should or can be annotated? A good sample of work on both issues can be found in the papers collected in Abeille (1999).</Paragraph> <Paragraph position="3"> The construction of the Sinica Treebank addresses both issues. First, it is one of the first structurally annotated corpora in Mandarin Chinese. Second, as a design feature, the Sinica Treebank annotation includes thematic role information in addition to syntactic categories. In this paper, we discuss the design criteria and annotation guidelines of the Sinica Treebank. We also present a preliminary research result based on the Sinica Treebank.</Paragraph> </Section> </Paper>