File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/00/w00-1324_intro.xml
Size: 3,334 bytes
Last Modified: 2025-10-06 14:01:06
<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1324"> <Title>A query tool for syntactically annotated corpora*</Title> <Section position="3" start_page="190" end_page="191" type="intro"> <SectionTitle> 2 The Verbmobil treebanks </SectionTitle> <Paragraph position="0"> The German Verbmobil corpus (Stegmann et al., 1998; Hinrichs et al., 2000) is a treebank annotated at the University of Tübingen that contains approx. 38,000 trees (or rather tree-like annotation structures since, as already mentioned, the structures are not always trees). The corpus consists of spoken texts restricted to the domain of arranging business appointments.</Paragraph> <Paragraph position="1"> The Verbmobil corpus is part-of-speech tagged using the Stuttgart-Tübingen tagset (STTS) described in (Schiller et al., 1995). One of the design decisions in Verbmobil was that, for the purpose of reusability of the treebank, the annotation scheme should not reflect a commitment to a particular syntactic theory. Therefore a surface-oriented annotation scheme was adopted that is inspired by the notion of topological fields in the sense of (Höhle, 1985). The discontinuous positioning of the verbal elements in verb-first and verb-second sentences (as in (1) for example) is the traditional reason to structure the German sentence by means of topological fields: The verbal elements have the categories LK (linke Klammer) and VC (verbal complex), and roughly everything preceding the LK forms the 'vorfeld' VF, everything between LK and VC forms the 'mittelfeld' MF, and the 'nachfeld' NF follows the verbal complex.</Paragraph> <Paragraph position="2"> The Verbmobil corpus is annotated with syntactic categories as node labels, grammatical functions as edge labels, and dependency relations. The syntactic categories are based on traditional phrase structure and on the theory of topological fields. In contrast to Negra or the Penn Treebank, there are neither crossing branches nor empty categories.
Instead, dependency relations are expressed within the grammatical functions (e.g. OA-MOD for a constituent modifying the accusative object).</Paragraph> <Paragraph position="3"> A sample annotation conforming to the Verbmobil annotation scheme is the annotation of (1) shown in Fig. 1. (The elements set in boxes are edge labels.) In order to search for structures as in Fig. 1, one needs to search for trees containing a node n1 with label PX and grammatical function OA-MOD, a node n2 with label VF that dominates n1, a node n3 with label MF, and a node n4 with label NX and grammatical function OA that is immediately dominated by n3.</Paragraph> <Paragraph position="4"> Evaluating a query for structures as in Fig. 1 on the Verbmobil corpus gives results such as (2) that sound much more natural than the constructed example (1).</Paragraph> <Paragraph position="5"> (2) tja, über Flugverbindungen habe ich leider keine Information. (gloss: about flight connections have I unfortunately no information)</Paragraph> <Paragraph position="6"> 'Unfortunately I have no information about flight connections.' This example illustrates the usefulness of syntactic annotations for linguistic research, and it shows the need for query languages and query tools that allow access to these annotations.</Paragraph> </Section> </Paper>
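<!--
The search described for Fig. 1 (a PX node with function OA-MOD dominated by a VF node, plus an NX node with function OA immediately dominated by an MF node) can be sketched as a small tree check. This is only an illustrative sketch and not the query tool or query language of the paper: the `Node` class, the helper names, the root label "S", and the toy tree are all assumptions made for the example, and the toy tree is not the actual annotation shown in Fig. 1.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical tree node: `label` is the syntactic category,
    `function` is the edge label (grammatical function) to the parent."""
    label: str
    function: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def descendants(node: Node) -> Iterator[Node]:
    """Yield every node properly dominated by `node`."""
    for child in node.children:
        yield child
        yield from descendants(child)


def matches_query(root: Node) -> bool:
    """Check the four-node pattern described in the text."""
    nodes = [root, *descendants(root)]
    # n2 (label VF) dominates n1 (label PX, function OA-MOD).
    vf_over_px = any(
        n2.label == "VF"
        and any(n1.label == "PX" and n1.function == "OA-MOD"
                for n1 in descendants(n2))
        for n2 in nodes
    )
    # n3 (label MF) immediately dominates n4 (label NX, function OA).
    mf_over_nx = any(
        n3.label == "MF"
        and any(n4.label == "NX" and n4.function == "OA"
                for n4 in n3.children)
        for n3 in nodes
    )
    return vf_over_px and mf_over_nx


# Toy tree loosely modelled on example (2); an assumption, not corpus data.
toy = Node("S", children=[
    Node("VF", children=[Node("PX", "OA-MOD")]),  # "über Flugverbindungen"
    Node("LK"),                                   # "habe"
    Node("MF", children=[
        Node("NX"),                               # "ich"
        Node("NX", "OA"),                         # "keine Information"
    ]),
])

print(matches_query(toy))
```

The sketch separates general dominance (any depth, used for VF over PX) from immediate dominance (direct children only, used for MF over NX), which is the distinction the description of the query relies on.
-->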