<?xml version="1.0" standalone="yes"?> <Paper uid="E95-1029"> <Title>Specifying a shallow grammatical representation for parsing purposes</Title> <Section position="3" start_page="0" end_page="210" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The central task of a parser is to assign grammatical descriptions to input sentences. Evaluating a parser's output (as well as designing a computational lexicon and grammar) presupposes a predefined, parser-independent specification of the grammatical representation.</Paragraph> <Paragraph position="1"> Perhaps surprisingly, the possibility of specifying a workable grammatical representation is a matter of controversy, even at lower levels of analysis, e.g. morphology (incl. parts of speech).</Paragraph> <Paragraph position="2"> Consider the following setting (the double-blind experiment). Two linguists trained to apply a tag set to running text according to application guidelines (a &quot;style sheet&quot;) are to analyse the same data individually. The results are then automatically compared, and the differences are jointly examined by these linguists to see whether the differences are due to inattention, or whether they are intentional (i.e. there is a genuine difference in analysis). What percentage of all words in running text retain a different analysis after the differences due to inattention have been discounted? The higher this percentage, the more questionable seems the possibility of specifying a workable grammatical representation.</Paragraph> <Paragraph position="3"> According to a pessimistic view (e.g. Church 1992), different judges cannot agree on the part of speech of several percent of the words in running text, even after negotiations. A more optimistic view can be found in (Leech and Eyes 1993, p. 39; Marcus et al. 1993, p. 
328); they argue that a near-100% interjudge agreement is possible, provided the part-of-speech annotation is done carefully by experts. Unfortunately, they give very little empirical evidence for their position, e.g. in terms of double-blind experiments. If defining these lower levels of grammatical representation is so problematic, defining the more distinctive levels should be even more difficult. If specifying the task of the parser - what the parser is supposed to do - turns out to be so problematic, one could even question the rationality of natural language parser design as a whole. In other words, the controversy regarding the specifiability of a grammatical representation is a fundamental issue.</Paragraph> <Paragraph position="4"> In this article we report on a double-blind experiment with a surface-oriented morphosyntactic grammatical representation used by a large-scale English parser. We show that defining a grammatical representation is possible, even relatively straightforward. We present results from part-of-speech annotation and shallow syntactic analysis. Our three main findings are: 1. A practically 100% interjudge agreement can be reached at the level of morphological (incl.</Paragraph> <Paragraph position="5"> part-of-speech) analysis provided that (i) the grammatical representation is based on structural distinctions and (ii) the individual descriptive practices for the most frequent 'problem cases' are properly documented.</Paragraph> <Paragraph position="6"> 2. A shallow dependency-oriented functional syntax can be defined, very much like a morphological representation. The only substantial difference seems to be that somewhat more effort is needed at the level of syntax for documenting the individual solutions.</Paragraph> <Paragraph position="7"> 3. A grammatical representation (morphosyntactic descriptors and their application guidelines) can be specified with reasonable effort. 
In addition to general descriptive principles, only a few dozen construction-specific entries seem necessary for reaching a high coverage of running text.</Paragraph> <Paragraph position="8"> In short: in this paper we give empirical evidence for the possibility of specifying a grammatical representation in enough detail to make it (almost) consistently applicable. What we are less specific about here are the exact formal properties that make a representation easy to specify; this topic remains open for future investigation.</Paragraph> </Section> </Paper>