<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-0521">
  <Title>Corpus Variation and Parser Performance</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Previous Comparisons of Corpora
</SectionTitle>
    <Paragraph position="0"> A great deal of work has been done outside of the parsing community analyzing the variations between corpora and different genres of text. Biber (1993) investigated variation in a number of syntactic features over genres, or registers, of language. Of particular importance to statistical parsers is the investigation of frequencies for verb subcategorizations, such as that of Roland and Jurafsky (1998). Roland et al. (2000) find that subcategorization frequencies for certain verbs vary significantly between the Wall Street Journal corpus and the mixed-genre Brown corpus, but that they vary less between genre-balanced British and American corpora. Recovering argument structure is essentially the task that automatic parsers attempt to solve, and the frequencies of various structures in training data are reflected in a statistical parser's probability model. The variation in verb argument structure found by previous research led us to wonder to what extent a model trained on one corpus would be useful in parsing another. The probability models of modern parsers include not only the number and syntactic type of a word's arguments, but also lexical information about their fillers. Although we are not aware of previous comparisons of the frequencies of argument fillers, we can only assume that they vary at least as much as the syntactic subcategorization frames.</Paragraph>
  </Section>
</Paper>