<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1021">
  <Title>(Semi-)Automatic Detection of Errors in PoS-Tagged Corpora</Title>
  <Section position="4" start_page="0" end_page="0" type="concl">
    <SectionTitle>
4. Conclusions
</SectionTitle>
    <Paragraph position="0"> The main contribution of this paper is a method for detecting errors in part-of-speech tagged corpora that is both quite powerful (in its coverage of errors) and easy to apply, and hence offers a relatively low-cost means of achieving high-quality PoS-tagged corpora. Its main advantage is that it combines a focussed search for errors of a particular, specific type with bootstrapping of the search. This makes it possible to detect errors even in a very large corpus where manual checking would not be feasible (at least in practice), since manual checking requires passing through the whole text and paying attention to all kinds of possible violations, whereas the approach described concentrates on violations of particular phenomena at particular spots. Hence, it allows for straightforward checking of whether an error really occurs and, if so, for its direct correction.</Paragraph>
    <Paragraph position="1"> As a side effect, it should also be mentioned that the method allows for detecting not only errors, but also inconsistencies in hand-tagging (i.e. differences in the application of a given tagging scheme by different human annotators and/or at different times), and even inconsistencies in the tagging guidelines. A further particular issue is the detection and tagging of idioms and collocations, specifically in cases where they take a form that deviates from the rules of standard syntax (i.e. they are detected as &quot;suspect spots&quot; by the method). For details on all these points, including the particular problems encountered in NEGRA(r), cf.</Paragraph>
    <Paragraph position="2"> Květoň and Oliva (in prep.).</Paragraph>
  </Section>
</Paper>