<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1042">
  <Title>Error mining in parsing results</Title>
  <Section position="7" start_page="335" end_page="335" type="concl">
    <SectionTitle>
5 Conclusions and perspectives
</SectionTitle>
    <Paragraph position="0"> As we have shown, parsing large corpora makes it possible to set up error mining techniques that identify missing and erroneous information in the various resources used by full-featured parsing systems. The technique described in this paper, and its implementation on forms and form bigrams, has already allowed us to detect many errors and omissions in the Lefff lexicon, to point out inappropriate behaviors of the SXPipe pre-syntactic processing chain, and to reveal the grammars' lack of coverage for certain phenomena.</Paragraph>
    <Paragraph position="1"> We intend to carry on and extend this work.</Paragraph>
    <Paragraph position="2"> First of all, the visualization environment can be enhanced, as can the implementation of the algorithm itself.</Paragraph>
    <Paragraph position="3"> We would also like to integrate into the model the possibility that the facts taken into account (currently, forms and form bigrams) are not necessarily certain, because some of them could result from an ambiguity. For example, several lemmas are often possible for a given form.</Paragraph>
    <Paragraph position="4"> Assigning probabilities to these lemmas would thus make it possible to look for the most suspicious lemmas.</Paragraph>
    <Paragraph position="5"> We are already working on a module that will make it possible not only to detect errors, for example in the lexicon, but also to propose corrections. To achieve this, we want to reparse all non-parsable sentences after replacing their main suspects with a special form that receives underspecified lexical information. This information can either be very general or be computed by appropriate generalization patterns applied to the information that the lexicon associates with the original form. A statistical study of the new parsing results will make it possible to propose corrections for the forms involved.</Paragraph>
  </Section>
</Paper>