<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0101"> <Title>Using word class for Part-of-speech disambiguation</Title> <Section position="8" start_page="11" end_page="11" type="concl"> <SectionTitle> 8 Conclusion </SectionTitle> <Paragraph position="0"> We explored the morpho-syntactic ambiguities of a language, basing our experiments on French.</Paragraph> <Paragraph position="1"> Several ways to estimate lexical probabilities were discussed, and a new paradigm, the genotype, was presented. This paradigm has the advantage of capturing the morphological variation of words along with the frequency at which they occur. We also presented a methodology for optimizing the construction of a restricted training corpus for developing taggers. For part-of-speech disambiguation with a small training corpus, genotypes turn out to be much easier to model than the words themselves. They offer a successful solution both to the small-training-corpus problem and to the problem of data sparseness. Compared with lexical probabilities, they give much more reliable estimates, since only 429 genotypes need to be estimated instead of 10,696 words. Results are even more convincing when genotypes are used in context, with bigrams and trigrams applied for disambiguation. Genotypes are also used for smoothing, which is a particularly important issue when the training corpus is small.</Paragraph> </Section> </Paper>
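The conclusion's central argument is that tag probabilities estimated per genotype are more reliable than per-word lexical probabilities, because far fewer distributions have to be estimated (429 genotypes instead of 10,696 words in the reported experiments). The following is a minimal sketch of that idea under stated assumptions. A genotype is modeled here as the set of tags a lexicon allows for a word (an assumption; the paper's own definition may differ in detail). The toy French lexicon and tagged corpus are invented for illustration. The back-off from word to genotype to a uniform distribution is one simple smoothing scheme chosen for this sketch, not necessarily the one used in the paper. The names `LEXICON`, `genotype_of`, `estimate` and `tag_distribution` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: each word maps to the tags it may take.
# ASSUMPTION: a "genotype" is modeled as the frozenset of a word's possible tags.
LEXICON = {
    "le": {"DET", "PRON"},
    "les": {"DET", "PRON"},
    "la": {"DET", "PRON", "NOUN"},
    "porte": {"NOUN", "VERB"},
    "marche": {"NOUN", "VERB"},
    "montre": {"NOUN", "VERB"},
    "ferme": {"NOUN", "VERB", "ADJ"},
    "chat": {"NOUN"},
}


def genotype_of(word):
    """Return the (assumed) genotype of a word: its set of possible tags."""
    return frozenset(LEXICON[word])


def estimate(tagged_corpus):
    """Count tag frequencies per genotype and per word in a tagged corpus."""
    by_genotype = defaultdict(Counter)
    by_word = defaultdict(Counter)
    for word, tag in tagged_corpus:
        by_genotype[genotype_of(word)][tag] += 1
        by_word[word][tag] += 1
    return by_genotype, by_word


def normalize(counts):
    """Turn raw tag counts into relative frequencies."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def tag_distribution(word, by_genotype, by_word):
    """Estimate P(tag | word), backing off to P(tag | genotype) for unseen words,
    then to a uniform distribution over the genotype's tags (illustrative choice)."""
    if word in by_word:
        return normalize(by_word[word])
    g = genotype_of(word)
    if g in by_genotype:
        return normalize(by_genotype[g])
    return {tag: 1 / len(g) for tag in g}


# Hypothetical toy training corpus of (word, tag) pairs.
CORPUS = [
    ("le", "DET"), ("chat", "NOUN"), ("porte", "VERB"), ("la", "DET"),
    ("porte", "NOUN"), ("les", "PRON"), ("marche", "NOUN"), ("le", "DET"),
]

by_g, by_w = estimate(CORPUS)
print(len(by_w), "word distributions vs", len(by_g), "genotype distributions")
# prints: 6 word distributions vs 4 genotype distributions
print(tag_distribution("montre", by_g, by_w))  # unseen word, seen genotype
```

Because the unseen word "montre" shares its genotype with "porte" and "marche", it inherits their pooled tag distribution instead of having no estimate at all. This is one way to read the claim that pooling over genotypes mitigates data sparseness.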