<?xml version="1.0" standalone="yes"?> <Paper uid="C96-1096"> <Title>Disambiguation of morphological analysis in Bantu languages</Title> <Section position="7" start_page="570" end_page="572" type="concl"> <SectionTitle> 5 Success rate and remaining problems of disambiguation </SectionTitle> <Paragraph position="0"> The CGP of Swahili was tested with two text corpora, neither of which had been used as test material when the rules were written: E. Kezilahabi's novel Mzingile (22,984 word-form tokens), and a collection of newspaper texts from the weekly paper Mzalendo, 1994 (49,969 word-form tokens). Test results are in Table 3.</Paragraph> <Paragraph position="2"> = ambiguity in tokens, amb-(w) = ambiguity in unique word-forms.</Paragraph> <Paragraph position="3"> The parser performed best with newspaper texts, leaving 4.9% of tokens ambiguous. Yet the overall result has to be considered promising, given that the parser is still under development and that the rules are almost solely grammar-based. The most common types of ambiguity still remaining are: noun vs. adverb, adjective vs. adverb, noun vs. conjunction, verb (imperative) vs. noun, and verb (infinitive) vs. noun. These typically occur in sentence positions for which reliable rules are difficult to write. A fairly large part of the remaining ambiguity concerns the genitive connectors ya and wa, and possessive pronouns. They are generally in positions where the governing noun lies to the left, beyond the current clause or sentence boundary. For such cases, the rule syntax should allow the use of more distantly located information. The vast majority of constraints are selection rules for resolving ambiguity based on homographic noun class agreement markers. It is possible to resolve most of this ambiguity by using contextual information.</Paragraph> <Paragraph position="4"> Conclusion The morphological analysis of Swahili tends to produce a comparatively large number of ambiguous readings. 
The noun class structure, coupled with class agreement marking in dependent constituents, contributes significantly to ambiguity. The phenomenon is particularly evident in verb structures, where different sets of noun class markers add to the ambiguity of the same verb-form. It is assumed that the solutions suggested here also apply to other Bantu languages.</Paragraph> <Paragraph position="5"> The ambiguity resolution is based on the Constraint Grammar formalism, which allows the use of grammatically motivated rules. The maximal context in the present application is a sentence, but there is a need for extending it over sentence boundaries. Constraint rules are grouped into sections, so that the most obvious cases are disambiguated first. A parser with only grammar-based rules disambiguates about 95% of Swahili word-forms from running text, which initially has about 50% of the tokens ambiguous. The remaining ambiguity is hard to resolve fully safely, but probabilistic and heuristic techniques are likely to still improve the performance.</Paragraph> </Section> </Paper>