<?xml version="1.0" standalone="yes"?> <Paper uid="W04-3232"> <Title>Identifying Broken Plurals in Unvowelised Arabic Text</Title> <Section position="4" start_page="0" end_page="0" type="concl"> <SectionTitle> 6. Conclusion </SectionTitle> <Paragraph position="0"> We discussed several methods for BP identification: simple BP matching, affix-based simple BP matching, simple BP matching+POS, manually-and-DT restricted, and dictionary-based.</Paragraph> <Paragraph position="1"> Although the simplest methods had poor or mediocre results, they were used to bootstrap better-performing methods.</Paragraph> <Paragraph position="2"> The baseline, the simple BP matching method, has high recall but low precision (~14%). We attempted to improve the performance of the BP identification algorithm by (i) using affix information, (ii) identifying proper names, and (iii) restricting the BP patterns. Implementing the simple and restricted methods and using them to analyse all the BPs in a large corpus (A_Corpus2) made a dictionary approach possible. All methods were evaluated on a larger data set of 187,000 words. The results confirmed that the restricted method clearly improved overall performance and that the dictionary approach outperformed all the other methods.</Paragraph> <Paragraph position="3"> We also developed a new light-stemming algorithm that conflates both regular and broken plurals with their singular forms. The new light-stemming algorithm was assessed in an information retrieval context by comparing its performance with that of other stemming algorithms. Our work provides evidence that identifying broken plurals improves the performance of information retrieval systems. We found that any form of stemming improves retrieval for Arabic, and that light-stemming with broken plural recognition outperforms standard light-stemming, root-stemming, and no stemming at all.</Paragraph> </Section> </Paper>