<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1027"> <Title>Shallow language processing architecture for Bulgarian</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The state of the art in full parsing and knowledge-based automatic analysis still falls short of providing a reliable processing framework for robust, real-world applications such as automatic abstracting or information extraction. The problem is especially acute for languages such as Bulgarian, which do not benefit from a wide range of processing tools. Various projects have addressed different aspects of automatic analysis for Bulgarian, such as morphological analysis (Krushkov, 1997; Simov et al., 1992), morphological disambiguation (Simov et al., 1992) and parsing (Avgustinova et al., 1989), but no previous work has pursued the development of a knowledge-poor, robust processing environment with a high level of component integrity. This paper reports the development and implementation of a robust architecture for language processing in Bulgarian, referred to as LINGUA, which includes modules for POS tagging, sentence splitting, clause segmentation, parsing and anaphora resolution. Our text processing framework builds on a considerably shallower linguistic analysis of the input, thus trading depth of interpretation for breadth of coverage and a workable, robust solution. LINGUA uses knowledge-poor, heuristically based algorithms for language analysis, thereby working around the lack of resources for Bulgarian.</Paragraph> </Section> </Paper>