File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/e06-2024_intro.xml
Size: 1,098 bytes
Last Modified: 2025-10-06 14:03:27
<?xml version="1.0" standalone="yes"?> <Paper uid="E06-2024"> <Title>A Suite of Shallow Processing Tools for Portuguese: LX-Suite</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The purpose of this paper is to present LX-Suite, a set of tools for the shallow processing of Portuguese, developed under the TagShare project by the NLX Group. The tools included in this suite are a sentence chunker; a tokenizer; a POS tagger; a nominal featurizer; a nominal lemmatizer; and a verbal featurizer and lemmatizer.</Paragraph> <Paragraph position="1"> These tools were implemented as autonomous modules. This design makes it easy to replace any of the modules with an updated version or even with a third-party tool. It also allows any of these tools to be used separately, outside the pipeline of the suite. The evaluation results reported in the following sections were obtained using an accurately hand-tagged corpus of 280,000 tokens composed of newspaper articles and short novels.</Paragraph> </Section> </Paper>