<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2001"> <Title>Multilingual Extension of a Temporal Expression Normalizer using Annotated Corpora</Title> <Section position="2" start_page="0" end_page="1" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Recently, the Natural Language Processing community has become more and more interested in developing language-independent systems, in an effort to break the language barrier hampering their application in real-use scenarios. Such a strong interest in multilinguality is demonstrated by the growing number of international conferences and initiatives placing systems' multilingual/cross-language capabilities among the hottest research topics, such as the European Cross-Language Evaluation Forum (CLEF), a successful evaluation campaign which aims at fostering research in different areas of multilingual information retrieval. At the same time, in the field of temporal expression recognition and normalization, systems featuring multilingual capabilities have been proposed. Among others, (Moia, 2001; Wilson et al., 2001; Negri and Marseglia, 2004) emphasized the potential of such applications for different information retrieval related tasks.</Paragraph> <Paragraph position="1"> As in many other NLP areas, research in automated temporal reasoning has recently seen the emergence of machine learning approaches trying to overcome the difficulties of extending a language model to other languages (Carpenter, 2004; Ittycheriah et al., 2003). In this direction, the outcomes of the first Time Expression Recognition and Normalization Workshop (TERN 2004) provide a clear indication of the state of the field. 
In spite of the good results obtained in the recognition task, normalization by means of machine learning techniques still shows relatively poor results with respect to rule-based approaches, and remains an unresolved problem.</Paragraph> <Paragraph position="2"> The difficulty of porting systems to new languages (or domains) affects both rule-based and machine learning approaches. With rule-based approaches (Schilder and Habel, 2001; Filatova and Hovy, 2001), the main problem is that the porting process requires large numbers of rules to be rewritten from scratch, or adapted, for each new language, which is costly and time-consuming work. Machine learning approaches (Setzer and Gaizauskas, 2002; Katz and Arosio, 2001), on the other hand, can be extended with little human intervention through the use of language corpora. However, the large annotated corpora that are necessary to obtain high performance are not always available. In this paper we describe a new procedure to build temporal models for new languages, starting from previously defined ones.</Paragraph> <Paragraph position="3"> While still adhering to the rule-based paradigm, its main contribution is the proposal of a simple, but effective, methodology to automate the porting of a system from one language to another. In this procedure, we take advantage of the architecture of an existing system developed for Spanish (TERSEO, see (Saquete et al., 2005)), where the recognition model is language-dependent but the normalizing procedure is completely independent. In this way, the approach is capable of automatically learning the recognition model by adjusting the set of normalization rules.</Paragraph> <Paragraph position="4"> The paper is structured as follows: Section 2 provides a short overview of TERSEO; Section 3 describes the automatic extension of the system to Italian; Section 4 presents the results of our evaluation experiments, comparing the performance of Ita-TERSEO (i.e. 
our extended system) with the performance of a state-of-the-art system for Italian.</Paragraph> </Section> </Paper>