<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1045">
<Title>Automatic Semantic Tagging of Unknown Proper Names</Title>
<Section position="4" start_page="290" end_page="291" type="concl">
<SectionTitle> 4 Conclusions and Future Work </SectionTitle>
<Paragraph position="0"> Our current implementation of a PN analyzer still has limited performance, caused by a variety of problems that range from the unsatisfactory performance of state-of-the-art POS taggers on inflected languages to the limited availability of linguistic resources in Italian, such as PN gazetteers.</Paragraph>
<Paragraph position="1"> The algorithm that we propose is intended to overcome the limitations of gazetteers and manually defined contextual rules for PN recognition. In (Cucchiarelli et al. 1998) we also show how to extend our method to incrementally update the initial gazetteer.</Paragraph>
<Paragraph position="2"> The performance of the proposed algorithm is more than satisfactory. A comparison with existing systems is difficult because the literature reports global PN recognition performance, without considering the semantic classification of unknowns as a subtask.</Paragraph>
<Paragraph position="3"> The only exception is (Wacholder et al., 1997), where the reported performance for the semantic disambiguation of PNs alone is 79%. In that paper, however, semantic disambiguation is performed among a smaller number of classes (see footnote 5).</Paragraph>
<Paragraph position="4"> The performance of our system is clearly affected by the size of the initial seed gazetteer and of the set of contextual rules. 
If the sets ESLA and ESLB are large enough, more examples of similar contexts are found, even for unknown PNs with a single occurrence.</Paragraph>
<Paragraph position="5"> In our test experiment, we always managed to find at least one or two similar contexts for an unknown PN, but in some cases they were misleading and caused a wrong classification, especially for Products.</Paragraph>
<Paragraph position="6"> However, it may be possible to increase the evidence provided by the set ESLB by including contexts in which the words are not strictly synonyms, but belong to the same semantic category. [Footnote 5: One of the advantages of Information Gain is that, if widely adopted, this measure facilitates the comparison among learning methods with different complexity of the classification task.]</Paragraph>
<Paragraph position="7"> Such an experiment requires a word taxonomy, such as WordNet.</Paragraph>
<Paragraph position="8"> WordNet is currently unavailable for Italian (the first known results of the EuroWordNet project are too preliminary); we therefore plan to reproduce our experiment in English.</Paragraph>
<Paragraph position="9"> Another strategy to improve performance in the absence of substantial evidence is the definition of general (not contextual) rules to capture unknown complex nominals.</Paragraph>
<Paragraph position="10"> For example, looking at the Product experiment in more detail, we found that product names are often formed by very complex nominals, e.g. 
Fiat-Marea Weekend 2000 (the name of a car model).</Paragraph>
<Paragraph position="11"> Capturing complex nominals in the absence of anchors and specific contextual rules (here the only anchor is Fiat, which appears in the gazetteer as an Organization name) may be difficult, and if a complex nominal is not captured as a unit, the resulting syntactic context may be misleading (e.g.</Paragraph>
<Paragraph position="12"> N_ADJ(Fiat_Marea_Weekend, 2000)).</Paragraph>
<Paragraph position="13"> We believe that finding class-independent heuristics for capturing complex nominals is a more &quot;general&quot; way of improving the performance of the method than adding specific rules for specific entity types and enriching the gazetteer.</Paragraph>
</Section>
</Paper>