<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-2019">
<Title>Markov models for language-independent named entity recognition</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle> 3 Results </SectionTitle>
<Paragraph position="0"> Each of the models described in the previous section was trained on esp.train and evaluated on esp.testa. The results are summarized in Table 1.</Paragraph>
<Paragraph position="1"> As would be expected, the HMM performs substantially better than the baseline for every category but locations, though earlier cross-validation experiments suggest that this exception is an accident of the particular split between training and test data.</Paragraph>
<Paragraph position="2"> Perhaps more surprisingly, the maximum entropy model ME outperforms the HMM by an even wider margin. In these two models, the tag probabilities are conditioned on exactly the same properties of the contexts. The only difference between the models is that the probabilities in ME are estimated in a way which avoids the independence assumption in (2). The poor performance of the HMM suggests that this assumption is highly problematic.</Paragraph>
<Paragraph position="3"> Adding further features, in the two extended maximum entropy models, offers additional gains over the base model. However, the addition of a database of first names, in the third extended model, only slightly improves the performance on personal names and actually reduces the overall performance. This is likely because the list of names contains many words which can also be used as locations and organizations. Perhaps the use of additional databases of geographic and non-personal names would help counteract this effect.</Paragraph>
<Paragraph position="4"> For the final results, the model which performed best on the evaluation data, the extended maximum entropy model without the first-name database, was trained on esp.train and evaluated on esp.testa and esp.testb, and trained on ned.train and evaluated on ned.testa and ned.testb. Before training, the part-of-speech tags were removed from ned.train, to allow a more direct cross-language comparison of the model's performance.</Paragraph>
<Paragraph position="5"> The results of the final evaluation are given in Table 2. The performance of the model is roughly the same for both test samples of each language, though performance differs somewhat between the two languages. In particular, the performance on MISC entities is quite a bit better for Dutch than it is for Spanish, and the performance on PER entities is quite a bit better for Spanish than it is for Dutch. These differences are somewhat surprising, as nothing in the model is language specific. Perhaps the discrepancy (especially for the MISC class) reflects differences in the way the training data was annotated; MISC is a highly heterogeneous class, and the criteria for distinguishing between MISC and O entities are sometimes unclear.</Paragraph>
</Section>
</Paper>
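
For readers without Section 2 at hand, the contrast drawn in Paragraph 2 can be made concrete. The following is a sketch only: the paper's equation (2) is not shown here, so the exact notation ($t_i$ for the tag, $t_{i-1}$ and $w_i$ for the conditioning context) is an assumption based on the standard HMM and maximum entropy Markov model formulations.

\[
P_{\mathrm{HMM}}(t_i \mid t_{i-1}, w_i) \;\propto\; P(t_i)\, P(t_{i-1} \mid t_i)\, P(w_i \mid t_i)
\]

This factorization treats $t_{i-1}$ and $w_i$ as independent given $t_i$. The maximum entropy model instead estimates the conditional distribution directly, with no such independence requirement among the conditioning properties:

\[
P_{\mathrm{ME}}(t_i \mid t_{i-1}, w_i) \;=\; \frac{1}{Z(t_{i-1}, w_i)} \exp\Big( \sum_j \lambda_j f_j(t_i, t_{i-1}, w_i) \Big)
\]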
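
The effect described in Paragraph 3 is easiest to see if the first-name database is pictured as a binary feature function. The sketch below is hypothetical; the names, function, and constant are illustrative, not the paper's actual feature set.

```python
# Hypothetical sketch of a gazetteer feature for a maximum entropy tagger.
FIRST_NAMES = {"juan", "maria", "jan", "anna"}  # stand-in for the database

def in_first_name_db(word: str) -> bool:
    """Binary feature: does the current word appear in the first-name list?"""
    return word.lower() in FIRST_NAMES

# The failure mode noted above: a word such as "Victoria" is a first name
# but also a place name, so this feature also fires on LOC and ORG tokens,
# pulling them toward PER.
```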
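
The preprocessing step in Paragraph 4 (removing part-of-speech tags from ned.train) amounts to deleting one column. Below is a minimal sketch, assuming the CoNLL-2002 file layout (Dutch files: one word, POS tag, and NE tag per line, with blank lines between sentences; Spanish files: word and NE tag only) and Latin-1 encoding; the script and the output file name are illustrative, not the author's code.

```python
def strip_pos(in_path: str, out_path: str) -> None:
    """Drop the middle (POS) column so the Dutch files match the Spanish layout."""
    with open(in_path, encoding="latin-1") as src, \
         open(out_path, "w", encoding="latin-1") as dst:
        for line in src:
            fields = line.split()
            if len(fields) == 3:               # word, POS tag, NE tag
                dst.write(f"{fields[0]} {fields[2]}\n")
            else:                              # sentence-boundary blank line
                dst.write(line)

strip_pos("ned.train", "ned.train.nopos")      # output name is hypothetical
```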