<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0103"> <Title>Semi-supervised learning of geographical gazetteers from the internet</Title> <Section position="5" start_page="1" end_page="1" type="evalu"> <SectionTitle> 5 Evaluation </SectionTitle> <Paragraph position="0"> We ran two experiments to evaluate our approach.</Paragraph> <Paragraph position="1"> First, we used the system exactly as described above. In the second experiment, we tried to relax the requirement that the training data be fully classified.</Paragraph> <Paragraph position="2"> If possible, that would give us a truly knowledge-poor approach: currently, the only manually encoded knowledge in our system is the initial gazetteer, and if the system can work without these data, it needs no precompiled resources or human intervention during processing.</Paragraph> <Paragraph position="3"> Our system produces two types of resources: classifiers and word lists for each class. Once the collected lists are large enough, they can be compiled into a gazetteer. We evaluate mainly our classifiers, using the accuracy measure. Recall that the system outputs three classifiers: those with the best recall, the best precision, and the best overall accuracy. The last of these is used for the evaluation. We also want to estimate the quality of the learned name lists. The measure we are interested in is precision rather than recall: when false positives penetrate the lists, the lexicon becomes contaminated and performance may decrease. Moreover, it is not clear how to estimate recall in our task, as we do not know the total number of names on the Internet for each class. Recall is also of limited interest, because the system produces more and more entities and thus improves its own recall continuously. So we simply took one of the lists (for ISLANDS) and checked all the items manually.</Paragraph> <Paragraph position="4"> [Table 5: performance after the bootstrapping iterations, training on the precompiled gazetteer.] Below we describe the results of both evaluating the classifiers and checking the ISLAND list.</Paragraph>
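<Paragraph> To make the selection criterion concrete, here is a minimal Python sketch (our own illustration, not the authors' code) of how the three output classifiers could be chosen from a pool of candidates; the candidate pool, the prediction interface, and the labelled development data are hypothetical assumptions.

def scores(predict, data):
    """Return (precision, recall, accuracy) of `predict` on labelled data."""
    tp = fp = tn = fn = 0
    for item, gold in data:
        guess = predict(item)
        if guess and gold:
            tp += 1
        elif guess:
            fp += 1
        elif gold:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(data)
    return precision, recall, accuracy

def pick_three(candidates, data):
    """Pick the best-precision, best-recall and best-accuracy classifiers."""
    scored = [(name, scores(predict, data)) for name, predict in candidates]
    best_precision = max(scored, key=lambda s: s[1][0])
    best_recall = max(scored, key=lambda s: s[1][1])
    best_accuracy = max(scored, key=lambda s: s[1][2])
    return best_precision, best_recall, best_accuracy

As stated above, only the best-accuracy classifier enters the evaluation; the high-precision one reappears later as a filter during bootstrapping. </Paragraph>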
<Section position="1" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 5.1 Bootstrapping with the initial gazetteer </SectionTitle> <Paragraph position="0"> The system's performance after the first two bootstrapping loops is shown in table 5; the initial system is added for comparison.</Paragraph> <Paragraph position="1"> The most surprising fact is that three classes (RIVER, MOUNTAIN, and COUNTRY) outperformed the initial system already after the first bootstrapping iteration. Unfortunately, RIVER and MOUNTAIN performed worse after the second loop, but they were still better than the system without bootstrapping.</Paragraph> <Paragraph position="2"> ISLANDS improved significantly at the second bootstrapping iteration, outperforming the initial system as well.</Paragraph> <Paragraph position="3"> The REGION class was problematic. One of the patterns the system extracted was &quot;departments of X&quot;. It produced new regions, but it also added many other names to the lexicon (such as Ecology or Economics). Some of them were filtered out by the high-precision classifier, but unfortunately many mistakes remained. This could have been dangerous, as those items, in turn, extracted wrong patterns and tried to infect the REGION class. However, thanks to our very cautious rechecking strategy, this did not happen: all the dangerous patterns were discarded at the second loop, and the system was even able to produce a better classifier, slightly outperforming the initial system.</Paragraph> <Paragraph position="4"> The only class that performed badly was CITY. It was the most difficult task for both the initial and the new system. The problem is that city names appear in far more varied constructions than, for example, island names. Moreover, many cities are named after locations of other types, after people, or after other objects. Such homonyms make searching for CITY collocations very complicated. There was only one good pattern, &quot;streets of X&quot;, in the 20-best set at the first bootstrapping iteration. The system was able to pick it up and construct a classifier with very high precision (92.5%) and very low recall (26.2%). This pattern in turn extracted new candidates, which helped to find two more reliable patterns: at the second bootstrapping iteration the system produced &quot;km from X&quot; and &quot;ort X&quot; (&quot;place/city X&quot; in German). These new patterns increased the performance by 10.8%.</Paragraph> <Paragraph position="5"> We expect the CITY class to improve significantly over the next 3-5 iterations and, hopefully, to reach the initial performance as well.</Paragraph> <Paragraph position="6"> On average, our bootstrapping system performs only slightly worse than the initial one. Moreover, if one does not take CITIES into account, the new system performs even slightly better: 90.9% for the initial system vs. 91.4% for the bootstrapping system after the second loop. As CITIES are improving, we hope the new system will soon outperform the initial one.</Paragraph> <Paragraph position="7"> When one wants to use the system online, classifying items in real time, a second issue becomes important. In that case the number of queries sent to AltaVista plays a major role: each query slows the processing down dramatically. On average, the classifiers produced by the no-bootstrapping system send about six queries per class in the worst case. In our previous study we managed to reduce this number to 5 (for the C4.5 machine learner) by selecting features manually.</Paragraph> <Paragraph position="8"> The new system found more effective patterns: its classifiers require on average 4-5 queries in the worst case. Although twice as many patterns are available after the second bootstrapping iteration, the system still produces classifiers that require only a few queries in the worst case. For MOUNTAIN and COUNTRY the new system outperforms the initial one using two or even four times fewer patterns. Details are given in table 6.</Paragraph> </Section>
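<Paragraph> The worst-case query count discussed above can be read directly off a decision tree: assuming each internal node tests one search-engine pattern count, the worst case equals the longest root-to-leaf path. A minimal Python sketch follows, with an illustrative tree encoding of our own (the paper itself does not specify one):

def worst_case_queries(node):
    """Depth of the deepest decision path; a leaf is a class label (str)."""
    if isinstance(node, str):
        return 0                      # leaf: no further queries needed
    pattern, low, high = node         # one query, then one of two subtrees
    return 1 + max(worst_case_queries(low), worst_case_queries(high))

# Toy CITY tree built from the two patterns named in the text:
toy_tree = ("streets of X",
            ("km from X", "-CITY", "+CITY"),
            "+CITY")
print(worst_case_queries(toy_tree))   # -> 2 queries in the worst case
</Paragraph>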
<Section position="2" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 5.2 Bootstrapping with positive examples only </SectionTitle> <Paragraph position="0"> Although the current approach allows us to reduce the amount of hand-coding dramatically, we still need a precompiled gazetteer to train on. In fact, preparing even a small dataset of fully classified geographical names was a very hard and time-consuming task. On the other hand, one can easily and quickly obtain a big dataset of partially classified names: there are many lists of various locations on the Web. Unfortunately, such lists can only tell us that some items belong to the class C, not that other items do not belong to it. Exploring the possibility of using such lists, we attempted to learn classifiers from positive examples only.</Paragraph> <Paragraph position="1"> The experiment was organized as follows. We take our 100-word lists and use them as a source of positive data: we eliminate all the classification labels and reclassify a training item X as +C if it appears on the list for the class C; otherwise it is classified as -C. For example, Washington is represented as [+MOUNTAIN, -...], compared to [+MOUNTAIN, +CITY, +ISLAND, +REGION, -...] in the first experiment. Test items remain unchanged, as we still want to learn the full classification. Of course, this sampling strategy (obtaining negative examples by merging all the unknown items) is very simple; in the future we plan to investigate other ways of sampling.</Paragraph> <Paragraph position="2"> To start with, we ran our initial system in this new, &quot;positives-only&quot; mode. Table 7 shows the results. [Table 7: per-class results for training on the precompiled gazetteer and for training on positive examples only.] At first glance they look a bit surprising, as several classes perform better when trained on deliberately spoiled data. However, this fact can be explained if one takes homonymy into account.</Paragraph> <Paragraph position="3"> In particular, quite often a city has the same name as, for example, a nearby mountain. Such a name, however, is used much more often to refer to the city than to the mountain: apart from some special ones, mountains are usually of less interest to authors of web pages than cities. Therefore, when the full gazetteer is used, this name produces noisy data for the class MOUNTAIN, infecting it with CITY patterns at the extraction step (relevant for the bootstrapping system only, not for the initial one) and creating a CITY bias during training. To sum up, by allowing only positive information we discard a few MOUNTAINS that could potentially decrease the performance.</Paragraph> <Paragraph position="4"> The most significant improvement was shown by the REGION class. Our dataset contains many names of U.S. cities or towns that can also refer to counties. In the first experiment they were all classified as [+CITY, +REGION], making the REGION data very noisy. In the second experiment we were able to increase the performance by 4.6% by classifying some of them as [+CITY, -REGION]. CITIES, on the contrary, suffered a lot from the new learning strategy. First, about half of the names in the dataset are CITIES. Second, there are only a few items that belong to CITY and some other class but are rarely used as [+CITY] (one of the few examples is China -- [+CITY, +COUNTRY]). This resulted in very poor performance for CITIES when the classifier is trained on positives only.</Paragraph>
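<Paragraph> The relabelling described above is simple enough to state in a few lines of Python; the sketch below is our own illustration, not the authors' code (the mapping `positive_lists` and the class inventory are assumptions).

CLASSES = ["CITY", "ISLAND", "RIVER", "MOUNTAIN", "REGION", "COUNTRY"]

def positives_only_label(name, positive_lists):
    """Label a training item using positive lists as the only supervision."""
    return {c: ("+" if name in positive_lists.get(c, set()) else "-")
            for c in CLASSES}

# Example from the text: if Washington appears only on the MOUNTAIN list,
# it becomes [+MOUNTAIN, -CITY, -ISLAND, ...], although the full gazetteer
# would also label it +CITY, +ISLAND and +REGION.
lists = {"MOUNTAIN": {"Washington"}}
print(positives_only_label("Washington", lists))
</Paragraph>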
<Paragraph position="5"> We also ran our bootstrapping system using only positive examples for learning. The results are summarized in table 8.</Paragraph> <Paragraph position="6"> For the easier classes (ISLAND, RIVER, MOUNTAIN, COUNTRY) the system performs very well.</Paragraph> <Paragraph position="7"> Moreover, the classifiers are almost always better than those we obtained in the first experiment. However, one big problem arises: with this setup the system has much less control over the noise, as no completely correct data are available at all. In particular, the system cannot overcome two difficulties. First, it is not able to extract reliable patterns for CITY at the second loop and thus cannot achieve the improvement we saw in the previous section. Second, the system cannot defeat the &quot;departments&quot; items that appeared on the REGION list after the first bootstrapping iteration. As a result, REGION performance decreases dramatically, and there seems to be no way to repair the situation later.</Paragraph> <Paragraph position="8"> Overall, when trained on the gazetteer, the system improved significantly (2.7% on average) between the first and the second loops, the improvement affecting mainly the two most difficult classes. In contrast, when trained on positive examples only, the system improved only slightly (0.6% on average), and in a rather useless manner.</Paragraph> </Section> <Section position="3" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 5.3 Name lists </SectionTitle> <Paragraph position="0"> Finally, we estimated the quality of the learned names. For this purpose we took the ISLAND list, mainly because it did not contain too many names and the classifier's performance was satisfactory.</Paragraph> <Paragraph position="1"> Downloading the first 2000 pages for each extraction pattern (cf. table 3) and then applying the high-precision classifier, we obtained 134 new names; 93 of them are designated as islands in the atlases we used for reference. An additional 28 names refer to small islands that are simply not listed in these resources. The list also contains 13 items that do not refer to any particular island. However, not all of them are outright mistakes: 3 items (Juan, Layang, and Phi) are parts of legitimate ISLAND names, and five more items are island descriptions, such as Mediterranean islands.</Paragraph> <Paragraph position="2"> The remaining 5 items are mistakes. They all come from proper names exploiting the ISLAND idea.</Paragraph> <Paragraph position="3"> For example, &quot;Monkey Island&quot; is not an island, but a computer game.</Paragraph>
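<Paragraph> For concreteness, the bookkeeping behind these counts can be written out as a short Python sketch; the category counts come directly from the text, while the strict scoring choice (counting name parts and descriptions as errors) is our own assumption.

counts = {
    "atlas islands": 93,   # confirmed in the reference atlases
    "small islands": 28,   # real islands missing from the atlases
    "name parts": 3,       # Juan, Layang, Phi
    "descriptions": 5,     # e.g. "Mediterranean islands"
    "mistakes": 5,         # e.g. "Monkey Island" (a computer game)
}

total = sum(counts.values())                            # 134 extracted names
correct = counts["atlas islands"] + counts["small islands"]
print(f"strict list precision: {correct / total:.1%}")  # ~90.3%
</Paragraph> </Section> </Section> </Paper>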