File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/98/m98-1019_evalu.xml
Size: 4,005 bytes
Last Modified: 2025-10-06 14:00:29
<?xml version="1.0" standalone="yes"?> <Paper uid="M98-1019"> <Title>NYU: DESCRIPTION OF THE JAPANESE NE SYSTEM USED FOR MET-2</Title> <Section position="5" start_page="1" end_page="1" type="evalu"> <SectionTitle> RESULTS </SectionTitle> <Paragraph position="0"> We report the results of five experiments, described in Table 3. Here, "Training data", "Dry run data" and "Formal run data" are the data distributed by SAIC, and "seefu data" is the data created by Oki, NTT data and NYU (available through [8]). Note that all Training, Dry run and seefu data are in the topic of [...] [Figure: decision-tree tests applied to the tokens ISURAERU (first token) and KEISATSU (second token), e.g. "if current token is a location", "if next token is an org-suffix", "if previous token is a location", each answered yes or no, with leaf probabilities none = 0.67, org-OP-CN = 0.33.] The results of the Formal run and the best in-house Dry run are shown in Table 4. The recall for Named Entities (person, organization and location) is clearly poor. This is caused by the change of topic. For example, there are very few foreign person names written in Katakana in the training data (a foreign person would hardly be a victim of a crash in Japan). However, in the space craft launch topic, there are many foreign person names written in Katakana. This is why the recall for persons is so low.
Also, in the test documents, planet names such as "the Sun", "the Earth" or "Saturn" are tagged as locations, which could not be predicted from the training topic. We missed all such names in the formal test. The best in-house Dry run result was achieved before the formal run, without looking at the test data. It should therefore be regarded as an example of the performance obtained when we know the topic of the material. We think this is satisfactory, considering that the only effort we made was preparing dictionaries, with no patterns. Table 5 shows three experiments performed after the formal run. As the topic change may degrade the performance, we conducted experiments in which the training data includes documents on the same topic. The first experiment used 75% of the formal run data for training and the rest of the data for testing. Four such experiments were made to obtain the result for the entire corpus. The second experiment includes the training data used in the formal run in addition to the 75% of the formal run data. The table shows about 1% improvement over the formal run. This is an encouraging result: better performance was achieved with only 75 articles on the same topic than with the 294 articles on a different topic used in the formal run. The result of the second experiment is also a good sign that documents on a different topic helped to improve the performance. This result suggests the idea of a "domain adaptation scheme": have a large general corpus of tagged documents as the basis, and add a small set of domain-specific documents to obtain a domain-specific system. Lastly, in the third experiment, we added the planet names to the location dictionary. From the formal run result, it was clear that one of the main reasons for the performance degradation was the lack of planet names. The addition improves the result by 3.5%, which is better than the other trials.
Although there are several other obvious causes to be fixed, the F-measure of 86.34 is comparable to the best in-house Dry run experiment described before (Experiment 2; F-measure = 88.62).</Paragraph> </Section> </Paper>
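The rotation used in the first two post-run experiments (75% of the formal run data for training, the remaining 25% for testing, repeated four times so that every document is tested once) can be sketched as below. This is only an illustrative sketch, not the authors' code: the function name `rotation_splits`, the document identifiers, and the interleaved way documents are assigned to folds are all assumptions, and the count of 100 formal run documents is inferred from the statement that 75% corresponds to 75 articles.

```python
from typing import List, Sequence, Tuple


def rotation_splits(
    formal_docs: Sequence[str],
    general_docs: Sequence[str] = (),
    n_folds: int = 4,
) -> List[Tuple[List[str], List[str]]]:
    """Return (train, test) pairs; each fold holds out 1/n_folds of formal_docs.

    Hypothetical helper: how the paper assigned documents to folds is not
    stated, so an interleaved assignment is assumed here.
    """
    splits = []
    for k in range(n_folds):
        # Every n_folds-th document, starting at offset k, is held out for testing.
        test = [d for i, d in enumerate(formal_docs) if i % n_folds == k]
        in_domain_train = [d for i, d in enumerate(formal_docs) if i % n_folds != k]
        # Optionally prepend out-of-domain data (the second experiment in the text).
        train = list(general_docs) + in_domain_train
        splits.append((train, test))
    return splits


# Hypothetical document ids; sizes follow the counts mentioned in the text.
formal = [f"formal_{i:03d}" for i in range(100)]
general = [f"general_{i:03d}" for i in range(294)]

exp1 = rotation_splits(formal)           # in-domain training only
exp2 = rotation_splits(formal, general)  # general corpus plus in-domain data
```

Pooling the system output over the four test portions gives a score for the entire formal run corpus, while no document is ever used for training and testing in the same fold.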