<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1024"> <Title>A Multilingual News Summarizer</Title> <Section position="3" start_page="161" end_page="163" type="intro"> <SectionTitle> 3. Experiments </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="161" end_page="161" type="sub_section"> <SectionTitle> 3.1 Preparation of Testing Corpus </SectionTitle> <Paragraph position="0"> Six events selected from Central Daily News, China Daily Newspaper, China Times Interactive, and FTV News Online in Taiwan are used to measure the performance of each model. They are as follows: (1) military service: 6 articles (2) construction permit: 4 articles (3) landslide in Shah Jr: 6 articles (4) Bush's sons: 4 articles (5) Typhoon Babis: 3 articles (6) stabilization fund: 5 articles The news events are selected from different editions, including the social, economic, international, and political editions, among others. An annotator reads all the news articles and connects the MUs that discuss the same story. Because each MU is assigned a unique ID, the links among MUs form the answer keys for the performance evaluation.</Paragraph> </Section> <Section position="2" start_page="161" end_page="163" type="sub_section"> <SectionTitle> 3.2 Results </SectionTitle> <Paragraph position="0"> Traditional precision and recall are computed.</Paragraph> <Paragraph position="1"> Table 1 lists the performance of these five models.</Paragraph> <Paragraph position="2"> M1 is regarded as a baseline model. M2 differs from M1 in that the matching order of nouns and verbs is kept conditionally. It tries to consider the subject-verb-object sequence. The experiment shows that the performance is worse.</Paragraph> <Paragraph position="3"> The major reason is that we can express the same meaning using different syntactic structures.</Paragraph> <Paragraph position="4"> Movement transformation may affect the order of subject-verb-object. 
Thus in M3 we give up the order criterion, but we add an extra score when continuous terms are matched, and subtract some score when the object of a transitive verb is not matched. Compared with M1, the precision is a little higher, and the recall improves by about 4.5%. If we further consider some special named entities such as date/time expressions and monetary and percentage expressions in M4, the recall increases by about 7.6% at no expense of precision.</Paragraph> <Paragraph position="5"> M5 tries to estimate the function of the thesauri.</Paragraph> <Paragraph position="6"> It uses exact matching. The precision is a little higher, but the recall decreases by about 6% compared with M4.</Paragraph> <Paragraph position="7"> Several major errors affect the overall performance. Using nouns and verbs to find similar MUs is not always workable. The same meaning may not be expressed in terms of the same words or synonymous words. Besides, monetary and percentage expressions can be written in different formats. Word segmentation is another source of errors. Two sentences with similar meanings may be segmented differently due to the segmentation strategies.</Paragraph> <Paragraph position="8"> Unknown words generate many single-character words. After tagging, these words tend to be nouns and verbs, which are used in computing the scores for the similarity measure. Thus errors may be introduced.</Paragraph> </Section> </Section> </Paper>