<?xml version="1.0" standalone="yes"?> <Paper uid="H05-1005">
<Title>Improving Multilingual Summarization: Using Redundancy in the Input to Correct MT errors</Title>
<Section position="4" start_page="35" end_page="36" type="metho">
<SectionTitle>
ROLE=[Default]
COUNTRY ROLE PERSON
ORG ROLE PERSON
COUNTRY ORG ROLE PERSON
ROLE PERSON
</SectionTitle>
<Paragraph position="0"> If no frame matches, organizations, countries and locations are dropped one by one, in decreasing order of argument number, until a matching frame is found. After a frame is selected, any prenominal temporal adjectives in the AVM are inserted to the left of the frame, and any postnominal temporal adjectives are inserted to the immediate right of the role in the frame. Country names that are not objects of a preposition are replaced by their adjectival forms (using the correspondences in the CIA factsheet). For the AVMs above, our generation module produces the following referring expressions:</Paragraph>
<Section position="1" start_page="35" end_page="36" type="sub_section">
<SectionTitle> 2.6 Evaluation </SectionTitle>
<Paragraph position="0"> To evaluate the referring expressions generated by our program, we used the manual translation of each document provided by DUC. The drawback of using a summarization corpus is that only one human translation is provided for each document, while multiple model references are required for automatic evaluation. We created multiple model references for a person by taking the initial reference to that person from the manual translation of each input document in the set that referenced that person. We calculated unigram, bigram, trigram and fourgram precision, recall and f-measure for our generated references, evaluated against the multiple models from the manual translations. To illustrate the scoring, consider evaluating a generated phrase "a b d" against three model references "a b c d", "a b c" and "b c d". The bigram precision is 1/2 = 0.5 (one of the two bigrams in the generated phrase occurs in the model set), the bigram recall is 2/7 ≈ 0.286 (two of the seven bigrams in the models occur in the generated phrase), and the f-measure (F = 2PR/(P+R)) is 0.364. For fourgrams, P, R and F are all zero, as there is a fourgram in the models but none in the generated NP.</Paragraph>
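<Paragraph> As a concrete illustration of the scoring just described, the short Python sketch below reproduces the worked example. It is written for this explanation, not taken from the authors' evaluation code: the helper names ngrams and ngram_prf are our own, and whitespace tokenization is assumed.

from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_prf(generated, models, n):
    """N-gram precision, recall and f-measure against multiple model references.

    Precision: fraction of n-grams in the generated phrase that occur
    anywhere in the model set.  Recall: fraction of n-gram occurrences,
    pooled over all models, that occur in the generated phrase.
    """
    gen = ngrams(generated.split(), n)
    model_grams = [ngrams(m.split(), n) for m in models]
    model_union = set().union(*model_grams)

    # Precision over the generated phrase's n-grams.
    gen_total = sum(gen.values())
    matched = sum(count for ng, count in gen.items() if ng in model_union)
    p = matched / gen_total if gen_total else 0.0

    # Recall over all n-gram occurrences in all models.
    model_total = sum(sum(mg.values()) for mg in model_grams)
    covered = sum(count for mg in model_grams
                  for ng, count in mg.items() if ng in gen)
    r = covered / model_total if model_total else 0.0

    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Worked example from the text: bigram P = 0.5, R = 2/7, F = 0.364.
p, r, f = ngram_prf("a b d", ["a b c d", "a b c", "b c d"], n=2)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.286 0.364
</Paragraph>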
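<Paragraph> One caveat on this sketch: as in the worked example, precision credits a generated n-gram that occurs in any model reference, and recall pools n-gram counts over all models. This differs from BLEU's modified precision, which clips each candidate n-gram's count by its maximum count in any single reference. </Paragraph>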
<Paragraph position="1"> We used 6 document sets from DUC'04 for development purposes and present the average P, R and F for the remaining 18 sets in Table 1. There were 210 generated references in the 18 test sets. The table also shows the popular BLEU (Papineni et al., 2002) and NIST MT metrics. We also provide two baselines: the most frequent initial reference to the person in the input (Base1) and a randomly selected initial reference to the person (Base2). As Table 1 shows, Base1 performs better than random selection. This is intuitive, as it also uses redundancy to correct errors, at the level of phrases rather than words. The generation module outperforms both baselines, particularly on precision, which for unigrams gives an indication of the correctness of lexical choice, and for higher n-grams gives an indication of grammaticality. The unigram recall of 0.786 indicates that we are not losing too much information at the noise filtering stage. Note that we expect a low recall for our approach, as we only generate particular attributes that are important for a summary. The important measure is precision, on which we do well. This is also reflected in the high scores on BLEU and NIST.</Paragraph>
<Paragraph position="2"> It is instructive to see how these numbers vary as the amount of redundancy increases. Information theory tells us that information should be more recoverable with greater redundancy. Figure 1 plots f-measure against the minimum amount of redundancy: the value at X=3, for example, gives the f-measure averaged over all people who were mentioned at least thrice in the input. Thus X=1 includes all examples and is the same as Table 1.</Paragraph>
<Paragraph position="3"> As the graphs show, the quality of the generated reference improves appreciably when there are at least 5 references to the person in the input. This is a convenient result for summarization, because people who are mentioned more frequently in the input are more likely to be mentioned in the summary.</Paragraph>
</Section>
<Section position="2" start_page="36" end_page="36" type="sub_section">
<SectionTitle> 2.7 Advantages over using extraneous sources </SectionTitle>
<Paragraph position="0"> Our approach performs noise reduction and generates a reference from information extracted from the machine translations. Information about a person can be obtained in other ways, for example from a database, or by collecting references to the person from extraneous English-language reports. There are two drawbacks to using extraneous sources: 1. People usually have multiple possible roles and affiliations, so descriptions obtained from an external source might not be appropriate in the current context.</Paragraph>
<Paragraph position="1"> 2. Selecting descriptions from external sources can change perspective: one country's terrorist is another country's freedom fighter.</Paragraph>
<Paragraph position="2"> In contrast, our approach generates references that are appropriate and reflect the perspectives expressed in the source.</Paragraph>
</Section> </Section> </Paper>