<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1011">
<Title>Automatic Title Generation for Spoken Broadcast News</Title>
<Section position="6" start_page="0" end_page="0" type="evalu">
<SectionTitle> 3. RESULTS AND OBSERVATIONS </SectionTitle>
<Paragraph position="0"> The experiment was conducted on both the closed-caption transcripts and the automatic speech recognition transcripts. The F1 results and the average number of correct title words in the correct order are shown in Figures 1 and 2, respectively.</Paragraph>
<Paragraph position="1"> KNN works surprisingly well. KNN generates titles for a new document by choosing from the titles in the training corpus. This works fairly well because both the training set and the test set come from CNN news of the same year. Compared to the other methods, KNN degrades much less on speech-recognized transcripts.</Paragraph>
<Paragraph position="2"> Meanwhile, even though KNN does not perform as well as TF.IDF and NBL in terms of the F1 metric, it performs best in terms of the average number of correct title words in the correct order. If human readability matters, we would expect KNN to considerably outperform all the other approaches, since it is guaranteed to generate a human-readable title.</Paragraph>
<Paragraph position="3"> (Figure 1 caption fragment) ... test corpus of 1006 documents with either perfect transcripts or speech-recognized transcripts, using the F1 score.</Paragraph>
<Paragraph position="4"> NBF performs much worse than NBL. This holds for both metrics. The difference between NBF and NBL is that NBL assumes a document word can only generate a title word with the same surface string. Although NBL appears to lose information with this very strong assumption, the results tell us that some information can safely be ignored. In NBF, nothing distinguishes important words from trivial ones, which lets frequent but unimportant words dominate the document-word to title-word correlation.</Paragraph>
<Paragraph position="5"> The light learning approach TF.IDF performs considerably well compared with the heavy learning approaches. Surprisingly, the heavy learning approaches, NBL, NBF, and the EM algorithm, did not outperform the light learning approach TF.IDF. We think that learning the association between document words and title words by directly inspecting a document and its title is problematic, since many words in the document do not reflect its content. A better strategy would be to distill the document first, before learning the correlation between document words and title words.</Paragraph>
<Paragraph position="6"> (Figure 2 caption fragment) ... test corpus of 1006 documents with either perfect transcripts or speech-recognized transcripts, using the average number of correct words in the correct order.</Paragraph>
</Section>
</Paper>
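Below is a minimal sketch of the KNN title-selection idea described in the first observation above: a new document simply reuses the title of its most similar training document. The paper does not specify the document representation, similarity measure, or value of k, so the TF-IDF weighting, cosine similarity, and k = 1 used here are assumptions for illustration; the function names and toy data are hypothetical.

# Minimal sketch of KNN title generation: the new document reuses the title
# of its most similar training document (k = 1). TF-IDF weighting and cosine
# similarity are assumed here; the paper does not specify these details.
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for a list of token lists."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    n = len(docs)
    return [{t: tf * math.log(1 + n / df[t]) for t, tf in Counter(tokens).items()}
            for tokens in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_title(new_tokens, train_tokens, train_titles):
    """Return the title of the training document most similar to the new one."""
    vectors = tfidf_vectors(train_tokens + [new_tokens])
    new_vec, train_vecs = vectors[-1], vectors[:-1]
    best = max(range(len(train_vecs)), key=lambda i: cosine(new_vec, train_vecs[i]))
    return train_titles[best]

# Toy usage example (hypothetical data, not from the paper's CNN corpus):
train_docs = ["the senate passed the budget bill after a long debate".split(),
              "heavy storms caused flooding across the midwest overnight".split()]
train_titles = ["Senate Passes Budget Bill", "Midwest Flooding After Storms"]
print(knn_title("flooding hit several midwest towns after the storms".split(),
                train_docs, train_titles))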