<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1010"> <Title>Template-Filtered Headline Summarization</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 First Look at the Headline Templates </SectionTitle> <Paragraph position="0"> It is difficult to formulate a rule set that defines how headlines are written. However, we may discover how headlines are related to the templates derived from them using a training set of 60933 (headline, text) pairs.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Template Creation </SectionTitle> <Paragraph position="0"> We view each headline in our training corpus as a potential template. For any new text(s), if we can select an appropriate template from the set and fill it with content words, then we will have a well-structured headline. An abstract representation of the templates suitable for matching against new material is required. In our current work, we build templates at the part-of-speech (POS) level.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Sequential Recognition of Templates </SectionTitle> <Paragraph position="0"> We tested how well headline templates overlap with the opening sentences of texts by matching POS tags sequentially. The second column of Table 1 shows the percentage of files whose POSlevel headline words appeared sequentially within the context described in the first column.</Paragraph> <Paragraph position="1"> of a headline against its text, on training data</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Filling Templates with Key Words </SectionTitle> <Paragraph position="0"> Filling POS templates sequentially using tagging information alone is obviously not the most appropriate way to demonstrate the concept of headline summarization using template abstraction, since it completely ignores the semantic information carried by words themselves.</Paragraph> <Paragraph position="1"> Therefore, using the same set of POS headline templates, we modified the filling procedure.</Paragraph> <Paragraph position="2"> Given a new text, each word (not a stop word) is categorized by its POS tag and ranked within each POS category according to its tf.idf weight. A word with the highest tf.dif weight from that POS category is chosen to fill each placeholder in a template. If the same tag appears more than once in the template, a subsequent placeholder is filled with a word whose weight is the next hig hest from the same tag category. The score for each filled template is calculated as follows:</Paragraph> <Paragraph position="4"> where score_t(i) denotes the final score assigned to template i of up to N placeholders and Wj is the tf.idf weight of the word assigned to a placeholder in the template. This scoring mechanism prefers templates with the most desirable length. The highest scoring template-filled headline is chosen as the result.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Key Phrase Selection </SectionTitle> <Paragraph position="0"> The headlines generated in Section 3 are grammatical (by virtue of the templates) and reflect some content (by virtue of the tf.idf scores). But there is no guarantee of semantic accuracy! This led us to the search of key phrases as the candidates for filling headline templates. 
<Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Key Phrase Selection </SectionTitle>
<Paragraph position="0"> The headlines generated in Section 3 are grammatical (by virtue of the templates) and reflect some content (by virtue of the tf.idf scores), but there is no guarantee of semantic accuracy. This led us to search for key phrases as candidates for filling headline templates. Headline phrases should be expanded from single seed words that are important and that uniquely reflect the contents of the text itself. To select the best seed words for key phrase expansion, we studied several keyword selection models, described below.</Paragraph>
<Paragraph position="1"> 4.1 Model Selection
Bag-of-Words Models
1) Sentence Position Model: Sentence position information has long proven useful in identifying the topics of texts (Edmundson, 1969). We believe this idea also applies to the selection of headline words: given a sentence and its position in the text, what is the likelihood that it contains the first appearance of a headline word?</Paragraph>
<Paragraph position="3"> Over all M texts in the collection and over all words from the corresponding M headlines (each with up to N words), Count_Pos records the number of times that sentence position i contains the first appearance of any headline word W_j. P(H_k | W_j) is a binary feature. This is computed for all sentence positions from 1 to Q. The resulting P(Pos_i) is a table of each sentence position's tendency to contain one or more headline words (without indicating the exact words).</Paragraph> </Section>
<Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 2) Headline Word Position Model </SectionTitle>
<Paragraph position="0"> For each headline word W_h, we estimate the sentence position Pos_i at which it is most likely to first appear.</Paragraph>
<Paragraph position="2"> The difference between models 1 and 2 is that for the sentence position model, statistics are collected for each sentence position i; for the headline word position model, they are collected for each headline word W_h.</Paragraph>
<Paragraph position="3"> 3) Text Model: This model captures the correlation between words in the text and words in the headline.</Paragraph>
<Paragraph position="5"> doc_tf(w,j) denotes the term frequency of word w in the jth document of all M documents in the collection; title_tf(w,j) is the term frequency of word w in the jth title. H_w and T_w are words that appear in both the headline and the text body; for each instance of an (H_w, T_w) pair, H_w = T_w.</Paragraph>
<Paragraph position="6"> 4) Unigram Headline Model: unigram probabilities of the headline words from the training set. 5) Bigram Headline Model: bigram probabilities of the headline words from the training set.</Paragraph>
<Paragraph position="7"> Choice of Model Combinations: Given these five models, we needed to determine which model or model combination is best suited for headline word selection. The blind data was the DUC2001 test set of 108 texts. The reference headlines are the original headlines, totaling 808 words (not including stop words). The evaluation was based on the cumulative unigram overlap between the n top-scoring words and the reference headlines. The models are numbered as in Section 4.1. Table 2 shows the effectiveness of each model/model combination on the top 10, 20, 30, 40, and 50 scoring words.</Paragraph>
<Paragraph position="8"> Clearly, for all lengths greater than 10, sentence position (model 1) plays the most important role in selecting headline words. Selecting the top 50 words solely on the basis of position information means that sentences at the beginning of a text are the most informative. However, when working with a more restricted length requirement, the text model (model 3) adds an advantage to the position model (highlighted, 7th from the bottom of Table 2). As a result, a combination of the sentence position and text models was used.</Paragraph>
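As a rough illustration of model 1, the sketch below counts, over training (headline, text) pairs, how often each sentence position holds the first appearance of a headline word; normalizing the counts into a probability table is our reading of the model, since the paper's exact equation is not reproduced here.

```python
# Illustrative estimate of the sentence position model (model 1). The
# normalization into a probability table is our assumption.
from collections import Counter

def sentence_position_model(pairs, max_pos):
    """pairs: iterable of (headline_words, sentences); sentences is a list
    of token lists in document order. Returns P(Pos_i) for i = 1..max_pos."""
    count_pos = Counter()
    for headline_words, sentences in pairs:
        for w in headline_words:
            for i, sent in enumerate(sentences[:max_pos], start=1):
                if w in sent:          # binary feature: w appears in sentence i
                    count_pos[i] += 1  # first appearance only, then stop
                    break
    total = sum(count_pos.values()) or 1
    return {i: count_pos[i] / total for i in range(1, max_pos + 1)}
```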
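For model 3, the text defines doc_tf and title_tf, but the equation itself is not recoverable from this copy; one natural reading, sketched below under that assumption, scores each word by the ratio of its title frequency to its body frequency over the collection.

```python
# A hypothetical rendering of the text model (model 3): the ratio estimator
# is our assumption, not necessarily the authors' exact formula.
def text_model(doc_tf, title_tf, vocabulary, n_docs):
    """doc_tf/title_tf: dicts mapping (word, j) -> term frequency in the
    jth document body / title. Returns a headline-worthiness score per word."""
    scores = {}
    for w in vocabulary:
        body = sum(doc_tf.get((w, j), 0) for j in range(n_docs))
        title = sum(title_tf.get((w, j), 0) for j in range(n_docs))
        if body > 0:
            scores[w] = title / body   # words frequent in titles score high
    return scores
```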
<Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Phrase Candidates to Fill Templates </SectionTitle>
<Paragraph position="0"> Section 4.1 explained how we select headline-worthy words. We now need to expand them into phrases as candidates for filling templates. As illustrated in Table 2 and stated in (Zajic et al., 2002), headlines from newspaper texts mostly use words from the beginning of the text. Therefore, we search for n-gram phrases comprising keywords in the first part of the story. Using the model combination selected in Section 4.1, the 10 top-scoring words over the whole story are selected and highlighted within the first 50 words of the text. The system should be able to pull out the largest window of top-scoring words to form the headline. To help achieve grammaticality, we produced bigrams surrounding each headline-worthy word (underlined), as shown in Figure 1. By connecting overlapping bigrams in sequence, interpretable clusters of words form. Multiple headline phrases are considered as candidates for template filling. Using a set of hand-written rules, dangling words were removed from the beginning and end of each headline phrase.</Paragraph> </Section> </Section>
<Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Filling Templates with Phrases </SectionTitle> <Paragraph position="0"/>
<Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.1 Method </SectionTitle>
<Paragraph position="0"> Key phrase clustering preserves text content but lacks a complete and correct representation for structuring phrases. The phrases need to go through a grammar filtering/reconstruction stage to gain grammaticality.</Paragraph>
<Paragraph position="1"> A set of headline-worthy phrases, with their corresponding POS tags, is presented to the template filter. All templates in the collection are matched against each candidate headline phrase. Strict tag matching produces only a small number of matching templates. To circumvent this problem, we used a more general tag-matching criterion, in which tags belonging to the same part-of-speech category can be matched interchangeably.</Paragraph>
<Paragraph position="2"> Headline phrases tend to be longer than most of the templates in the collection, which results in only partial matches between phrases and templates. A score of fullness of the phrase-template match, f_t(i), is computed for each candidate template t_i against a headline phrase h_i. The top-scoring template is used to filter each headline phrase in composing the final multi-phrase headline. Table 3 shows a random selection of the results produced by the system.</Paragraph>
<Paragraph position="5"> Table 3: A random selection of system results; a headline can be concatenated from several phrases, separated by '/'s.</Paragraph> </Section>
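Returning to the phrase expansion of Section 4.2, the sketch below grows candidate phrases by marking the bigrams around each seed word within the first 50 words and merging overlapping bigrams into contiguous spans; the window size follows the description above, while the function itself is our illustration.

```python
# Sketch of Section 4.2: expand headline-worthy seed words into candidate
# phrases via overlapping bigrams in the first 50 words of the story.
def expand_phrases(tokens, seeds, window=50):
    """tokens: the story's tokens in order; seeds: set of top-scoring words.
    Returns candidate phrases as strings."""
    tokens = tokens[:window]
    covered = set()
    for i, tok in enumerate(tokens):
        if tok in seeds:               # mark bigrams (i-1, i) and (i, i+1)
            covered.update(j for j in (i - 1, i, i + 1) if 0 <= j < len(tokens))
    phrases, span = [], []
    for j, tok in enumerate(tokens):   # merge contiguous covered positions
        if j in covered:
            span.append(tok)
        elif span:
            phrases.append(" ".join(span))
            span = []
    if span:
        phrases.append(" ".join(span))
    return phrases
```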
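And a sketch of the template filtering just described in Section 5.1: tags are compared at a coarse category level, and each phrase keeps the template that matches it most fully. Since the fullness equation is not recoverable from this copy, the matched-prefix ratio below is our assumption.

```python
# Sketch of Section 5.1's relaxed matching and fullness scoring; the
# two-character tag truncation and prefix-ratio score are our assumptions.
def coarse(tag):
    return tag[:2]                     # e.g. NN, NNS, NNP all map to 'NN'

def fullness(template, phrase_tags):
    """Fraction of the phrase matched, tag by tag, before a mismatch."""
    matched = 0
    for t, p in zip(template, phrase_tags):
        if coarse(t) != coarse(p):
            break
        matched += 1
    return matched / len(phrase_tags) if phrase_tags else 0.0

def best_template(templates, phrase_tags):
    """The top-scoring template filters the phrase in the final headline."""
    return max(templates, key=lambda t: fullness(t, phrase_tags))
```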
<Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.2 Evaluation </SectionTitle>
<Paragraph position="0"> Ideally, the evaluation should show the system's performance on both content selection and grammaticality. However, it is hard to measure computationally the level of grammaticality achieved by a system. Similar to (Banko et al., 2000), we therefore restricted the evaluation to a quantitative analysis of content only.</Paragraph>
<Paragraph position="1"> Our system was evaluated on the previously unseen DUC2003 test data of 615 files. For each file, headlines generated at various lengths were compared against i) the original headline, and ii) headlines written by four DUC2003 human assessors.</Paragraph>
<Paragraph position="2"> The performance metric counts term overlap between the generated headlines and the test standards.</Paragraph>
<Paragraph position="3"> Table 4 shows the human agreement and the performance of the system compared against the two test standards. P and R are precision and recall scores.</Paragraph>
<Paragraph position="4"> Figure 1: The first 50 words of a sample text, with headline-worthy words highlighted: "Allegations of police racism and brutality have shaken this city that for decades has prided itself on a progressive attitude toward civil rights and a reputation for racial harmony. The death of two blacks at a drug raid that went awry, followed 10 days later by a scuffle between police and..."</Paragraph>
<Paragraph position="5"> The system-generated headlines were also evaluated using the automatic summarization evaluation tool ROUGE (Lin and Hovy, 2003). The ROUGE score is a measure of n-gram recall between candidate headlines and a set of reference headlines. Its simplicity and reliability have made it a standard for automatic comparative summarization evaluation. Table 5 shows the ROUGE performance results for generated headlines of length 12 against headlines written by human assessors.</Paragraph> </Section> </Section> </Paper>