<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1094">
  <Title>Using Syntactic Information to Extract Relevant Terms for Multi-Document Summarization</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Test bed: the ISCORPUS
</SectionTitle>
    <Paragraph position="0"> We have created a reference test bed, the ISCOR-PUS1 (Amigo et al., 2004) which contains 72 manually generated reports summarizing the relevant information for a given topic contained in a large document set.</Paragraph>
    <Paragraph position="1"> For the creation of the corpus, nine subjects performed a complex multi-document summarization  task for eight different topics and one hundred relevant documents per topic. After creating each topic-oriented summary, subjects were asked to make a list of relevant concepts for the topic, in two categories: relevant entities (people, organizations, etc.) and relevant factors (such as &amp;quot;ethnic conflicts&amp;quot; as the origin of a civil war) which play a key role in the topic being summarized.</Paragraph>
    <Paragraph position="2"> These are the relevant details of the ISCORPUS test bed:</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Document collection and topic set
</SectionTitle>
      <Paragraph position="0"> We have used the Spanish CLEF 2001-2003 news collection testbed (Peters et al., 2002), and selected the eight topics with the largest number of documents manually judged as relevant from the CLEF assessment pools. All the selected CLEF topics have more than one hundred documents judged as relevant by the CLEF assessors; for homogeneity, we have restricted the task to the first 100 documents for each topic (using a chronological order). This set of eight CLEF topics was found to have two differentiated subsets: in six topics, it is necessary to study how a situation evolves in time: the importance of every event related to the topic can only be established in relation with the others. The invasion of Haiti by UN and USA troops is an example of such kind of topics. We refer to them as &amp;quot;Topic Tracking&amp;quot; (TT) topics, because they are suitable for such a task. The other two questions, however, resemble &amp;quot;Information Extraction&amp;quot; (IE) tasks: essentially, the user has to detect and describe instances of a generic event (for instance, cases of hunger strikes and campaigns against racism in Europe in this case); hence we will refer to them as IE summaries.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Generation of manual summaries
</SectionTitle>
      <Paragraph position="0"> Nine subjects between 25 and 35 years-old were recruited for the manual generation of summaries. All subjects were given an in-place detailed description of the task, in order to minimize divergent interpretations. They were told they had to generate summaries with a maximum of information about every topic within a 50 sentence space limit, using a maximum of 30 minutes per topic. The 50 sentence limit can be temporarily exceeded and, once the 30 minutes have expired, the user can still remove sentences from the summary until the sentence limit is reached back.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Manual identification of key concepts
</SectionTitle>
      <Paragraph position="0"> After summarizing every topic, the following questionnaire was filled in by users: Who are the main people involved in the topic? What are the main organizations participating in the topic? What are the key factors in the topic? Users provided free-text answers to these questions, with their freshly generated summary at hand. We did not provide any suggestions or constraints at this point, except that a maximum of eight slots were available per question (i.e., a maximum of 8X3 = 24 key concepts per topic, per user).</Paragraph>
      <Paragraph position="1"> This is, for instance, the answer of one user for a topic about the invasion of Haiti by UN and USA troops:  militares golpistas (coup attempting soldiers) golpe militar (coup attempt) restaurar la democracia (reinstatement of democracy) Finally, a single list of key concepts is generated for each topic, joining all the answers given by the nine subjects. These lists of key concepts constitute the gold standard for all the experiments described below.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 Shallow parsing of documents
</SectionTitle>
      <Paragraph position="0"> Documents are processed with a robust shallow parser based in finite automata. The parser splits sentences in chunks and assigns a label to every chunk. The set of labels is: [N]: noun phrases, which correspond to names or adjectives preceded by a determiner, punctuation sign, or beginning of a sentence.</Paragraph>
      <Paragraph position="1"> [V]: verb forms.</Paragraph>
      <Paragraph position="2"> [Mod]: adverbial and prepositional phrases, made up of noun phrases introduced by an adverb or preposition. Note that this is the mechanism to express NP modifiers in Spanish (as compared to English, where noun compounding is equally frequent).</Paragraph>
      <Paragraph position="3"> [Sub]: words introducing new subordinate clauses within a sentence (que, cuando, mientras, etc.).</Paragraph>
      <Paragraph position="4"> [P]: Punctuation marks.</Paragraph>
      <Paragraph position="5"> This is an example output of the chunker: Previamente [Mod] ,[P]el presidente Bill Clinton [N] hab'ia dicho [V] que [Sub] tenemos [V] la obligacion [N] de cambiar la pol'itica estadounidense [Mod] que [Sub] no ha funcionado [V] en Hait'i [Mod].[P] Although the precision of the parser is limited, the results are good enough for the statistical measures used in our experiments.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Distribution of key concepts in syntactic structures
</SectionTitle>
    <Paragraph position="0"> structures We have extracted empirical data to answer these questions: Is the probability of finding a key concept correlated with the distance to the verb in a sentence or clause? Is the probability of finding a key concept in a noun phrase correlated with the syntactic function of the phrase (subject, object, etc.)? Within a noun phrase, where is it more likely to find key concepts: in the noun phrase head, or in the modifiers? We have used certain properties of Spanish syntax (such as being an SVO language) to decide which noun phrases play a subject function, which are the head and modifiers of a noun phrase, etc. For instance, NP modifiers usually appear after the NP head in Spanish, and the specification of a concept is usually made from left to right.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Distribution of key concepts with verb distance
</SectionTitle>
      <Paragraph position="0"> distance Figure 1 shows, for every topic, the probability of finding a word from the manual list of key concepts in fixed distances from the verb of a sentence. Stop words are not considered for computing word distance. The broader line represents the average across topics, and the horizontal dashed line is the average probability across all positions, i.e., the probability that a word chosen at random belongs to the list of key concepts.</Paragraph>
      <Paragraph position="1"> The plot shows some clear tendencies in the data: the probability gets higher when we get close to the verb, falls abruptly after the verb, and then grows steadily again. For TT topics, the probability of finding relevant concepts immediately before the verb is 56% larger than the average (0:39 before the verb, versus 0:25 in any position). This is true not only as an average, but also for all individual TT topics. This can be an extremely valuable result: it shows a direct correlation between the position of a term in a sentence and the importance of the term in the topic. Of course, this direct distance to the verb should be adapted for languages with different syntactic properties, and should be validated for different domains.</Paragraph>
      <Paragraph position="2"> The behavior of TT and IE topics is substantially different. IE topics have smaller probabilities overall, because there are less key concepts common to all documents. For instance, if the topic is &amp;quot;cases of hunger strikes&amp;quot;, there is little in common between  all cases of hunger strikes found in the collection; each case has its own relevant people and organizations, for instance. Users try to make abstraction of individual cases to write key concepts, and then the number of key concepts is smaller. The tendency to have larger probabilities just before the verb and smaller probabilities just after the verb, however, can also be observed for IE topics.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Key Concepts and Noun Phrase Syntactic
Function
</SectionTitle>
      <Paragraph position="0"> We wanted also to confirm that it is more likely to find a key concept in a subject noun phrase than in general NPs. For this, we have split compound sentences in chunks, separating subordinate clauses ([Sub] type chunks). Then we have extracted sequences with the pattern [N][Mod]*. We assume that the sentence subject is a sequence[N][Mod]* occurring immediately before the verb. For instance: null El presidente [N] en funciones [Mod] de Hait'i [Mod] ha afirmado [V] que [Sub]...</Paragraph>
      <Paragraph position="1"> The rest of [N] and [Mod] chunks are considered as part of the sentence verb phrase. In a majority of cases, these assumptions lead to a correct identification of the sentence subject. We do not capture, however, subjects of subordinate sentences or subjects appearing after the verb.</Paragraph>
      <Paragraph position="2"> Figure 2 shows how the probability of finding a key concept is always larger in sentence subjects. This result supports the assumption in (Boguraev et al., 1998), where noun phrases receive a higher weight, as representative terms, if they are syntactic subjects.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Distribution of key concepts within noun head versus NP modifiers
</SectionTitle>
      <Paragraph position="0"> head versus NP modifiers For this analysis, we assume that, in [N][Mod]* sequences identified as subjects, [N] is the head and [Mod]* are the modifiers. Figure 3 shows that the probability of finding a key concept in the NP modifiers is always higher than in the head (except for topic TT3, where it is equal). This is not intuitive a priori; an examination of the data reveals that the most characteristic concepts for a topic tend to be in the complements: for instance, in &amp;quot;the president of Haiti&amp;quot;, &amp;quot;Haiti&amp;quot; carries more domain information than &amp;quot;president&amp;quot;. This seems to be the most common case in our news collection. Of course, it cannot be guaranteed that these results will hold in other domains.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Automatic Selection of Key Terms
</SectionTitle>
    <Paragraph position="0"> We have shown that there is indeed a correlation between syntactic information and the possibility of finding a key concept. Now, we want to explore whether this syntactic information can effectively be used for the automatic extraction of key concepts.</Paragraph>
    <Paragraph position="1"> The problem of extracting key concepts for summarization involves two related issues: a) What kinds of terms should be considered as candidates? and b) What is the optimal weighting criteria for them? There are several possible answers to the first question. Previous work includes using noun phrases (Boguraev et al., 1998; Jones et al., 2002), words (Buyukkokten et al., 1999), n-grams (Leuski et al., 2003; Lin and Hovy, 1997) or proper nouns, multi-word terms and abbreviations (Neff and Cooper, 1999).</Paragraph>
    <Paragraph position="2"> Here we will focus, however, in finding appropriate weighting schemes on the set of candidate terms. The most common approach in interactive single-document summarization is using tf.idf measures (Jones et al., 2002; Buyukkokten et al., 1999; Neff and Cooper, 1999), which favour terms which are frequent in a document and infrequent across the collection. In the iNeast system (Leuski et al., 2003), the identification of relevant terms is oriented towards multi-document summarization, and they use a likelihood ratio (Dunning, 1993) which favours terms which are representative of the set of documents as opposed to the full collection.</Paragraph>
    <Paragraph position="3"> Other sources of information that have been used as complementary measures consider, for instance, the number of references of a concept (Boguraev et al., 1998), its localization (Jones et al., 2002) or the distribution of the term along the document (Buyukkokten et al., 1999; Boguraev et al., 1998).</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Experimental setup
</SectionTitle>
      <Paragraph position="0"> A technical difficulty is that the key concepts introduced by the users are intellectual elaborations, which result in complex expressions which might even not be present (literally) in the documents.</Paragraph>
      <Paragraph position="1"> Hence, we will concentrate on extracting lists of terms, checking whether these terms are part of some key concept. We will assume that, once key terms are found, it is possible to generate full nominal expressions using, for instance, phrase browsing strategies (Pe~nas et al., 2002).</Paragraph>
      <Paragraph position="2"> We will then compare different weighting criteria to select key terms, using two evaluation measures: a recall measure saying how well manually selected key concepts are covered by the automatically generated term list; and a noise measure counting the number of terms which do not belong to any key concept. An optimal list will reach maximum recall with a minimum of noise. Formally: R = jCljjCj Noise =jLnj where C is the set of key concepts manually selected by users; L is a (ranked) list of terms generated by some weighting schema;Ln is the subset of terms inLwhich do not belong to any key concept; andCl is the subset of key concepts which are represented by at least one term in the ranked listL. Here is a (fictitious) example of how R and Noise are computed: C =fHaiti, reinstatement of democracy, UN and USA troopsg L=fHaiti, soldiers, UN, USA, attemptg ! Cl =fHaiti, UN and USA troopsg R = 2=3L n =fsoldiers,attemptg Noise = 2 We will compare the following weighting strategies: null TF The frequency of a word in the set of documents is taken as a baseline measure.</Paragraph>
      <Paragraph position="3"> Likelihood ratio This is taken from (Leuski et al., 2003) and used as a reference measure. We have implemented the procedure described in (Rayson and Garside, 2000) using unigrams only.</Paragraph>
      <Paragraph position="4"> OKAPImod We have also considered a measure derived from Okapi and used in (Robertson et al., 1992). We have adapted the measure to consider the set of 100 documents as one single document.</Paragraph>
      <Paragraph position="5"> TFSYNTAX Using our first experimental result, TFSYNTAX computes the weight of a term as the number of times it appears preceding a</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>