<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1202">
  <Title>Using Gene Expression Programming to Construct Sentence Ranking Functions for Text Summarization</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Background
2.1 Gene Expression Programming
</SectionTitle>
    <Paragraph position="0"> Gene Expression Programming (GEP), first introduced by Ferreira (2001), is an evolutionary algorithm that evolves computer programs and predicts mathematical models from experimental data. The algorithm is similar to Genetic Programming (GP), but it uses fixed-length character strings (called chromosomes) to represent computer programs, which are subsequently expressed as expression trees (ETs). GEP begins with a random population of candidate solutions in the form of chromosomes. The chromosomes are then mapped into ETs, evaluated by a fitness function, and selected according to fitness to reproduce with modification via genetic operations.</Paragraph>
    <Paragraph position="1"> The new generation of solutions goes through the same process until the stop condition is satisfied.</Paragraph>
    <Paragraph position="2"> The fittest individual serves as the final solution.</Paragraph>
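    <Paragraph> The mapping from a chromosome to an expression tree can be illustrated with a minimal sketch. It assumes the breadth-first reading of a gene used in Ferreira's GEP; the function set, the symbol names, and the helpers decode and evaluate are illustrative choices, not taken from our system.

```python
import math
import operator

# Illustrative function set (an assumption): symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "Q": (1, math.sqrt),
}


def decode(gene):
    """Build an expression tree by reading the gene level by level."""
    root = [gene[0], []]      # a node is [symbol, children]
    queue = [root]            # nodes still waiting for their arguments
    pos = 1                   # next unread symbol in the gene
    while queue:
        node = queue.pop(0)
        arity = FUNCTIONS[node[0]][0] if node[0] in FUNCTIONS else 0
        for _ in range(arity):
            child = [gene[pos], []]
            pos += 1
            node[1].append(child)
            queue.append(child)
    return root               # symbols after pos are the unused, noncoding part


def evaluate(node, env):
    """Evaluate a decoded tree given values for the terminal symbols."""
    symbol, children = node
    if symbol in FUNCTIONS:
        func = FUNCTIONS[symbol][1]
        return func(*(evaluate(child, env) for child in children))
    return env[symbol]


tree = decode("Q*+-abcdaab")
value = evaluate(tree, {"a": 5, "b": 4, "c": 3, "d": 2})
```

    Here the gene Q*+-abcdaab decodes to sqrt((a+b)*(c-d)), so value is 3.0 for the inputs shown; the trailing symbols aab are never read, which is what lets every fixed-length chromosome express a valid tree.</Paragraph>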
    <Paragraph position="3"> GEP has been used to solve symbolic regression, sequence induction, and classification problems efficiently (Ferreira 2002; Zhou 2003). We use GEP to find an explicit form of the sentence ranking function for automatic text summarization.</Paragraph>
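    <Paragraph> The generational cycle of Section 2.1 can be sketched on a toy symbolic regression task. This is a simplified illustration under stated assumptions, not our implementation: the target a*a + b, the head length, tournament selection, point mutation as the only genetic operator, and all function names are hypothetical choices, and GEP's recombination and transposition operators are omitted.

```python
import operator
import random

FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul}  # all arity 2
TERMS = "ab"
HEAD = 5
TAIL = HEAD + 1   # head * (max_arity - 1) + 1, so every gene decodes fully

# Toy target (an assumption): y = a*a + b on a small grid of inputs.
CASES = [(a, b, a * a + b) for a in range(-3, 4) for b in range(-3, 4)]


def random_gene(rng):
    head = [rng.choice(list(FUNCS) + list(TERMS)) for _ in range(HEAD)]
    tail = [rng.choice(TERMS) for _ in range(TAIL)]
    return head + tail


def express(gene, env):
    """Decode the gene breadth-first, then evaluate the resulting tree."""
    children, queue, pos = {}, [0], 1
    while queue:
        i = queue.pop(0)
        if gene[i] in FUNCS:
            children[i] = (pos, pos + 1)
            queue += [pos, pos + 1]
            pos += 2

    def value(i):
        if i in children:
            left, right = children[i]
            return FUNCS[gene[i]](value(left), value(right))
        return env[gene[i]]

    return value(0)


def fitness(gene):
    error = sum(abs(express(gene, {"a": a, "b": b}) - y) for a, b, y in CASES)
    return -error   # higher is better; 0 means a perfect fit


def mutate(gene, rng, rate=0.1):
    out = list(gene)
    for i in range(len(out)):
        if rate > rng.random():
            # functions may appear only in the head; the tail holds terminals
            pool = list(FUNCS) + list(TERMS) if HEAD > i else list(TERMS)
            out[i] = rng.choice(pool)
    return out


def evolve(generations=50, size=30, seed=0):
    rng = random.Random(seed)
    population = [random_gene(rng) for _ in range(size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        if fitness(best) == 0:          # stop condition: perfect fit
            break
        nxt = [best]                    # elitism: the fittest always survives
        while size > len(nxt):
            parent = max(rng.sample(population, 3), key=fitness)
            nxt.append(mutate(parent, rng))
        population = nxt
        best = max(population, key=fitness)
    return best


best = evolve()
```

    Because the fittest chromosome is copied unchanged into each new generation, the best fitness never decreases; whether a perfect fit is reached depends on the seed and the number of generations.</Paragraph>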
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Sentence Features
</SectionTitle>
      <Paragraph position="0"> In our current system, every sentence s is represented by five normalized features: * Location of the Paragraph (P):</Paragraph>
      <Paragraph position="2"> where M is the total number of paragraphs in a document and Y is the index of the paragraph to which s belongs.</Paragraph>
      <Paragraph position="3"> * Location of the Sentence (S):</Paragraph>
      <Paragraph position="5"> where N is the total number of sentences in the paragraph and X is the index of sentence s.</Paragraph>
      <Paragraph position="6"> * Length of the Sentence (L): The length of the sentence is the number of words it contains, i.e., l(s), normalized by a sigmoid function:</Paragraph>
      <Paragraph position="8"> in that document; μ(CW(s)) is the mean of all the sentence scores, and std(CW(s)) is the standard deviation.</Paragraph>
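      <Paragraph> As an illustration of sigmoid normalization with a mean and a standard deviation, the sketch below standardizes raw per-sentence values and passes them through the logistic function. The exact formula used in our system is not reproduced here; the logistic form, the use of the population standard deviation, and the name sigmoid_normalize are assumptions made for this example.

```python
import math
import statistics


def sigmoid_normalize(values):
    """Map raw values into (0, 1): standardize, then apply the logistic function."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)      # population standard deviation (assumed)
    if std == 0:
        return [0.5 for _ in values]     # all values equal: no spread to encode
    return [1.0 / (1.0 + math.exp(-(v - mean) / std)) for v in values]


# Hypothetical sentence lengths, in words, for one document.
lengths = [5, 10, 15]
normalized = sigmoid_normalize(lengths)
```

      A value equal to the document mean maps to 0.5, and longer-than-average sentences map to values closer to 1.</Paragraph>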
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Sentence Ranking Function
</SectionTitle>
      <Paragraph position="0"> We assume that, for a given type of document, the mechanism of summarization is the same. Therefore, we only need to find one algorithm that links a collection of documents to their corresponding summaries. We divide the summarization learning task into two stages: training and testing. In the training stage, a set of training documents with their summaries is provided, and the text features defined in Section 2.2 are extracted using statistical and natural language processing methods; each sentence in a document is then scored by a sentence ranking function constructed by GEP. The fitness value for the summarization task is the similarity between the machine-produced summary and the reference summary of the training document. The top n ranked sentences will be returned as the summary of that document and presented in their natural order.</Paragraph>
      <Paragraph position="2"> The number of sentences extracted by the GEP module can vary: it is determined either by the required number of words in the summary or by a specified percentage of the total number of sentences in the document.</Paragraph>
      <Paragraph position="3"> In the testing stage, a different document set is supplied to test the similarity between the machine-summarized text and text summarized by a human or by another system.</Paragraph>
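      <Paragraph> The extraction step and the fitness computation can be sketched as follows. This is a minimal illustration under stated assumptions: rank stands in for a GEP-evolved ranking function, Jaccard word overlap is a placeholder for the similarity measure, and the names extract_summary and overlap_fitness are hypothetical.

```python
def extract_summary(sentences, rank, percent=None, max_words=None):
    """Pick the top-ranked sentences and return them in document order."""
    if percent is None and max_words is None:
        raise ValueError("give either percent or max_words")
    order = sorted(range(len(sentences)),
                   key=lambda i: rank(sentences[i]), reverse=True)
    if percent is not None:
        # fixed share of the document's sentences, at least one
        chosen = order[:max(1, round(len(sentences) * percent))]
    else:
        # add sentences by rank until the word budget would be exceeded
        chosen, words = [], 0
        for i in order:
            length = len(sentences[i].split())
            if chosen and words + length > max_words:
                break
            chosen.append(i)
            words += length
    return [sentences[i] for i in sorted(chosen)]


def overlap_fitness(machine, reference):
    """Jaccard word overlap between two summaries (an assumed similarity)."""
    a = set(" ".join(machine).lower().split())
    b = set(" ".join(reference).lower().split())
    if not a and not b:
        return 1.0
    return len(a.intersection(b)) / len(a.union(b))


# Hypothetical document and a stand-in ranking function (word count).
doc = ["a b c", "d", "e f g h", "i j"]
by_length = lambda s: len(s.split())
half = extract_summary(doc, by_length, percent=0.5)
budget = extract_summary(doc, by_length, max_words=5)
```

      With these inputs, half is the two longest sentences in their original order, and budget contains only the top-ranked sentence because adding the next one would exceed five words.</Paragraph>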
    </Section>
  </Section>
</Paper>