<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1070">
  <Title>Instance-based Sentence Boundary Determination by Optimization for Natural Language Generation</Title>
  <Section position="3" start_page="565" end_page="565" type="relat">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> Existing approaches to sentence boundary determination typically employ one of the following strategies. The first strategy uses domain-specific heuristics to decide which propositions can be combined.</Paragraph>
    <Paragraph position="1"> For example, Proteus (Davey, 1979; Ritchie, 1984) produces game descriptions by employing domain-specific sentence scope heuristics. This approach can work well for a particular application; however, it is not readily reusable for new applications.</Paragraph>
    <Paragraph position="2"> The second strategy is to employ syntactic, lexical, and sentence complexity constraints to control the aggregation of multiple propositions (Robin, 1994; Shaw, 1998). These strategies can generate fluent complex sentences, but they do not take other criteria, such as semantic cohesion, into consideration. Furthermore, since these approaches do not employ global optimization as we do, the content of each sentence might not be distributed evenly. This may cause the dangling sentence problem (Wilkinson, 1995).</Paragraph>
    <Paragraph position="3"> Another strategy, described in Mann and Moore (1981), guided the aggregation process with an evaluation score that is sensitive to the structure and term usage of a sentence. Similar to our approach, they rely on search to find an optimal solution. The main difference between this approach and ours is that their evaluation score is computed based on preference heuristics. For example, all the semantic groups existing in a domain have to be coded specifically in order to handle semantic grouping. In contrast, in our framework, the score is computed based on a sentence's similarity to corpus instances, which takes advantage of the naturally occurring semantic grouping in the corpus.</Paragraph>
    <Paragraph position="4"> Recently, Walker (2002) and Stent (2004) used statistical features derived from a corpus to rank generated sentence plans. Because the plan ranker was trained with existing examples, it can choose a plan that is consistent with those examples. However, depending on the features used and the number of training examples, it is unclear how well it can capture patterns like semantic grouping and avoid problems like dangling sentences.</Paragraph>
  </Section>
</Paper>