<?xml version="1.0" standalone="yes"?> <Paper uid="E06-1038"> <Title>Discriminative Sentence Compression with Soft Syntactic Evidence</Title> <Section position="3" start_page="297" end_page="298" type="intro"> <SectionTitle> 2 Previous Work </SectionTitle>
<Paragraph position="0"> Knight and Marcu (2000) first tackled this problem by presenting a generative noisy-channel model and a discriminative tree-to-tree decision tree model. The noisy-channel model defines the problem as finding the compressed sentence with maximum conditional probability</Paragraph>
<Paragraph position="1"> y* = argmax_y P(y|x) = argmax_y P(x|y) P(y) </Paragraph>
<Paragraph position="2"> P(y) is the source model, which is a PCFG plus a bigram language model. P(x|y) is the channel model, the probability that the long sentence is an expansion of the compressed sentence. To calculate the channel model, both the original and compressed versions of every sentence in the training set are assigned a phrase-structure tree. Given a tree for a long sentence x and a compressed sentence y, the channel probability is the product of the probabilities of each transformation required to expand the tree for y into the tree for x.</Paragraph>
<Paragraph position="3"> The tree-to-tree decision tree model looks to rewrite the tree for x into a tree for y. The model uses a shift-reduce-drop parsing algorithm that starts with the sequence of words in x and the corresponding tree. At each step, the algorithm either shifts (considers new words and subtrees of x), reduces (combines subtrees from x into possibly new tree constructions), or drops (removes words and subtrees from x). A decision tree model is trained on a set of indicative features for each type of action in the parser. These models are then combined in a greedy global search algorithm to find a single compression.</Paragraph>
<Paragraph position="4"> Though both models of Knight and Marcu perform quite well, they do have their shortcomings.</Paragraph>
<Paragraph position="5"> The noisy-channel model uses a source model that is trained on uncompressed sentences, even though the source model is meant to represent the probability of compressed sentences. The channel model requires aligned parse trees for both compressed and uncompressed sentences in the training set in order to calculate probability estimates. These parses are produced by a parsing model trained on out-of-domain data (the WSJ), which can result in parse trees with many mistakes for both the original and compressed versions. This makes alignment difficult and, as a result, the channel probability estimates unreliable. The decision tree model, on the other hand, does not rely on aligned trees and instead simply learns a tree-to-tree transformation model to compress sentences. The primary problem with this model is that most of its features encode properties related to including or dropping constituents from the tree, with no encoding of bigram or trigram surface features to promote grammaticality. As a result, the model will sometimes return very short and ungrammatical compressions.</Paragraph>
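To make the noisy-channel decomposition concrete, the following is a minimal sketch, not the authors' implementation, of how a decoder would rank candidate compressions under this objective. The source_model and channel_model objects and their prob methods are hypothetical stand-ins for the PCFG-plus-bigram source model and the tree-transformation channel model described above.

import math

def noisy_channel_score(x, y, source_model, channel_model):
    # Log-score of a candidate compression y for long sentence x:
    # log P(y) + log P(x | y).
    return math.log(source_model.prob(y)) + math.log(channel_model.prob(x, y))

def best_compression(x, candidates, source_model, channel_model):
    # The decoder returns the candidate maximizing P(y) * P(x | y).
    return max(candidates,
               key=lambda y: noisy_channel_score(x, y, source_model, channel_model))

In practice, Knight and Marcu search over tree expansions rather than scoring a fixed candidate list; the sketch only illustrates how the source and channel scores combine.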
<Paragraph position="6"> Both models rely heavily on the output of a noisy parser to calculate probability estimates for the compression. We argue in the next section that, ideally, parse trees should be treated solely as one source of evidence when making compression decisions, to be balanced against other evidence such as that provided by the words themselves.</Paragraph>
<Paragraph position="7"> Recently, Turner and Charniak (2005) presented supervised and semi-supervised versions of the Knight and Marcu noisy-channel model. The resulting systems typically return informative and grammatical sentences; however, they do so at the cost of compression rate. Riezler et al. (2003) present a discriminative sentence compressor over the output of an LFG parser, which is a packed representation of possible compressions. Though this model is highly likely to return grammatical compressions, it requires the training data to be human-annotated with syntactic trees.</Paragraph> </Section> </Paper>