File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/p06-2065_intro.xml
Size: 1,997 bytes
Last Modified: 2025-10-06 14:03:43
<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2065"> <Title>Unsupervised Analysis for Decipherment Problems</Title> <Section position="4" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Decipherment </SectionTitle> <Paragraph position="0"> In this paper, we look at a particular type of unsupervised analysis problem in which we face a ciphertext stream and try to uncover the plaintext that lies behind it. We will investigate several applications that can be profitably analyzed this way. We will also apply the same technical solution to these different problems.</Paragraph> <Paragraph position="1"> The method follows the well-known noisy-channel framework. At the top level, we want to find the plaintext that maximizes the probability P(plaintext | ciphertext). We first build a probabilistic model P(p) of the plaintext source. We then build a probabilistic channel model P(c | p) that explains how plaintext sequences (like p) become ciphertext sequences (like c). Some of the parameters in these models can be estimated with supervised training, but most cannot.</Paragraph> <Paragraph position="2"> When we face a new ciphertext sequence c, we first use expectation-maximization (EM) (Dempster, Laird, and Rubin, 1977) to set all free parameters to maximize P(c), which is the same (by Bayes' Rule) as maximizing the sum over all p of P(p) · P(c | p). We then use the Viterbi algorithm to choose the p maximizing P(p) · P(c | p), which is the same (by Bayes' Rule) as our original goal of maximizing P(p | c), or plaintext given ciphertext.</Paragraph> <Paragraph position="3"> Figures 1 and 2 show standard EM algorithms (Knight, 1999) for the case in which we have a bigram P(p) model (driven by a two-dimensional b table of bigram probabilities) and a one-for-one P(c | p) model (driven by a two-dimensional s table of substitution probabilities).
This case covers Section 3, while more complex models are employed in later sections.</Paragraph> </Section> </Paper>
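<!--
The two-step procedure described in this section (EM to fit the substitution
table, then Viterbi to pick the best plaintext) can be sketched roughly as
below. This is a minimal illustration under stated assumptions, not a
reproduction of Figures 1 and 2: the function names, the three-letter toy
alphabet, the bigram table b, the start distribution, the toy ciphertext, and
the slightly randomized initialization of s are all hypothetical choices made
for the example. The bigram model is held fixed and only s is re-estimated.

```python
import math
import random

def forward_backward(cipher, plain, start, b, s):
    # E-step: return P(c) and the expected (plain, cipher) substitution counts.
    n = len(cipher)
    alpha = [dict() for _ in range(n)]  # alpha[i][p] = P(c_0..c_i, p_i = p)
    beta = [dict() for _ in range(n)]   # beta[i][p] = P(c_{i+1}.. | p_i = p)
    for p in plain:
        alpha[0][p] = start[p] * s[p][cipher[0]]
        beta[n - 1][p] = 1.0
    for i in range(1, n):
        for p in plain:
            reach = sum(alpha[i - 1][q] * b[q][p] for q in plain)
            alpha[i][p] = reach * s[p][cipher[i]]
    for i in range(n - 2, -1, -1):
        for q in plain:
            beta[i][q] = sum(b[q][p] * s[p][cipher[i + 1]] * beta[i + 1][p]
                             for p in plain)
    prob_c = sum(alpha[n - 1][p] for p in plain)
    counts = {p: {} for p in plain}
    for i in range(n):
        for p in plain:
            gamma = alpha[i][p] * beta[i][p] / prob_c  # P(p_i = p | c)
            counts[p][cipher[i]] = counts[p].get(cipher[i], 0.0) + gamma
    return prob_c, counts

def em_train(cipher, plain, cipher_syms, start, b, iterations=30, seed=0):
    # Assumption: near-uniform random start for s, to break symmetry.
    rng = random.Random(seed)
    s = {}
    for p in plain:
        row = {c: 1.0 + 0.1 * rng.random() for c in cipher_syms}
        z = sum(row.values())
        s[p] = {c: v / z for c, v in row.items()}
    log_likelihoods = []
    for _ in range(iterations):
        prob_c, counts = forward_backward(cipher, plain, start, b, s)
        log_likelihoods.append(math.log(prob_c))
        for p in plain:  # M-step: renormalize expected counts per plain symbol
            total = sum(counts[p].values())
            if total > 0.0:
                s[p] = {c: counts[p].get(c, 0.0) / total for c in cipher_syms}
    return s, log_likelihoods

def viterbi(cipher, plain, start, b, s):
    # Choose the plaintext p maximizing P(p) * P(c | p), working in log space.
    def lg(x):
        return math.log(x) if x > 0.0 else float("-inf")
    best = {p: lg(start[p]) + lg(s[p][cipher[0]]) for p in plain}
    back = []
    for c in cipher[1:]:
        prev, best, pointers = best, {}, {}
        for p in plain:
            q_best = max(plain, key=lambda q: prev[q] + lg(b[q][p]))
            best[p] = prev[q_best] + lg(b[q_best][p]) + lg(s[p][c])
            pointers[p] = q_best
        back.append(pointers)
    path = [max(plain, key=lambda p: best[p])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy setup (all values hypothetical).
plain = ["a", "b", "c"]
start = {p: 1.0 / 3.0 for p in plain}
b = {"a": {"a": 0.6, "b": 0.3, "c": 0.1},
     "b": {"a": 0.7, "b": 0.1, "c": 0.2},
     "c": {"a": 0.2, "b": 0.7, "c": 0.1}}
cipher = list("XYXXZYXXYXXXZYXYXX")
cipher_syms = sorted(set(cipher))

s_table, log_likelihoods = em_train(cipher, plain, cipher_syms, start, b)
decoded = viterbi(cipher, plain, start, b, s_table)
print("".join(decoded))
```

The recorded log P(c) values should not decrease from one EM iteration to the
next, which gives a simple sanity check on the E-step and M-step. For long
ciphertexts the unscaled forward and backward probabilities would underflow,
so a practical version would rescale them or work in log space.
-->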