<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1001">
<Title>Parameter Estimation for Probabilistic Finite-State Transducers</Title>
<Section position="7" start_page="0" end_page="0" type="concl">
<SectionTitle>6 Discussion</SectionTitle>
<Paragraph position="0">We have exhibited a training algorithm for parameterized finite-state machines. Some specific consequences that we believe to be novel are (1) an EM algorithm for FSTs with cycles and epsilons; (2) training algorithms for HMMs and weighted contextual edit distance that work on incomplete data; (3) end-to-end training of noisy channel cascades, so that it is not necessary to have separate training data for each machine in the cascade (cf. Knight and Graehl, 1998), although such data could also be used; (4) training of branching noisy channels (footnote 7); (5) discriminative training with incomplete data; and (6) training of conditional MEMMs (McCallum et al., 2000) and conditional random fields (Lafferty et al., 2001) on unbounded sequences. [Footnote 20: If x_i and y_i are acyclic (e.g., fully observed strings), and f (or rather its FST) has no ε:ε cycles, then composition will &quot;unroll&quot; f into an acyclic machine. If only x_i is acyclic, then the composition is still acyclic if domain(f) has no cycles.]</Paragraph>
<Paragraph position="1">We are particularly interested in the potential for quickly building statistical models that incorporate linguistic and engineering insights. Many models of interest can be constructed in our paradigm without having to write new code. Bringing diverse models into the same declarative framework also allows one to apply new optimization methods, objective functions, and finite-state algorithms to all of them.</Paragraph>
<Paragraph position="2">To avoid local maxima, one might try deterministic annealing (Rao and Rose, 2001), or randomized methods, or place a prior on θ. Another extension is to adjust the machine topology, say by model merging (Stolcke and Omohundro, 1994). Such techniques build on our parameter estimation method.</Paragraph>
<Paragraph position="3">The key algorithmic ideas of this paper extend from forward-backward-style to inside-outside-style methods. For example, it should be possible to do end-to-end training of a weighted relation defined by an interestingly parameterized synchronous CFG composed with tree transducers and then FSTs.</Paragraph>
</Section>
</Paper>
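<!--
The acyclicity condition in footnote 20 is what makes the E-step tractable: once
x_i o f o y_i unrolls into an acyclic machine, expected arc counts can be collected
by a single forward-backward sweep. The following is a minimal illustrative sketch
in Python, not the paper's implementation: the toy machine, its states, and the
per-arc parameter ids are all hypothetical, and parameter tying across arcs is
reduced to sharing a param_id.

from collections import defaultdict

# Hypothetical acyclic composed machine: arcs are (src, dst, param_id, prob),
# listed in topological order. State 0 is initial; FINAL is the unique final state.
arcs = [
    (0, 1, "a", 0.6),
    (0, 2, "b", 0.4),
    (1, 3, "c", 1.0),
    (2, 3, "d", 1.0),
]
FINAL = 3

def expected_counts(arcs, final):
    # alpha[q]: total path weight from the start state to q
    alpha = defaultdict(float)
    alpha[0] = 1.0
    for src, dst, _, p in arcs:            # forward pass (topological order)
        alpha[dst] += alpha[src] * p
    # beta[q]: total path weight from q to the final state
    beta = defaultdict(float)
    beta[final] = 1.0
    for src, dst, _, p in reversed(arcs):  # backward pass (reverse order)
        beta[src] += p * beta[dst]
    z = alpha[final]                       # total weight of the training pair
    counts = defaultdict(float)
    for src, dst, param, p in arcs:
        # posterior expected number of traversals of this arc
        counts[param] += alpha[src] * p * beta[dst] / z
    return dict(counts)

print(expected_counts(arcs, FINAL))
# {'a': 0.6, 'b': 0.4, 'c': 0.6, 'd': 0.4}

An M-step would then pool the counts for arcs sharing a parameter and renormalize
within each distribution; with cyclic machines or unobserved x_i/y_i, the closed
sweep above would have to be replaced by the paper's more general machinery.
-->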