<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3121">
<Title>Phramer - An Open Source Statistical Phrase-Based Translator</Title>
<Section position="3" start_page="0" end_page="146" type="intro">
<SectionTitle> 2 Phramer </SectionTitle>
<Paragraph position="0"> Phramer is a phrase-based SMT system written in Java. It includes: * a decoder that is compatible with Pharaoh (Koehn, 2004); * a minimum error rate training (MERT) module that is compatible with Phramer's decoder and with Pharaoh, and is easily adaptable to other SMT or non-SMT tasks; and * various tools.</Paragraph>
<Paragraph position="1"> The decoder is fully compatible with Pharaoh 1.2 in the algorithms it implements, its input files (configuration file, translation table, language models) and its command line. Some of the advantages of Phramer over Pharaoh are: (1) its source code is available under a permissive license; (2) it is fast (1.5-3 times faster than Pharaoh for most configurations); (3) it can work with various storage layers for the translation table (TT) and the language models (LMs): memory, remote (access through TCP/IP) and disk (using SQLite databases), and extensions for other storage layers are easy to implement; (4) it is more configurable; (5) it accepts compressed data files (TTs and LMs); (6) it is easy to extend, and the package includes an example: part-of-speech decoding on the source language, the target language or both, with support for POS-based language models; (7) it can generate n-best lists internally, so no external tools are required.</Paragraph>
<Paragraph position="2"> The MERT module is a highly modular, efficient and customizable implementation of the algorithm described by Och (2003). The release includes implementations of the BLEU (Papineni et al., 2002), WER and PER error criteria, and it has decoding interfaces for Phramer and Pharaoh. It can be used to search for parameters over more than one million variables. It offers features such as resuming a search, reusing hypotheses from previous runs, and various strategies for finding optimal λ weight vectors.</Paragraph>
<Paragraph position="3"> The package contains a set of tools that include: * Distributed decoding (compatible with both Phramer and Pharaoh): it automatically splits decoding jobs, distributes them to workers and assembles the results. It is compatible with lattice generation, so it can also be used during weight search (using MERT).</Paragraph>
<Paragraph position="4"> * Tools to process translation tables: filter the TT based on the input file, flip the TT to reuse it for English-to-Foreign translation, filter the TT by phrase length, and convert the TT to a database.</Paragraph>
</Section>
</Paper>
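The translation-table filtering described above (by input file and by phrase length) can be illustrated with a minimal sketch. This is not Phramer's code, which is written in Java; it is a Python illustration that assumes a Pharaoh-style text table with one entry per line and fields separated by ` ||| `. The names `source_ngrams` and `filter_table` and the sample entries are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

SEP = " ||| "  # assumed field separator of the text phrase table


def source_ngrams(sentences: Iterable[str], max_len: int) -> Set[Tuple[str, ...]]:
    """Collect every word n-gram (n <= max_len) occurring in the input sentences."""
    grams: Set[Tuple[str, ...]] = set()
    for sentence in sentences:
        words = sentence.split()
        for start in range(len(words)):
            longest_end = min(start + max_len, len(words))
            for end in range(start + 1, longest_end + 1):
                grams.add(tuple(words[start:end]))
    return grams


def filter_table(entries: Iterable[str], sentences: Iterable[str],
                 max_len: int = 7) -> Iterator[str]:
    """Yield only entries whose source phrase can match the input.

    An entry is kept when its source phrase has at most max_len words
    (phrase-length filter) and occurs as an n-gram in some input
    sentence (input-file filter).
    """
    grams = source_ngrams(sentences, max_len)
    for line in entries:
        source = tuple(line.split(SEP, 1)[0].split())
        if len(source) <= max_len and source in grams:
            yield line


# Hypothetical toy table and input, for illustration only.
table = [
    "das haus ||| the house ||| 0.8 0.7",
    "das ||| the ||| 0.6 0.5",
    "ein buch ||| a book ||| 0.9 0.8",
    "haus ist klein ||| house is small ||| 0.4 0.3",
]
for entry in filter_table(table, ["das haus ist klein"], max_len=2):
    print(entry)
```

Filtering of this kind shrinks the table to the entries a decoder could actually use for a given test set, which matters most when the table is held in memory.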