File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/p06-2036_intro.xml
Size: 4,026 bytes
Last Modified: 2025-10-06 14:03:42
<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2036"> <Title>Factoring Synchronous Grammars By Sorting</Title> <Section position="3" start_page="0" end_page="279" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Synchronous Context-Free Grammars (SCFGs) are a generalization of the Context-Free Grammar (CFG) formalism to simultaneously produce strings in two languages. SCFGs have a wide range of applications, including machine translation, word and phrase alignments, and automatic dictionary construction. Variations of SCFGs go back to the Syntax-Directed Translation Schemata of Aho and Ullman (1972), but also include the Inversion Transduction Grammars of Wu (1997), which restrict grammar rules to be binary, the synchronous grammars of Chiang (2005), which use only a single nonterminal symbol, and the Multitext Grammars of Melamed (2003), which allow independent rewriting, as well as other tree-based models such as those of Yamada and Knight (2001) and Galley et al. (2004).</Paragraph> <Paragraph position="1"> When viewed as a rewriting system, an SCFG generates a set of string pairs, representing some translation relation. We are concerned here with the time complexity of parsing such a pair according to the grammar. Assume a pair in which each string has length at most N, and consider an SCFG G of size |G| with at most n nonterminals in the right-hand side of each rule in a single dimension; below we call n the rank of G. As an upper bound, parsing can be carried out in time O(|G| N^{n+4}) by a dynamic programming algorithm maintaining continuous spans in one dimension. As a lower bound, parsing strategies with discontinuous spans in both dimensions can take time Ω(|G| N^{c√n}) for unfriendly permutations (Satta and Peserico, 2005). A natural question to ask then is: What if we could reduce the rank of G, preserving the generated translation?
As in the case of CFGs, one way of doing this would be to factorize each single rule into several rules of rank strictly smaller than n. It is not difficult to see that this would result in a new grammar of size at most 2 · |G|. From the time complexities reported above, we see that such a size increase would be more than compensated for by the reduction in the degree of the polynomial in N. We thus conclude that a reduction in the rank of an SCFG would result in more efficient parsing algorithms for most common parsing strategies.</Paragraph> <Paragraph position="2"> In the general case, SCFGs do not admit normal forms with bounded rank, as shown by Aho and Ullman (1972). Nonetheless, an SCFG with a rank of n does not necessarily meet the worst case of Aho and Ullman (1972). It is then reasonable to ask whether our SCFG G can be factorized, and what is the smallest rank k &lt; n that can be obtained in this way. This paper answers these two questions by providing an algorithm that factorizes the rules of an input SCFG, resulting in a new, generatively equivalent SCFG with rank k as low as possible. The algorithm works in time O(n log n) for each rule, regardless of the rank k of the factorized rules. As discussed above, in this way we achieve an improvement of the parsing time for SCFGs, obtaining an upper bound of O(|G| N^{k+4}) by using a parsing strategy that maintains continuous spans in one dimension.</Paragraph> <Paragraph position="3"> [Figure caption fragment: ... permutations associated with the leaves can be produced by composing the permutations at the internal nodes.]</Paragraph> <Paragraph position="4"> Previous work on this problem has been presented in Zhang et al. (2006), where a method is provided for casting an SCFG to a form with rank k = 2. If generalized to any value of k, that algorithm would run in time O(n^2). We thus improve existing factorization methods by almost a factor of n. We also solve an open problem mentioned by Albert et al.
(2003), who pose the question of whether irreducible, or simple, permutations can be recognized in time less than Θ(n^2).</Paragraph> </Section> </Paper>