<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1024">
  <Title>Efficient Parsing of Highly Ambiguous Context-Free Grammars with Bit Vectors</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Grammar Transformation
</SectionTitle>
    <Paragraph position="0"> The CKY algorithm requires a grammar in Chomsky normal form, in which the right-hand side of each rule consists of either two non-terminals or a single terminal symbol. BitPar uses a modified version of the CKY algorithm that also allows chain rules (rules with a single non-terminal on the right-hand side). BitPar expects the input grammar to be epsilon-free and terminal symbols to occur only in unary rules. Rules with more than two non-terminals on the right-hand side are split into binary rules by a transformation algorithm proposed by Andreas Eisele (personal communication). It is a greedy algorithm that tries to minimise the number of binarised rules by combining frequently co-occurring symbols first. The algorithm consists of the following two steps, which are iterated until all rules are either binary or unary.</Paragraph>
    <Paragraph position="1"> 1. Compute the frequencies of the pairs of neighboring symbols on the right-hand sides of rules. (For example, the rule A → B C D adds 1 to the counts of ⟨B,C⟩ and ⟨C,D⟩.) 2. Determine the most frequent pair ⟨A,B⟩. Add a new non-terminal X. Replace the symbol pair A B in all grammar rules with X. Finally, add the rule X → A B to the grammar.</Paragraph>
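    <Paragraph position="2"> The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not BitPar's actual implementation: the function names binarise and replace_pair, the fresh symbol names X1, X2, ..., the tie-breaking rule (the first pair encountered wins), and the restriction of counting and replacement to rules that still have more than two right-hand-side symbols are all choices made for this example.

```python
from collections import Counter


def replace_pair(rhs, a, b, x):
    """Replace non-overlapping occurrences of the pair (a, b) in rhs with x, left to right."""
    out, i = [], 0
    while len(rhs) > i:
        if rhs[i:i + 2] == (a, b):
            out.append(x)
            i += 2
        else:
            out.append(rhs[i])
            i += 1
    return tuple(out)


def binarise(rules):
    """Greedy binarisation sketch.

    rules: list of (lhs, rhs) pairs, where rhs is a sequence of symbols.
    Returns a new list in which every right-hand side has at most two symbols.
    """
    rules = [(lhs, tuple(rhs)) for lhs, rhs in rules]
    symbols = {lhs for lhs, _ in rules} | {s for _, rhs in rules for s in rhs}
    fresh = 0
    while any(len(rhs) > 2 for _, rhs in rules):
        # Step 1: count neighbouring symbol pairs.
        # Assumption: only rules that are still too long contribute.
        counts = Counter()
        for _, rhs in rules:
            if len(rhs) > 2:
                counts.update(zip(rhs, rhs[1:]))

        # Step 2: pick the most frequent pair (ties: first encountered).
        (a, b), _ = counts.most_common(1)[0]

        # Hypothetical naming scheme for the new non-terminal.
        fresh += 1
        while f"X{fresh}" in symbols:
            fresh += 1
        x = f"X{fresh}"
        symbols.add(x)

        # Replace the pair in the long rules, then add the rule x -> a b.
        rules = [
            (lhs, replace_pair(rhs, a, b, x) if len(rhs) > 2 else rhs)
            for lhs, rhs in rules
        ]
        rules.append((x, (a, b)))
    return rules
```

Under these assumptions, calling binarise([("A", ("B", "C", "D"))]) yields the two rules A → X1 D and X1 → B C. Each iteration shortens at least one over-long rule, so the loop terminates.</Paragraph>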
  </Section>
</Paper>