<?xml version="1.0" standalone="yes"?>
<Paper uid="J85-4001">
<Title>ON THE COMPLEXITY OF ID/LP PARSING 1</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle> 2 GENERALIZING EARLEY'S ALGORITHM </SectionTitle>
<Paragraph position="0"> Shieber generalizes Earley's algorithm by modifying the progress datum that tracks progress through a rule. The Earley algorithm uses the position of a dot to track linear advancement through an ordered sequence of constituents. The major predicates and operations on such dotted rules are these: * A dotted rule is initialized with the dot at the left edge, as in X → .ABC.</Paragraph>
<Paragraph position="1"> * A dotted rule is advanced across a terminal or nonterminal that was predicted and has been located in the input by simply moving the dot to the right. For example, X → A.BC is advanced across a B by moving the dot to obtain X → AB.C.</Paragraph>
<Paragraph position="2"> * A dotted rule is complete iff the dot is at the right edge. For example, X → ABC. is complete.</Paragraph>
<Paragraph position="3"> * A dotted rule predicts a terminal or nonterminal iff the dot is immediately before the terminal or nonterminal.</Paragraph>
<Paragraph position="4"> For example, X → A.BC predicts B.</Paragraph>
<Paragraph position="5"> UCFG rules differ from CFG rules only in that the right-hand sides represent unordered multisets (that is, sets with repeated elements allowed). It is thus appropriate to use successive accumulation of set elements in place of linear advancement through a sequence. In essence, Shieber's algorithm replaces the standard operations on dotted rules with corresponding operations on what will be called dotted UCFG rules.
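As a rough illustration of the contrast just drawn between linear advancement and accumulation of multiset elements, the following Python sketch models both kinds of progress datum. The class and method names (DottedRule, DottedUCFGRule, initial, predicts, advance, is_complete) are hypothetical and not taken from Shieber or Earley; representing multisets with collections.Counter is likewise an assumption made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DottedRule:
    """Ordered dotted rule: the dot is an index into the right-hand side."""
    lhs: str
    rhs: tuple
    dot: int = 0  # initialized with the dot at the left edge

    def is_complete(self):
        # Complete iff the dot is at the right edge.
        return self.dot == len(self.rhs)

    def predicts(self, symbol):
        # Only the symbol immediately after the dot is predicted.
        return not self.is_complete() and self.rhs[self.dot] == symbol

    def advance(self, symbol):
        # Advancing moves the dot one position to the right.
        if not self.predicts(symbol):
            raise ValueError(f"{symbol!r} is not predicted")
        return DottedRule(self.lhs, self.rhs, self.dot + 1)


@dataclass
class DottedUCFGRule:
    """Unordered dotted rule: multisets before and after the dot."""
    lhs: str
    before: Counter
    after: Counter

    @classmethod
    def initial(cls, lhs, rhs):
        # Empty multiset before the dot, whole right-hand side after it.
        return cls(lhs, Counter(), Counter(rhs))

    def is_complete(self):
        # Complete iff the multiset after the dot is empty.
        return sum(self.after.values()) == 0

    def predicts(self, symbol):
        # Every member of the multiset after the dot is predicted.
        return self.after[symbol] > 0

    def advance(self, symbol):
        # Advancing moves one occurrence of the symbol across the dot.
        if not self.predicts(symbol):
            raise ValueError(f"{symbol!r} is not predicted")
        one = Counter([symbol])
        return DottedUCFGRule(self.lhs, self.before + one, self.after - one)
```

Under these assumptions, DottedUCFGRule.initial("X", ["A", "B", "C", "C"]).advance("A") predicts both B and C, whereas the ordered DottedRule("X", ("A", "B", "C"), 1) predicts only B.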
* A dotted UCFG rule is initialized with the empty multiset before the dot and the entire multiset of right-hand elements after the dot, as in X → { } • {A, B, C}.</Paragraph>
<Paragraph position="6"> * A dotted UCFG rule is advanced across a terminal or nonterminal that was predicted and has been located in the input by simply moving one element from the multiset after the dot to the multiset before the dot.</Paragraph>
<Paragraph position="7"> For example, X → {A} • {B, C} is advanced across a B by moving the B to obtain X → {A, B} • {C}. Similarly, X → {A} • {B, C, C} may be advanced across a C to obtain X → {A, C} • {B, C}. [Computational Linguistics, Volume 11, Number 4, October-December 1985, p. 206. G. Edward Barton, Jr., On the Complexity of ID/LP Parsing.]</Paragraph>
<Paragraph position="8"> * A dotted UCFG rule is complete iff the multiset after the dot is empty. For example, X → {A, B, C} • { } is complete.</Paragraph>
<Paragraph position="9"> * A dotted UCFG rule predicts a terminal or nonterminal iff the terminal or nonterminal is a member of the multiset after the dot. For example, X → {A} • {B, C} predicts B and C.</Paragraph>
<Paragraph position="10"> Given these replacements for operations on dotted rules, Shieber's algorithm operates in the same way as Earley's algorithm. As usual, each state in the parser's state sets consists of a dotted rule tracking progress through a constituent plus the interword position defining the constituent's left edge (Earley 1970:95, omitting lookahead). The left-edge position is also referred to as the return pointer because of its role in the complete operation of the parser.</Paragraph>
</Section>
</Paper>