<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-2132">
  <Title>Island Parsing and Bidirectional Charts</Title>
  <Section position="4" start_page="636" end_page="636" type="metho">
    <SectionTitle>
2. Chart parsing
</SectionTitle>
    <Paragraph position="0"> Chart parsing is a very powerful idea for parsing natural language. It was introduced by Martin Kay [1973, 1980] and Ronald Kaplan [1973] and historically was inspired by Earley's algorithm [1970]. The most basic goal in introducing the chart was to reduce the complexity of a nondeterministic parsing algorithm.</Paragraph>
    <Paragraph position="1"> An advantage of chart parsing is that the mechanism is perfectly suited for both bottom-up and top-down parsing.</Paragraph>
    <Paragraph position="2"> A further advantage is that the chart can be complemented with an agenda. In this way, instead of introducing new edges following the rigid application of the algorithm, tasks can be added to the agenda and at every moment a scheduling function can decide the order in which tasks should be executed, in a multiprogramming fashion. The scheduling function can very easily implement depth-first or breadth-first control, but any kind of control can in principle be inserted [see for instance Stock 1987].</Paragraph>
    <Paragraph position="3"> Also, a particular point is that the relation of the input with other levels of analysis is very coherent: lexical ambiguity results in the very simple fact that more than one inactive edge is introduced for an ambiguous word.</Paragraph>
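    <Paragraph>The agenda mechanism described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names run_agenda, depth_first and breadth_first and the task graph TASKS are hypothetical, and executing a task is reduced to scheduling its successor tasks.

```python
from collections import deque
from typing import Callable, Dict, List


def run_agenda(start: str, successors: Dict[str, List[str]],
               select: Callable[[deque], str]) -> List[str]:
    """Execute tasks until the agenda is empty and return the execution order.

    `select` is the scheduling function: it removes and returns the next task.
    Executing a task here only schedules the tasks it gives rise to.
    """
    agenda = deque([start])
    order = []
    while agenda:
        task = select(agenda)
        order.append(task)
        agenda.extend(successors.get(task, []))
    return order


def depth_first(agenda: deque) -> str:
    return agenda.pop()        # newest task first (stack discipline)


def breadth_first(agenda: deque) -> str:
    return agenda.popleft()    # oldest task first (queue discipline)


# Hypothetical task graph: each task spawns the tasks listed for it.
TASKS = {"t0": ["t1", "t2"], "t1": ["t3"], "t2": ["t4"]}
```

Swapping the scheduling function changes the control regime without touching the loop: with TASKS, depth_first gives the order t0, t2, t4, t1, t3, while breadth_first gives t0, t1, t2, t3, t4.</Paragraph>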
  </Section>
  <Section position="5" start_page="636" end_page="636" type="metho">
    <SectionTitle>
3. Bidirectionality
</SectionTitle>
    <Paragraph position="0"> Chart parsing has a positive aspect and some evident problems when applied to speech recognition. Typical of continuous speech recognition are the following aspects: 1) The separation between words is not univocally given; one of the tasks of the sentence parser is precisely to yield suggestions for word separations. In the chart this can be accomplished very well by introducing more vertices, one for every hypothetical separation point. Vertices must be ordered, and ordering here is provided by the time order relation. Therefore we can introduce a time-ordered structure of vertices. 2) Some words in the input matrix are anchored as &quot;surely&quot; recognized while others are only very tentative interpretations. It makes sense for the analysis to privilege elements of the first type as starting points.</Paragraph>
    <Paragraph position="1"> This is the concept of island parsing, in which the parser tries to make sense of portions of a sentence starting from fixed points (islands) that can occur in any position. The traditional chart mechanism cannot deal with this task.</Paragraph>
    <Paragraph position="2"> 3) Island parsing is required to get to the extreme borders of the recognizable fragments, and from that situation help in making suggestions for the unrecognized fragments based on both the left and the right contexts.</Paragraph>
    <Paragraph position="3"> Here again the traditional chart mechanism cannot deal with this task.</Paragraph>
    <Paragraph position="4"> We are now going to introduce a new concept: bidirectional charts.</Paragraph>
    <Paragraph position="5"> Data structures must be rearranged in this connection and the whole parsing process will be different: things get complex if one wants to preserve the good qualities of charts and be reasonably efficient.</Paragraph>
    <Paragraph position="6"> We begin by redefining active edges.</Paragraph>
    <Paragraph position="7"> An active edge here is a data structure that includes two positions in the rule involved: an initial position and a final position, such that the fragment of the input covered by the edge corresponds to the fragment of the right hand side of the rule between those two positions.</Paragraph>
    <Paragraph position="8"> Therefore an active edge is characterized by: from, the left vertex; to, the right vertex; rule, the referred rule; fromposition, the first of the two positions in the rule; toposition, the second of the positions; and sub-inactives, the list of the immediately spanned inactive edges that were included.</Paragraph>
    <Paragraph position="9"> Inactive edges are characterized as usual, by from, to and cat, the category.</Paragraph>
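    <Paragraph>As a concrete reading of these definitions, the two edge types might be represented as below. This is only a sketch under stated assumptions: the class names, the use of integers for vertices, the field name from_ (chosen because from is a Python keyword) and the helper covered_symbols are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rule:
    lhs: str                 # left hand side category
    rhs: Tuple[str, ...]     # right hand side symbols


@dataclass(frozen=True)
class InactiveEdge:
    from_: int               # left vertex (assumed to be an integer index)
    to: int                  # right vertex
    cat: str                 # category


@dataclass(frozen=True)
class ActiveEdge:
    from_: int               # left vertex
    to: int                  # right vertex
    rule: Rule               # the referred rule
    fromposition: int        # first of the two positions in the rule
    toposition: int          # second of the two positions in the rule
    sub_inactives: Tuple[InactiveEdge, ...] = ()

    def covered_symbols(self) -> Tuple[str, ...]:
        """Right hand side symbols lying between the two rule positions."""
        return self.rule.rhs[self.fromposition:self.toposition]
```

For instance, with the rule NP -> DET ADJ N, an active edge with fromposition 1 and toposition 2 covers only the ADJ symbol, so it can still grow both leftward (towards DET) and rightward (towards N).</Paragraph>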
    <Paragraph position="10"> Let us now say that an active edge E is locally rightward largest iff there is no other active edge E' with from(E')=from(E), rule(E')=rule(E), fromposition(E')=fromposition(E) and sub-inactives(E') including as an initial substring sub-inactives(E).</Paragraph>
    <Paragraph position="11"> Analogously we can define a locally leftward largest edge. The time-ordered structure of vertices is Vt0, Vt1, ..., Vti, ..., Vtn, with ti &lt; ti+1 for i = 0, ..., n-1, where at least one lexical edge arrives at or leaves every vertex. It does not matter if the final analysis does not &quot;make use&quot; of all the vertices in the chart. We then define four different rules for introducing a new edge in the chart. The first rule says, roughly, that if you are trying to build the same thing from the left and from the right you should unify your efforts.</Paragraph>
  </Section>
  <Section position="6" start_page="636" end_page="636" type="metho">
    <SectionTitle>
A-A Rule:
</SectionTitle>
    <Paragraph position="0"> If we have two active edges A1 and A2,</Paragraph>
    <Paragraph position="2"> and A1 is locally leftward largest and A2 is locally rightward largest, then we can introduce a new active edge A3 into the chart with</Paragraph>
    <Paragraph position="4"> concat(sub-inactives(A1), sub-inactives(A2)), where concat is the usual string concatenation operator.</Paragraph>
    <Paragraph position="5"> If fromposition(A1)=0 and toposition(A2)=n, where n is the number of symbols in the right hand side of rule(A1), an inactive edge I is introduced instead, with from(I)=from(A1), to(I)=to(A2) and cat(I) equal to the left hand side of rule(A1). We also maintain the usual edge combination rule, with the extension to the two directions.</Paragraph>
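    <Paragraph>The A-A Rule can be sketched as follows. The joining conditions and most attributes of the new edge are not present in this text, so the sketch assumes them: the two edges share the same rule, to(A1)=from(A2) and toposition(A1)=fromposition(A2), and the new active edge takes its left vertex and fromposition from A1 and its right vertex and toposition from A2. All class and function names are hypothetical, and sub-inactives are represented by plain strings.

```python
from typing import List, NamedTuple, Tuple


class Active(NamedTuple):
    from_: int
    to: int
    rule: Tuple[str, Tuple[str, ...]]   # (left hand side, right hand side)
    fromposition: int
    toposition: int
    subs: Tuple[str, ...]               # stand-ins for the sub-inactives list


class Inactive(NamedTuple):
    from_: int
    to: int
    cat: str


def rightward_largest(e: Active, chart: List[Active]) -> bool:
    """No other edge starts like e and has e's sub-inactives as a prefix."""
    return not any(o != e and o.from_ == e.from_ and o.rule == e.rule
                   and o.fromposition == e.fromposition
                   and o.subs[:len(e.subs)] == e.subs for o in chart)


def leftward_largest(e: Active, chart: List[Active]) -> bool:
    """Mirror image: no other edge ends like e and has e's subs as a suffix."""
    return not any(o != e and o.to == e.to and o.rule == e.rule
                   and o.toposition == e.toposition
                   and o.subs[len(o.subs) - len(e.subs):] == e.subs
                   for o in chart)


def aa_rule(a1: Active, a2: Active, chart: List[Active]):
    """Join left edge a1 and right edge a2 of the same rule, or return None."""
    meets = (a1.rule == a2.rule and a1.to == a2.from_     # assumed conditions
             and a1.toposition == a2.fromposition)
    if not (meets and leftward_largest(a1, chart)
            and rightward_largest(a2, chart)):
        return None
    lhs, rhs = a1.rule
    if a1.fromposition == 0 and a2.toposition == len(rhs):
        return Inactive(a1.from_, a2.to, lhs)             # whole rule covered
    return Active(a1.from_, a2.to, a1.rule, a1.fromposition,
                  a2.toposition, a1.subs + a2.subs)       # concat of subs
```

The two largest-edge predicates follow the definitions given earlier; requiring them keeps the rule from joining an edge that has already been extended further in the relevant direction.</Paragraph>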
  </Section>
  <Section position="7" start_page="636" end_page="638" type="metho">
    <SectionTitle>
A-I Rule:
</SectionTitle>
    <Paragraph position="0"> Given an active edge A and an inactive edge I with from(I)=to(A), and, having named i toposition(A), with i ≠ n (the number of symbols in the right hand side of the rule), cat(I) = Ci+1, the (i+1)-th symbol of the right hand side of rule(A), then a new edge E can be added to the chart, with from(E)=from(A), to(E)=to(I), and, if i+1=n, that is, Ci+1 was the last symbol in rule(A), and fromposition(A)=0, E will be an inactive edge with cat(E) equal to the left hand side of rule(A); if not, it will be an active edge with rule(E)=rule(A), fromposition(E)=fromposition(A), toposition(E)=i+1.</Paragraph>
    <Paragraph position="2"> Similarly, if to(I)=from(A), and having named i fromposition(A), i ≠ 0, cat(I) = Ci-1, the (i-1)-th symbol of the right hand side of rule(A), then a new edge E can be added to the chart, with from(E)=from(I), to(E)=to(A), and, if i-1=0 and toposition(A) is equal to the length of the right hand side of rule(A), E will be an inactive edge with cat(E) equal to the left hand side of rule(A); if not, it will be an active edge with rule(E)=rule(A), fromposition(E)=i-1, toposition(E)=toposition(A).</Paragraph>
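    <Paragraph>The two directions of the A-I Rule can be sketched as below. This is an illustration under assumptions, not the authors' code: names are hypothetical, vertices are integers, sub-inactives are left out for brevity, and rule positions are taken to count the symbols lying to their left, so that the symbol just right of position i is rhs[i] and the symbol just left of it is rhs[i - 1] in zero-based Python indexing.

```python
from typing import NamedTuple, Tuple


class Active(NamedTuple):
    from_: int
    to: int
    rule: Tuple[str, Tuple[str, ...]]   # (left hand side, right hand side)
    fromposition: int
    toposition: int


class Inactive(NamedTuple):
    from_: int
    to: int
    cat: str


def ai_rule_right(a: Active, i_edge: Inactive):
    """Extend a rightward with an adjacent inactive edge, or return None."""
    lhs, rhs = a.rule
    i = a.toposition
    if i_edge.from_ != a.to or i == len(rhs) or i_edge.cat != rhs[i]:
        return None
    if i + 1 == len(rhs) and a.fromposition == 0:
        return Inactive(a.from_, i_edge.to, lhs)      # rule fully covered
    return Active(a.from_, i_edge.to, a.rule, a.fromposition, i + 1)


def ai_rule_left(a: Active, i_edge: Inactive):
    """Extend a leftward with an adjacent inactive edge, or return None."""
    lhs, rhs = a.rule
    i = a.fromposition
    if i_edge.to != a.from_ or i == 0 or i_edge.cat != rhs[i - 1]:
        return None
    if i - 1 == 0 and a.toposition == len(rhs):
        return Inactive(i_edge.from_, a.to, lhs)      # rule fully covered
    return Active(i_edge.from_, a.to, a.rule, i - 1, a.toposition)
```

With the rule PP -> PREP NP, an active edge that covers only the NP (fromposition 1, toposition 2) and meets a PREP inactive edge on its left yields an inactive PP edge, which is the leftward growth that a traditional left-to-right chart cannot express.</Paragraph>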
    <Paragraph position="3"> Let us now recall our classification of word hypotheses into three classes, say a, b, c, in relation to their scores. As stated earlier, we consider word hypotheses of class a the islands for our process. The algorithm will proceed outward from the islands and bottom-up when a constituent including an island (however far inside the structure) is completed. Let us say that an edge has another feature, called withisland, a boolean that is originally true for lexical edges of class a and false for the others, and during the process is propagated to any new edge that &quot;includes&quot; an edge with withisland=true. We can now state the I/bu Rule: When an inactive edge I, with withisland(I)=true, is introduced in the chart, a new active edge is introduced for every rule R in the grammar that includes on its right hand side the symbol cat(I) and in relation to R for every position i such that cat(I) is the (i+1)-th symbol on the right hand side of R. Let us denote such a generic active edge as A; its characteristics will be from(A)=from(I),</Paragraph>
    <Paragraph position="5"> We have also the usual top-down rule, revisited consistently with our approach: A/td Rule: When an active edge A is added to the chart, if from the vertex to(A) only edges with withisland=false leave rightward, then introduce a cycling active edge on to(A) for every rule that has on the left hand side the symbol that comes after the position toposition(A) for rule rule(A), unless there is already an active edge with that rule or an inactive edge with that category. Do likewise on the other vertex.</Paragraph>
    <Paragraph position="6"> The meaning of the presence of both the I/bu and the A/td rules is that the process will be a bottom-up one, starting from the islands. When a point is met where only class b words are found, hypotheses of the presence of certain constituents, according to the &quot;island&quot; constraints, are introduced in the form of cycling active edges. This top-down operation will ensure that the parser is led by the most consolidated fragments.</Paragraph>
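    <Paragraph>The I/bu Rule stated earlier can be sketched as follows, using the seven rules of the example grammar. Apart from from(A)=from(I), the attributes of the new edge are not present in this text, so the sketch assumes to(A)=to(I), fromposition(A)=i, toposition(A)=i+1 and withisland(A)=true. Names are hypothetical, and the lexical category N used in the usage note is also an assumption.

```python
from typing import List, NamedTuple, Tuple

Rule = Tuple[str, Tuple[str, ...]]      # (left hand side, right hand side)

GRAMMAR: List[Rule] = [
    ("S", ("NP", "V", "NP", "PP")),
    ("S", ("NP", "VP")),
    ("NP", ("ProperN",)),
    ("NP", ("DET", "N")),
    ("NP", ("DET", "ADJ", "N")),
    ("PP", ("PREP", "NP")),
    ("VP", ("V", "NP")),
]


class Inactive(NamedTuple):
    from_: int
    to: int
    cat: str
    withisland: bool


class Active(NamedTuple):
    from_: int
    to: int
    rule: Rule
    fromposition: int
    toposition: int
    withisland: bool


def ibu_rule(i_edge: Inactive, grammar: List[Rule]) -> List[Active]:
    """Propose one active edge per occurrence of cat(I) in a right hand side."""
    if not i_edge.withisland:
        return []                        # only island edges trigger the rule
    proposals = []
    for rule in grammar:
        for pos, symbol in enumerate(rule[1]):
            if symbol == i_edge.cat:
                # Assumed attributes: the edge spans I and rule positions
                # pos..pos+1, and inherits withisland=true.
                proposals.append(Active(i_edge.from_, i_edge.to, rule,
                                        pos, pos + 1, True))
    return proposals
```

An island of category N produces two proposals, one for NP -> DET N and one for NP -> DET ADJ N, both of which then need to grow leftward. The sketch follows the wording of the rule literally and always returns active edges, even for a one-symbol rule such as NP -> ProperN.</Paragraph>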
    <Paragraph position="7"> Every time we introduce a new active edge A we must perform a redundancy check to ensure that we do not build, not only now but also in the foreseeable future, anything that has already been built.</Paragraph>
    <Paragraph position="8"> r/Check: A new active edge A can be inserted in the chart unless from the vertex from(A) there is an active edge A' leaving rightward with rule(A')=rule(A), fromposition(A')=fromposition(A) and sub-inactives(A') including as an initial substring sub-inactives(A).</Paragraph>
    <Paragraph position="9"> Similarly, A can be inserted in the chart unless from the vertex to(A) there is an active edge A' leaving leftward with rule(A')=rule(A), toposition(A')=toposition(A) and sub-inactives(A') including as a final substring sub-inactives(A).</Paragraph>
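    <Paragraph>A possible rendering of the redundancy check is sketched below. The names r_check and insert_active are hypothetical, rules are identified by a plain string, and sub-inactives are represented by strings rather than by edge objects; these are simplifying assumptions.

```python
from typing import List, NamedTuple, Tuple


class Active(NamedTuple):
    from_: int
    to: int
    rule: str                   # rule identifier (assumed representation)
    fromposition: int
    toposition: int
    subs: Tuple[str, ...]       # stand-ins for the sub-inactives list


def r_check(new: Active, chart: List[Active]) -> bool:
    """Return True when `new` passes the check and may be inserted."""
    k = len(new.subs)
    for old in chart:
        if old.rule != new.rule:
            continue
        # An existing edge leaving from(new) rightward already contains new.
        covers_from_left = (old.from_ == new.from_
                            and old.fromposition == new.fromposition
                            and old.subs[:k] == new.subs)
        # An existing edge leaving to(new) leftward already contains new.
        covers_from_right = (old.to == new.to
                             and old.toposition == new.toposition
                             and old.subs[len(old.subs) - k:] == new.subs)
        if covers_from_left or covers_from_right:
            return False
    return True


def insert_active(new: Active, chart: List[Active]) -> bool:
    """Insert `new` into the chart only if it is not redundant."""
    if r_check(new, chart):
        chart.append(new)
        return True
    return False
```

If the chart already holds an edge for a rule that spans DET and ADJ from vertex 0, a new edge for the same rule spanning only DET from vertex 0 is rejected, because everything it could lead to has already been built.</Paragraph>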
    <Paragraph position="10"> It is convenient that the above rules be applied in the given order so as to minimize the effort.</Paragraph>
    <Paragraph position="11"> As regards the question of control, it seems reasonable that all edge building tasks originated by an island should be carried out first, and the actions resulting from predictions over class b hypotheses be carried out later, in order to avoid an explosion of fuzzy edges in the chart. Still, it is clear that, because of the nature of the algorithm, after the introduction of an edge of the second type, an edge building action originated by an island can take place again.</Paragraph>
    <Paragraph position="12"> With this in mind we introduce two agendas: a-agenda, where tasks of building edges with withisland=true are added, and b-agenda, where the other tasks are added.</Paragraph>
    <Paragraph position="13"> Task execution is constrained only by the discipline that a task in b-agenda can be executed only if a-agenda is empty. At the beginning of the process a-agenda is filled with all the tasks originated by the class a word hypotheses.</Paragraph>
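    <Paragraph>The two-agenda discipline can be sketched as below. The class name TwoAgendas and its methods are hypothetical, and the first-in first-out order inside each agenda is an assumption: the text constrains only the priority of a-agenda over b-agenda.

```python
from collections import deque


class TwoAgendas:
    """Tasks for island edges always run before tasks for the other edges."""

    def __init__(self):
        self.a_agenda = deque()     # tasks building edges with withisland=true
        self.b_agenda = deque()     # all other tasks

    def add(self, task, withisland):
        target = self.a_agenda if withisland else self.b_agenda
        target.append(task)

    def next_task(self):
        """Return the next task to execute, or None when both are empty."""
        if self.a_agenda:
            return self.a_agenda.popleft()
        if self.b_agenda:           # reached only when a-agenda is empty
            return self.b_agenda.popleft()
        return None
```

Because the choice is made anew at every call, an island task added after some b-agenda tasks have run is still served before the remaining b-agenda tasks, which matches the remark that island-originated actions can take place again later in the process.</Paragraph>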
    <Paragraph position="14"> class a words: MILAN, BOSS
class b words: THE, WANTS, AN, IMMEDIATE, CALL, TO
rules:
1) S -&gt; NP V NP PP
2) S -&gt; NP VP
3) NP -&gt; ProperN
4) NP -&gt; DET N
5) NP -&gt; DET ADJ N
6) PP -&gt; PREP NP
7) VP -&gt; V NP
We shall insert inactive edges in the lower side of the sentence and active edges in the upper side of the sentence. The edge being processed is drawn with a dotted line, the possible other edge considered in the rule that is currently applied is drawn with a dashed line, and the resulting edge is drawn with a bold line.</Paragraph>
    <Paragraph position="15"> The process starts bottom-up from the islands (class a words) MILAN and BOSS, introducing active, inactive and cycling edges into the chart, following the composition rules introduced before.</Paragraph>
  </Section>
  <Section position="8" start_page="638" end_page="638" type="metho">
    <SectionTitle>
4. An example
</SectionTitle>
    <Paragraph position="0"> We shall present here an example of parsing with the concepts introduced in this paper. The sentence is: THE</Paragraph>
  </Section>
</Paper>