<?xml version="1.0" standalone="yes"?>
<Paper uid="P96-1024">
  <Title>Parsing Algorithms and Metrics</Title>
  <Section position="3" start_page="0" end_page="178" type="intro">
    <SectionTitle>
2 Evaluation Metrics
</SectionTitle>
    <Paragraph position="0"> In this section, we first define basic terms and symbols. Next, we define the different metrics used in evaluation. Finally, we discuss the relationship of these metrics to parsing algorithms.</Paragraph>
    <Section position="1" start_page="0" end_page="177" type="sub_section">
      <SectionTitle>
2.1 Basic Definitions
</SectionTitle>
      <Paragraph position="0"> Let Wa denote word a of the sentence under consideration. Let w b denote WaW~+l...Wb-lWb; in particular let w~ denote the entire sequence of terminals (words) in the sentence under consideration.</Paragraph>
      <Paragraph position="1"> In this paper we assume all guessed parse trees are binary branching. Let a parse tree T be defined as a set of triples (s, t, X)--where s denotes the position of the first symbol in a constituent, t denotes the position of the last symbol, and X represents a terminal or nonterminal symbol--meeting the following three requirements:  * The sentence was generated by the start symbol, S. Formally, (1, n, S) E T.</Paragraph>
      <Paragraph position="2"> * Every word in the sentence is in the parse tree. Formally, for every s between 1 and n the triple (s,s, ws) E T.</Paragraph>
      <Paragraph position="3"> * The tree is binary branching and consistent.</Paragraph>
      <Paragraph position="4"> Formally, for every (s,t, X) in T, s C/ t, there is exactly one r, Y, and Z such that s &lt; r &lt; t and (s,r,Y) E T and (r+ 1,t,Z) e T.</Paragraph>
      <Paragraph position="5"> Let Tc denote the &amp;quot;correct&amp;quot; parse (the one in the treebank) and let Ta denote the &amp;quot;guessed&amp;quot; parse (the one output by the parsing algorithm). Let Na denote \[Tal, the number of nonterminals in the guessed parse tree, and let Nc denote \[Tel, the number of nonterminals in the correct parse tree.</Paragraph>
    </Section>
    <Section position="2" start_page="177" end_page="177" type="sub_section">
      <SectionTitle>
2.2 Evaluation Metrics
</SectionTitle>
      <Paragraph position="0"> There are various levels of strictness for determining whether a constituent (element of Ta) is &amp;quot;correct.&amp;quot; The strictest of these is Labelled Match. A constituent (s,t, X) E Te is correct according to Labelled Match if and only if (s, t, X) E To. In other words, a constituent in the guessed parse tree is correct if and only if it occurs in the correct parse tree.</Paragraph>
      <Paragraph position="1"> The next level of strictness is Bracketed Match.</Paragraph>
      <Paragraph position="2"> Bracketed match is like labelled match, except that the nonterminal label is ignored. Formally, a constituent (s, t, X) ETa is correct according to Bracketed Match if and only if there exists a Y such that (s,t,Y) E To.</Paragraph>
      <Paragraph position="3"> The least strict level is Consistent Brackets (also called Crossing Brackets). Consistent Brackets is like Bracketed Match in that the label is ignored.</Paragraph>
      <Paragraph position="4"> It is even less strict in that the observed (s,t,X) need not be in Tc--it must simply not be ruled out by any (q, r, Y) e To. A particular triple (q, r, Y) rules out (s,t, X) if there is no way that (s,t,X) and (q, r, Y) could both be in the same parse tree.</Paragraph>
      <Paragraph position="5"> In particular, if the interval (s, t) crosses the interval (q, r), then (s, t, X) is ruled out and counted as an error. Formally, we say that (s, t) crosses (q, r) if and only ifs&lt;q&lt;t &lt;rorq&lt;s&lt;r&lt;t.</Paragraph>
      <Paragraph position="6"> If Tc is binary branching, then Consistent Brackets and Bracketed Match are identical. The following symbols denote the number of constituents that match according to each of these criteria.</Paragraph>
      <Paragraph position="8"> crossing (s,t)}\[ : the number of constituents in TG correct according to Consistent Brackets.</Paragraph>
      <Paragraph position="9"> Following are the definitions of the six metrics used in this paper for evaluating binary branching trees: The in the following table:  (1) Labelled Recall Rate = L/Nc. (2) Labelled Tree Rate = 1 if L = ATe. It is also called the Viterbi Criterion.</Paragraph>
      <Paragraph position="10"> (3) Bracketed Recall Rate = B/Nc.</Paragraph>
      <Paragraph position="11"> (4) Bracketed Tree Rate = 1 if B = Nc. (5) Consistent Brackets Recall Rate = C/NG. It is often called the Crossing Brackets Rate. In the case where the parses are binary branching, this criterion is the same as the Bracketed Recall Rate.</Paragraph>
      <Paragraph position="12"> (6) Consistent Brackets Tree Rate = 1 if C = No.  This metric is closely related to the Bracketed Tree Rate. In the case where the parses are binary branching, the two metrics are the same. This criterion is also called the Zero Crossing Brackets Rate.</Paragraph>
      <Paragraph position="13"> preceding six metrics each correspond to cells</Paragraph>
    </Section>
    <Section position="3" start_page="177" end_page="178" type="sub_section">
      <SectionTitle>
2.3 Maximizing Metrics
</SectionTitle>
      <Paragraph position="0"> Despite this long list of possible metrics, there is only one metric most parsing algorithms attempt to maximize, namely the Labelled Tree Rate. That is, most parsing algorithms assume that the test corpus was generated by the model, and then attempt to evaluate the following expression, where E denotes the expected value operator:</Paragraph>
      <Paragraph position="2"> This is true of the Labelled Tree Algorithm and stochastic versions of Earley's Algorithm (Stolcke, 1993), and variations such as those used in Picky parsing (Magerman and Weir, 1992). Even in probabilistic models not closely related to PCFGs, such as Spatter parsing (Magerman, 1994), expression (1) is still computed. One notable exception is Brill's Transformation-Based Error Driven system (Brill, 1993), which induces a set of transformations designed to maximize the Consistent Brackets Recall Rate. However, Brill's system is not probabilistic.</Paragraph>
      <Paragraph position="3"> Intuitively, if one were to match the parsing algorithm to the evaluation criterion, better performance should be achieved.</Paragraph>
      <Paragraph position="4"> Ideally, one might try to directly maximize the most commonly used evaluation criteria, such as Consistent Brackets Recall (Crossing Brackets)  Rate. Unfortunately, this criterion is relatively difficult to maximize, since it is time-consuming to compute the probability that a particular constituent crosses some constituent in the correct parse. On the other hand, the Bracketed Recall and Bracketed Tree Rates are easier to handle, since computing the probability that a bracket matches one in the correct parse is inexpensive. It is plausible that algorithms which optimize these closely related criteria will do well on the analogous Consistent Brackets criteria.</Paragraph>
    </Section>
    <Section position="4" start_page="178" end_page="178" type="sub_section">
      <SectionTitle>
2.4 Which Metrics to Use
</SectionTitle>
      <Paragraph position="0"> When building an actual system, one should use the metric most appropriate for the problem. For instance, if one were creating a database query system, such as an ATIS system, then the Labelled Tree (Viterbi) metric would be most appropriate. A single error in the syntactic representation of a query will likely result in an error in the semantic representation, and therefore in an incorrect database query, leading to an incorrect result. For instance, if the user request &amp;quot;Find me all flights on Tuesday&amp;quot; is misparsed with the prepositional phrase attached to the verb, then the system might wait until Tuesday before responding: a single error leads to completely incorrect behavior. Thus, the Labelled Tree criterion is appropriate.</Paragraph>
      <Paragraph position="1"> On the other hand, consider a machine assisted translation system, in which the system provides translations, and then a fluent human manually edits them. Imagine that the system is given the foreign language equivalent of &amp;quot;His credentials are nothing which should be laughed at,&amp;quot; and makes the single mistake of attaching the relative clause at the sentential level, translating the sentence as &amp;quot;His credentials are nothing, which should make you laugh.&amp;quot; While the human translator must make some changes, he certainly needs to do less editing than he would if the sentence were completely misparsed. The more errors there are, the more editing the human translator needs to do. Thus, a criterion such as the Labelled Recall criterion is appropriate for this task, where the number of incorrect constituents correlates to application performance.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML