<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1017">
  <Title>Probabilistic Parsing and Psychological Plausibility</Title>
  <Section position="5" start_page="113" end_page="115" type="evalu">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="113" end_page="113" type="sub_section">
      <SectionTitle>
5.1 Data
</SectionTitle>
      <Paragraph position="0"> We use sections 2-21 of the Wall Street Journal part of the Penn Treebank (Marcus et al., 1993) to generate a treebank grammar. Traces, functional tags and other tag extensions that do not mark syntactic category are removed before training (footnote 3). No other modifications are made. For testing, we use the 1578 sentences of length 40 or less of section 22. The input to the parser is the sequence of part-of-speech tags.</Paragraph>
    </Section>
    <Section position="2" start_page="113" end_page="113" type="sub_section">
      <SectionTitle>
5.2 Evaluation
</SectionTitle>
      <Paragraph position="0"> For evaluation, we use the PARSEVAL measures and report labeled F-score (the harmonic mean of labeled recall and labeled precision). Reporting the F-score makes our results comparable to those of previous experiments using the same data sets. As a measure of the amount of work done by the parser, we report the size of the chart. The number of active and inactive edges that enter the chart is given for the exhaustive search, not counting those hypothetical edges that are replaced or rejected because there is an alternative edge with higher probability (footnote 4). For pruned search, we give the percentage of edges required.</Paragraph>
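      <Paragraph position="1"> As an illustration of the measure named above, the following minimal sketch computes a labeled F-score from two bracket lists. The bracket representation as (label, start, end) tuples, the function name labeled_f_score, and the toy data are assumptions made for this example; this is not the evaluation code used in the experiments.

```python
from collections import Counter


def labeled_f_score(gold, predicted):
    """Harmonic mean of labeled precision and labeled recall.

    Both arguments are lists of (label, start, end) brackets. This
    representation is an assumption made for the sketch.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # A predicted bracket counts as correct only if label and span both
    # agree with a gold bracket; duplicates are matched at most as often
    # as they occur in the gold list.
    matched = sum(min(count, gold_counts[bracket])
                  for bracket, count in pred_counts.items())
    if matched == 0:
        return 0.0
    precision = matched / sum(pred_counts.values())
    recall = matched / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy example: three of four brackets agree.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
predicted = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
print(labeled_f_score(gold, predicted))  # 0.75
```

In the toy example precision and recall are both 3/4, so their harmonic mean is also 3/4.</Paragraph>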
    </Section>
    <Section position="3" start_page="113" end_page="114" type="sub_section">
      <SectionTitle>
5.3 Fixed Beam
</SectionTitle>
      <Paragraph position="0"> For our experiments, we define the beam by a maximum number of edges per span. Beams for active and inactive edges are set separately. The beams run from 2 to 12, and we test all 121 combinations of these beams for active and inactive edges. Each setting results in a particular average chart size and an F-score, which are reported in the following section.</Paragraph>
      <Paragraph position="1"> Footnote 2: Here, we use proper prefixes, i.e., all prefixes not including the last element. Footnote 3: As an example, PP-TMP=3 is replaced by PP.</Paragraph>
      <Paragraph position="2"> [Caption fragment] ... percentage of edges relative to exhaustive search and the F-score achieved with this chart size. Exhaustive search yielded 71.21% for the original encoding and 79.28% for the parent encoding. Results in the grey areas are equivalent with a confidence degree of α = 0.99.</Paragraph>
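      <Paragraph position="3"> The fixed-beam setup described in this subsection can be sketched roughly as follows. The edge representation (a dict with start, end, active, label and prob keys) and the function name prune_fixed_beam are hypothetical, and ranking edges by a single probability value is an assumption made for the sketch; the ranking formula actually used by the parser is not given in this section.

```python
from collections import defaultdict


def prune_fixed_beam(edges, active_beam, inactive_beam):
    """Keep at most a fixed number of edges per span.

    Active and inactive edges are pruned separately, each with its own
    beam. Edges are ranked by their "prob" value (an assumption).
    """
    groups = defaultdict(list)
    for edge in edges:
        groups[(edge["start"], edge["end"], edge["active"])].append(edge)
    kept = []
    for (_start, _end, active), group in groups.items():
        beam = active_beam if active else inactive_beam
        group.sort(key=lambda e: e["prob"], reverse=True)
        kept.extend(group[:beam])
    return kept


# Beams from 2 to 12 for both edge types give 11 * 11 settings.
settings = [(a, i) for a in range(2, 13) for i in range(2, 13)]
print(len(settings))  # 121

# Hypothetical edges that all cover the span (0, 2).
edges = [
    {"start": 0, "end": 2, "active": False, "label": "NP", "prob": 0.5},
    {"start": 0, "end": 2, "active": False, "label": "VP", "prob": 0.1},
    {"start": 0, "end": 2, "active": False, "label": "S", "prob": 0.3},
    {"start": 0, "end": 2, "active": True, "label": "S", "prob": 0.2},
]
kept = prune_fixed_beam(edges, active_beam=2, inactive_beam=2)
print(sorted(e["label"] for e in kept))  # ['NP', 'S', 'S']
```

With an inactive beam of 2, the lowest-ranked inactive edge for the span is dropped, while the single active edge is kept.</Paragraph>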
    </Section>
    <Section position="4" start_page="114" end_page="115" type="sub_section">
      <SectionTitle>
5.4 Experimental Results
</SectionTitle>
      <Paragraph position="0"> The results of our 121 test runs with different settings for active and inactive beams are given in figure 1. The diagram shows chart sizes vs. labeled F-scores. It sorts chart sizes across different settings of the beams. If several beam settings result in equivalent chart sizes, the diagram contains the one yielding the highest F-score.</Paragraph>
      <Paragraph position="1"> The main finding is that we can reduce the size of the chart to between 1% and 3% of the size required for exhaustive search without affecting the results. Only very small beams degrade performance (footnote 5). The effect occurs for both models despite the simple ranking formula. This significantly reduces memory requirements (given as size of the chart) and increases parsing speed.</Paragraph>
      <Paragraph position="2"> Footnote 5: Given the amount of test data (26,322 non-terminal nodes), results within a range of around 0.7% are equivalent with a confidence degree of α = 99%.</Paragraph>
      <Paragraph position="3"> Exhaustive search yields an F-score of 71.21% when using the original Penn Treebank encoding. Only around 1% of the edges are required to yield equivalent results with incremental processing and pruning after each word is added to the chart. This result is, among other settings, obtained by a fixed beam of 2 for inactive edges and 3 for active edges (footnote 6). For the parent encoding, exhaustive search yields an F-score of 79.28%. Only between 2 and 3% of the edges are required to yield an equivalent result with incremental processing and pruning. As an example, the point at size = 3.0%, F-score = 79.1% is generated by the beam setting of 12 for inactive and 9 for active edges. The parent encoding yields around 8% higher F-scores, but it also imposes a higher absolute and relative memory load on the process.</Paragraph>
      <Paragraph position="4"> The higher degree of parallelism in the inactive chart stems from the parent hypothesis in each node. In terms of pure node categories, the average number of parallel nodes at this point is 3.5 (footnote 7).</Paragraph>
      <Paragraph position="5"> Footnote 6: Using variable beams, we would need 1.95% of the chart entries to achieve an equivalent F-score.</Paragraph>
      <Paragraph position="6"> Exhaustive search for the base encoding needs on average 140,000 edges per sentence, and for the parent encoding 200,000 edges. Equivalent results for the base encoding can be achieved with around 1% of these edges; equivalent results for the parent encoding need between 2 and 3%.</Paragraph>
      <Paragraph position="7"> The lower number of edges significantly increases parsing speed. Using exhaustive search for the base model, the parser processes 3.0 tokens per second (measured on a Pentium III 500; no serious optimization effort has gone into the parser). With a chart size of 1%, speed is 630 tokens/second. This is a factor of 210 without decreasing accuracy. Speed for the parent model is 0.5 tokens/second (exhaustive) and 111 tokens/second (3.0% chart size), yielding an improvement by a factor of 220.</Paragraph>
    </Section>
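    <Paragraph position="1"> The edge counts and speed-up factors reported for the base and parent encodings can be re-derived from the quoted figures. The short sketch below only repeats that arithmetic; the helper names speedup and edges_needed are hypothetical and chosen for illustration.

```python
def speedup(pruned_tokens_per_sec, exhaustive_tokens_per_sec):
    """Ratio of pruned to exhaustive parsing speed."""
    return pruned_tokens_per_sec / exhaustive_tokens_per_sec


def edges_needed(exhaustive_edges, percent):
    """Absolute number of edges for a chart size given in percent."""
    return exhaustive_edges * percent / 100


# Base encoding: 3.0 tokens/second exhaustive, 630 with a 1% chart.
print(speedup(630, 3.0))         # 210.0
# Parent encoding: 0.5 tokens/second exhaustive, 111 with a 3.0% chart.
print(speedup(111, 0.5))         # 222.0, reported as a factor of 220

# Average edges per sentence after pruning.
print(edges_needed(140000, 1))   # 1400.0 for the base encoding
print(edges_needed(200000, 2))   # 4000.0, lower bound for the parent encoding
print(edges_needed(200000, 3))   # 6000.0, upper bound for the parent encoding
```

The parent-encoding ratio comes out at 222, which is consistent with the reported factor of 220 if that figure is rounded.</Paragraph>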
  </Section>
</Paper>