<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1043">
  <Title>Reranking and Self-Training for Parser Adaptation</Title>
  <Section position="8" start_page="342" end_page="343" type="concl">
    <SectionTitle>
6 Conclusions and Future Work
</SectionTitle>
    <Paragraph position="0"> We have demonstrated that rerankers and self-trained models can work well across domains.</Paragraph>
    <Paragraph position="1"> Models self-trained on WSJ appear to be better parsing models in general, and their benefits are not limited to the WSJ domain. The WSJ-trained reranker using out-of-domain LA Times parses (produced by the WSJ-trained reranker) achieves a labeled precision-recall f-measure of 87.8% on Brown data, nearly equal to the performance one achieves by using a purely Brown-trained parser-reranker. The 87.8% f-score on Brown represents a 24% relative error reduction on that corpus.</Paragraph>
    <Paragraph position="2"> Of course, as corpora differences go, Brown is relatively close to WSJ. While we also find that our &quot;best&quot; WSJ-parser-reranker improves performance on the Switchboard corpus, it starts from a much lower base (74.0%), and achieves a much less significant improvement (3% absolute, 11% error reduction). Bridging these larger gaps is still for the future. [Table caption fragment: the difference in parentheses as estimated by a randomization test with 10^6 samples. &quot;x/y&quot; indicates that the first-stage parser was trained on data set x and the second-stage reranker was trained on data set y.]</Paragraph>
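    <!-- Editorial sketch, not part of the paper. The error-reduction figures quoted above
    can be checked with a little arithmetic, assuming "error" means 100 minus the f-score.
    The function names are hypothetical, and the Brown baseline is back-computed from the
    stated 87.8% and 24% figures rather than quoted from this section.

```python
def relative_error_reduction(baseline_f, improved_f):
    """Fraction of the baseline error (100 minus f) removed by the improved system."""
    baseline_error = 100.0 - baseline_f
    improved_error = 100.0 - improved_f
    return (baseline_error - improved_error) / baseline_error


def implied_baseline_f(improved_f, reduction):
    """Baseline f-score implied by an improved f-score and a relative error reduction."""
    improved_error = 100.0 - improved_f
    return 100.0 - improved_error / (1.0 - reduction)


# Switchboard: a base of 74.0% plus 3 points absolute (both figures from the text).
switchboard = relative_error_reduction(74.0, 74.0 + 3.0)

# Brown: only the final 87.8% and the 24% reduction are stated in this section,
# so the baseline here is an inferred value, not a quoted one.
brown_baseline = implied_baseline_f(87.8, 0.24)

print(f"Switchboard error reduction: {switchboard:.1%}")
print(f"Implied Brown baseline f-score: {brown_baseline:.1f}")
```

    Under these assumptions the Switchboard figure comes out between 11% and 12%, in line
    with the reported 11% once rounding of the 3-point gain is allowed for, and the Brown
    numbers imply a starting f-score of a little under 84%.
    -->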
    <Paragraph position="3"> One intriguing idea is what we call &quot;self-trained bridging-corpora.&quot; We have not yet experimented with medical text but we expect that the &quot;best&quot; WSJ+NANC parser will not perform very well.</Paragraph>
    <Paragraph position="4"> However, suppose one does self-training on a biology textbook instead of the LA Times. One might hope that such a text will split the difference between more &quot;normal&quot; newspaper articles and the specialized medical text. Thus, a self-trained parser based upon such text might do much better than our standard &quot;best.&quot; This is, of course, highly speculative.</Paragraph>
  </Section>
</Paper>