<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1043"> <Title>Reranking and Self-Training for Parser Adaptation</Title> <Section position="8" start_page="342" end_page="343" type="concl"> <SectionTitle> 6 Conclusions and Future Work </SectionTitle> <Paragraph position="0"> We have demonstrated that rerankers and self-trained models can work well across domains.</Paragraph> <Paragraph position="1"> Models self-trained on WSJ appear to be better parsing models in general, the benefits of which are not limited to the WSJ domain. The WSJ-trained reranker using out-of-domain LA Times parses (produced by the WSJ-trained reranker) achieves a labeled precision-recall f-measure of 87.8% on Brown data, nearly equal to the performance one achieves by using a purely Brown-trained parser-reranker. The 87.8% f-score on Brown represents a 24% error reduction on the corpus.</Paragraph> <Paragraph position="2"> Of course, as corpora differences go, Brown is relatively close to WSJ. While we also find that our &quot;best&quot; WSJ-parser-reranker improves performance on the Switchboard corpus, it starts from a much lower base (74.0%) and achieves a much smaller improvement (3% absolute, 11% error reduction). Bridging these larger gaps remains future work.</Paragraph> <Paragraph position="3"> One intriguing idea is what we call &quot;self-trained bridging-corpora.&quot; We have not yet experimented with medical text, but we expect that the &quot;best&quot; WSJ+NANC parser will not perform very well.</Paragraph> <Paragraph position="4"> However, suppose one does self-training on a biology textbook instead of the LA Times. 
One might hope that such a text will split the difference between more &quot;normal&quot; newspaper articles and the specialized medical text. Thus, a self-trained parser based upon such text might do much better than our standard &quot;best.&quot; This is, of course, highly speculative.</Paragraph> </Section> </Paper>