<?xml version="1.0" standalone="yes"?>
<Paper uid="N01-1023">
<Title>Applying Co-Training methods to Statistical Parsing</Title>
<Section position="9" start_page="2" end_page="2" type="concl">
<SectionTitle> 8 Conclusion </SectionTitle>
<Paragraph position="0"> In this paper, we proposed a new approach for training a statistical parser that combines labeled and unlabeled data. It uses a Co-Training method in which a pair of models attempt to increase their agreement on labeling the data. The algorithm takes as input a small corpus of 9695 sentences (234467 word tokens) of bracketed data, a large pool of unlabeled text, and a tag dictionary of lexicalized structures for each word in this training set (based on the LTAG formalism). The algorithm iteratively labels the unlabeled data set with parse trees. We then train a statistical parser on the combined set of labeled and unlabeled data.</Paragraph>
<Paragraph position="1"> We obtained 80.02% labeled bracketing precision and 79.64% recall. The baseline model, which was trained only on the 9695 sentences of labeled data, performed at 72.23% precision and 69.12% recall. These results show that training a statistical parser with our Co-Training method to combine labeled and unlabeled data strongly outperforms training on the labeled data alone.</Paragraph>
<Paragraph position="2"> It is important to note that, unlike previous studies, our method of moving towards unsupervised parsing can be directly compared to the output of supervised parsers.</Paragraph>
<Paragraph position="3"> Unlike previous approaches to unsupervised parsing, our method can be trained and tested on the kind of representations and the complexity of sentences found in the Penn Treebank.</Paragraph>
<Paragraph position="4"> In addition, as a byproduct of our representation, we obtain more than the phrase structure of each sentence.</Paragraph>
<Paragraph position="5"> We also produce a more embellished parse in which phenomena such as predicate-argument structure, subcategorization, and movement are given a probabilistic treatment.</Paragraph>
</Section>
</Paper>
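To make the training loop concrete, the following is a minimal, self-contained sketch of a generic co-training loop of the kind summarized above. The Parser class, its random confidence scores, the co_train function, and the confidence-based selection heuristic are all illustrative assumptions, not the paper's implementation; the paper's models are LTAG-based and select parses by increasing agreement between the two views rather than by per-model confidence.

# Sketch of a generic co-training loop (hypothetical stand-ins, not the
# paper's LTAG-based models). Only the control flow follows the text:
# two models label data for each other, then the parser is retrained on
# the combined hand-labeled and machine-labeled data.

import random
from typing import List, Tuple


class Parser:
    """Toy stand-in for one of the two models (e.g. a tagging model
    and a parsing model); a real model would estimate parameters."""

    def __init__(self, name: str):
        self.name = name
        self.training_data: List[Tuple[str, str]] = []

    def train(self, data: List[Tuple[str, str]]) -> None:
        # Placeholder for parameter re-estimation.
        self.training_data = list(data)

    def parse(self, sentence: str) -> Tuple[str, float]:
        # Return a (parse, confidence) pair; dummy bracketing, random score.
        return f"({self.name} {sentence})", random.random()


def co_train(labeled, unlabeled, model_a, model_b, rounds=5, grow=10):
    """Iteratively label the unlabeled pool with two models and retrain
    on the growing combined set of labeled and machine-labeled data."""
    for _ in range(rounds):
        model_a.train(labeled)
        model_b.train(labeled)

        # Each model parses the unlabeled pool and scores its own output.
        scored_a = [(s, *model_a.parse(s)) for s in unlabeled]
        scored_b = [(s, *model_b.parse(s)) for s in unlabeled]

        # Keep the parses each model is most confident about; the paper's
        # method instead selects parses on which the two models agree.
        best_a = sorted(scored_a, key=lambda x: -x[2])[:grow]
        best_b = sorted(scored_b, key=lambda x: -x[2])[:grow]

        newly_labeled = [(s, p) for s, p, _ in best_a + best_b]
        labeled = labeled + newly_labeled
        chosen = {s for s, _ in newly_labeled}
        unlabeled = [s for s in unlabeled if s not in chosen]

    # The final statistical parser is trained on the combined data.
    model_a.train(labeled)
    return model_a


if __name__ == "__main__":
    seed_labeled = [("the cat sat", "(S (NP the cat) (VP sat))")]
    pool = [f"sentence {i}" for i in range(100)]
    parser = co_train(seed_labeled, pool, Parser("A"), Parser("B"))
    print(len(parser.training_data), "training examples after co-training")

In the actual method, the growth step would add only the parses on which the two views agree, and the final parser is retrained on the union of the small bracketed seed corpus and the machine-labeled pool.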