Limitations of Co-Training for Natural Language Learning from Large Datasets
by David Pierce and Claire Cardie
