References

1   Akaike, H. 1974. A New Look at the Statistical Model Identification. IEEE Transactions on Automatic Control, AC-19, pp. 716--723.

2   Baker, L. D. and McCallum, A. K. 1998. Distributional Clustering of Words for Text Classification. Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 96--103, Melbourne, Australia.

3   Brown, P. F., deSouza, P. V., Mercer, R. L., Della Pietra, V. J. and Lai, J. C. 1992. Class-Based n-gram Models of Natural Language. Computational Linguistics, 18(4), pp. 467--479.

4   Church, K. W. and Hanks, P. 1990. Word Association Norms, Mutual Information, and Lexicography. Computational Linguistics, 16(1), pp. 22--29.

5   Dhillon, I. S. 2001. Co-clustering Documents and Words Using Bipartite Spectral Graph Partitioning. University of Texas at Austin, Austin, TX.

6   Hofmann, T. and Puzicha, J. 1998. Statistical Models for Co-occurrence Data. Massachusetts Institute of Technology, Cambridge, MA.

7   Joachims, T. 1998. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. Proceedings of the 10th European Conference on Machine Learning, pp. 137--142.

8   Li, H. and Abe, N. 1998. Word Clustering and Disambiguation Based on Co-occurrence Data. Proceedings of the 17th International Conference on Computational Linguistics, pp. 749--755, Montreal, Quebec, Canada.

9   McCallum, A. and Nigam, K. 1998. A Comparison of Event Models for Naive Bayes Text Classification. Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, pp. 41--48.

10   Mitchell, T. M. 1997. Machine Learning. McGraw-Hill Higher Education.

11   Nigam, K., McCallum, A. K., Thrun, S. and Mitchell, T. 2000. Text Classification from Labeled and Unlabeled Documents Using EM. Machine Learning, 39(2-3), pp. 103--134.

12   Rissanen, J. 1987. Stochastic Complexity. Journal of the Royal Statistical Society, Series B, 49(3), pp. 223--239.

13   Schmid, H. 1994. Probabilistic Part-of-Speech Tagging Using Decision Trees. Proceedings of the International Conference on New Methods in Language Processing, pp. 44--49, Manchester.

14   Slonim, N. and Tishby, N. 2000. Document Clustering Using Word Clusters via the Information Bottleneck Method. Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 208--215, Athens, Greece.

15   Slonim, N. and Tishby, N. 2001. The Power of Word Clusters for Text Classification. 23rd European Colloquium on Information Retrieval Research. 

16   Tishby, N., Pereira, F. and Bialek, W. 1999. The Information Bottleneck Method. Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, pp. 368--377.

17   Vapnik, V. N. 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York, NY.
