References

1   Michael R. Brent, An Efficient, Probabilistically Sound Algorithm for Segmentation andWord Discovery, Machine Learning, v.34 n.1-3, p.71-105, Feb. 1999 

2   Jing-Shin Chang and Keh-Yih Su. 1997. An unsupervised iterative method for Chinese new lexicon extraction. International Journal of Computational Linguistics & Chinese Language Processing, 1(1):101--157. 

3   Jyun-Sheng Chang, C.-D. Chen, and S.-D. Chen. 1991. Chinese word segmentation through constraint satisfaction and statistical optimization. In ROCLING-IV, pages 147--165, National Chiao-Thung University, Hsinchu, Taiwan. 

4   Jing-Shin Chang, Yi-Chung Lin, and Keh-Yih Su. 1995. Automatic construction of a Chinese electronic dictionary. In David Yarovsky and Kenneth Church, editors, WVLC-3, pages 107--120, Somerset, New Jersey, June. 

5   Keh-Jiann Chen , Shing-Huan Liu, Word identification for Mandarin Chinese sentences, Proceedings of the 14th conference on Computational linguistics, August 23-28, 1992, Nantes, France 

6   Tung-Hui Chiang, Ming-Yu Lin, and Keh-Yih Su. 1992. Statistical models for word segmentation and unknown word resolution. In ROCLING-V, pages 121--146, Taiwan. 

7   W. Daelemans, S. Buchholz, and J. Veenstra. 1999. Memory-based shallow parsing. In CoNLL-99, pages 53--60, Bergen, Norway. 

8   Walter Daelemans, Jakub Zavrel, Ko van der Sloot, and Antal van den Bosch. 2001. Timbl: Tilburg memory based learner, version 4.0, reference guide. Technical Report ILK Technical Report 01--04, Induction of Linguistic Knowledge, Tilburg University, The Netherlands. 

9   Carl G. De Marcken , Robert C. Berwick, Unsupervised language acquisition, 1996 

10   A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society, Series B, 34:1--38. 

11   Charng-Kang Fan and Wen-Hsiang Tsai. 1988. Automatic word identification in Chinese sentences by the relaxation technique.Computer Processing of Chinese and Oriental Languages, 4(1):33--56. 

12   Kok-Wee Gan , Kim-Teng Lua , Martha Palmer, A statistically emergent approach for language processing: application to modeling context effects in ambiguous Chinese word boundary perception, Computational Linguistics, v.22 n.4, p.531-553, December 1996 

13   Xianping Ge , Wanda Pratt , Padhraic Smyth, Discovering Chinese words from unsegmented text (poster abstract), Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, p.271-272, August 15-19, 1999, Berkeley, California, United States 

14   Gregory Grefenstette and P. Tapanainen. 1994. What is a word, what is a sentence? Problems of tokenization. In 3rd Conference on Computational Lexicography and Text Research, COMPLEX'94, Budapest, July 7--10. 

15   Gregory Grefenstette, Anne Schiller, and Salah At-Mokhtar. 2000. Recognizing lexical patterns in text. In F. van Eynde, D. Gibbon, and I. Schuurman, editors, Lexicon Development for Speech and Language Processing, pages 141--168. Kluwer, Dordrecht. 

16   Gregory Grefenstette. 1999. Tokenization. In Hans van Halteren, editor, Syntactic Wordclass Tagging, pages 117--133. Kluwer, Dordrecht. 

17   Yingchun Guan and Bei Qin. 1986. The design and implementation of a Chinese word statistical system. Journal of Chinese Information Processing, 1(1):26--32. (In Chinese). 

18   Jin Guo, Critical tokenization and its properties, Computational Linguistics, v.23 n.4, p.569-596, December 1997 

19   J. Hockenmaier and C. Brew. 1998. Error-driven learning of Chinese word segmentation. In PACLIC-12, pages 218--229, Singapore. Chinese and Oriental Languages Processing Society. 

20   Frederick Jelinek, Statistical methods for speech recognition, MIT Press, Cambridge, MA, 1998 

21   Wanying Jin. 1992. A case study: Chinese segmentation and its disambiguation. Technical Report MCCS-92-227, Computing Research Laboratory, New Mexico State University, Las Cruces. 

22   Wanying Jin, Chinese segmentation disambiguation, Proceedings of the 15th conference on Computational linguistics, August 05-09, 1994, Kyoto, Japan 

23   Chunyu Kit and Yorick Wilks. 1999. Unsupervised learning of word boundary with description length gain. In M. Osborne and E. T. K. Sang, editors, CoNLL-99, pages 1--6, Bergen, June. 

24   Chunyu Kit, Yuan Liu, and Nanyuan Liang. 1989. On methods of Chinese automatic word segmentation. Journal of Chinese Information Processing, 3(l):1--32. (In Chinese). 

25   Chunyu Kit. 2000. Unsupervised Lexical Learning as Inductive Inference. Ph.D. thesis, University of Sheffield, UK. 

26   Tom B. Y. Lai, Sun C. Lin, Chaofen Sun, and Maosong Sun. 1991. A maximal matching automatic Chinese word segmentation algorithm using corpus tagging for ambiguity resolution. In ROCLING-IV, pages 17--23. 

27   Nanyuan Liang and Yuan Liu. 1985. The OM method of automatic word segmentation. Chinese Information, 1(2). (In Chinese). 

28   Nanyuan Liang. 1984. Automatic word segmentation for written Chinese and the segmentation system CDWS. Journal of Beijing University of Aeronautics and Astronautics, ?(4). (In Chinese). 

29   Nanyuan Liang. 1986. CDWS - an automatic word segmentation system for written Chinese. Journal of Chinese Information Processing, 1(2):44--52. (In Chinese). 

30   Nanyuan Liang. 1989. Knowledge for Chinese word segmentation. Journal of Chinese Information Processing, 4(2):29--33. (In Chinese). 

31   Yuan Liu and Nanyuan Liang. 1986. Basic engineering for Chinese processing - Modern Chinese word frequency counting. Journal of Chinese Information Processing, 1(1):17--23. (In Chinese). 

32   Yuan Liu, Qiang Tan, and Xukun Shen. 1994. Contemporary Chinese Word Segmentation Standard Used for Information Processing, and Automatic Word Segmentation Methods. Tsinghua University Press, Bejing. (In Chinese). 

33   Hong I Ng and Kim Teng Lua. (forthcoming). A word finding automation for Chinese sentence tokenization. Submitted to ACM Transaction of Asian Languages Processing. 

34   David Palmer and J. Burger. 1997. Chinese word segmentation and information retrieval. In AAAI Spring Symposium on Cross-Language Text and Speech Retrieval. 

35   David D. Palmer, A trainable rule-based algorithm for word segmentation, Proceedings of the 35th annual meeting on Association for Computational Linguistics, p.321-328, July 07-12, 1997, Madrid, Spain 

36   David D. Palmer. 2000. Tokenization and sentence segmentation. In R. Dale, H. Moisl, and H. Somers, editors, Handbook of Natural Language Processing, pages 11--35. Marcel Dekker, New York. 

37   Fuchun Peng , Dale Schuurmans, Self-Supervised Chinese Word Segmentation, Proceedings of the 4th International Conference on Advances in Intelligent Data Analysis, p.238-247, September 13-15, 2001 

38   J. Ponte, USe: A Retargetable Word Segmentation Procedure for Information Retrieval, University of Massachusetts, Amherst, MA, 1996 

39   Richard Sproat , William Gale , Chilin Shih , Nancy Chang, A stochastic finite-state word-segmentation algorithm for Chinese, Computational Linguistics, v.22 n.3, p.377-404, September 1996 

40   Mark Stevenson , Yorick Wilks, The interaction of knowledge sources in word sense disambiguation, Computational Linguistics, v.27 n.3, p.321-349, September 2001 

41   Maosong Sun and Benjamin K. T'sou. 1995. Ambiguity resolution in Chinese word segmentation. In Benjamin K. T'sou and Tom B. Y. Lai, editors, PACLIC-10, Hong Kong, December 27--28. 

42   Maosong Sun and Zhengping Zhou. 1998. Word segmentation ambiguity in Chinese texts. In Benjiamin K. T'sou, Tom B. Y. Lai, Samuel W. K. Chan, and Williams S-Y. Wang, editors, Quantitative and Computational Studies on the Chinese Language, pages 323--338. Language Information Sciences Research Centre, City University of Hong Kong. 

43   W. J. Teahan , Rodger McNab , Yingying Wen , Ian H. Witten, A compression-based algorithm for Chinese word segmentation, Computational Linguistics, v.26 n.3, p.375-393, September 2000 

44   J. Veenstra, A. Van den Bosch, S. Buchholz, W. Daelemans, and J. Zavrel. 2000. Memory-based word sense disambiguation. Computing and the Humanities, special issue on SENSEVAL, 34(1--2y). 

45   Anand Venkataraman, A statistical model for word discovery in transcribed speech, Computational Linguistics, v.27 n.3, p.352-372, September 2001 

46   Jonathan J. Webster , Chunyu Kit, Tokenization as the initial phase in NLP, Proceedings of the 14th conference on Computational linguistics, August 23-28, 1992, Nantes, France 

47   Jonathan J. Webster and Chunyu Kit. 1992b. Tokenization for machine translation: What can be learned from Chinese word identification. In Proc. of 3rd International Conference on Chinese Information Processing, Beijing. 

48   Zimin Wu , Gwyneth Tseng, Chinese text segmentation for text retrieval: achievements and problems, Journal of the American Society for Information Science, v.44 n.9, p.532-542, Oct. 1993 

49   Shiwen Yu. 1998. Knowledge Base of Grammatical Information for Contemporary Chinese. Tsinghua University Press, Bejing. (In Chinese). 

50   Jakub Zavrel, Walter Daelemans, and Jorn Veenstra. 1998. Resolving PP attachment ambiguities with memory-based learning. In T. Mark Ellison, editor, CoNLL97: Computational Natural Language Learning, pages 136--144, Somerset, New Jersey. 

51   Guodong Zhou and Kim Teng Lua. (forthcoming). A hybrid approach toward ambiguity resolution in segmenting Chinese sentences. Submitted to Computer Processing of Oriental Languages.
