Detecting Short Passages of Similar Text in Large Document Collections
by Caroline Lyon, James Malcolm, and Bob Dickerson

References

P. Barton, N. Davey, R. Frank, and E. Tansley. 1995. Dynamic competitive learning applied to the clone detection problem. Int. Workshop on Application of Neural Networks to Telecommunications, Stockholm.
T. C. Bell, J. G. Cleary, I. H. Witten. 1990. Text compression. Prentice Hall.
C. M. Bishop. 1995. Neural Networks for Pattern Recognition. OUP.
A. Z. Broder. 1998. On the resemblance and containment of documents. Compression and Complexity of Sequences, IEEE Computer Society.
D. Gibbon, R. Moore, R. Winski. 1997. Handbook of Standards and Resources for Spoken Language Systems. Mouton de Gruyter.
A. Hamilton, J. Madison, and J. Jay. 1787-1788. The Federalist Papers.
V. Hatzivassiloglou, J. Klavans, E. Eskin. 1999. Detecting text similarity over short passages: Exploring linguistic feature combinations via machine learning. EMNLP.
A. K. Jain, R. P. W. Duin, J. Mao. 2000. Statistical pattern recognition: a review. IEEE Trans. on Pattern Analysis and Machine Intelligence.
K. Kukich. 1992. Techniques for automatically correcting words in text. ACM Computing Surveys, December.
J. Kupiec. 1992. Robust part-of-speech tagging using a hidden Markov model. Computer Speech and Language.
C. Lyon, R. Frank. 1997. Using single layer networks for discrete, sequential data: an example from NLP. Neural Computing Applications.
C. D. Manning and H. Schütze. 1999. Foundations of Statistical NLP. MIT.
F. Mosteller, D. L. Wallace. 1984. Applied Bayesian and classical inference: The case of the Federalist Papers. Springer-Verlag.
H. Ney, S. Martin, and F. Wessel. 1997. Statistical language modeling using leaving-one-out. In S. Young and G. Bloothooft, eds., Corpus based methods in Language and Speech Processing. Kluwer.
N. Heintze. 1996. Scalable Document Fingerprinting. Bell Laboratories.
T. Phelps and R. Wilensky. 2000. Robust Hyperlinks: Cheap, everywhere, now. Proc. of Digital Documents and Electronic Publishing.
C. E. Shannon. 1951. Prediction and Entropy of Printed English. Bell System Tech. Journal.
N. Shivakumar and H. Garcia-Molina. 1996. Building a scalable and accurate copy detection mechanism. Int. Conf. on Theory and Practice of Digital Libraries.
L. Wittgenstein. 1945. Philosophical Investigations. Blackwell.
