Proceedings of the 
Fourth Conference on 
Computational Natural Language Learning 
and of the 
Second Learning Language in Logic Workshop 
Held in cooperation with ICGI-2000 
13-14 September 2000 
Lisbon, Portugal 
Order additional copies from: 
Association for Computational Linguistics 
75 Paterson Street 
New Brunswick, NJ 08901 USA 
+1-732-342-9100 phone 
+1-732-342-9339 fax 
acl@aclweb.org
Preface 
The joint Second Learning Language in Logic (LLL-2000) Workshop and Fourth Conference on
Computational Natural Language Learning (CoNLL-2000) took place September 13-14, 2000, at the
Instituto Superior Técnico in Lisbon, Portugal, and were co-organized with the 5th International
Colloquium on Grammatical Inference (ICGI-2000).
This volume contains the papers presented during this joint event. More information is available online from
http://www.lri.fr/~cn/LLL-2000/ and http://lcg-www.uia.ac.be/conll2000/.
We would like to thank all the authors for submitting their papers and thus making these proceedings
possible. Special thanks go to the members of the program committees, whose great work
contributed to the high quality of these proceedings. We also extend our gratitude to the invited speakers
for presenting their views on innovative results in Natural Language Processing and Machine
Learning.
We are also grateful to the Local Chair Arlindo Oliveira, the members of the Organizing Committee, Ana 
Fred and Ana T. Freitas, and all other individuals who helped in the organization of this event. 
Finally, we would like to thank the sponsors of LLL-2000 and CoNLL-2000 for their generous financial 
and moral support: the Network of Excellence in Inductive Logic Programming (ILPNet2), the Network of 
Excellence in Machine Learning (MLNet3), the Computational Linguistics in Flanders research community 
(CLIF), and SIGNLL (ACL's SIG on Natural Language Learning). 
Claire Cardie 
Walter Daelemans 
Claire Nédellec
Erik Tjong Kim Sang
SPONSORS: 
CLIF (Computational Linguistics in Flanders) 
ILPNet2 (Network of Excellence in Inductive Logic Programming) 
MLNet3 (Network of Excellence in Machine Learning) 
SIGNLL (ACL's SIG for Natural Language Learning) 
INVITED SPEAKERS: 
Jörg-Uwe Kietz
Dan Roth 
ORGANIZERS: 
Claire Cardie (CoNLL) 
Walter Daelemans (CoNLL) 
Claire Nédellec (LLL)
Erik Tjong Kim Sang (CoNLL) 
LOCAL ARRANGEMENTS CHAIR: 
Arlindo Oliveira 
CoNLL PROGRAM COMMITTEE: 
Thorsten Brants (Universität des Saarlandes)
James Cussens (University of York)
Raymond Mooney (University of Texas at Austin)
John Nerbonne (University of Groningen)
Miles Osborne (University of Edinburgh)
David Powers (Flinders University)
Ronan Reilly (University College Dublin)
Antal van den Bosch (Tilburg University)
LLL PROGRAM COMMITTEE: 
Pieter Adriaans (Syllogic and University of Amsterdam, the Netherlands)
Roberto Basili (University of Roma, Italy)
Gilles Bisson (INRIA, Grenoble, France)
Henrik Boström (University of Stockholm, Sweden)
Gosse Bouma (University of Groningen, the Netherlands)
James Cussens (University of York, United Kingdom)
Tomaz Erjavec (Institute Jozef Stefan, Slovenia)
Daniel Kayser (LIPN, Université Paris-Nord, France)
Suresh Manandhar (University of York, United Kingdom)
Guenter Neumann (DFKI, Saarbrücken, Germany)
Steve Pulman (University of Cambridge, United Kingdom)
Christer Samuelsson (Xerox Research Center Europe, Grenoble, France)
Stefan Wrobel (University of Magdeburg, Germany)
FURTHER INFORMATION: 
CoNLL and SIGNLL 
Walter Daelemans 
CNTS Language Technology Group 
University of Antwerp (UIA) 
Universiteitsplein 1 (building A) 
B-2610 Antwerpen, Belgium 
e-mail: daelem@uia.ua.ac.be 
LLL 
Claire Nédellec
Laboratoire de Recherche en informatique (LRI) 
UMR 8623 CNRS 
Bat 490, Université Paris-Sud
F-91405 Orsay cedex, France 
e-mail: cn@lri.fr 
Table of Contents 
CoNLL-2000 Invited Paper 
Learning in Natural Language: Theory and Algorithmic Approaches 
Dan Roth ............................................................................... 1 
CoNLL-2000 Papers 
Corpus-Based Grammar Specialization 
Nicola Cancedda and Christer Samuelsson ............................................... 7 
Pronunciation by Analogy in Normal and Impaired Readers 
R.I. Damper and Y. Marchand ......................................................... 13 
The Role of Algorithm Bias vs Information Source in Learning Algorithms for Morphosyntactic 
Disambiguation 
Guy De Pauw and Walter Daelemans ................................................... 19 
Increasing our Ignorance of Language: Identifying Language Structure in an Unknown 'Signal' 
John Elliott, Eric Atwell and Bill Whyte ................................................ 25 
A Comparison between Supervised Learning Algorithms for Word Sense Disambiguation 
Gerard Escudero, Lluis Màrquez and German Rigau .................................... 31
Incorporating Position Information into a Maximum Entropy/Minimum Divergence 
Translation Model 
George Foster .......................................................................... 37 
Memory-Based Learning for Article Generation 
Guido Minnen, Francis Bond and Ann Copestake ....................................... 43 
Overfitting Avoidance for Stochastic Modeling of Attribute-Value Grammars
Tony Mullen and Miles Osborne ........................................................ 49 
Learning Distributed Linguistic Classes 
Stephan Raaijmakers ................................................................... 55 
Modeling the Effect of Cross-Language Ambiguity on Human Syntax Acquisition 
William Gregory Sakas ................................................................. 61 
Knowledge-Free Induction of Morphology Using Latent Semantic Analysis 
Patrick Schone and Daniel Jurafsky .................................................... 67 
Using Induced Rules as Complex Features in Memory-Based Language Learning 
Antal van den Bosch ................................................................... 73 
CoNLL-2000 Short Papers 
Using Perfect Sampling in Parameter Estimation of a Whole Sentence Maximum Entropy 
Language Model 
F. Amaya and J.M. Benedi ............................................................. 79 
Experiments on Unsupervised Learning for Extracting Relevant Fragments from Spoken Dialog 
Corpus 
Konstantin Biatov ...................................................................... 83 
Generating Synthetic Speech Prosody with Lazy Learning in Tree Structures 
Laurent Blin and Laurent Miclet ....................................................... 87 
Inducing Syntactic Categories by Context Distribution Clustering 
Alexander Clark ........................................................................ 91 
ALLiS: a Symbolic Learning System for Natural Language Learning 
Hervé Déjean ........................................................................... 95
Combining Text and Heuristics for Cost-Sensitive Spam Filtering 
Jose M. Gómez Hidalgo and Enrique Puertas Sanz ...................................... 99
Genetic Algorithms for Feature Relevance Assignment in Memory-Based Language Processing 
Anne Kool, Walter Daelemans and Jakub Zavrel ...................................... 103 
Shallow Parsing by Inferencing with Classifiers 
Vasin Punyakanok and Dan Roth ...................................................... 107 
Minimal Commitment and Full Lexical Disambiguation: Balancing Rules and Hidden Markov 
Models 
Patrick Ruch, Robert Baud, Pierrette Bouillon and Gilbert Robert .................... 111 
Learning IE Rules for a Set of Related Concepts 
J. Turmo and H. Rodríguez ........................................................... 115
A Default First Order Family Weight Determination Procedure for WPDV Models
Hans van Halteren .................................................................... 119 
A Comparison of PCFG Models 
Jose Luis Verdú-Mas, Jorge Calera-Rubio and Rafael C. Carrasco ...................... 123
CoNLL-2000 Shared Task Papers 
Introduction to the CoNLL-2000 Shared Task: Chunking 
Erik F. Tjong Kim Sang and Sabine Buchholz ......................................... 127 
Learning Syntactic Structures with XML 
Hervé Déjean .......................................................................... 133
A Context Sensitive Maximum Likelihood Approach to Chunking 
Christer Johansson .................................................................... 136 
Chunking with Maximum Entropy Models 
Rob Koeling .......................................................................... 139 
Use of Support Vector Learning for Chunk Identification 
Taku Kudoh and Yuji Matsumoto ..................................................... 142 
Shallow Parsing as Part-of-Speech Tagging 
Miles Osborne ......................................................................... 145 
Improving Chunking by Means of Lexical-Contextual Information in Statistical Language 
Models 
Ferran Pla, Antonio Molina and Natividad Prieto ...................................... 148 
Text Chunking by System Combination 
Erik F. Tjong Kim Sang ............................................................... 151 
Chunking with WPDV Models
Hans van Halteren .................................................................... 154 
Single-Classifier Memory-Based Phrase Chunking 
Jorn Veenstra and Antal van den Bosch ............................................... 157 
Phrase Parsing with Rule Sequence Processors: an Application to the Shared CoNLL Task 
Marc Vilain and David Day ........................................................... 160 
Hybrid Text Chunking 
GuoDong Zhou, Jian Su and TongGuan Tey ........................................... 163 
LLL-2000 Invited Paper 
Extracting a Domain-Specific Ontology from a Corporate Intranet 
Jörg-Uwe Kietz, Raphael Volz and Alexander Maedche ................................. 167
LLL-2000 Papers 
Learning from a Substructural Perspective 
Pieter Adriaans and Erik de Haas ..................................................... 176 
Incorporating Linguistics Constraints into Inductive Logic Programming 
James Cussens and Stephen Pulman ................................................... 184 
Learning from Parsed Sentences with INTHELEX 
F. Esposito, S. Ferilli, N. Fanizzi and G. Semeraro ..................................... 194 
Inductive Logic Programming for Corpus-Based Acquisition of Semantic Lexicons 
Pascale Sébillot, Pierrette Bouillon and Cécile Fabre ................................... 199
The Acquisition of Word Order by a Computational Learning System 
Aline Villavicencio .................................................................... 209 
Recognition and Tagging of Compound Verb Groups in Czech 
Eva Žáčková, Luboš Popelínský and Miloš Nepil ....................................... 219
Fourth Conference on 
Computational Natural Language Learning 
(CoNLL-2000) 
Preface 
CoNLL-2000 is the fourth in a series of meetings organized by SIGNLL, the ACL's SIG on Natural
Language Learning. Previous meetings were held in Madrid, Sydney, and Bergen, each co-located with
a different, but always computational linguistics-oriented, event. We are pleased that this time we could
combine efforts with the grammar induction and inductive logic programming for language processing
communities.
It is the explicit wish of the SIGNLL board to have the CoNLL meeting address all aspects of computational
natural language learning, including issues that are not regularly discussed at computational linguistics
meetings, such as computational models of human language acquisition, computational models of the origins
and evolution of language, biologically-inspired learning methods, etc.
We are thrilled by the quality and quantity of the submissions, which allowed us to set up an intense
but rewarding program with one invited talk, 12 long talks, and joint paper sessions with LLL-2000 and
ICGI-2000. In addition, we introduced two innovations: 12 bullet presentations (short talks
accompanied by a poster presentation), and a shared task session in which 11 authors report on how their
machine learning methods performed on our shared task: the identification of syntactic constituents in
text (chunking). In this part of the proceedings, you will find 37 papers providing a useful record of all
presentations.
You can find out more about SIGNLL and its activities at http://www.aclweb.org/signll/.
Claire Cardie 
Walter Daelemans 
Erik Tjong Kim Sang 
Ithaca and Antwerp, 2000 
Second Learning Language in Logic Workshop 
(LLL-2000) 
Preface 
LLL-2000 is the follow-up to the first LLL workshop, held in 1999 in Bled (Slovenia) and co-located with
the International Conference on Machine Learning and the International Conference on Logic Programming.
This year LLL was integrated with the Fourth Conference on Computational Natural Language Learning (CoNLL)
and the Fifth International Colloquium on Grammatical Inference (ICGI), with which LLL shares strong
scientific interests in language learning. Registration for ICGI, CoNLL and LLL was joint,
so that registrants could move freely between the three events.
As in the first edition, LLL attracted multidisciplinary submissions from three research fields:
Natural Language Processing (NLP), Machine Learning and Computational Logic. This demonstrates the
growing interest in NLP methods based on ILP or non-classical logics, and in hybrid methods. Relational
learning increasingly appears complementary to data analysis in many NLP domains. Here again, relational
learning and logic-based learning demonstrate their capacity to learn complex structured linguistic
resources and knowledge, such as ontologies and grammars, from corpora and explicit background knowledge.
The scientific program of LLL-2000 consisted of one invited talk by Jörg-Uwe Kietz on ontology
acquisition and seven paper presentations. Six of these papers are included here; the paper by Christophe Costa
Florencio, accepted for presentation at both LLL and ICGI, has been published in the ICGI proceedings.
The joint sessions with ICGI and CoNLL included one invited talk by Dan Roth and paper and poster 
presentations. 
Claire Nédellec
Orsay, 2000 
Author Index 
Adriaans, Pieter ........................... 176 
Amaya, F .................................... 79 
Atwell, Eric ................................ 25 
Baud, Robert ............................. 111 
Benedi, J.M ................................ 79 
Biatov, Konstantin ......................... 83 
Blin, Laurent ............................... 87 
Bond, Francis .............................. 43 
Bouillon, Pierrette ..................... 111, 199 
Buchholz, Sabine .......................... 127 
Calera-Rubio, Jorge ....................... 123 
Cancedda, Nicola ............................ 7 
Carrasco, Rafael C ......................... 123 
Clark, Alexander ........................... 91 
Copestake, Ann ............................ 43 
Cussens, James ............................ 184 
Daelemans, Walter ..................... 19, 103 
Damper, R.I ................................ 13 
Day, David ................................ 160 
De Haas, Erik ............................. 176 
De Pauw, Guy ............................... 19 
Déjean, Hervé ......................... 95, 133
Elliott, John ................................. 25 
Escudero, Gerard ........................... 31 
Esposito, F ................................ 194 
Fabre, Cécile .............................. 199
Fanizzi, N ................................. 194 
Ferilli, S ................................... 194 
Foster, George .............................. 37 
Gómez Hidalgo, Jose M ..................... 99
Johansson, Christer ....................... 136 
Jurafsky, Daniel ............................ 67 
Kietz, Jörg-Uwe ........................... 167
Koeling, Rob .............................. 139 
Kool, Anne ................................ 103 
Kudoh, Taku .............................. 142 
Maedche, Alexander ....................... 167 
Marchand, Y ................................ 13
Màrquez, Lluis ............................. 31
Matsumoto, Yuji .......................... 142 
Miclet, Laurent ............................. 87 
Minnen, Guido ............................. 43 
Molina, Antonio ........................... 148 
Mullen, Tony ............................... 49 
Nepil, Miloš ............................... 219
Osborne, Miles ........................ 49, 145 
Pla, Ferran ................................ 148 
Popelínský, Luboš ......................... 219
Prieto, Natividad .......................... 148 
Puertas Sanz, Enrique ...................... 99 
Pulman, Stephen .......................... 184 
Punyakanok, Vasin ........................ 107 
Raaijmakers, Stephan ...................... 55 
Rigau, German ............................. 31 
Robert, Gilbert ............................ 111 
Rodríguez, H .............................. 115
Roth, Dan .............................. 1, 107 
Ruch, Patrick ............................. 111 
Sakas, William Gregory ..................... 61 
Samuelsson, Christer ........................ 7 
Schone, Patrick ............................. 67 
Sébillot, Pascale ........................... 199
Semeraro, G ............................... 194 
Su, Jian ................................... 163 
Tey, TongGuan ............................ 163 
Tjong Kim Sang, Erik F .............. 127, 151 
Turmo, J .................................. 115 
Van Halteren, Hans ................... 119, 154 
Van den Bosch, Antal .................. 73, 157 
Veenstra, Jorn ............................. 157 
Verdú-Mas, Jose Luis ...................... 123
Vilain, Marc .............................. 160 
Villavicencio, Aline ........................ 209 
Volz, Raphael ............................. 167 
Whyte, Bill ................................. 25 
Žáčková, Eva .............................. 219
Zavrel, Jakub ............................. 103 
Zhou, GuoDong ........................... 163 
