Multilingual Authoring: the NAMIC approach
R. Basili, M.T. Pazienza
F. Zanzotto
Dept. of Computer Science
University of Rome, Tor Vergata
Via di Tor Vergata,
00133 Roma
Italy
basili@info.uniroma2.it
pazienza@info.uniroma2.it
zanzotto@info.uniroma2.it
R. Catizone, A. Setzer
N. Webb, Y. Wilks
Department of Computer Science
University of Sheffield
Regent Court
211 Portobello Street,
Sheffield S1 4DP, UK
R.Catizone@dcs.shef.ac.uk
A.Setzer@dcs.shef.ac.uk
N.Webb@dcs.shef.ac.uk
Y.Wilks@dcs.shef.ac.uk
L. Padró, G. Rigau
Dept. Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
Centre de Recerca TALP
Jordi Girona Salgado 1-3,
08034 Barcelona
Spain
padro@lsi.upc.es
g.rigau@lsi.upc.es
Abstract
With increasing amounts of electronic information available, and an increasing variety of languages used to produce documents of the same type, the problem arises of how to manage similar documents in different languages. This paper proposes an approach to processing and structuring text so that Multilingual Authoring (creating hypertext links) can be carried out effectively. This work, funded by the European Union, is applied to the Multilingual Authoring of news agency text. We have applied methods from Natural Language Processing, especially Information Extraction technology, to both monolingual and Multilingual Authoring.
1 Introduction
Modern Information Technologies are faced with the problem of selecting, filtering and managing growing amounts of multilingual information, access to which is usually critical. Traditional Information Retrieval (IR) approaches are too general in their selection of relevant documents, whereas traditional Information Extraction (IE) (Gaizauskas and Wilks, 1998; Pazienza, 1997) approaches are too specific and inflexible. Automatic Authoring is a good example of how these two methods can be improved and used to create a hypertextual organisation of (multilingual) information. This kind of information is ‘added value’ to the information embodied in the text and does not conflict with other retrieval paradigms. Automatic Authoring is the activity of processing news items in streams, detecting and extracting relevant information from them and, accordingly, organising texts in a non-linear fashion.
While IE systems like the ones participating in the Message Understanding Conference (MUC, 1998) are oriented towards specific phenomena (e.g. joint ventures) in restricted domains, the scope of Automatic Authoring is wider. In Automatic Authoring, the hypertextual structure has to provide navigation guidelines to the end user, who can also reject the system's suggestions.
In this paper we present an architecture for Automatic Multilingual Authoring based on knowledge-intensive, large-scale Information Extraction. The general architecture capitalises on robust methods of Information Extraction (Cunningham et al., 1999) and large-scale multilingual resources (e.g. EuroWordNet). The system is being developed within a European project in the Human Language Technologies area called NAMIC (News Agencies Multilingual Information Categorisation)¹. NAMIC aims to extract relevant facts from the news streams of large European news agencies and newspaper producers², to provide hypertextual structures within each (monolingual) stream, and then to produce cross-lingual links between streams.
2 Authoring
2.1 Automatic Authoring
Automatic Authoring is the task of automatically deriving a hypertextual structure from a set of available news articles (in our case, in three languages: English, Spanish and Italian). The complexity of the overall framework requires a suitable decomposition:
Text processing requires at least the detection of the morphosyntactic information characterising the source texts: the main participants in the different events/facts described must be recognised, normalised and assigned their roles.
Event Matching is then the activity of selecting the relevant facts of a news article, in terms of their general type (e.g. selling or buying companies, winning a football match), their participants and their related roles (e.g. the company sold or the winning football team).
Authoring, finally, is the activity of generating links between news articles according to the relationships established among the facts detected in the previous phase.
For instance, a company acquisition can be
referred to in one (or more) news items as:
• Intel, the world’s largest chipmaker, bought a unit of Danish cable maker NKT that designs high-speed computer chips ...
• The giant chip maker Intel said it acquired the closely held ICP Vortex Computersysteme, a German maker of systems ...
• Intel ha acquistato Xircom inc. per 748 milioni di dollari. (Intel has acquired Xircom Inc. for 748 million dollars.)
¹ See http://namic.itaca.it.
² EFE and ANSA, the major news agencies in Spain and Italy respectively, and the Financial Times are all members of the NAMIC consortium.
The hypothesis underlying Authoring is that all the above news items deal with facts in the same area of interest to a potential class of readers. They should thus be linked, and the links should suggest to the user that the underlying motivation (used to decide whether or not to follow an available link) is that they all refer to Intel acquisitions.
Notice that a link generation process based only upon words would fail in the above case, as the only common word (which could play the role of anchor in linking) is the proper noun Intel. As no other information is available, every article mentioning Intel becomes a potential match, so the resulting set of matches can be huge and the connectivity too high.
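The connectivity problem can be made concrete with a minimal sketch of purely word-based matching over the three example items above. The tokeniser (lowercase alphabetic runs) and the names `NEWS`, `tokens` and `shared_words` are illustrative assumptions, not part of NAMIC.

```python
import re

# The three example news items from above (the third is in Italian).
NEWS = [
    "Intel, the world's largest chipmaker, bought a unit of Danish cable "
    "maker NKT that designs high-speed computer chips",
    "The giant chip maker Intel said it acquired the closely held ICP Vortex "
    "Computersysteme, a German maker of systems",
    "Intel ha acquistato Xircom inc. per 748 milioni di dollari.",
]


def tokens(text: str) -> set[str]:
    """Naive bag of lowercase alphabetic tokens (a hypothetical tokeniser)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def shared_words(texts: list[str]) -> set[str]:
    """Words common to every text: the only candidate anchors for a link."""
    common = tokens(texts[0])
    for text in texts[1:]:
        common &= tokens(text)
    return common


print(shared_words(NEWS))  # {'intel'}
# An unrelated Intel story shares exactly the same anchor.
print(shared_words(NEWS + ["Intel buys silicon wafers from a new supplier"]))  # {'intel'}
```

Adding an unrelated Intel story leaves the anchor set unchanged, which is why word overlap alone links far too much.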
In order to obtain suitable links, the equivalence between the senses of bought and acquired in the first two news items must be known. Although such a relation can be drawn by mechanisms like query expansion or thesauri of synonyms (e.g. WordNet (Miller, 1990)), word polysemy and noise may result in an inherent proliferation of irrelevant matches. Contextual information is critical here. Notice that the senses of ‘buy’ and ‘acquire’ are constrained by the role played by Intel as ‘agent’ and by NKT or ICP Vortex as the companies sold. In fact, Intel buys silicon represents an unwanted sense of the verb and should be distinguished.
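As a hedged illustration of role-constrained sense selection, the sketch below accepts the company-acquisition reading of buy or acquire only when both the agent and the acquired object are organisations. The `SEMANTIC_TYPE` table is a hypothetical stand-in for the types that named entity recognition and the ontology would supply.

```python
# Hypothetical semantic types of a few role fillers; a real system would
# obtain these from named entity recognition and the ontology.
SEMANTIC_TYPE = {
    "Intel": "organisation",
    "NKT": "organisation",
    "ICP Vortex": "organisation",
    "silicon": "substance",
}

# Verbs that may denote a company acquisition (illustrative list).
ACQUISITION_VERBS = {"buy", "acquire"}


def is_company_acquisition(verb: str, agent: str, patient: str) -> bool:
    """Accept the acquisition sense only when both roles are organisations."""
    return (
        verb in ACQUISITION_VERBS
        and SEMANTIC_TYPE.get(agent) == "organisation"
        and SEMANTIC_TYPE.get(patient) == "organisation"
    )


print(is_company_acquisition("buy", "Intel", "NKT"))             # True
print(is_company_acquisition("acquire", "Intel", "ICP Vortex"))  # True
print(is_company_acquisition("buy", "Intel", "silicon"))         # False
```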
The relevant information concerning Intel should thus be limited to:
• Intel buys a unit of NKT
• Intel acquires ICP Vortex.
These descriptions provide the core information needed to establish equivalence among the underlying events. Whenever base event descriptions are available, the linking process can be carried out via simpler equivalence inferences. The Authoring problem thus becomes a side effect of the overall language-processing task.
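Once base event descriptions exist, linking reduces to comparing small tuples. The sketch below assumes a hypothetical verb-to-concept table and links two articles when their events share the same concept and agent, which is the "Intel acquisitions" relation motivated above; the class, function and concept names are invented for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class BaseEvent:
    """A base event description: conceptual type plus two role fillers."""
    concept: str
    agent: str
    patient: str


# Hypothetical verb-to-concept table standing in for the ontology.
CONCEPT_OF = {"buy": "financial_acquisition", "acquire": "financial_acquisition"}


def base_event(verb: str, agent: str, patient: str) -> BaseEvent:
    """Normalise a verb-based description onto its ontological concept."""
    return BaseEvent(CONCEPT_OF[verb], agent, patient)


def link(articles: dict[str, BaseEvent]) -> list[tuple[str, str]]:
    """Link two articles whose events share the same concept and agent."""
    links = []
    for a, b in combinations(sorted(articles), 2):
        ea, eb = articles[a], articles[b]
        if (ea.concept, ea.agent) == (eb.concept, eb.agent):
            links.append((a, b))
    return links


articles = {
    "news1": base_event("buy", "Intel", "NKT"),
    "news2": base_event("acquire", "Intel", "ICP Vortex"),
    "news3": BaseEvent("product_launch", "Intel", "chip"),  # hypothetical concept
}
print(link(articles))  # [('news1', 'news2')]
```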
According to the suggested decomposition, all the above steps are mandatory. First, text processing is responsible for morphosyntactic recognition: morphological units and syntactic relations are produced for each sentence at this stage. However, syntactic relations (e.g. between subjects and verbs) are not sufficient for proper event characterisation. In the examples, the subject of the verb acquire is the pronoun it, which refers to Intel only anaphorically. Coreference resolution is usually applied to this kind of mismatch at the surface level, and this capability is the responsibility of the event matching phase.
Moreover, in order to keep track of events over syntactic representations, references to a target ontology are required. Such an ontology represents equivalence among facts (e.g. buying companies): for instance, the relation between buy and acquire can be encoded under a more general notion of financial acquisition. Ontologies also define the set of relevant facts of the target domain. A financial acquisition is a perfect example of what is needed in corporate and industrial news, but it is less important in, for example, sports news, where the hiring of players seems a more relevant event class.
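The two roles of the ontology described above (grouping equivalent lexical events under a more general class, and declaring which classes matter in a domain) can be sketched with a toy IS-A table. The concept names and the per-domain `RELEVANT` sets are assumptions chosen to mirror the buy/acquire and sports-hiring examples.

```python
from typing import List, Optional

# Toy IS-A taxonomy (child -> parent); all concept names are illustrative.
IS_A = {
    "buy": "financial_acquisition",
    "acquire": "financial_acquisition",
    "financial_acquisition": "event",
    "hire_player": "hiring",
    "hiring": "event",
}

# Event classes assumed to be relevant in each target domain.
RELEVANT = {
    "corporate": {"financial_acquisition"},
    "sports": {"hiring"},
}


def ancestors(concept: str) -> List[str]:
    """The concept followed by its ancestors, nearest first."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def relevant_class(concept: str, domain: str) -> Optional[str]:
    """First ancestor that the domain declares relevant, or None."""
    for candidate in ancestors(concept):
        if candidate in RELEVANT[domain]:
            return candidate
    return None


print(relevant_class("buy", "corporate"))      # financial_acquisition
print(relevant_class("acquire", "corporate"))  # financial_acquisition
print(relevant_class("buy", "sports"))         # None
```

Under this sketch, two events are equivalent for linking in a domain when they map to the same non-None relevant class.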
Conceptual differences among facts (detected during event matching) motivate a selective notion of hyperlinking. These links can thus be generated during the automatic authoring phase. They are ontologically justified, as their conceptual representation is already available at this stage. Link types such as same acquisition fact, same person, or same company can be used to distinguish links and to make explanations available to the user.
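A minimal sketch of typed links, assuming each article's objective representation has already been reduced to sets of detected facts, persons and companies (a hypothetical layout with invented document identifiers); every shared item yields a link carrying its type and a short explanation for the user.

```python
from itertools import combinations

# Hypothetical objective representations of three news items: the facts
# detected by event matching plus the persons and companies mentioned.
DOCS = {
    "en-1": {"fact": {("financial_acquisition", "Intel", "NKT")},
             "person": set(), "company": {"Intel", "NKT"}},
    "en-2": {"fact": {("financial_acquisition", "Intel", "NKT")},
             "person": {"John Smith"}, "company": {"Intel", "NKT"}},
    "en-3": {"fact": set(), "person": {"John Smith"}, "company": {"Xircom"}},
}


def typed_links(docs):
    """Return (doc_a, doc_b, link_type, explanation) for each shared item."""
    links = []
    for a, b in combinations(sorted(docs), 2):
        for kind in ("fact", "person", "company"):
            for item in sorted(docs[a][kind] & docs[b][kind]):
                label = " ".join(item) if isinstance(item, tuple) else item
                links.append((a, b, f"same {kind}", f"both mention {label}"))
    return links


for link in typed_links(DOCS):
    print(link)
```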
2.2 Multilingual Automatic Authoring
From a multilingual perspective, the problem is to establish links among news items in different languages. Full-text approaches can rely only on language-independent phenomena (e.g. proper nouns like Intel), which are very limited in texts. Most of the above-mentioned inferences require language-neutral information (i.e. conceptual rather than lexical constraints). The inherent overgeneration caused by word polysemy affects the results of translation-based approaches. Again, principled representations made available by IE processes (i.e. templates) provide a viable solution. The different realisations of an event (in the different languages) can be handled during overall event matching. A lexical interface to the ontology can factor out the language-specific information. As syntactic differences are handled during text processing, the result is a common domain model for IE plus independent lexical interfaces. The unified representation of the set of facts enables multilingual linking at a conceptual level, thus making Authoring a language-independent process. Some challenges of such a framework are:
• the size of the ontological resources required, in terms of taxonomic (i.e. IS-A relations) and conceptual information (i.e. classes of events and implied participant-event relations)
• the size of the lexical interfaces to the ontology available for the different languages
• the amount of task-dependent knowledge: for example, the definition of the set of events useful for the target application is underspecified.
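The separation between a common domain model and per-language lexical interfaces can be sketched as follows. The interface tables, the reduction of each item to (language, lemma, agent) and the function names are hypothetical; only the idea of linking at the conceptual level comes from the text.

```python
from itertools import combinations

# Hypothetical lexical interfaces: per language, lemma -> domain-model concept.
LEXICAL_INTERFACE = {
    "en": {"buy": "financial_acquisition", "acquire": "financial_acquisition"},
    "it": {"acquistare": "financial_acquisition", "comprare": "financial_acquisition"},
    "es": {"comprar": "financial_acquisition", "adquirir": "financial_acquisition"},
}

# Each news item reduced to (language, verb lemma, agent) by its language processor.
NEWS = {
    "en-1": ("en", "buy", "Intel"),
    "en-2": ("en", "acquire", "Intel"),
    "it-1": ("it", "acquistare", "Intel"),
}


def concept_key(lang: str, lemma: str, agent: str) -> tuple[str, str]:
    """Factor out the language: only the concept and the agent remain."""
    return (LEXICAL_INTERFACE[lang][lemma], agent)


def cross_lingual_links(news):
    """Link items in different languages that share a conceptual key."""
    keys = {doc: concept_key(*info) for doc, info in news.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(news), 2)
        if keys[a] == keys[b] and news[a][0] != news[b][0]
    ]


print(cross_lingual_links(NEWS))  # [('en-1', 'it-1'), ('en-2', 'it-1')]
```

Adding Spanish coverage in this sketch only requires the Spanish interface table; the linking code is untouched.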
In the following, we propose a complex architecture in which the above problems are approached using well-assessed techniques presented elsewhere. Robust Information Extraction (Humphreys et al., 1998) is adopted as the overall method for text processing and event matching. Target events are semi-automatically derived from domain texts and represented in the IE engine ontology. Finally, multilinguality is realised by adopting a large-scale multilingual lexical hierarchy as the reference ontology for nominal concepts. The resulting architecture for Multilingual Automatic Authoring is presented in Section 3.4.
3 The NAMIC system
3.1 Large-scale IE for Automatic Authoring
Information Extraction is a very good approach to Automatic Authoring for a number of reasons. The key components of an IE system are events and objects, which are exactly the kind of components that trigger hyperlinks in an Authoring system. Coreference is a significant part of Information Extraction and also a necessary component in Authoring. Named Entities (people, places, organisations, etc.) play an important part in Authoring and, again, are firmly addressed in Information Extraction systems.
The role of a world model as a method for event matching and coreferencing
The world model is an ontological representation of events and objects for a particular domain or set of domains. It is made up of a set of event and object types, with attributes. The event types characterise a set of events in a particular domain and are usually represented in a text by verbs. Object types, on the other hand, are best thought of as characterising a set of people, places or things, and are usually represented in a text by nouns (both proper and common). When the world model is used as part of an Information Extraction system, the instances of each type found in a text are added to it. Once the instances have been added, a procedure is carried out to link those instances that refer to the same thing, thus achieving coreference resolution.
In NAMIC, the world model is created using the XI cross-classification hierarchy (Gaizauskas and Humphreys, 1996). The definition of an XI cross-classification hierarchy is referred to as an ontology; this, together with an association of attributes with nodes in the ontology, forms the world model. Processing a text populates this initially bare world model with the various instances and relations mentioned in the text, converting it into a discourse model specific to that particular text.
The attributes associated with nodes in the ontology are simple attribute:value pairs, where the value may either be fixed, as in the attribute animate:yes, which is associated with the person node, or depend on various conditions whose evaluation makes reference to other information in the model.
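The sketch below mimics the two kinds of attribute values on a single-parent hierarchy: a fixed value such as animate:yes, and a conditional value computed from other information in the model. It is not XI (which is Prolog-based and supports cross-classification); the class `WorldModel`, the `gender` attribute and its rule are assumptions for illustration.

```python
from typing import Callable, Dict, Optional, Union

# An attribute value is either fixed or computed from the model on demand.
Value = Union[str, Callable[..., str]]


class WorldModel:
    """Toy single-parent hierarchy; XI itself supports cross-classification."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}
        self.attrs: Dict[str, Dict[str, Value]] = {}

    def add_node(self, node: str, parent: Optional[str] = None, **attrs: Value) -> None:
        if parent is not None:
            self.parent[node] = parent
        self.attrs[node] = dict(attrs)

    def get(self, node: str, attr: str) -> Optional[str]:
        """Find attr on node or its nearest ancestor; evaluate it if conditional."""
        current: Optional[str] = node
        while current is not None:
            value = self.attrs.get(current, {}).get(attr)
            if value is not None:
                return value(self, node) if callable(value) else value
            current = self.parent.get(current)
        return None


def gender_from_title(model: WorldModel, node: str) -> str:
    """A conditional value: depends on another attribute of the same instance."""
    title = model.get(node, "title")
    return {"Mr.": "male", "Mrs.": "female", "Ms.": "female"}.get(title, "unknown")


wm = WorldModel()
wm.add_node("object")
wm.add_node("person", parent="object", animate="yes", gender=gender_from_title)
wm.add_node("e1", parent="person", title="Mr.", realisation="Mr. Smith")
print(wm.get("e1", "animate"))  # yes (fixed, inherited from person)
print(wm.get("e1", "gender"))   # male (conditional, evaluated for e1)
```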
3.1.1 The Description of LaSIE
LaSIE is a Large-scale Information Extraction system developed for the MUC (Message Understanding Conference) competitions and composed of a variety of modules (see Humphreys et al., 1998; MUC, 1998). Although we are not using the complete LaSIE system in NAMIC, we are using two of its key modules, the Named Entity Matcher and the Discourse Processor, which are described below.
Named Entity Matcher The Named Entity Matcher finds named entities through a secondary phase of parsing, which uses a named entity grammar and a set of gazetteer lists. It takes as input the parsed text from the first phase of parsing, together with the named entity grammar, which contains rules for finding a predefined set of named entities, and a set of gazetteer lists containing proper nouns. The Named Entity Matcher returns the text with the Named Entities marked. The Named Entities in NAMIC are PERSONS, ORGANISATIONS, LOCATIONS, and DATES. The Named Entity grammar contains rules for coreferring abbreviations, as well as different ways of expressing the same named entity, such as Dr. Smith, John Smith and Mr. Smith occurring in the same article.
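The actual LaSIE grammar rules are not reproduced here. As an assumption-laden sketch, two simple heuristics (same surname after stripping titles, and initial-letter acronyms) are enough to group the Smith variants and an abbreviation into coreference chains; the title list, helper names and the IBM example are illustrative.

```python
from typing import List

TITLES = {"Dr.", "Mr.", "Mrs.", "Ms."}  # illustrative title list


def surname(name: str) -> str:
    """Last token once titles are removed (naive heuristic)."""
    parts = [p for p in name.split() if p not in TITLES]
    return parts[-1] if parts else name


def acronym(name: str) -> str:
    """Initials of capitalised words, e.g. 'International Business Machines' -> 'IBM'."""
    return "".join(w[0] for w in name.split() if w[0].isupper())


def corefer(a: str, b: str) -> bool:
    """Same surname, or one name is the acronym of the other."""
    return surname(a) == surname(b) or acronym(a) == b or acronym(b) == a


def coreference_chains(names: List[str]) -> List[List[str]]:
    """Greedily attach each name to the first chain containing a coreferent name."""
    chains: List[List[str]] = []
    for name in names:
        for chain in chains:
            if any(corefer(name, other) for other in chain):
                chain.append(name)
                break
        else:
            chains.append([name])
    return chains


print(coreference_chains(
    ["Dr. Smith", "International Business Machines", "John Smith", "IBM", "Mr. Smith"]
))
# [['Dr. Smith', 'John Smith', 'Mr. Smith'], ['International Business Machines', 'IBM']]
```

Such surface heuristics would also wrongly merge John Smith and Jane Smith, which is one reason ontology-aware merging in the Discourse Processor remains necessary.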
Discourse Processor The Discourse Processor module translates the semantic representation produced by the parser into a representation of instances, their ontological classes and their attributes, in the XI knowledge representation language (see Gaizauskas and Humphreys, 1996). XI allows a straightforward definition of cross-classification hierarchies, the association of arbitrary attributes with classes or instances, and a simple mechanism to inherit attributes from classes or instances higher in the hierarchy.
The semantic representation produced by the parser for a single sentence is processed by adding its instances, together with their attributes, to the discourse model which has been constructed so far for the text.
Following the addition of the instances mentioned in the current sentence, together with any presuppositions that they inherit, the coreference algorithm is applied to attempt to resolve, or in fact merge, each of the newly added instances with instances currently in the discourse model.
The merging of two instances involves the removal of the less specific instance (i.e. the one higher in the ontology) and the addition of all its attributes to the other instance. This results in a single instance with more than one realisation attribute, which corresponds to a single entity mentioned more than once in the text, i.e. a coreference.
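The merging step can be sketched with a toy ontology in which depth stands for specificity: the instance higher in the ontology is dropped and its attributes, including its realisations, are folded into the other. The `DEPTH` table and the `Instance` class are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Toy ontology depth: larger means more specific (lower in the hierarchy).
DEPTH = {"entity": 0, "organisation": 1, "company": 2}


@dataclass
class Instance:
    cls: str
    attrs: Dict[str, Set[str]] = field(default_factory=dict)


def merge(a: Instance, b: Instance) -> Instance:
    """Keep the more specific instance and add the other's attributes to it."""
    keep, drop = (a, b) if DEPTH[a.cls] >= DEPTH[b.cls] else (b, a)
    for attr, values in drop.attrs.items():
        keep.attrs.setdefault(attr, set()).update(values)
    return keep


intel = Instance("company", {"name": {"Intel"}, "realisation": {"Intel"}})
maker = Instance("organisation", {"realisation": {"the giant chip maker"}})
merged = merge(maker, intel)
print(merged.cls, sorted(merged.attrs["realisation"]))
# company ['Intel', 'the giant chip maker']
```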
3.2 Ontological Modeling
As we have seen in Section 3.1, some critical issues of the NAMIC project depend on the performance of the lexical and conceptual components of all the linguistic processors. As NAMIC requires large-scale coverage of news in several languages, we decided to adopt EuroWordNet (Vossen, 1998) as a common semantic formalism to support:
• lexical semantic inferences (e.g. generalisation, disambiguation),
• broad coverage (e.g. lexical and semantic), and
• a common interlingual platform for linking events from different documents.
The NAMIC ontology consists of 40 predefined object classes and 46 attribute types related to Named Entity objects, and nearly 1000 objects relating to EuroWordNet Base Concepts.
3.2.1 EuroWordNet as a Multilingual Lexical Knowledge Base
Since the world model aims to describe the language used in a given domain via events and objects, the accuracy and breadth of the model will affect how well the information extraction works.
EuroWordNet (Vossen, 1998) is a multilingual lexical knowledge base (LKB) with wordnets for several European languages (Dutch, Italian, Spanish, German, French, Czech and Estonian). The wordnets are structured in the same way as the American wordnet for English developed at Princeton (Miller, 1990), containing synsets (sets of synonymous words) with basic semantic relations between them.
Each wordnet represents a unique language-internal system of lexicalisations. In addition, the wordnets are linked to an Inter-Lingual-Index (ILI), based on Princeton WordNet 1.5. WordNet 1.6 is also connected to the ILI as another English wordnet (Daude et al., 2000). Via this index the languages are interconnected, so that it is possible to go from the words in one language to words in any other language having a similar meaning. The index also gives access to a shared top-ontology and a subset of 1024 Base Concepts (BC). The Base Concepts provide a common semantic framework for all the languages, while language-specific properties are maintained in the individual wordnets. The LKB can be used, among other things, for monolingual and cross-lingual information retrieval, as has been demonstrated in other projects (Gonzalo et al., 1998).
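To illustrate how the ILI connects languages, the sketch below uses invented synset and ILI identifiers (not real EuroWordNet records): a word's local synsets map to ILI records, and any target-language word sharing a record counts as an equivalent. The polysemous acquire also pulls in a learning sense, the kind of noise discussed in Section 2.

```python
# Toy wordnets: language -> word -> local synset ids (all identifiers invented).
LEXICON = {
    "en": {"buy": {"en-1"}, "purchase": {"en-1"}, "acquire": {"en-1", "en-2"}},
    "it": {"comprare": {"it-1"}, "acquistare": {"it-1"}},
    "es": {"comprar": {"es-1"}, "adquirir": {"es-1", "es-2"}, "aprender": {"es-2"}},
}

# Every local synset is linked to an Inter-Lingual-Index record.
TO_ILI = {
    "en-1": "ILI:buy", "en-2": "ILI:learn",
    "it-1": "ILI:buy",
    "es-1": "ILI:buy", "es-2": "ILI:learn",
}


def ili_records(word: str, lang: str) -> set[str]:
    """ILI records reachable from a word through its local synsets."""
    return {TO_ILI[s] for s in LEXICON[lang].get(word, set())}


def equivalents(word: str, src: str, tgt: str) -> list[str]:
    """Words in tgt sharing at least one ILI record with word in src."""
    records = ili_records(word, src)
    return sorted(w for w in LEXICON[tgt] if ili_records(w, tgt) & records)


print(equivalents("buy", "en", "it"))      # ['acquistare', 'comprare']
print(equivalents("acquire", "en", "es"))  # ['adquirir', 'aprender', 'comprar']
```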
3.3 Multilingual Event description
The traditional limitation of knowledge-based information extraction systems such as LaSIE has been the need to hand-code information for the world model, specifically the information relating to the event structure of the domain.
For the NAMIC project, we have decided to semi-automate the process of adding new ‘event descriptions’ to the world model. For us, event descriptions can be characterised as a set of regularly occurring verbs within our domain, complete with their subcategorisation information.
These verbs can be extracted with simple statistical techniques and are, for the moment, subjected to hand pruning. Once a list of verbs has been extracted, subcategorisation patterns can be generated automatically using a Galois lattice (as described in Basili et al., 2000b). These frames can then be uploaded into the event hierarchy of the discourse interpreter's world model.
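The lattice construction of Basili et al. (2000b) is not reproduced here. As a generic illustration of what a Galois (concept) lattice is, the sketch below enumerates the formal concepts of a toy verb-by-feature context: each concept pairs a maximal set of verbs with the maximal set of subcategorisation features they all share, and ordering these pairs by inclusion gives the lattice. The verbs, feature names and brute-force enumeration (exponential in the number of verbs) are assumptions suited only to small examples.

```python
from itertools import combinations
from typing import FrozenSet, Set, Tuple

# Hypothetical formal context: verb -> subcategorisation features seen in a corpus.
CONTEXT = {
    "buy": {"subj", "obj", "pp_for"},
    "acquire": {"subj", "obj"},
    "sell": {"subj", "obj", "pp_to"},
    "rise": {"subj", "pp_to"},
}
ALL_FEATURES = set().union(*CONTEXT.values())

Concept = Tuple[FrozenSet[str], FrozenSet[str]]


def intent(verbs: Set[str]) -> Set[str]:
    """Features shared by all given verbs (every feature for the empty set)."""
    if not verbs:
        return set(ALL_FEATURES)
    return set.intersection(*(CONTEXT[v] for v in verbs))


def extent(features: Set[str]) -> Set[str]:
    """Verbs that exhibit all the given features."""
    return {v for v, feats in CONTEXT.items() if features <= feats}


def formal_concepts() -> Set[Concept]:
    """Close every verb subset; each closure (extent, intent) is a formal concept."""
    concepts: Set[Concept] = set()
    verbs = sorted(CONTEXT)
    for k in range(len(verbs) + 1):
        for subset in combinations(verbs, k):
            feats = intent(set(subset))
            concepts.add((frozenset(extent(feats)), frozenset(feats)))
    return concepts


for ext, feats in sorted(formal_concepts(), key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(ext), sorted(feats))
```

In this toy context, the concept grouping buy, acquire and sell with the shared frame {subj, obj} is the kind of generalised subcategorisation pattern that could be uploaded as an event description.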
The world model can have a structure which is essentially language independent at all but the lowest level, where lexicalisations relating to each represented language are required. Associated with these lexicalisations are language-dependent scenario rules, which control the behaviour of instances of these events within a Discourse Model. These rules are expected to differ across languages in the way they control coreference for languages that are constrained to a lesser or greater degree.
The lattice generates patterns which refer to synsets in the WordNet hierarchy. For our purposes, we will use patterns referring to Base Concepts in the EuroWordNet hierarchy, which allows us to exploit the Inter-Lingual-Index as described in the previous section. These Base Concepts serve as a level of multilingual abstraction for the conceptual constraints of our events, and allow us to extend the number of semantic classes from seven (the MUC Named Entity classifications) to 1024, the number of Base Concepts in EWN.
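Lifting synset constraints to Base Concepts can be sketched as a walk up the hypernym chain until the first Base Concept is met. The hierarchy and Base Concept set below are toy data with invented names, not EuroWordNet content.

```python
from typing import Dict, Optional

# Toy hypernym links (synset -> hypernym) and Base Concept set; names are invented.
HYPERNYM = {
    "chipmaker": "company",
    "cable_maker": "company",
    "company": "institution",
    "institution": "organization",
    "organization": "group",
}
BASE_CONCEPTS = {"institution", "group"}


def base_concept(synset: str) -> Optional[str]:
    """Climb hypernym links until the first (most specific) Base Concept."""
    current: Optional[str] = synset
    while current is not None:
        if current in BASE_CONCEPTS:
            return current
        current = HYPERNYM.get(current)
    return None


def generalise(pattern: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Replace each role's synset constraint with its Base Concept."""
    return {role: base_concept(synset) for role, synset in pattern.items()}


print(generalise({"subj": "chipmaker", "obj": "cable_maker"}))
# {'subj': 'institution', 'obj': 'institution'}
```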
3.4 The NAMIC Architecture
The complexity of the overall NAMIC system required the adoption of a distributed computing paradigm in the design. The system is a distributed object-oriented system in which services (like text processing or Multilingual Authoring) are provided by independent components and asynchronous communication is allowed. Independent news streams for the different languages (English, Spanish, and Italian) are assumed. Language-specific processors (LPs) are thus responsible for text processing and event matching on independent text units in each stream. LPs compile an objective representation (see Fig. 1) for each source text, including the detected morphosyntactic information, the categorisation according to news standards (IPTC classes) and a description of the relevant events. Any later Authoring activity is based on this canonical representation of the news. In particular, a monolingual process is carried out within each stream by the three monolingual Authoring Engines (English AE, Spanish AE, and Italian AE). A second phase is foreseen to take into account links across streams, i.e. multilingual hyperlinking; a Multilingual Authoring Engine (M-AE) is planned for this purpose. Figure 1 represents the overall flow of information.
The Language Processors are composed of a morphosyntactic component (Eng, Ita and Spa MS) and an event-matching component (EM). The lexical interfaces (ELI, SLI and ItLI) to the unified Domain Model are also used during event matching.
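The NAMIC component interfaces are specified via XML DTDs that are not reproduced in this paper, so the following is only a minimal sketch of building an objective representation with Python's standard `xml.etree.ElementTree`. The element and attribute names (`news`, `category`, `token`, `event`, `role`) and the category string are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def objective_representation(
    doc_id: str,
    lang: str,
    category: str,
    tokens: List[Tuple[str, str]],
    events: List[Tuple[str, Dict[str, str]]],
) -> str:
    """Serialise one news item: category, morphosyntax and detected events."""
    root = ET.Element("news", id=doc_id, lang=lang)
    ET.SubElement(root, "category", scheme="IPTC").text = category
    morph = ET.SubElement(root, "morphosyntax")
    for lemma, pos in tokens:
        ET.SubElement(morph, "token", lemma=lemma, pos=pos)
    events_el = ET.SubElement(root, "events")
    for concept, roles in events:
        event_el = ET.SubElement(events_el, "event", type=concept)
        for role, filler in roles.items():
            ET.SubElement(event_el, "role", name=role).text = filler
    return ET.tostring(root, encoding="unicode")


doc_xml = objective_representation(
    "en-1", "en", "economy, business and finance",
    [("Intel", "NNP"), ("buy", "VBD"), ("unit", "NN")],
    [("financial_acquisition", {"agent": "Intel", "patient": "NKT"})],
)
print(doc_xml)
```

Because every Authoring Engine would read the same canonical structure, downstream linking in this sketch never needs to see language-specific text.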
The linguistic processors are in charge of producing the objective representation of incoming news. This task is performed during MS analysis by two main subprocessors:
• a modular and lexicalised shallow morphosyntactic parser (Basili et al., 2000c), which provides named entity matching and extracts dependency graphs from source sentences. Ambiguity is controlled by part-of-speech tagging and by domain verb-subcategorisation frames that guide the dependency recognition phase.
• a statistical linear text classifier (Basili et al., 2000a) based upon some of the derived linguistic features (lemmas, POS tags and proper nouns).
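The classifier of Basili et al. (2000a) is not reproduced here; the sketch below only illustrates what a linear classifier over typed linguistic features looks like. Lemmas, POS tags and proper nouns become separately prefixed features, and each class scores a document by a dot product with its weight vector. The weights are invented placeholders rather than trained values.

```python
from collections import Counter
from typing import Dict, List


def features(lemmas: List[str], pos_tags: List[str], proper_nouns: List[str]) -> Counter:
    """Typed feature counts, keeping lemmas, POS tags and proper nouns apart."""
    feats = Counter(f"lemma:{x}" for x in lemmas)
    feats.update(f"pos:{x}" for x in pos_tags)
    feats.update(f"pn:{x}" for x in proper_nouns)
    return feats


# Placeholder per-class weight vectors; a real classifier would learn these.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "economy": {"lemma:acquire": 1.5, "lemma:buy": 1.0, "pn:Intel": 0.8},
    "sport": {"lemma:match": 1.5, "lemma:win": 1.0, "pos:CD": 0.2},
}


def classify(feats: Counter) -> str:
    """Return the class whose weight vector has the largest dot product."""
    scores = {
        label: sum(weights.get(k, 0.0) * v for k, v in feats.items())
        for label, weights in WEIGHTS.items()
    }
    return max(scores, key=scores.get)


doc = features(["intel", "acquire", "company"], ["NNP", "VBD", "NN"], ["Intel"])
print(classify(doc))  # economy
```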
The results are then input to the event matcher, which derives the objective representation by means of the discourse interpreter (Humphreys et al., 1998). As discussed in Section 3.1, coreferencing is a side effect of discourse interpretation (Humphreys et al., 1998). It is based on the multilingual domain model, in which relevant events are described and nominal concepts are represented.
The overall architecture is highly modular and open to load balancing as well as to adaptation and porting. The communication interfaces between the MS and EM components, as well as between the AEs and the M-AE processors, are specified via XML DTDs. This allows for user-friendly uploading of a back-end database with the detected material, as well as easy design and management of the front-end databases (available for temporary tasks, like event matching after MS). All the servers are objects in a distributed architecture within a CORBA environment. The current version includes the linguistic processors (MS and EM) for all three languages. The English and Italian linguistic processors are fully object-oriented modules based on Java. They integrate libraries written in C, C++, Prolog, and Perl for specific functionalities (e.g. parsing), running under a Windows NT platform. The Spanish linguistic processor shares the discourse interpreter and the text classifier with the other modules, while its morphosyntactic component is currently a Unix server based on Perl. The use of a distributed architecture under CORBA allowed a flexible solution for its integration into the overall architecture. The servers can be instantiated in multiple copies throughout the network if the amount of required computation exceeds the capability of the current configuration. As the workload of a news stream is not easily predictable, distribution and dynamic load balancing are the only realistic approach.
Figure 1: Namic Architecture (components: English, Spanish and Italian MS and EM within the Language Processors; lexical interfaces ELI, SLI and ItLI to the Domain Model; English, Spanish and Italian AEs; Multi-Lingual Authoring Engine; data flow from news to Objective Representation, Monolingual Links and Multilingual Links)
4 Discussion and Future Work
The above sections have outlined a general NLP-based approach to automatic authoring. The emphasis given to the traditional capabilities of Information Extraction depends on the relevance of news content in the target Web service scenarios, as well as on their inherent multilinguality. The better the generalisation provided by the IE component, the greater the independence from the source language of the text. As a result, IE is seen here as a natural approach to cross-lingual hypertextual authoring. Other work in this area makes extensive use of traditional IR techniques (e.g. full-text search) or relies on already traced (i.e. manually coded) hyperlinks (e.g. Chakrabarti et al., 1998; Kleinberg, 1999). The suggested NAMIC architecture exploits linguistic capabilities to derive entirely original (ex novo) resources over dynamic, previously unreleased streams of information.
The result is a large-scale multilingual NLP application capitalising on existing methods and resources within an advanced software engineering process. The use of a distributed Java/CORBA architecture makes the system very attractive for its scalability and adaptivity, although it results in a very complex (but realistic) NLP architecture. Its organisation (lexical interfaces with respect to the multilingual ontology) makes it well suited for customisation and porting to large domains. Although the current version is a prototype, it realises the complete set of core functionalities, including the main IE steps and the distributed Java/CORBA layer.
It is worth noting that a set of extensions is made viable by the proposed architecture. A first line is the extension of the available multilingual lexical knowledge. The Discourse Model can be used to better reflect ontological relationships within a particular domain. These relationships could be examined to confirm known word sense usage as well as to propose novel word sense usage. Using the mechanism for the addition of events (as categorised by verbs) to the world model, users can specify new events to be added to the IE system, thus achieving User Driven IE and delivering a form of adaptive information extraction.
The instantiated domain models can thus be used as a basis for ontological resource expansion, as a form of adaptive process. For example, the stored instantiations of discourse models within a specific domain can be compared, which may make it possible to recognise new sets of events or objects that are not currently used within the system.
The evaluation strategy made possible within the NAMIC consortium will draw on the expertise of the current users (i.e. news agencies). The agreed evaluation methods will provide evidence about the viability of the proposed large-scale IE-based approach to authoring as a valuable paradigm for information access.
Acknowledgements
This research is funded by the European
Union, grant number IST-1999-12392. We
would also like to thank all of the partners
in the NAMIC consortium.

References
R. Basili, A. Moschitti, and M.T. Pazienza. 2000a. Language sensitive text classification. In Proceedings of the 6th RIAO Conference (RIAO 2000), Content-Based Multimedia Information Access, Collège de France, Paris, France.
R. Basili, M.T. Pazienza, and M. Vindigni. 2000b. Corpus-driven learning of event recognition rules. In Proceedings of the Machine Learning for Information Extraction Workshop, held jointly with ECAI 2000, Berlin, Germany.
R. Basili, M.T. Pazienza, and F.M. Zanzotto. 2000c. Customizable modular lexicalized parsing. In Proceedings of the 6th International Workshop on Parsing Technology (IWPT 2000), Trento, Italy.
S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, P. Raghavan, and S. Rajagopalan. 1998. Automatic resource compilation by analysing hyperlink structure and associated text. In Proceedings of the 7th International World Wide Web Conference, Brisbane, Australia.
H. Cunningham, R. Gaizauskas, K. Humphreys, and Y. Wilks. 1999. Experience with a language engineering architecture: 3 years of GATE. In Proceedings of the AISB’99 Workshop on Reference Architectures and Data Standards for NLP, Edinburgh, UK.
J. Daude, L. Padro, and G. Rigau. 2000. Mapping wordnets using structural information. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL’00), Hong Kong, China.
R. Gaizauskas and K. Humphreys. 1996. XI: A simple Prolog-based language for cross-classification and inheritance. In Proceedings of the 6th International Conference on Artificial Intelligence: Methodologies, Systems, Applications (AIMSA96), pages 86–95.
R. Gaizauskas and Y. Wilks. 1998. Information Extraction: Beyond Document Retrieval. Journal of Documentation, 54(1):70–105.
J. Gonzalo, F. Verdejo, I. Chugur, and J. Cigarran. 1998. Indexing with WordNet synsets can improve text retrieval. In Proceedings of the COLING/ACL’98 Workshop on Usage of WordNet for NLP, Montreal, Canada.
K. Humphreys, R. Gaizauskas, S. Azzam, C. Huyck, B. Mitchell, H. Cunningham, and Y. Wilks. 1998. University of Sheffield: Description of the LaSIE-II system as used for MUC-7. In Proceedings of the Seventh Message Understanding Conference (MUC-7). Morgan Kaufmann. Available at http://www.saic.com.
J.M. Kleinberg. 1999. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632.
G. Miller. 1990. Five papers on WordNet. International Journal of Lexicography, 4(3).
MUC. 1998. Proceedings of the Seventh Message Understanding Conference (MUC-7). Morgan Kaufmann. Available at http://www.saic.com.
M.T. Pazienza, editor. 1997. Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology. Number 1299 in LNAI. Springer-Verlag, Heidelberg, Germany.
P. Vossen. 1998. EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Kluwer Academic Publishers, Dordrecht.
