The Linguistic Annotation System of the Stockholm-Umeå
Corpus Project
Gunnel Källgren & Gunnar Eriksson
Institute of Linguistics 
Stockholm University 
S-106 91 Stockholm 
gunnel@ling.su.se, gunnar@ling.su.se 
In the Stockholm-Umeå Corpus project (SUC), we have
developed and applied a system for representing lexical
and morphological information about word forms in
unrestricted text. Our poster presents results and
experiences from applying the system to 300,000 word
forms, a subpart of a larger corpus.
The system is applied in two steps: an automatic lexical
lookup followed by homograph separation, which is done
partly automatically and partly manually. Lexical and
morphological analysis and disambiguation of Swedish is
a rather complicated task, which should hold for several
other languages as well. The sample text below shows
both the amount of information that has to be specified
for each word form and the degree of ambiguity to be
resolved.
("<*själv>" <161>
("själv" NN NEU SIN IND NOM)
("själv" NN NEU PLU IND NOM)
("själv" JJ POS UTR SIN IND NOM)
("själv" PM NOM))
("<råkar>" <162>
("råka" VB PRS AKT)
("råk" NN UTR PLU IND NOM))
("<hon>" <163> 
("hon" PN UTR SIN DEF SUB) 
("ho" NN UTR SIN DEF NOM))
("<ut>" <164>
("ut" AB))
("<för>" <165>
("för" PP)
("för" AB)
("för" SN)
("för" KN)
("för" NN UTR SIN IND NOM)
("för" VB PRS AKT)
("för" VB IMP AKT))
("<en>" <166>
("en" DT UTR SIN IND)
("en" RG UTR SIN IND NOM)
("en" PN UTR SIN IND SUB/OBJ)
("en" AB)
("en" NN UTR SIN IND NOM))
("<kåkfarare>" <167>
("kåk_farare" NN UTR SIN IND NOM)
("kåk_farare" NN UTR PLU IND NOM))
("<som>" <168>
("som" HP - - -)
("som" HA)
("som" KN))
("<misshandlar>" <169>
("miss_handla" VB PRS AKT) 
("miss_handel" NN UTR PLU IND NOM)) 
("<och>" <170> 
("och" KN))
("<förödmjukar>" <171>
("förödmjuka" VB PRS AKT))
("<henne>" <172> 
("hon" PN UTR SIN DEF OBJ))
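
The degree of ambiguity in a listing like the one above can be measured by counting the candidate readings per word form. The following is a minimal sketch in Python, not the project's own software: the names `Cohort` and `parse_cohorts` are hypothetical, and the parser assumes one header or reading per line, as in the sample. Word forms left with more than one reading are the ones that the homograph separation step would have to resolve.

```python
import re
from dataclasses import dataclass, field

# Header line, e.g. ("<hon>" <163>  -> word form and running position.
COHORT_RE = re.compile(r'\("<(?P<form>[^>]+)>"\s*<(?P<pos>\d+)>')
# Reading line, e.g. ("hon" PN UTR SIN DEF SUB) -> lemma and tag sequence.
READING_RE = re.compile(r'\("(?P<lemma>[^"<][^"]*)"\s+(?P<tags>[^()]+)\)')


@dataclass
class Cohort:
    """One word form together with all of its candidate readings."""
    form: str
    position: int
    readings: list = field(default_factory=list)


def parse_cohorts(text):
    """Parse the parenthesized listing into a list of Cohort objects."""
    cohorts = []
    for line in text.splitlines():
        line = line.strip()
        header = COHORT_RE.match(line)
        if header:
            cohorts.append(Cohort(header["form"], int(header["pos"])))
            continue
        reading = READING_RE.match(line)
        if reading and cohorts:
            tags = tuple(reading["tags"].split())
            cohorts[-1].readings.append((reading["lemma"], tags))
    return cohorts


# Three word forms taken from the sample text (cleaned-up tag spellings).
SAMPLE = '''\
("<hon>" <163>
("hon" PN UTR SIN DEF SUB)
("ho" NN UTR SIN DEF NOM))
("<en>" <166>
("en" DT UTR SIN IND)
("en" RG UTR SIN IND NOM)
("en" PN UTR SIN IND SUB/OBJ)
("en" AB)
("en" NN UTR SIN IND NOM))
("<henne>" <172>
("hon" PN UTR SIN DEF OBJ))
'''

if __name__ == "__main__":
    for c in parse_cohorts(SAMPLE):
        status = "ambiguous" if len(c.readings) > 1 else "unambiguous"
        print(f"{c.position} {c.form}: {len(c.readings)} reading(s), {status}")
```

In this excerpt, "hon" and "en" would be passed on to disambiguation, while "henne" already has a single reading after lexical lookup.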
