<?xml version="1.0" standalone="yes"?>
<Paper uid="W94-0103">
  <Title>References</Title>
  <Section position="3" start_page="0" end_page="23" type="metho">
    <SectionTitle>
1. The "Noisy Channel"'s
</SectionTitle>
    <Paragraph position="0"> All the researchers in the field of Computational Linguistics, no matter what their specific interest may be, must have noticed the impetuous advance of the promoter of statistically based methods in linguistics. This is evident not only because of the growing number of papers in many Computational Linguistic conferences and journals, but also because of the many specific initiatives, such as workshops, special issues, and interest groups.</Paragraph>
    <Paragraph position="1"> An historical account of this &amp;quot;empirical renaissance&amp;quot; is provide in \[Church and Mercer, 1993\]. The general motivations are: availability of large on-line texts, on one side, emphasis on scalability and concrete deliverables, on the other side.</Paragraph>
    <Paragraph position="2"> We agree on the claim, supported by the authors, that statistical methods potentially outperform knowledge based methods in terms of coverage and human cost. The human cost., however, is not zero. Most statistically based methods either rely on a more or less shallow level of linguistic preprocessing, or they need non trivial human intervention for an initial estimate of the parameters (training). This applies in particular to statistical methods based on Shannon's Noisy Channel Model (n-gram models). As far as coverage is concerned~ so far no method described in literature could demonstrate an adequate coverage of the linguistic phenomena being studied. For example, in collocational analysis, statistically refiable associations are obtained only for a small fragment of the corpus. The problem of&amp;quot; &amp;quot;low counts&amp;quot; (i.e. linguistic patterns that were never, or rarely found) has not been analyzed appropriately in most papers, as convincingly demonstrated in \[Dunning, 1993\].</Paragraph>
    <Paragraph position="3"> In addition, there are other performance figures, such as adequacy, accuracy and &amp;quot;linguistic appeal&amp;quot; of the acquired knowledge for a given application, for which the supremacy of statistics is not entirely demonstrated. Our major objection to purely statistically based approaches is in fact that they treat language expressions like stings of signals. At its extreme, this perspective may lead to results that by no means have practical interest, but give no contribution to the study of language.</Paragraph>
    <Paragraph position="4">  2...and the &amp;quot;Braying Donkey&amp;quot;'s On the other side of the barricade, there are the supporters of more philosophical, and theoretically sound, models of language. We hope these scholars will excuse us for categorising their very serious work under such a funny label. Our point was to playfully emphasise that the principal interest in human models of language communication motivated the study of rather odd language expressions, like the famous &amp;quot;Donkey Sentences &amp;quot;1. The importance of these sentences is not their frequency in spoken, or written language (which is probably close to zero), but the specific linguistic phenomena they represent.</Paragraph>
    <Paragraph position="5"> The supporters of theoretically based approaches cannot be said to ignore the problem of applicability and scalability but this is not a priority in their research. Some of these studies rely on statistical analyses to gain evidence of some phenomenon, or to support empirically a theoretical framework, but the depth of the lexical model posited eventually makes a truly automatic learning impossible or at least difficult on a vast scale.</Paragraph>
    <Paragraph position="6"> The ensign of this approach is Pustejovsky, who defined a theory of lexical semantics making use of a rich knowledge representation framework, called the qualia structure. Words in the lexicon are proposed to encode all the important aspects of meaning, ranging from their argument structure, primitive decomposition, and conceptual organisation. The theory of qualia has been presented in several papers, but the reader may refer to \[Pustejovsky and Boguraev, 1993\], for a rather complete and recent account of this research. Pustejovsky confronted with the problem of automatic acquisition more extensively in \[Pustejovsky et al. 1993\]. The experiment described, besides producing limited results (as remarked by the author itself), is hardly reproducible on a large scale, since it presupposes the identification of an appropriate conceptual schema that generalises the semantics of the word being studied.</Paragraph>
    <Paragraph position="7"> The difficulty to define sealable methods for lexical acquisition is an obvious drawback of using a rich lexical model. Admittedly, corpus research is seen by many authors in this area, as a tool to fine-tune lexical  structures and support theoretical hypothesis.</Paragraph>
    <Paragraph position="8"> 3. Adding semantics to the  corpus statistics recipe...</Paragraph>
    <Paragraph position="9"> Indeed, the growing literature in lexical statistics demonstrates that much can be done using purely statistical methods. This is appealing, since the need for heavy human intervention precluded to NLP techniques a substantial impact on real world applications. However, we should not forget that one of the ultimate objectives of Computational Linguistic is to acquire some deeper insight of human communication.</Paragraph>
    <Paragraph position="10"> Knowledge-based, or syraboHc, techniques should not be banished as impractical, since no computational system can ever learn anything interesting if it does not embed some, though primitive, semantic model 2.</Paragraph>
    <Paragraph position="11"> In the last few years, we promoted a more integrated use of statistically and knowledge based models in language learning. Though our major concern is applicability and scalability in NLP systems, we do not believe that the human can be entirely kept out of the loop. However, his/her contribution in defining the semantic bias of a lexical learning system should ideally be reduced to a limited amount of time constrained, well understood, actions, to be performed by easily founded professionals.</Paragraph>
    <Paragraph position="12"> Similar constraints are commonly accepted when customising Information Retrieval and Database systems.</Paragraph>
    <Paragraph position="13"> Since we are very much concerned with sealability asia\[ with what we call linguistic appeal, our effort has been to demonstrate that &amp;quot;some&amp;quot; semantic knowledge can be modelled at the price of limited human intervention, resulting in a higher informative power of the linguistic data extracted from corpora. With purely statistical approaches, the aequked lexical information has no finguisfic content per se until a human analyst assigns the correct interpretation to the data. Semantic modelling can be more or less coarse, but in any case it provides a means to categorise lwhere the poor donkey brays since it is beated all the time..</Paragraph>
    <Paragraph position="14">  language phenomena rather that sinking the linguist in milfions of data (collocations, ngrams, or the like), and it supports a more finguistically oriented large scale language analysis. Finally, symbolic computation, unlike for statistical computation, adds predictive value to the data, and ensures statistically reliable data even for relatively small corpora.</Paragraph>
    <Paragraph position="15"> Since in the past 3-4 years all our research was centred on finding a better balance between shallow (statistically based) and deep (knowledge based) methods for lexical learning, we cannot give for sake of brevity any complete account of the methods and algorithms that we propose. The interest reader is referred to \[Basili et al, 1993 b and c\], for a summary of ARIOSTO, an integrated tool for extensive acquisition of lexieal knowledge from corpora that we used to demonstrate and validate our approach.</Paragraph>
    <Paragraph position="16"> The learning algorithms that we def'med, acquire some useful type of lexical knowledge (disambiguation cues, selectional restrictions, word categories) through the statistical processing of syntactically and semantically tagged collocations. The statistical methods are based on distributional analysis (we defined a measure called mutual conditioned plausibility, a derivation of the well known mutual information), and cluster analysis (a COBWEB-like algorithm for word classification is presented in \[Basili et al, 1993,a\]). The knowledge based methods are morphosyntactic processing \[Basili et al, 1992b\] and some shallow level of semantic categorisation.</Paragraph>
    <Paragraph position="17"> Since the use of syntactic processing in combination with probability calculus is rather well established in corpus linguistics, we will not discuss here the particular methods, and measures, that we defined.</Paragraph>
    <Paragraph position="18"> Rather, we will concentrate on semantic categorisation, since this aspect more closely relates to the focus of this workshop: What knowledge can be represented symbolically and how can it be obtained on a large scale ? The title of the workshop, Combing symbolic and statistical approaches.., presupposes that, indeed, one such combination is desirable, and this was not so evident in the literature so far.</Paragraph>
    <Paragraph position="19"> However, the what-and-how issue raised by the workshop organisers is a crucial one. It seems there is no way around: the more semantics, the less coverage. Is that so true? We think that in part, it is, but not completely. For example, categorizing collocations via semantic tagging, as we propose, add predictive power to the collected collocadons, since it is possible to forecast the probability of collocations that have not been detected in the training corpus. Hence the coverage is, generally speaking, higher.</Paragraph>
    <Paragraph position="20"> In the next section we will discuss the problem of finding the best source for semantic categorization. There are many open issues here, that we believe an intersting matter of discussion for the workshop.</Paragraph>
    <Paragraph position="21"> In the last section we (briefly) present an example of very useful type of lexical knowledge that can be extracted by the use of semantic categorization in combination with statistical methods.</Paragraph>
    <Paragraph position="22"> 4. Sources of semantic categorization We first presented the idea of adding semantic tags in corpus analysis in \[B'asili et al. 1991 and 1992a\], but other contemporaneous and subsequent papers introduced some notion of semantic categorisation in corpus analysis. \[Boggess et al, 1991\] used rather fine-tuned seleetional restrictions to classify word pairs and triples detected by an n-gram model based part of speech tagger. \[Grishman 1992\] generalises automatically acquired word triples using a manually prepared full word taxonomy. More recently, the idea of using some kind of semantics seems to gain a wider popularity. \[Resnik and Hearst, 1993\] use Wordnet categories to tag syntactic associations detected by a shallow parser. \[Utsuro et al., 1993\] categorise words using the &amp;quot;Bunrui Goi Hyou&amp;quot; (Japanese) thesaurus.</Paragraph>
    <Paragraph position="23"> In ARIOSTO, we initially used hand assigned semantic categories for two italian corpora, since on-line thesaura are notcurrently available in Italian. For an English corpus, we later used Wordnet.</Paragraph>
    <Paragraph position="24"> We mark with semantic tags all the words that are included at least in one collocation extracted from each application corpus.</Paragraph>
    <Paragraph position="25"> In defining semantic tags, we pursued two contrasting requirements: portability and reduced manual cost, on one side, and the value-added to the data by the semantic  markers, on the other side. The compromise we conformed to is to select about 10-15 &amp;quot;naive&amp;quot; tags, that mediate at best between generality and domain-appropriateness.</Paragraph>
    <Paragraph position="26"> Hand tagging was performed on a commercial and a legal domain (hereafter CD and LD), both in Italian. Examples of tags in the CD are: MACHINE (grindstone, tractor, engine), BY_PRODUCT (wine, milk, juice). In the LD, examples are: DOCUMENT (law, comma, invoice)and REALESTATE (field, building, house).</Paragraph>
    <Paragraph position="27"> There are categories in common between the two domains, such as HUMAI~ENTITY, PLACE, etc. The appropriate level of generality for categories, is roughly selected according to the criterion that words in a domain should be evenly distributed among categories. For example, BY_PRODUCT is not at the same level as HUMAN EN1TY in a domain general classification, but in the CD there is a very large number of words in this class.</Paragraph>
    <Paragraph position="28"> For what concerns ambiguous words, many subtle ambiguities are eliminated because of the generality of the tags. Since all verbs are either ACTs or STATEs, one has no choices in classifying an ambiguous verb like make. This is obviously a simplification, and we will see later its consequences. On the other side, many ambiguous senses of make are not found in a given domain. For example, in the commercial domain, make essentially is used in the sense of manufacturing.</Paragraph>
    <Paragraph position="29"> Despite the generality of the tags used, we experiment that, while the categorisation of animates and concrete entities is relatively simple, words that do not relate with bodily experience, such as abstract entities and the majority of verbs, pose hard problems.</Paragraph>
    <Paragraph position="30"> An alternative to manual classification is using on-line thesaura, such as Roget's and Wordnet categories in English 3. We experimented Wordnet on our English domain (remote sensing abstracts, RSD).</Paragraph>
    <Paragraph position="31"> The use of domain-general categories, such as those found in thesaura, has its evident drawbacks, namely that the categorisation principles used by the linguists are inspired by philosophical concerns and personal intuitions, while the purpose of a type hierarchy in a NLP system is more practical, for example expressing at the highest level</Paragraph>
    <Section position="1" start_page="23" end_page="23" type="sub_section">
      <SectionTitle>
Italian
</SectionTitle>
      <Paragraph position="0"> of generality the selectional constrains of words in a given domain. For one such practical objective, a suitable categorization pnnciple is similarity in words usage.</Paragraph>
      <Paragraph position="1"> Though Wordnet categories rely also on a study of collocations in corpora (the Brown corpus), word similarity in contexts is only one of the classification pdncipia adopted, surely not prevailing. For example, the words resource, archive and file are used in the RSD almost interchangeably (e.g.</Paragraph>
      <Paragraph position="2"> access, use, read .from resource, archive, file). However, resource and archive have no common supertypC/ in Wordnet.</Paragraph>
      <Paragraph position="3"> Another pro.blem is over-ambiguity. Given a specific application, Wordnet tags create many unnecessary ambiguity. For example, we were rather surprised to find the word high classified as a PERSON (--soprano) and as an ORGANIZATION (=high school). &amp;quot;this wide-speetnma classification is very useful on a purely linguistic ground, but renders the classification unusable as it is, for most practical applications. In the RSD, we had 5311 different words of which 2796 are not classified in Wordnet because they are technical terms, proper nouns and labels. For the &amp;quot;known&amp;quot; words, the avergae ambiguity in Wordnet is 4.76 senses per word. In order to reduce part of the ambiguity, we (manually) selected 14 high-level Wordnet nodes, like for example:</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="23" end_page="27" type="metho">
    <SectionTitle>
COGNITION, ARTIFACT,
ABSTRACTION, PROPERTY, PERSON,
</SectionTitle>
    <Paragraph position="0"> that seemed appropriate for the domain. This reduced the average ambiguity to 1.67, which is still a bit too high (soprano ?), i.e.</Paragraph>
    <Paragraph position="1"> it does not reflect the ambiguity actually present in the domain. There is clearly the need of using some context-driven disambiguafion method to automatically reduce the ambiguity of Wordnet tags. For example, we are currently experimenting an algorithm to automatically select from Wordnet the &amp;quot;best level&amp;quot; categories for a given corpus, and eliminate part of the unwanted ambiguity. The algorithm is based on the Machine Learning method for word categorisation, inspired by the well known study on basic-level categories \[Rosch, 1978\], presented in \[Basili et al, 1993a\].</Paragraph>
    <Paragraph position="2"> Other methods that seem applicable to the problem at hand have been presented in the literature \[Yarowsky 1992\].</Paragraph>
    <Paragraph position="3"> 24.</Paragraph>
    <Paragraph position="4"> 5. Producing wine, statements, and data: on the acquisition of selectionai restrictions in sub languages Since our objective is to show that adding semantics to the standard corpus linguistics recipe (collocations + statistics) renders the acquired data more linguistically appealing, this section is devoted to the linguistic analysis of a case-based lexicon. The algorithm to acquire the lexicon, implemented in the ARIOSTQLEX system, has been extensively described in \[Basili et al, 1993c\]. In short, the algorithms works as follows: First, collocations extracted from the application corpus are clustered according to the semantic and syntactic tag 4 of one or both the co-occurring content words. The result are what we call clustered association d a t a . For example, V_prep_N(sell, to,shareholder) and V_prep_N(assign, to, tax-payer), occurring with frequency f/and.t2 respectively, are merged into a unique association V_prep_N(ACT, to, HUMAN ENTITY) with frequency fl+f2. The statistically relevant conceptual associations are presented to a linguist, that can replace syntactic patterns with the underlying conceptual relation (e.g. \[ACT\]&gt;(beneficiary)-&gt;\[HUMAl~ENTITY\]). null These coarse grained selectional restrictions are later used in AIOSTO LEX for a more refined lexical acquisition phase. We have shown in \[Basili et al, 1992a\] that in sub languages there are many unintuitive ways of relating concepts to each other, that would have been very hard to find without the help of an automatic procedure.</Paragraph>
    <Paragraph position="5"> Then, for each content word w, we acquire all the collocations in which it participates. We select among ambiguous patterns using a preference method described in \[Basili et al, 1993 b, d\]. The detected collocations for a word w are then generalised using the coarse grained selectional restrictions 4We did not discuss of syntactic tags for brevity. Our (not-so) shallow parser detects productive pairs and triples like verb subject and direct object (N_V and V N, respectively), prepositional triples between non adjacent words (N_prep_N, V_lxeP_~, etc.</Paragraph>
    <Paragraph position="6"> acquired during the previous phase. For example, the following collocations including the word measurement in the RSD:</Paragraph>
    <Paragraph position="8"> and V_prep_N( infer, from, measurement) let the ARIOSTO_LEX system learn the following selectional restriction: \[COGNrNONI &lt;-(/igurative_source)&lt;-\[measurement\], where COGNITION is a Wordnet category for the verbs determine, infer and derive, and figurative_source is one of the conceptual relations used. Notice that the use of conceptual relations is not strictly necessary, though it adds semantic value to the data. One could simply store the syntactic subeategorization of each word along with lhe semantic restriction on the accompanying word in a collocation, e.g.</Paragraph>
    <Paragraph position="9"> something like: measurement. (V_prep_N .from, COGNITION(V)). It is also possible to cluster, for each verb or verbal noun, all the syntactic subcategorization frames for which there is an evidence in the corpus. In this case, lexieal acquisition is entirely automatic.</Paragraph>
    <Paragraph position="10"> The selectional restrictions extensively acquired by ARIOSTQLEX are a useful type of lexical knowledge that could be used virtually in any NLP system. Importantly, the linguistic material acquired is linguistically appealing since it provides evidence for a systematic study of sub languages. From a cross analysis of words usage in three different domains we gained evidence that many linguistic patterns do not generalise across sub languages. Hence, the application corpus is an ideal source for lexical acqu~ition.</Paragraph>
    <Paragraph position="11">  In fig. 1 we show one of the screen out of ARIOSTO_LEX. The word shown is measurement, very frequent in the RSD, as presented to the linguist. Three windows show, respectively, the lexical entry that ARIOSTQLEX proposes to acquire, a list of accepted patterns for which only one example was found (lexical patterns are generalized only when at least two similar patterns are found), and a list of rejected patterns. The linguist can modify or accept any of these choices. Each acquire d selectional restriction is represented as follows: pre semJex(word, conceptual relation, semantic tag 5, direction, SE, CF) the first four arguments identify the selectional restriction and the direction of the conceptual relation, Le.: \[measurement\]&lt;-(OBJ)&lt;-\[COGNITION\] (e.g. calculate, setup, compare...a</Paragraph>
    <Paragraph position="13"> (e.g. measurement from satellite, aircraft, radar) SE and CF are two statistical measures of the semantic expectation and confidence of the acquired selectional restriction (see the aforementioned papers for details).</Paragraph>
    <Paragraph position="14"> ARIOSTO_LEX provides the linguist with several facilities to inspect and validate the acquired lexicon, such as examples of phrases from which a selectional restriction was derived, and other nice gadgets. For example, the central window in Figure 1 (opened only on demand) shows the Conceptual Graph of the acquired entry.</Paragraph>
    <Paragraph position="15"> The Conceptual Graph includes the extended Wordnet labels for each category.</Paragraph>
    <Paragraph position="16"> One very interesting matter for linguistic analysis is provided by a cross-comparison of words, as used in the three domains.</Paragraph>
    <Paragraph position="17"> Many words, particularly verbs, exhibit completely different patterns of usage. Here are some examples: The verb produrre (to produce) is relatively frequent in all the three domains, but exhibit very different selectional restrictions. In the  e.g. the satellite produced an image with high accuracy, the NASA produced the data..</Paragraph>
    <Paragraph position="18"> in the CD (commercial) we found: produce -&gt;(obj)-&gt; ARTIFACT -&gt;(agent)-&gt; HUMAN_ENTITY -&gt;(instruraent)-&gt;MACHINE 6 e.g.: la d/tta produce vino con macchinari propri (*the company produces wine with owned machinery) and in the LD (legal): produce -&gt;agent)-&gt;HUMAI~ENTITY -&gt;(theme)-&gt; DOCUMENT e.g.: il contn'buente deve produrre la dichiarazione (the tax payer must produce a statement) It is interesting to see which company the word &amp;quot;ground&amp;quot; keeps in the three domains. The RSD is mostly concerned with its physical properties, since we find patterns like: measure -&gt;(obj)-&gt; PROPERTY/A'ITR/BUTE &lt;-(characteristic)&lt;- ground (e.g. to measure the feature, emissivity, image ,surface of ground) In the CD, terreno (=ground) is the direct object of physical ACTs such as cultivate, reclaim, plough, etc. But is also found in patterns like: BYPRODUCT -&gt;(source)-&gt; terreno (e.g. patate, carote ed altn'prodotti del terreno = potatoes, carrots and other ground products) in the LD, terreno is a real estate, object of transactions, and taxable as such. The generaJised pattern is: 6MACHINE is the same as the Wordnet class INSTRUMENTALITY. Notice that we used Wordnet categories for our English corpus only later in our research. Perhaps we could fmd a Wordnet tag name for each of our previous manually assigned tags in the two Italian domains, but this would be only useful for presentation purposes. In fact, since there is not as yet an Italian version of Wordnet (though it will be available soon), we cannot classify automaO~lly.</Paragraph>
    <Paragraph position="19"> TRANSACTION-&gt;(obj)-&gt; terreno (vendere, acquistare, permutare terreno = sell, buy, exchange a ground) AMOUNT &lt;-(source)&lt;- terreno (e.g. costo, rendita, di terreno= price, revenue of ( =deriving from the ownership o~ ground) And what is managed in the three domains? In the RSD, one manages COGNITIVECONTENT, such as image, information, data etc. The manager is a human ORGANIZATION, but also an ARTIFACT (a system, an archive).</Paragraph>
    <Paragraph position="20"> In the CD, the pattern is:  (e.g. gestire la vendita di alimentari nel negozio.. = to manage the sale of food in shops ) Finally, in the LD, the pattern is: manage -&gt;(agent)-&gt;HUMAN_ENTITY -&gt;(obj)-&gt;\[AMOUNT,ABSTRACTION\] (e.g. gestione di tributi, fondi, credito, debito etc. = management of taxes, funding, credit, debit) \[ x:;r ~-;T4s~-t~(~,ur=~.~,~-;,-~-~,-d~i~/C;TTb0b * ;..Y~=:--,')..' , , pr w.sore. I e~(measur~er.t, th4m, A, &amp;quot;(&amp;quot;. ~.~,'_//, n I I, *. |lOOO&amp;quot;,&amp;quot;. 11000&amp;quot;,&amp;quot; *&amp;quot; ). Fe.sv~.. I vx(messw'we~nt, thyme,SO, &amp;quot;(&amp;quot; ,G.V.N,nl 1. &amp;quot;.?O00&amp;quot;, ' .?000&amp;quot;, &amp;quot;-* ). prv.~m, l vMmns4arvm*~t, U'~mm, ~, * (&amp;quot; ,G.V.H,nl I, &amp;quot;.11000 deg , *. riO00', ' *&amp;quot; ). pre.~,4~n. 1 oaCmea~ur'm;nnt, type.oC/,CGl~, &amp;quot;,'.&amp;quot; ,G.I(.P.N, of, &amp;quot;. 10000 deg , *. 10000&amp;quot;, &amp;quot;-&amp;quot; ). rw.e. r,n'a. I a ,~( qt4~ S~ rm~.ir.t ~ aoJoc?., PA, &amp;quot;)&amp;quot; ,G.f/.ri,n! I, &amp;quot;.22noo&amp;quot;, ' .~2000&amp;quot;, &amp;quot;-&amp;quot; ). I'u&amp;quot; ~.~.um.  |*~(men mr,.r~*r.t, anJm=~., Pit, *)&amp;quot;, G.H.P_I/,OF, &amp;quot;.3flOflO',&amp;quot; .32000&amp;quot;, &amp;quot;-' ). pre.se~, lt~(~l~sure~ent.~: |ect ,PRo &amp;quot;)&amp;quot; ,G_V.R.ni l o &amp;quot; .2000&amp;quot; , &amp;quot; .32000&amp;quot; ,'- &amp;quot; ). pr e_ |era. I e &lt;~ me~sur e~er.t. ::.|ec t. PRo &amp;quot;)',G.tJ.N.n! I. &amp;quot;.~OOO&amp;quot; ~&amp;quot;. 13000&amp;quot;,&amp;quot; -&amp;quot; ). i F~v~v~v~(nrvvs~.~`t*..~r~t~u~vc;~)~*G~P~M~ni~2~2~+~)~ w'v.vun_ lva{~v~',r.vr.t, :nw'vC/~vr Iv, (Cn. ~TTR. &amp;quot;) deg .G.&amp;.H,.t |, deg.2~000&amp;quot;, deg.20000 deg,'- deg). : rr4. I tinier,. 1 ex(.~sur ~.~nt ~ ~ns~rl~.~lr.t ~  |NS, &amp;quot;)' ,G_H,P.NI - frm' ~ ' .4000o ~ &amp;quot;. 3~000&amp;quot; , ,, ). pre_li~.~o.le~(~eas~,a'e~cent.character~st,l~,AT'TR Ira, ...... .~&amp;quot;~-.~.- :~-&amp;quot; ...... ,~m,r~l \] )re.l|r..~G. le~(:eas~'4~ent~charact4rfstta,t~,&amp;quot; r ~U|T 1 iro_ I I~c. 1 o&lt;(.~aour ecent, I n.~'ur=|n t, ZH$, &amp;quot;.~&amp;quot; ,I r.v,surm.t:- ) ~_1 It. ~a. 1 e,4(amam~-~ir~t t,~aarm4r~.l, &amp;quot;(&amp;quot; ~ G.N.P.I ,re_ I !~r.&lt;_ 1 e &lt; (~.~ m ~&amp;quot; eJc *n t ,l~u,-rmse, .1,&amp;quot; (&amp;quot;, G_i',/_P. ':** ~&amp;quot;e. 1 !r..c c~. 1 *,'.(tea suree eat, puC/'pOSe, ,~, &amp;quot;( &amp;quot;, G.V_P. p&amp;quot; e_ I le.~. I e x (.~ sur ~4n ,. purpose, .I. * ( *, G./~I..P. G. N. V(a.ea sur ~0r.41 ~ t ,~1 |. C/cmdtaC t, 1) G. N.~'( mea sur ~ll~.en t ,nl 1 ,COrrtlpO~, 1)  i G. N. V(m, vsur~n~er.t, ~t * I ~,*~blv,  |) G.N.V(m.vsstm't4.~r.t,n~ I, :.w.Hve, 1 ) G.N.V(mesc.urcm~nt,~l~ :, pot ~Qrm~ 1) G_N.VOu~:rar~er.t, nl : ~ r~-ov iml ~  |} ~.N_~'(maa~u~'#mqnt,nl I ,relate, 1) G,N.~.'(me~sure~r.ent ,nl 1 ~r esp(c)nle.~., 1)</Paragraph>
    <Paragraph position="22"> It is interesting to see how little in common these patterns have. Clearly, this information could not he derived from dictionaries or thesaura. Though the categories used to cluster patterns of use are very high level (especially for verbs), still they capture very well the specific phenomena of each sublanguage.</Paragraph>
  </Section>
  <Section position="5" start_page="27" end_page="27" type="metho">
    <SectionTitle>
6. Concluding remarks
</SectionTitle>
    <Paragraph position="0"> In this paper we supported the idea that &amp;quot;some amount&amp;quot; of symbofic knowledge (high-level semantic markers) can be added to the standard lexical statistics recipe with several advantages, among which categorization, predictive power, and linguitic appeal of the acquired knowledge.</Paragraph>
    <Paragraph position="1"> For sake of space, we could not provide all the evidence (algorithms, data and performance evaluation) to support our arguments. We briefly discussed, and gave examples, of our system for the semi-automatic acquisition, on a large scale, of selectionl restrictions. ARIOSTO_LEX has its merits and limitations. The merit is that it acquires extensively, with limited manual cost, a very useful type of semantic knowledge, usable virtually in any NLP system. We demonstrated with several examples that selectional restrictions do not generalize across sublanguages, and acquiring them by hand is often inintuitive and very time-consuming.</Paragraph>
    <Paragraph position="2"> The limitation is that the choice of the appropriate conceptual types is non trivial, even when selecting very high-level tags.</Paragraph>
    <Paragraph position="3"> On the other hand, selecting categories from on-line thesaura poses many problems, particularly because the categorization principia adopted, may not be adequate for the practical purposes of a NLP system.</Paragraph>
  </Section>
class="xml-element"></Paper>