<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0513">
  <Title>Electronic Dictionaries and Linguistic Analysis of Italian Large Corpora</Title>
  <Section position="3" start_page="0" end_page="93" type="metho">
    <SectionTitle>
1. The Italian Electronic Dictionaries
</SectionTitle>
    <Paragraph position="0"> The DELI system contains several electronic dictionaries of simple words and of compound words. The electronic dictionary of simple words, named DELAS, contains about 100,000 Italian entries, each of which has been assigned an alphanumeric code. This code refers to the grammatical category of the word and to its inflectional paradigm. What follows is an example from the DELAS dictionary:
dottore N80
cortese A79
amare V3
di PREP
lentamente AVV
The noun (N) dottore is given above in its masculine singular canonical form. The adjective (A) cortese is likewise in the masculine singular canonical form. Verbs (V) are listed in the infinitive form, as amare. Items which do not inflect are assigned a code indicating only the grammatical category, as shown above for the preposition di and the adverb lentamente. The numerical code refers to specific inflectional algorithms. For example, code 80, associated with nouns, corresponds to the endings
N80  ms  fs     mp  fp
     -e  -essa  -i  -esse
thus indicating that all nouns like dottore, i.e. campione, professore, etc., inflect by adding to the root -e for the masculine singular (ms), -essa for the feminine singular (fs), -i for the masculine plural (mp) and -esse for the feminine plural (fp). On the other hand, adjectives encoded A79, such as cortese but also tribale, etc., can be described by the following inflectional model</Paragraph>
    <Paragraph position="2"> in which the masculine and feminine singular (ms and fs) on the one hand, and the masculine and feminine plural (mp and fp) on the other, correspond to the homographic forms cortese and cortesi. The algorithm for verbs is more complicated, since it has to cover the 40 forms of the simple tenses. Verbs like amare, abbandonare, imparare, etc., are therefore encoded as V3, which corresponds to the following endings and grammatical values:</Paragraph>
    <Paragraph position="4"> For example, the first line of the list above indicates that, in order to form the indicative present (ind/pres) of the verb amare, it is necessary to delete the last three characters of the infinitive form and add o for the first person singular, i for the second person singular, a for the third person singular, iamo for the first person plural, ate for the second person plural, and finally ano for the third person plural. Using DELAS and its inflection codes, a program automatically generates all inflected forms. The result is the electronic dictionary of inflected forms of Italian simple words, named DELAF, which has the following structure:</Paragraph>
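    <Paragraph> The generation step described above can be illustrated with a short Python sketch that expands DELAS-style entries into inflected forms. The table name PARADIGMS, the function inflect, the tuple layout of the output and the person labels such as "1s" are assumptions made for this example; they are not the DELAF format or the authors' software. Only the endings quoted in the text are encoded, so V3 covers the indicative present only.

```python
# Hypothetical encoding of the inflection codes quoted in the text.
# Each code maps to: (characters to strip from the lemma, [(ending, features), ...]).
PARADIGMS = {
    "N80": (1, [("e", "ms"), ("essa", "fs"), ("i", "mp"), ("esse", "fp")]),
    "A79": (1, [("e", "ms"), ("e", "fs"), ("i", "mp"), ("i", "fp")]),
    # The text gives only the indicative present of V3; other tenses are omitted.
    "V3": (3, [("o", "ind/pres 1s"), ("i", "ind/pres 2s"), ("a", "ind/pres 3s"),
               ("iamo", "ind/pres 1p"), ("ate", "ind/pres 2p"), ("ano", "ind/pres 3p")]),
}


def inflect(lemma, code):
    """Return (form, lemma, category, features) tuples for one DELAS-style entry."""
    category = code.rstrip("0123456789")  # "N80" gives "N", "PREP" stays "PREP"
    if code not in PARADIGMS:
        # Invariable items (e.g. PREP, AVV) carry a category only.
        return [(lemma, lemma, category, "")]
    strip, endings = PARADIGMS[code]
    root = lemma[:len(lemma) - strip]
    return [(root + ending, lemma, category, feats) for ending, feats in endings]


if __name__ == "__main__":
    for lemma, code in [("dottore", "N80"), ("cortese", "A79"),
                        ("amare", "V3"), ("di", "PREP")]:
        for row in inflect(lemma, code):
            print(row)
```

For instance, inflect("dottore", "N80") yields the four forms dottore, dottoressa, dottori and dottoresse, each paired with its lemma, category and gender-number features.</Paragraph>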
    <Paragraph position="6"> ... characters between blank spaces or separators. On the other hand, dictionaries of compound words contain those words which are formally defined as sequences of words: they contain spaces or separators. Compound words are constrained sequences of words which can have either a metaphorical meaning, as cavallo di battaglia, which means "something at which somebody particularly excels, somebody's favourite piece", or a "neutral" or technical meaning, as carta di credito, in English credit card. Compound words are constrained sequences in the sense that substituting lexical elements within the sequence with synonyms most of the time produces unacceptable compounds, as the following examples show:

The compound items are followed by a part-of-speech symbol (N); the separator "+" is followed by the internal structure of the compound. The first two items are formed by a Noun, a Preposition and a Noun (NPN), the third and fourth items are formed by two Nouns (NN), the fifth item is formed by a Noun and an Adjective (NA), while the last item is formed by an Adjective and a Noun (AN). Colons are followed by the gender and number: the examples are either masculine singular (ms) or feminine singular (fs).

Finally, two marks indicate whether morphological variations in gender and number are accepted. If the variations are accepted, the mark is "+"; if they are not accepted, the mark is "-". The internal structure defines which element of the compound inflects: compounds which belong to the class NPN inflect the first noun; compounds which belong to the classes NA and AN inflect both elements; compounds which belong to the class NN can inflect either both nouns, as lingua madre (fs) - lingue madri (fp), or only the first noun, as pesce spada (ms) - pesci spada (mp). Once the morphological behaviour of compound nouns has been codified in this way, a set of computational routines automatically generates the DELACF, that is, the electronic dictionary of inflected forms of Italian compound nouns, which has the following format:</Paragraph>
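    <Paragraph> The inflection rules stated above for the classes NPN, NA, AN and NN can be sketched in Python as follows. This is only an illustration under assumptions: the PLURALS lookup, the function names and the nn_both flag are hypothetical, and the plural forms are supplied by hand, whereas the routines described in the text would draw them from the dictionaries.

```python
# Hypothetical plural lookup for the component words used in the examples;
# a real system would take these forms from the dictionary of simple words.
PLURALS = {
    "carta": "carte", "cavallo": "cavalli", "lingua": "lingue",
    "madre": "madri", "pesce": "pesci", "spada": "spade",
}


def inflecting_positions(structure, nn_both=False):
    """Indices of the components that inflect, following the rules in the text."""
    if structure == "NPN":
        return [0]                         # only the first noun inflects
    if structure in ("NA", "AN"):
        return [0, 1]                      # both elements inflect
    if structure == "NN":
        return [0, 1] if nn_both else [0]  # decided entry by entry
    raise ValueError("unknown structure: " + structure)


def pluralize(compound, structure, nn_both=False):
    """Build the plural of a compound by inflecting only the marked components."""
    words = compound.split()
    for i in inflecting_positions(structure, nn_both):
        words[i] = PLURALS[words[i]]
    return " ".join(words)


if __name__ == "__main__":
    print(pluralize("carta di credito", "NPN"))
    print(pluralize("lingua madre", "NN", nn_both=True))
    print(pluralize("pesce spada", "NN"))
```

Because the NN class does not determine by itself whether one or both nouns inflect, the sketch takes that choice as a per-entry flag rather than deriving it from the structure code.</Paragraph>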
  </Section>
  <Section position="4" start_page="93" end_page="93" type="metho">
    <SectionTitle>
2. The Automatic Morpho-lexical Analysis
</SectionTitle>
    <Paragraph position="0"> Once dictionaries of simple and compound words are built, it is possible to apply them to texts by means of INTEX, a program for morpho-lexical analysis. This software has been developed by Max Silberztein and can load electronic dictionaries of simple and of compound words structured in the way shown above. INTEX applies both dictionaries to a text and builds the dictionary of that text, which contains not only simple words but also all compound nouns present in the text. This step makes it possible to recognize and highlight all compounds within a text:

- Che storie dicono? - chiedo. - Io non so niente. So che lui ha un negozio, senza l'insegna luminosa. Ma non so nemmeno dov'è. - Me lo spiega. È un negozio di pellami, valige e articoli da viaggio. Non è sulla piazza della stazione ma in una via laterale, vicino al passaggio a livello dello scalo merci.

and also to build a frequency list for them.

Such an indexation is extremely reliable for the management of technical and scientific documentation. Technical documents contain a lot of terminology, which consists mostly of compound nouns. INTEX can load more than one dictionary, so the user can build not only a DELACF for generic compounds but also specialized dictionaries of compounds belonging to various fields such as Economics, Engineering, Computer Science, and so on. It is then possible to analyze technical texts on the basis of such dictionaries. The following text, in which compounds have been highlighted, is an example drawn from an article of the Italian economics newspaper Il Sole 24 Ore (the whole article contains 84 lines):

Politica economica, anno zero. E l'assenza di impegni precisi e credibili contro l'inflazione continua a tenere in tensione i mercati. Nei mesi che hanno preceduto la svalutazione della lira è stato ripetutamente e autorevolmente affermato che la stabilità del cambio rappresentava l'asse portante di tutta la politica economica italiana. La linea di condotta seguita nelle prime settimane dal Governo Amato era sembrata coerente con tale enunciazione.

I primi due punti erano fondamentali e complementari, in quanto l'autonomia della Banca centrale riceveva una precisa caratterizzazione (rafforzata dalla prospettiva di adesione all'unione monetaria europea) dalla priorità assoluta dell'obiettivo anti-inflazionistico. È bene sottolineare che tale priorità assoluta valeva anche nei confronti dell'obiettivo di risanamento della finanza pubblica. Essa doveva in sostanza essere intesa nei seguenti termini: a) la Banca centrale avrebbe rispettato target di crescita monetaria coerenti con gli obiettivi inflazionistici annunciati (cosa che non era fino allora avvenuta, nemmeno dopo l'adesione alla banda ristretta dello Sme, che pure avrebbe dovuto comportare un accresciuto rigore della politica monetaria); b) nel far ciò essa non si sarebbe curata degli effetti di breve periodo di tale condotta sui tassi di interesse, e quindi anche sulla finanza pubblica; c) il Governo avrebbe adottato con la massima urgenza provvedimenti di risanamento della finanza pubblica, senza tuttavia fare ricorso a misure che incidessero sull'indice dei prezzi al consumo. Questi richiami hanno ormai sapore storico, ma sono utili per meglio mettere a fuoco la situazione attuale. In particolare la prima affermazione (l'essere cioè la stabilità del cambio asse portante di tutta la politica economica) ci sembra ancora pienamente valida: l'asse portante non c'è più e con esso è sparita anche la politica economica. Schematicamente sembra che esistano due alternative possibili di politica economica. La prima (che riproduce in sostanza, pur nelle condizioni modificate, la linea pre-svalutazione tratteggiata sopra) potrebbe articolarsi nei seguenti termini: a) aggiustamento fiscale come obiettivo della massima urgenza; b) riaffermazione del ruolo autonomo della Banca centrale, degli impegni assunti in vista del mercato unico (in particolare in materia di libera circolazione dei capitali); c) massimo contenimento delle spinte inflazionistiche derivanti dalla svalutazione della lira e riaffermazione di un obiettivo anti-inflazionistico preciso (tradotto in un target rigoroso e impegnativo, e quindi credibile, di crescita monetaria). Particolarmente importante quest'ultimo punto, in quanto non è affatto inevitabile che gli effetti della svalutazione si traducano integralmente in una spinta inflazionistica aggiuntiva. Se la crescita monetaria è tenuta sotto controllo, e grazie agli effetti restrittivi della manovra di bilancio, la svalutazione può tradursi, anziché in un fattore inflazionistico, in uno di mutamento dei prezzi relativi. La seconda linea punta invece a un abbassamento dei tassi di interesse e a impartire stimoli espansivi all'economia (entrambi gli obiettivi potrebbero ricevere una motivazione aggiuntiva grazie al sollievo che, nel breve periodo, potrebbero portare alla finanza pubblica). Una tale linea richiederebbe certamente l'accantonamento, almeno temporaneo, di qualsiasi obiettivo anti-inflazionistico. È anzi probabile che i suoi effetti più significativi e durevoli sarebbero quelli prodotti dall'inflazione sulla distribuzione del reddito, e soprattutto della ricchezza, e sul valore reale dello stock del debito pubblico. Si noti che la seconda linea è certamente incompatibile con un ritorno in tempi brevi a un regime di cambio fisso.

A frequency list which contains terminological compound nouns makes it possible to understand the specific content of this article immediately. Such an index provides a picture of the content of the text (the index which follows was built on the whole article):</Paragraph>
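    <Paragraph> The idea of a frequency list of compounds can be illustrated with a minimal Python sketch that looks up dictionary compounds in a tokenized text, preferring the longest match at each position. It is not a description of how INTEX works internally; the function name compound_frequencies, the regular-expression tokenizer and the sample sentence are assumptions made for the example.

```python
import re
from collections import Counter


def compound_frequencies(text, compounds):
    """Count dictionary compounds in a text, preferring the longest match."""
    tokens = re.findall(r"\w+", text.lower())
    entries = {tuple(c.lower().split()) for c in compounds}
    max_len = max((len(e) for e in entries), default=0)
    counts = Counter()
    i = 0
    while i != len(tokens):
        match_len = 0
        # Try the longest candidate first; compounds have at least two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in entries:
                match_len = n
                break
        if match_len:
            counts[" ".join(tokens[i:i + match_len])] += 1
            i += match_len
        else:
            i += 1
    return counts


if __name__ == "__main__":
    sample = ("La politica economica italiana. "
              "La linea di condotta e la politica economica.")
    lexicon = ["politica economica", "linea di condotta", "tassi di interesse"]
    for compound, freq in compound_frequencies(sample, lexicon).most_common():
        print(freq, compound)
```

Loading several specialized dictionaries, as the text describes, would correspond here to passing the union of several compound lists as the second argument.</Paragraph>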
  </Section>
  <Section position="5" start_page="93" end_page="96" type="metho">
    <SectionTitle>
3. Local Grammars
</SectionTitle>
    <Paragraph position="0"> Electronic dictionaries make it possible to recognize within texts the words and sequences of words defined by the dictionaries. INTEX can also recognize combinations of simple and compound words, thanks to the interaction between dictionaries and grammars. INTEX contains a tool for constructing local grammars on the model of finite-state automata. These grammars can be based not only on words but also on the non-terminal symbols contained in the dictionaries. For example, in order to identify all compound nouns followed by an adjective which agrees with them in gender and number, we construct the following grammar:

[Figure: local grammar graph; legible node labels include N+NPN and N+NA ms]

If we apply such a grammar to a text, INTEX will highlight all occurrences of this pattern and subsequently construct concordances for that pattern:

trovare espressione sia in accresciuti differenziali di interesse reali, sia in un deprezzamen
utta la politica economica italiana La linea di condotta seguita nelle prime settimane dal Gov
ppresentava l'asse portante di tutta la politica economica italiana La linea di condotta segui
anti-inflazionistica Se questa è la politica economica italiana Si sarebbe tentati di dire
zione si traducano integralmente in una spinta inflazionistica aggiuntiva Se la crescita monet
apitali), c) massimo contenimento delle spinte inflazionistiche derivanti dalla svalutazione de
vi sufficientemente saldi per tollerare tassi di interesse penalizzanti a qualche asta, accompag
una scelta realmente impegnativa 2 Il tasso di inflazione programmato è stato portato dal 3,
rzata dalla prospettiva di adesione all'unione monetaria europea) dalla priorità assoluta dell'

Hence, electronic dictionaries on the one hand, and the possibility of constructing grammars which interact with dictionaries on the other, make it possible to analyze large corpora automatically, considering not only words but also sequences of words.</Paragraph>
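    <Paragraph> The pattern described above, a compound noun followed by an adjective that agrees with it in gender and number, can be approximated in Python over a sequence of already tagged tokens. This is a simplified sketch, not the INTEX grammar formalism: the (form, category, features) tuple layout, the tag names DET and V, and the function match_noun_adjective are assumptions made for the example.

```python
def match_noun_adjective(tagged):
    """Return (compound, adjective) pairs where the adjective agrees with the compound.

    Each token is a (form, category, features) tuple; compound nouns are assumed
    to carry a category such as "N+NPN" or "N+NA", and features such as "fs".
    """
    hits = []
    for left, right in zip(tagged, tagged[1:]):
        form_l, cat_l, feat_l = left
        form_r, cat_r, feat_r = right
        if cat_l.startswith("N+") and cat_r == "A" and feat_l == feat_r:
            hits.append((form_l, form_r))
    return hits


if __name__ == "__main__":
    # Hand-tagged sample; the tags are illustrative, not dictionary output.
    sentence = [
        ("la", "DET", "fs"),
        ("politica economica", "N+NA", "fs"),
        ("italiana", "A", "fs"),
        ("linea di condotta", "N+NPN", "fs"),
        ("seguita", "V", ""),
    ]
    print(match_noun_adjective(sentence))
```

A finite-state grammar expresses the same constraint as paths through a graph; the pairwise scan above is simply the shortest way to show the agreement check on dictionary symbols rather than on word forms.</Paragraph>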
  </Section>
</Paper>