<?xml version="1.0" standalone="yes"?> <Paper uid="C86-1087"> <Title>PROCESSING CLINICAL NARRATIVES IN HUNGARIAN Gábor Prószéky National Educational Library and Museum Computer Department</Title> <Section position="3" start_page="0" end_page="365" type="metho"> <SectionTitle> 3. PROBLEMS OF PARSING </SectionTitle> <Paragraph position="0"> The well-known methods generally utilized for parsing natural languages (NLs) are not convenient for treating languages like Hungarian, Finnish, Estonian or Japanese, etc. (Nelimarkka et al. 1984), (Tsujii et al. 1984), (Prószéky 1984). In these languages, the suffixes carry out most of the task of marking grammatical function; therefore, the word order (strictly speaking, the phrase order) will be relatively free. [Figure 2: lexicon entries and morphological analysis of the input sentence 'Apjának két alkalommal volt infarktusa.' ('His father had infarction two times.')]</Paragraph> <Paragraph position="1"> So we must turn our attention to (i) the internal structure of the phrases and (ii) the order of phrases (and the intonation, of course only in speech), which plays an important role in expressing communicative functions.</Paragraph> <Paragraph position="2"> The basic idea of the strategy we propose builds on the invariants of the sentence structure of free word order languages, that is, (i) the first thing to do is to recognize the internal structure of the parts of speech and (ii) the second is to interpret their relative order. 
This order is connected with the communicative roles (topic, focus etc.) of the structure. The syntactic analysis of free word order sentences is based upon the morphemes identified by morphological analysis. The lexicon cannot help us to give the actual functional role of a morpheme for two reasons: (i) All possible functional roles of a morpheme cannot be listed.</Paragraph> <Paragraph position="3"> (ii) If there were several possible roles in the description of morphemes, nobody would know which of them to use actually.</Paragraph> </Section> <Section position="4" start_page="365" end_page="366" type="metho"> <SectionTitle> 4. UNKNOWN ELEMENTS </SectionTitle> <Paragraph position="0"> The problems of the unknown elements can arise not only in the case of computational analysis, since people may read or hear morphemes they have never read or heard before, yet they can identify their actual syntactic role without any knowledge of any previous syntactical categorization.</Paragraph> <Paragraph position="1"> The category or word class of a word is statistical information about its occurrence in particular syntactic positions. For example, the word 'beteg' can be a noun ('patient') or an adjective ('sick', 'ill') in Hungarian.</Paragraph> <Paragraph position="2"> It is an adjective in adjectival use, that is, without inflections or before adjectival suffixes: 'Előzőleg soha nem volt beteg.' ('He has never been ill before.') 'Hat napja fekszik betegen.' ('He has been laid up for six days.') The same morpheme can, however, be a noun before nominal suffixes: 'A betegnek nem volt infarktusa.' ('The patient has had no infarctions.') Although we consider categorization as a syntactic generalization, we do not claim that there are no independent syntactical categories. In agglutinative languages such categories are, e.g., the nominal suffixes just mentioned. 
These categories are not arbitrary, because one cannot introduce a new suffix to the language, but can, however, use new stems in the sentence. If the parser knows these regularities, then lexical categories will be used for control only.</Paragraph> </Section> <Section position="5" start_page="366" end_page="366" type="metho"> <SectionTitle> 5. SENTENCE STRUCTURE IN AGGLUTINATIVE LANGUAGES </SectionTitle> <Paragraph position="0"> Below we will make use of Hungarian examples to show the most important properties of a typical agglutinative language. In a simple sentence there can be only one finite verbal suffix. If we have a sentence containing two of them, then we are dealing with co-ordinate clauses or one sentence with a subordinate clause.</Paragraph> <Paragraph position="1"> Naturally, the finite suffix is immediately preceded by a verbal stem. If the sentence has no finite verbal suffixes, (i) it contains a 0-copula, which is rather frequent not only in medical texts but also in everyday Hungarian, or (ii) there is ellipsis in the sentence. The non-finite verbal suffixes are also preceded by a verbal stem. These elements can behave differently according to whether or not they influence the word order of other elements.</Paragraph> <Paragraph position="2"> We consider the noun as an element that stands before a nominal suffix. Sometimes the lexicon does not categorize this morpheme as a noun. We consider this situation as a case of a missing noun. Regeneration of missing elements is important for identifying elliptical constructions. For example, Hungarian adjectives can have nominal endings when no noun occurs in the structure.</Paragraph> <Paragraph position="3"> As it seems, most of the morphemes do not have a fixed lexical category, because their positions in the sentence actually define their functional role. But we have some important lexical features: (i) Stems. 
They are closed morphologically to the left and open to the right; formally: &lt;stem). &quot;Open&quot; means an ability to join other elements. In the case of nouns, these &quot;other&quot; elements are, for example, the case suffixes.</Paragraph> <Paragraph position="4"> (ii) Suffixes. They are closed morphologically to the right and open to the left, i.e. (suffix&gt;, e.g. the case endings.</Paragraph> <Paragraph position="5"> (iii) Open endings. They are open morphologically on both sides, i.e. (open), e.g. the morphemes marking plurality or possessivity.</Paragraph> <Paragraph position="6"> (iv) Closed elements. They are closed on both sides, i.e. &lt;closed&gt;, e.g. adjectives, numerals, adverbials.</Paragraph> <Paragraph position="7"> So, if a closed side immediately precedes an open one, or an open one a closed one, the parser has to correct the &quot;wrong&quot; sequence by inserting an empty morpheme: (a) &lt;stem) &lt;closed&gt; → &lt;stem)(suffix&gt; &lt;closed&gt; (b) &lt;closed&gt; (suffix&gt; → &lt;closed&gt; &lt;stem)(suffix&gt; Instance (a) can be, for example, a genitive case insertion (as this case ending can sometimes have an empty form in Hungarian) and instance (b) can be a noun insertion between an adjectival stem and a nominal suffix.</Paragraph> </Section> <Section position="6" start_page="366" end_page="366" type="metho"> <SectionTitle> 6. PARTS OF SPEECH </SectionTitle> <Paragraph position="0"> The surface scheme of a Hungarian sentence is the following: ( &lt;A&gt; | &lt;S N&gt; | &lt;V NF&gt; )* &lt;V F&gt; ( &lt;A&gt; | &lt;S N&gt; | &lt;V NF&gt; )* where A stands for adverbials, S for nominal and V for verbal stems, N for nominal case endings, F for finite and NF for non-finite verbal suffixes. Hence the types of the constituents are as follows: (i) independent adverbials (without any suffix), (ii) non-finite verbs (e.g. 
infinitive, gerund), (iii) nominal groups with case ending, (iv) a verb plus a finite suffix (the main verb of the sentence).</Paragraph> <Paragraph position="1"> Having made clear the internal structure of the constituents, the parser can deal with the formal evaluation of the connections between the constituents (e.g. verb and complements, possessives and possessors etc.). In the first part of the parsing we do not need any S-symbols; more precisely, any string over a particular set, the set of distinguished symbols, can serve as S-symbol. The main parts of speech can be described with the help of the schemes (i)-(iv), but in fact, only (ii) and (iii) are important. Adverbials of type (i) usually consist of one element, and every sentence has one and only one structure of type (iv). Sentences rather frequently consist of more than two constituents, but in a free word order language there is no typical order of these constituents. Our method is based on this observation. We do not describe the structural relations in the sentence sequentially from the left to the right end of the sentence, but rules form blocks and these blocks are used in an order depending on the elements of the actual sentence.</Paragraph> </Section> <Section position="7" start_page="366" end_page="366" type="metho"> <SectionTitle> 7. ANAGRAMMA </SectionTitle> <Paragraph position="0"> Between the morphological analysis and the parsing we need a normalization procedure that inserts the missing morphemes on the basis of the formal lexical properties of the elements of the input string. If the input string includes interjective signs or words and (i) this sign means subordination, then it seems to be obvious to take the embedded string out and handle it 
like an independent sentence, or (ii) this sign or word means co-ordination, then we will process the co-ordinate structures in parallel.</Paragraph> <Paragraph position="1"> So, to analyze a simple sentence of Hungarian, the ANAGRAMMA parser would begin with the question 'Is there any substring of the sentence to be parsed that has the form of the first rule's left-hand side?'. If the answer is 'yes', the right side of the same rule is substituted as many times as the substring occurs in the sentence. For example: The rule: ADJ N2 → N1 The sentence to be parsed: DET ADJ N2 CAS1 DET ADJ N2 CAS1 V FIN1 The result: DET N1 CAS1 DET N1 CAS1 V FIN1.</Paragraph> <Paragraph position="2"> If the sentence does not contain the substring, the next rule follows. In this way all rules can be applied once only, although we would probably have to use them more than once. The repeated use of the rules can be realized with the help of cycles: the kernel of the cycle is a sequential rule package and its condition is the number of rule applications in the last pass over the cycle. If it is not 0, then the algorithm continues at the first rule of the package.</Paragraph> <Paragraph position="3"> If it is 0, that is, there were no such applications, the rule with the next number has to be applied.</Paragraph> <Paragraph position="4"> [Figure: a trace of an ANAGRAMMA parse.] The parsing is over if (i) all elements of the actual string to be parsed are from the distinguished set (e.g. CAS and FIN in the above example), or (ii) the algorithm is past the last rule and there is no acceptable cycle-end after this rule. We say that the algorithm cannot interpret the sentence if elements other than distinguished ones have remained. The parser can operate more quickly if the rules in the same package describe the same grammatical phenomenon. 
Such modules consist of rules the left sides of which are similar. If a package contains only rules whose left side does not contain any element of the sentence to be parsed, then it can be omitted. We can use this method of simplification without much ado, owing to an X-bar-like formalism that guarantees that no new symbols can be born as a result of the application of the rewriting rules. We use decreasing bar levels, as the formal derivation process does with elements.</Paragraph> <Paragraph position="5"> 8. EVALUATION The evaluation module is essentially a pattern-matching algorithm that identifies the links between (i) the predicates and their arguments, (ii) the anaphoric elements and their antecedents, and (iii) the &quot;parallel&quot; structures separated by the normalizer. The lexical forms of predicates contain the surface case endings and the semantic role of the needed constituents; therefore the algorithm has to look for those constituents and assign to them the new features given by the predicate.</Paragraph> <Paragraph position="6"> The identification of the antecedents of anaphoric elements is similar, but antecedents often occur in previous sentences. Therefore the evaluator can set up a connection with the analyzed form of the same paragraph.</Paragraph> </Section> </Paper>
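<!--
The rule cycle described in Section 7 (ANAGRAMMA) can be illustrated with a small Python sketch. It is only a reading of the prose, not the original implementation. The rule ADJ N2 rewriting to N1 and the example sentence come from the text; the function names, the grouping of rules into packages, the three further rules and the symbols N0, CAS and FIN as used here are assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple

# A rule is (left side, right side); both are tuples of category symbols.
Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]


def apply_rule(tokens: Sequence[str], rule: Rule) -> Tuple[List[str], int]:
    """Substitute the right side for every non-overlapping occurrence of the left side."""
    lhs, rhs = rule
    if not lhs:
        raise ValueError("a rule needs a non-empty left side")
    out: List[str] = []
    count = 0
    i = 0
    while i < len(tokens):
        if tuple(tokens[i:i + len(lhs)]) == lhs:
            out.extend(rhs)
            count += 1
            i += len(lhs)
        else:
            out.append(tokens[i])
            i += 1
    return out, count


def run_package(tokens: Sequence[str], package: Sequence[Rule],
                max_passes: int = 100) -> List[str]:
    """Cycle over one sequential rule package until a pass applies no rule.

    max_passes is a safety limit added here; it is not part of the text.
    """
    current = list(tokens)
    for _ in range(max_passes):
        applied = 0
        for rule in package:
            current, n = apply_rule(current, rule)
            applied += n
        if applied == 0:
            break
    return current


def parse(tokens: Sequence[str], packages: Sequence[Sequence[Rule]],
          distinguished: Set[str]) -> Tuple[List[str], bool]:
    """Run the packages in order; succeed if only distinguished symbols remain."""
    current = list(tokens)
    for package in packages:
        left_symbols = {s for lhs, _ in package for s in lhs}
        if left_symbols.isdisjoint(current):
            continue  # no left-side symbol occurs in the string: omit the package
        current = run_package(current, package)
        if all(t in distinguished for t in current):
            break
    return current, all(t in distinguished for t in current)


SENTENCE = "DET ADJ N2 CAS1 DET ADJ N2 CAS1 V FIN1".split()
PACKAGES = [
    [(("ADJ", "N2"), ("N1",))],      # rule quoted in the text
    [(("DET", "N1"), ("N0",)),       # hypothetical rule
     (("N0", "CAS1"), ("CAS",))],    # hypothetical rule
    [(("V", "FIN1"), ("FIN",))],     # hypothetical rule
]
result, ok = parse(SENTENCE, PACKAGES, {"CAS", "FIN"})
print(" ".join(result), ok)
```

With the hypothetical rules above, the first package reproduces the intermediate string given in the text (DET N1 CAS1 DET N1 CAS1 V FIN1), and the later packages reduce it to distinguished symbols only. A sentence whose symbols match no package is returned unchanged and reported as not interpreted.
-->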