File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/92/c92-4197_metho.xml
Size: 13,372 bytes
Last Modified: 2025-10-06 14:13:02
<?xml version="1.0" standalone="yes"?> <Paper uid="C92-4197"> <Title>An Analysis of Indonesian Language for Interlingual Machine-Translation System</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. Analysis Method </SectionTitle> <Paragraph position="0"> There are many ways of attacking the problem of natural language processing. At one end of the spectrum are analyzers that read the input sentences very closely, following every twist in syntax and trying to interpret every bit of information contained in the sentence. In most cases, these analyzers split the syntactic and semantic parts of the analysis into separate consecutive stages, paying much more attention to the syntactic part at the expense of the semantic one [Gersham,82]. At the other end are analyzers that skim through the text looking for certain types of information, paying attention only to the words and expressions relevant to the task [DeJong,79]. This approach is very effective and intuitively corresponds to what people do while skimming newspaper stories. However, its danger lies in the possibility of misunderstanding what is being stated.</Paragraph> <Paragraph position="1"> BIAS is a multi-level analyzer, similar to the first type described above, with the ability to perform reasoning at each level of analysis. The method used in BIAS is theoretically consistent with the Standard Theory and Case Grammar, as well as with a non-monotonic reasoning formalism. The process starts with the analysis of sound sequences and ends by producing an interlingual representation. 
In-depth discussion of each analysis phase in BIAS and of the selection of appropriate linguistic theories follows.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Morphological Analysis Phase </SectionTitle> <Paragraph position="0"> Preliminary analysis of Indonesian words poses an especially difficult problem: affixation changes both the category and the meaning of a word. Although it may seem better to segment input sentences beforehand, in general it is not natural to perform this step on its own first. We have to combine the processes of phonological and morphological analysis in order to extract the root word from an inflected form.</Paragraph> <Paragraph position="1"> The process works as follows: an inflected word is analyzed into its root word and affixes, allowing the system to recognize the altered structure and meaning of the inflected word.</Paragraph> <Paragraph position="2"> This phase uses the lexicon together with morphological and phonological knowledge in the form of transformation rules.</Paragraph> <Paragraph position="3"> Further, we observed the following word formation rules, which characterize Indonesian words: (a) A word can be constructed using a prefix, a suffix or a confix.</Paragraph> <Paragraph position="4"> (b) A word can be constructed by repeating a root word, as in 'kura-kura' (turtle), or by repeating a word constructed as in (a), as in the case of 'berlari-lari' (jogging).</Paragraph> <Paragraph position="5"> Our analysis showed that these complex types of word formation could lead to some problems when constructing the structure of the lexicon [Yosuf,88]. Clearly, each word should be described compactly in the lexicon so that the search can be efficient. 
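As an illustration of rules (a) and (b), the following Python sketch recovers a root from a prefixed, suffixed or reduplicated form by testing candidate stems against a lexicon that stores root forms only. It is not the BIAS implementation: the toy lexicon, the affix lists and the function names split_reduplication and analyze are hypothetical, and the phonological changes caused by prefixes such as meN- are deliberately ignored.

```python
# Hypothetical root-only lexicon; glosses partly taken from the examples above.
LEXICON = {"kura": "turtle", "lari": "run", "pukul": "hit", "kerja": "work"}

# Illustrative affix subsets (assumptions, not the BIAS rule set).
PREFIXES = ["ber", "di", "ter", "pe"]
SUFFIXES = ["kan", "an", "i"]


def split_reduplication(stem):
    """Return the base of a full reduplication such as 'kura-kura', else None."""
    left, sep, right = stem.partition("-")
    if sep and left == right:
        return left
    return None


def analyze(word):
    """Try every (prefix, suffix) split, including empty affixes, and return
    the first analysis whose root is found in the lexicon, or None."""
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        for suffix in [""] + SUFFIXES:
            if not word.endswith(suffix):
                continue
            # Strip the candidate affixes to obtain the stem.
            stem = word[len(prefix):len(word) - len(suffix)]
            base = split_reduplication(stem)
            root = base if base is not None else stem
            if root in LEXICON:
                return {
                    "root": root,
                    "prefix": prefix,
                    "suffix": suffix,
                    "reduplicated": base is not None,
                }
    return None


print(analyze("kura-kura"))     # root 'kura', reduplicated
print(analyze("berlari-lari"))  # prefix 'ber', root 'lari', reduplicated
print(analyze("pekerjaan"))     # prefix 'pe', suffix 'an', root 'kerja'
```

Because only roots are stored, every affixed or repeated form is resolved by stripping rather than by extra lexicon entries. A form such as memukul is not found by this sketch, since meN- drops the initial p of pukul; handling it requires the phonological rules of this phase.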
Hence, the lexicon should contain only the simple form of each word, which in this case is the root form.</Paragraph> <Paragraph position="6"> How can we deal with a word with affixation? We found that such a word can be processed by the following procedure.</Paragraph> <Paragraph position="7"> Algorithm: MorphO. Input: word. Output: root word, affixation and semantic markers. - Assume that the word is a root word.</Paragraph> <Paragraph position="8"> If this word is in the dictionary, check whether it is in its root form or in a purely repetitive form.</Paragraph> <Paragraph position="9"> - Assume that the word is a word with some prefix.</Paragraph> <Paragraph position="10"> Check for the following conditions: - The root word is a repetitive word and not an idiom with affixation.</Paragraph> <Paragraph position="11"> For example: berlari-lari (jogging).</Paragraph> <Paragraph position="12"> - The word carries both affixation and repetition. For example: berpukul-pukulan (hit reciprocally). - A root word with affixation, or an idiom with a prefix.</Paragraph> <Paragraph position="13"> For example: pekerjaan (occupation), bertanggung-jawab (responsible for). - An idiom with a suffix or confix.</Paragraph> <Paragraph position="14"> For example: pertanggung-jawaban (responsibility). Table 2 summarizes the morphology rules which have been formulated in BIAS. These rules are basic; other rules which incorporate complex word formation (see also [Tarigan,84]) are left for further improvement. The general structure of a morphological rule for a given root word is described as follows:</Paragraph> <Paragraph position="16"> pukul (hit) + me- → memukul: active; bawa (carry) + di- → dibawa: passive; nama (name) + ber- → bernama: possessive; perlu (need) + me-kan → memerlukan: active transitive; 
baca (read) + di-kan → dibacakan: passive; pegang (touch) + ter- → terpegang: accidental; guna (use) + ter-kan: implicative; main (play) + memper-kan: purpose; daya (trick) + terper-kan: accidental</Paragraph> <Paragraph position="18"> The semantics of a formed word is derived from the semantics of the root word and its affixes. Several filters are used to extract this semantics. In the examples, mem-i causes the word pukul, whose original semantics is an action, to take on a repetitive meaning when the two are combined.</Paragraph> <Paragraph position="19"> In addition to the morphological construction described above, phonological rules are handled in parallel in the morphological analysis phase. The phonological rules determine how the phonetic structure of a root word is transformed within a given complex word. Table 3 gives some examples of this construction.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Syntactic Analysis Phase </SectionTitle> <Paragraph position="0"> This phase covers the steps that turn sentences into structural descriptions, or syntactic trees, using a grammatical description of linguistic structure. Its major components are syntactic knowledge (grammar rules) and the lexicon. Several linguistic phenomena of the Indonesian language are worth describing. For instance, the structure</Paragraph> <Paragraph position="2"> of Bahasa Indonesia differs from that of English and other languages. One of the most significant differences is that Indonesian applies various rules to construct adverb phrases, adjective phrases and relative clauses.</Paragraph> <Paragraph position="3"> For example, when constructing an adverb phrase, an adverb may be combined with an adjective as well as with a verb. 
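The two adverb-phrase patterns can be illustrated with a small bottom-up recognizer. This is a minimal sketch under stated assumptions: the category lexicon and the rule table are hypothetical (sangat 'very', sudah 'already', cepat 'fast', makan 'eat'), and the greedy shift-reduce loop never backtracks, so it is far simpler than the parser BIAS actually uses.

```python
# Hypothetical category lexicon: sangat = very, sudah = already,
# cepat = fast, makan = eat.
CATEGORY = {"sangat": "Adv", "sudah": "Adv", "cepat": "Adj", "makan": "V"}

# Binary rules keyed by their right-hand side: AdvP -> Adv Adj | Adv V.
RULES = {("Adv", "Adj"): "AdvP", ("Adv", "V"): "AdvP"}


def parse_bottom_up(words):
    """Greedy shift-reduce recognizer; returns the final stack of categories."""
    stack = []
    for word in words:
        stack.append(CATEGORY[word])  # shift the category of the next word
        # Reduce while the top two symbols match the right-hand side of a rule.
        while len(stack) >= 2 and tuple(stack[-2:]) in RULES:
            parent = RULES[tuple(stack[-2:])]
            del stack[-2:]
            stack.append(parent)
    return stack


print(parse_bottom_up(["sangat", "cepat"]))  # ['AdvP']
print(parse_bottom_up(["sudah", "makan"]))   # ['AdvP']
print(parse_bottom_up(["cepat", "makan"]))   # ['Adj', 'V']
```

A word sequence is accepted as an adverb phrase when the stack reduces to the single symbol AdvP; the last call shows an adjective followed by a verb, which no rule in this toy table covers.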
It is also possible to form an adjective phrase from an adjective followed by a noun, rather than in the default order of noun followed by adjective. This possibility results from the categorial ambiguity of some words. Consider the following phrases:</Paragraph> <Paragraph position="5"> BIAS uses a bottom-up technique [Matsumoto,83] in the syntactic analysis phase. The grammar rules, written in Extraposition Grammar [Pereira,81], are translated into a set of Horn clauses which parse a sentence according to the original grammar in a bottom-up, depth-first manner. (PROC. OF COLING-92, NANTES, AUG. 23-28, 1992)</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 Semantic Interpretation Phase </SectionTitle> <Paragraph position="0"> This phase consists of mapping the structural (syntactic) description of the sentence into an interlingual representation language. The goal of this phase is to construct a clear representation of the exact meaning of a given sentence; this representation is language-independent and hence suitable for generating target languages. To achieve this, we need commonsense knowledge in addition to semantic knowledge.</Paragraph> <Paragraph position="1"> In Bahasa Indonesia, the verbal elements of the sentence are the major source of its structure: the main verb of the proposition is the focus around which the other phrases, or cases, revolve, and the auxiliary verbs carry much of the information about modality.</Paragraph> <Paragraph position="2"> Hence, Case Grammar is the appropriate choice for the semantic analysis part.</Paragraph> <Paragraph position="3"> Case frames are the mechanism for identifying the specific cases allowed for any particular verb. 
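A case frame can be sketched as a record of the cases a verb allows, split into required and optional ones. The Python sketch below is an illustration only: the class CaseFrame and its check method are hypothetical names, and the frame for memukul assumes, following the example sentences in this section, that the objective case is required while the agentive and instrumental cases are optional.

```python
from dataclasses import dataclass, field


@dataclass
class CaseFrame:
    """Hypothetical case frame: the cases a verb requires or merely allows."""
    verb: str
    required: set = field(default_factory=set)
    optional: set = field(default_factory=set)

    def check(self, fillers):
        """Return the problems found in a mapping from case name to filler."""
        filled = set(fillers)
        problems = [f"missing {case}" for case in sorted(self.required - filled)]
        allowed = self.required | self.optional
        problems += [f"unexpected {case}" for case in sorted(filled - allowed)]
        return problems


MEMUKUL = CaseFrame("memukul", required={"objective"},
                    optional={"agentive", "instrumental"})

# Seseorang memukul paku itu dengan palu itu: all three cases are present.
print(MEMUKUL.check({"agentive": "seseorang", "objective": "paku itu",
                     "instrumental": "palu itu"}))   # []
# Paku itu dipukul: only the required objective case.
print(MEMUKUL.check({"objective": "paku itu"}))      # []
# A reading without an objective violates the frame.
print(MEMUKUL.check({"agentive": "seseorang"}))      # ['missing objective']
```

Keeping required and optional cases apart lets the same frame accept both the full active sentence and the short passive one, while still rejecting a reading that lacks the obligatory case.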
The case frame for each verb indicates the relationships which are required in any sentence in which the verb appears and those which are optional.</Paragraph> <Paragraph position="4"> Let us look at some well-known example sentences: Palu itu memukul paku itu.</Paragraph> <Paragraph position="5"> (the hammer) (hit) (the nail) Paku itu dipukul oleh palu itu.</Paragraph> <Paragraph position="6"> (the nail) (was hit) (by) (the hammer) Seseorang memukul paku itu dengan palu itu.</Paragraph> <Paragraph position="7"> (someone) (hit) (the nail) (with) (the hammer) The verb memukul (hit) allows three primary cases: agentive, instrumental and objective. All three cases appear in the last sentence, but only two in the others. In fact, only one case is required with this verb: Paku itu dipukul.</Paragraph> <Paragraph position="8"> (the nail) (was hit) Thus the case frame for the verb memukul is, by default:</Paragraph> <Paragraph position="10"> Further, case frames are also determined for words which combine pukul with other affixes, as in the case of memukulkan, memukuli, memukul-mukulkan, etc.</Paragraph> <Paragraph position="11"> In addition to the standard cases described by Fillmore and Simmons [Simmons,73], we incorporate several other cases found in the Indonesian language. These cases arise from word inflection. For instance, the confix meN-kan with the root word beli (buy) creates the word membelikan, which carries the meaning of &quot;being the beneficiary of the action&quot;. Some examples of these specific cases can be found in the following sentences: 1. Benefactive: Saya membelikan adik boneka (I buy a doll for my sister) 2. Incidental: Adi lC/.PSRglga~ di tangga (Adi fell on the stairs) 3. Causative: Saya mempertanyakan masalah itu. (I questioned that problem) 4. Intentional: Saya OZgltZlZPSIPS~.~ dia. 
(I tricked him) The interlingual representation for (1) is given in Figure 5. Note that each word is represented by a concept and its attributes.
3. Representation and Inference
We now come to a discussion of the various types of representation language used to represent the theories in each phase of the analysis.</Paragraph> <Paragraph position="12"> In the morphological analysis phase, it is appropriate to represent the morphological and phonological rules with definite clauses, which have first-order logic as their basis. First-order logic provides a clear language for representing propositions or facts for the lexicon and also supports production-like rules for the transformation rules. The syntactic part adopts the Extended Standard Theory; hence, it is natural to use first-order logic to represent its knowledge as well. The use of Case Grammar in the semantic analysis phase leads us to choose a network-based formalism as the representation. Simmons and Hendrix [Simmons,73] have provided a clear language for semantic networks based on Case Grammar. However, we also incorporate 'slot fillers' from the frame system [Minsky,81] as a way to handle incomplete sentences.</Paragraph> <Paragraph position="13"> As a consequence of the representation methods selected for the linguistic knowledge of Bahasa Indonesia, BIAS incorporates multiple inference methods at each level of analysis. In the syntactic and semantic analysis phases, default reasoning is performed to solve the problem of incomplete knowledge. In this case, first-order logic must be augmented with default operators in order to permit non-monotonicity [Reiter,78]. Because of space limitations, we leave out an in-depth discussion of inference techniques (see [Yusuf,91] [Schubert,79]) and present a summary of our work in Table 5.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4. 
Conclusion </SectionTitle> <Paragraph position="0"> The use of linguistic theories and appropriate knowledge representation techniques gives BIAS a new insight into the problem of language analysis for interlingual machine translation systems, especially for Bahasa Indonesia. Many representation formalisms and reasoning systems have been considered, not only for 'pure' sentence analysis but also in order to design an effective and efficient intelligent system capable of capturing and reasoning with linguistic knowledge.</Paragraph> </Section> </Paper>