An Analysis of Indonesian Language for 
Interlingual Machine-Translation System 
Hammam R. Yusuf* 
Computer Science Department 
University of Kentucky, Lexington, KY 40506
yusuf@ms.uky.edu
ABSTRACT 
This paper presents BIAS (Bahasa Indonesia Analyzer System), an
analysis system for the Indonesian language suitable for a
multilingual machine translation system. BIAS is developed with the
motivation of contributing to an on-going cooperative research project
in machine translation between Indonesia and other Asian countries. In
addition, it may serve to foster NLP research in Indonesia.
The paper starts with an overview of various methodologies for the
representation of linguistic knowledge and plausible strategies of
automatic reasoning for the Indonesian language. We examine these
methodologies from the perspective of their relative advantages and
their suitability for an interlingual machine-translation environment.
BIAS is a multi-level analyzer which is developed not only to extract
the syntactic and semantic structure of sentences but also to provide
a unifying method for knowledge reasoning. Each phase of the analyzer
is discussed with emphasis on Indonesian morphology and
case-grammatical constructions.
1. Introduction 
Bahasa Indonesia (the Indonesian language) is the national language of
the Republic of Indonesia, which unites 27 cultural backgrounds. It is
widely used by more than 100 million speakers but, unfortunately, has
not gained much attention for its automatic processing by computers.
In 1987, a cooperative research project in machine translation with
Japan sparked natural language processing research in Indonesia. In
support of the on-going project of the Multilingual Machine
Translation System for Asian Languages organized by the Center for
International Cooperation in Computerization (CICC), Japan, and other
Asian countries (China, Indonesia, Malaysia and Thailand), we
developed BIAS: an analysis program for the Indonesian language which
outputs an interlingual representation. By incorporating interlingual
analysis technology, we will be able to include BIAS as part of a
multi-language translation system in a very effective way.
This paper describes the design considerations of BIAS from the
viewpoint of linguistic theories and knowledge representation
formalisms. The design is based on an interlingual approach to machine
translation, which accepts input sentences in one language and
produces sentences in other languages [Figure 1]. In particular, BIAS
is a program that takes natural language text as input and produces
its underlying interlingual representation at a certain level of
detail, which serves as a language-independent representation for the
machine translation environment.
Figure 1. BIAS and Interlingual Approach to MT
The approach used here is an approximation of basic linguistic
theories such as Chomsky's Standard Theory [Chomsky,65], Case Grammar
[Fillmore,67] and Definite Clause Grammar [Pereira,80]. We also
incorporate appropriate representation formalisms such as Frames
[Minsky,81] and Semantic Networks [Quillian,68] for a suitable type of
reasoning system. Even though there are many knowledge representation
languages which are theoretically sufficient to describe any natural
language, they need to be modified in their theory and implementation
for a particular language such as Bahasa Indonesia. The existence of
various theories and knowledge representation techniques leads us to
consider several models of reasoning formalism. This in turn may serve
as an indicator of the expressive adequacy of our chosen
representation.

* Supported by the Agency for the Assessment and Application of
Technology, Jakarta, Indonesia

ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 1228 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992

The rest of this paper is organized as follows. Section 2 presents the
framework of BIAS from the viewpoint of linguistic theories and
discusses in detail its language analysis method. Section 3 follows
with a discussion of representation formalisms and reasoning
techniques. The paper ends with a conclusion.
2. Analysis Method 
There are many ways of attacking the problem of natural language
processing. At one end of the spectrum are analyzers that read the
input sentences very closely, following every twist in syntax and
trying to interpret every bit of information contained in the
sentence. In most cases, these analyzers separate the syntactic and
semantic parts of the analysis into consecutive stages, paying much
more attention to the syntactic part at the expense of the semantic
part [Gersham,82]. At the other end are analyzers that skim through
the text looking for certain types of information and paying attention
only to the words and expressions relevant to the task [DeJong,79].
This approach is very effective and intuitively corresponds to what
people do while skimming newspaper stories. However, the danger in
this approach lies in the possibility of misunderstanding what is
being stated.
BIAS is a multi-level analyzer, similar to the first type described
above, with the ability to perform reasoning at each level of
analysis. The method used in BIAS is theoretically consistent with the
Standard Theory and Case Grammar as well as with non-monotonic
reasoning formalisms. The process starts with the analysis of sound
sequences and ends by producing an interlingual representation. An
in-depth discussion of each analysis phase in BIAS and of the
selection of appropriate linguistic theories follows.
2.1 Morphological Analysis Phase
Preliminary analysis of Indonesian words poses an especially difficult
problem: the transformation of a word's category and meaning as the
result of affixation. Although it seems better to segment input
sentences beforehand, it is not natural in the general sense to do
this as a first step. We have to combine the processes of phonological
and morphological analysis in order to extract the root word from an
inflected form.
The process involves the following: an inflected word is analyzed to
give its root word and affixes, allowing the system to recognize the
altered structure and meaning of the inflected word. This phase uses
the lexicon and morphological and phonological knowledge in the form
of transformation rules.
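To illustrate why phonological and morphological analysis must be combined, the following minimal sketch recovers a root from a word carrying the nasal prefix meN-. It is not the BIAS implementation: the names `LEXICON`, `NASAL_VARIANTS` and `recover_root` are hypothetical, the lexicon holds only roots listed in Table 3, and the pairing of each prefix variant with an elided root-initial consonant is an assumption based on the forms in that table.

```python
from typing import Optional

# Toy root lexicon taken from the Root column of Table 3 (illustrative only).
LEXICON = {"buat", "goreng", "kurang", "tunggu", "sapu", "cukur", "pukul"}

# (surface prefix, root-initial consonant assumed to be elided after it).
# Longer variants come first so that "meny" is tried before "men".
NASAL_VARIANTS = [
    ("meny", "s"),
    ("meng", "k"),
    ("mem", "p"),
    ("men", "t"),
]


def recover_root(word: str) -> Optional[str]:
    """Undo the nasal prefix and return the root found in the lexicon."""
    for prefix, elided in NASAL_VARIANTS:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        # Try the remainder as-is, then with the elided consonant restored.
        for candidate in (rest, elided + rest):
            if candidate in LEXICON:
                return candidate
    return None
```

The lexicon lookup is what disambiguates the two candidates: stripping mem- from memukul leaves "ukul", which is not a root, so the restored form "pukul" is chosen.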
Further, we observed the following word formation rules, which
indicate their characteristics:
(a) A word can be constructed using a prefix, suffix or confix.
(b) A word can be constructed by repeating a root word, as in
'kura-kura' (turtle), or by repeating a word constructed as in (a), as
in the case of 'berlari-lari' (jogging).
Our analysis showed that the complex types of word formation could
lead to some problems while constructing the structure of the lexicon
[Yusuf,88]. In the lexicon, a word should be described briefly, so
that the search can be efficient. Hence, the lexicon should contain
only a simple form of a word which, in this case, is the root form.
How can we deal with a word with affixation? In our findings, a word
with affixation can be processed by using the following procedure.
Algorithm: MorphO
Input: word
Output: root word, affixation and semantic markers
- Assume that the word is a root word.
  If this word is in the dictionary, check whether it is in its root
  form or purely repetitive form.
- Assume that the word is a word with some prefix.
  Check for the following conditions:
  - The root word is a repetitive word and not an idiom with
    affixation.
    For example: berlari-lari (jogging)
  - The word has affixation and repetition.
    For example: berpukul-pukulan (hit reciprocally)
  - A root word with affixation, or an idiom with a prefix.
    For example: pekerjaan (occupation)
                 bertanggung-jawab (responsible for)
  - An idiom with a suffix or confix.
    For example: pertanggung-jawaban (responsibility)
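The procedure above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' program: the lexicon, the affix inventories and the names `Analysis`, `repeated_base` and `analyze` are hypothetical, idioms are stored in the lexicon as hyphenated entries, semantic markers are omitted, and phonological changes are ignored.

```python
from typing import NamedTuple, Optional


class Analysis(NamedTuple):
    root: str
    prefix: str
    suffix: str
    repeated: bool


# Toy lexicon of roots and idioms (illustrative only).
LEXICON = {"lari", "pukul", "kerja", "kura-kura", "tanggung-jawab"}
# Assumed affix inventories; the empty string means "no affix".
PREFIXES = ("", "ber", "per", "pe")
SUFFIXES = ("", "an")


def repeated_base(stem: str) -> Optional[str]:
    """Return X if the stem has the form 'X-X', otherwise None."""
    left, sep, right = stem.partition("-")
    return left if sep and left == right else None


def analyze(word: str) -> Optional[Analysis]:
    """Try the bare word first, then every prefix/suffix combination."""
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        for suffix in SUFFIXES:
            if not word.endswith(suffix):
                continue
            stem = word[len(prefix):len(word) - len(suffix)]
            if stem in LEXICON:  # root form, repetitive root, or idiom
                return Analysis(stem, prefix, suffix, False)
            base = repeated_base(stem)
            if base in LEXICON:  # repetition of a known root
                return Analysis(base, prefix, suffix, True)
    return None
```

Because the empty prefix and suffix are tried first, a word that is already in the lexicon, such as kura-kura, is returned as a root before any affix stripping is attempted.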
Table 2 summarizes the morphology rules which have been formulated in
BIAS. These rules are basic; other rules which incorporate complex
formation of words (see also [Tarigan,84]) are left for further
improvement. The general structure for a morphological rule of a given
root word is described as follows:
([Affix] + [Root Word + Semantic]) --> [Word + NewSemantic]
Examples:
([mem] + [pukul + action]) --> [memukul + active]
([mem-i] + [pukul + action]) --> [memukuli + repetitive]
([mem-kan] + [pukul + action]) --> [memukulkan + causative]
([ber-an] + [pukul + repetition]) --> [berpukul-pukulan + reciprocal
action]
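One possible reading of this rule structure is a lookup keyed on the affix and the semantic of the root. The sketch below encodes only the four examples given; `RULES` and `apply_rule` are hypothetical names, and the way the surface word is assembled (dropping the initial p after mem-, doubling the root for a repetition) is an assumption inferred from those examples rather than a statement of how BIAS builds words.

```python
from typing import Optional, Tuple

# (affix, semantic of the root) -> semantic of the derived word.
# Entries are copied from the four examples in the text.
RULES = {
    ("mem", "action"): "active",
    ("mem-i", "action"): "repetitive",
    ("mem-kan", "action"): "causative",
    ("ber-an", "repetition"): "reciprocal action",
}


def apply_rule(affix: str, root: str, root_semantic: str) -> Optional[Tuple[str, str]]:
    """Return (derived word, new semantic), or None if no rule applies."""
    new_semantic = RULES.get((affix, root_semantic))
    if new_semantic is None:
        return None
    # A confix such as "mem-kan" is written prefix-suffix.
    prefix, _, suffix = affix.partition("-")
    stem = root
    if root_semantic == "repetition":
        stem = f"{root}-{root}"  # assumed: repetition doubles the root
    elif prefix == "mem" and root.startswith("p"):
        stem = root[1:]  # assumed: initial p is dropped, as in memukul
    return prefix + stem + suffix, new_semantic
```

For instance, `apply_rule("mem-i", "pukul", "action")` yields the pair `("memukuli", "repetitive")`, mirroring the second example.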
Table 2. Indonesian Morphological Construction
Root form       Prefix  Suffix  Confix      Compound Term  Semantic
pukul (hit)     me                          memukul        active
bawa (carry)    di                          dibawa         passive
nama (name)     ber                         bernama        possessive
perlu (need)                    me-kan      memerlukan     active tran.
baca (read)                     di-kan      dibacakan      passive
pegang (touch)  ter                         terpegang      accidental
guna (use)                      ter-kan                    implicative
main (play)                     memper-kan                 purpose
daya (trick)                    terper-kan                 accidental
The new semantic of a formed word is derived from the semantics of the
root word and the affixes. There are several filters used for
extracting this semantic. In the examples, mem-i causes the word
pukul, which has action as its original semantic, to become repetitive
in meaning when combined.
In addition to the morphological constructions described above, there
are phonological rules which are handled in parallel in the
morphological analysis phase. The phonological rules determine the
transformation of the phonetic structure of a root word for a given
complex word. Table 3 shows some examples.

Table 3. Phonological Rules
Prefix  Root    Inflection
CeN     buat    membuat (make)
        goreng  menggoreng (fry)
        kurang  mengurang (subtract)
        tunggu  menunggu (wait)
        sapu    menyapu (sweep)
        cukur   mencukur (shave)
        pukul   pemukul (hitter)
        hasut   penghasut (agitator)
ber     usaha   berusaha (effort)
        ruang   beruang (room)
        uang    beruang (have money)
        ternak  beternak (livestock)
C = consonant of m and p
N = phonological transformation

2.2 Syntactic Analysis Phase
This phase covers those steps that turn sentences into structural
descriptions, or syntactic trees, by using a grammatical description
of linguistic structure. The major components are syntactic knowledge
(grammar rules) and the lexicon. There are several linguistic
phenomena worth describing for the Indonesian language. For instance,
the structure of Bahasa Indonesia differs from that of English and
other languages. One of the most significant differences is that the
Indonesian language applies various rules to construct Adverb Phrases,
Adjective Phrases and Relative Clauses.
For example, in constructing an Adverb Phrase, it is allowed to
combine an adverb and an adjective in addition to an adverb and a
verb. It is also possible to form an Adjective Phrase using an
adjective followed by a noun rather than the default order of noun and
adjective. This results from the categorial ambiguity of some words.
Examine the following phrases:
rumah (N) merah (Adj)      panjang (Adj) tangan (N)
(house)   (red)            (long)        (hand)
cepat (Adv) merah (Adj)    berjalan (V) cepat (Adv)
(quickly)   (red)          (walk)       (quickly)
BIAS uses a bottom-up technique [Matsumoto,83] in the syntactic
analysis phase. The grammar rules, written in Extraposition Grammar
[Pereira,81], are translated into a set of Horn clauses which parse a
sentence according to the original grammar in a bottom-up and
depth-first manner.
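As a rough illustration of bottom-up analysis on the phrases above, the sketch below builds larger constituents from adjacent smaller ones in a chart. It is a CYK-style recognizer written for this example, not the BUP parser or an Extraposition Grammar; the names `LEXICAL`, `BINARY` and `recognize`, and the category labels AdjP and AdvP, are assumptions.

```python
from typing import Dict, List, Set, Tuple

# Word -> possible categories. Sets allow for categorial ambiguity.
LEXICAL: Dict[str, Set[str]] = {
    "rumah": {"N"}, "tangan": {"N"},
    "merah": {"Adj"}, "panjang": {"Adj"},
    "cepat": {"Adv"}, "berjalan": {"V"},
}

# (left category, right category) -> phrase category, following the text:
# AdjP is noun+adjective or adjective+noun; AdvP is adverb+adjective or
# verb+adverb.
BINARY: Dict[Tuple[str, str], str] = {
    ("N", "Adj"): "AdjP", ("Adj", "N"): "AdjP",
    ("Adv", "Adj"): "AdvP", ("V", "Adv"): "AdvP",
}


def recognize(words: List[str]) -> Set[str]:
    """Return the categories that span the whole word sequence."""
    n = len(words)
    if n == 0:
        return set()
    chart: Dict[Tuple[int, int], Set[str]] = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = set(LEXICAL.get(word, set()))
    # Combine adjacent spans, shortest first (bottom-up).
    for span in range(2, n + 1):
        for start in range(0, n - span + 1):
            end = start + span
            cell: Set[str] = set()
            for mid in range(start + 1, end):
                for left in chart[(start, mid)]:
                    for right in chart[(mid, end)]:
                        category = BINARY.get((left, right))
                        if category:
                            cell.add(category)
            chart[(start, end)] = cell
    return chart[(0, n)]
```

Giving a word more than one category in `LEXICAL` is how the categorial ambiguity mentioned above would surface: the chart then carries every reading forward until a rule selects one.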
2.3 Semantic Interpretation Phase 
This phase consists of mapping the structural (syntactic) description
of the sentence into an interlingual representation language. The goal
of this phase is to construct a clear representation of the exact
meaning of a given sentence; hence, it is a language-independent
representation suitable for the generation process of target
languages. In order to achieve this, we need commonsense knowledge in
addition to semantic knowledge.
In Bahasa Indonesia the verbal elements of the sentence are the major
source of the structure: the main verb in the proposition is the focus
around which the other phrases, or cases, revolve, and the auxiliary
verbs contain much of the information about modality. Hence, Case
Grammar is the appropriate selection for the semantic analysis part.
Case frames are the mechanism for identifying the specific cases
allowed for any particular verb. The case frame for each verb
indicates the relationships which are required in any sentence in
which the verb appears and those relationships which are optional.
Let us look at some popular example sentences:
Palu itu memukul paku itu.
(the hammer) (hit) (the nail)
Paku itu dipukul oleh palu itu.
(the nail) (was hit) (by) (the hammer)
Seseorang memukul paku itu dengan palu itu.
(someone) (hit) (the nail) (with) (the hammer)
The verb memukul (hit) allows three primary cases: agentive,
instrumental and objective. We have all three cases in the last
sentence, but only two in the others. In fact, only one case is
required with this verb:
Paku itu dipukul.
(the nail) (was hit)
Thus the case frame for the verb memukul is, by default:
memukul [O (A) (I)]
Further, other case frames are also determined for words which
combine pukul with other affixes, as in the case of memukulkan,
memukuli, memukul-mukulkan, etc.
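The case frame memukul [O (A) (I)] can be read as one required case (O) and two optional ones (A, I). A minimal sketch of that reading follows; the `CaseFrame` class and its `accepts` method are hypothetical and only check which cases are present, not how they are realized in the sentence.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class CaseFrame:
    verb: str
    required: FrozenSet[str]
    optional: FrozenSet[str]

    def accepts(self, cases: Iterable[str]) -> bool:
        """True if all required cases occur and no unlisted case occurs."""
        found = set(cases)
        allowed = self.required | self.optional
        return self.required <= found and found <= allowed


# memukul [O (A) (I)]: objective required, agentive and instrumental optional.
MEMUKUL = CaseFrame("memukul", frozenset({"O"}), frozenset({"A", "I"}))
```

Under this reading, "Paku itu dipukul" supplies only O and is accepted, while a sentence supplying A without O would be rejected.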
In addition to the standard cases described by Fillmore and Simmons
[Simmons,73], we incorporate several other cases found in the
Indonesian language. These cases occur as the result of word
inflection. For instance, the confix meN-kan with the root word beli
creates the word membelikan, which carries the meaning of "being
beneficiary of the action". Some examples of these specific cases can
be found in the following sentences:
1. Benefactive: Saya membelikan adik boneka. (I buy a doll for
sister)
2. Incidental: Adi l¢.£Rglga~ di tangga. (I fell on the stairs)
3. Causative: Saya mempertanyakan masalah itu. (I questioned that
problem)
4. Intentional: Saya OZgltZlZ£I£~.~ dia. (I tricked him)
The interlingual representation for (1) is given in Figure 5. Note
that each word is represented by a concept and its attributes.
3. Representation and Inference 
We now discuss the various types of representation language used to
represent the theories in each phase of the analysis.
In the morphological analysis phase, it is appropriate to represent
the morphological and phonological rules with definite clauses, which
have first order logic as their basis. First order logic provides a
clear language to represent propositions or facts for the lexicon and
also supports production-like rules for the transformation rules. The
syntactic part adapts the Extended Standard Theory and hence, it is
favorable to use first order logic to represent its knowledge. The use
of Case Grammar in the semantic analysis phase leads us to choose a
network-based formalism as the representation. Simmons and Hendrix
[Simmons,73] have provided a clear language for semantic networks
based on Case Grammar. However, we also incorporate 'slot fillers'
from the frames system [Minsky,81] as a solution for handling
incomplete sentences.

Figure 4. Example of Interlingual representation

Table 5. Level of Analysis, Representations and Inferences in BIAS
Analysis Phase  Theory                    Representation                       Inference
Phonology       Standard Theory           Definite Clause / First order logic  Deduction / Induction
Morphology      Standard Theory           Definite Clause / First order logic  Deduction / Induction
Syntactic       Extended Standard Theory  Definite Clause                      Deduction / Default
Semantic        Case Grammar              Semantic Network with Slot Filler    Default

As a consequence of the representation methods selected for the
linguistic knowledge of Bahasa Indonesia, BIAS has multiple inference
methods incorporated in each level of analysis. In the syntactic and
semantic analysis phases, default reasoning is performed to solve the
problem of incomplete knowledge. In this case, first order logic must
be augmented with default operators in order to permit
non-monotonicity [Reiter,78].
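The non-monotonic flavour of default reasoning with slot fillers can be illustrated with a very small sketch: a slot receives a default value when the sentence does not supply one, and that conclusion is withdrawn as soon as an observed value arrives. This is not Reiter's default logic and not the BIAS inference engine; the `DEFAULTS` table, the slot names and the `interpret` function are hypothetical.

```python
from typing import Dict, Tuple

# Assumed default filler: an unexpressed agent is taken to be "seseorang"
# (someone). This default is illustrative, not taken from the paper.
DEFAULTS: Dict[str, str] = {"agent": "seseorang"}


def interpret(observed: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Fill each slot, recording whether its value is observed or default."""
    slots: Dict[str, Tuple[str, str]] = {}
    for name, value in DEFAULTS.items():
        slots[name] = (value, "default")
    # Observed fillers override (retract) any default conclusion.
    for name, value in observed.items():
        slots[name] = (value, "observed")
    return slots
```

For the incomplete sentence "Paku itu dipukul", `interpret({"object": "paku"})` assumes an agent by default; supplying an agent later replaces that assumption rather than adding to it.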
Because of space limitations, we leave out an in-depth discussion of
inference techniques (see [Yusuf,91], [Schubert,79]), and present a
summary of our work in Table 5.
4. Conclusion 
The use of linguistic theories and appropriate knowledge
representation techniques gives BIAS a new insight in attacking the
problem of language analysis for an interlingual machine translation
system, especially for Bahasa Indonesia. Many representation
formalisms and reasoning systems have been brought into consideration,
not only for a 'pure' sentence analysis but in order to design an
effective and efficient intelligent system capable of capturing and
reasoning with linguistic knowledge.
References
Brachman, Ronald J., On epistemological status of Semantic Network,
in Associative Networks: Representation and Use of Knowledge by
Computer, Academic Press, New York, 1979.
Chomsky, Noam, Aspects of the Theory of Syntax, MIT Press, Cambridge,
1965.
DeJong, G.F., Skimming stories in real time: An experiment in
integrated understanding (Computer Science Report No. 150), Yale
University, New Haven, May 1979.
Fillmore, Charles, The Case for Case, Universals in Linguistic Theory,
Ed. Emmon Bach and R.T. Harms, Holt, Rinehart, Winston, New York,
1-88, 1967.
Gersham, A.V., A Framework for Conceptual Analyzer, in Strategies for
Natural Language Processing, Lawrence Erlbaum, 1982.
Lockman, Abe and Klappholz, David, The Control of Inferencing in NLU,
Computational Linguistics, Ed. Cercone, Nick, Pergamon Press, Oxford,
1983.
Matsumoto, Y. et al., BUP: A Bottom Up Parser Embedded in Prolog, New
Generation Computing Vol. 1 No. 2, pp. 145-158, 1983.
Minsky, M., A Framework for Representing Knowledge, Mind Design,
pp. 95-128, MIT Press, 1981.
Pereira, F. and Warren, D., Definite Clause Grammars for Language
Analysis - A Survey of the Formalism and a Comparison with Augmented
Transition Networks, Journal of Artificial Intelligence 13 (1980),
pp. 231-278.
Pereira, F., Extraposition Grammars, American Journal of Computational
Linguistics Vol. 7 No. 4 (1981), pp. 243-256.
Quillian, M.R., Semantic Memory, Semantic Information Processing,
Ed. Marvin Minsky, MIT Press, Cambridge, 1968.
Reiter, R., On Reasoning by Default, Proc. TINLAP-2, Theoretical
Issues in Natural Language Processing-2, University of Illinois at
Urbana-Champaign, 210-278, 1978.
Schubert, Lenhart K., Randolph G. Goebel and Nicholas J. Cercone, "The
Structure and Organization of a Semantic Network for Comprehension
and Inference", Associative Networks, 121-175, Academic Press, 1979.
Simmons, Robert F., Semantic Networks: Their Computation and Use for
Understanding English Sentences, Computer Models of Thought and
Language, W.H. Freeman Co., San Francisco, 1973.
Tarigan, S., Morfologi Bahasa Indonesia, Penerbit Gramedia, Jakarta,
Indonesia, 1984.
Yusuf, Hammam, Indonesian Electronic Dictionary, Technical Report,
Agency for the Assessment and Application of Technology, Jakarta,
1988.
Yusuf, Hammam, Analyzer for Bahasa Indonesia, Master Thesis,
University of Kentucky, 1991.
