<?xml version="1.0" standalone="yes"?>
<Paper uid="C73-2031">
  <Title>BENGT SIGURD FROM NUMBERS TO NUMERALS AND VICE VERSA</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
BENGT SIGURD
FROM NUMBERS TO NUMERALS AND VICE VERSA
1. FROM MATHEMATICAL INTO LINGUISTIC REPRESENTATIONS -
AND VICE VERSA
</SectionTitle>
    <Paragraph position="0"> Universally used decimal representations, such as 5; 200; 856763; 189200000, are rendered (written, pronounced) differently in different languages. The number 5 is thus five in English, fem in Swedish, cinq in French, ketsmala in Koyukon, biyar in Hausa and nga: in Burmese.</Paragraph>
    <Paragraph position="1"> Numbers can furthermore be written differently in different mathematical systems. The number 5 is written V in the Roman system and 101 in a system with base 2 (binary system).</Paragraph>
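    <Paragraph> As a small illustration of one number written in different mathematical systems, the following sketch converts a decimal integer to another base and to Roman figures. It is not part of the original study: the function names to_base and to_roman are our own, bases up to 10 are assumed, and standard subtractive Roman notation is assumed.

```python
def to_base(n, base):
    """Return the digits of a non-negative integer n in the given base (2-10)."""
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(str(remainder))
    return "".join(reversed(digits)) or "0"


# Value/symbol pairs for standard subtractive Roman notation (an assumption).
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n):
    """Return the Roman figures for a positive integer n."""
    out = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)
```

With these definitions to_base(5, 2) gives 101 and to_roman(5) gives V, the two renderings mentioned above.</Paragraph>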
    <Paragraph position="2"> We will mainly be interested in the relations between the representations of the decimal system and Swedish numerals. Numerals in some other languages, such as English, German, French, Danish, Burmese, Hausa and Urdu, will be touched upon briefly. In particular we are interested in rule systems (algorithms) that automatically convert mathematical representations into linguistic representations (numerals) or vice versa. Our interest in this area is part of a wider interest in Automatic Text Comprehension (see footnote 1), which in turn is part of the areas of Automatic Language Translation and Artificial Intelligence. The study focuses on problems that are due to the structure of numerals in (natural) languages. The technical problems that turn up when conversion rules are to be implemented on a computer are not treated in this exploratory study.</Paragraph>
    <Paragraph position="3"> The practical applications of rule systems that convert mathematical representations into numerals or vice versa are automatic devices or robots of various types, e.g. the "automatic stock market announcer", the "automatic cashier", the "automatic accountant", the [Footnote 1: This work is related to the project ATC (Automatic Text Comprehension) of the Institute of Linguistics, University of Stockholm, which is supported by Humanistiska Forskningsrådet. I am indebted to C. W. Welin, who is working on that project, for valuable comments.]</Paragraph>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> "automatic telephone directory". The automatic telephone directory would give you the number if you furnish the name and address of a person. The system would have to identify the person by processes that correspond to optical and manual search in a directory. In the register of the directory the system would find the number and could pronounce it according to rules to be discussed later.</Paragraph>
    <Paragraph position="1"> The automatic directory would require phonetic identification procedures that have not yet been fully developed, at least not for large inventories of possible messages. In societies where numbers play an increasing role (telephone numbers, registration numbers, addresses, social security numbers, bank account numbers, zip codes), systems that can identify and pronounce numbers are becoming increasingly interesting. Current experience suggests that any conversation with robots has to take place within heavy constraints.</Paragraph>
    <Paragraph position="2"> Further examples of machines that could use the rule systems under discussion include warning systems and systems that read the values of meters (temperature, altitude, humidity) aloud. A system that reads the speed of the car aloud or tells you the distance to an approaching crossing etc. in a mild but distinct voice might have certain advantages in difficult traffic situations. Presently almost all warning systems, traffic signs and signals use optical signals. If acoustic signals are used, they are not speech signals, only rings or buzzes. The human voice and the human language may have a certain attraction to human beings. Imagine the "speaking alarm clock" which tells you the time and reads the temperature etc. in an attractive (fe)male voice available as an option. In most of the applications mentioned so far pronunciation has been involved. But systems of a simpler kind, which rewrite mathematical (decimal) number representations as numerals (in letters), would certainly also be useful. Banks and post offices use numbers written both in mathematical and linguistic form, presumably because of the need for redundancy and security. This is at least the case in Sweden. It is required that a number, e.g. 1055, be written with letters as well as with figures. But these are almost our only opportunities for writing high numbers with letters. (Numbers are generally written with letters only if they are below ten or belong to the round numbers (see B. SIGURD, 1972).) Many persons hesitate when they are to write numbers with letters (words). Swedes would for instance hesitate between ett tusen femtiofem, ett-tusen-femtiofem, ett(t)usenfemtiofem, and tusenfemtiofem. Fortunately the banks do not require any special spelling (only that you add Kronor after).</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> There are many previous studies related to the problems under focus here, e.g. CORSTIUS (1968), where many general problems are treated starting from Dutch, and C.-CH. ELERT (1970), where the morphology of Swedish numerals is treated. We have used presentations and analyses of numerals in various languages. But it is only for Swedish that our rules are outlined in any detail. The other languages are treated to indicate what problems one would have to face in an automatic translation system for numerals. A system that translates between mathematical representations and representations in Swedish, English, French, German (and a couple of other languages) will be implemented on a computer in the near future.</Paragraph>
    <Paragraph position="1"> We will work with written forms in this study. Unfortunately such representations cannot simply be run through a speech synthesizer to give beautiful pronunciation. Nor are there at present any speech recognition devices that can render large or infinite inventories of spoken numerals as written ones. A forthcoming study by DE SERPA-LEITÃO will shed light on the phonetic problems. Till then this examination of some of the fundamental problems may be of some importance. Some issues of interest to theoretical linguistics will also turn up.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2. MATHEMATICAL AND LINGUISTIC REPRESENTATIONS OF NUMBERS
</SectionTitle>
    <Paragraph position="0"> Representations of numbers using digits (figures) will be called mathematical representations, while representations using letters (sounds) organized in morphemes, words and phrases like other linguistic material will be called linguistic representations. The number 123 is now written in mathematical representation. The equivalent linguistic representation in Swedish is (ett)hundratjugotre. There are often alternative representations within a language. The number 123 could thus also be rendered: ett-två-tre, tolv-tre, or ett-tjugotre. It is above all in technical contexts (telegraphy, telephony) that the latter types are used. The difference lies in how the row of digits is divided. In the first (normal) case the whole row of digits is taken as a unit and the highest numeral (position word) offered by the language (hundra) is used, giving: (ett)hundratjugotre. In the other (technical) cases the series of digits is divided into smaller groups which are treated separately. In the extreme case each digit is treated separately.</Paragraph>
    <Paragraph position="1"> We will call the different divisions of the group of digits different fusion. We can show the differences as follows: 123, 12 + 3, 1 + 23,</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> 1 + 2 + 3. The first case implies total fusion, since the whole series of digits is taken into consideration. The last case implies no fusion; cases in between may be characterized as partial fusion. We might call the case where the digits are treated separately spelling, since this is the word used when a word is decomposed into letters and each letter is spoken separately. In Swedish we could devise a new word siffrering ("digitalization") as an equivalent of bokstavering and stavning, which mean rendering each letter by its name. Spelling (bokstavering), e.g.</Paragraph>
    <Paragraph position="1"> rendering the word bad as be-a-de, is often used for names and other words with low redundancy. Military forces would use special letter names with more redundancy, in Swedish for bad: Bertil, Adam, David.</Paragraph>
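    <Paragraph> The different divisions of a row of digits (123, 12 + 3, 1 + 23, 1 + 2 + 3) can be enumerated mechanically. The following is a minimal sketch, not taken from the study; the names divisions and fusion_type are hypothetical. Each of the gaps between neighbouring digits is either cut or left uncut, so a row of n digits has 2 to the power n - 1 divisions.

```python
from itertools import product


def divisions(digits):
    """Yield every way of cutting a digit string into consecutive groups."""
    gaps = len(digits) - 1
    for cuts in product([False, True], repeat=gaps):
        groups, start = [], 0
        # cuts[i - 1] says whether the string is cut just before index i.
        for i, cut in enumerate(cuts, start=1):
            if cut:
                groups.append(digits[start:i])
                start = i
        groups.append(digits[start:])
        yield groups


def fusion_type(groups):
    """Name the degree of fusion of one division, in the terms used above."""
    if len(groups) == 1:
        return "total fusion"
    if all(len(group) == 1 for group in groups):
        return "no fusion"
    return "partial fusion"
```

For the string 123 the generator yields the four divisions in the order listed in the text, and fusion_type labels the first as total fusion, the last as no fusion and the two in between as partial fusion.</Paragraph>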
    <Paragraph position="2"> The following is a table showing mathematical and linguistic representations of numbers. The linguistic representations are Swedish numerals according to different fusion. We are mainly interested in the relations between decimal representations and linguistic ones, but for some lower numbers binary and Roman representations are given.</Paragraph>
    <Paragraph position="3"> We will touch on the problem of pronouncing binary representations briefly. The relations between Roman mathematical representations and Latin numerals are interesting but very complicated, and we will not suggest any conversion rules for those representations here.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> The fundamental difference between fused and non-fused expressions is the presence of position words: -tio(-ton), hundra, tusen, million ... in fused forms. The etymology of such words might be far from the decimal system. It is well known that there often are rival systems, above all the 20-system (vigesimal), the 12-system (cf. dussin, gross) and the 5-system (quinary). Many ways of grouping items may seem natural.</Paragraph>
    <Paragraph position="1"> There are no position words associated with the positions of binary representations, but considering the popularity of the binary system in the computer age one might suggest some, e.g. duo (or pair) for position 2 (counting from the right), quartet for position 3 (2²) and octet for position 4 (2³). Using these position words (group names) the number 11 (eleven), which is 1011 in binary form, would be: (one) octet one duo one. We might alternatively construct new position words for the positions of binary numbers. One way would be to use the decimal words tio, hundra etc., changing them, by substituting b for the first letter, into bi(o), bundra, busen, billion, billiard (!). In this system 12 (twelve), which is 1100 in binary form, would be: busenbundra. Since binary representation tends to get very long, an enormous number of position words would be needed. We refrain from developing binary numerals further.</Paragraph>
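    <Paragraph> The suggested group names can be tried out with a few lines of code. This is only a sketch under stated assumptions: the function name binary_numeral is hypothetical, octet is taken as the word for position 4 (value 8), which the example for eleven requires, and nothing beyond four binary positions is attempted.

```python
# Position words suggested in the text, indexed by position counted
# from the right; position 1 (the units) has no word.
BINARY_WORDS = {2: "duo", 3: "quartet", 4: "octet"}


def binary_numeral(n):
    """Spell out n (1-15) with the suggested binary position words."""
    bits = format(n, "b")
    if len(bits) > len(BINARY_WORDS) + 1:
        raise ValueError("only positions 1-4 have names in this sketch")
    words = []
    for offset, bit in enumerate(bits):
        position = len(bits) - offset
        if bit == "1":
            words.append("one")
            if position in BINARY_WORDS:
                words.append(BINARY_WORDS[position])
    return " ".join(words)
```

For 11 (binary 1011) this returns one octet one duo one, the form given above with the optional first one kept.</Paragraph>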
    <Paragraph position="2"> The choice of fusion form apparently depends on the use of the number. The number 121457 is a telephone number. Such numbers are often divided into 2-groups, but 3-groups or separate pronunciation of each figure also occur. It is a well-known fact that such high telephone numbers are difficult to get through correctly. In practice one introduces pauses at strategic points to facilitate communication.</Paragraph>
    <Paragraph position="3"> We will not discuss the communicative and mnemonic advantages of different systems here. Let us just mention that there are at least some cases where the totally fused numerals give shorter expressions, e.g. en million compared to ett-noll-noll-noll-noll-noll-noll. The last example with all the zeros (noll) is probably difficult to get through over the telephone, since one might easily lose count. Position words have the advantage of indicating how far to the left we are. They are in a way the equivalents of the zeros used as position fillers in mathematical representations (zeros do not vary according to position). As is obvious from the table above, there are cases where partial or no fusion gives shorter expressions than expressions based on total fusion. It is often said that since there are infinitely many numbers and each number has a name (word, numeral) in the language, there are infinitely many words in the language. If numbers can be infinitely long, words can be infinitely long too.</Paragraph>
    <Paragraph position="4"> Decimal fractions are rarely rendered by words. If written, three-point-fourteen looks strange, and so does the Swedish equivalent with komma instead of point: tre-komma-fjorton. The figures after the decimal point (komma) are generally given separately, in particular if there is a long row of decimal figures. For instance, if the value of pi is to be given by more than 2 decimals one turns to separate pronunciation of the figures immediately: compare 3,14 and 3,14159. For practical purposes the rule that figures after the decimal point (komma) are pronounced without fusion seems sufficient, and we will not discuss the matter further.</Paragraph>
    <Paragraph position="5"> The year 1718 is pronounced sjuttonhundraarton, not ettusensjuhundraarton (just as in English). We will say that total fusion is applied, but not maximum fusion, since the maximum position word (tusen) offered by the language is not used. In Swedish the normal way of pronouncing 1066 would be (ett)tusen-sextiosex, but tiohundrasextiosex might be heard (equivalent to the English ten sixty-six except for the lack of position word in English). Non-maximum fusion is only allowed in (1000)1100-1900. The year 2000 is pronounced år två tusen (not tjugo hundra). The year 2384 is naturally rendered as tvåtusentrehundraåttiofyra. The reason for not allowing non-maximum fusion above 19 has to do with a concept of "primary" numerals. The primary numerals in Swedish (as in many other languages) include (words for) 1-9, the base number 10, and 11-19 (the -teens, in Swedish: -tontalen; the first two are irregular in Swedish as in all the other Germanic languages; further details below).</Paragraph>
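    <Paragraph> The choice between maximum and non-maximum fusion for years can be stated as a small procedure. The sketch below is our own illustration, with a hypothetical function name year_fusion; it assumes years from 1 to 9999 and returns pairs of a coefficient and a position word rather than finished Swedish numerals. Years 1000-1099 are given maximum fusion here, although, as noted above, tiohundra may also be heard.

```python
def year_fusion(year):
    """Split a year (1-9999) into (coefficient, position word) pairs.

    Non-maximum fusion (hundra only) is used when the number of hundreds
    is a primary numeral between 11 and 19; otherwise tusen is used.
    The remainder below one hundred gets no position word (None).
    """
    hundreds, below = divmod(year, 100)
    parts = []
    if hundreds in range(11, 20):
        parts.append((hundreds, "hundra"))
    else:
        thousands, hundreds = divmod(hundreds, 10)
        if thousands:
            parts.append((thousands, "tusen"))
        if hundreds:
            parts.append((hundreds, "hundra"))
    if below:
        parts.append((below, None))
    return parts
```

For 1718 the result pairs 17 with hundra and leaves 18 without a position word (sjuttonhundraarton); for 2384 it pairs 2 with tusen and 3 with hundra, followed by 84.</Paragraph>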
    <Paragraph position="6"> The pronunciation of dates is paralleled by the pronunciation of prices. A price, e.g. 1900, may be rendered as nitton-hundra (nineteen hundred) or ettusen niohundra (one thousand and nine hundred). The price 2300 (Kronor) may only be pronounced as två tusen tre hundra (not tjugotrehundra). There is, however, an interesting alternative way of pronouncing prices. 2300 could be given as två-och-tre, 1900 could be given as ett-och-nio. A (used) car could thus have the price 1300, which may be rendered as trettonhundra, ett(t)usentrehundra or ett-och-tre. The contexts (or hidden units) are important. The expression två-och-tre could perhaps mean the same as två millioner trehundratusen (2300000) kronor (två-komma-tre millioner), but not 230 kronor, nor 23 kronor.</Paragraph>
    <Paragraph position="7"> It might mean två kronor och tre öre (2,03 kronor). But ambiguities of this sort are rare in practice. In order to avoid ambiguity one might</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> pronounce the zero in such cases: två-och-noll-tre (with stress on noll).</Paragraph>
    <Paragraph position="1"> Fortunately most prices of the type under discussion are unambiguous, e.g. två-och-tio (2,10), sexton-och-sjuttiofem (16,75).</Paragraph>
    <Paragraph position="2"> Numbers play an extremely important role in modern society. As the set of numbers often is restricted due to context, function etc., reduced ways of expression, or sublanguages, develop. A certain row of figures or numerals may be completely unambiguous among used car dealers, although it is ambiguous from the point of view of the total language, or means something different in another sublanguage. Detailed studies of the use of figures and numerals in different functions, contexts and surroundings would certainly be rewarding.</Paragraph>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3. FROM MATHEMATICAL INTO SWEDISH REPRESENTATION.
GENERAL CONSIDERATIONS
</SectionTitle>
    <Paragraph position="0"> Separate pronunciation of each figure is no great problem, and we will concentrate on total fusion in the following. We begin by presenting the following table where position numbers and position words associated with positions are to be found. The term position word, of course, reflects our looking at the numerals from the point of view of the decimal positional representations. The words could also be called group names or number measures. For reasons of space we have not included any position word higher than milliard.</Paragraph>
    <Paragraph position="1"> [Table, reconstructed from a garbled layout: the position words are milliard (position 10), million (position 7), tusen (position 4), hundra (position 3) and tio/-ton (position 2); the example 666 is given as the decimal 666, as 6·10² + 6·10¹ + 6·10⁰, and as sexhundrasextiosex.] There is no position word for unit in Swedish. One might use stycken (pieces, units), but presumably since a position word for units is redundant it is generally left out. As a name for the whole set Swedes use ental, but it is not possible to use en in the numerals, perhaps because en lacks plural. There is no point in stating that en is there in the underlying form but obligatorily deleted. (The numeral en could then be derived by deleting the numeral en before the position word en, which is an optional rule in many cases: (ett)tusen, (ett)hundra. But this seems unnatural sophistication.) The steps between the higher position words are 10³ (1000). We will see examples of languages with different steps later, such as Urdu. The only genuine position words are the first three. They are also Germanic, but they have not always been associated with the decimal system as is done now. Many Swedes would hesitate when it comes to position words higher than million. Some might also have a feeling that languages use milliard, billion etc. differently (which is also the case). The number 666 is represented in three different ways above. Let us call the normal decimal representation: positional decimal, or just decimal as usual. The representation 6·10² + 6·10¹ + 6·10⁰ may be called: analytic decimal, or just analytic representation, since it analyzes the number into the terms to be added. The terms do not have to be in a certain order or take certain positions. The representation sexhundrasextiosex is called (fused) linguistic representation as before. Our purpose is to derive linguistic representations from decimal ones, but the relations between the analytic (decimal) representation and the linguistic one are also of interest. We will compare generative grammars for the different types of representations briefly.</Paragraph>
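    <Paragraph> The step from the positional decimal to the analytic representation can be made explicit as follows. This is an illustrative sketch only; the names analytic_terms and value are hypothetical, and each term is held as a pair of a digit and a power of 10.

```python
def analytic_terms(decimal):
    """Return (digit, power of 10) pairs for a decimal digit string."""
    length = len(decimal)
    return [(int(digit), length - 1 - index)
            for index, digit in enumerate(decimal)]


def value(terms):
    """Add up the terms; their order does not matter."""
    return sum(digit * 10 ** power for digit, power in terms)
```

For 666 the terms are (6, 2), (6, 1) and (6, 0), and value returns 666 whether the terms are given in this order or reversed, which reflects the remark that the terms need not take certain positions.</Paragraph>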
    <Paragraph position="2"> Decimal representations, the input or output of our conversion rules, may be generated by the following simple generative rules, where N is the number, d is digit. The number of digits in the strings (n) may  be infinite.</Paragraph>
    <Paragraph position="3"> (1) N → d (d) (d) ...</Paragraph>
    <Paragraph position="4"> d → 1, 2, 3, 4, 5, 6, 7, 8, 9, 0. If numbers such as 000 and 007 are not to be permitted, we have to state that the first d must be ≠ 0.</Paragraph>
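    <Paragraph> Rule (1) describes a regular set of strings, so it can be checked with a regular expression. The sketch below is our own; the name is_number and its parameter allow_leading_zero are hypothetical. Note that the restricted variant, read literally, also excludes the single digit 0.

```python
import re

# Rule (1): one digit followed by any number of further digits.
ANY_DIGITS = re.compile(r"[0-9]+")
# Rule (1) with the added condition that the first digit is not 0.
NO_LEADING_ZERO = re.compile(r"[1-9][0-9]*")


def is_number(text, allow_leading_zero=True):
    """Check whether text is generated by rule (1), optionally restricted."""
    pattern = ANY_DIGITS if allow_leading_zero else NO_LEADING_ZERO
    return pattern.fullmatch(text) is not None
```

With the restriction switched on, 007 and 000 are rejected while 856763 is accepted.</Paragraph>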
    <Paragraph position="5"> The analytic decimal representations may be generated by the following rules, where (a) and (b) are variants with different order.  The b-variant can easily be derived from the decimal representation by inserting powers of 10 according to position number. We might perhaps consider the b-variant as a semantic interpretation of the decimal representation. Rules converting decimal representations into analytic ones would then seem related to the interpretative rules suggested</Paragraph>
  </Section>
  <Section position="9" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> in transformational grammar. The analytic representations would be the readings or the meaning of the decimal expressions.</Paragraph>
    <Paragraph position="1"> The "generative semantics" approach would then be to start from the analytic expressions and derive the normal decimal expressions.</Paragraph>
    <Paragraph position="2"> This can be done by deleting the powers of 10 from the b-variant, where the order then carries the information about the powers. The a-variant gives the units first and has to be reversed by a transformation in order to fit the standard decimal representation. Changing the decimal representation by mentioning units first would have some advantages in communication: presently the listener does not know anything about positions if numbers are given by pronouncing each digit separately, as observed by C.-CH. ELERT (1970). Note, however, that the units are given first (in the left-most place) in many languages in the teens (tontalen), and in some languages even in higher numbers, e.g.</Paragraph>
    <Paragraph position="3"> German: ein-und-zwanzig.</Paragraph>
    <Paragraph position="4"> It is, of course, easy to generalize the rules (2) to cover systems with any base and any arrangement of the terms.</Paragraph>
    <Paragraph position="5"> Rules for Swedish numerals have been suggested in C.-CH. ELERT (1970). They are, however, too surface oriented for our purposes. We need "deep structure" rules for the numerals which are as close as possible to the decimal representations. Let us first suggest the following rules, covering numbers as great as millions.</Paragraph>
    <Paragraph position="6"> (3a) N → (d million) (d hundratusen) (d tiotusen) (d tusen) (d hundra) (d tio) (d); d → en, två, tre, fyra, fem, sex, sju, åtta, nio. The coverage of the first rule can be increased by adding brackets with the proper content, or by introducing recursion (e.g. N instead of d before millioner) and substitution rules, such as tusen millioner → milliard, tusen milliarder → billion (etc.). In order to derive surface structures from such deep representations we need the following main types of rules.</Paragraph>
    <Paragraph position="7">  1) Morphophonemic rules, which change sju tio into sjuttio, fyra into fjor before ton (fjorton) etc.</Paragraph>
    <Paragraph position="7"> 2) Reordering rules, which change the order between units and tens in the teens (tontalen). Our deep structure would generate en tio fyra for 14, which has to be changed into fjorton with the units before ton, the representative of tio (the en is deleted according to 3). 3) Deletion rules, which obligatorily delete en before tio and optionally delete ett before hundra and tusen. We have used the compound position words tiotusen and hundratusen for positions 5 and 6. Since the tusen only occurs in those compounds if there is no other tusen to the right in the string, we must delete tusen in other cases.</Paragraph>
    <Paragraph position="8"> 4) Concord rules, which give the plural millioner etc. at proper places and the neuter form ett before neuter nouns such as hundra and barn "child".</Paragraph>
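    <Paragraph> The deep structure rule for numbers up to the millions can be sketched as below, before any of the four types of surface rules has applied. The code is an illustration under our own assumptions: the name deep_structure is hypothetical, empty positions (zeros) are simply skipped, and the output is a list of terms, each a simple numeral followed by its position word.

```python
# Simple numerals d, as listed in the rule.
DIGIT_WORDS = dict(zip("123456789",
                       "en två tre fyra fem sex sju åtta nio".split()))
# Position words for positions 7 down to 1; the units have none.
POSITION_WORDS = ["million", "hundratusen", "tiotusen", "tusen",
                  "hundra", "tio", ""]


def deep_structure(n):
    """Return the deep structure terms for n (1 to 9 999 999)."""
    if n not in range(1, 10 ** 7):
        raise ValueError("the rule covers 1 to 9 999 999 only")
    digits = str(n).zfill(len(POSITION_WORDS))
    terms = []
    for digit, word in zip(digits, POSITION_WORDS):
        if digit != "0":
            terms.append((DIGIT_WORDS[digit] + " " + word).strip())
    return terms
```

For 14 the result is the two terms en tio and fyra, i.e. the string en tio fyra that the reordering, deletion and morphophonemic rules must still turn into fjorton.</Paragraph>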
    <Paragraph position="10"> It is possible to improve the deep structure rules suggested in various ways. Arguments could also be found in the pronunciation and orthographic rendering of the numerals. The following rules are established with such arguments in view.</Paragraph>
    <Paragraph position="12"> d → en, två, tre, fyra, fem, sex, sju, åtta, nio. In this approach hundreds, tens and units are treated as a group (constituent) which may occur alone, before tusen, million(er), milliard(er) etc. The difference between rules (3a) and (3b) is perhaps best shown by the tree diagrams for an example, such as sex millioner sex hundra sex tio sex tusen sex hundra sex tio sex (Fig. 1).</Paragraph>
    <Paragraph position="13"> The following arguments support analysis according to 3b (cf. also C.-CH. ELERT, 1970).</Paragraph>
    <Paragraph position="14"> 1) Position words are spaced by 10³ in Swedish (not necessarily in all languages) and for positions in between the fundamental group position words (hundra, tio) are used.</Paragraph>
    <Paragraph position="15"> 2) It corresponds to the division into 3-groups often made in decimal representations (666 666 666). In Swedish a space is used; in English a comma is available for the task, since the point is used in decimal fractions.</Paragraph>
    <Paragraph position="16"> 3) Although the phonetic details are by no means clear, it seems the proper rules can be formulated easily within this framework. The main rule seems to be the following: the last (rightmost) d (simple number) within the group (G) is always given the main stress. The others in the group are reduced accordingly. The G could perhaps best be treated as an attributive (Adjective Determiner) within an NP. The N within the NP is tusen, millioner, etc. (As observed above, a head for the last group is lacking.) In high numbers several NP's are coordinated.</Paragraph>
  </Section>
  <Section position="10" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="2"> [Figure 1: tree diagrams not recoverable from the extracted text. Terminal string: d million d hundra d tio d tusen d hundra d tio d = sex millioner sex hundra sex tio sex tusen sex hundra sex tio sex.]</Paragraph>
    <Paragraph position="4"> Fig. 1. Deep structure trees for the number 6 666 666: sexmillioner sexhundrasextiosextusen sexhundrasextiosex according to two different solutions (3a and 3b).</Paragraph>
    <Paragraph position="5"> The constituents within G may also be considered as coordinated NP's. Some languages show the conjunction (and) between certain NP's. The phonetic properties of numerals will be treated in greater detail in a study by DE SERPA-LEITÃO (to appear).</Paragraph>
    <Paragraph position="6"> 4) The orthographic conventions for numerals are by no means clear. The parts tio/ton are treated (phonologically) as suffixes and always joined with the preceding number. The units are similarly joined with a preceding term tio (sextiosex), perhaps also with preceding hundra (sexhundrasex). As for higher terms it is hard to know: cases are difficult to come by. (Sexmillionersex or sex millioner sex?) The simple numeral</Paragraph>
  </Section>
  <Section position="11" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> before hundra (the coefficient) may or may not be joined with hundra (sex hundra or sexhundra). The tens may or may not be joined with the preceding position word: sexhundra sextio or sexhundrasextio. In brief, the situation is unclear and detailed studies of usage are needed. Handbooks are conspicuously silent.</Paragraph>
    <Paragraph position="1"> Many persons probably join all numerals and write them as one long word as a radical solution in this situation. If conventions are to be introduced, they could easily be formulated within the tree structure suggested by (3b). A simple rule would be the following: join all words within the group G, but no other elements. This rule would introduce spaces where there are deep branchings in the tree. We would get the following division for our example: sex millioner sexhundrasextiosex tusen sexhundrasextiosex. But it is possible to formulate other conventions if desired. Some languages use the hyphen (-) between certain constituents, as we will see.</Paragraph>
    <Paragraph position="2"> The relations between analytic representations and the deep structures described by rules 3a and 3b are fairly simple. Conversion rules operating on analytic representations would have to replace the powers of 10 by proper position words, replace the digits by proper simple numerals etc. Taking the non-analytic decimal representation as input to conversion rules implies inserting the proper position words, but except for that there is little difference. We will now focus our interest on automatic conversion schemes.</Paragraph>
  </Section>
  <Section position="12" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4. CONVERSION RULES CHANGING MATHEMATICAL INTO SWEDISH
REPRESENTATIONS
</SectionTitle>
    <Paragraph position="0"> We are now ready to test conversion rules written as generative rules or instructions of different kinds. We will discuss alternative solutions at some points. We distinguish between two blocks of rules: I) fusion rules, which insert position words and introduce syntactic structure; II) lexical rules, which substitute words (simple numerals) for figures (digits).</Paragraph>
    <Paragraph position="1"> Ia) Fusion rules.</Paragraph>
    <Paragraph position="2"> Count number of digits in decimal representation and mark positions. The (maximum) number of positions (digits) is determined by the position of the left-most digit which is not zero (≠ 0).</Paragraph>
  </Section>
  <Section position="13" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> milliarder after pos. 10; milliard if 1 in pos. 10 and higher positions empty.</Paragraph>
    <Paragraph position="1"> millioner after pos. 7; million if 1 in pos. 7 and higher positions empty.</Paragraph>
    <Paragraph position="2"> tusen after pos. 4.</Paragraph>
    <Paragraph position="3"> hundra after pos. 3, if 1 in pos. 4 and not 0 in pos. 3; else tusen after pos. 4.</Paragraph>
    <Paragraph position="4"> hundra after pos. 3.</Paragraph>
    <Paragraph position="5"> + tio after pos. 2. Apply the rules until all groups of digits have been dissolved (interspersed by position words). Determine positions anew for each group of digits.</Paragraph>
    <Paragraph position="7"> Normally the maximum position word is inserted, but as mentioned before prices and years may be different, if between 1100-1999, which is taken into account in the rules. Instead of recursive application of the rules, compound position words such as tiotusen, hundratusen could be inserted (under certain conditions).</Paragraph>
    <Paragraph position="8"> As an alternative we might use the following procedure, which has some advantages when rules of pronunciation and rules for separating and combining numerals are to be formulated. It is based on deep structure rules 3b (above) which assumes the division of high numbers into 3-groups (in Swedish often marked by space already).</Paragraph>
    <Paragraph position="10"> Apply the rules in order, repeatedly if necessary, until all digits have been processed.</Paragraph>
    <Paragraph position="11"> As observed before, the elements tio/ton are always joined with the preceding number and treated as suffixes (-ton numbers take the grave accent). This phenomenon is related to the lengthening of t in tio/ton (sjutton, sjuttio, nitton etc.). We have treated this as allomorphic choice (not by phonological rules), assuming allomorphs of tre, sju, nio ending in t. In a traditional generative grammar (meeting demands of economy and simplicity) the doubling of t should be handled by the same rule (dental gemination) that gives bott from bo + t (participle of bo) and blått from blå + t (neuter of blå). We have introduced + tio and + ton in our rules, with the plus to be used (as internal juncture) for special purposes. There might be better ways of handling the situation; it depends on how pronunciation and orthographic conventions are to be handled. We have some use for the plus sign, but have to delete it in some cases. In the rule that deletes en (ett) before tio (to avoid the impossible ett-tio) we also delete the plus sign. This gives the resultant tio the right status, and it is not treated as a suffix in e.g. hundratio. Notice that the o in tio can only be deleted (optionally) in the suffix tio (femti(o), sjutti(o)), not in cases such as hundratio where tio is stressed. The fusion rules do what human beings do when they face a string of digits. System (Ia) corresponds roughly to a procedure where the person counts the number of digits to get an idea of how many tens, hundreds, thousands, ten thousands, hundred thousands, millions etc. are in the number, i.e. its order of magnitude. Fusion rules Ib correspond to what is done by a person who divides the string of digits into 3-groups (if this is not done already) and counts the number of 3-groups to find out whether the number is in the thousands, millions, milliards etc. Estimation of numerousness is easily handled at a glance if the number is five or less. Longer strings need counting.
Division into groups of three digits is a practical method, which facilitates determination of the size of the number (see B. SIGURD, 1972).</Paragraph>
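The grouping procedure described in the preceding paragraph can be illustrated with a minimal Python sketch. It is not the paper's rule system: the names three_groups, magnitude_word and GROUP_WORDS are hypothetical, and the Swedish position words miljon and miljard are assumed here on the basis of the text's mention of millions and milliards. The sketch covers at most four 3-groups.

```python
# Assumed Swedish position words for 3-groups, lowest group first (hypothetical table).
GROUP_WORDS = ["", "tusen", "miljon", "miljard"]

def three_groups(digits: str) -> list[str]:
    """Split a digit string into groups of three, counting from the right."""
    groups = []
    end = len(digits)
    while end > 0:
        start = max(0, end - 3)
        groups.insert(0, digits[start:end])
        end = start
    return groups

def magnitude_word(digits: str) -> str:
    """Count the 3-groups to find the position word of the leading group."""
    return GROUP_WORDS[len(three_groups(digits)) - 1]
```

With these definitions, three_groups("189200000") yields the groups 189, 200 and 000, and counting three groups places the number in the millions.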
  </Section>
  <Section position="14" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5. FROM SWEDISH INTO MATHEMATICAL REPRESENTATION
</SectionTitle>
    <Paragraph position="0"> Let us now outline a system which is the reverse of the previous conversion system. Again we distinguish between two blocks of rules: I) lexical rules, which identify numerals and substitute digits for them; II) fusion rules, which identify the position words tusen, hundra etc. and delete these words, using their information to place the digits in proper positions.</Paragraph>
    <Paragraph position="1"> We might call these rules positioning rules. In addition a zero insertion rule is needed. Zeros must be inserted at all empty places in the decimal representation. We will apply the lexical rules first, but the reversed order or mixed order may have advantages.</Paragraph>
    <Paragraph position="3"> Apply the rules in order until all substitutable items have been changed.</Paragraph>
    <Paragraph position="4"> The examples show, within brackets, previous or following applications of rules.</Paragraph>
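The three steps named in this section (lexical substitution, positioning, zero insertion) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the rule system of the paper: the tables UNITS, TEENS and TENS and the functions group_to_digits and swedish_to_decimal are hypothetical, the input is assumed to be already split into space-separated words, and only numbers below one million are covered.

```python
# Hypothetical lexical tables: numeral words and the digits substituted for them.
UNITS = {"en": 1, "ett": 1, "två": 2, "tre": 3, "fyra": 4, "fem": 5,
         "sex": 6, "sju": 7, "åtta": 8, "nio": 9}
TEENS = {"tio": 10, "elva": 11, "tolv": 12, "tretton": 13, "fjorton": 14,
         "femton": 15, "sexton": 16, "sjutton": 17, "arton": 18, "nitton": 19}
TENS = {"tjugo": 2, "trettio": 3, "fyrtio": 4, "femtio": 5, "sextio": 6,
        "sjuttio": 7, "åttio": 8, "nittio": 9}

def group_to_digits(words):
    """Place digits in a hundreds-tens-units frame, then insert zeros."""
    frame = [None, None, None]
    for w in words:
        if w in UNITS:
            frame[2] = UNITS[w]
        elif w == "hundra":
            # Positioning: the unit before hundra moves to the hundreds place;
            # a bare hundra is read as 1 in that place.
            frame[0] = frame[2] if frame[2] is not None else 1
            frame[2] = None
        elif w in TENS:
            frame[1] = TENS[w]
        elif w in TEENS:
            frame[1], frame[2] = divmod(TEENS[w], 10)
        else:
            raise ValueError(f"unknown numeral word: {w}")
    # Zero insertion: every empty place becomes 0.
    return "".join(str(d if d is not None else 0) for d in frame)

def swedish_to_decimal(text):
    """Convert space-separated Swedish numeral words to a decimal string."""
    words = text.split()
    if "tusen" in words:
        i = words.index("tusen")
        high = group_to_digits(words[:i]) if i > 0 else "001"
        low = group_to_digits(words[i + 1:])
        return str(int(high + low))  # int() strips leading zeros
    return str(int(group_to_digits(words)))
```

The frame makes the zero insertion rule explicit: for sju tusen the lower group is empty, so all three of its places are filled with zeros.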
  </Section>
  <Section position="15" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> position of the groups and the position of the digits within the groups in two steps.</Paragraph>
    <Paragraph position="1"> We will not dwell on the details any more. Implementation on a computer will determine which analysis is preferable, taking the properties of the computer, programming language, storage etc. into account.</Paragraph>
  </Section>
  <Section position="16" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6. OUTLINES OF CONVERSION RULES FOR SOME LANGUAGES
</SectionTitle>
    <Paragraph position="0"> 6.1. English.</Paragraph>
    <Paragraph position="1"> Decimal representations could be considered as an interlingua in an automatic translation system between several languages. Translation between two related languages, such as Swedish and English (or all the Germanic languages) could be handled without using decimal representations, since word-for-word translation (and a few additional rules) would suffice. This will be obvious from the following examination, but we will not discuss such conversion rules for pairs of natural languages.</Paragraph>
    <Paragraph position="2"> I) Fusion rules for English could operate just as in Swedish.</Paragraph>
    <Paragraph position="3"> Within 3-groups we assume that hundred is inserted after position 3, + ty after pos. 2, etc.</Paragraph>
    <Paragraph position="4"> II. Lexical rules.</Paragraph>
  </Section>
  <Section position="17" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> English has the same complication in the position word for position 2 as Swedish (and the other Germanic languages). In Swedish the elements tio (alone), ti(o) in e.g. femtio and ton in e.g. femton are assumed to be manifestations of the same position (word). Intuitively most Swedes certainly associate the first two (tio and -ti(o)). In English the manifestations ten and -teen seem the closest.</Paragraph>
    <Paragraph position="1"> English has a hyphen between units and preceding tens (twenty-five; twenty-one thousand). This may be handled by inserting a hyphen before units (the last position) in 3-groups, if tens are preceding. Similarly and must be inserted before tens or units preceded by hundreds, thousands etc. (250630: two hundred and fifty thousand six hundred and thirty).</Paragraph>
    <Paragraph position="2"> Notice that one/a before hundred, thousand and higher words must not be deleted in English. We refrain from giving any rules for the choice between one and a.</Paragraph>
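The hyphen and and insertions just described, together with the retention of one before hundred and thousand, can be illustrated with a short Python sketch. It is a simplified illustration, not the paper's rule block: the tables ONES and TENS and the functions group_to_words and english_numeral are hypothetical, the range is restricted to 1 through 999999, and the choice between one and a is ignored (one is always used).

```python
# Hypothetical lexical tables for English.
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def group_to_words(n):
    """Render one 3-group (1..999)."""
    hundreds, rest = divmod(n, 100)
    parts = []
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")  # 'one' is kept before hundred
    if rest:
        if rest >= 20:
            tens, units = divmod(rest, 10)
            # hyphen before units when tens precede
            parts.append(TENS[tens] + ("-" + ONES[units] if units else ""))
        else:
            parts.append(ONES[rest])
    # 'and' before tens or units preceded by hundreds
    return " and ".join(parts)

def english_numeral(n):
    """Render 1..999999 as an English numeral."""
    thousands, rest = divmod(n, 1000)
    parts = []
    if thousands:
        parts.append(group_to_words(thousands) + " thousand")
    if rest:
        low = group_to_words(rest)
        # 'and' also links a bare tens/units group to preceding thousands
        if thousands and rest // 100 == 0:
            low = "and " + low
        parts.append(low)
    return " ".join(parts)
```

Applied to the example in the text, english_numeral(250630) gives two hundred and fifty thousand six hundred and thirty.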
    <Paragraph position="3"> English is known to have special habits in telephone numbers, e.g.</Paragraph>
    <Paragraph position="4"> the use of o (ou) for zero and double x, as in: five ou double one for 5011. Such idiosyncrasies must be taken into account when constructing automatic recognition systems.</Paragraph>
    <Paragraph position="5">  The position words in German are million, tausend, hundert, -zig/-zehn. German shows perfect similarity between the number for 10 and the suffix in the teens (zehn). There is little allomorphic variation. We note the exceptions for 11, 12. We take zwan as an allomorph of zwei, just as we took twen as an allomorph of two in English. One might go further and consider zwö in German, twe in English, to in Swedish as allomorphs</Paragraph>
  </Section>
  <Section position="18" start_page="0" end_page="29" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> in zwölf, twelve, tolv respectively. If so, elf, eleven, elva may be analyzed as containing a variant of the word for 1 and something like -l(e)v-. This is etymologically correct (the last root being related to Swedish lämna, English leave). But both intuitively and technically this seems unnecessary.</Paragraph>
    <Paragraph position="1"> The conjunction und is obligatory in German between units and tens (the units preceding the tens), optional between hundreds and lower terms. The unit word ein may optionally be deleted before hundert and tausend. It is easy to imagine the rules for German and we refrain from giving further details.</Paragraph>
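As an illustration of the obligatory und with units preceding tens, here is a minimal Python sketch for the numbers 1 through 99. It is not taken from the paper: the tables UNITS, TEENS and TENS and the function german_below_hundred are hypothetical names, and the forms are listed as whole words rather than derived from the position words -zig/-zehn.

```python
# Hypothetical lexical tables for German.
UNITS = ["", "ein", "zwei", "drei", "vier", "fünf", "sechs", "sieben",
         "acht", "neun"]
TEENS = {10: "zehn", 11: "elf", 12: "zwölf", 13: "dreizehn", 14: "vierzehn",
         15: "fünfzehn", 16: "sechzehn", 17: "siebzehn", 18: "achtzehn",
         19: "neunzehn"}
TENS = ["", "", "zwanzig", "dreißig", "vierzig", "fünfzig", "sechzig",
        "siebzig", "achtzig", "neunzig"]

def german_below_hundred(n):
    """Render 1..99; units precede tens and are joined to them by 'und'."""
    if n in TEENS:
        return TEENS[n]
    tens, units = divmod(n, 10)
    if tens == 0:
        return "eins" if units == 1 else UNITS[units]  # free-standing 1 is 'eins'
    if units == 0:
        return TENS[tens]
    return UNITS[units] + "und" + TENS[tens]
```

For 21 the sketch produces einundzwanzig, with the unit word ein before the tens.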
    <Paragraph position="2">  The hyphen is used extensively between tens and units and between the elements in vigesimal numerals (e.g. quatre-vingt-dix). A connective et is used, but only before un after the numerals for: 20, 30, 40, 50, 60, 70. French has a morpheme ante for tens, but certain features must be associated with a vigesimal system. In the numbers for 11-16 one might distinguish a morpheme ze, which is in fact etymologically related to dix (E. BOURCIEZ, 1950, p. 68). French has thus three completely different manifestations of the position element for ten: ante, ze, dix, distributed somewhat irregularly.</Paragraph>
    <Paragraph position="3"> We give an outline of the conversion rules for French.</Paragraph>
    <Paragraph position="4">  There is great morphophonemic variation in these numerals as a result of the operation of various sound laws. In languages where the numerals are based on a different non-decimal system, such as the vigesimal or quinary, one might use a kind of adaptive rules between the decimal and the vigesimal, quinary etc. representations, which are suitable as underlying forms for the numerals. Since French only shows some fragments of a different system this is hardly necessary for French. The first block of rules within the lexical block can take care of irregularities. These rules apply to more than one item.</Paragraph>
    <Paragraph position="5">  Danish uses the conjunction og (and) between units and tens and between tens and hundreds. Units precede tens. The lexical rules would have to include the following:  It is tempting to do something more insightful to catch the vigesimal character of the later Danish numerals. Such rules should include the following facts. For 50, 60, 70, 80, 90 (y) the number of multiples of 20 (x) is determined, using the simple equation (4) y = x · 20.</Paragraph>
    <Paragraph position="6"> The resulting value of x is rendered by the usual numerals (with some minor morphophonemic changes), unless it is a fraction, such as 2.5, 3.5, 4.5. A fraction such as 2.5 is not rendered by to (og en) halv, but by halvtre. If z is the value before halv we can apply the following rule: (5) z halv → halv (z + 1). This rule is also used e.g. in telling the time in Swedish. Instead of två och en halv Swedes say halv tre, when it is half past two. What the rule does is to change an additive representation into a subtractive one.</Paragraph>
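Equation (4) and rule (5) can be worked through in a small Python sketch. The function names vigesimal_multiplier and render_multiplier are hypothetical, and the output is an abstract representation (a tuple), not a Danish word form; the reduction rules that yield the actual short forms are not attempted here.

```python
from fractions import Fraction

def vigesimal_multiplier(y):
    """Solve equation (4), y = x * 20, for the multiplier x."""
    return Fraction(y, 20)

def render_multiplier(x):
    """Apply rule (5): a multiplier z + 1/2 is rendered as ('halv', z + 1)."""
    z, frac = divmod(x, 1)
    if frac == 0:
        return (int(z),)            # whole multiplier: usual numeral
    if frac == Fraction(1, 2):
        return ("halv", int(z) + 1)  # additive z + 1/2 becomes subtractive
    raise ValueError("only whole and half multipliers are covered")
```

For y = 50 the multiplier is 2.5, which the rule turns into halv 3; for y = 60 the multiplier is simply 3. The same rule applied to 2.5 hours gives the Swedish halv tre.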
  </Section>
  <Section position="19" start_page="29" end_page="29" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> The multiplication sign is rendered by sinds in the Danish numerals and special reduction rules apply to give the short forms. Fortunately new numerals (more like the Swedish) are catching on in Denmark.</Paragraph>
    <Paragraph position="1">  The signs ' , : and . mark word tones. An aspirated voiceless stop or fricative changes into an unaspirated voiced stop or fricative. This rule explains the variation between hse : ze for tens, and between hku : gu for units.</Paragraph>
    <Paragraph position="2"> Burmese numerals are well adapted to the decimal system. Our limited knowledge of Burmese makes us hesitate, but Burmese numerals seem very simple. They look almost as if they were constructed at the writing-table. The morphological analysis of the words is evident; there are no exceptions, such as our elva and tolv. The morphemes for 1 and 2 have morphophonemic variation (special forms for compounding), but except for that there is little morphophonemic variation. The correspondences are direct, and conversion rules should be simple to write. We will only make some further comments.</Paragraph>
    <Paragraph position="3"> Burmese seems to have (unanalysable, non-compound) position words for high numbers, such as: 10 000, 100 000, 1 000 000, 10 000 000 (unless we are mistaken). Burmese also seems to have a word for unit: hku. In Burmese the unit before 10 in the expression for 10 may optionally</Paragraph>
  </Section>
  <Section position="20" start_page="29" end_page="29" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> be deleted. We have noted the obligatory deletion of 1 before 10 in several languages and the optional deletion of 1 before 100, 1000 in some languages. Burmese uses hse for + ty as well as + teen (and ten).</Paragraph>
    <Paragraph position="1"> This has the effect that some words, such as the words for 18 and 80, are minimal pairs: hse.hyi (18) : hyi hse (80), at least from a segmental point of view. We do not have such minimal pairs in Swedish because 10 is represented by ton after the units in the teens, and by ti(o) before the units in the other cases. We note that traditional morphemic analysis cannot reasonably treat ton and ti(o) as allomorphs in Swedish, since they contrast as in femtio : femton (cf. C.-CH. ELERT, 1970, p. 154).</Paragraph>
    <Paragraph position="2">  It is clear from this table that the Hausa numbers contain various complications from our point of view. The morpheme for 10 (goma) may optionally appear in the teens. If it is deleted, it seems sha would have to take its place and be considered as a variant (manifestation) of 10. We cannot identify any simple numerals in the words for 20, 30 etc. We might take in as a representative of 10, but the rest has to be treated as a suppletive allomorph of 2, 3, 4 etc. We note that the multiplier (coefficient) is placed after its position word in numbers such as 2000, 100 000. (Do attributives follow their heads in Hausa?) Note that, as a consequence, the only difference between the words for 100000 and 1100 is the word da. Our knowledge of Hausa is restricted and we will not proceed any further in this analysis.</Paragraph>
  </Section>
</Paper>