<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2030">
  <Title>Type of Phonography Consonantal Polyconsonantal Alphabetic Core Syllabic Syllabic</Title>
  <Section position="4" start_page="117" end_page="119" type="metho">
    <SectionTitle>
3 Amount of Logography
</SectionTitle>
    <Paragraph position="0"> Amount of logography is rather more difficult to measure than type of phonography.</Paragraph>
    <Paragraph position="1"> Roughly, logography is the capacity of a writing system to associate the symbols of a script directly with the meanings of specific words rather than indirectly through their pronunciations. No one to our knowledge has proposed any justification for whether logography should be viewed continuously or discretely. Sproat (2000) believes that it is continuous, but acknowledges that this belief is more impressionistic than factual. In addition, it appears, according to Sproat's (2000) discussion, that amount or degree of logography, whatever it is, says something about the relative frequency with which graphemic tokens are used semantically, rather than about the properties of individual graphemes in isolation. English, for example, has a very low degree of logography, but it does have logographic graphemes and graphemes that can be used in a logographic aspect. These include numerals (with or without phonographic complements as in &quot;3rd,&quot; which distinguishes &quot;3&quot; as &quot;three&quot; from &quot;3&quot; as &quot;third&quot;), dollar signs, and arguably some common abbreviations such as &quot;etc.&quot; By contrast, type of phonography predicts a property that holds of every individual grapheme -- with few exceptions (such as symbols for word-initial vowels in CV syllabaries), graphemes in the same writing system are marching to the same drum in their phonographic dimension.</Paragraph>
    <Paragraph position="2"> Another reason that amount of logography is difficult to measure is that it is not entirely independent of the type of phonography. As the size of the phonological units encoded by graphemes increases, at some point a threshold is crossed wherein the unit is about the size of a word or another meaning-bearing unit, such as a bound morpheme. When this happens, the distinction between phonographic and logographic uses of such graphemes becomes a far more intensional one than in alphabetic writing systems such as English, where the boundary is quite clear. Egyptian hieroglyphics are well known for their use of rebus signs, for example, in which highly pictographic graphemes are used not for the concepts denoted by the pictures, but for concepts with words pronounced like the word for the depicted concept. There are very few writing systems indeed where the size of the phonological unit is word-sized and yet the writing system is still mostly phonographic; it could be argued that the distinc- [...] [Figure caption, partially recovered: The symbols are ordered by inverse frequency to separate the heads of the distributions better. The left-to-right order of the heads is as shown in the key.] Nevertheless, one can distinguish pervasive semantic use from pervasive phonographic use. We do not have access to electronically encoded Modern Yi text, so to demonstrate the principle, we will use English text re-encoded so that each &quot;grapheme&quot; in the new encoding represents three consecutive graphemes (breaking at word boundaries) in the underlying natural text. We call this trigraph English, and it has no (intensional) logography. The principle is that, if graphemes are pervasively used in their semantic respect, then they will &quot;clump&quot; semantically just like words do. To measure this clumping, we use sample correlation coefficients.
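Given two random variables, X and Y, their correlation is given by their covariance, normalized by their sample standard deviations: $$\mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{s_X \, s_Y}$$ where $s_X$ and $s_Y$ are the sample standard deviations of X and Y.</Paragraph>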
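The trigraph English re-encoding described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name trigraph_encode is hypothetical, words are assumed to be separated by whitespace, punctuation is left attached to its word, and a leftover chunk at the end of a word is assumed to be kept as a shorter unit (the text does not say how such remainders were handled).

```python
def trigraph_encode(text, size=3):
    """Re-encode text so that each unit covers up to `size` consecutive characters.

    Chunks never span a word boundary, so the final chunk of a word
    may be shorter than `size` (an assumption of this sketch).
    """
    units = []
    for word in text.split():  # assumption: whitespace marks word boundaries
        for start in range(0, len(word), size):
            units.append(word[start:start + size])
    return units


print(trigraph_encode("writing systems"))
# prints ['wri', 'tin', 'g', 'sys', 'tem', 's']
```

Each distinct string in the output would then play the role of one grapheme type in the counts that follow.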
    <Paragraph position="4"> For our purposes, each grapheme type is treated as a variable, and each document represents an observation. Each cell of the matrix of correlation coefficients then tells us the strength of the correlation between two grapheme types. For trigraph English, part of the correlation matrix is shown in Figure 3. Part of the correlation matrix for Mandarin Chinese, which has a very high degree of logography, is shown in Figure 4. For both of the plots in our example, counts for 2500 grapheme types were obtained from 1.63 million tokens of text (for English, trigraphed Brown corpus text; for Chinese, GB5-encoded text from an on-line newspaper).</Paragraph>
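As a rough sketch of the setup just described, the correlation matrix can be built from a table of per-document counts, with one column per grapheme type and one row per document. The function names and the toy counts are hypothetical, and treating a grapheme type with constant counts as uncorrelated with everything is an assumption of this sketch rather than something stated in the text.

```python
import math


def sample_correlation(xs, ys):
    """Sample covariance of xs and ys divided by their sample standard deviations."""
    n = len(xs)  # number of observations (documents); must be at least 2
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant column counts as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(counts):
    """counts[d][g] is the count of grapheme type g in document d."""
    n_types = len(counts[0])
    columns = [[row[g] for row in counts] for g in range(n_types)]
    return [[sample_correlation(a, b) for b in columns] for a in columns]


# Toy data, not from the paper: 4 documents (rows) by 3 grapheme types (columns).
counts = [
    [1, 2, 5],
    [2, 4, 3],
    [3, 6, 4],
    [4, 8, 1],
]
matrix = correlation_matrix(counts)
for row in matrix:
    print([round(value, 3) for value in row])
```

In the toy data the first two types always rise and fall together, so their cell is close to 1, while the third type tends to fall as the first rises and so gets a negative coefficient.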
    <Paragraph position="5"> By adding the absolute values of the correlations over these matrices (normalized for number of graphemes), we obtain a measure of the extent of the correlation. Pervasive semantic clumping, which would be indicative of a high degree of logography, corresponds to a small extent of correlation -- in other words, the correlation is pinpointed at semantically related logograms, rather than smeared over semantically orthogonal phonograms. In our example, these sums were repeated for several 2500-type samples from among the approximately 35,000 types in the trigraph English data, and the approximately 4,500 types in the Mandarin data. The average sum for trigraph English was 302,750, whereas for Mandarin Chinese it was 98,700. Visually, this difference is apparent in that the trigraph English matrix is &quot;brighter&quot; than the Mandarin one. From this we should conclude that Mandarin Chinese has a higher degree of logography than trigraph English.</Paragraph>
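The extent-of-correlation measure can be illustrated with a short sketch. The text does not spell out how the sums are normalized for the number of graphemes; this sketch assumes that matrices of equal size are compared (as with the 2500-type samples), so the raw sums are directly comparable. The function name and both toy matrices are hypothetical and are not values from the paper.

```python
def correlation_extent(matrix):
    """Sum of the absolute values of all cells of a square correlation matrix."""
    return sum(abs(value) for row in matrix for value in row)


# Hypothetical 3-by-3 matrices of the same size.
# "pinpointed": correlation concentrated on one related pair of types.
pinpointed = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# "smeared": moderate correlation spread over every pair of types.
smeared = [
    [1.0, 0.4, -0.5],
    [0.4, 1.0, 0.3],
    [-0.5, 0.3, 1.0],
]

print(correlation_extent(pinpointed))
print(correlation_extent(smeared))
```

Under this reading, the smaller sum for the pinpointed matrix corresponds to the more logographic pattern, in the same direction as the Mandarin and trigraph English averages reported above.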
  </Section>
</Paper>