<?xml version="1.0" standalone="yes"?> <Paper uid="N06-2030"> <Title>Type of Phonography Consonantal Polyconsonantal Alphabetic Core Syllabic Syllabic</Title> <Section position="3" start_page="117" end_page="117" type="intro"> <SectionTitle> 2 Type of Phonography </SectionTitle> <Paragraph position="0"> Type of phonography, as expressed in Sproat's grid, is not a continuous dimension but a discrete choice by graphemes among several different phonographic encodings. These encodings characterize not only the size of the phonological &quot;chunks&quot; encoded by a single grapheme (progressing left-to-right in Figure 1 roughly from small to large), but also whether vowels are explicitly encoded (consonantal and polyconsonantal vs. the rest), and, in the case of vocalic syllabaries, whether codas as well as onsets are encoded (core syllabic vs. syllabic). While we cannot yet discriminate between all of these phonographic aspects (arguably, they are different dimensions, in that a writing system may select a value from each one independently), size itself can be reliably estimated from the number of graphemes in the underlying script, or from this number in combination with the tails of grapheme distributions in representative documents. Figure 2, for example, graphs the frequencies of the grapheme types observed among the first 500 grapheme tokens of one document sampled from an on-line newspaper website in each of eight different writing systems, plus an Egyptian hieroglyphic document from an on-line repository. From left to right, we see the alphabetic and consonantal (small-chunk) scripts, followed by the polyconsonantal Egyptian hieroglyphics, followed by core syllabic Japanese, and then syllabic Chinese.</Paragraph> <Paragraph position="1"> Korean was classified near Japanese because its Unicode representation atomically encodes the multisegment syllabic complexes that characterize most Hangul writing. 
A segmental encoding would appear closer to English.</Paragraph> </Section> </Paper>
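
The grapheme-frequency measurement described for Figure 2, and the remark about Korean, can be illustrated with a minimal Python sketch. This is not the authors' procedure. The function name `grapheme_type_frequencies` is hypothetical, and the sketch assumes that each Unicode letter code point (general category `L*`) counts as one grapheme token, with no case folding. Under that assumption, precomposed Hangul syllables (NFC) count as single graphemes, while their decomposition into jamo (NFD) approximates a segmental encoding.

```python
import unicodedata
from collections import Counter


def grapheme_type_frequencies(text, max_tokens=500):
    """Return rank-ordered frequencies of grapheme types.

    Only the first `max_tokens` grapheme tokens of `text` are counted.
    Assumption (not from the paper): a grapheme token is any single
    Unicode code point whose general category starts with "L" (letter).
    """
    tokens = []
    for ch in text:
        if unicodedata.category(ch).startswith("L"):
            tokens.append(ch)
            if len(tokens) == max_tokens:
                break
    counts = Counter(tokens)
    # Most frequent type first; the tail of this list is the
    # "tail of the grapheme distribution" for the sample.
    return [n for _, n in counts.most_common()]


# Two precomposed Hangul syllables (U+D55C, U+AE00), written as escapes
# so the literal is guaranteed to be in composed (NFC) form.
syllabic = "\ud55c\uae00"
# The same text decomposed into conjoining jamo (a segmental encoding).
segmental = unicodedata.normalize("NFD", syllabic)

syllabic_types = len(grapheme_type_frequencies(syllabic))
segmental_types = len(grapheme_type_frequencies(segmental))
```

On this two-syllable sample the composed form yields one grapheme type per syllable, whereas the decomposed form yields one type per consonant or vowel segment. Over a longer document the composed form would draw on a much larger inventory of types than the decomposed one, which is the contrast the Korean remark describes.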