<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2017"> <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics. Analysis and Synthesis of the Distribution of Consonants over Languages: A Complex Network Approach</Title> <Section position="3" start_page="0" end_page="128" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Sound systems of the world's languages show remarkable regularities. Not every arbitrary set of consonants and vowels can make up the sound system of a language. Several lines of research suggest that cross-linguistic similarities are reflected in the consonant and vowel inventories of languages all over the world (Greenberg, 1966; Pinker, 1994; Ladefoged and Maddieson, 1996). It has previously been argued that these similarities result from certain general principles such as maximal perceptual contrast (Lindblom and Maddieson, 1988), feature economy (Martinet, 1968; Boersma, 1998; Clements, 2004) and robustness (Jakobson and Halle, 1956; Chomsky and Halle, 1968). Maximal perceptual contrast between the phonemes of a language is desirable for proper perception in a noisy environment. 
In fact, the organization of the vowel inventories across languages has been satisfactorily explained in terms of the single principle of maximal perceptual contrast (Jakobson, 1941; Wang, 1968).</Paragraph> <Paragraph position="1"> There have been several attempts since the 1930s to explain the observed patterns in consonant inventories (Trubetzkoy, 1969/1939; Lindblom and Maddieson, 1988; Boersma, 1998; Flemming, 2002; Clements, 2004), but unlike the case of vowels, the structure of consonant inventories lacks a complete and holistic explanation (de Boer, 2000).</Paragraph> <Paragraph position="2"> Most of these works are confined to certain individual principles (Abry, 2003; Hinskens and Weijer, 2003) rather than formulating a general theory that describes the structural patterns and/or their stability. Thus, the structure of the consonant inventories remains a complex jigsaw puzzle, even though its individual pieces are known.</Paragraph> <Paragraph position="3"> In this work we attempt to represent the cross-linguistic similarities that exist in the consonant inventories of the world's languages through a bipartite network named PlaNet (the Phoneme Language Network). PlaNet has two different sets of nodes, one labeled by the languages and the other by the consonants. An edge connects a language node to a consonant node if and only if that consonant occurs in that language. 
This representation is motivated by similar modeling of certain complex phenomena observed in nature and society, such as: * Movie-actor network, where movies and actors constitute the two partitions and an edge between them signifies that a particular actor acted in a particular movie (Ramasco et al., 2004).</Paragraph> <Paragraph position="4"> * Article-author network, where the edges denote which person has authored which articles (Newman, 2001b).</Paragraph> <Paragraph position="5"> * Metabolic network of organisms, where the corresponding partitions are chemical compounds and metabolic reactions. Edges run between partitions depending on whether a particular compound is a substrate or result of a reaction (Jeong et al., 2000).</Paragraph> <Paragraph position="6"> Modeling of complex systems as networks has proved to be a comprehensive and emerging way of capturing the underlying generating mechanism of such systems (for a review of complex networks and their generation, see Albert and Barabási, 2002; Newman, 2003). There have also been attempts to model the intricacies of human languages through complex networks. Word networks based on synonymy (Yook et al., 2001b), co-occurrence (Cancho et al., 2001), and phonemic edit-distance (Vitevitch, 2005) are examples of such attempts. The present work also uses the concept of complex networks to develop a platform for a holistic analysis as well as synthesis of the distribution of the consonants across the languages.</Paragraph> <Paragraph position="7"> In the current work, with the help of PlaNet, we provide a systematic study of certain interesting features of the consonant inventories. An important property that we observe is the two-regime power-law degree distribution of the nodes labeled by the consonants. We try to explain this property in the light of the size of the consonant inventories coupled with the principle of preferential attachment (Barabási and Albert, 1999). 
Next, we present a simplified mathematical model explaining the emergence of the two regimes. In order to support our analytical explanations, we also provide a synthesis model for PlaNet.</Paragraph> <Paragraph position="8"> The rest of the paper is organized into five sections. In section 2 we formally define PlaNet, outline its construction procedure and present some studies on its degree distribution. We dedicate section 3 to stating and explaining the inferences that can be drawn from the degree distribution studies of PlaNet. In section 4 we provide a simplified theoretical explanation of the analytical results obtained. In section 5 we present a synthesis model for PlaNet to support the inferences that we draw in section 3. Finally, we conclude in section 6 by summarizing our contributions, pointing out some of the implications of the current work and indicating possible future directions.</Paragraph> </Section> </Paper>