<?xml version="1.0" standalone="yes"?>
<Paper uid="C69-2101">
  <Title>MONTE CARLO SIMULATION OF LANGUAGE CHANGE IN TIKOPIA &amp; MAORI*</Title>
  <Section position="2" start_page="0" end_page="4" type="metho">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> The use of Monte Carlo simulation with micro socio-linguistic models permits the testing of many hypotheses that are unverifiable by any other known method. The methodology underlying the research described in this paper is outlined and, to some extent, justified in [20-22]. Basically, the technique requires a simulation model with the following subcomponents: a) A stochastic socio-demographic model of a speech community for the starting date of the simulation.</Paragraph>
    <Paragraph position="1"> This model governs the conversational interaction patterns among members.</Paragraph>
    <Paragraph position="2"> b) A metamodel of significant historical events and changes during the simulated time period for use in generating periodic revisions in the basic model mentioned above.</Paragraph>
    <Paragraph position="3"> * Sponsored in part by the National Science Foundation and the Wisconsin Alumni Research Foundation.</Paragraph>
    <Paragraph position="4"> c) Individual models of members of the society, in the form of dynamically modifiable parameters that serve as inputs to the rules of the basic model. The model of each individual also includes one or more grammars that may be filled with generative rules for several languages.</Paragraph>
    <Paragraph position="5"> d) A language learning component, both for children and adults. This module permits the generation and parsing of sentences using rules from the grammars of specified members of the simulation. The learning component makes it possible for a child born during the simulation to acquire the language or languages of his speech community through conversational interaction with other members of the society, and permits an adult either to modify one of his grammars in response to some contemporary linguistic innovation, or to acquire a new language with rules stored in a separate list. The learning component to be used in the system is a greatly improved version of the AUTOLING system [22, 23].* A preliminary test of the simulation method was successfully carried out using a hypothetical speech community containing 15 adults and 5 children [21]. The behavioral model was extremely simple, as were the grammars (limited to a tiny subset of English). The learning model was also simplistic, involving the actual borrowing of full-fledged rules rather than their synthesis from fundamental analytic heuristics. The goal of this test, to attain linguistic and social stability through several generations, was attained. It was important because it demonstrated control of the model as a preliminary to innovations that might introduce linguistic or social change. (The particular simulation used a different kind of phrase structure rule notation than we currently use.) Now our research is directed toward testing the methodology through simulation of language change in a real speech community, in sufficient depth and detail that the predictions of the simulation will be subject to empirical verification. In our preliminary search for a suitable test case we first selected the speech community on the island of Tikopia in the South Pacific.
This community seemed ideal because of the existence of excellent functional ethnological studies by Raymond Firth that took place in 1928-29, 1951 and 1962 [7-11], and because Tikopia was virtually untouched by World War II. Both the pertinent detail of Firth's studies and the relatively restricted and documented foreign contacts during this period suited our work, and we put some effort into designing a simulation system that could handle Tikopian society and yet have a basic generality. Unfortunately, Firth was unable to supply us with his linguistic field notes for Tikopia (little else of a suitable nature exists).</Paragraph>
    <Paragraph position="6"> We then decided to switch to a simulation of language change among the Maori of New Zealand. The documentation for this group is voluminous and covers several centuries. Of particular value is the existence of census data on the Maori dating back to the nineteenth century. The time scale and detail level of the Maori model must be coarser than for Tikopia because of computer time and space demands, for it must account for a population 40 to 90 times greater than that of Tikopia over a time period of perhaps 150 years. However, we found that the design of our simulation system needed little or no modification for the Maori.</Paragraph>
    <Paragraph position="7"> We explicate the representation of both socio-linguistic situations in Section 3 to provide the reader with insight into the methodology.</Paragraph>
  </Section>
  <Section position="3" start_page="4" end_page="25" type="metho">
    <SectionTitle>
2. Language Learning Component
</SectionTitle>
    <Paragraph position="0"> The language learning logic of the AUTOLING System will furnish the basis for the learning component of the simulation system. AUTOLING is an automated linguistic fieldworker capable of learning generative grammars through teletype interaction with a live human informant. The program is operational on the Burroughs 5500 computer*, and has been successfully tested on selected problems in English, Latin, Roglai, Indonesian, Thai and German. The discovery methods are heuristic rather than algorithmic, and the system is under continued modification. One subcomponent is capable of learning context-free phrase structure rules in response to informant inputs consisting of sentences segmented into morphemes. An attempt is made to parse each informant input sentence on the basis of the current tentative grammar. If the rules are adequate, the program reports the fact in a teletype message. If not, it posits rules that might enable the parsing process to be completed.</Paragraph>
    <Paragraph position="1"> These rules, and their more general ramifications for the grammar as a whole, are tested via productions offered to the informant for acceptability verification. Rejected sentences cause the newly posited rules to be discarded.</Paragraph>
    <Paragraph position="2"> * Preliminary programming is in ALGOL for the Burroughs 5500 computer; eventually the program will be shifted to the compatible Burroughs 6500.</Paragraph>
    <Paragraph position="3"> Acceptance of false rules through incomplete testing can occur. At the present time, the program tests for such a possibility by attempting to parse various known illegal sentences. The most recently recorded ones are tested every time a new rule is coined. All illegal sentences are tested at periodic intervals. If the bad rules were coined too far in the past for correction, the program throws out its entire grammar and reanalyses the entire corpus, using the illegal sentence responsible for the situation as one of the key controls on the new grammar. A later version of AUTOLING will make a stronger attempt to determine the specific culprit rules, and take corrective action in the form of transformations or simple context-sensitive phrase structure rules. In fact, eventually, the system will learn a transformational grammar consisting of unordered phrase structure rules plus obligatory transformations that operate whenever conditions permit during the generation process. Also, a morphology learning component will be integrated into the system.</Paragraph>
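    <Paragraph>The check against known illegal sentences can be pictured with a minimal sketch. It is an illustration only, not the AUTOLING code (which was written in ALGOL): the recognizer, the toy grammar, and the names make_recognizer and offending are assumptions introduced here, and the grammar is assumed to contain no empty rules and no unary cycles.</Paragraph>
    <Paragraph><![CDATA[
```python
from functools import lru_cache


def make_recognizer(grammar, start="S"):
    """Return accepts(tokens) for a context-free grammar.

    grammar maps each non-terminal to a list of right-hand sides (tuples).
    Assumption of this sketch: no empty rules and no unary cycles.
    """
    def accepts(tokens):
        tokens = tuple(tokens)

        @lru_cache(maxsize=None)
        def derives(symbol, i, j):
            # Can `symbol` produce exactly tokens[i:j]?
            if symbol not in grammar:  # terminal symbol
                return j == i + 1 and tokens[i] == symbol
            return any(matches(rhs, i, j) for rhs in grammar[symbol])

        @lru_cache(maxsize=None)
        def matches(rhs, i, j):
            # Can the symbol sequence `rhs` produce exactly tokens[i:j]?
            if len(rhs) == 1:
                return derives(rhs[0], i, j)
            # The first symbol takes tokens[i:k]; the rest must cover tokens[k:j].
            return any(derives(rhs[0], i, k) and matches(rhs[1:], k, j)
                       for k in range(i + 1, j - len(rhs) + 2))

        return derives(start, 0, len(tokens))

    return accepts


def offending(grammar, illegal_sentences, start="S"):
    """Known illegal sentences that the grammar wrongly accepts."""
    accepts = make_recognizer(grammar, start)
    return [s for s in illegal_sentences if accepts(s.split())]


# Toy grammar (hypothetical) and a list of sentences the informant rejected.
grammar = {
    "S": [("NP", "VP")],
    "NP": [("the", "N")],
    "VP": [("V",), ("V", "NP")],
    "N": [("dog",), ("cat",)],
    "V": [("sees",), ("sleeps",)],
}
illegal = ["the dog the cat", "dog sees the cat"]

# A newly coined, over-general rule NP -> N lets an illegal sentence through.
candidate = dict(grammar, NP=grammar["NP"] + [("N",)])

print(offending(grammar, illegal))
print(offending(candidate, illegal))
```
]]></Paragraph>
    <Paragraph>In this sketch a non-empty result after a rule is coined would be the signal to discard the rule or, if the fault is older, to reanalyse the corpus.</Paragraph>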
    <Paragraph position="4"> For the simulation system, the human informant is replaced by another grammar associated with another member of the community. While the system will contain only one learning program with its associated parsing and generation routines, each grammar associated with each member of the community might, on various occasions, serve as the grammar in which learning takes place, or as the grammar used to accept or reject the productions of an 'embryonic' grammar. Learning feedback in an adult-adult conversation will not occur as often as in a child-adult context. The exact circumstances under which an individual's grammar learns or teaches are determined by the socio-demographic model.</Paragraph>
    <Paragraph position="5"> Special features that must be added to, or modified in, the AUTOLING system include the following: a. Multilingual Dictionary: For Tikopia, a list of Tikopian, English and Melanesian Pidgin morpheme equivalents. Any individual auditing new lexical items will add a list link in his grammar (which references terminal elements only indirectly) to the appropriate entries. Links to corresponding morphemes (if any exist) in other languages will be entered only if the person has actually been exposed to the form in conversation.</Paragraph>
    <Paragraph position="6"> For specialized vocabulary, the entries will also contain markers of the context in which the item is to be used. b. Sentence Generator: Both the Generator and the Parser use the same grammars. The generator selects non-terminal rewrite rules according to relative frequency parameters that are modified during the parsing process. Terminal elements are referred to by links to the dictionary. Some terminals are selected on the basis of the generation context, i.e., specialized vocabulary referring to items of material culture. Under some conditions, a terminal's translation equivalent in another language may be chosen.</Paragraph>
    <Paragraph position="7"> In the specialized case of normative learning, e.g., in a child-parent relationship, the generator will test newly formed rules by pertinent test productions offered to the normative teacher for acceptance or rejection.</Paragraph>
    <Paragraph position="8">  c. Parser: The parsing component may modify the frequency parameters pertinent to the generation process as a function of a particular rule's use in recent parsings.</Paragraph>
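    <Paragraph>The interplay between generator and parser described in items b and c can be sketched as follows. This is a minimal illustration under stated assumptions: the class name Grammar, the reinforce method, the additive weight update, and the toy rules are all hypothetical, and the dictionary links and context markers are left out.</Paragraph>
    <Paragraph><![CDATA[
```python
import random


class Grammar:
    """One member's grammar: rewrite rules plus relative-frequency weights."""

    def __init__(self, rules, seed=None):
        # rules: non-terminal -> list of right-hand sides (tuples of symbols)
        self.rules = rules
        # One weight per rule; all alternatives start out equally likely.
        self.weights = {lhs: [1.0] * len(alts) for lhs, alts in rules.items()}
        self.rng = random.Random(seed)

    def generate(self, symbol="S"):
        """Expand `symbol`, choosing each rewrite rule by relative frequency."""
        if symbol not in self.rules:  # terminal element
            return [symbol]
        alts = self.rules[symbol]
        rhs = self.rng.choices(alts, weights=self.weights[symbol], k=1)[0]
        return [tok for sym in rhs for tok in self.generate(sym)]

    def reinforce(self, lhs, rhs, amount=1.0):
        """To be called by a parser (not shown) for each rule it just used."""
        self.weights[lhs][self.rules[lhs].index(rhs)] += amount

    def relative_frequency(self, lhs):
        total = sum(self.weights[lhs])
        return [w / total for w in self.weights[lhs]]


# Toy rules (hypothetical); the recursion always terminates.
rules = {
    "S": [("NP", "VP")],
    "NP": [("the", "N")],
    "N": [("dog",), ("cat",)],
    "VP": [("sleeps",), ("sees", "NP")],
}
g = Grammar(rules, seed=0)
g.reinforce("N", ("cat",), amount=3.0)  # "cat" was heard often in recent parsings
print(g.relative_frequency("N"))
print(" ".join(g.generate()))
```
]]></Paragraph>
    <Paragraph>Keeping the weights beside the rules means that a speaker's output drifts toward whatever he has recently heard, which is one plausible reading of the frequency feedback described above.</Paragraph>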
    <Paragraph position="9"> 3. Modelling Tikopia and Maori  Some generality in the system design would be necessary even if one intended to model only one society. In particular, the rules governing the interaction of members of the population would undoubtedly be subject to frequent revision during the course of research as it might become apparent that some variables modelled were not pertinent, and that omitted ones were significant. A fully general system, capable of modelling any society, must contain, implicitly, a universal theory of socio-linguistic behavior. A basic assumption of our system is that an individual's group memberships constitute the major determinants of his conversation behavior. Therefore it is essential that the system provide an efficient means of describing an individual's age, sex, political, kin, work and social group memberships as well as data of a purely geographic nature.</Paragraph>
    <Paragraph position="10"> Specifically, for Tikopia, it seemed that age, sex, village, clan, religion, household, marital state, work groups, and social status were the key variables governing conversational interaction. We planned to simulate a thirty or thirty-five year time span in a model containing a population sample of about 120-165 people distributed among three villages, representing about 1300 to 1800 people distributed among approximately 25 villages.</Paragraph>
    <Paragraph position="11"> The decision to construct the model with a few villages containing a large fraction of their real-world population (as opposed to more villages with fewer modeled people per village) was made on the basis of material contained in Firth [7-11] indicating the village as the largest pertinent unit for our purposes. The decision to model three villages was based on the recognition of the subordinate but real pertinence of inter-village relations. The problem of representing a complete multi-generation kinship structure for each individual also set a lower bound on the number of people per village.</Paragraph>
    <Paragraph position="12"> The actual method of creating an initial population state is rather complex, and is described in Section 4.</Paragraph>
    <Paragraph position="13"> The researcher attempting to model Maori society is faced with the problem of finding pertinent data in a vast literature of essentially non-pertinent material. Fortunately, official government census information, dating back to the mid-nineteenth century, provides valuable demographic data.</Paragraph>
    <Paragraph position="14"> The population size demands a different kind of sampling than in the Tikopia model. The population ranged from 56,000 in 1857-8 to 167,000 in 1961. A study of the literature suggests that the Maori-English linguistic acculturation phenomena might best be modelled in the following way: a. Population: Sample size ranging from 100 to 300 Maori plus English speakers.</Paragraph>
    <Paragraph position="15"> b. Geographical Distribution: Two communities remote from white contact, plus the gradual creation of a city population group, and a group in an intermediate location.</Paragraph>
    <Paragraph position="16"> c. Key Social Variables: Tribe, hapu or tribal subgroup, social class (aristocrat or commoner), age group (child, young unmarried, young married to middle aged, elder), lineage, work groups or occupation, and religion. The hapu, rather than the immediate family, appears to be the minimal significant social unit of organization for the goals of our simulation.</Paragraph>
    <Paragraph position="17"> In the case of city dwelling Maori, residence in the same city constitutes another group membership.</Paragraph>
    <Paragraph position="18"> d. Meta-model of Historical Change: Gradually increasing contact with English speakers, wars, gradual migrations to urban areas.</Paragraph>
    <Paragraph position="19">  4. System Organization and Construction of the Data Base  The learning program, as it stands, demands an interaction between a live informant and questions output on the teletype. To reduce the enormous computer time required for the successful simulation of change in linguistic patterns, it is necessary to break the current program into two parts -- one part that can read sentences input to it without asking for immediate help, and another that will generate sentences randomly, based on the rules that were formulated during the input stage.</Paragraph>
    <Paragraph position="20"> The portion of the program that is responsible for the generation of random sentences will also determine the context in which the sentence was spoken. Context is determined by defining the subclass of persons who would be listening to this sentence, and placing an indicator of this subclass in the file of sentences that are generated. The sentences will be placed in a file that will later be passed against all individuals in the sample, so that particular acquaintances are able to &quot;hear&quot; what was said, at the same time creating rules that will be used in the next generation pass.</Paragraph>
    <Paragraph position="21"> At major points in the process, events take place that need not be thought of during the normal cyclic activity. These involve the life and death routines, marriage ceremonies, arrivals and departures, and recreation of the acquaintance lists that describe who is listened to. Because of the one-to-many character of speeches, it is possible to keep the acquaintance lists to a manageable size by listing only those persons whom one listens to, and not those who are spoken to.</Paragraph>
    <Paragraph position="22"> Before we examine the conversation process further, let us discuss the general problem of creating a sample from data that is available only in aggregate form.</Paragraph>
    <Section position="1" start_page="12" end_page="17" type="sub_section">
      <SectionTitle>
4.1 Sample Generation
</SectionTitle>
      <Paragraph position="0"> For many groups to be studied by the process described in this paper, samples do not exist. If any information exists at all about these groups it is often in the form of cross-tabulation tables published as an indication of census patterns, and is usually not given in its raw form.</Paragraph>
      <Paragraph position="1"> *The problem of creating a kinship structure is not of this type. In the case of Tikopia it is essential to keep track of kin relations with contemporaries that may owe their origin to links with common ancestors, perhaps 2 or 3 generations removed, who may be deceased at the start of the simulation. The best automated method we could devise involves running an accelerated, prefatory, partial simulation of the society beginning several generations before the official start date. The only aspects modelled would be those governing birth and death, residence change and marriage rules. Initially, all individuals would be assumed to be unrelated, and marriage would take place with relative freedom. As the prefatory simulation progresses through successive generations, kin ties are created, and the free choice of spouses disappears. By the time the presimulation is completed, the original starting population is dead, and each member of the main simulation population has a complete and consistent set of kinship relations. The level of detail in the Maori situation does not demand this microcomputation of kinship (see Section 3).</Paragraph>
      <Paragraph position="2">  To model groups of people for whom it is impossible to collect raw data because of expense, time, or other complications, such as the passage of time having changed the population (historical groups), it is often necessary to create a sample of people artificially. Since any such attempt will yield a sample that is incorrect in some respects, it is important to recognize this beforehand and to guard against drawing invalid conclusions from the results of the study. We can, however, obtain results that have some validity by restricting our discussion to those characteristics of the sample that we are able to insert into our sample creation process by the heuristic methods described below. We realize that heuristic processes are just that: there is no real guarantee of success in creating a sample that is totally accurate. But by prefacing the results of our study with this disclaimer, and restricting our stated conclusions to those population characteristics that we know to be true, useful research can be expected.</Paragraph>
      <Paragraph position="3"> We can illustrate the sample creation best by an immediate example.</Paragraph>
      <Paragraph position="4"> Suppose we are interested in the study of linguistic patterns as they are formed with respect to three variables: age, sex, and marital status. It is necessary first to describe the categories that are important to us for each of the variables in the model.</Paragraph>
      <Paragraph position="5">  If we posit that age does not influence linguistic patterns except in major categories, we can break the ages into the three groups Young, Adult, and Elder.</Paragraph>
      <Paragraph position="6"> Since the other two variables, Sex and Marital Status, have well-defined groupings (Male, Female; Married, Unmarried), we can define our task with the following table:  Defining a population artificially for the requirements of the simulation process involves the accurate choice of percentages of the total population for each of the above permuted categories of variables. This can be done in many ways.</Paragraph>
      <Paragraph position="7">  1. By hand. The above percentages may be chosen by the researcher after careful reading of documents describing population characteristics.</Paragraph>
      <Paragraph position="8"> 2. By computer algorithm. There are often published statistics on populations that can be used to create appropriate percentages. Cross-tabulation tables are the most fruitful in this attempt, as they often contain all of the necessary information within them. If they do not, other population statistics such as correlation matrices may be used (e.g., lacking a published table displaying the relationship between Age and Income, a correlation coefficient of .46 is useful). Since some of the information may be either contradictory or of disproportionate value, it is necessary that a decision be made on the actual distribution characteristics. If tables are available showing the relationships, they should be used. But if tables are not available, or if the only available information about a particular relationship is in the form of another statistic, the preferable thing to do is to create the table by hand, based on research of the textual material.</Paragraph>
      <Paragraph position="9"> For example, assume that we wish to build a file of persons as mentioned earlier. In reviewing the published tables, however, we cannot find a table relating Age and Marital Status. We do find, on the other hand, that the correlation between Age and Marital Status is given as .43. Using this information, together with research of the text, it may be possible to generate a table of the following form:  If we make no use of the knowledge of the correlation coefficient of .43 between Age and Marital Status, we may generate a sample that has serious faults. Not making use of it in this case would be similar to creating a table of the form:  not be distributed evenly with respect to age.</Paragraph>
      <Paragraph position="10"> If a process of random selection over the specified probability distributions (the relative frequency tables) is used to create the persons in our sample, it should be  possible to run a cross-tabulation on this data with the result being that we can reproduce the tables that we started with to create that data.</Paragraph>
      <Paragraph position="11"> After the process of sample creation is finished, we may produce a table of the form:  It can be seen that since there are few (the number is rounded to 0%) young married males, more information was used to arrive at these values than merely the use of the marginal distributions. Their use alone would imply that 1 there should be approximately 6~ % young married males.</Paragraph>
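      <Paragraph>The sample creation process of this section can be sketched in a few lines. The percentages in JOINT are invented for illustration and are not the figures of the original tables, which did not survive in this copy; the function names are likewise assumptions. The sketch draws persons at random from a joint relative-frequency table, cross-tabulates the result, and compares one cell with the value that the marginal distributions alone would imply.</Paragraph>
      <Paragraph><![CDATA[
```python
import random
from collections import Counter

# Hypothetical joint relative frequencies over (Age, Sex, Marital Status).
# The twelve shares sum to 1.
JOINT = {
    ("Young", "Male", "Married"): 0.00, ("Young", "Male", "Unmarried"): 0.15,
    ("Young", "Female", "Married"): 0.02, ("Young", "Female", "Unmarried"): 0.13,
    ("Adult", "Male", "Married"): 0.20, ("Adult", "Male", "Unmarried"): 0.05,
    ("Adult", "Female", "Married"): 0.21, ("Adult", "Female", "Unmarried"): 0.04,
    ("Elder", "Male", "Married"): 0.07, ("Elder", "Male", "Unmarried"): 0.03,
    ("Elder", "Female", "Married"): 0.05, ("Elder", "Female", "Unmarried"): 0.05,
}


def draw_sample(joint, n, seed=None):
    """Create n persons by random selection over the joint distribution."""
    rng = random.Random(seed)
    cells = list(joint)
    return rng.choices(cells, weights=[joint[c] for c in cells], k=n)


def cross_tabulate(sample, cells):
    """Relative frequency of every cell in the generated sample."""
    counts = Counter(sample)
    return {c: counts[c] / len(sample) for c in cells}


def marginal(joint, axis):
    """Marginal distribution of one variable (0=Age, 1=Sex, 2=Marital Status)."""
    totals = Counter()
    for cell, share in joint.items():
        totals[cell[axis]] += share
    return dict(totals)


def independence_estimate(joint, cell):
    """Share a cell would receive if only the marginals were used."""
    estimate = 1.0
    for axis, value in enumerate(cell):
        estimate *= marginal(joint, axis)[value]
    return estimate


sample = draw_sample(JOINT, 20000, seed=1)
table = cross_tabulate(sample, JOINT)
cell = ("Young", "Male", "Married")
print(table[cell], independence_estimate(JOINT, cell))
```
]]></Paragraph>
      <Paragraph>With these invented figures the marginals alone would assign roughly 8 percent of the sample to young married males, while the joint table assigns none, which is the kind of discrepancy the text warns against.</Paragraph>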
    </Section>
    <Section position="2" start_page="17" end_page="21" type="sub_section">
      <SectionTitle>
4.2 Acquaintance Lists
</SectionTitle>
      <Paragraph position="0"> To model the linguistic patterns as they occur in the real world, it is necessary to account in some way for appropriate dissemination of information by insisting that each person speak for the most part with the same persons he spoke to in the past. This is a tedious process if done dynamically at the time the conversations are to take place in the computer simulation; we can show that it is more parsimonious to create an &quot;acquaintance list&quot; of those persons who are in frequent contact with each individual, and to change this acquaintance list at more infrequent intervals. The acquaintance lists may be updated together with other major actions, such as the birth and death routines, arrivals and departures, and the occurrence of natural phenomena such as seasonal change.</Paragraph>
      <Paragraph position="1">  We may build the acquaintance list by a technique closely approximating that which occurs naturally, the &quot;best fit&quot; method, in which two persons are said to be &quot;acquaintances&quot; if they have various attributes in common -- they may live near each other, work together, or belong to the same social group. If many attributes are in common, then these people will be very likely to be forced to speak to one another whether or not they might be classified correctly as &quot;friends&quot;.</Paragraph>
      <Paragraph position="2"> More formally, we may define a person's attributes by his position in the sample space. For a sample of n variables, a person can be defined by the n-tuple (V1, V2, ..., Vn). By a simple calculus, we can map this point from the integer n-space into the boolean m-space, where m is greater than or equal to n, and each variable now has the value 1 if the person can be characterized by the truth of this attribute, and 0 otherwise. For example, the variable Age in our example above would be changed from one variable with three values to three variables with two values each. From Age: 1=Young, 2=Adult, 3=Elder, we would construct Young in Age: 1=True, 0=False; Adult in Age: 1=True, 0=False; Elder in Age: 1=True, 0=False.</Paragraph>
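      <Paragraph>The mapping from the integer n-space to the boolean m-space can be illustrated with a short sketch. The variable list and the function name to_boolean are assumptions made for this example; codes are taken to be 1-based, as in the Age example.</Paragraph>
      <Paragraph><![CDATA[
```python
# Hypothetical variable definitions; codes are 1-based as in the Age example.
CATEGORIES = {
    "Age": ("Young", "Adult", "Elder"),
    "Sex": ("Male", "Female"),
    "Marital Status": ("Married", "Unmarried"),
}


def to_boolean(person, categories=CATEGORIES):
    """Map an integer n-tuple (V1, ..., Vn) to a boolean m-tuple (B1, ..., Bm)."""
    bits = []
    for code, values in zip(person, categories.values()):
        # One 0/1 variable per category value; exactly one of them is 1.
        bits.extend(1 if code == k else 0 for k in range(1, len(values) + 1))
    return tuple(bits)


def attribute_names(categories=CATEGORIES):
    """Label of each boolean variable, e.g. 'Young in Age'."""
    return [f"{value} in {name}"
            for name, values in categories.items() for value in values]


adult_married_male = (2, 1, 1)  # Age=Adult, Sex=Male, Marital Status=Married
print(to_boolean(adult_married_male))
print(attribute_names())
```
]]></Paragraph>
      <Paragraph>Here n is 3 and m is 7, so m is greater than n, as the text requires.</Paragraph>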
      <Paragraph position="3"> A person in our sample can now be characterized by the boolean m-tuple (B1, B2, ..., Bm). In order to determine which attributes two persons have in common, it is necessary to AND (multiply) these two boolean vectors together. The resultant vector has 1's in the positions where the two persons originally both had 1's, and nowhere else.</Paragraph>
      <Paragraph position="4"> To account for the disproportionate import of the fact that two attributes are in common, and in some instances to correct for the fact that persons may be more likely to be acquaintances if they do not have two particular attributes in common (e.g., Sex), the resultant vector is multiplied by a third Weight vector W.</Paragraph>
      <Paragraph position="5"> The Resultant vector is summed to a scalar, and this number is compared to an externally specified &quot;hit&quot; value &quot;H&quot; to determine whether these two persons are said to be &quot;acquaintances&quot;. Example:</Paragraph>
      <Paragraph position="7"> In this case we see that since the value of our calculation X does not exceed or equal the hit value H, we reject these two persons as being acquaintances. This rejection can be easily changed into a more dynamic technique by the use of more sophisticated stochastic methods, such as the rejection being conditioned on a random number exceeding the difference between the numbers X and H.</Paragraph>
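      <Paragraph>The acquaintance test can be sketched as follows. The two persons, the weight vector W, and the hit value H are invented for illustration, since the worked example of the original did not survive in this copy. The stochastic variant is only one possible reading of the remark about random numbers: a pair that falls short of H is still accepted when a random draw exceeds the shortfall H - X.</Paragraph>
      <Paragraph><![CDATA[
```python
import random


def common_attributes(b1, b2):
    """Element-wise product (logical AND) of two boolean attribute vectors."""
    return tuple(x * y for x, y in zip(b1, b2))


def score(b1, b2, weights):
    """Weight the shared attributes and sum them to the scalar X."""
    return sum(w * c for w, c in zip(weights, common_attributes(b1, b2)))


def are_acquaintances(b1, b2, weights, hit):
    """Deterministic rule: accept when X equals or exceeds the hit value H."""
    return score(b1, b2, weights) >= hit


def stochastic_acquaintances(b1, b2, weights, hit, rng, spread=1.0):
    """Assumed stochastic variant: a pair below H is still accepted when a
    uniform draw from [0, spread) exceeds the shortfall H - X."""
    shortfall = hit - score(b1, b2, weights)
    if shortfall <= 0:
        return True
    return rng.random() * spread > shortfall


# Hypothetical data over the seven attributes
# (Young, Adult, Elder, Male, Female, Married, Unmarried).
A = (0, 1, 0, 1, 0, 1, 0)        # adult married male
B = (0, 1, 0, 0, 1, 1, 0)        # adult married female
W = (2, 2, 2, -1, -1, 3, 1)      # negative weights penalize sharing the same sex
H = 6

X = score(A, B, W)
print(common_attributes(A, B), X, are_acquaintances(A, B, W, H))
```
]]></Paragraph>
      <Paragraph>With these figures X is 5, which does not reach H = 6, so the pair is rejected under the deterministic rule.</Paragraph>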
      <Paragraph position="8"> Further selection is necessary to determine one-sided relationships. It may be possible that A is an acquaintance of B (B listens to A) but B is not an acquaintance of A (A does not listen to B), for instance if A is a village chief and B is an undistinguished village member.</Paragraph>
    </Section>
    <Section position="3" start_page="21" end_page="25" type="sub_section">
      <SectionTitle>
4.3 Conversation Interaction
</SectionTitle>
      <Paragraph position="0"> The flow of the generation and parsing process is as follows (the only exceptions are in the case of normative learning, where immediate auditor feedback is required): a. Conversation Creation: 1. Generate all utterances from each grammar at one time, by passing the grammar file serially.</Paragraph>
      <Paragraph position="1"> A. The number of utterances for each pass is set as an external parameter &quot;S&quot;.</Paragraph>
      <Paragraph position="2">  The conversation creation routine will peruse the acquaintance lists of each person to generate &quot;listens&quot; in the form of ordered triples (a,b,c), where</Paragraph>
      <Paragraph position="4"> This triple (a,b,c) will be placed in a file called the &quot;listen&quot; file. The &quot;listen&quot; file, when finished, will be a stack of entries in order by the first entry a_i.</Paragraph>
      <Paragraph position="5">  For any two persons A and B, A can listen to sentences produced by B in only one context.  1. Bring in the grammar for person A from second-level memory. 2. Determine the address on second-level memory of the conversation specified by the triple (a,b,c) and bring it into first-level memory (core). 3. Parse, or &quot;listen&quot; to, the sentence. 4. Iterate on step 2 until all sentences are parsed. 5. Put the new grammar for this person on second-level memory.</Paragraph>
      <Paragraph position="6"> 6. Get the next grammar from second-level memory and go to step 2.</Paragraph>
      <Paragraph position="7"> 7. If there is no next grammar, increment the time counter. 8. If it is time to recreate the acquaintance lists or to run other major events, such as the birth/death routines and arrivals/departures, do so.</Paragraph>
      <Paragraph position="8"> 9. Iterate on step 1 until finished with entire simulation process.</Paragraph>
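      <Paragraph>The listening pass in the numbered steps above can be sketched as a loop over the &quot;listen&quot; file. The definition of the triple did not survive in this copy, so the sketch assumes a is the listener, b the speaker, and c the context. The learner is replaced by a trivial stand-in (learn_words, which merely collects words), and in-memory dictionaries stand in for second-level memory; all names are hypothetical.</Paragraph>
      <Paragraph><![CDATA[
```python
from itertools import groupby
from operator import itemgetter


def build_listen_file(acquaintances):
    """acquaintances: listener -> {speaker: context}.

    A mapping per listener guarantees that A hears B in only one context.
    Returns triples (listener, speaker, context) ordered by listener.
    """
    triples = [(listener, speaker, context)
               for listener, heard in acquaintances.items()
               for speaker, context in heard.items()]
    return sorted(triples, key=itemgetter(0))


def parsing_pass(listen_file, conversations, grammars, listen):
    """One cycle: every listener parses everything on his listen entries."""
    for listener, entries in groupby(listen_file, key=itemgetter(0)):
        grammar = grammars[listener]                    # step 1: fetch grammar
        for _, speaker, context in entries:             # step 2: locate conversation
            for sentence in conversations.get((speaker, context), []):
                grammar = listen(grammar, sentence)     # step 3: parse / learn
        grammars[listener] = grammar                    # step 5: store new grammar


def learn_words(grammar, sentence):
    """Stand-in for the learning component: a 'grammar' is a set of known words."""
    return grammar | frozenset(sentence.split())


# Hypothetical data. The chief is heard by the mother but listens to no one,
# which is the one-sided relationship described in Section 4.2.
acquaintances = {
    "child": {"mother": "home"},
    "mother": {"chief": "village"},
}
conversations = {
    ("mother", "home"): ["the dog sleeps"],
    ("chief", "village"): ["the canoe arrives"],
}
grammars = {"child": frozenset(), "mother": frozenset({"the"}), "chief": frozenset()}

listen_file = build_listen_file(acquaintances)
parsing_pass(listen_file, conversations, grammars, learn_words)
print(listen_file)
print(sorted(grammars["child"]))
```
]]></Paragraph>
      <Paragraph>A full run would wrap parsing_pass in an outer loop that regenerates the conversations, increments the time counter, and periodically rebuilds the acquaintance lists, as in steps 7 to 9.</Paragraph>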
      <Paragraph position="9"> 5. Interpretation of Results  The key problem is determining the success or failure of a simulation. Assuming everything else has gone well, how does one compare the grammars of the population members to determine their mutual similarities and their relation to the language situation in contemporary, real-world Maori society? The design of the system offers a unique, detailed, quantitative method for determining the similarity of the competence of speakers. Every legal sentence ever generated in the course of the simulation is saved by the system. At the end of the simulation (or some other time) each individual must attempt to parse every legal sentence ever produced. Different individuals may be expected to have varying degrees of success in their parsing attempts. Analysis of the results can offer a detailed, objective picture of the dialect situation on the basis of common success or failure in parsing particular sentences. These results may be correlated with any socio-demographic factors recorded in the data base of the model.</Paragraph>
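      <Paragraph>One simple way to turn common success or failure in parsing into a number is the fraction of sentences on which two speakers agree. The sketch below is an assumption made for illustration, not the authors' stated measure: a speaker's grammar is replaced by a word list, and a sentence counts as parsed when every word is known.</Paragraph>
      <Paragraph><![CDATA[
```python
from itertools import combinations


def success_profile(accepts, sentences):
    """1 where the speaker parses the sentence, 0 where he fails."""
    return tuple(1 if accepts(s) else 0 for s in sentences)


def agreement(p, q):
    """Fraction of sentences on which two speakers both succeed or both fail."""
    return sum(1 for x, y in zip(p, q) if x == y) / len(p)


def agreement_table(profiles):
    """Pairwise agreement for every pair of speakers."""
    return {(a, b): agreement(profiles[a], profiles[b])
            for a, b in combinations(sorted(profiles), 2)}


def knows_all_words(vocabulary):
    """Stand-in for a parser: accept a sentence when every word is known."""
    return lambda sentence: all(w in vocabulary for w in sentence.split())


# Hypothetical speakers and saved legal sentences.
speakers = {
    "elder": {"the", "dog", "sleeps", "canoe"},
    "adult": {"the", "dog", "sleeps", "canoe", "truck"},
    "youth": {"the", "dog", "sleeps", "truck"},
}
sentences = ["the dog sleeps", "the canoe sleeps", "the truck sleeps", "the dog"]

profiles = {name: success_profile(knows_all_words(vocab), sentences)
            for name, vocab in speakers.items()}
table = agreement_table(profiles)
print(profiles)
print(table)
```
]]></Paragraph>
      <Paragraph>The same profiles could be collected from live informants for the same sentence list, so that simulated and real agreement tables are directly comparable.</Paragraph>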
      <Paragraph position="10"> Given these results, one may then send the same list of sentences to New Zealand, and have the analogous test performed on a sample of the Maori population, asking informants to indicate the legal and illegal sentences.</Paragraph>
      <Paragraph position="11"> The results of the live testing may then be compared with the simulation results. Thus, the Monte Carlo simulation approach appears to offer Linguistics a strong empirical methodology for testing otherwise unverifiable hypotheses.</Paragraph>
    </Section>
  </Section>
</Paper>