<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0108">
  <Title>Example-based Complexity--Syntax and Semantics as the Production of Ad-hoc Arrangements of Examples</Title>
  <Section position="1" start_page="0" end_page="48" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Computational linguists have traditionally sought to model language by finding underlying parameters which govern numerous examples. I describe a different approach which argues that numerous examples themselves, by virtue of their many possible arrangements, provide the only way to specify a sufficiently rich set of &amp;quot;parameters&amp;quot;.</Paragraph>
    <Paragraph position="1"> Essentially I argue for a different relationship between example and parameter. With examples primary, and parameterizafions of them secondary, the real &amp;quot;productions&amp;quot;. Rather than representing a redundant complexity, examples should actually be seen as a simplification, a basis for the numerous arrangements of their &amp;quot;parameterizations&amp;quot;.</Paragraph>
    <Paragraph position="2"> Another way of looking at it is to say I argue arrangements of examples, rather than simply revealing underlying parameters, represent in themselves an ignored resource for the modelling of syntactic, and semantic, complexity.</Paragraph>
    <Paragraph position="3"> I have implemented a small, working, &amp;quot;shallow parser&amp;quot; based on these ideas.</Paragraph>
    <Section position="1" start_page="0" end_page="48" type="sub_section">
      <SectionTitle>
Introduction--Machine Learning, Data,
and Parameterizations
</SectionTitle>
      <Paragraph position="0"> ! contrast my work with Machine Learning.</Paragraph>
      <Paragraph position="1"> There are similarities in the emphasis on the analysis of relationships among data, but there are also differences in the assumptions about the nature of the system. I think there has been a tacit assumption in Machine Learning that language system consists of underlying parameters which generate a variety of examples. My argument is that you can turn that relationship around and get a great deal more descriptive power in the form of varying parameterizations of the order in a set of examples.</Paragraph>
      <Paragraph position="2"> Under the umbrella of Machine Learning I include a wide variety of data based analyses of language which have become popular in recent years. Both distributed and statistical data based models fit in that category: back-propagation networks, Hidden Markov Models, maximum entropy parametefizafions. Apart from their emphasis on data, however, they have one thing in common, and in common with earlier symbolic attempts to codify language system.</Paragraph>
      <Paragraph position="3"> They all hypothesize parameters for distributions of data. I say it is worth considering that the essence of language is not in such underlying parameters but the collections of examples we seek them through. That there are no underlying parameters, only the chaos of example, much as is the case in a population of people (see also Kenneth Pike &amp;quot;analogies between linguistic structure and the structure of society&amp;quot;, in de Beaugrande ( 1991)).</Paragraph>
      <Paragraph position="4"> One way to describe this is to say that language might be &amp;quot;irreducibly distributed&amp;quot;. A system where a collection of examples is the smallest set which describes all its structure. Although there might be different levels of this independence (along with differing abilities to parameterize: viz. phonology, morphology, syntax). We might contrast irreducibly distributed systems with those which are parametrically distributed, like a letter recognition system. Certainly, however, we could contrast them with statistical, systems, where only the likelihood of the outcomes is variable.</Paragraph>
      <Paragraph position="5">  R from N and the Descriptive Power of Sets The best thing about such &amp;quot;irreducibly distributed&amp;quot; systems is their power.</Paragraph>
      <Paragraph position="6"> The number of combinations of R objects taken from N is C(N,R) = N!/(N-R)!R!. This is the number of &amp;quot;word association classes&amp;quot; N word associations can model, for instance.</Paragraph>
      <Paragraph position="7"> The idea that we can model syntactic classes as &amp;quot;word association classes&amp;quot; is not new. There are numerous studies dating from the early 1990's and before which take this approach e.g.</Paragraph>
      <Paragraph position="8"> Schuetze (1993), Finch (1993); and Powers (1996) lists references back to Pike's Tagmemics. What is different in my approach is the assumed relationship between these classes and the data which reveal them. If the variety of example can be generated by a small number of abstract parameters then we expect one set of relationships among that data to be more important than the others. If on the other hand we consider the full range of relationships possible among all the examples then we have an enormous range of structure at our disposal. Given the problems we have had describing language according to parameters, it is surprising that we have not more widely considered the attraction of this power.</Paragraph>
      <Paragraph position="9"> Consider the evidence that we need this power: a) Structure Collocation, phraseology. The data based analysis of language has bought home more and more strongly that some structure is beyond any logic we can enumerate. Face to face with the reality of use this realization has been most widely accepted in areas of linguistics which deal with language acquisition and teaching.</Paragraph>
      <Paragraph position="10"> Examples of relevant discussions are Pawley and Syder (1983), Nattinger (1980), Weinert (1995). We are talking about explaining why you might say &amp;quot;strong tea&amp;quot; but not &amp;quot;powerful tea&amp;quot;.</Paragraph>
      <Paragraph position="11"> In practical terms a processor based fundamentally on distributions should be able to tell that &amp;quot;strong tea&amp;quot; is idiomatic and &amp;quot;powerful tea&amp;quot; less so because the &amp;quot;word association distributions&amp;quot;, say, of &amp;quot;strong&amp;quot; and &amp;quot;powerful&amp;quot; are different in detail, though not in generalities. A system based on labels, an assumption of underlying parameters, will not be able to do that (for a set of labels smaller than the set of all such distinct utterances).</Paragraph>
      <Paragraph position="12"> An irreducibly distributed representation gives us the power to model collocation. We would need a different syntactic class for every collocational restriction otherwise.</Paragraph>
      <Paragraph position="13"> b) Meaning N!/(N-R)!R! groupings give you an essentially infinite set of configurations. We have the power to associate a different configuration with everything we might ever want to say, if we like. In fact, by default we will do so. This means we have the power to represent not only syntactic idiosyncrasy, but the complexity of meaning, directly.</Paragraph>
      <Paragraph position="14"> The idea of meaning implied by the association is interesting in itself, h is an organization of data. But this is reasonable. And if we accept it then we have a fundamental definition of meaning in terms we can quantify. Meaning is synonymous with an organization of data: events, observations. New organization equals new meaning.</Paragraph>
      <Paragraph position="15"> There is an interesting topical analogy to be made here: a Web search engine. In a sense any collection of documents found &amp;quot;represent&amp;quot; the meaning of a set of search keys. There are many more subtleties of collection possible than can ever be labeled in an index.</Paragraph>
      <Paragraph position="16"> In a way my argument is just.that if we want to model the full complexity of syntactic restriction, or semantic subjectivity, we have no choice but to demote categories from being central, make them a product, and base them on the reorganization of content much the way they are treated in most Web search engines.</Paragraph>
      <Paragraph position="17"> Such an irreducibly distributed definition explains many puzzling properties of thought. It provides a natural mechanism for how:  (meaning, and the primacy of data over parameter) and the vigorous &amp;quot;rebel&amp;quot; linguistic school of Systemic Functional Grammar. Most importantly in SFG the only irreducible definition of meaning, or structure, is a set of contrasts between events, or observations.</Paragraph>
      <Paragraph position="18"> Unfortunately in SFG an overemphasis on abstract parameters (function/meaning) means that in practice the flail power of contrasts among sets to model complexity is not applied. Nevertheless, there are strong parallels between my model and the core tenets of Systemic Functional Grammar. I find that a natural analysis according to the principles I have outlined above results in structure along lines of functional category. In fact the association groupings on which I base my analysis lead me to propose an &amp;quot;inverse&amp;quot; relationship (in a sense that can be precisely defined) between functional category, about which SFG is described, and categories based on syntactic regularities of the type which have traditionally been seen as important.</Paragraph>
      <Paragraph position="19"> A Simple &amp;quot;Association Parser&amp;quot; I have implemented a small &amp;quot;association parser&amp;quot; based on these principles and the initial results have been interesting. I provide a list of typical &amp;quot;parses&amp;quot; in the appendix. Essentially it scores the grammaticality and provides a structural breakdown of each string of words it is presented with. Among more interesting observations, as I mentioned above, is the fact that my parser seems to naturally identify structure along lines of functional equivalence. Rather like the kind of analysis a Systemic Functional Grammarian might favor.</Paragraph>
      <Paragraph position="20"> Since processing is essentially a search over a database for similar examples the main bottleneck is the inefficiency of a serial processor for nearest neighbor search. There are two key complexities. The search over one I have managed to reduce to linear time. The other remains to be resolved.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>