<?xml version="1.0" standalone="yes"?>
<Paper uid="J84-3001">
  <Title>On the Mathematical Properties of Linguistic Theories I</Title>
  <Section position="8" start_page="0" end_page="0" type="concl">
    <SectionTitle>
8 Following Fauconnier, Ladusaw's denotation functions take as their
</SectionTitle>
    <Paragraph position="0"> values sets, ordered as usual. Sentences, for example, get as denotations the set of all worlds in which they are true.</Paragraph>
    <Paragraph position="1"> 9 There has always been interest in finite-state grammars to account for some perceptual constraints on sentence recognition, such as the difficulty of center-embedded sentences - e.g., &amp;quot;The rat that the cat that the dog chased ate died&amp;quot; (Langendoen 1975, Church 1981, Langendoen and Langsam 1984). They have also provided useful models in morphology (Kay 1983, Koskenniemi 1983) and phonology (Church 1983). 172 Computational Linguistics, Volume 10, Numbers 3-4, July-December 1984 C. Raymond Perrault On the Mathematical Properties of Linguistic Theories ... on the basis of sentences containing such that. Postal and Langendoen (this issue, p. 177) do so with cases of sluicing. Pullum and Gazdar (1982) (convincingly, I believe) refute the first two cases by claiming that the constraints on which they are based do not in fact hold.</Paragraph>
    <Paragraph position="2"> Similarly, Pullum (this issue, p. 182) argues against Postal and Langendoen, and against Higginbotham, again on the basis of the linguistic facts. Pullum and Gazdar also consider the case of verb and noun-phrase ordering in Dutch; although they show that no evidence has been given suggesting that the weak generative capacity of Dutch is greater than context-free, the phrase structure trees generated by their fragment are not obviously adequate for a compositional semantic analysis. This point is also made by Bresnan et al. (1982).</Paragraph>
    <Paragraph position="3"> The most convincing evidence so far against the weak context-freeness of natural languages comes from Swiss-German. Shieber (1984) shows that, like Dutch, Swiss-German allows cross-serial order in subordinate clauses but also requires that objects be marked for case, as in German. Given that the verb hälfed 'help' takes a dative object while aastriiche 'paint' and lönd 'let' take accusative objects, we get the following subordinate clauses, which can be made into complete sentences by prefixing them with Jan säit das 'Jan says that'.
... mer em Hans es huus hälfed aastriiche
... we Hans-DAT the house-ACC helped paint
... we helped Hans paint the house
... *mer em Hans es huus lönd aastriiche
... we Hans-DAT the house-ACC let paint
... we let Hans paint the house
... mer d'chind em Hans es huus lönd hälfed aastriiche
... we the children-ACC Hans-DAT the house-ACC let help paint
... we let the children help Hans paint the house
... *mer d'chind de Hans es huus lönd hälfed aastriiche
... we the children-ACC Hans-ACC the house-ACC let help paint
... we let the children help Hans paint the house
The proof that Swiss-German (SG) is not context-free is classic: intersect SG with the following regular language: Jan säit das mer (d'chind)* (em Hans)* es huus händ wele (laa)* (hälfe)* aastriiche.</Paragraph>
    <Paragraph position="4"> With some care, Shieber argues from the data that SG ∩ L is the language Jan säit das mer (d'chind)^m (em Hans)^m es huus händ wele (laa)^m (hälfe)^m aastriiche,</Paragraph>
    <Paragraph position="5"> which is not context-free. Since context-free languages are closed under intersection with regular languages, if Swiss-German were context-free then SG ∩ L would be context-free as well; hence Swiss-German is not context-free either.</Paragraph>
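The intersection argument can be made concrete with a small sketch (my illustration, not Shieber's notation; the frame regex and function names are assumptions). Membership in the regular language L constrains only word order, leaving the repetition counts free, while membership in SG ∩ L additionally forces all four counts to be equal, the non-context-free pattern a^m b^m c^m d^m:

```python
import re

# Regular "filter" language L: the fixed frame with freely repeatable blocks.
FRAME = (r"Jan säit das mer ((?:d'chind )*)((?:em Hans )*)"
         r"es huus händ wele ((?:laa )*)((?:hälfe )*)aastriiche")

def in_L(s):
    """L is regular: order is fixed, repetition counts are unconstrained."""
    return re.fullmatch(FRAME, s) is not None

def in_SG_cap_L(s):
    """SG ∩ L additionally requires the four repetition counts to match."""
    m = re.fullmatch(FRAME, s)
    if m is None:
        return False
    counts = [m.group(i).count(w)
              for i, w in ((1, "d'chind"), (2, "em Hans"),
                           (3, "laa"), (4, "hälfe"))]
    return len(set(counts)) == 1

def sg_string(m):
    """Build the member of SG ∩ L with m repetitions of each block."""
    return ("Jan säit das mer " + "d'chind " * m + "em Hans " * m +
            "es huus händ wele " + "laa " * m + "hälfe " * m + "aastriiche")
```

No finite-state or context-free device can enforce the equal-counts condition, which is why intersecting with L isolates the non-context-free core of the data.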
    <Paragraph position="6"> Hintikka (1977) claims that English is not recursive, let alone context-free, based on the distribution of the words any and every. His account of why John knows everything is grammatical while John knows anything is not, is that any can appear only in contexts where replacing it with every changes the meaning. If equivalence of meaning is taken to be logical equivalence, this means that grammaticality is dependent on the determination of equivalence of logical formulas, an undecidable problem. Several responses could be made to Hintikka's claim. One is to argue, as did Ladusaw (1979), that the constraint is semantic, not syntactic. Another route, followed by Chomsky (1980), is to claim that a simpler solution is available, namely, one that replaces logical equivalence with syntactic identity of some kind of logical form. This is the basis for Linebarger's analysis.
9.3. Metatheoretical results as upper bounds
In the preceding section, we discussed ways in which formal results about syntactic theories can be used against them on the grounds that they show them to be insufficiently powerful to account for the observed data. Now, given a theory that is powerful enough, can its formal properties be used against it on the basis that it fails to exclude impossible languages? The classic case of an argument of this form is Peters and Ritchie's argument against the TG model, discussed in Section 4.</Paragraph>
    <Paragraph position="7"> More generally, the premises are the following: ... too large to be the class of possible languages. One conclusion from this argument is that theory T is incorrect, i.e., that assumption (3) fails. Chomsky rejects assumption (1) instead, insisting that the possible languages are those that can be learned. 10 Although Chomsky also claims that the class of possible languages is finite, 11 the crucial concern here is that, finite or not, the class of possible languages could contain languages that are not recursive, or even not recursively enumerable. For example, let L be a non-recursive language and L' its complement (also non-recursive).</Paragraph>
    <Paragraph position="8"> Let s be some string of L and s' some string of L'. The procedure by which the subject chooses L if s is encountered before s' and L' otherwise will learn one of L or L'. 10 Learning algorithms can be compared along several dimensions. For a mathematical framework for learnability theory, see Osherson et al. (1983).</Paragraph>
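The two-witness learner described above can be sketched directly (my illustration: the argument concerns a non-recursive L, which no program can decide, so the witnesses here stand in for arbitrary fixed strings s and s'). The point the sketch makes is that the learner commits to a hypothesis purely by watching for which witness appears first, never deciding membership in L itself:

```python
def make_learner(s, s2):
    """Return a learner committed to "L" if witness s appears in the input
    stream before witness s2, and to the complement otherwise."""
    def learn(stream):
        for x in stream:
            if x == s:
                return "L"           # s seen first: hypothesize L
            if x == s2:
                return "L-complement"  # s2 seen first: hypothesize L'
        return None  # neither witness observed: no commitment yet
    return learn
```

Whichever of L or L' the presented data is drawn from, the matching witness eventually appears, so the procedure learns one of the two languages without any recursive decision procedure for either.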
    <Paragraph position="9"> 11 Actually, finiteness is claimed for the class of core grammars, from which the possible languages are assumed to be derived. Core languages and possible languages would be the same only &amp;quot;under idealized conditions that are never realized in fact in the real world of heterogeneous speech communities .... Each actual 'language' will incorporate a periphery of borrowings, historical residues, inventions, and so on, which we can hardly expect to - and indeed would not want to - incorporate within a principled theory of UG.&amp;quot; (Chomsky 1981: 8) Chomsky (1980) argues convincingly that there is no case for natural languages being necessarily recursive.</Paragraph>
    <Paragraph position="10"> Nevertheless, languages might just happen to be recursive. Putnam (1961) gives three reasons he claims &amp;quot;point in this direction&amp;quot;:
1. &amp;quot;Speakers can presumably classify sentences as acceptable or unacceptable, deviant or nondeviant, et cetera, without reliance on extra-linguistic contexts. There are of course exceptions to this rule...&amp;quot;,
2. Grammaticality judgments can be made for nonsense sentences,
3. Grammars can be learned.</Paragraph>
    <Paragraph position="11"> The first reason is most puzzling. The reference to &amp;quot;extra-linguistic context&amp;quot; is irrelevant; without it, reason (1) seems to be asserting that acceptability can be decided except where it cannot be. With respect to the second reason, the fact that grammaticality judgments could be made for some nonsense sentences in no way affects the question of whether they can be made for all grammatical sentences. Finally, languages could be learnable without being recursive, as it is possible that all the rules that need to be acquired could be acquired on the basis of sentences for which the recognition procedure succeeds.</Paragraph>
    <Paragraph position="12"> Peters and Ritchie (1973a) contains a suggestive but hardly conclusive case for contingent recursivity:
1. Every TG has an exponentially bounded cycling function, and thus generates only recursive languages,
2. Every natural language has a descriptively adequate TG, and
3. The complexity of languages investigated so far is typical of the class.</Paragraph>
    <Paragraph position="13"> If learnability rather than recognizability is the defining characteristic of possible languages, no claim refuting a theory on the grounds that it allows difficult languages will bear any weight, unless it can also be shown that possible languages are in fact easier to recognize than the recognizability theory predicts them to be. However, our everyday experience with language understanding leads us to think that syntactic recognition is a computationally efficient process - an observation, of course, that is the basis for Marcus's claim (1980) that a large part of it can be done in linear time, if not in real time. How are we to reconcile this with the O(g)-results we have for most theories, where g is at least quadratic? 12 These intuitive conclusions are based on observations (1) of &amp;quot;everyday&amp;quot; sentences, (2) where some nonsyntactic processing is done in parallel, (3) by the human processor. Each of these points is important.</Paragraph>
    <Paragraph position="14"> 12 It has already been pointed out that O(g) results are upper bounds, and showing that a recognition problem, for example, is O(g) does not mean that, for any language, it is necessary to reach the upper bound. Better upper bounds can be achieved by tighter proofs, not just by better algorithms.</Paragraph>
    <Paragraph position="15"> Although recognition may appear to be done in real time for most sentences encountered day to day, the O-results are asymptotic worst-case measures. It is therefore essential to obtain measures of recognition times for a variety of strings of words, whether sentences or not, and especially to see whether there are short, difficult ones. There are at least two cases of interest here. The first is that of garden-path sentences such as The horse raced past the barn fell and Have the students who failed the exam take the supplementary, which are globally unambiguous but locally ambiguous. These appear to be psychologically difficult. Another case is that of sentences that, in most grammars, are ambiguous because of attachment choices, such as those discussed by Church and Patil (1982). Finding one parse of these sentences is easy, but finding them all may be exponentially difficult. Psychological measures show these sentences not to be difficult, suggesting that not all parses are constructed or that they can all be examined in parallel.</Paragraph>
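The growth of attachment ambiguity can be quantified: the number of binary bracketings of n attachment points is the nth Catalan number, which grows exponentially (the Catalan connection is standard in discussions of the Church and Patil data; the code is my illustration, not from the article):

```python
def catalan(n):
    """Catalan numbers via the recurrence C(0) = 1,
    C(k+1) = sum over i of C(i) * C(k-i)."""
    c = [1]
    for k in range(n):
        c.append(sum(c[i] * c[k - i] for i in range(k + 1)))
    return c[n]
```

Already at ten attachment points there are 16,796 bracketings, which is why "finding them all" can be prohibitively expensive even when finding one parse is easy.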
    <Paragraph position="16"> O-results depend on some underlying machine model, and most of the results known for language recognition have been obtained on RAMs. Can implementation changes improve things over the relevant range? As mentioned above, the sequential models are all polynomially related, and no problem not having a polynomial time solution on a sequential machine is likely to have one on a parallel machine limited to at most a polynomial number of processors, at least if P is not equal to NP.</Paragraph>
    <Paragraph position="17"> Both these results restrict the improvement one can obtain by changing implementation, but are of little use in comparing algorithms of low complexity. Berwick and Weinberg (1982) give examples of how algorithms of low complexity may have different implementations differing by large constant factors. In particular, changes in the form of the grammar and in its representation may have this effect.</Paragraph>
    <Paragraph position="18"> It is well-known that implementation of machines with infinite storage on finite devices leads to a change in specification. A context-free parser implemented on a machine with finite memory will have a bounded stack and therefore recognize only finite-state languages. The language recognized by the implemented machine could therefore be recognized by another machine in linear time. Although one would rarely use this strategy as a design principle, a variant of it is more plausible: use a restriction of the general method for a subset of the inputs and revert to the general method when the special case fails. Marcus's parser (1980) with its bounded look-ahead is a good example. Sentences parsable within the allowed look-ahead have &amp;quot;quick&amp;quot; parses, but some grammatical sentences, such as &amp;quot;garden path&amp;quot; sentences, cannot be recognized without an extension to the mechanism that would distort the complexity measures. A consequence of the possibility of implementations of this character is that observations of their operation ought to show &amp;quot;discontinuities&amp;quot; in the processing time, depending on whether an input is in or out of the restricted subset. There is obviously much more of this story to be told.</Paragraph>
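The restrict-then-fall-back strategy can be sketched on a toy language (my example, not Marcus's parser: balanced parentheses stand in for a context-free language, and the stack is reduced to a depth counter). With the bound k fixed, the fast path is finite-state; inputs that exceed the bound trigger the general method, producing exactly the kind of processing-time discontinuity the text predicts:

```python
def balanced(s):
    """General method: unbounded counter, recognizes the full Dyck language."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth == -1:
            return False  # closing bracket with nothing open
    return depth == 0

def balanced_bounded(s, k):
    """Fast path: the same language restricted to nesting depth at most k.
    With k fixed this is a finite-state device."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth == -1:
            return False
        if depth > k:
            return None  # bound exceeded: input is outside the fast subset
    return depth == 0

def recognize(s, k=3):
    """Try the restricted method; revert to the general one when it fails."""
    quick = balanced_bounded(s, k)
    return balanced(s) if quick is None else quick
```

Inputs of depth at most k never touch the general method, so measuring running times across inputs of increasing depth would show the predicted discontinuity at the bound.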
    <Paragraph position="19"> Allow me to speculate as to how it might go. We may end up with a space of linguistic theories, differing in the idealization of the data they assume, in the way they decompose constraints, and in the procedural specifications they postulate. I take it that two theories may differ in that the second simply provides more detail than the first as to how constraints specified by the first are to be used. Our observations, in particular our measurements of necessary resources, are drawn from the &amp;quot;ultimate implementation&amp;quot;, but this does not mean that the &amp;quot;ultimately low-level theory&amp;quot; is necessarily the most informative, or that less procedural theories are not useful stepping stones to more procedural ones.</Paragraph>
    <Paragraph position="20"> It is also not clear that theories of different computational power may not be useful as descriptions of different parts of the syntactic apparatus. For example, it may be easier to learn statements of constraints within the framework of a general machine. The constraints once learned might then be subjected to transformation to produce more efficient special-purpose processors also imposing resource limitations.</Paragraph>
    <Paragraph position="21"> Whatever we decide to make of existing formal results, it is clear that continuing contact with the complexity community is important. The driving problems there are the P = NP question, the determination of lower bounds, the study of time-space tradeoffs, and the complexity of parallel computations. We still have some methodological house-cleaning to do, but I don't see how we can avoid being affected by the outcome of their investigations.</Paragraph>
  </Section>
</Paper>