<?xml version="1.0" standalone="yes"?> <Paper uid="E89-1035"> <Title>THE SYNTACTIC REGULARITY OF ENGLISH NOUN PHRASES</Title> <Section position="8" start_page="0" end_page="0" type="concl"> <SectionTitle> CONCLUSION </SectionTitle> <Paragraph position="0"> Our results demonstrate quite clearly that a feature-based unification grammar employing a recursive and 'deeper' style of analysis captures the relevant generalisations more efficiently than the analysis and implicit formalism employed by Sampson (1987a). We have reduced approximately 700 types to between 36 and 54 grammatical generalisations about NPs and shown that a minimally modified generative grammar developed (largely) independently of the test corpus is capable of covering 96.88% of the sample considered. We can demonstrate concretely why this should be so by considering the distinct single-constituent NP types from the treebank data exemplified by DT* JJ N*, DT* JJ JJ N*, and so forth. In the ANLT grammar this potentially infinite set of types is analysed through the recursive application of four rules of the following broad type: NP -> DET N1, N1 -> AP N1, AP -> A, N1 -> N. Thus a potentially infinite set of NP types is reduced to 4 grammatical generalisations.</Paragraph> <Paragraph position="1"> We do not wish to claim that we have developed a 'watertight' perfect grammar of the English NP (although we do feel that the ANLT grammar has withstood this evaluation very well). There remain the 3.12%, or 312 NPs, that we are unable, at present, to analyse, and there is good reason to believe that &quot;all grammars leak&quot; slightly. However, there is little evidence in our results to suggest that a few rule-governed grammatical generalisations about naturally occurring NPs of English do not effectively demarcate grammatical examples; or to suggest that the enterprise of generative grammar is doomed because of the high proportion of rules required to deal with residual, particular cases. 
On the contrary, our analysis of the failures demonstrates that, for the most part, they are not parsed because of oversights in the ANLT grammar, rather than because they are deviant in syntactically mysterious ways.</Paragraph> <Paragraph position="2"> Sampson (1987a:226) concludes that the &quot;onus must surely be on those who believe in the possibility of NL analysis by means of comprehensive generative grammars to explain why they suppose that the shape of constituent type/token distribution curves will be markedly different from the shallow straight line suggested by our limited - but not insignificant database.&quot; However, Sampson's result is suggested by his analysis of this data, not the data itself. In this paper, we have demonstrated that a more satisfactory analysis of essentially the same data-base leads to precisely the opposite conclusion.</Paragraph> <Paragraph position="3"> In other respects, the conclusions we should draw from this experiment are less positive. The development of wide-coverage grammars for robust parsing of unrestricted text will only be achieved through extensive evaluation using naturally occurring data. This, in turn, rests on the availability of suitably structured corpora from which the relevant data can be extracted automatically and on suitable software for semi-automatically testing rules against this data. 
The ANLT batch-mode parsing system proved completely inadequate to the latter task (largely because it was developed to check the grammar against a hand-constructed set of short, illustrative, deliberately unambiguous examples).</Paragraph> <Paragraph position="4"> Sampson (1987a) was able to perform a more sophisticated analysis of the treebank sample precisely because the original structuring of the data corresponded to his 'theory of grammar and grammatical analysis'.</Paragraph> <Paragraph position="5"> The problems we have had in using his analysis to make a preliminary classification of the same data, in order to evaluate the ANLT NP grammar, highlight the impossibility of developing a corpus databank structured in some grammatically 'descriptive' or 'uncontroversial' fashion (pace Sampson, 1987b).</Paragraph> </Section> </Paper>