<?xml version="1.0" standalone="yes"?> <Paper uid="C92-3126"> <Title>A COMPUTATIONAL MODEL OF LANGUAGE DATA ORIENTED PARSING</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> A COMPUTATIONAL MODEL OF LANGUAGE DATA ORIENTED PARSING RENS BOlt* </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> PERFORMANCE: Abstract </SectionTitle> <Paragraph position="0"> 1)ata Oriented Parsing (IX)P) is a model where no abstract rules, but language ext~riences in the ti3ru~ of all ,'malyzed COlpUS, constitute the basis for langnage processing. Analyzing a new input means that the system attempts to find tile most probable way to reconstruct the input out of frugments that alr&quot;c~y exist ill the corpus. Disambiguation occurs as a side-effect.</Paragraph> <Paragraph position="1"> DOP can be implemented by using colivelllional parsing strategies.</Paragraph> <Paragraph position="2"> In~oducfion This paper tommlizes the model for natural Imlgnage introduced m \[Sclm 199o\]. Since that article is written in Dutch, we will translate Some parts of it more or less literally in this introduction. According to Scba, the current tradition of language processing systems is based on linguistically motivated competence models of natural Imlguages. &quot;llte problems that these systems lull iato, suggest file necessity of a more perfommnce oriented model of language processing, that takes into account the statistical properties of real language use. qllerefore Scha proposes a system ritat makes use of an annotated corpus. AnMyzing a new input means that the system attempts to find the most probable way to reconstruct the input out of fragments that already exist in the corpus.</Paragraph> <Paragraph position="3"> The problems with competence grammars that are mentioned in Scha's aiticle, include the explosion of ambiguities, the fact tilat Itunmn judgemeats on grammaticality are not stable, that competence granunars do not account for language eh~alge, alld that no existing rule-based grammar gives a descriptively 'adequate characterization of an actual language. According to Scha, tile deveh,pment of a fornml gnatunar fur natural latlguage gets more difficult ,as tire grammar gets larger. When the number of phenotnena one has already takea into account gets larger, the number of iareractions that must be considered when ,me tries to introduce all account of a new pllenomenon grows accordingly.</Paragraph> <Paragraph position="4"> As to tile problem of ,'mtbiguity, it has turned out that as soon as a formal gratmnar clmracterizes a non-trivial part of a natural language, almost every input sentence of reasonable length gets ml re\]manageably large number of different structural analyses (and * The author wishes to thank his colleagues at the Department of Computational Linguistics of the Ilaiversity of Amsterdam for many fruitful discussions, and, in particular, Remko Scha, Martin van den Berg, Kwee Tjoe l,iong and Frodenk Somsen for valuable comments on earlier w~'rsions of this paper. semantical interpretations). I &quot;lids is problenmtic since most of these interpretations ~re not perceived as lVossible by a hunmn language user, while there are no systematic reasons 111 exclude tileln on syutactic or sematltic grounds. 
Often it is just a matter of relative implausibility: the only reason why a certain interpretation of a sentence is not perceived is that another interpretation is much more plausible.</Paragraph>
<Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Competence and Performance </SectionTitle>
<Paragraph position="0"> The limitations of the current language processing systems are not surprising: they are the direct consequence of the fact that these systems implement Chomsky's notion of a competence grammar. The formal grammars that constitute the subject-matter of theoretical linguistics aim at characterizing the competence of the language user. But the preferences language users have in the case of ambiguous sentences are paradigm instances of performance phenomena.</Paragraph>
<Paragraph position="1"> In order to build effective language processing systems we must implement performance grammars, rather than competence grammars. These performance grammars should not only contain information on the structural possibilities of the general language system, but also on details of actual language use in a language community, and of the language experiences of an individual, which cause this individual to have certain expectations on what kinds of utterances are going to occur, and what structures and interpretations these utterances are going to have.</Paragraph>
<Paragraph position="2"> There is an alternative linguistic tradition that has always focused on the concrete details of actual language use: the statistical tradition. In this approach, syntactic structure is usually ignored; only 'superficial' statistical properties of a large corpus are described: the probability that a certain word is followed by a certain other word, the probability that a certain sequence of two words is followed by a certain word, etc. (Markov chains, see e.g. [Bahl 1983]). This approach has performed successfully in certain practical tasks, such as selecting the most probable sentence from the outputs of a speech recognition component. It will be clear that this approach is not suitable for many other tasks, because no notion of syntactic structure is used. Moreover, there are statistical dependencies within the sentences of a corpus that can extend over an arbitrarily long sequence of words; these are ignored by the Markov approach. The challenge is now to develop a theory of language processing that does justice to the statistical as well as the structural aspects of language.</Paragraph>
<Paragraph position="3"> 1 In [Martin 1979] it is reported that their parser generated 455 different parses for the sentence "List the sales of products produced in 1973 with the products produced in 1972".</Paragraph>
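A minimal Python sketch of the kind of word-bigram Markov model described above (the two-sentence toy corpus and all names are illustrative assumptions, not taken from [Bahl 1983]):

    from collections import defaultdict

    def train_bigrams(sentences):
        # Count adjacent word pairs, then normalize to relative
        # frequencies, estimating P(current word | previous word).
        counts = defaultdict(lambda: defaultdict(int))
        for sentence in sentences:
            words = ["<s>"] + sentence.split() + ["</s>"]
            for prev, cur in zip(words, words[1:]):
                counts[prev][cur] += 1
        return {prev: {cur: n / sum(nxt.values()) for cur, n in nxt.items()}
                for prev, nxt in counts.items()}

    def string_probability(model, sentence):
        # Probability of a word string as a product of bigram probabilities.
        words = ["<s>"] + sentence.split() + ["</s>"]
        p = 1.0
        for prev, cur in zip(words, words[1:]):
            p *= model.get(prev, {}).get(cur, 0.0)
        return p

    model = train_bigrams(["the sales rose", "the sales of products rose"])
    print(string_probability(model, "the sales of products rose"))  # 0.5

Note that such a model assigns probabilities to word strings without ever building a structural analysis, which is exactly the limitation discussed above.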
<Paragraph position="4"> The Synthesis of Syntax and Statistics
The idea that a synthesis between syntactic and statistical approaches could be useful has incidentally been proposed before, but has not been worked out very well so far. The only technical elaboration of this idea that exists at the moment, the notion of a probabilistic grammar, is of a rather simplistic nature.
A probabilistic grammar is simply a juxtaposition of the most fundamental syntactic notion and the most fundamental statistical notion: it is an "old-fashioned" context free grammar that describes syntactic structures by means of a set of abstract rewrite rules, which are now provided with probabilities that correspond to the application probabilities of the rules (see e.g. [Jelinek 1990]). As long as a probabilistic grammar only assigns probabilities to individual rewrite rules, the grammar cannot account for all statistical properties of a language corpus. It is, for instance, not possible to indicate how the probability of syntactic structures or lexical items depends on their syntactic/lexical context.</Paragraph>
<Paragraph position="5"> As a consequence of this, it is not possible to recognize frequent phrases and figures of speech as such - a disappointing property, for one would prefer that such phrases and figures of speech get a high priority in the ranking of the possible syntactic analyses of a sentence. Some improvements can be made by applying the Markov approach to rewrite rules, as is found in the work of [Salomaa 1969] and [Magerman 1991].</Paragraph>
<Paragraph position="6"> Nevertheless, any approach which ties probabilities to rewrite rules will never be able to accommodate all statistical dependencies. Optimal statistical estimations can only be achieved if the statistics are applied to different kinds of units than rewrite rules. It is interesting to note that in the field of theoretical linguistics, too, the necessity to use other kinds of structural units has been put forward. The clearest articulation of this idea is found in the work of [Fillmore 1988].</Paragraph>
<Paragraph position="7"> From a linguistic point of view that emphasizes the syntactic complexities caused by idiomatic and semi-idiomatic expressions, Fillmore et al. arrive at the proposal to describe language not by means of a set of rewrite rules, but by means of a set of constructions. A construction is a tree structure: a fragment of a constituent structure that can comprise more than one level. This tree is labeled with syntactic, semantic and pragmatic categories and feature values.</Paragraph>
<Paragraph position="8"> Lexical items can be specified as part of a construction.</Paragraph>
<Paragraph position="9"> Constructions can be idiomatic in nature: the meaning of a larger constituent can be specified without being constructed from the meanings of its sub-constituents.</Paragraph>
<Paragraph position="10"> Fillmore's ideas still show the influence of the tradition of formal grammars: the constructions are schemata, and the combinatorics of putting the constructions together looks very much like a context free grammar. But the way in which Fillmore generalizes the notion of grammar resolves the problems we found in the current statistical grammars: if a construction grammar is combined with statistical notions, it is perhaps possible to represent all statistical information. This is one of the central ideas behind our approach.</Paragraph>
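For concreteness, a construction in this sense can be represented as a labeled tree fragment that comprises more than one level and mixes lexical material with open categories. The following sketch is our own illustration in Python, not Fillmore's notation; it uses the (label, children) encoding of trees that is also adopted in the formal definitions below:

    # A construction as a multi-level tree fragment (illustrative example).
    # A tree is a pair (label, children); a leaf has () as its children.
    # A non-lexical leaf such as ('NP', ()) is an open slot: a 'free
    # variable' that must be filled by some other constituent.
    take_into_account = (
        'VP', (
            ('V', (('takes', ()),)),
            ('NP', ()),                    # open slot for the object
            ('PP', (
                ('P', (('into', ()),)),
                ('NP', (('N', (('account', ()),)),)),
            )),
        ),
    )

Because the fragment fixes 'takes', 'into' and 'account' while leaving the object NP open, the idiomatic meaning of the whole verb phrase can be attached to the fragment as a unit, independently of the meanings of its sub-constituents.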
<Paragraph position="11"> A New Approach: Data Oriented Parsing
The starting-point of our approach is the idea indicated above: when a human language user analyzes sentences, there is a strong preference for the recognition of sentences, constituents and patterns that occurred before in the experience of the language user. There is a statistical component in language processing that prefers more frequent structures and interpretations to less frequently perceived alternatives.</Paragraph>
<Paragraph position="12"> The information we would ideally like to use in order to model the language performance of a natural language user therefore comprises an enumeration of all lexical items and syntactic/semantic structures ever experienced by the language user, with their frequency of occurrence. In practice this means: a very large corpus of sentences with their syntactic analyses and semantic interpretations. Every sentence comprises a large number of constructions: not only the whole sentence and all its constituents, but also the patterns that can be abstracted from the analyzed sentence by introducing 'free variables' for lexical elements or complex constituents.</Paragraph>
<Paragraph position="13"> Parsing then does not happen by applying grammatical rules to the input sentence, but by constructing an optimal analogy between the input sentence and as many corpus sentences as possible.</Paragraph>
<Paragraph position="14"> Sometimes the system will need to abstract away from most of the properties of the trees in the corpus, and sometimes a part of the input is found literally in the corpus and can be treated as one unit in the parsing process. Thus the system tries to combine constructions from the corpus so as to reconstruct the input sentence as 'well' as possible. The preferred parse out of all parses of the input sentence is obtained by maximizing the conditional probability of a parse given the sentence.</Paragraph>
<Paragraph position="15"> Finally, the preferred parse is added to the corpus, bringing it into a new 'state'.</Paragraph>
<Paragraph position="16"> To illustrate the basic idea, consider the following extremely simple example. Assume that the whole corpus consists of only the following two trees: [figure: two example corpus trees, not reproduced in this extraction]. In order to come to formal definitions of parse and preferred parse, we first specify some basic notions.</Paragraph>
<Paragraph position="17"> Labels
We distinguish between the set of lexical labels L and the set of non-lexical labels N. Lexical labels represent words. Non-lexical labels represent syntactic and/or semantic and/or pragmatic information, depending on the kind of corpus being used. We write ℒ for L ∪ N.</Paragraph>
<Paragraph position="18"> Strings
Given a set of labels ℒ, a string is an n-tuple of elements of ℒ: (L1,...,Ln) ∈ ℒⁿ. An input string is an n-tuple of elements of L: (l1,...,ln) ∈ Lⁿ. A concatenation • can be defined on strings as usual:
(l1,...,ln) • (m1,...,mk) = (l1,...,ln,m1,...,mk)</Paragraph>
<Paragraph position="19"> Trees
Given a set of labels ℒ, the set of trees is defined as the smallest set Tree such that
(1) if l ∈ ℒ, then (l, ()) ∈ Tree
(2) if l ∈ ℒ and t1,...,tn ∈ Tree, then (l, (t1,...,tn)) ∈ Tree
For a set of trees Tree over a set of labels ℒ, we define a function root: Tree → ℒ and a function leaves: Tree → ℒⁿ:
root((l, (t1,...,tn))) = l
leaves((l, ())) = (l)
leaves((l, (t1,...,tn))) = leaves(t1) • ... • leaves(tn)</Paragraph>
<Paragraph position="20"> Corpus
A corpus C is a multiset of trees, in the sense that any tree can occur zero, one or more times.
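Under the same (label, children) encoding, the functions root and leaves can be written down directly. A minimal sketch (the example tree is ours):

    def root(tree):
        # root((l, (t1,...,tn))) = l
        label, _children = tree
        return label

    def leaves(tree):
        # Left-to-right concatenation of the labels of the leaf nodes.
        label, children = tree
        if children == ():
            return (label,)
        result = ()
        for child in children:
            result += leaves(child)
        return result

    t = ('S', (('NP', (('John', ()),)), ('VP', (('sleeps', ()),))))
    print(root(t))    # 'S'
    print(leaves(t))  # ('John', 'sleeps')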
The leaves of every tree in a corpus are an element of Lⁿ: they constitute the string of words of which that tree is the analysis that seemed most appropriate for understanding the string in the context in which it was uttered.</Paragraph>
<Paragraph position="21"> Constructions
In order to define the constructions of a tree, we need two additional notions: subtrees and patterns.
Subtrees((l, ())) = {(l, ())}
Subtrees((l, (t1,...,tn))) = {(l, (t1,...,tn))} ∪ Subtrees(t1) ∪ ... ∪ Subtrees(tn)
Patterns((l, ())) = {(l, ())}
Patterns((l, (t1,...,tn))) = {(l, ())} ∪ {(l, (u1,...,un)) : u1 ∈ Patterns(t1), ..., un ∈ Patterns(tn)}
The constructions of a tree are the patterns of its subtrees:
Constructions(t) = ∪ {Patterns(u) : u ∈ Subtrees(t)}
We shall use the following notation for a construction of a tree in a corpus:
t ∈c C =def ∃u ∈ C: t ∈ Constructions(u)</Paragraph>
<Paragraph position="22"> Example: consider tree T. The trees T1 and T2 are constructions of T, while T3 is not. [figure: trees T, T1, T2 and T3, not reproduced in this extraction]</Paragraph>
<Paragraph position="23"> Composition
If t and u are trees such that the leftmost non-lexical leaf of t is equal to the root of u, then t∘u is the tree that results from substituting this leaf in t by the tree u. The partial function ∘: Tree × Tree → Tree is called composition. We will write (t∘u)∘v as t∘u∘v, and in general (..((t1∘t2)∘t3)∘..)∘tn as t1∘t2∘t3∘...∘tn.</Paragraph>
<Paragraph position="24"> Parse
A tree T is a parse of an input string s with respect to a corpus C iff leaves(T) = s and there are constructions t1,...,tn ∈c C such that T = t1∘...∘tn. A tuple (t1,...,tn) of such constructions is said to generate parse T of s. Note that different tuples of constructions can generate the same parse. The set of parses of s with respect to C, Parse(s,C), is given by
Parse(s,C) = {T : leaves(T) = s ∧ ∃ t1,...,tn ∈c C: T = t1∘...∘tn}
The set of tuples of constructions that generate a parse T is given by
Tuples(T,C) = {(t1,...,tn) : t1,...,tn ∈c C ∧ T = t1∘...∘tn}</Paragraph>
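The composition operation can be sketched in the same representation. We assume a predicate is_lexical(label) that tells lexical labels (elements of L) from non-lexical ones (elements of N); this predicate is not defined in the paper:

    def leftmost_open_leaf(tree, is_lexical, path=()):
        # Path (tuple of child indices) to the leftmost non-lexical
        # leaf of the tree, or None if there is no such leaf.
        label, children = tree
        if children == ():
            return path if not is_lexical(label) else None
        for i, child in enumerate(children):
            found = leftmost_open_leaf(child, is_lexical, path + (i,))
            if found is not None:
                return found
        return None

    def compose(t, u, is_lexical):
        # t o u: substitute u for the leftmost non-lexical leaf of t.
        # Partial: returns None when t has no such leaf or when the
        # leaf's label differs from root(u).
        path = leftmost_open_leaf(t, is_lexical)
        if path is None:
            return None
        def rebuild(tree, path):
            if path == ():
                return u if tree[0] == u[0] else None  # label must equal root(u)
            label, children = tree
            new_child = rebuild(children[path[0]], path[1:])
            if new_child is None:
                return None
            children = list(children)
            children[path[0]] = new_child
            return (label, tuple(children))
        return rebuild(t, path)

    def generate(constructions, is_lexical):
        # t1 o t2 o ... o tn, composed left to right; None if undefined.
        result = constructions[0]
        for c in constructions[1:]:
            if result is None:
                return None
            result = compose(result, c, is_lexical)
        return result

A tuple of constructions then generates a parse T of s exactly when generate returns a tree whose leaves equal s.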
<Paragraph position="25"> An input string can have several parses, and every such parse can be generated by several different combinations of constructions from the corpus. What we are interested in is, given an input string s, the probability that arbitrary combinations of constructions from the corpus generate a certain parse Ti of s. Thus we are interested in the conditional probability of a parse Ti given s, with as probability space the set of constructions of trees in the corpus.</Paragraph>
<Paragraph position="26"> Let Ti be a parse of input string s, and suppose that Ti can exhaustively be generated by k tuples of constructions: Tuples(Ti,C) = {(t11,...,t1n1), (t21,...,t2n2), ..., (tk1,...,tknk)}. Then Ti occurs iff (t11,...,t1n1) or (t21,...,t2n2) or ... or (tk1,...,tknk) occurs, and (th1,...,thnh) occurs iff th1 and th2 and ... and thnh occur (h ∈ [1,k]). Thus the probability of Ti is given by
P(Ti) = P((t11 ∩ ... ∩ t1n1) ∪ (t21 ∩ ... ∩ t2n2) ∪ ... ∪ (tk1 ∩ ... ∩ tknk))
In shortened form:
P(Ti) = P( ∪p=1..k ∩q=1..np tpq )
The events tpq are not mutually exclusive, since constructions can overlap and can include other constructions. The formula for the joint probability of events E1,...,En is given by
P(E1 ∩ ... ∩ En) = P(E1) · P(E2|E1) · ... · P(En|E1 ∩ ... ∩ En-1)
The formula for the probability of a combination of events is given by
P(E1 ∪ ... ∪ En) = Σi P(Ei) - Σi<j P(Ei ∩ Ej) + Σi<j<k P(Ei ∩ Ej ∩ Ek) - ... + (-1)^(n+1) P(E1 ∩ ... ∩ En)</Paragraph>
<Paragraph position="27"> We will use Bayes' decomposition formula to derive the conditional probability of Ti given s. Let Ti and Tj range over the parses of s; the conditional probability of Ti given s is then given by
P(Ti|s) = P(Ti) / Σj P(Tj)
A parse Ti of s with maximal conditional probability P(Ti|s) is called a preferred parse of s.</Paragraph>
</Section>
<Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Implementation </SectionTitle>
<Paragraph position="0"> Several different implementations of DOP are possible.</Paragraph>
<Paragraph position="1"> In [Scholtes 1992] a neural net implementation of DOP is proposed. Here we will show that conventional rule-based parsing strategies can be applied to DOP, by converting constructions into rules. A construction can be seen as a production rule, where the left-hand side of the rule is constituted by the root of the construction and the right-hand side is constituted by the leaves of the construction. The only extra condition is that for every such rule its corresponding construction should be remembered, in order to generate a parse tree for the input string (by composing the constructions that correspond to the rules that are applied). For a construction t, the corresponding production rule is given by
root(t) → leaves(t)
In order to calculate the preferred parse of an input string by maximizing the conditional probability, all parses with all possible tuples of constructions must be generated, which becomes highly inefficient. Often we are not interested in all parses of an ambiguous input string, nor in their exact probabilities, but only in which parse is the preferred parse. Thus we would like to have a strategy that estimates the top of the probability hierarchy of parses. This can be achieved by using Monte Carlo techniques (see e.g. [Hammersley 1964]): we estimate the preferred parse by taking random samples from the space of possibilities, as in the sketch below. This will give us a more effective approach than exhaustively calculating the probabilities.</Paragraph>
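Two parts of this section lend themselves to a direct sketch: the construction-to-rule conversion, and the Monte Carlo estimate of the preferred parse. The sampler sample_derivation below is an assumed helper that the paper does not spell out; it should compose randomly chosen corpus constructions into a parse of s, returning None when a sample fails to derive s:

    from collections import Counter

    def to_rule(t):
        # The production rule for construction t: root(t) -> leaves(t),
        # with root and leaves as sketched earlier.
        return (root(t), leaves(t))

    def estimate_preferred_parse(s, corpus, sample_derivation, samples=1000):
        # Monte Carlo estimate: sample random combinations of corpus
        # constructions that derive s, and return the parse generated
        # most often as an estimate of the parse maximizing P(T|s).
        tally = Counter()
        for _ in range(samples):
            parse = sample_derivation(s, corpus)  # assumed helper, may fail
            if parse is not None:
                tally[parse] += 1
        return max(tally, key=tally.get) if tally else None

Since a parse that can be generated in many different ways turns up in more samples, the sample frequencies approximate the probability ranking of the parses without enumerating all tuples of constructions.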
<Paragraph position="2"> Although DOP has not yet been tested thoroughly2, we can already predict some of its capabilities. In DOP, the probability of a parse depends on all tuples of constructions that generate that parse. The more different ways in which a parse can be generated, the higher its probability. This implies that a parse which can (also) be generated by relatively large constructions is favoured over a parse which can only be generated by relatively small constructions. This means that prepositional phrase attachments and figures of speech can be processed adequately by DOP.</Paragraph>
<Paragraph position="3"> As to the problem of language acquisition, this might seem problematic for DOP: with an already analyzed corpus, only adult language behaviour can be simulated. The problem of language acquisition is in our perspective the problem of the acquisition of an initial corpus, in which non-linguistic input and pragmatics should play an important role.</Paragraph>
<Paragraph position="4"> An additional remark should be devoted here to formal grammars and disambiguation. Much work has been done to extend rule-based grammars with selectional restrictions such that the explosion of ambiguities is constrained considerably. However, representing semantic and pragmatic constraints is a very expensive task: no one has ever succeeded in doing so except in relatively small grammars. Furthermore, a basic question remains as to whether it is possible to formally encode all of the syntactic, semantic and pragmatic information needed for disambiguation. The additional information that one can draw from a corpus of hand-marked structural annotations allows DOP to by-pass the necessity of modelling world knowledge, since this knowledge automatically enters into the structures that were disambiguated by hand. Extracting constructions from these structures, and combining them in the most probable way, taking into account all possible statistical dependencies between them, preserves this world knowledge in the best possible way.</Paragraph>
<Paragraph position="5"> In conclusion, it may be interesting to note that our idea of using past language experiences instead of rules has much in common with Stich's ideas about language ([Stich 1971]). In Stich's view, judgements of grammaticality are not determined by applying a precompiled set of grammar rules, but rather have the character of a perceptual judgement on the question to what extent the judged sentence 'looks like' the sentences the language user has in his head as examples of grammaticality. The concrete language experiences of the past of a language user determine how a new utterance is processed; there is no evidence for the assumption that past language experiences are generalized into a consistent theory that defines the</Paragraph>
</Section> </Section> </Paper>