File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/92/c92-1039_metho.xml
Size: 30,725 bytes
Last Modified: 2025-10-06 14:12:53
<?xml version="1.0" standalone="yes"?> <Paper uid="C92-1039"> <Title>SYNTACTIC PREFERENCES FOR ROBUST I)ARSING WIT\[t SEMANTIC PREFERENCES</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Robust parsing faces what seems to be a &quot;paradox&quot;. On one hand, it needs to tolerate input sentences which more or less break syntactic and semantic constraints; that is, given m~ ill-formed sentence, the parser should attempt to interpret it somehow rather thml simply reject it because it violates some constraints. In order to do this it allows syntactic and semantic constralnts to be relaxed. On the other hand, robust parsing still need to be able to disambiguate. For the sake of disambiguation, stronger syntactic and ,semantic coilstraints are required, since the stronger these constraints are, the greater the number of unlikely readings that can be rejected, and the more likely the correct reading is selected.</Paragraph> <Paragraph position="1"> A lot of work related to this problem has been done. The most commonly used snategy is to use the strongest constraints first to parse the input. If the parser fails, the violated constraints are relaxed in order to recover the failed process m~d arrive at a successful parse (CarboneU & Hayes, 1983) (Huang, 1988) (Kwasny & Sondheimer, 1981) (Weischedel & Sondheimer, 1983). The major problem with this approach is that it is difficult to tell which constraint is actually being violated and, therefore, needs to be relaxed when the parser fails. Tiffs l)roblem is more serious with the parser using backtracking.</Paragraph> <Paragraph position="2"> Another strategy is based on Preference Semantics (Wilks, 1975, 1978) (Fass & Wilks, 1983).</Paragraph> <Paragraph position="3"> The idea is that all the possible sentence readings can (though not necessarily) be produced.</Paragraph> <Paragraph position="4"> All the &quot;readings are scored according to how mm~y preference satisfactions they contmn&quot;, and &quot;the best reading (that is, the one with the most preference satisfactions) is taken, even if it contains preference violations.&quot; Selfridge (1986) and Slator (1988) 'also use this strategy. One important advantage of this approach is that within a uniform mechanism, the semantic constraints can both be used maximally for the sake of disambiguation m~d be gracefully relaxed when nece,ssmy for the sake of robusmess. However, how to extend the preference philosophy to syntactic constraints have not been addressed.</Paragraph> <Paragraph position="5"> There are two li-equently used approaches to incorporate syntactic constraints in systems using semantic preferences. The first ,~s in (Slao tot, 1988), is to use a weak version of a rather typical syntactic module. The problem with this approach is that the syntactic constraints here still suffer from the problem of &quot;robust parsing paradox&quot;. Another problem with this approach is that it shifts more bin-dens of disambiguation to semm~tic preferences because the syntactic constralnts need to be weak enough in order not to reject too many noisy inputs. The second approach, as in (Selfridge, 1986), is to try all the possible combinations without any structural preference. Tfie problein hexe is computational complexity especially with long sentences which have conlplex or recnrsive slJllctares in them.</Paragraph> <Paragraph position="6"> Cormectionism has also shown some appealing results (Dyer. 
1991) (Waltz & Pollack, 1985) (Lehnert, 1991) on robustness. However, there are some very difficult problems which these approaches pose, such as the difficulty with complex structures (especially recursive structures), the difficulty with variables, variable bindings and unification, and the difficulty with schemas and their instantiations. These problems need to be solved before such an approach can be used in practical natural language processing systems, especially those which involve syntactic processing (Dyer, 1991).</Paragraph>
<Paragraph position="7"> In this paper we will propose a framework for representing syntactic preferences1 that will keep all the virtues of robustness that Preference Semantics suggests. Furthermore, such preferences can be learned.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. Sketch of Our Approach </SectionTitle>
<Paragraph position="0"> As we have seen, the idea of Preference Semantics about robustness is to enable the system to 1) generate (potentially) all the possible readings for all the possible inputs, and 2) find out which of all the possible readings is the most appropriate one and generate it first. To meet this goal for syntactic constraints we want to accomplish the following: 1. A formalism with a practically feasible number of symbolic, manipulatable rules which will have enough generative power to accept any input and produce all of its possible structures (section 3.1).</Paragraph>
<Paragraph position="1"> 2. A scheme to associate weights with each of these rules such that the weights of all the rules that are used to produce a structure will reflect how preferable (syntactically speaking) the whole structure is (section 2.1).</Paragraph>
<Paragraph position="2"> 3. An algorithm that will incorporate these rules and weights with semantic preferences so that the overall best reading will be generated first (section 2.2).</Paragraph>
<Paragraph position="3"> 4. A method to train the weights (sections 2.1 and 3.3).</Paragraph>
<Paragraph position="4"> 2.1. Coding Syntactic Constraints The most popular method of encoding syntactic constraints is probably phrase structure grammar. One way to make it robust is to assign a cost to every grammar rule, so that a rule designed for well-formed inputs will have a lower cost than the rules designed for less well-formed inputs; that is to say, a higher-cost grammar rule will be less preferred than a lower-cost grammar rule. However, there are two serious problems with this naive method. First, if every kind of ill-formedness is to have a set of special grammar rules designed for it, then to capture a reasonable variety of ill-formedness we need an unreasonable number of grammar rules. Second, it is not clear how we can find the costs for all these rules. Next we will show how we solve these two problems. 1 We use syntactic preferences in a broader sense which not only includes syntactic feature agreement but also the order of the constituents in the sentence and the way in which different constituents are combined. On the other hand, as you will see, the absence of nonterminals and syntactic derivation trees will also distance us from a strong syntax theory.</Paragraph>
<Paragraph position="5"> To solve the first problem, our approach abandons the use of phrase structure grammar, and instead a new grammar formalism is used.
The idea of this new formalism is to reduce syntactic constructions to a relatively small set of more primitive rules (P-rules), so that each syntactic construction can be produced by the application of a particular sequence of these P-rules. Section 3.1 will give more details about this formalism; we will just list some of its relevant characteristics here: 1. There are no nonterminals in this formalism. 2. Each P-rule takes only three parameters to specify a pattern on which its action will be fired. All three of these parameters will be one of the parts of speech (or word types), such as noun, verb, adverb, etc. For example: (noun, verb, noun) => action3 is a P-rule. 3. Each rule has one action. There are only three possible actions.</Paragraph>
<Paragraph position="6"> Since the number of parts of speech is generally limited2, the total rule space (the number of all the possible P-rules) is rather small3. So, it is possible for us to keep all of the possible P-rules in the grammar rule base of the parser. The output of the parser is a parsing tree in which each node corresponds to a word in the input sentence and the edges represent the dependency or attachment relations between the words in the sentence. One important property of this formalism is that, given any input sentence, for every normal parsing tree that can be built on it (the definition of a normal parsing tree is given in the next section), there exists a sequence of P-rules such that the tree will be derived by applying this rule sequence. This means that we can now use this small P-rule set to guide the parsing4 without pre-excluding any kind of ill-formed input for which a normal parsing tree does exist.</Paragraph>
<Paragraph position="7"> 2 For example, in the Longman Dictionary of Contemporary English (1978) there are only 14 different parts of speech. Given that interjections can be taken care of by a preprocessor and the complex parts like v adv, v adv prep, v adv;prep, v prep can be represented as v plus some syntactic features, this number can be further reduced to 9.</Paragraph>
<Paragraph position="8"> 4 A lot of the searching paths sanctioned by P-rules will be pruned by semantic preferences. In this sense the parsing is actually guided by both syntactic preferences and semantic preferences. To solve the second problem, we will use some techniques borrowed from the connectionist camp. As we have mentioned before, each rule takes three parameters to specify a pattern.</Paragraph>
<Paragraph position="9"> Each of these parameters will be associated with a vector of syntactic feature5 roles. So, every P-rule will have a feature role vector: $(F_{11}, \ldots, F_{1n}, F_{21}, \ldots, F_{2n}, F_{31}, \ldots, F_{3n})$.</Paragraph>
<Paragraph position="10"> Each feature role is in turn associated with a weight. So each P-rule also has a vector of weights:</Paragraph>
<Paragraph position="11"> $(W_{11}, \ldots, W_{1n}, W_{21}, \ldots, W_{2n}, W_{31}, \ldots, W_{3n})$</Paragraph>
<Paragraph position="12"> If a rule is applicable in a certain situation, the feature roles will be filled by the corresponding feature values given by the language objects matched by the pattern of the rule. The value of each feature can be either 1 or 0. Let $F'_{ij}$ denote the value filling role $F_{ij}$ at one particular application. The cost of applying that rule in that situation will be: $-\sum_{ij} W_{ij} F'_{ij}$ (*) The weights can be trained by a training algorithm similar to that used in the single-layer perceptron network.</Paragraph>
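To make formula (*) concrete, here is a minimal Python sketch of the cost of one rule application. It is our own illustration, not the paper's implementation: the `PRule` class and its field names are invented, and the three per-parameter feature-role vectors are flattened into one vector of weights and 0/1 feature values.

```python
class PRule:
    """Hypothetical container for one P-rule: a pattern, an action, and weights."""
    def __init__(self, pattern, action, weights):
        self.pattern = pattern    # e.g. ("noun", "verb", "noun")
        self.action = action      # one of the three actions of section 3.1
        self.weights = weights    # flat weight vector: the W_ij of formula (*)

def rule_cost(rule, feature_values):
    """Cost of applying `rule` when its feature roles are filled with the
    0/1 values in `feature_values` (the F'_ij): -sum_ij W_ij * F'_ij."""
    assert len(rule.weights) == len(feature_values)
    return -sum(w * f for w, f in zip(rule.weights, feature_values))

# e.g. a rule whose weights favor its first two feature roles:
rule = PRule(("noun", "verb", "noun"), "action3", [0.8, 0.5, 0.1])
print(rule_cost(rule, [1, 1, 0]))   # -1.3: a low-cost, preferred application
```

A lower (more negative) raw cost means a more preferred application; as section 2.2 notes, these costs are later normalized to be positive for A*.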
<Paragraph position="13"> To summarize, the syntactic constraints are encoded partly as P-rules and partly as the weights associated with each P-rule. The P-rules are used to guide the parsing and tell which constituent should be combined with which constituent. There are two types of syntactic preferences encoded in the weights: the preferences of the language for different P-rules and the preferences of each P-rule for different syntactic features. The P-rules are still symbolic and work on a stack, so the ability to handle recursion is kept.</Paragraph>
<Paragraph position="14"> 2.2. Organization of the Parser At the topmost level, the parser is a standard searching algorithm. Due to its efficiency, the A* algorithm (Nilsson, 1980) is used. However, it is worth noting that any other standard searching algorithm could also be used. For any searching algorithm, the following things need to be specified: 1. what the states in the searching space are; 2. what the initial state is; 3. what the action rules are and, taking one state, how they create new states; 4. what the cost of creating a new node is (please note that the cost is actually charged to the edge of the searching graph); 5. what the final states are.</Paragraph>
<Paragraph position="15"> 5 The use of the term &quot;syntactic feature&quot; is only to make it distinct from semantic preference and semantic type. We by no means try to exclude the use of those features which are generally considered to be semantic features but which will help to make the right syntactic choice.</Paragraph>
<Paragraph position="16"> In the parser, the searching states are pairs like: (<partial result>, <unparsed sentence>). The partial result is represented as one or more parsing trees which are kept on a working stack.</Paragraph>
<Paragraph position="17"> Details of this representation are given in the next section. The initial state is a pair of an empty stack and a sentence to be parsed. The action of the searching process is to look at the next read-in word and the roots of the top two trees on the working stack, which represent the two most recently found active constituents. The search will then look for all the applicable P-rules based on what it sees. All the P-rules it has found will then be fired. The action part of these P-rules will decide how to manipulate the trees on the stack and the current read-in. It will also decide whether the parser needs to read in the next word.</Paragraph>
<Paragraph position="18"> Therefore, for each of these P-rules being fired, a new state is created. The cost of creating this new state is the following: the cost of applying the P-rule in its father state; the degree of violation of any new local preference just found in the newly built trees; and the cost of the contextual preference6 associated with the newly consumed read-in sense. All these costs are normalized and added to yield the total cost of creating this new state. The reason for normalization is that, for example, the cost of applying a P-rule (see section 2.1) needs to be normalized to be positive, as required by the A* algorithm. The relative magnitudes of the different types of costs also need to be balanced, so that one kind of cost will not overwhelm the others. The final states of the search are the states where the unparsed sentence is nil and the working stack has only one complete parsing tree on it.
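Putting the pieces of this section together, here is a hedged sketch of the top-level A* loop. It is not the paper's code: the rule machinery (`applicable_rules`, `apply_rule`, `step_cost`) and the `heuristic` are passed in as stand-ins for the components described in prose, and a state is simplified to the (partial result, unparsed sentence) pair, leaving out the working register.

```python
import heapq
import itertools

def a_star_parse(sentence, applicable_rules, apply_rule, step_cost, heuristic):
    """Return the parsing tree of the cheapest (most preferred) reading."""
    tie = itertools.count()            # tie-breaker so the heap never compares states
    start = ((), tuple(sentence))      # (stack of parsing trees, unparsed words)
    frontier = [(heuristic(start), next(tie), 0.0, start)]
    while frontier:
        _, _, g, state = heapq.heappop(frontier)
        stack, unparsed = state
        if not unparsed and len(stack) == 1:
            return stack[0]            # final state: one complete parsing tree
        for rule in applicable_rules(state):
            successor = apply_rule(rule, state)
            # normalized sum of P-rule cost, local-preference violations,
            # and contextual-preference cost, charged to this search edge
            new_g = g + step_cost(rule, state, successor)
            heapq.heappush(frontier,
                           (new_g + heuristic(successor), next(tie), new_g, successor))
    return None                        # no reading could be built
```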
Obviously the output of this searching algorithm is the reading (represented as the tree) of the input sentence which violates the smallest amount of syntactic and semantic preferences.</Paragraph>
<Paragraph position="21"> So far the heuristic used in the A* search is simply taken to be $\alpha \cdot l \cdot c$, where $\alpha$ is a constant, $l$ is the length of the unparsed sentence, and $c$ is the average cost of parsing each word.</Paragraph>
<Paragraph position="22"> 6 For the idea of contextual preferences see (Slator, 1988).</Paragraph>
<Paragraph position="23"> It needs to be mentioned that the input sentence we talked about above is actually a list of lexical frames. Each lexical frame contains the following information about the word it stands for: the part of speech, syntactic features, local preferences, contextual preferences, etc. Since words can have more than one sense and one lexical frame will only represent one sense, for each read-in word the parser will have to work on each of its senses and produce all of their children. 3. Some Details 3.1. P-rules, Parsing Trees, and P-parsers.</Paragraph>
<Paragraph position="24"> Given an input string $\alpha$ on a vocabulary $\Sigma$7, a parsing tree of the input is a tree8 with all its nodes corresponding to one of the words in the input string. For example, one of the parsing trees for input abcde is:</Paragraph>
<Paragraph position="26"> Here the head of each list is the root of the tree or the subtree, and the tail of the list is its children. Intuitively the parenthood relation reflects the attachment or dependency relation in the input expression. For example, given the input expression a small dog, dog will be the father of both a and small. The parsing tree is more formally defined as follows: Definition (Parsing Tree): Given an input string $\alpha$, a parsing tree on $\alpha$ is recursively defined as follows: 1. (a) is a parsing tree on $\alpha$, if a is in $\alpha$. 2. (a $T_1$ $T_2$ ... $T_i$) is a parsing tree on $\alpha$, if a is in $\alpha$ and each $T_k$ ($1 \le k \le i$) is also a parsing tree on $\alpha$.</Paragraph>
<Paragraph position="27"> Definition (Complete Parsing Tree): Suppose T is a parsing tree on $\alpha$. If every a in $\alpha$ is also in T, then T is a complete parsing tree on $\alpha$. To reduce the computational complexity, we will limit our attention to only one special type of parsing tree, namely the normal parsing tree.</Paragraph>
<Paragraph position="28"> 7 In the actual parser $\Sigma$ is the set of all the parts of speech. 8 The order of children for any node is significant here, and we will use the LISP convention to represent the parsing trees in this paper.</Paragraph>
<Paragraph position="29"> It should not be difficult to see and prove from the following definition that normal parsing trees are simply the trees which do not have &quot;crossing dependencies&quot; (Maling & Zaenen, 1982).</Paragraph>
<Paragraph position="30"> Definition (Normal Parsing Tree): Suppose T is a parsing tree on $\alpha$. T is a normal parsing tree on $\alpha$ iff, for every node in T that has children, say $T_1, \ldots, T_k$, all the nodes in $T_i$ appear before all the nodes in $T_j$ in the input string $\alpha$, where $1 \le i < j \le k$.</Paragraph>
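The following small Python sketch mirrors the LISP-style list representation and the definition just given. It is our own illustration: nodes are identified by their positions in the input string rather than by the words themselves (a simplification that keeps the crossing-dependency check short), and the example tree is made up, since the paper's own abcde example did not survive extraction.

```python
def positions(tree):
    """Collect all input positions in a tree [root, child1, ..., childk]."""
    root, children = tree[0], tree[1:]
    found = [root]
    for child in children:
        found.extend(positions(child))
    return found

def is_normal(tree):
    """Normal Parsing Tree condition: for children T_i, T_j with i < j,
    every position in T_i must precede every position in T_j."""
    children = tree[1:]
    spans = [positions(child) for child in children]
    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            if max(spans[i]) >= min(spans[j]):
                return False            # crossing dependency found
    return all(is_normal(child) for child in children)

# A made-up tree over input positions 0..4: root 2 with children over 0-1 and 3-4.
print(is_normal([2, [1, [0]], [3, [4]]]))   # True
print(is_normal([2, [3], [1]]))             # False: children out of order
```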
<Paragraph position="31"> The P-parser has an input tape, and its reading head always moves forward. The P-parser also keeps a stack of parsing trees which represents the parsing result of the part of the input which has already been read. Besides this, there is a working register in the P-parser to store the current read-in word. As you will see later, the read-in word is actually transformed into a tree before it is stored in the working register. The configuration of the P-parser is thus defined as a triple [<the stack>, <content of working register>, <unparsed input>]. P-rules are used to specify how the P-parser works. If the input is over a vocabulary $\Sigma$, the P-rules on it can be defined as follows: Definition (P-rule): A P-rule on $\Sigma$ is a member of the set $\Sigma' \times \Sigma' \times \Sigma' \times A$, where $\Sigma'$ is $\Sigma \cup \{nil\}$ and A is the set of actions defined below.</Paragraph>
<Paragraph position="32"> Definition (Action): Actions are defined as functions of type $C \rightarrow C$, where C is the set of configurations. There are in total three different actions. Action 1 simply pushes the current read-in onto the stack and then reads the next word from the input. Action 2 attaches the top of the stack as the first child of the current read-in stored in the working register. Action 3 pops the top of the stack first and then attaches it to the second top of the stack as its last child.</Paragraph>
<Paragraph position="33"> The initial configuration of the P-parser is [nil, (list (car $\alpha$)), (cdr $\alpha$)], where $\alpha$ is the input string to be parsed. There is a set of P-rules in each P-parser to specify its behavior. The P-parser works non-deterministically. A P-rule can be fired iff its three parameters match the roots of the top two parsing trees on the stack and the root of the tree in the working register, respectively. Note that the P-rule does not care about the unparsed input part of the configuration triple. A configuration is a final configuration iff the unparsed input string and the working register are both empty and the stack has only one parsing tree on it. An input is grammatical if and only if the P-parser can get from the initial configuration to a final configuration by applying a sequence of P-rules taken from its grammar set. If there is more than one possible final state for a given input string and the parsing trees produced at these states are different, then we say that the input string is ambiguous. Here is a simple example to show how the P-parser works. You may need a pen and a piece of paper to work it out. Given input abcde (a good friend of mine, for example), the parsing tree (c ... Most properties of this formalism will not be of interest in this paper, except this theorem: Theorem: A P-parser with all the P-rules on $\Sigma$ as its grammar set can produce all the complete normal parsing trees for any input string on $\Sigma$.</Paragraph>
<Paragraph position="34"> PROOF: It is easy to prove this by induction on the length of the input string. []</Paragraph>
<Paragraph position="35"> The theorem tells us that a P-parser with all the possible P-rules as its rule set can produce, for any input string, all of its possible parsing trees which have no crossing dependencies. This alone may not be very useful, since this P-parser will also accept all the strings over $\Sigma$. However, as we have shown in the last section, with a proper weighting scheme this P-parser offers a suitable framework for coding syntactic preferences.</Paragraph>
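As a concrete reading of the three actions, here is a small Python sketch. It is our own rendering of the definitions above: configurations are (stack, register, unparsed) triples, trees use the paper's list convention [root, child1, ...], and all function names are invented.

```python
def initial_configuration(sentence):
    """[nil, (list (car alpha)), (cdr alpha)]."""
    return ([], [sentence[0]], list(sentence[1:]))

def action1(config):
    """Push the current read-in onto the stack, then read the next word."""
    stack, register, unparsed = config
    new_register = [unparsed[0]] if unparsed else None
    return (stack + [register], new_register, unparsed[1:])

def action2(config):
    """Attach the top of the stack as the FIRST child of the read-in tree."""
    stack, register, unparsed = config
    top = stack[-1]
    return (stack[:-1], [register[0], top] + register[1:], unparsed)

def action3(config):
    """Pop the top tree and attach it to the tree below as its LAST child."""
    stack, register, unparsed = config
    top, below = stack[-1], stack[-2]
    return (stack[:-2] + [below + [top]], register, unparsed)

def is_final(config):
    stack, register, unparsed = config
    return register is None and not unparsed and len(stack) == 1
```

Each non-deterministic choice among fireable P-rules corresponds to one successor state in the A* search of section 2.2.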
<Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2. Syntactic Preferences in the Weights of P-rules </SectionTitle>
<Paragraph position="0"> Each lexical item will have a vector (of a fixed length) containing syntactic features. The value of each feature is either 1 or 0. Each of the three parameters in a P-rule is associated with a vector (of the same length) of syntactic feature roles. Each of these syntactic feature roles is associated with a weight. Each weight thus encodes a preference of the P-rule toward the particular syntactic feature it corresponds to. A higher weight means the P-rule will be more sensitive to the appearance of this syntactic feature, and the result of applying this rule will therefore be more competitive when the feature does appear. The preferences of the language for different P-rules are also reflected in the weights.</Paragraph>
<Paragraph position="1"> Instead of being reflected in each individual weight, they are reflected in the distribution of weights across P-rules. A P-rule with a higher average weight is generally more favored than a P-rule with a lower average weight, since the higher the average weight is, the lower the cost of applying the P-rule tends to be. It is also necessary to emphasize that these two types of preferences are closely integrated.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3. Weight Learning </SectionTitle>
<Paragraph position="0"> The weights of all the P-rules will be trained.</Paragraph>
<Paragraph position="1"> The training is supervised. There are two different methods to train the weights. The first method takes an input string and a correct parsing tree for that string. We use an algorithm to compute a sequence of P-rules that will produce the given parsing tree from the given input string. This algorithm is not difficult to design, and due to space limits we will not present it here. After the P-rule sequence is produced, each P-rule in the sequence will get a bonus $\delta$. The bonus will further be distributed to the weights in the P-rule in the same fashion as in the single-layer perceptron network, that is: $\Delta W_{ij} = \eta \delta F'_{ij}$, where $\eta$ is the learning factor.</Paragraph>
<Paragraph position="2"> The second method requires less intervention.</Paragraph>
<Paragraph position="3"> The parser is left to work on its own. The user may need to tell the parser whether the output it gives is correct. If the result is correct, all the P-rules used to construct this output will get a bonus, and all the rules used during the parsing which did not contribute to the output will get a punishment. If the answer is wrong, all the P-rules used to construct this answer will get a punishment, and the parser will continue to search for a second-best guess. The bonus and punishment will be distributed to the weights of the P-rules in the same manner as in the first method.</Paragraph>
<Paragraph position="4"> The first method is more appropriate at the early stage of the training, when most of the rules have about the same weights. It would take much longer for the parser to find the correct result on its own at this stage. On the other hand, the second method needs less intervention, so it is more appropriate whenever it can work reasonably well.</Paragraph>
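A minimal sketch of this update, assuming the flat weight/feature vectors of section 2.1; the function name and the in-place update are our own choices, and $\delta$ would be positive for a bonus and negative for a punishment.

```python
def update_weights(weights, feature_values, delta, eta=0.1):
    """Perceptron-style update Delta W_ij = eta * delta * F'_ij for one
    application of one P-rule; modifies the weight vector in place."""
    for i, f in enumerate(feature_values):
        weights[i] += eta * delta * f   # only roles filled with 1 are moved
    return weights

# e.g. reward all rule applications on the path to a correct parse:
# for rule, feats in applications_on_best_path:
#     update_weights(rule.weights, feats, delta=+1.0)
```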
<Paragraph position="5"> It is well known that the single-layer perceptron net cannot be trained to deal with exclusive-or. The same situation will also happen here. Since exclusive-or relations do exist between syntactic features, we need to solve this problem. The simplest solution is to make multiple copies of the P-rule and hope that each copy will converge to one side of the exclusive-or relation.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.4. Measuring Preference Violations </SectionTitle>
<Paragraph position="0"> The syntactic preference violation is measured by formula (*) in section 2.1. Both action 2 and action 3 of the P-rule actions make an attachment, and some semantic preferences may be either violated or satisfied by the attachment. So, after each application of such P-rule actions, the parser needs to check whether some new preferences have been violated or satisfied. If there are any, it will compute the cost of these violations and report it to the searching process.</Paragraph>
<Paragraph position="1"> Similarly, each application of action 1 will cause a new contextual preference to be reported. The measurement of the violation degree of both local preferences and contextual preferences is basically taken from PREMO (Slator, 1988), which we will not repeat here.</Paragraph>
<Paragraph position="2"> 4. Conclusion and Comparisons Preference Semantics offers an excellent framework for robust parsing. However, how to make full use of syntactic constraints has not been addressed. Using weights to code syntactic constraints on a relatively small set of P-rules (from which all the possible syntactic structures can be derived) enables us to extend the philosophy of Preference Semantics from semantic constraints to syntactic constraints. The weighting system not only reflects the preferences of the language, say English, for different P-rules but also the preferences of each P-rule for each syntactic feature. Besides this, it also offers a nice interface so that we can integrate the application of these syntactic preferences with the application of both local semantic preferences and contextual preferences by using a highly efficient searching algorithm.</Paragraph>
<Paragraph position="3"> This project has also shown that some of the techniques commonly associated with connectionism, such as coding information as weights, training the weights, and so on, can also be used to benefit symbolic computing. The result is gaining robustness and adaptability while not losing the advantages of symbolic computing such as recursion, variable binding, etc.</Paragraph>
<Paragraph position="4"> The notion of 'syntactic preference' has been used in (Pereira, 1985) (Frazier & Fodor, 1978) (Kimball, 1973) to describe the preference between Right Association and Minimal Attachment. Our approach shares some similarities with (Pereira, 1985), in that MA and RA simply &quot;correspond to two precise rules on how to choose between alternative parsing actions&quot; at certain parsing configurations. However, he did not offer a framework for how one of them will be preferred. According to our model, the preference between them will be based on the weights associated with the two rules, the syntactic features of the words involved, and the semantic preferences found between these words.
Besides, the idea of syntactic preferences in this paper is more general than the one used in their work, since it includes not only the preference between MA and RA but other kinds of syntactic preferences as well.</Paragraph>
<Paragraph position="5"> Wilks, Huang and Fass (1985) showed that prepositional phrase attachment is possible with only semantic information. In their approach syntactic preferences are limited to the order of matchings and the default attaching. Their attaching algorithm can be seen as a special case of the model we have proposed here, in that, if they are correct, the preferences between the sequences of rules used for RA and MA would turn out to be very similar, so that the semantic preferences involved would generally overshadow their effects.</Paragraph>
<Paragraph position="6"> There are some significant differences between our approach and some hybrid systems (Kwasny & Faisal, 1989) (Simmons & Yu, 1990). First, our approach is not a hybrid approach: everything in our approach is still symbolic computing. Second, in our approach the costs of the applications of syntactic constraints are passed to a global search process. The search process will consider these costs along with the costs given by other types of constraints and make a decision globally. In a hybrid system, the syntactic decision is made by the network locally, and there is no intervention from the semantic processing. Third, our parser is non-deterministic, while the parsers in hybrid systems are deterministic, since there is no easy way to do backtracking. It is also easy to see that without the intervention of semantic processing, lacking the ability to backtrack is hardly an acceptable strategy for a practical natural language parser. Finally, in our approach each P-rule has its own set of weights, while in the hybrid systems all the grammar rules share a common network, and it is quite likely that this net will be overloaded with information when a reasonably large grammar is used.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> Acknowledgment </SectionTitle>
<Paragraph position="0"> All the experiments for this project were carried out on the platform given by PREMO, which was designed and implemented by Brian Slator when he was here at CRL. The author also wants to thank Dr Yorick Wilks, Dr David Farwell, Dr John Barnden and one referee for their comments. Of course, the author is solely responsible for all the mistakes in the paper.</Paragraph> </Section> </Paper>