<?xml version="1.0" standalone="yes"?> <Paper uid="C69-4001"> <Title>A PROGRESS REPORT ON THE USE OF ENGLISH IN INFORMATION RETRIEVAL</Title> <Section position="3" start_page="0" end_page="0" type="concl"> <SectionTitle> 3. Rosenbaum and Lochak (1966). </SectionTitle> <Paragraph position="0"> 1. For the latest version of this grammar, see Rosenbaum (1968).</Paragraph> <Paragraph position="1"> about. However, this approach is also unsatisfactory for practical reasons, even if an easy way to build such an interactive system were known. In a time-sharing environment, which is the only practical environment for on-line systems of this kind, every interruption and interaction costs time, and the cumulative effect makes the system so slow and cumbersome as to be impractical.</Paragraph> <Paragraph position="2"> In this paper, we propose some additional devices for the automatic resolution of ambiguities. These devices are now being studied and implemented at the IBM Boston Programming Center. Ideally, one should not have to restrict arbitrarily the types of sentences which the user of the system may input to the grammar; that is, the grammar should be able to parse any sentence of any length. Implementing this ideal is, however, not presently feasible. We outline here our efforts to approach this goal to the extent possible under the present state of the art.</Paragraph> <Paragraph position="3"> The grammar of Proto-RELADES was a standard recognition grammar with separate phrase structure and transformational components; that is, phrase structure rules would apply to the input sentence and produce a surface structure. The latter would then be the input to the transformational component, and the output of this component would be the deep structure of the sentence. 
Our new experimental grammar combines these two components into one integrated system of rules.</Paragraph> <Paragraph position="4"> To understand the implication of this, we must look at the form and nature of the rules in this grammar. Each rule in this grammar has the following format: (1) Li: A·BC ↑ D·E → F $X$ @Y@ *** | Ln This rule has a label Li and a GOTO instruction Ln. The function of the rule can be paraphrased as follows: Check that the elements ABC are to the left of the pointer (↑) in the input sentence and that the elements D and E are to the right of it. (There is no upper limit on the number of elements to the left and right of the pointer; there must be at least one element to the left of the horizontal arrow →.) If this is the case, then if condition X is satisfied, perform action Y and create a node F to dominate the symbols between the two dots (·) on the left of the arrow (X and Y can be null). Next, move the pointer to the right by the number of stars (*) at the tail end of the rule and go to the rule labeled Ln. If this rule does not apply, control passes to the next rule in the sequence, i.e., to Li+1.</Paragraph> <Paragraph position="5"> We see at once that this rule format permits one to write context-sensitive rules constrained by conditioning factors and also to build local transformations in the Y part of the rule. The flow of rule application is controlled by the GOTO label Ln. Underlying this system of rules is the &quot;reductions analysis&quot; (RA) recognizer, which reads the rules and applies them to the input sentence, producing a tree structure (P-marker) that represents the deep structure of the sentence. The RA in our system is an extension of the model proposed by Cheatham (1968). Culicover (1969) and Lewis (1969) have written and implemented a grammar which uses these rules with exclusively local transformations. 
The net result of this grammar is that a canonical deep structure is produced for the input sentence without the generation of an intermediate surface structure. In terms of computer efficiency and speed, this is a significant step. The theoretical significance of such a recognition grammar has yet to be studied.</Paragraph> <Paragraph position="6"> The ambiguities can be resolved by the following interactions, all of which, except the last, are automatic and internal and therefore fast. In a fully generalized system, all these interactions must be implemented in such a way that they trade off against one another to reduce complexity and increase speed.</Paragraph> <Paragraph position="7"> The final interaction on list (2), i.e., human interaction, which is the last resort in this system, can be omitted or its use greatly restricted in many practical situations. The interactions are with: (2) (i) the lexicon (ii) the data base (iii) the system (iv) the human user Lexical entries have a certain number of features which play a role in the structural analysis of the input sentence. This is based on the already well-known proposal of Chomsky (1965) for syntactic features. A simple example of a semantic feature of a sort is given below: (3) John wrote the book on the shelf.</Paragraph> <Paragraph position="8"> If the word shelf in the lexicon has a feature or features denoting that it is a place for storing books, etc., but that people do not normally write on it or reside on it, then in the process of analyzing (3) the prepositional phrase on the shelf will be recognized as modifying the noun book and not the verb write or the proper noun John. The trouble with this solution is obvious: there will be too many simple and complex features for each entry in the dictionary (footnote 4), and we run into severe problems for practical applications. 
This is why we want to reduce the reliance on dictionary features to a minimum and trade off as far as possible with the other interactions listed under (2) above.</Paragraph> <Paragraph position="9"> Interaction with the data base will provide the discourse background and may turn out to be the most significant and practical means for resolving ambiguities. For our system, this category of interaction includes lookup in micro-glossaries; that is, specialized glossaries containing the jargon of each narrow field of application. Again, a highly simplified example of interaction with the data base is the following. Suppose that the input sentence was (4) Do you have any books on paintings by Smith? Somewhere in the process of the derivation of the underlying structure (and the interpretation) of the sentence in (4) it becomes necessary to decide whether the phrase by Smith modifies books or paintings, that is, whether the question is about books by Smith or about paintings by Smith. At this point, the system can look into the data base and see, for example, whether Smith occurs under the column for authors or for painters and resolve the ambiguity accordingly.</Paragraph> <Paragraph position="10"> 4. For a fractional grammar of English with partial features specified, see Rosenbaum (1968).</Paragraph> <Paragraph position="11"> Interaction with the system is similar to the interaction with the data base except that here we question the capabilities of the underlying system in order to resolve the ambiguity. Consider the following example: (5) Do you have any documents on computers? The ambiguity in (5) is, among others, whether we want documents written about computers or are referring to piles of documents on top of computers. 
Now the underlying system which analyzes and interprets (5) and produces the answer to the question has certain capabilities; for example, it has computer routines for searching lists of titles, authors, etc., printing data, and whatever else there is. However, if the system does not have a facility for &quot;looking&quot; on top of the computers in search of documents, we can reject that interpretation and adopt the one which concerns documents containing information about computers.</Paragraph> <Paragraph position="12"> Human interaction becomes necessary only when none of the above devices resolves the ambiguity; for example, in the case of the data base example in sentence (4) above, when the data base has the name Smith under both the author and painter columns. In this case, the system should formulate a simple question to ask the human user before the final interpretation is effected; for example: &quot;Do you mean books by Smith or paintings by Smith or both?&quot; But, as I mentioned above, we have found in practice that, within a specified discourse and with a properly organized lexicon and data base, the need for this last resort seldom arises; and that is why systems such as Proto-RELADES and that of Woods (1967) can have significant practical claims.</Paragraph> <Paragraph position="13"> In summary, we visualize a restricted but completely practical natural language system for communication with a computer and for information retrieval, with a general lexicon and specialized micro-glossaries. Certain restrictions in the lexicon and in the micro-glossaries will prevent wild generation of all possible and obscure (or unlikely) analyses but will permit generation of all the reasonable analyses for each input sentence. Interactions with the lexicon, the data base (i.e., the subject of the discourse), and the system will further eliminate competing analyses for each sentence until one analysis is left. 
In those cases where the system is unable to reduce the query to one analysis, the human user is asked to help in clarifying the ambiguity. I would like to close this paper, however, with a word of caution. No linguist and no serious computational linguist will claim that he knows how to build a system such as the one outlined above for completely unrestricted processing of a natural language. The stress throughout this paper has been on practicality. We visualize a restricted natural language system of a sort which is fully practical and useful for many applications in the information sciences.</Paragraph> </Section> </Paper>