<?xml version="1.0" standalone="yes"?> <Paper uid="J87-3002"> <Title>LARGE LEXICONS FOR NATURAL LANGUAGE PROCESSING: UTILISING THE GRAMMAR CODING SYSTEM OF LDOCE</Title> <Section position="8" start_page="31" end_page="31" type="concl"> <SectionTitle> 7 CONCLUSION </SectionTitle> <Paragraph position="0"> Most applications for natural language processing systems will require vocabularies substantially larger than those typically developed for theoretical or demonstration purposes, and it is often not practical, and certainly never desirable, to generate these by hand. The evaluation of the LDOCE grammar coding system suggests that it is sufficiently detailed and accurate (for verbs) to make the on-line production of the syntactic component of lexical entries both viable and labour-saving. However, the success rate of the programs described above in producing useful lexical entries for a parsing system depends directly on the accuracy of the code assignments in the source dictionary. Correcting the mistakes and omissions in these assignments would be a non-trivial exercise. This is part of the motivation for adopting an interactive, rather than batch-mode, approach to using the tape for lexicon development. We envisage eventually using the system to generate lexical entries in a semi-automatic fashion, allowing the user to intervene and correct errors during the actual process of constructing lexical entries, so that a reliable, relatively error-free large lexicon containing detailed grammatical information can gradually be constructed from LDOCE for automated natural language processing systems.</Paragraph> <Paragraph position="1"> Clearly, there is much more work to be done with LDOCE in extending the use of grammar codes and improving the word sense classification system.
Similarly, there is a considerable amount of information in LDOCE which we have not yet exploited systematically; for example, the box codes, which contain selection restrictions for verbs, or the subject codes, which classify word senses according to the Merriam-Webster codes for subject matter (see Walker and Amsler (1983) for a suggested use for these).</Paragraph> <Paragraph position="2"> The large amount of semi-formalised information concerning the interpretation of noun compounds and idioms also represents a rich and potentially very useful source of information for natural language processing systems. In particular, we intend to investigate the automatic generation of phrasal analysis rules from the information on idiomatic word usage.</Paragraph> <Paragraph position="3"> In the longer term, it is clear that neither the contents nor the form of any existing published dictionary meets all the requirements of a natural language processing system. A substantial component of the research reported above has been devoted to restructuring LDOCE to make it more suitable for automatic analysis. However, even after this process, much of the information in LDOCE remains difficult to access, essentially because it is aimed at a human reader, as opposed to a computer system. This suggests that the automatic construction of dictionaries from published sources intended for other purposes will have a limited life unless lexicography is heavily influenced by the requirements of automated natural language analysis. In the longer term, therefore, the automatic construction of dictionaries for natural language processing systems may need to be based on techniques for the automatic analysis of large corpora (e.g. Leech et al., 1983). However, in the short term, the approach outlined in this paper will allow us to produce a relatively sophisticated and useful dictionary rapidly.</Paragraph> </Section> </Paper>