<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1004"> <Title>Lexicons for Human Language Technology</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> This paper will give an overview of current LDC efforts to develop lexical resources and describe some efforts now in the planning stage. Readers are invited to join an on-going discussion of priorities, methods and even formats for our present and future efforts in this area.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 1.1. Intellectual Property Rights </SectionTitle> <Paragraph position="0"> Since lexicons, unlike text and speech databases, are likely to be incorporated (perhaps in derived form) in commercial HLT products, intellectual property rights come to center stage. The LDC's charter as a consortium requires us to leverage the U.S. government's investment by sharing the cost of resource development among our members. This forces us to limit usage of such resources to consortium members, or others who have paid an appropriate fee. However, we also want to encourage rapid development and broad exploitation of commercial HLT technology. Therefore, we need to protect our members' investment in research based on LDC resources by ensuring their rights to future commercial exploitation without additional license negotiations, royalty payments, or other intellectual property issues.</Paragraph> <Paragraph position="1"> This contrasts with the general practice for research use of machine-readable dictionaries, in which all rights to derived works are typically reserved to the publisher. 
For this reason, our lexicons will not be derived from existing lexicons, except as permitted by normal provisions of copyright law, or where we are able to purchase appropriate rights from the owner of the existing resource.</Paragraph> <Paragraph position="2"> This also contrasts with our practice with respect to text databases, where we have negotiated agreements to distribute for research purposes many bodies of text whose copyright remains with the original owner. The difference here is that the text corpora themselves will not typically be incorporated in future products, and our understanding of the applicable law (which we openly explain to information providers) is that language models trained on such text are free of any IPR taint.</Paragraph> <Paragraph position="3"> We have worked hard, in consultation with our members, to develop an appropriate license for LDC lexicons. A copy of the draft license agreement for COMLEX Syntax will be furnished on request to the author, or to ldc@unagi.cis.upenn.edu.</Paragraph> </Section> </Section> </Paper>