<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0124"> <Title>Analysis of Unknown Lexical Items using Morphological and Syntactic Information with the TIMIT Corpus</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> One of the problems facing natural language processing (NLP) systems is the appearance of unknown words: words that appear in sentences but are not contained in the system's lexicon. This problem will only get worse as NLP systems are used in more on-line computer applications. New words are continually added to the language, and people often use words that a parsing system does not expect.</Paragraph> <Paragraph position="1"> This paper empirically investigates how well a dictionary of closed-class words, syntactic parsing rules, and a morphological recognizer can parse sentences containing unknown words in natural language processing tasks. Syntactic knowledge can aid the analysis of unknown words: sentence structure can be a strong clue to the possible part of speech of an unknown word. The distinction between closed-class and open-class words should help to narrow the possibilities for an unknown word and enhance the information provided by the syntactic knowledge. Morphological recognition can also help predict possible parts of speech for many unknown words. We expect that these three knowledge sources will greatly improve our parser's ability to handle words that are not in the system lexicon.</Paragraph> </Section> </Paper>