File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/97/w97-0124_concl.xml
Size: 2,025 bytes
Last Modified: 2025-10-06 13:57:52
<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0124"> <Title>Analysis of Unknown Lexical Items using Morphological and Syntactic Information with the TIMIT Corpus</Title> <Section position="9" start_page="269" end_page="269" type="concl"> <SectionTitle> 6 Conclusions </SectionTitle> <Paragraph position="0"> We have shown that morphological recognition, the distinction between closed-class and open-class words, and syntactic knowledge are powerful tools in handling unknown words, especially when we use a post-mortem method of determining the probable lexical classes of words. These knowledge sources allow us to determine parts of speech for unknown words without using domain-specific knowledge in the TIMIT corpus. The insertion rate can be drastically reduced with only a moderate increase in the deletion rate. Obviously, there is a trade-off between the deletion rate and the insertion rate. This trade-off can be manipulated by altering the morphological rules to place more importance on a low deletion rate or a low insertion rate, by modifying our post-mortem approach to obtain finer control over the process of handling unknown words, or by considering additional knowledge sources. This issue should be of interest to any researcher developing a parsing system that will need to deal with unknown words.</Paragraph> <Paragraph position="1"> Future work will investigate the effectiveness of the morphological recognizer. We would like to compare a computer-generated morphological recognition module with the hand-generated corpora. Finally, we would like to refine the post-mortem approach by offering a more elegant solution than the combination of first-choice and second-choice lists. There is information in the parse forest of failed parses that may allow single words to be identified as &quot;problem words&quot;.
This would allow the parser to reparse the sentence changing only a few words' definitions, providing better all-around performance.</Paragraph> </Section> </Paper>