File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/96/p96-1003_intro.xml
Size: 1,196 bytes
Last Modified: 2025-10-06 14:06:06
<?xml version="1.0" standalone="yes"?> <Paper uid="P96-1003"> <Title>Noun-Phrase Analysis in Unrestricted Text for Information Retrieval</Title> <Section position="3" start_page="0" end_page="17" type="intro"> <SectionTitle> 2 (Salton &amp; McGill, 1983) </SectionTitle> <Paragraph position="0"> 1. Ability to process large amounts of text. The amount of text in the databases accessed by modern IR systems is typically measured in gigabytes. This requires that the NLP used be extraordinarily efficient in both its time and space requirements. It would be impractical to use a parser that runs at only one or two sentences per second.</Paragraph> <Paragraph position="1"> 2. Ability to process unrestricted text. The text database for an IR task is generally unrestricted natural-language text, possibly encompassing many different domains and topics. A parser must be able to manage the many kinds of problems one sees in natural-language corpora, including the processing of unknown words, proper names, and unrecognized structures. Often more is required, as when spelling, transcription, or OCR errors occur. Thus, the NLP used must be especially robust.</Paragraph> </Section> </Paper>