<?xml version="1.0" standalone="yes"?> <Paper uid="H91-1068"> <Title>Analyzing Language in Restricted Domains: Sublanguage Description and Processing. Lawrence Erlbaum Assoc.,</Title> <Section position="7" start_page="349" end_page="349" type="concl"> <SectionTitle> CONCLUSIONS </SectionTitle> <Paragraph position="0"> In this paper we described experiments in the efficient processing of large collections of natural language documents, which could lead to an effective and reliable method for automated indexing of text in information retrieval applications.</Paragraph> <Paragraph position="1"> The documents are initially tagged with a stochastic tagger, and then parsed with the TTP parser, which generates an approximate regularized &quot;logical&quot; structure for each sentence. These structures are subsequently analyzed by various statistical processes that collect data about word frequencies, co-occurrences and similarities. The results obtained in deriving word pairs show a marked improvement in precision for capturing the &quot;correct&quot; word dependencies, as compared to more traditional methods in information retrieval that use only very limited parsing [7]. The computed similarity sets are quite interesting, and they produce meaningful classifications. These results could be improved further by collecting the statistical data from a larger amount of text. We believe that the improved precision in text indexing will translate into improved precision in document retrieval.</Paragraph> </Section> </Paper>