File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/90/c90-3063_concl.xml
Size: 1,803 bytes
Last Modified: 2025-10-06 13:56:34
<?xml version="1.0" standalone="yes"?> <Paper uid="C90-3063"> <Title>Automatic Processing of Large Corpora for the Resolution of Anaphor References</Title> <Section position="6" start_page="0" end_page="0" type="concl"> <SectionTitle> 5 Conclusions </SectionTitle> <Paragraph position="0"> We have suggested using cooccurrence patterns, automatically acquired from a large corpus, as an alternative to selectional constraints. The initial results indicate that even in its basic form, as presented here, the approach is useful for disambiguation, and often performs even better than the traditional model. This should be considered relative to the effort that would have been required to achieve such coverage and accuracy by manual acquisition of constraints, for the broad domain of parliament proceedings.</Paragraph> <Paragraph position="1"> Although the construction of the full-size database is not feasible for us, it is clearly feasible for a large-scale project. This is shown by a similar database that was implemented as part of the language model of the IBM speech recognition system. This database contains counters for occurrences of sequences of three words in large corpora (trigrams), which are much more numerous than our syntactic patterns.</Paragraph> <Paragraph position="2"> From a general perspective, this project promotes the use of a large corpus for linguistic research and applications. Processing such large corpora is a non-trivial engineering problem, the solution of which enables research to focus on complicated real-world sentences. Our research demonstrates how statistical methods can be built on top of more 'traditional' linguistic tools, achieving a better and more feasible environment for the resolution of ambiguities.</Paragraph> </Section> </Paper>