<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1026">
<Title>Linguistic Profiling for Author Recognition and Verification</Title>
<Section position="9" start_page="0" end_page="0" type="concl">
<SectionTitle> 8 Conclusion </SectionTitle>
<Paragraph position="0"> Linguistic profiling has certainly shown its worth for authorship recognition and verification. At the best settings found so far, a profiling system using a combination of lexical and syntactic features is able to select the correct author for 97% of the texts in the test corpus. In the verification task, it rejects none of the texts that should be accepted (a false reject rate of 0%), while accepting only 8.1% of the texts that should be rejected (a false accept rate of 8.1%). Using additional knowledge about the test corpus, these figures can be improved to 100% correct author selection and a false accept rate of 2.4%.</Paragraph>
<Paragraph position="1"> The next step in the investigation of linguistic profiling for this task should be a more exhaustive charting of the parameter space and, especially, the search for an automatic parameter selection procedure. Another avenue for future research is the inclusion of even more types of features. Here, however, it would be useful to define an even harder verification task, as the current system already scores very high and further improvements might be hard to measure.</Paragraph>
<Paragraph position="2"> With the current corpus, the task might be made harder by limiting the size of the test texts.</Paragraph>
<Paragraph position="3"> Other corpora might also provide more challenging data, although it must be said that the current test corpus was already designed specifically for this purpose. Using further corpora will also help with charting the parameter space, as they will show the similarities and differences in behaviour between data sets. Finally, with the right types of corpora, the value of the technique in actual application scenarios could be investigated. So there are several possible routes to further improvement.
Still, the system is already good enough to be applied as is.</Paragraph>
<Paragraph position="4"> This is certainly true for authorship recognition and verification, as we hope to show by our participation in Patrick Juola's Ad-hoc Authorship Attribution Contest (to be presented at ALLC/ACH 2004), and for language verification (cf. van Halteren and Oostdijk, 2004). It may also hold for other text classification tasks, such as language or language variety recognition, genre recognition, or document classification for IR purposes.</Paragraph>
</Section>
</Paper>