<?xml version="1.0" standalone="yes"?> <Paper uid="N04-4020"> <Title>Augmenting the kappa statistic to determine interannotator reliability for multiply labeled data points</Title> <Section position="4" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Project Description </SectionTitle> <Paragraph position="0"> This inquiry into interannotator reliability measurements was prompted by problems encountered during a project on classifying and summarizing email messages. In this project, each email message is classified into one of ten classes. This classification facilitates email thread reconstruction as well as summarization: distinct email categories have distinct structural and linguistic elements and thus ought to be summarized differently. For the casual email user, who receives a dozen or so messages a day, summarization and automated classification may be superfluous luxuries. For those who receive hundreds of important emails per day, however, automatic summarization and categorization can provide an efficient and convenient way both to scan new messages (e.g., if the sender responds to a question, the category will be &quot;answer&quot;, while the summary will contain the response) and to retrieve old ones (e.g., &quot;Display all scheduling emails received last week&quot;). While the project intends to apply machine learning techniques to both facets, this paper focuses on the categorization component.</Paragraph> </Section> </Paper>