File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/06/w06-2805_abstr.xml
Size: 1,099 bytes
Last Modified: 2025-10-06 13:45:35
<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2805"> <Title>Learning to Recognize Blogs: A Preliminary Exploration</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> We present results of our experiments with the application of machine learning to binary blog classification, i.e., determining whether a given web page is a blog page. We have gathered a corpus in excess of half a million blog or blog-like pages and pre-classified them using a simple baseline. We investigate which algorithms attain the best results for our classification problem and experiment with resampling techniques, with the aim of utilising our large dataset to improve upon our baseline. We show that the application of off-the-shelf machine learning technology to perform binary blog classification offers substantial improvement over our baseline. Further gains can sometimes be achieved using resampling techniques, but these improvements are relatively small compared to the initial gain.</Paragraph> </Section> </Paper>