<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2090"> <Title>Implementing a Characterization of Genre for Automatic Genre Identification of Web Pages</Title> <Section position="7" start_page="704" end_page="705" type="concl"> <SectionTitle> 5 Conclusions and Future Work </SectionTitle> <Paragraph position="0"> From a technical point of view, the inferential model presented in this paper is a simple starting point for reflection on a number of issues in the automatic identification of genres in web pages.</Paragraph> <Paragraph position="1"> Although the parameters need better tuning and the text type and genre palettes need to be enlarged, the inferential approach appears to be effective, as shown by the preliminary evaluation reported in Section 4.3.</Paragraph> <Paragraph position="2"> More importantly, this model instantiates a theoretical characterization of genre that includes hybridism and individualization, and interprets these two elements as the forces behind genre evolution. It is also worth noting that the inclusion of the attribute 'text types' in the tuple gives flexibility to the model. In fact, the model can assign not only a single genre label, as in previous approaches to genre, but also multiple labels or no label at all. Ideally, other computationally tractable attributes, for example register or layout analysis, can be added to the tuple to increase flexibility and provide a multi-faceted classification.</Paragraph> <Paragraph position="3"> However, other issues remain open. First, the possibility of a comprehensive evaluation of the model has yet to be explored. So far, only tentative evaluation schemes exist for multi-label classification (e.g. McCallum, 1999), and further research is needed.</Paragraph> <Paragraph position="4"> Second, in this model the detection of emerging genres can be done indirectly through the analysis of an unexpected combination of text types and/or genres. Other possibilities can be explored in the future. 
The objective evaluation of emerging genres also requires further research and discussion.</Paragraph> <Paragraph position="5"> More feasible in the short term is an investigation of the scalability of the model when additional web pages, whether or not classified by genre, are added to the web corpus. The possibility of replacing hand-crafted rules with some learning methodology can also be explored in the near future. Apart from the approach suggested by Segal and Kephart (2000) mentioned above, considerable experience with adaptive learning is now available (for example, the work reported at the EACL 2006 Workshop on Adaptive Text Extraction and Mining).</Paragraph> </Section> </Paper>