<?xml version="1.0" standalone="yes"?> <Paper uid="P97-1005"> <Title>Automatic Detection of Text Genre</Title> <Section position="6" start_page="35" end_page="37" type="concl"> <SectionTitle> 5 Discussion </SectionTitle> <Paragraph position="0"> The experiments indicate that categorization decisions can be made with reasonable accuracy on the basis of surface cues. All of the facet level assignments are significantly better than a baseline of always choosing the most frequent level (Table 1), and the performance appears even better when one considers that the machines do not actually know what the most frequent level is.</Paragraph> <Paragraph position="1"> When one takes a closer look at the performance of the component machines, it is clear that some facet levels are detected better than others. Table 2 shows that within the facet GENRE, our systems do a particularly good job on REPORTAGE and FICTION, trend correctly but not necessarily significantly for SCITECH and NONFICTION, but perform less well for EDITORIAL and LEGAL texts. We suspect that the indifferent performance in SCITECH and LEGAL texts may simply reflect the fact that these genre levels are fairly infrequent in the Brown corpus and hence in our training set. Table 3 sheds some light on the other cases. The lower performance on the EDITORIAL and NONFICTION tests stems mostly from misclassifying many NONFICTION texts as EDITORIAL.</Paragraph> <Paragraph position="3"> Such confusion suggests that these genre types are closely related to each other, as in fact they are. 
Editorials might best be treated in future experiments as a subtype of NONFICTION, perhaps distinguished by separate facets such as OPINION and INSTITUTIONAL AUTHORSHIP.</Paragraph> <Paragraph position="4"> Although Table 1 shows that our methods predict BROW at above-baseline levels, further analysis (Table 2) indicates that most of this performance comes from accuracy in deciding whether or not a text is HIGH BROW. The other levels are identified at near-baseline performance. This suggests problems with the labeling of the BROW feature in the training data. In particular, we had labeled journalistic texts on the basis of the overall brow of the host publication, a simplification that ignores variation among authors and the practice of printing features from other publications. We plan to improve those labelings in future experiments by classifying brow on an article-by-article basis.</Paragraph> <Paragraph position="5"> The experiments suggest that there is only a small difference between surface and structural cues. Comparing LR with surface cues and LR with structural cues as input, we find that they yield about the same performance: averages of 77.0% (surface) vs. 77.5% (structural) for all variables and 78.4% (surface) vs. 78.9% (structural) for selected variables. Looking at the independent binary decisions on a task-by-task basis, surface cues are worse in 10 cases and better in 8 cases. Such a result is expected if we assume that either cue representation is equally likely to do better than the other (assuming a binomial model, the probability of getting this or a more extreme result is Σ_{i=0}^{8} b(i; 18, 0.5) = 0.41). We conclude that there is at best a marginal advantage to using structural cues, an advantage that will not justify the additional computational cost in most cases.</Paragraph> <Paragraph position="6"> Note (Table 1). Numbers are the percentage of the evaluation subcorpus (N = 97) which was correctly assigned to the appropriate facet level; the Baseline column tells what percentage would be correct if the machine always guessed the most frequent level. LR is Logistic Regression, over our surface cues (Surf.) or Karlgren and Cutting's structural cues (Struct.); 2LP and 3LP are 2- or 3-layer perceptrons using our surface cues. Under each experiment, All tells the results when all cues are used, and Sel. tells the results when for each level one selects the most discriminating cues. A dash indicates that an experiment was not run.</Paragraph> <Section position="1" start_page="36" end_page="37" type="sub_section"> <SectionTitle> Levels </SectionTitle> <Paragraph position="0"> Note (Table 2). Numbers are the percentage of the evaluation subcorpus (N = 97) which was correctly classified on a binary discrimination task. The Baseline column tells what percentage would be correct if the machine always guessed No for each level. Headers have the same meaning as in Table 1.</Paragraph> <Paragraph position="1"> * means significantly better than Baseline at p < .05, using a binomial distribution (N = 97, p as per first column).</Paragraph> <Paragraph position="3"> Note (Table 3). Numbers are the percentage of the texts actually belonging to the GENRE level indicated in the first column that were classified as belonging to each of the GENRE levels indicated in the column headers. Thus the diagonals are correct guesses, and each row would sum to 100%, but for rounding error.</Paragraph> <Paragraph position="5"> Our goal in this paper has been to prepare the ground for using genre in a wide variety of areas in natural language processing. The main remaining technical challenge is to find an effective strategy for variable selection in order to avoid overfitting during training. 
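The binomial sign test reported above (surface cues worse in 10 of 18 binary tasks, better in 8) can be checked with a short script. This is an illustrative sketch added for this edition, not part of the original paper; it simply evaluates the sum Σ_{i=0}^{8} b(i; 18, 0.5) under the null hypothesis that either cue representation is equally likely to win each task.

```python
from math import comb

# Sign test for the surface-vs-structural cue comparison: over 18
# binary facet-level tasks, surface cues won 8 and lost 10.  Under
# the null hypothesis p = 0.5 (either cue set equally likely to win
# a task), compute the probability of 8 or fewer wins out of 18.
n, k, p = 18, 8, 0.5
prob = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
print(f"{prob:.2f}")  # prints 0.41, matching the value in the text
```

As the text concludes, a tail probability of 0.41 is nowhere near significance, so the two cue representations are statistically indistinguishable on this comparison.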
The fact that the neural networks have a higher performance on average and a much higher performance for some discriminations (though at the price of higher variability of performance) indicates that overfitting and variable interactions are important problems to tackle.</Paragraph> <Paragraph position="6"> On the theoretical side, we have developed a taxonomy of genres and facets. Genres are considered to be generally reducible to bundles of facets, though sometimes with some irreducible atomic residue.</Paragraph> <Paragraph position="7"> This way of looking at the problem allows us to define the relationships between different genres instead of regarding them as atomic entities. We also have a framework for accommodating new genres as yet unseen bundles of facets. Finally, by decomposing genres into facets, we can concentrate on whatever generic aspect is important in a particular application (e.g., narrativity for one looking for accounts of the storming of the Bastille).</Paragraph> <Paragraph position="8"> Further practical tests of our theory will come in applications of genre classification to tagging, summarization, and other tasks in computational linguistics. We are particularly interested in applications to information retrieval, where users are often looking for texts with particular, quite narrow generic properties: authoritatively written documents, opinion pieces, scientific articles, and so on.</Paragraph> <Paragraph position="9"> Sorting search results according to genre will gain importance as the typical database becomes increasingly heterogeneous. We hope to show that the usefulness of retrieval tools can be dramatically improved if genre is one of the selection criteria that users can exploit.</Paragraph> </Section> </Section> </Paper>