<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2100">
  <Title>Good Bigrams</Title>
  <Section position="8" start_page="596" end_page="596" type="concl">
    <SectionTitle>
8 Conclusion
</SectionTitle>
    <Paragraph position="0"> The question is &quot;what is gained by using a measure?&quot; Mutual information tends to find combinations of words that are highly co-ordinated with each other, but the resulting bigrams include both interesting bigrams (e.g. &quot;cheshire cat&quot;) and conventional bigrams that are uninteresting as keywords (e.g. &quot;in a&quot;). The stability of interesting bigrams is improved by requiring candidate bigrams to occur more than a fixed number of times.</Paragraph>
    <Paragraph position="1"> In this paper it has been shown that genre matters, and can be used to extract items that differ between genres. Instead of balancing one big corpus, the analysis of one corpus might benefit from finding out how it is different from another corpus. The bigrams that were formed by using different genres as filters showed interesting characteristics.</Paragraph>
    <Paragraph position="2"> However, if we are to deal with larger amounts of data it might be unrealistic to compare differences directly between two large genres without the exclusion of terms that occur by chance.</Paragraph>
    <Paragraph position="3"> The method that could be recommended from the results presented in this study is to triangulate a sample by its difference from other genres that we have some meta-knowledge about (i.e. we know that Western Fiction and Scientific Writing, at least on the surface, have little vocabulary in common).</Paragraph>
  </Section>
</Paper>