<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0901">
  <Title>Comparing Corpora using Frequency Profiling</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Methodology
</SectionTitle>
    <Paragraph position="0"> The method is fairly simple and straightforward to apply. Given two corpora that we wish to compare, we produce a frequency list for each corpus. Normally this would be a word frequency list, but, as described above and as illustrated by the examples in the following application section, it can also be a part-of-speech (POS) or semantic tag frequency list. For now, however, let us assume that we are performing a comparison at the word level. For each word in the two frequency lists, we calculate the log-likelihood (henceforth LL) statistic. This is done by constructing a contingency table as in Table 1.</Paragraph>
    <Paragraph position="1"> The application of this technique to POS or semantic tag frequency lists is achieved by constructing the contingency table with tag frequencies rather than word frequencies.</Paragraph>
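The per-word LL computation described above can be sketched in Python as follows. This is a minimal, hypothetical sketch: Table 1 is not reproduced in this section, so it assumes the common two-cell formulation in which a and b are the frequencies of the word in corpus one and corpus two, c and d are the total numbers of words in each corpus, the expected values are E1 = c(a + b)/(c + d) and E2 = d(a + b)/(c + d), and LL = 2(a ln(a/E1) + b ln(b/E2)). The function names log_likelihood and compare_frequency_lists are illustrative only.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood (LL) for one item (hypothetical helper).

    Assumes the usual two-cell formulation:
    a, b: frequency of the item in corpus 1 and corpus 2
    c, d: total number of tokens in corpus 1 and corpus 2
    """
    # Expected frequencies if the item occurred at the same rate in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # A zero observed count contributes nothing (0 * ln 0 is taken as 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def compare_frequency_lists(freq1: Counter, freq2: Counter) -> list[tuple[str, float]]:
    """Score every item found in either frequency list and rank by LL, highest first."""
    c = sum(freq1.values())  # size of corpus 1
    d = sum(freq2.values())  # size of corpus 2
    scores = [
        (item, log_likelihood(freq1[item], freq2[item], c, d))
        for item in set(freq1) | set(freq2)
    ]
    # Sort by descending LL; break ties alphabetically for a stable order.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores


if __name__ == "__main__":
    corpus1 = "the cat sat on the mat the end".split()
    corpus2 = "a dog sat on a log".split()
    for word, ll in compare_frequency_lists(Counter(corpus1), Counter(corpus2)):
        print(f"{word}\t{ll:.3f}")
```

Because the sketch only ever sees counts, the same code applies unchanged to POS or semantic tag frequency lists: pass Counters of tags instead of Counters of words. Ranking by LL puts first the items whose frequency difference between the two corpora is most significant.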
  </Section>
</Paper>