File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/00/w00-0901_intro.xml
Size: 1,056 bytes
Last Modified: 2025-10-06 14:01:01
<?xml version="1.0" standalone="yes"?> <Paper uid="W00-0901"> <Title>Comparing Corpora using Frequency Profiling</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Methodology </SectionTitle> <Paragraph position="0"> The method is simple and straightforward to apply. Given two corpora we wish to compare, we produce a frequency list for each corpus. Normally, this would be a word frequency list, but as described above and as the examples in the following application section show, it can also be a part-of-speech (POS) or semantic tag frequency list. However, let us assume for now that we are performing a comparison at the word level. For each word in the two frequency lists we calculate the log-likelihood (henceforth LL) statistic. This is done by constructing a contingency table as in Table 1.</Paragraph> <Paragraph position="1"> 1 The application of this technique to POS or semantic tag frequency lists is achieved by constructing the contingency table with tag rather</Paragraph> </Section> </Paper>