<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2165">
  <Title>CATCHING THE CHESHIRE CAT</Title>
  <Section position="6" start_page="1021" end_page="1023" type="evalu">
    <SectionTitle>
RESULTS
</SectionTitle>
    <Paragraph position="0"> The collocations were ordered differently by the two measures. The μ-measure was sensitive to individual frequencies and favoured very low-frequency collocations. The Δμ-measure was sensitive to the ordering of the words and favoured high-frequency collocations that occurred in only one order. The quality of the two measures can be seen by comparing their top ten and last ten collocations. Tables 1.1 and 2.1 refer to Δμ, and Tables 1.2 and 2.2 refer to μ. The N column gives the rank number of the collocation.</Paragraph>
    <Paragraph position="1"> Note that the frequencies of the individual words, F1 and F2, are not used to compute Δμ; they are provided only for comparison with the μ-measure.</Paragraph>
    <Paragraph position="2"> Note that the numerical values of the μ-measure and the Δμ-measure cannot be directly compared since they measure slightly different phenomena.</Paragraph>
    <Paragraph position="3">  Δμ gives a measure of local links between words. As can be seen from Table 1.1, Δμ captures local constraints: that prepositions are usually followed by a noun phrase, and that 'and' is usually used as a noun co-ordinator (indicated by the high value for 'and-&gt;the'). Mitjushin (1992) has proposed similar links on a higher syntactic level, using a rule-based approach. We have deliberately tried to avoid talking about word-classes since it is misleading at this level of analysis. However, we get many examples of good representatives for word-</Paragraph>
    <Paragraph position="5"> The flavour of the collocations that μ rates highly is different. As can be seen from Table 1.2, low individual frequencies result in a high μ-value, even if the collocation is unique. This gives an illusion of a semantic relation, which arises because low-frequency words are usually high in content. The μ-measure is useful when we are interested in the correlation between words within and between documents (Steier &amp; Belew, 1991). This notion could be expanded upon to incorporate correlation between any two words in general, and it seems to work well for the μ-measure (Wettler and Rapp, 1989).</Paragraph>
    <Paragraph position="6"> The last ten collocations. Δμ is sensitive to deviation from an expected ordering in the sample. The negative-valued link between these words makes a phrase boundary between the two words probable.</Paragraph>
    <Paragraph position="7">  The μ-measure, in contrast, gives some collocations that are intuitively unlikely phrases consisting of high-frequency words. In the case of &quot;the-&gt;the&quot; there exist 1641 pairs that speak against that pairing, but it is hard to explain this in terms of local syntactic constraints. The negative scores seem to capture possible typographic errors.</Paragraph>
    <Paragraph position="8">  μ-measure, because the individual frequencies of the particles are usually devastatingly high, and the frequency of the main verb in particle verb constructions is usually higher than average. The Δμ-measure is, in general, good at finding such combinations if the order between the two words is fixed (Table 3.1).</Paragraph>
    <Paragraph position="9">  But what about finding Alice's friends? Does the Δμ-measure find the phrases that the text is about (~ thematic phrases)? To test this we chose some of the names of Alice's friends (Table 3.2).</Paragraph>
    <Paragraph position="10"> We found that the rank number that Δμ delivers is higher than the rank number for the μ-measure for all the checked friends. This is due to the frequency effects discussed above.</Paragraph>
    <Section position="1" start_page="1023" end_page="1023" type="sub_section">
      <SectionTitle>
What is lost
</SectionTitle>
      <Paragraph position="0"> There are obviously good phrases that μ rates higher than Δμ. These usually consist of two words that are uncommon in the sample.</Paragraph>
      <Paragraph position="1"> Some idioms are of this kind. The Δμ-measure needs to find more examples of a collocation with the exact ordering of its constituents to rate the collocation highly (Table 3.3).</Paragraph>
      <Paragraph position="2">  We have also done some experiments with adding memory to the method. A 'memory' could, for example, extend 10 words after each word. All words following within a distance equal to the size of the memory were collected. Adding a memory allowed the model to detect shared information between words that were further apart (for example &quot;pack of cards&quot; or &quot;boots and shoes&quot;).</Paragraph>
      <Paragraph position="3"> The memory introduced false collocations, e.g., &quot;grammar-&gt;mouse&quot;. The context was: &quot;Alice thought this must be the right way of speaking to a mouse: she had never done such a thing before, but she remembered having seen in her brother's Latin Grammar, 'A mouse--of a mouse--to a mouse--a mouse--O mouse!'&quot; This context gave up to 5 collocations for &quot;grammar&quot; followed by &quot;mouse&quot;, and &quot;grammar-&gt;mouse&quot; was therefore rated very high.</Paragraph>
      <Paragraph position="4"> Otherwise, words that happened to be near a word without being statistically related to it were usually rated low. In the model with the 'memory', μ gave clearly better results than Δμ at finding related phrases.</Paragraph>
      <Paragraph position="5"> With the memory, Δμ ordered the pairs closer to the original raw-frequency ordering the more 'memory' was present. The experiment with the memory was useful because it showed that adding memory was not worth doing for Δμ, but likely worth doing for μ.</Paragraph>
    </Section>
  </Section>
</Paper>