<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2165">
  <Title>CATCHING THE CHESHIRE CAT</Title>
  <Section position="3" start_page="0" end_page="1021" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> In Alice's Adventures in Wonderland by Lewis Carrel many of Alice's friends have names that consists of two words, for example: the March Hare, the Mock Turtle, and the Cheshire Cat.</Paragraph>
    <Paragraph position="1"> '\['he individual words in these combinations, if we ignore capitalisation, might be quite common. null Individual words usually mean different things when they am free. l:or example, in &amp;quot;The March against Apartheid&amp;quot;, and &amp;quot;The March I tare&amp;quot;, &amp;quot;march&amp;quot; means totally different things. There is obviously a strong link between &amp;quot;the&amp;quot; and &amp;quot;march&amp;quot;, but the link between &amp;quot;march&amp;quot; and &amp;quot;hare&amp;quot; is definitely stronger, at least in Lt;wis Carrol's text.</Paragraph>
    <Paragraph position="2"> The goal of this paper is to propose a statistic that measures the strength ol7 such glue between words in a sampled text. Finding tile names {)17 Alice's friends can be done by searching for two adjacent words with initial capit~d letters.</Paragraph>
    <Paragraph position="3"> ()no use of statistical associations could he to find translatable concepts and phrases, that might be expressed with a different number of words in another language. Another possibly interesting use of statistical associations is to predict whether words constitute new or given information in speech. It has been proposed (e.g. Horne&amp; Johansson, 1993) that the stress of words in speech is highly dependent on the informational content of the word. Also, statistical associations are not incompatible with the first stages of the &amp;quot;hypothesis space&amp;quot; proposed by Processability Theory (personal communication with Manfred Pienemann of Sydney University, see also Meisel &amp; al., 1981).</Paragraph>
    <Paragraph position="4"> There are different methods of calculating statistical associations. Yang &amp; Chute (1992) showed that a linear least square mapping of natural language to canonical terms is both feasible, and a way of detecting synonyms. Their method does not seem to detect dependencies in the order of words however. To do this we need a measure that is sensitive to the order between words. In this paper we will use a variant of mutual infi)rmation that derives from Shannon's theory of information. (as discussed in e.g., Salton &amp; McGill, 1983) Definitions and assumptions The definition of a word in a meaninglul way is \[:ar from easy, but a working definition, for technical purposes, is to assume that a word equals a string of letters. These 'words' are separated by non-letters. The case of letters is ignored, i.e. converted into lower case. For example: &amp;quot;there's&amp;quot; are two 'words': &amp;quot;there&amp;quot; and ~IS&amp;quot;.</Paragraph>
    <Paragraph position="5"> A collocation consists of a word and the word that immediate@ follows. Index I will refer to the first word and 2 to the second word.</Paragraph>
    <Paragraph position="6"> Index 12 will refer to word 1 followed by word2, and similarly for 2 I.</Paragraph>
    <Paragraph position="7"> Another assumption is that natural language is morn predictive in the (left-to-right) temporal order, than in tile reversed order. This is motiwtted by the simple obserwttion that speech comes into the system through the ears serially.</Paragraph>
    <Paragraph position="8"> For example: consider the French phrase &amp;quot;un ben viu hlanc&amp;quot; (Lit. &amp;quot;a good wine white&amp;quot;). &amp;quot;Ben&amp;quot; can (relatively often) be followed by &amp;quot;vin&amp;quot;, but usually not &amp;quot;vin&amp;quot; by &amp;quot;ben&amp;quot;. The same kind of link exists between &amp;quot;vin&amp;quot; and &amp;quot;bhmc&amp;quot;, but not between &amp;quot;blanc&amp;quot; and &amp;quot;vin&amp;quot;. This linking affects the intonation of French phrases, and also that intonation supports these kinds of links. Note, that this is not an explana-.</Paragraph>
    <Paragraph position="9"> tion of either intonation or syntax: we mosl likely have to consider massive interaction be-.</Paragraph>
    <Paragraph position="10"> tween different modalities of language.</Paragraph>
    <Paragraph position="11">  Deriving the measure The mutual information ratio, g, provides a rough estimation on the glue between words. It measures, roughly, how much more common a collocation is in a text than can be accounted for by chance. This measure does not assume any ordering between the words making up a collocation, in the sense that the g-measure of \[wl...w2\] and \[w2...wl\] are calculated as if they were unrelated collocations.</Paragraph>
    <Paragraph position="12"> The mutual information ratio (in Steier &amp; Belew, 1991) is expressed: , Formula 1: The mutual information ratio where 'p' defines the probability function, p(\[wl...w2\]) is read as &amp;quot;the probability of finding word w2 after word wl&amp;quot;.</Paragraph>
    <Paragraph position="13"> Adjusting for order between words We have experimented with the difference in mutual information, ag, between the two different orderings of two words making up a collocation. The results indicate that zxg captures some of the local constraints in a sampled text.</Paragraph>
    <Paragraph position="15"> where F(\[wx...Wy\]) denotes the frequency of which Wx and Wy co-occur in the sample.</Paragraph>
    <Paragraph position="16"> F(wx) is the frequency of word Wx. Note that the size of the sample cancels in this equation.</Paragraph>
    <Paragraph position="17"> Note also that this measure is not sensitive to the individual probabilities of the words.</Paragraph>
    <Paragraph position="18"> A problem is when them is no F(\[w2...wl\]).</Paragraph>
    <Paragraph position="19"> In these cases, we have chosen to arbitrarily set F(\[w2...Wl\]) to 0.1, with the justification that if the sample was ten times larger we might have found at least one such pair.</Paragraph>
  </Section>
class="xml-element"></Paper>