<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0719">
  <Title>Lexical Discovery with an Enriched Semantic Network</Title>
  <Section position="4" start_page="137" end_page="139" type="metho">
    <SectionTitle>
3 Lexical FreeNet
</SectionTitle>
    <Paragraph position="0"> Lexical FreeNet is an instance of FreeNet supporting a range of lexical semantic applications. It achieves this by mixing statistically-derived and knowledgederived relations.</Paragraph>
    <Paragraph position="1">  The tokens in Lexical FreeNet are the words that appear in at least one of the program's various data sources. This includes over 130,000 words from the CMU Pronouncing Dictionary vl.6d (CMU, 1997), 160,000 words and multiple-word phrases from WordNet 1.6, and 60,000 words from the broadcast news transcripts used to train the trigger relation. The intersection between these three sources is significant, of course, and in total there are slightly under 200,000 distinct tokens, including phrases.</Paragraph>
    <Paragraph position="3"> number of word pairs that exist in both relations.</Paragraph>
    <Paragraph position="4"> One of the 5 pairs counted in the cell at (ANT, C0M), for example, is (DAY, ~IGh'T).</Paragraph>
    <Section position="1" start_page="137" end_page="138" type="sub_section">
      <SectionTitle>
Relations
</SectionTitle>
      <Paragraph position="0"> Lexical FreeNet includes seven semantic relations, two phonetic relations, and one orthographic relation. These relations connect the token set with about seven million links, costing 30 MB of disk space. A summary of the relations is shown in Figure 1. Below we use a bidirectional arrow (.,t--&gt;) to indicate a symmetric relation, and a unidirectional arrow (==:,) to indicate an assymetric relation.</Paragraph>
      <Paragraph position="1"> &amp;quot;Synonym of&amp;quot; (~) This relation is computed by taking, for each synonym set (or synset) in all WordNet 1.6 word categories, the cross-product of the synonym set with itself, excluding reflexive links (self-loops). That is to say, we include all pairs of lexemes in each synset except the links from a lexeme to itself. Thus we mix different lexeme senses into the same soup, conflating, for example, the noun and verb senses of BIKE in bike ~ bicycle and bike ~=~ pedal.</Paragraph>
      <Paragraph position="3"> Trigger pairs are ordered word pairs that co-occur significantly in data; that is, they are pairs that appear near each other in text more frequently than  would be expected if the words were unrelated. Given a large corpus of text data, we built the assymetric trigger relation by finding the pairs in the cross-product of the vocabulary that have the highest average mutual information, as in (Rosenfeld, 1994; Beeferman et al., 1997). Mutual information is one measure of whether an observed co-occurrence of two vocabulary words is not due to chance. Word pairs with high mutual information are likely to be semantically related in some way.</Paragraph>
      <Paragraph position="4"> We chose 160 million words of Broadcast News data (LDC, 1997) for this computation, and defined co-occurrence as &amp;quot;occurring within 500 words&amp;quot;, approximately the average document length. We selected the top 350,000 trigger pairs from the ranking to use in the relation, putting the size of the relation on par with the synonym relation. 1 Some of the top trigger pairs discovered by this procedure are shown in Table 2. In our implementation we limit the number of trigger links emanating from a token to the top 50, and prune away links that include any member of a handcoded stopword set that includes function words.</Paragraph>
      <Paragraph position="5">  by mutual information, in the Lexical FreeNet trigger relation, and the 500th through 505th-ranked pairs. The highest-ranked pairs tend to be distanceone bigram phrases, while the remainder co-occur at greater distances.</Paragraph>
      <Paragraph position="6"> &amp;quot;Specializes&amp;quot; (~:~) and &amp;quot;Generalizes&amp;quot; (~:~g) The specialization relation captures the lexical inheritance system underlying WordNet nouns (Miller, 1990) and verbs (Fellbaum, 1990). It is computed by taking, for each pair of WordNet synsets that appear as parent and child in the WordNet hyponym trees, the cross-product of the pair. For example, shoe ~ footrest.</Paragraph>
      <Paragraph position="7"> The generalization relation is simply the inverse of specialization relation, or SPC-. For example: tree ~ cypress.</Paragraph>
      <Paragraph position="8"> I We used the Trigger Toolkit, available at http ://v~. cs. cmu. edu/ aberger/softeare, h~l, for this computation</Paragraph>
      <Paragraph position="10"> &amp;quot;Part of&amp;quot; (:~) and &amp;quot;Comprises&amp;quot; (~,) PAR The ==C/, relation captures meronomy, another inheritance system which can informally be thought of as a &amp;quot;part of&amp;quot; tree over nouns. It is computed by taking, for each pair of WordNet synsets that are related in WordNet by the meronym relation, the cross-product of the pair. For example, shoe =~g footwear. The &amp;quot;comprises&amp;quot; relation is simply its COb! inverse, PAR-, as in tree ==~ cypress.</Paragraph>
      <Paragraph position="11"> &amp;quot;Antonym of&amp;quot; (~=~) The antonym relation uses the antonym relation defined in WordNet for nouns, verbs, adjectives, and adverbs. It is computed by taking, for each pair of WordNet synsets that are related in WordNet by the antonym relation, the cross-product of the pair. For example, clear ~ opaque.</Paragraph>
      <Paragraph position="12"> &amp;quot;Phonetically similar to&amp;quot; (qs~) and &amp;quot;Rhymes with&amp;quot; (a,_~.) To allow users to cross the dimensions of sound and meaning in their queries, two phonetic relations are added to the mix in Lexical FreeNet. These relations, while amusing for shortest path queries, are not expected to contribute to the text processing applications discussed later in this paper. Both relations leverage the phonetic and lexical stress transcriptions in the CMU Pronouncing Dictionary.</Paragraph>
      <Paragraph position="13"> The ~ relation is computed by adding every pair of words in the vocabulary that have pronunciations which differ in edit distance by at most some number of edits. Edit distance is computed using a dynamic programming algorithm as the minimum number of substitutions, insertions, and deletions (unweighted, and blind to nearness in substitution) to the first word's phonetic sequence required to reach the second word's phonetic sequence. In our current implementation we limit the relation to pairs with edit distance at most 1, e.g. cancel candle.</Paragraph>
      <Paragraph position="14"> The ~:~ relation is computed by adding each pair of words that have pronunciations such that their phonetic suffixes including and following the primary an,( stressed syllables match, e.g. Reno ~ Casino.</Paragraph>
      <Paragraph position="15"> &amp;quot;Anagram of&amp;quot; (~:~:~) AN The final relation, ~:~, is almost, but not quite, completely useless, symmetrically linking lexemes that use the same distribution of letters, as in ANA Geraldine C/=~ realigned. This is perhaps best described as a &amp;quot;wormhole&amp;quot; in lexical space.</Paragraph>
    </Section>
    <Section position="2" start_page="138" end_page="139" type="sub_section">
      <SectionTitle>
Extensions
</SectionTitle>
      <Paragraph position="0"> A portion of the wealth of WordNet was discarded in Lexical FreeNet--the verb entailment relation, for instance. Adjectives are somewhat slighted by the system, as their WordNet description in terms of bipolar attributes (Gross and Miller, 1990) is largely ignored.</Paragraph>
      <Paragraph position="1"> Other possible semantic relations include the more specialized knowledge-engineered links that appear  in typically narrow-coverage semantic nets, such as &amp;quot;acts on&amp;quot;, &amp;quot;uses&amp;quot;, &amp;quot;stronger than&amp;quot;, and the like. Data-driven approaches to relation induction that dig deeper than the collocation extraction of the trigger computation may prove useful and interesting. One approach (Richardson, 1997; Richardson et al., 1993) bootstraps a parser to induce many unconventional semantic relations from dictionary data. A link grammar (Sleator and Temperley, 1991) applied to data can conceivably be used to extract some interesting relations that live at the syntax/semantics interface.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="139" end_page="139" type="metho">
    <SectionTitle>
4 Lexical discovery
</SectionTitle>
    <Paragraph position="0"> A World Wide Web interface to Lexical FreeNet, depicted in Figure 3, is available and has become a popular online resource since its release in late January, 1998.:. The program allows the user to issue one of the four template queries to the database do scribed in Section 2.3. One of these query templates (&amp;quot;Fanout&amp;quot;) requires only a single source token as input, and this has become a popular lookup tool, providing some of the functionality of a thesaurus and rhyming dictionary. The other query functions require source and target tokens to be specified. Each token can itself contain spaces in the case of phrasal inputs, which are normalized to the underscore character in processing. The four basic queries allow the user to specify a subset of the ten primitives relations to permit in the output paths by clicking a series of checkboxes. Upon submission, the state of the checkboxes sets the ANY relation to be the union of checked relations.</Paragraph>
    <Paragraph position="1"> An additional &amp;quot;Spell check&amp;quot; query mode allows the user to find database tokens that have similar (or exact) spelling to a given input token, where similarity is measured by an orthographic edit distance.</Paragraph>
    <Paragraph position="2"> Upon submission, the system finds and displays the path or paths resulting from the query with arrow glyphs representing the various relations.</Paragraph>
    <Paragraph position="3"> Queries typically finish within an acceptable time window of three to ten seconds. The results screen summarizes the query and allows the user to resubmit it with modifications, improving the ease of database &amp;quot;navigation&amp;quot; over having to return to the title screen.</Paragraph>
    <Paragraph position="4"> Feedback from the Web site indicates that the system has been used as an aid in writing poetry and lyrics; devising product names; generating puzzles for elementary school language arts classes; writing greeting cards; devising insults and compliments; and, above all, just exploring. Following are selected examples of the system's output in various configurations. null Shortest path queries The shortest path query is the primary vehicle for establishing connections between words and concepts:</Paragraph>
    <Section position="1" start_page="139" end_page="139" type="sub_section">
      <SectionTitle>
Lexical FreeNet
</SectionTitle>
      <Paragraph position="0"> and quips involving the two endpoint concepts.</Paragraph>
      <Paragraph position="1"> For example, below is the shortest path between Clinton and Lewinsky using all relations:</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="139" end_page="140" type="metho">
    <SectionTitle>
CLINTON =&gt; HOUSE =&gt; CABIN =&gt; KACZYNSKI &lt;=RHY=&gt; LEWINSKY
</SectionTitle>
    <Paragraph position="0"> * Shortest path queries allowing only the hyponomy relations can connect any two nouns in the WordNet hyponymy tree through their least common ancestor. For example, animals can be connected taxonomically, as in the shortest path between porto and langur using only the specialization (:~) and generalization (~:~) and relations:</Paragraph>
    <Paragraph position="2"> * Shortest path queries allowing only the meronomy relations can connect many noun pairs.</Paragraph>
    <Paragraph position="3"> For example, geographical connections can be made between place names to find the largest enclosing region, as in the shortest path between  mon words can be connected using only the synonym relation (~*::~). This demonstrates the high degree of polysemy exhibited by familiar words. Consider the shortest synonym path between one and zero. a computer scientist's favorite antonym pair. Every successive word pair exhibits a different sense:  ZERO ~ CIPHER ~=~ CALCULATE SYN DIRECT ~ LEAD ~ STAR &lt;~ ACE &lt;~ ONE  * Using only the trigger (~=~) relation, one can connect concepts that occur in the domain of the data used to train the trigger pairs, in this case broadcast news:</Paragraph>
  </Section>
  <Section position="7" start_page="140" end_page="140" type="metho">
    <SectionTitle>
SMOKING =TRG=&gt; CIGARETTES =TRG=&gt; MACHINES =TRG=&gt; COMPUTERS
</SectionTitle>
    <Paragraph position="0"> * The trigger relation enriches the WordNet-derived vocabulary of common nouns with topical proper names, as in the shortest paths shown below. Trigger pairs are often expressible in terms of a sequence of one or more WordNet-derived relations. In many cases, however, news-based triggers defy any fixed set of hand-coded lexical relations.</Paragraph>
  </Section>
  <Section position="8" start_page="140" end_page="140" type="metho">
    <SectionTitle>
TITANIC =TRG=&gt; SANK =TRG=&gt; SHIP =TRG=&gt; VALDEZ =TRG=&gt; COFFEE
NADER =TRG=&gt; REGULATIONS =TRG=&gt; ENVIRONMENTAL =TRG=&gt; GORE
FALWELL =TRG=&gt; CHRISTIAN =TRG=&gt; CONSERVATIVE =TRG=&gt; GINGRICH
</SectionTitle>
    <Paragraph position="0"> * But when the WordNet-derived semantic relations are permitted in addition to the trigger relation, shortest paths become shorter, overcoming the inherent limitations of the data-derived triggers. In the case below, the pair (relativity, physics) did not occur sufficiently often in training data for the pair to make the grade as a trigger.</Paragraph>
  </Section>
  <Section position="9" start_page="140" end_page="140" type="metho">
    <SectionTitle>
EINSTEIN =&gt; RELATIVITY =&gt; PHYSICS
VELOCITY =&gt; SPEED_OF_LIGHT
</SectionTitle>
    <Paragraph position="0"> * For amusement, the phonetic relations, rhymeswith (~=~) and sounds-like (,~:~), can be used alone to produce &amp;quot;word ladders&amp;quot; of sequentially similar words, as in the example below. In combination with the semantic relations, the phonetic relations can aid in creating rhymed poetry and puns.</Paragraph>
    <Paragraph position="1"> IFE NINE sPINE sPOON Intersection queries Intersection queries can be used in Lexical FreeNet to find the set of concepts and words that two inputs both directly relate to in some way. We use the notation (wl =~, w.~ =~)w3 to mean that &amp;quot;wl is related to w3 by relation rt, and w~. is related to w3  * Triggers can be a useful tool for discovering what two names in the news have in common, or two names in history:</Paragraph>
  </Section>
  <Section position="10" start_page="140" end_page="140" type="metho">
    <SectionTitle>
(STARR =TRG=&gt;, MCDOUGAL =TRG=&gt;) WHITEWATER
(CHURCHILL =TRG=&gt;, STALIN =TRG=&gt;)
</SectionTitle>
    <Paragraph position="0"> HITLER, ROOSEVELT, TRUMAN, POTSDAM * In some cases, identification questions can be formulated as intersection queries. For example, &amp;quot;What's the name of that congresswoman from Colorado I'm always hearing about?&amp;quot; can be asked as an intersection query with arguments (congressvoman, Colorado). &amp;quot;What's the capital of the state of Nebraska?&amp;quot; can be asked as an intersection query with arguments (Nebraska, state_capital):  The phonetic relations in Lexical FreeNet are particularly useful for finding rhyming words with certain target meanings. The coercion function on the Web interface is hardcoded such that the relation ret (see Section 2.3) is simply the union of all semantic relations, and re2 is the union of all phonetic relations. Thus, given two endpoint words (wt, w.,), the system tries to find words (w~, w'), with respectively related meanings, that rhyme or sound alike. For example, if you wanted to write a poem about petting a lion, you might do a coercion query with the words Couch and lion. Amongst a few others, you'll get back the suggestions (RUB, CUB), since TOUCH ~ RUB and LION ~ CUB; and (PAT, CAT), since TOUCH ~ PAT and LION ~ CAT. Most rhyme coercion queries to the online system have produced at least one result in this manner.</Paragraph>
  </Section>
class="xml-element"></Paper>