<?xml version="1.0" standalone="yes"?>
<Paper uid="N01-1024">
  <Title>Knowledge-Free Induction of Inflectional Morphologies</Title>
  <Section position="5" start_page="222" end_page="222" type="evalu">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> We compare this improved algorithm to our former algorithm (Schone and Jurafsky, 2000) as well as to Goldsmith's Linguistica (2000). We use as input to our system 6.7 million words of English newswire, 2.3 million of German, and 6.7 million of Dutch. Our gold standards are the hand-tagged, morphologically analyzed CELEX lexicons for each of these languages (Baayen et al., 1993). We apply the algorithms only to those words of our corpora with frequencies of 10 or more. Obviously this cut-off slightly limits the generality of our results, but it also greatly decreases processing time for all of the algorithms. We compare induced conflation sets to those of CELEX. To evaluate, we compute the number of correct (C), inserted (I), and deleted (D) words each algorithm predicts for each hypothesized conflation set. If X_w represents word w's conflation set according to an algorithm, and if Y_w represents its CELEX-based conflation set, then,</Paragraph>
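    <Paragraph> As a minimal sketch of this per-word scoring, the Python fragment below accumulates correct, inserted, and deleted counts over every word that has both an induced conflation set and a CELEX-based one. Dividing each count by the size of the CELEX-based set is an assumption made for illustration, as are the function name score_conflation_sets and the toy data; none of these names come from the systems compared here.

```python
from typing import Dict, Set, Tuple


def score_conflation_sets(
    induced: Dict[str, Set[str]],
    gold: Dict[str, Set[str]],
) -> Tuple[float, float, float]:
    """Return (correct, inserted, deleted) totals over words found in both maps.

    Assumption: each word contributes fractions of its gold set size, so a
    word with a large conflation set does not dominate the totals.
    """
    correct = inserted = deleted = 0.0
    for word in sorted(set(induced).intersection(gold)):
        x_w = induced[word]  # conflation set proposed by the algorithm
        y_w = gold[word]     # conflation set from the gold standard
        if not y_w:
            continue  # skip empty gold sets to avoid dividing by zero
        shared = x_w.intersection(y_w)
        correct += len(shared) / len(y_w)
        inserted += len(x_w.difference(shared)) / len(y_w)
        deleted += len(y_w.difference(shared)) / len(y_w)
    return correct, inserted, deleted


# Hypothetical toy data, not drawn from CELEX.
induced_sets = {
    "walk": {"walk", "walks", "walked", "wall"},
    "cat": {"cat"},
}
gold_sets = {
    "walk": {"walk", "walks", "walked", "walking"},
    "cat": {"cat", "cats"},
}
print(score_conflation_sets(induced_sets, gold_sets))  # (1.25, 0.25, 0.75)
```

In the toy data, "walk" recovers three of its four gold forms and proposes one spurious form, while "cat" recovers one of two, which yields the totals shown.
    </Paragraph>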
    <Paragraph position="2"> In making these computations, we disregard any CELEX words absent from our data set and vice versa. Most capitalized words are not in CELEX, so this process also discards them. Hence, we also build an augmented CELEX that incorporates capitalized forms.</Paragraph>
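    <Paragraph> The filtering step can be sketched as follows, assuming that both the induced analysis and the gold standard are stored as mappings from a word to its conflation set. The helper name restrict_to_shared_vocabulary and the toy entries are hypothetical, and the sketch does not attempt to show how the augmented CELEX is constructed.

```python
from typing import Dict, Set, Tuple

ConflationSets = Dict[str, Set[str]]


def restrict_to_shared_vocabulary(
    induced: ConflationSets,
    gold: ConflationSets,
) -> Tuple[ConflationSets, ConflationSets]:
    """Keep only words known to both sources, as keys and as set members."""
    shared = set(induced).intersection(gold)

    def prune(sets: ConflationSets) -> ConflationSets:
        # Drop unshared head words, then drop unshared members of each set.
        return {word: sets[word].intersection(shared) for word in shared}

    return prune(induced), prune(gold)


# Hypothetical toy data: "Paris" is missing from the gold standard and
# "walketh" is missing from the corpus, so both are disregarded.
induced_sets = {
    "walk": {"walk", "walks"},
    "walks": {"walk", "walks"},
    "Paris": {"Paris"},
}
gold_sets = {
    "walk": {"walk", "walks", "walketh"},
    "walks": {"walk", "walks", "walketh"},
    "walketh": {"walk", "walks", "walketh"},
}
pruned_induced, pruned_gold = restrict_to_shared_vocabulary(induced_sets, gold_sets)
print(sorted(pruned_induced))
```

Under this assumption a capitalized form such as "Paris" drops out of scoring whenever the gold standard lacks it, which is the effect that motivates the augmented lexicon.
    </Paragraph>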
    <Paragraph position="3"> Table 5 uses the above scoring mechanism to compare the F-scores (the product of precision and recall divided by the average of the two) of our system at a cutoff threshold of 85% to those of our earlier algorithm (&quot;S/J2000&quot;) at the same threshold, Goldsmith's Linguistica, and a baseline system that performs no analysis (claiming that any word's conflation set consists only of itself). The &quot;S&quot; and &quot;C&quot; columns respectively indicate the performance of the systems when scoring for suffixing and circumfixing (using the unaugmented CELEX). The &quot;A&quot; column shows circumfixing performance using the augmented CELEX. Space limitations required that we illustrate &quot;A&quot; scores for one language only, but performance in the other two languages is similarly degraded. Boxes are shaded out for algorithms not designed to produce circumfixes.</Paragraph>
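    <Paragraph> The F-score definition above can be illustrated with a short Python sketch. It assumes the conventional readings of precision as correct / (correct + inserted) and of recall as correct / (correct + deleted); those two definitions, the function name f_score, and the example counts are assumptions for illustration rather than details stated in this section.

```python
def f_score(correct: float, inserted: float, deleted: float) -> float:
    """Product of precision and recall divided by the average of the two."""
    if correct == 0:
        return 0.0  # no correct words: precision and recall are both zero
    # Assumed definitions: insertions hurt precision, deletions hurt recall.
    precision = correct / (correct + inserted)
    recall = correct / (correct + deleted)
    return (precision * recall) / ((precision + recall) / 2)


# Hypothetical counts: 3 correct, 1 inserted, 1 deleted.
print(f_score(3.0, 1.0, 1.0))  # 0.75
```

Dividing the product by the average is algebraically the same as the familiar harmonic-mean form, 2PR / (P + R), so with equal precision and recall the F-score equals both.
    </Paragraph>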
    <Paragraph position="4"> Note that each of our additions resulted in an overall improvement that held across all three languages. Furthermore, using ten-fold cross-validation on the English data, we find that the F-score differences in the S column are each statistically significant at the 95% level or better.</Paragraph>
  </Section>
</Paper>