<?xml version="1.0" standalone="yes"?>
<Paper uid="W93-0113">
  <Title>Evaluation Techniques for Automatic Semantic Extraction: Comparing Syntactic and Window Based Approaches</Title>
  <Section position="6" start_page="146" end_page="151" type="evalu">
    <SectionTitle>
5 Results
</SectionTitle>
    <Paragraph position="0"> The first table, in Figure 5, compares the hits produced by the two techniques over Roget's and over another online thesaurus, Macquarie's, that we had available in the Laboratory for Computational Linguistics at Carnegie Mellon University.</Paragraph>
    <Paragraph position="1"> [Figure residue. Example sentence: "With the arrival of Europeans in 1788, many Aboriginal societies, caught within the coils of expanding white settlement, were gradually destroyed." Caption fragments: "Contexts of nouns extracted after syntactic analysis"; "... and by the window technique, to some frequently occurring words in the corpus"; "... technique (hashed bars) vs syntactic technique (solid bars). The y-axis gives the percentage of hits for each group of frequency-ranked terms."]</Paragraph>
    <Paragraph position="2"> This table compares the results obtained from the windowing technique described in preceding paragraphs to those obtained from the syntactic technique, retaining only words for which similarity judgements were made by both techniques.</Paragraph>
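The percentage of hits for each group of frequency-ranked terms can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a hit means that a word and the single word judged most similar to it share at least one thesaurus heading, and the function name, parameters, and toy data are all hypothetical.

```python
def hit_percentages(ranked_words, nearest, headings, group_size=100):
    """Return the percentage of hits for each group of frequency-ranked words.

    ranked_words: words sorted from most to least frequent.
    nearest: maps a word to the word a technique judged most similar to it
             (assumption: only the single closest word is checked).
    headings: maps a word to the set of thesaurus headings it appears under.
    """
    results = []
    for start in range(0, len(ranked_words), group_size):
        # Keep only words for which a similarity judgement exists.
        group = [w for w in ranked_words[start:start + group_size] if w in nearest]
        hits = sum(
            1 for w in group
            if headings.get(w, set()).intersection(headings.get(nearest[w], set()))
        )
        results.append(100.0 * hits / len(group) if group else 0.0)
    return results


# Toy data, invented purely for illustration.
ranked = ["year", "government", "people", "state"]
nearest = {"year": "month", "government": "state",
           "people": "tree", "state": "government"}
headings = {
    "year": {"time"}, "month": {"time"},
    "government": {"authority"}, "state": {"authority", "condition"},
    "people": {"humanity"}, "tree": {"plant"},
}
percentages = hit_percentages(ranked, nearest, headings, group_size=2)
```

With groups of two words, the first group scores two hits out of two and the second one hit out of two. To compare two techniques on equal footing, the same function would be run once per technique over only the words for which both produced a judgement.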
    <Paragraph position="3"> It can be seen in Figure 5 that the simple technique of moving a window over a large corpus, counting co-occurrences of words, and eliminating empty words provides a good hit ratio for frequently appearing words: about 1 in 5 of the 100 most frequent words are found similar to words appearing under the same heading in a hand-built thesaurus. It can also be seen that the technique based on partial syntactic analysis performs better for the 600 most frequently appearing nouns, which may be considered the characteristic vocabulary of the corpus. The difference in performance between the two techniques is statistically significant (p &lt; 0.05). The results of a χ² test are given in Figure 9. Figures 6 and 7 show the same results as histograms. In these histograms it becomes more evident that the window co-occurrence technique gives more hits for less frequently occurring words, beyond the 600th most frequent word. One reason for this can be seen by examining the 900th most frequent word, employment. Since the windowing technique extracts up to 20 non-stopwords from either side, there are still 537 context words attached to this word, while the syntactically based technique, which examines finer-grained contexts, provides only 32 attributes.</Paragraph>
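The windowing step described above (drop empty words, then collect up to 20 non-stopwords on either side of each occurrence) might look roughly like the sketch below. The stopword list, the function name, and the decision to treat the text as one unbroken token stream are assumptions made for illustration; the paper's actual list and boundary handling are not given in this section.

```python
from collections import Counter

# Hypothetical stopword ("empty word") list; the actual list is not given here.
STOPWORDS = {"the", "of", "in", "with", "were", "many", "a", "and", "to"}


def window_contexts(tokens, target, width=20, stopwords=STOPWORDS):
    """Count the non-stopwords found within `width` non-stopwords of `target`."""
    # Remove empty words first, so the window is measured in non-stopwords.
    content = [t.lower() for t in tokens if t.lower() not in stopwords]
    counts = Counter()
    for i, word in enumerate(content):
        if word != target:
            continue
        left = content[max(0, i - width):i]
        right = content[i + 1:i + 1 + width]
        counts.update(left + right)
    return counts


sentence = ("With the arrival of Europeans in 1788 many Aboriginal societies "
            "caught within the coils of expanding white settlement "
            "were gradually destroyed").split()
# A narrow window of 3 keeps the example small; the text uses 20.
contexts = window_contexts(sentence, "societies", width=3)
```

Here `contexts` holds the three content words on each side of "societies". With the full width of 20, even a relatively rare word accumulates many context words over a large corpus, which is consistent with the 537 context words reported for employment.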
    <Paragraph position="4"> Figure 8 shows the results of applying the less focused dictionary gold standard experiment to the similarities obtained from the corpus by each technique. In this experiment, both techniques provide about the same overlap for frequent words, and the windowing technique again makes a significantly stronger showing for the rare words.</Paragraph>
  </Section>
</Paper>