<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0509">
  <Title>An Overt Semantics with a Machine-guided Approach for Robust LKBs</Title>
  <Section position="8" start_page="65" end_page="66" type="evalu">
    <SectionTitle>
6 Summary
</SectionTitle>
    <Paragraph position="0"> The Mikrokosmos lexicon acquisition group has acquired the following data: Spanish lexicon, 7,000 word sense entries (35,000 word sense entries after applying the morpho-semantic lexical rules); Chinese lexicon, about 3,000 word sense entries; and English, about 15,000 word sense entries so far. For instance, the acquisition of 15,000 word sense entries took one year and involved 50% of the time of a computational linguist (to develop the methodology, train the acquirers and design the GUIs), 50 acquirer hours per week, and 10 hours per week of a programmer to implement the GUIs, maintain the tools and test the entries. Our approach to the development of lexicons differs from others in that our rules apply directly to semantic frames and not to the basic forms of verbs. Our methodology allows us to alleviate the burden of manual checking by applying linking rules directly on the semantics of the lexemes. Some rules add discourse-related features, such as focus in some alternations, e.g., they improved the situation → the situation improved. What is important to evaluate is how much we gain by using rules and other resources. Today, it is still difficult to say exactly how much. Adequately predicting the subcategorizations for a semantic class depends on its grain size: the finer-grained, the better the prediction will be. However, in NLP applications, where one is constrained by time, only the semantics necessary for a particular application is acquired, which means that in many cases the semantics is left at a coarser grain size than the one required to predict the subcategorizations. In practice, we overgenerate some subcategorizations and therefore need to have them checked by humans. This is why we have concentrated on a small set of rules. Results on that trade-off issue have been reported in Viegas et al. (1998). Our experience in large-scale acquisition of lexicons shows that idiosyncrasies overrule many of our general rules. This is mainly due to the fact that we need a more fine-grained semantics than the one which is available now. This is not just a criticism of our framework; it is a general fact that we all encounter when investigating lexical semantics. This might be due to the fact that we work in a synchronic perspective (a highly recommended approach!), whereas language evolves constantly, thus creating &quot;artificial&quot; idiosyncrasies. In any case, one cannot avoid them when building a computational semantic lexicon. We have also learnt during the acquisition of the Mikrokosmos lexicons that different acquirers, who have been through the same intensive training, will arrive at the same number of meanings for a word in more than 90% of the cases. The meaning of a word might differ, for different trained acquirers, along ISA links. Corpora also influence the decision of the acquirers, and here too we have seen some human &quot;inconsistencies&quot; which we think could be &quot;corrected&quot; automatically, as discussed in the following section.</Paragraph>
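    <Paragraph position="1"> The figures quoted in this summary can be turned into two rough indicators: how much the morpho-semantic lexical rules expanded the Spanish lexicon, and how many word sense entries were produced per acquirer hour in the 15,000-entry effort. The sketch below is only illustrative arithmetic. The function and class names are hypothetical, the 52-week year is an assumption (the text says only "one year"), and the linguist's half-time contribution is left out of the throughput figure.

```python
from dataclasses import dataclass

# Assumption: "one year" is taken as 52 working weeks; the source gives no week count.
WEEKS_PER_YEAR = 52


@dataclass(frozen=True)
class WeeklyEffort:
    """Weekly hours reported in the summary (hypothetical container)."""
    acquirer_hours: float
    programmer_hours: float


def expansion_factor(entries_before: int, entries_after: int) -> float:
    """Ratio of entries after applying lexical rules to entries before."""
    return entries_after / entries_before


def entries_per_acquirer_hour(entries: int, effort: WeeklyEffort,
                              weeks: int = WEEKS_PER_YEAR) -> float:
    """Entries produced per acquirer hour over the whole period."""
    return entries / (effort.acquirer_hours * weeks)


# Spanish lexicon: 7,000 entries, 35,000 after the morpho-semantic rules.
spanish_gain = expansion_factor(7_000, 35_000)

# 15,000 entries in one year at 50 acquirer hours and 10 programmer hours per week.
effort = WeeklyEffort(acquirer_hours=50, programmer_hours=10)
rate = entries_per_acquirer_hour(15_000, effort)

print(f"Expansion factor from lexical rules: {spanish_gain:.1f}")  # 5.0
print(f"Entries per acquirer hour: {rate:.2f}")  # 5.77
```

    These numbers describe volume only. They say nothing about how many rule-generated entries survive human checking, which is the trade-off the paragraph above leaves open.</Paragraph>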
  </Section>
</Paper>