<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1193">
  <Title>Acquiring an Ontology for a Fundamental Vocabulary</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Resources
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 The Lexeed Semantic Database of Japanese
</SectionTitle>
      <Paragraph position="0"> The Lexeed Semantic Database of Japanese is a machine-readable dictionary that covers the most common words in Japanese (Kasahara et al., 2004). It is built on a series of psycholinguistic experiments in which words from two existing machine-readable dictionaries were presented to multiple subjects, who ranked them on a familiarity scale from one to seven, with seven being the most familiar (Amano and Kondo, 1999). Lexeed consists of all open class words with a familiarity greater than or equal to five.</Paragraph>
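      <Paragraph> The selection rule described above (open class words with familiarity of at least five) can be sketched as a simple filter. This is only an illustration under stated assumptions, not the procedure used to build Lexeed: the Candidate record, its field names and the select_fundamental helper are hypothetical, and all sample ratings except the 6.55 reported later in this section are invented for the example.

```python
from dataclasses import dataclass

# Threshold taken from the text: familiarity greater than or equal to five.
FAMILIARITY_THRESHOLD = 5.0


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one dictionary word (not the Lexeed schema)."""
    headword: str       # romanized headword, for illustration only
    familiarity: float  # mean rating on the one-to-seven scale
    open_class: bool    # True for open class words


def select_fundamental(candidates):
    """Keep open class words whose familiarity reaches the threshold."""
    return [
        c for c in candidates
        if c.open_class and c.familiarity >= FAMILIARITY_THRESHOLD
    ]


candidates = [
    Candidate("doraibaa", 6.55, True),    # rating reported in this section
    Candidate("nejimawashi", 4.2, True),  # assumed value; the text only says below 5.0
    Candidate("wa", 6.9, False),          # assumed closed class particle and rating
]

selected = [c.headword for c in select_fundamental(candidates)]
print(selected)
```

Only the first candidate survives: the second falls below the threshold and the third is not an open class word.</Paragraph>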
      <Paragraph position="1"> The size, in words, senses and defining sentences, is given in Table 1.</Paragraph>
      <Paragraph position="2"> The definition sentences for these words were rewritten by four different analysts to use only the 28,000 familiar words, and the best definition was chosen by a second set of analysts.</Paragraph>
      <Paragraph position="3"> Not all words were used in definition sentences: the defining vocabulary is 16,900 different words (60% of all possible words were actually used in the definition sentences). An example entry for the word ドライバー doraibā "driver" is given in Figure 1, with English glosses added. The underlined material was not in Lexeed originally; we extract it in this paper. doraibā "driver" has a familiarity of 6.55 and three senses. The first sense was originally defined as just the synonym nejimawashi "screwdriver", which has a familiarity below 5.0. This was rewritten to the explanation: "A tool for inserting and removing screws".</Paragraph>
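      <Paragraph> The rewriting constraint and the 60% figure above can be illustrated with a small sketch. It assumes definitions are already tokenized and uses English glosses in place of Japanese words; the helper names out_of_vocabulary and vocabulary_usage and the toy vocabulary are hypothetical and are not part of Lexeed or its tooling.

```python
def out_of_vocabulary(definition_tokens, allowed_words):
    """Return the tokens of a definition that are not allowed, in order."""
    return [t for t in definition_tokens if t not in allowed_words]


def vocabulary_usage(used_count, allowed_count):
    """Fraction of the allowed vocabulary that occurs in definitions."""
    return used_count / allowed_count


# Toy stand-in for the familiar vocabulary (assumed, English glosses only).
familiar = {"a", "tool", "for", "inserting", "and", "removing", "screws"}

# Synonym-only definition: the synonym is below the familiarity threshold.
original = ["screwdriver"]
# Rewritten definition from the text, tokenized by hand.
rewritten = ["a", "tool", "for", "inserting", "and", "removing", "screws"]

needs_rewrite = out_of_vocabulary(original, familiar)
acceptable = out_of_vocabulary(rewritten, familiar)
usage = vocabulary_usage(16900, 28000)  # counts reported in the text

print(needs_rewrite)
print(acceptable)
print(round(usage, 2))
```

The original synonym-only definition is flagged, the rewritten one passes, and the ratio of 16,900 to 28,000 rounds to 0.6, matching the 60% stated in the text.</Paragraph>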
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 The Hinoki Treebank
</SectionTitle>
      <Paragraph position="0"> To produce semantic representations, we use an open source HPSG grammar of Japanese, JACY (Siegel and Bender, 2002), which we have extended to cover the dictionary definition sentences (Bond et al., 2004). We have treebanked 23,000 sentences using the [incr tsdb()] profiling environment (Oepen and Carroll, 2000) and used them to train a parse ranking model for the PET parser (Callmeier, 2002) to selectively rank the parser output.</Paragraph>
      <Paragraph position="1"> These tools, and the grammar, are available from the Deep Linguistic Processing with HPSG Initiative (DELPH-IN: http://www.delph-in.net/).</Paragraph>
      <Paragraph position="3"> We use this parser to parse the defining sentences into a full meaning representation using minimal recursion semantics (MRS: Copestake et al. (2001)).</Paragraph>
    </Section>
  </Section>
</Paper>