<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1041">
  <Title>High Precision Treebanking: Blazing Useful Trees Using POS Information</Title>
  <Section position="3" start_page="330" end_page="330" type="intro">
    <SectionTitle>
2 The Hinoki Treebank
</SectionTitle>
    <Paragraph position="0"> The Hinoki treebank currently consists of around 95,000 annotated dictionary definition and example sentences. The dictionary is the Lexeed Semantic Database of Japanese (Kasahara et al., 2004), which consists of all words with a familiarity greater than or equal to five on a scale of one to seven. This gives 28,000 words, divided into 46,347 different senses. Each sense has a definition sentence and an example sentence written using only these 28,000 familiar words (and some function words). Many senses have more than one sentence in the definition: there are 81,000 defining sentences in all.</Paragraph>
    <Paragraph position="1"> The data used in our evaluation is taken from the first sentence of the definitions of all words with a familiarity greater than six (9,854 sentences). The Japanese grammar JACY was extended until the coverage was over 80% (Bond et al., 2004).</Paragraph>
    <Paragraph position="2"> For evaluation of the treebanking we selected 5,000 of the sentences that could be parsed, and divided them into five 1,000-sentence sets (A–E). Because definition sentences tend to vary widely in form depending on the part of speech of the word being defined, each set was constructed with roughly the same distribution of defined words, as well as roughly the same length (the average was 9.9, ranging from 9.5 to 10.4).</Paragraph>
    <Paragraph position="3"> A (simplified) example of an entry (Sense 2 of カーテン kāten "curtain: any barrier to communication or vision"), and a syntactic view of its parse, are given in Figure 1. There were 6 parses for this definition sentence. The full parse is an HPSG sign, containing both syntactic and semantic information.</Paragraph>
    <Paragraph position="4"> A view of the semantic information is given in Figure 2.</Paragraph>
    <Paragraph position="6"/>
  </Section>
</Paper>