<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-3030">
  <Title>Organizing English Reading Materials for Vocabulary Learning</Title>
  <Section position="3" start_page="117" end_page="117" type="intro">
    <SectionTitle>
2 Algorithm
</SectionTitle>
    <Paragraph position="0"> We want to prepare efficient courseware for learning a target vocabulary. We defined efficiency in terms of the amount of reading materials that must be read to learn a required vocabulary. That is, efficient courseware is as short as possible, while containing the required vocabulary. We used a greedy method to develop the efficient courseware (Utiyama et al., 2004).</Paragraph>
    <Paragraph position="1"> Let C be the courseware under development and V be the target vocabulary to be learned. We iteratively select the document (from the target corpus) that has the largest number of new types (types contained in V but not in C) and put it into C until C covers all of V. &quot;C covers all of V&quot; means that each word in V occurs at least once in a document in C.</Paragraph>
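    The selection loop described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name greedy_courseware, the dictionary-of-sets corpus representation, and the tie-breaking rule are assumptions made for the example, and documents are ranked here by the plain count of new types only.

```python
def greedy_courseware(corpus, vocabulary):
    """Pick documents greedily until the target vocabulary is covered.

    corpus: dict mapping a document id to the set of word types it contains
            (hypothetical representation chosen for this sketch).
    vocabulary: set of target word types (V in the text).
    Returns the ordered list of chosen document ids (C) and the set of
    target types that no document in the corpus contains.
    """
    todo = set(vocabulary)      # types in V not yet covered by C
    remaining = dict(corpus)    # documents not yet selected
    courseware = []
    while todo and remaining:
        # Document with the largest number of new types. Iterating over the
        # sorted ids makes ties go to the smallest id (an assumption).
        best = max(sorted(remaining),
                   key=lambda d: len(remaining[d].intersection(todo)))
        gain = remaining[best].intersection(todo)
        if not gain:
            break               # no remaining document adds a new type
        courseware.append(best)
        todo.difference_update(gain)
        del remaining[best]
    return courseware, todo


example_corpus = {
    "d1": {"cat", "dog", "run"},
    "d2": {"dog", "eat"},
    "d3": {"run", "eat", "sleep", "cat"},
}
chosen, uncovered = greedy_courseware(example_corpus,
                                      {"cat", "dog", "eat", "sleep"})
print(chosen, uncovered)
```

    Returning the uncovered types as well is a design choice of this sketch: it lets the caller notice when the corpus cannot cover all of V, a case the loop would otherwise never leave.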
    <Paragraph position="2"> More concretely, let Vtodo be the part of V not covered by C, and let Vdone be V \ Vtodo. We iteratively put into C the document d that maximizes G(·),</Paragraph>
    <Paragraph position="4"> where W(d) is the set of types in d, E(|W(·)|) is the average of |W(·)| over the whole corpus, and k1 and b are parameters that depend on the corpus.</Paragraph>
    <Paragraph position="5"> We set k1 to 1.5 and b to 0.75. g(d|Vx) takes a large value when W(d) and Vx share a large number of types and d is short. These effects are due to |W(d) ∩ Vx| and |W(d)|/E(|W(·)|), respectively. As g(·) is based on the Okapi BM25 function (Robertson and Walker, 2000), which has been shown to be quite efficient in information retrieval (readers are referred to papers by the Text REtrieval Conference, TREC, http://trec.nist.gov/, for example), we expected g(·) to be effective in retrieving documents relevant to the target vocabulary.</Paragraph>
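    To make the two effects concrete, the following Python sketch scores a document against a vocabulary subset Vx. It assumes the standard Okapi BM25 saturation form, with the overlap count playing the role of the term frequency; the equation for g(·) itself is not reproduced in this text, so the formula, the function name bm25_style_score, and its argument names are assumptions for illustration only. Only the parameter values k1 = 1.5 and b = 0.75 come from the text.

```python
def bm25_style_score(doc_types, vocab_part, avg_types, k1=1.5, b=0.75):
    """BM25-style score of one document against a vocabulary subset Vx.

    ASSUMPTION: this is the generic Okapi BM25 saturation formula with the
    overlap count used as the frequency; it is not the paper's own equation.

    doc_types:  set of word types in the document, W(d)
    vocab_part: set of word types to match against, Vx
    avg_types:  average number of types per document, E(|W(.)|), must be
                greater than zero
    """
    overlap = len(doc_types.intersection(vocab_part))  # shared types
    length_ratio = len(doc_types) / avg_types          # relative length
    # Length normalisation: longer-than-average documents are penalised.
    norm = k1 * ((1.0 - b) + b * length_ratio)
    return (k1 + 1.0) * overlap / (norm + overlap)


target = {"a", "b"}
short_doc = {"a", "b"}
long_doc = {"a", "b", "c", "d", "e", "f", "g", "h"}
# Same overlap, different lengths: the shorter document scores higher.
print(bm25_style_score(short_doc, target, avg_types=4.0))
print(bm25_style_score(long_doc, target, avg_types=4.0))
```

    Under this assumed form the score grows with the overlap but saturates, and it shrinks as the document gets longer than the corpus average, which matches the qualitative behaviour described above.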
    <Paragraph position="7"> In Eq. (1), a is used to combine the scores of document d, which are obtained by using Vtodo and Vdone. It is defined as</Paragraph>
    <Paragraph position="9"> This implies that even if |W(d) ∩ Vtodo| is 1, it is as important as |W(d) ∩ Vdone| = |Vdone|. Consequently, G(·) uses documents that have new types of the given vocabulary in preference to documents that have covered types.</Paragraph>
    <Paragraph position="10"> To summarize, efficient courseware is constructed by putting document d with maximum G(·) into C until C covers all of V. This allows us to construct efficient courseware because G(·) takes a large value when a document has a large number of new types and is short.</Paragraph>
  </Section>
</Paper>