<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-2042">
  <Title>Toward a Unified Evaluation Method for Multiple Reading Support Systems: A Reading Speed-based Procedure</Title>
  <Section position="4" start_page="244" end_page="245" type="metho">
    <SectionTitle>
3 Evaluation with Reading Speed
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="244" end_page="245" type="sub_section">
      <SectionTitle>
3.1 The purpose
</SectionTitle>
      <Paragraph position="0"> The purpose of our evaluation is to investigate the efficacy of reading support systems with respect not only to the users' reading ability but also to the readability of a complete text or a single sentence. That is, we would like to determine through the evaluation whether the supporting effect changes with text properties such as the complexity of a syntactic structure, the familiarity of words, and so on.</Paragraph>
      <Paragraph position="1"> In order to detect such a local effect, we assume that comprehension-based evaluation would be inappropriate, as it is inefficient to assign a comprehension question to each sentence. Suppose instead that we could evaluate reading support systems over such a local domain.</Paragraph>
      <Paragraph position="2"> Then we could choose which system is appropriate for a given user, depending on that user's reading ability and the readability of a text. Such usage of reading support systems would be useful.</Paragraph>
    </Section>
    <Section position="2" start_page="245" end_page="245" type="sub_section">
      <SectionTitle>
3.2 Reading Speed as an Evaluation Criterion
</SectionTitle>
      <Paragraph position="0"> In our evaluation method, we adopt reading speed performance as an evaluation criterion in addition to the comprehension performance.</Paragraph>
      <Paragraph position="1"> There are three reasons for this adoption of reading speed.</Paragraph>
      <Paragraph position="2"> First, in contrast to reading comprehension, we can measure sentence-reading speed, and thus we can examine system efficacy at the sentence level. Secondly, reading speed can be measured with any text that is readable by the reading support systems. For instance, we can evaluate system efficacy for texts such as newspapers, magazine articles, web pages, emails, and so on. By contrast, comprehension-based evaluation requires comprehension questions.</Paragraph>
      <Paragraph position="3"> Thirdly, as shown below, we have statistically found that reading speed reflects the readability of a sentence: we confirmed a positive correlation (r=0.7, p&lt;0.01) between reading speed and the readability of a text calculated with the so-called readability formula (Flesch 1948). Given this positive correlation, we assume that reading speed indicates readability.</Paragraph>
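The correlation check just described can be sketched in Python. This is a minimal illustration with hypothetical per-text counts, not the paper's data; the Flesch (1948) reading-ease formula itself is standard: 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words).

```python
import math

def flesch_reading_ease(words, sentences, syllables):
    """Flesch (1948) reading ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-text measurements: (words, sentences, syllables, mean WPM).
texts = [
    (120, 8, 160, 210.0),
    (150, 6, 230, 150.0),
    (100, 9, 125, 240.0),
    (200, 7, 330, 120.0),
    (130, 8, 180, 200.0),
]
ease = [flesch_reading_ease(w, s, sy) for w, s, sy, _ in texts]
wpm = [t[3] for t in texts]
print(round(pearson_r(ease, wpm), 2))
```

On data like this, easier texts (higher Flesch score) are read faster, yielding a strong positive correlation of the kind the paper reports.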
    </Section>
    <Section position="3" start_page="245" end_page="245" type="sub_section">
      <SectionTitle>
3.3 Reading Speed-based Evaluation
Method
</SectionTitle>
      <Paragraph position="0"> Assuming that reading speed reflects text readability, we can further assume that the reading support systems would affect text readability.</Paragraph>
      <Paragraph position="1"> That is, the positive supporting effect of a system would increase the text readability. Given this, we can evaluate the efficacy of a system on the basis of reading speed.</Paragraph>
      <Paragraph position="2"> Our evaluation method accepts the positive effect of a system if reading speed increases. When reading speed remains invariant, or decreases, the method regards the system as inefficient. Thus, on the basis of previous studies, if we compare reading speed between a supported and a non-supported text, the increase in speed should be greater for readers with lower reading ability than for highly skilled readers.</Paragraph>
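The decision rule above can be made concrete with a small sketch. The function names and the per-sentence WPM values are illustrative, not from the paper.

```python
def mean_speed_increase(supported_wpm, unsupported_wpm):
    """Mean per-sentence WPM gain of the supported reading over the
    non-supported reading of the same sentences."""
    assert len(supported_wpm) == len(unsupported_wpm)
    diffs = [s - u for s, u in zip(supported_wpm, unsupported_wpm)]
    return sum(diffs) / len(diffs)

def judge_system(supported_wpm, unsupported_wpm):
    """Accept a system's positive effect only if reading speed increases;
    an invariant or decreased speed counts as inefficient."""
    gain = mean_speed_increase(supported_wpm, unsupported_wpm)
    return "effective" if gain > 0 else "inefficient"

# Hypothetical per-sentence speeds (WPM) for the same three sentences.
print(judge_system([180, 200, 170], [150, 160, 165]))  # -> effective
```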
    </Section>
  </Section>
  <Section position="5" start_page="245" end_page="247" type="metho">
    <SectionTitle>
4 Evaluation Experiment
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="245" end_page="245" type="sub_section">
      <SectionTitle>
4.1 The Experimental Purpose
</SectionTitle>
      <Paragraph position="0"> We conducted an experiment in order to examine the validity of our method. Given the reading speed evaluation method, it is predicted that reading speed would reflect readability of a text (Hypothesis 1) and reader's ability (Hypothesis 2).</Paragraph>
      <Paragraph position="1"> As for readability of a text, we assume that supporting systems would increase the readability of a text. Therefore, we set the following hypothesis:</Paragraph>
    </Section>
    <Section position="2" start_page="245" end_page="245" type="sub_section">
      <SectionTitle>
Hypothesis 1:
</SectionTitle>
      <Paragraph position="0"> A non-supported English text would be the most difficult to read, whereas a manually translated Japanese text would be the easiest.</Paragraph>
      <Paragraph position="1"> Supported texts would fall mid-range.</Paragraph>
      <Paragraph position="2"> The efficacy of the supporting systems is inversely related to the reader's ability, as previous studies have shown. Therefore, we propose the following hypothesis: Hypothesis 2: An inverse relation is detectable between reading ability and the increase in reading speed.</Paragraph>
    </Section>
    <Section position="3" start_page="245" end_page="246" type="sub_section">
      <SectionTitle>
4.2 The Experimental Design
</SectionTitle>
      <Paragraph position="0"> One hundred and two non-native English speakers participated in the experiment. We divided the participants into three groups based on their TOEIC scores: (i) those with a lower score (400-595 pts.); (ii) those with an intermediate score (600-795 pts.); and (iii) those with a higher score (800-995 pts.). The group sizes were: (i) = 36, (ii) = 36, and (iii) = 30. We statistically compared average test scores and reading speed among these groups.</Paragraph>
      <Paragraph position="1"> We prepared eighty-four texts drawn from our collection of TOEIC texts. Each text consists of a passage and some comprehension questions.</Paragraph>
      <Paragraph position="2"> We added outputs of supporting systems to each text.</Paragraph>
      <Paragraph position="3">  In this experiment, we examined the efficacy of the following supporting systems: a sentence translation system, a word/phrase translation system, and a chunker. Thus, we created four types of test texts: (i) English texts glossed with sentence translations (hereafter, E&amp;MT); (ii) machine-translated texts (MT); (iii) English texts glossed with word translation (RUB); and (iv) English texts with word/phrase boundary markers (CHU).</Paragraph>
      <Paragraph position="4"> In addition, we prepared two types of control texts. One is a raw English text, and the other is a human-translated Japanese text. We randomly selected sixteen texts from each text group and distributed eighty-four to each participant. Thus, the participants are exposed to a variety of texts.</Paragraph>
      <Paragraph position="5"> In the experiment we used a reading process monitoring tool and recorded the reading time per sentence (see Yoshimi et al. 2005 for further description). We calculated the sentence reading speed based on words per minute (WPM) read. As the cursor moves over each number bar, the text is displayed sentence-by-sentence. See Figure 1. There is no limit to how many times a sentence can be viewed.</Paragraph>
      <Paragraph position="6">  We omitted the machine-translated words and focused solely on the number of English words to calculate the reading speed. Therefore, we were able to directly compare the reading speed of a supported text to that of a non-supported English text.</Paragraph>
      <Paragraph position="7"> The goal of this study is to depict the efficacy of the support systems; hence, the actual reading speed of a mixed English and Japanese text was out of scope. If reading speed were calculated over both English and Japanese words, the reading speed of a supported text would appear faster than that of an English text even when the reading time was the same, simply because the supported text contains more words. Therefore, we calculated reading speed based solely on English words to avoid this distortion. We also applied this procedure to the manually translated Japanese texts.</Paragraph>
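A minimal sketch of the speed calculation, assuming reading time is recorded in seconds per sentence; only the English source words are counted, so a glossed sentence and its raw counterpart are directly comparable. The function name is illustrative.

```python
def reading_speed_wpm(english_word_count, reading_seconds):
    """Words per minute, counting only the English words of the source
    sentence so that supported and non-supported texts stay comparable."""
    return english_word_count / (reading_seconds / 60.0)

# A glossed sentence may display extra translated words on screen, but
# only the 12 English source words enter the calculation.
print(reading_speed_wpm(12, 4.0))  # -> 180.0
```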
    </Section>
    <Section position="4" start_page="246" end_page="247" type="sub_section">
      <SectionTitle>
4.3 Experimental Results
4.3.1 Tested Data
</SectionTitle>
      <Paragraph position="0"> Before presenting the experimental results, one clarification is in order here. We chose to analyse a manageable subset of 13 reading texts out of the full set of 84. The texts varied in topic, style, and length; for instance, they included article-based texts, reports, and advertisements. Among these, we examined the article-type texts.</Paragraph>
      <Paragraph position="1"> There were two reasons for this limitation.</Paragraph>
      <Paragraph position="2"> One concern was with the performance of the reading support systems. We assumed that system performance was dependent on text style, and that the systems would most effectively support the reading of article-type texts because they contain fewer stylistic variations than other types of texts, particularly advertisements.</Paragraph>
      <Paragraph position="3"> The other concern was with text length.</Paragraph>
      <Paragraph position="4"> Article-type texts tended to be longer than the others, and hence were more conducive to the supporting effect of the systems. Contrary to Hypothesis 1, the reading speed of a supported text was slower than that of a non-supported English text (see Table 2); the hypothesis is therefore incorrect with respect to the slowest speeds. With respect to the fastest reading speed, however, Hypothesis 1 held, since the JPN texts were read fastest. (Abbreviations: ENG, English texts; CHU, English texts marked with word/phrase boundaries; RUB, English texts glossed with machine-translated words; MT, machine-translated texts; E&amp;MT, English texts glossed with machine-translated sentences; JPN, manually translated texts.) Hypothesis 1 was likewise not supported for the lowest comprehension scores, paralleling the reading speed results: the lowest score was found in the MT texts, as shown in Table 3.</Paragraph>
      <Paragraph position="5"> The results supported the hypothesis with respect to the JPN texts scoring highest.</Paragraph>
      <Paragraph position="6">  In order to analyse the reading data in more detail, we compared the correct answer rates among the TOEIC test score groups. We divided the participants into three groups based on TOEIC scores: 400-595 (BEGinner), 600-795 (INTermediate), and 800-995 (ADVanced).</Paragraph>
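The grouping can be expressed as a simple banding function; the band labels and score ranges follow the paper, while the function name and sample scores are illustrative.

```python
from collections import Counter

def toeic_band(score):
    """Assign a participant to a proficiency band by TOEIC score,
    following the paper's ranges."""
    if 400 <= score <= 595:
        return "BEG"
    if 600 <= score <= 795:
        return "INT"
    if 800 <= score <= 995:
        return "ADV"
    raise ValueError(f"score {score} is outside the sampled range")

# Hypothetical scores, not the actual participants' data.
scores = [450, 610, 820, 590, 795, 995]
print(Counter(toeic_band(s) for s in scores))
```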
      <Paragraph position="7"> The correct answer rate of each group is shown in Table 4. In the BEG class, the lowest rate was found in English texts, and the highest was seen in Japanese texts. Although the highest rate can be seen in Japanese texts, the lowest was found in MT texts in the INT class and ADV class.</Paragraph>
      <Paragraph position="8"> On the basis of the comprehension test results, we confirmed that all the supporting systems increased comprehension test scores for the BEG class, that only E&amp;MT did so for the INT class, and that none did for the ADV class.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="247" end_page="247" type="metho">
    <SectionTitle>
[Table column headers: BEG, INT, ADV]
</SectionTitle>
    <Paragraph position="0"> On the basis of this result, we conclude that the reading support systems help the lowest TOEIC score group participants, while the supporting effect would be minor for the higher score group.</Paragraph>
      <Paragraph position="1"> We analysed the mean rates with a one-way ANOVA, contrasting each text type with the ENG texts or with the JPN texts. The result is shown in Table 5, where an asterisk marks a non-significant difference and a check mark a significant one.</Paragraph>
    <Paragraph position="2"> In the BEG class, the rate of correct answers in the ENG texts was significantly lower than in the E&amp;MT texts. There was no text that significantly differed from the JPN texts.</Paragraph>
    <Paragraph position="3"> In the INT class, there was no significant difference compared with the ENG texts, while the rate of the JPN texts significantly differed from the CHU, RUB, and MT texts.</Paragraph>
      <Paragraph position="4"> In the ADV class, there was no significant difference compared with the ENG texts. The rate of the JPN texts showed a significant difference from the MT texts.</Paragraph>
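A one-way ANOVA of the kind used here reduces to the familiar F statistic, the between-group mean square over the within-group mean square. The sketch below uses hypothetical correct-answer rates; the paper's own data and significance thresholds are not reproduced.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA over lists of per-participant scores."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: group sizes times squared mean deviations.
    ss_between = sum(len(g) * ((sum(g) / len(g)) - grand) ** 2 for g in groups)
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical correct-answer rates for two text types.
eng = [0.40, 0.50, 0.45, 0.55]
emt = [0.70, 0.75, 0.80, 0.65]
print(round(one_way_anova_f([eng, emt]), 1))  # -> 30.0
```

A large F relative to the critical value for (k-1, n-k) degrees of freedom would correspond to the "significant difference" cells of Table 5.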
  </Section>
  <Section position="7" start_page="247" end_page="248" type="metho">
    <SectionTitle>
[Table column headers: BEG, INT, ADV; sub-columns ENG, JPN]
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="2"> We found deviations from Hypothesis 1.</Paragraph>
    <Paragraph position="3"> Thus, the most readable texts were the JPN texts, whereas the least readable were not the ENG texts but the RUB texts (Table 3). In addition, the other supported texts, the CHU and E&amp;MT texts, were less readable than the non-supported ENG texts. However, the MT texts were more readable than the ENG texts. Therefore, we were able to conclude that Hypothesis 1 was supported among the ENG, MT, and JPN texts.</Paragraph>
    <Paragraph position="4"> Given this, we focused on these texts and found that Hypothesis 2 was correct. As Table 6 shows, the reading speed of the MT texts was faster than that of the ENG texts in all the groups, and the increase in speed was inversely related to the readers' ability: the increase was 47.3 WPM in the BEG class, 25.4 WPM in the INT class, and 10.9 WPM in the ADV class.</Paragraph>
  </Section>
  <Section position="8" start_page="248" end_page="248" type="metho">
    <SectionTitle>
[Table column headers: BEG, INT, ADV]
</SectionTitle>
    <Paragraph position="0"> We analysed the mean reading speed (Table 7) with a one-way ANOVA, contrasting each text type with the ENG texts or with the JPN texts. The speed of the MT texts was significantly faster than that of the ENG texts in the BEG and INT classes. However, in the ADV class, no text differed significantly from the ENG texts. The reading speed of the JPN texts was significantly faster than that of the other texts in all the classes. See Table 8.</Paragraph>
  </Section>
</Paper>