<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-2008">
  <Title>Greek Word Segmentation Using Minimal Information</Title>
  <Section position="4" start_page="5" end_page="5" type="ackno">
    <SectionTitle>
4 Discussion
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="5" end_page="5" type="sub_section">
      <SectionTitle>
4.1 Comparisons with Aslin et al. (1996)
</SectionTitle>
      <Paragraph position="0"> Obviously, the cues of preceding and following segments are in and of themselves insufficient to predict a word boundary with any reasonable degree of accuracy, just as Christiansen et al. (1998) found that no single cue was sufficient for English. However, a few comparisons with the data of Aslin et al. (1996) in Table 1 may be useful, although they should be interpreted cautiously given the differences in the training and testing corpora between their study and this one. Their results for the single-phoneme condition have nearly equal hits and false alarms, that is, a precision of about 51%. They apparently do not consider this sufficient evidence of learning, although it is significantly better than their random baseline. Similarly, the worst non-random condition reported here (lower bound without constraint (2)) also has a precision of 51%. This, too, is difficult to call &quot;learning,&quot; as it represents the heuristic of inserting a word boundary wherever there could be one.</Paragraph>
      <Paragraph position="1"> The only fact that has been learned is which segments cannot be word-final (excepting foreign loan-words).</Paragraph>
      <Paragraph position="2"> However, if the criterion for learning (or at least satisfactory performance) is hits exceeding false alarms, then the utterance-boundary statistical heuristic, with 739 hits and only 396 false alarms, is nearly as accurate as Table 1's two-phoneme condition. While further information (whether phonological features, longer strings of phonemes, or some other cue) is needed to reach the 74% accuracy of Table 1's three-phoneme condition, it seems that even these very basic cues come closer to the results of Aslin et al. (1996) than might be supposed. Importantly, the same general trend was shown: utterance-final information translates into word-boundary information not only for English but also for other languages such as Modern Greek.</Paragraph>
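The precision figures quoted in this comparison follow from the hit and false-alarm counts. The short Python sketch below makes the arithmetic explicit, assuming that precision is defined as hits divided by all proposed boundaries (hits plus false alarms), which matches the remark that nearly equal hits and false alarms give a precision of about 51%. The function name `precision` is illustrative only and does not come from the paper.

```python
def precision(hits, false_alarms):
    """Fraction of proposed word boundaries that are correct."""
    proposed = hits + false_alarms
    if proposed == 0:
        raise ValueError("no boundaries were proposed")
    return hits / proposed


# Counts reported in the text for the utterance-boundary statistical heuristic.
utterance_boundary = precision(739, 396)
print(f"{utterance_boundary:.1%}")  # prints 65.1%
```

On this definition, the utterance-boundary heuristic sits between the 51% of the weakest condition and the 74% quoted for the three-phoneme condition.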
      <Paragraph position="3"> A number of further directions are possible under this framework, including: (1) Mutual information measures over two adjacent segments as cues to the likelihood of word boundaries between those two segments, as suggested by, e.g., Brent (1999a). (2) Developing more plausible models for approximating word-length distributions from utterance-length information, distances between stressed vowels, pause information, and other salient cues available to children. (3) Incorporating stress cues (as potentially signaling both beginnings and approaching ends of content words) both alone and in combination with segmental cues.</Paragraph>
      <Paragraph position="4"> Preliminary work on each of these avenues is currently underway. While some of these heuristics may require the use of other techniques in addition to finite-state techniques, the general finite-state framework is expected to prove useful as an organizing tool for comparing various cues in a simple, rational, and transparent way.</Paragraph>
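As a rough illustration of the first direction, mutual information over two adjacent segments, the Python sketch below scores each adjacent segment pair by its pointwise mutual information and proposes a boundary wherever the score falls below a threshold. This is a minimal sketch under stated assumptions, not the model used in the paper or in Brent (1999a): the function names, the zero threshold, the rule of treating unseen pairs as boundaries, and the estimation of probabilities from raw counts are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def pointwise_mi(utterances):
    """Map each adjacent segment pair to its pointwise mutual information in bits."""
    unigrams = Counter()
    bigrams = Counter()
    for utt in utterances:
        unigrams.update(utt)
        bigrams.update(zip(utt, utt[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


def propose_boundaries(utterance, scores, threshold=0.0):
    """Return each index i where a boundary is proposed just before utterance[i].

    Assumption: a pair never seen in training scores negative infinity,
    so a boundary is always proposed there.
    """
    boundaries = []
    for i in range(1, len(utterance)):
        pair = (utterance[i - 1], utterance[i])
        if threshold > scores.get(pair, float("-inf")):
            boundaries.append(i)
    return boundaries


# Toy, made-up training data: each utterance is a sequence of segments.
toy_scores = pointwise_mi(["ab", "ab"])
print(propose_boundaries("ba", toy_scores))
```

Segments are represented here as single characters for brevity; any sequence of hashable segment labels would work the same way.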
    </Section>
  </Section>
</Paper>