<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-2008">
  <Title>Greek Word Segmentation Using Minimal Information</Title>
  <Section position="3" start_page="5" end_page="5" type="intro">
    <SectionTitle>
3 Results
</SectionTitle>
    <Paragraph position="0"> Six different conditions were tested, corresponding to the three variant FSTs (3), (3</Paragraph>
    <Paragraph position="2"> ), both with and without the exact-word constraint in FSM (2). Each of these was composed (separately) with the &quot;segment identity&quot; acceptor (1) for a given utterance. The output projection of the best path from each resulting FST was converted back into text and compared to the text of the original utterance. Scores are reported for both boundaries and words, where a word is counted as correctly segmented only if both its left and right boundaries are correctly placed. Where several best paths of equal cost exist, their precision and recall scores are averaged.</Paragraph>
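    <Paragraph> As an illustrative sketch only (not the evaluation code used for these experiments), the boundary and word scoring described above might be computed as follows in Python. The function names, the use of character offsets, the exclusion of utterance edges from the boundary count, and the exact-span criterion for a correct word are all assumptions made for the example.

```python
def boundaries(words):
    """Internal boundary offsets of a segmented utterance.

    Assumption: utterance-initial and utterance-final edges are given,
    so only word-internal boundaries are scored.
    """
    offsets, pos = set(), 0
    for w in words[:-1]:
        pos += len(w)
        offsets.add(pos)
    return offsets


def spans(words):
    """(start, end) character span of every word in the utterance."""
    out, pos = set(), 0
    for w in words:
        out.add((pos, pos + len(w)))
        pos += len(w)
    return out


def prf(predicted, gold):
    """Precision, recall and F-score, as percentages, for two sets."""
    hits = len(predicted.intersection(gold))
    p = 100.0 * hits / len(predicted) if predicted else 0.0
    r = 100.0 * hits / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def average_scores(hypotheses, gold):
    """Average precision and recall over tied best-path hypotheses.

    A word is counted as correct only when its exact span, and hence
    both its left and right boundary, matches the gold segmentation.
    """
    n = len(hypotheses)
    result = {}
    for name, extract in (("boundary", boundaries), ("word", spans)):
        scores = [prf(extract(h), extract(gold)) for h in hypotheses]
        p = sum(s[0] for s in scores) / n
        r = sum(s[1] for s in scores) / n
        result[name] = (p, r)
    return result


# Hypothetical utterance with two equally cheap segmentations.
gold = ["the", "dog", "ran"]
tied = [["the", "dogran"], ["the", "dog", "ran"]]
print(average_scores(tied, gold))
```

Representing each word by its character span keeps the word criterion stricter than the boundary criterion: a hypothesis that merges two words still earns credit for the boundaries it places correctly, but not for the merged word.</Paragraph>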
    <Paragraph position="3"> The results with and without the number of words known are shown in Tables 2 and 3, following. In both cases, the precision scores patterned as expected. The upper bound condition (representing a supervised case, where statistics on the word boundaries are available for the training data) proved the most accurate on the test data. This suggests (as has been confirmed for English in studies such as Brent 1999a) that the learning of patterns over already-acquired vocabulary has perhaps the largest effect on the acquisition of new vocabulary.</Paragraph>
    <Paragraph position="4"> The utterance-based approximation, corresponding most closely to (Aslin et al., 1996), seems to be slightly better overall than the lower bound. Without the number of words known, (3) has an F-score of 20.2 for words and 70.2 for boundaries, whereas (3</Paragraph>
    <Paragraph position="6"> scores of only 17.0 (word) and 68.0 (boundaries), though this difference may not be significant. This difference was less than expected, given preliminary examination of the training data; it may be that once the set of allowable word-final phonemes is observed, the relative probabilities of those phonemes are not as usefully learned from utterance boundaries. However, the lower bound (corresponding to purely symbolic knowledge of the allowable word-final segments) is significantly better than the random walk, suggesting that any knowledge, no matter how rudimentary, begins to make a difference.</Paragraph>
  </Section>
</Paper>