<?xml version="1.0" standalone="yes"?>
<Paper uid="A94-1013">
  <Title>Adaptive Sentence Boundary Disambiguation</Title>
  <Section position="5" start_page="80" end_page="81" type="evalu">
    <SectionTitle>
3 Experiments and Results
</SectionTitle>
    <Paragraph position="0"> We tested the boundary labeler on a large body of text containing 27,294 potential sentence-ending punctuation marks taken from the Wall Street Journal portion of the ACL/DCI collection (Church and Liberman, 1991). No preprocessing was performed on the test text, aside from removing unnecessary headers and correcting existing errors. (The sen- null tence boundaries in the WSJ text had been previously labeled using a method similar to that used in PARTS and is described in more detail in (Liberman and Church, 1992); we found and corrected several hundred errors.) We trained the weights in the neural network with a back-propagation algorithm on a training set of 573 items from the same corpus. To increase generalization of training, a separate cross-validation set (containing 258 items also from the same corpus) was also fed through the network, but the weights were not trained on this set. When the cumulative error of the items in the cross-validation set reached a minimum, training was stopped. Training was done in batch mode with a learning rate of 0.08. The entire training procedure required less than one minute on a Hewlett Packard 9000/750 Workstation. This should be contrasted with Riley's algorithm which required 25 million words of training data in order to compile probabilities. null If we use Riley's statistics presented in Section 1, we can determine a lower bound for a sentence boundary disambiguation algorithm: an algorithm that always labels a period as a sentence boundary would be correct 90% of the time; therefore, any method must perform better than 90%. In our experiments, performance was very strong: with both sensitivity thresholds set to 0.5, the network method was successful in disambiguating 98.5% of the punctuation marks, mislabeling only 409 of 27,294. These errors fall into two major categories: (i)&amp;quot;false positive&amp;quot;: the method erroneously labeled a punctuation mark as a sentence boundary, and (ii) &amp;quot;false negative&amp;quot;: the method did not label a sentence boundary as such. See Table 1 for details.</Paragraph>
    <Paragraph position="1">  items; to -- tl -- 0.5, 6-context, 2 hidden units.</Paragraph>
    <Paragraph position="2"> The 409 errors from this testing run can be decomposed into the following groups: 37.6% false positive at an abbreviation within a title or name, usually because the word following the period exists in the lexicon with other parts-of-speech (Mr. Gray, Col.</Paragraph>
    <Paragraph position="3"> North, Mr. Major, Dr. Carpenter, Mr.</Paragraph>
    <Paragraph position="4"> Sharp). Also included in this group are items such as U.S. Supreme Court or U.S.</Paragraph>
    <Paragraph position="5"> Army, which are sometimes mislabeled because U.S. occurs very frequently at the end of a sentence as well.</Paragraph>
    <Paragraph position="6"> 22.5% false negative due to an abbreviation at the end of a sentence, most frequently Inc., Co., Corp., or U.S., which all occur within sentences as well.</Paragraph>
    <Paragraph position="7"> 11.0% false positive or negative due to a sequence of characters including a punctuation mark and quotation marks, as this sequence can occur both within and at the end of sentences. null 9.2% false negative resulting from an abbreviation followed by quotation marks; related to the previous two types.</Paragraph>
    <Paragraph position="8"> 9.8% false positive or false negative resulting from presence of ellipsis (...), which can occur at the end of or within a sentence.</Paragraph>
    <Paragraph position="9"> 9.9% miscellaneous errors, including extraneous characters (dashes, asterisks, etc.), ungrammatical sentences, misspellings, and parenthetical sentences.</Paragraph>
    <Paragraph position="10"> The results presented above (409 errors) are obtained when both to and tl are set at 0.5. Adjusting the sensitivity thresholds decreases the number of punctuation marks which are mislabeled by the method. For example, when the upper threshold is set at 0.8 and the lower threshold at 0.2, the network places 164 items between the two. Thus when the algorithm does not have enough evidence to classify the items, some mislabeling can be avoided, s We also experimented with different context sizes and numbers of hidden units, obtaining the results shown in Tables 2 and 3. All results were found using the same training set of 573 items, cross-validation set of 258 items, and mixed-case test set of 27,294 items. The &amp;quot;Training Error&amp;quot; is one-half the sum of all the errors for all 573 items in the training set, where the &amp;quot;error&amp;quot; is the difference between the desired output and the actual output of the neural net. The &amp;quot;Cross Error&amp;quot; is the equivalent value for the cross-validation set. These two error figures give an indication of how well the network learned the training data before stopping.</Paragraph>
    <Paragraph position="11"> We observed that a net with fewer hidden units results in a drastic decrease in the number of false positives and a corresponding increase in the number of false negatives. Conversely, increasing the number of hidden units results in a decrease of false negatives (to zero) and an increase in false positives. A network with 2 hidden units produces the best overall error rate, with false negatives and false positives nearly equal.</Paragraph>
    <Paragraph position="12"> From these data we concluded that a context of six surrounding tokens and a hidden layer with two 5We will report on results of varying the thresholds in future work.</Paragraph>
    <Paragraph position="13"> units worked best for our test set.</Paragraph>
    <Paragraph position="14"> After converting the training, cross-validation and test texts to a lower-case-only format and retraining, the network was able to successfully disambiguate 96.2% of the boundaries in a lower-case-only test text. Repeating the procedure with an upper-caseonly format produced a 97.4% success rate. Unlike most existing methods which rely heavily on capitalization information, the network method is reasonably successful at disambiguating single-case texts.</Paragraph>
  </Section>
class="xml-element"></Paper>