<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0909">
  <Title>Unsupervised Lexical Learning with Categorial Grammars</Title>
  <Section position="6" start_page="63" end_page="361" type="evalu">
    <SectionTitle>
5 Results
</SectionTitle>
    <Paragraph position="0"> Table 2 reports the results of these experiments. The CCW Preset column indicates whether the closed-class words were provided. The lexicon accuracy column is a measure, calculated by manual analysis, of the percentage of plausible lexical entries, i.e. entries whose word-category pairs can plausibly be accepted as existing in English. This should be taken together with the parse accuracy, which is the percentage of correctly parsed examples, i.e. examples that receive a linguistically correct syntactic analysis.</Paragraph>
    <Paragraph position="1"> The results for the first two corpora are extremely encouraging, with 100% accuracy on both measures. While these experiments involve only relatively simple corpora, the results strongly suggest that the approach can be effective. It should be noted that no experiment on corpus 2 terminated without the closed-class words being set, as the sentences in that corpus are significantly longer and each word may be assigned a large number of categories. It is therefore clear that setting the closed-class words greatly increases speed, and that we need to consider methods of relieving the strain on the parser if the approach is to be useful on more complex corpora.</Paragraph>
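    <!-- The two measures above can be sketched as follows. This is an illustrative reconstruction, not the authors' evaluation code; the function names, the (word, category) pair representation, and the toy data are assumptions. -->

```python
def lexicon_accuracy(lexicon, plausible):
    """Fraction of lexical entries judged plausible for English (manual analysis)."""
    return sum(1 for entry in lexicon if entry in plausible) / len(lexicon)


def parse_accuracy(parses, gold):
    """Fraction of sentences whose parse matches a linguistically correct analysis."""
    return sum(1 for s, p in parses.items() if gold.get(s) == p) / len(parses)


# Toy CG lexicon: (word, category) pairs; categories use standard CG slash notation.
lexicon = [("john", "np"), ("runs", r"s\np"), ("the", "np/n"), ("runs", "n")]
plausible = {("john", "np"), ("runs", r"s\np"), ("the", "np/n")}

print(lexicon_accuracy(lexicon, plausible))  # 3 of 4 entries plausible -> 0.75
```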
    <Paragraph position="2"> The results with the LLL corpus are also encouraging in part. A lexical accuracy of 77.7% and a parse accuracy of nearly 60% (note that this measure of accuracy is strict) on such a small, sparse corpus are good results, and analysis suggests that most errors were due to the limited coverage of the grammar - especially its not allowing any movement. The errors also suggest that adding further linguistic constraints - for example, not allowing words to be assigned the basic category s - and strengthening the compression heuristic may provide improvements.</Paragraph>
    <Paragraph position="3"> It was these problems, along with the sparseness of the corpus, that led to the poor results with the LLL corpus without preset words.</Paragraph>
    <Paragraph position="4"> Table 3 shows predictably good results for parsing the test sets with the learned lexicons.</Paragraph>
  </Section>
</Paper>