<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1306">
  <Title>Sample Selection for Statistical Grammar Induction</Title>
  <Section position="8" start_page="49" end_page="49" type="concl">
    <SectionTitle>
7 Conclusion and Future Work
</SectionTitle>
    <Paragraph position="0"> This empirical study indicates that sample selection can significantly reduce the human effort in parsing sentences for inducing grammars. Our proposed evaluation function using tree entropy selects helpful training examples.</Paragraph>
    <Paragraph position="1"> Choosing from a large pool of unlabeled candidates, it significantly reduces the amount of training annotations needed (by 36% in the experiment). Although the reduction is less dramatic when the pool of candidates is small (by 27% in the experiment), the training examples it selected helped to induce slightly better grammars.</Paragraph>
    <Paragraph position="2"> The current work suggests many potential research directions on selective sampling for grammar induction. First, since the ideas behind the proposed evaluation fimctions are general and independent of formalisms, we would like to empirically determine their effect on other parsers. Next, we shall explore alternative formulations of evaluation functions for the single-learner system. The current approach uses uncertainty-based evaluation functions; we hope to consider other factors such as confidence about the parameters of the grammars and domain knowledge. We also plan to focus on the constituent units within a sentence as training examples. Thus, the evaluation functions could estimate the training utilities of constituent units rather than full sentences. Another area of interest is to experiment with committee-based sample selection using multiple learners. Finally, we are interested in applying sample selection to other natural language learning algorithms that have been limited by the sparsity of annotated data.</Paragraph>
  </Section>
class="xml-element"></Paper>