<?xml version="1.0" standalone="yes"?> <Paper uid="P93-1035"> <Title>Automatic Grammar Induction and Parsing Free Text: A Transformation-Based Approach</Title> <Section position="7" start_page="263" end_page="264" type="concl"> <SectionTitle> CONCLUSIONS </SectionTitle> <Paragraph position="0"> In this paper, we have described a new approach for learning a grammar to automatically parse text. The method can be used to obtain high parsing accuracy with a very small training set.</Paragraph> <Paragraph position="1"> Instead of learning a traditional grammar, an ordered set of structural transformations is learned that can be applied to the output of a very naive parser to obtain binary-branching trees with unlabelled nonterminals. Experiments have shown that these parses conform with high accuracy to the structural descriptions specified in a manually annotated corpus. Unlike other recent attempts at automatic grammar induction that rely heavily on statistics both in training and in the resulting grammar, our learner is only very weakly statistical. For training, only integers are needed, and the only mathematical operations carried out are integer addition and integer comparison. The resulting grammar is completely symbolic. Unlike learners based on the inside-outside algorithm, which attempt to find a grammar that maximizes the probability of the training corpus in the hope that this grammar will match the grammar that provides the most accurate structural descriptions, the transformation-based learner can readily use any desired success measure in learning.</Paragraph> <Paragraph position="2"> We have already begun the next step in this project: automatically labelling the nonterminal nodes. The parser will first use the transformational grammar to output a parse tree without nonterminal labels, and then a separate algorithm will be applied to that tree to label the nonterminals. 
The nonterminal-node labelling algorithm makes use of ideas suggested in (Bri92), where nonterminals are labelled as a function of the labels of their daughters. In addition, we plan to experiment with other types of transformations.</Paragraph> <Paragraph position="3"> Currently, each transformation in the learned list is only applied once in each appropriate environment. For a transformation to be applied more than once in one environment, it must appear in the transformation list more than once. One possible extension to the set of transformation types would be to allow for transformations of the form: add/delete a paren as many times as is possible in a particular environment. We also plan to experiment with other scoring functions and control strategies for finding transformations, and to use this system as a postprocessor to other grammar induction systems, learning transformations to improve their performance. We hope these future paths will lead to a trainable and very accurate parser for free text.</Paragraph> </Section> </Paper>
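The "only weakly statistical" scoring the conclusion describes — integers, integer addition, and integer comparison only — can be illustrated with a minimal Python sketch. The representation (binary trees as nested tuples) and all function names here are our own assumptions for illustration, not the paper's implementation:

```python
def naive_parse(words):
    """Naive initial parse: uniformly right-branching binary tree,
    e.g. ["a", "b", "c"] -> ("a", ("b", "c"))."""
    tree = words[-1]
    for w in reversed(words[:-1]):
        tree = (w, tree)
    return tree

def brackets(tree, i=0):
    """Collect the (start, end) word spans of a binary tree.
    Returns (set of spans, index just past this subtree)."""
    if isinstance(tree, str):          # leaf: a single word
        return set(), i + 1
    left_spans, j = brackets(tree[0], i)
    right_spans, k = brackets(tree[1], j)
    return left_spans | right_spans | {(i, k)}, k

def score(candidate, gold):
    """Success measure using only integer addition and comparison:
    the number of candidate brackets also present in the gold parse."""
    cand, _ = brackets(candidate)
    ref, _ = brackets(gold)
    return sum(1 for span in cand if span in ref)
```

A learner of this kind would try candidate bracketing transformations on the naive parses and keep whichever raises this integer score most — no probabilities are ever computed, which is what makes the resulting grammar completely symbolic.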