<?xml version="1.0" standalone="yes"?> <Paper uid="H91-1067"> <Title>Automatic Acquisition of Subcategorization Frames from Tagged Text</Title> <Section position="6" start_page="343" end_page="343" type="concl"> <SectionTitle> CONCLUSIONS </SectionTitle> <Paragraph position="0"> The initial results reported above are only the beginning of what promises to be a large and rewarding endeavor. In a forthcoming paper, Brent reports on the acquisition of subcategorization frames from raw, untagged text. Running on raw text, the program starts with only the grammar and a lexicon of some 200 closed-class words. This opens up the possibility of learning from literally hundreds of millions of words of text without worrying about the possible major categories of all the words or their relative frequencies.</Paragraph> <Paragraph position="1"> Along with implementing detection schemes for more SFs, our next major goal will be noise reduction. If that is successful, we hope to release a substantial dictionary of verbs and their subcategorization frames to the community.</Paragraph> <Paragraph position="2"> We also hope to use the SF information for semantic categorization [6] using lexical-syntax/lexical-semantics constraints [10, 11]. A particularly clear example of how this can be done is provided by the verbs taking DO&amp;clause with a non-pleonastic subject: all such verbs can describe communication [13]. The complete list of DO&amp;clause verbs that our program found more than once, running in raw-text mode on 2.6 million words of Wall Street Journal text, supports Zwicky's observation (3).</Paragraph> <Paragraph position="3"> (1) advise, assure, convince, inform, reassure, remind, tell, warn</Paragraph> </Section> </Paper>