File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/93/h93-1113_metho.xml
Size: 3,793 bytes
Last Modified: 2025-10-06 14:13:27
<?xml version="1.0" standalone="yes"?> <Paper uid="H93-1113"> <Title>Natural Language Research</Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> RECENT ACCOMPLISHMENTS </SectionTitle> <Paragraph position="0"> * Developed a new weakly supervised learning algorithm that brackets text using a simple distributional error-correcting technique. It performs as well as recent applications of the inside-outside (I-O) algorithm while using an order of magnitude less training data.</Paragraph> <Paragraph position="1"> A similar technique has been applied successfully to the problem of prepositional phrase attachment.</Paragraph> <Paragraph position="2"> * Developed techniques that combine WordNet and corpus-based lexical statistics acquired from the Penn Treebank. These techniques are being applied to the resolution of syntactic ambiguity.</Paragraph> <Paragraph position="3"> * The Penn Treebank project has released the results of its first three-year phase on CD-ROM through the Linguistic Data Consortium. The release consists of 4.5 million words of part-of-speech tagged text and 3 million words of skeletally parsed text, including a parsed version of the Brown corpus.</Paragraph> <Paragraph position="4"> * Developed a categorial-grammar-based theory of intonation structure and its discourse meaning, and implemented it in a database query system. The system takes as input an orthographic representation of spoken questions, including intonational annotations, and yields as output a synthesized spoken response: a speech wave bearing an intonation contour appropriate to the context established by the question.</Paragraph> <Paragraph position="5"> * Developed an environment, Design World, for simulating interactive task-oriented dialogue between two agents, which allows us to explore a number of key issues in inter-agent coordination.</Paragraph> <Paragraph position="6"> * Investigated the way in which dialogue processing is cued by patterns of spoken 
language in task-oriented interactions between multiple agents. Results show that the redundancy that makes communication more robust is typically marked by prosodic destressing or broad focus.</Paragraph> </Section> <Section position="3" start_page="0" end_page="419" type="metho"> <SectionTitle> PLANS FOR THE COMING YEAR </SectionTitle> <Paragraph position="0"> * Explore statistical morphology induction, lexical disambiguation, and language modeling with stochastic dependency grammars.</Paragraph> <Paragraph position="1"> * Extend the use of WordNet and lexical statistics to the resolution of a broader set of syntactic ambiguities, and apply these techniques to the construction of stochastic language models.</Paragraph> <Paragraph position="2"> * Contribute to a model of limited processing for discourse, using corpora collected by the Linguistic Data Consortium as the basis for a corpus-based analysis of bottom-up cues to discourse structure, such as variation in the forms of referring expressions and prosodic marking by topline and baseline variation.</Paragraph> <Paragraph position="3"> * Extend the part-of-speech disambiguation strategies to the disambiguation of lexical tree assignments to words in a lexicalized tree-adjoining grammar.</Paragraph> <Paragraph position="4"> * Develop a minimal-response part-of-speech tagger for conversational German without the use of on-line dictionaries and with minimal human resources. * Investigate the acquisition of lexical information about novel verbs by combining information about syntactic contexts with information about semantic relationships acquired using WordNet.</Paragraph> <Paragraph position="5"> * Develop the 'strategic' or discourse-planning component of the spoken reply system.</Paragraph> </Section> </Paper>