<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0203"> <Title>Using a probabilistic model of discourse relations to investigate word order variation</Title> <Section position="8" start_page="0" end_page="0" type="concl"> <SectionTitle> 5 Conclusions and potential applications </SectionTitle> <Paragraph position="0"> The statistical model presented here uses a combination of referential and lexical features, annotated for a small window around the target utterance, to represent the local discourse context of utterances with canonical and non-canonical word orders. The primary goal was to model the correlations between discourse relations and non-canonical syntax. Because annotating discourse relations directly is inherently difficult, the featural approximation was devised as a practical alternative.</Paragraph> <Paragraph position="1"> Overall, the method used here yielded new insights into the contexts that favor the use of four types of non-canonical word order.</Paragraph> <Paragraph position="2"> The complexity of this approach does make it difficult to draw simple conclusions about the relationship between discourse relations and non-canonical syntactic forms. However, the strength of some of the correlations found here merits further investigation. The data also lend support to the idea that some aspects of discourse relations, both syntactic and semantic, can be inferred from combinations of lower-level linguistic features.</Paragraph> <Paragraph position="3"> Improving upon the current project will require larger amounts of data.</Paragraph> <Paragraph position="4"> The significance of any particular feature is greatly affected by the quantity of data. This was a particular issue for the lexical feature values, where data sparsity prevented the inclusion of several of the less frequent connectives with better understood discourse structuring properties, such as "well" and "now".
More data may also be required to support the use of more complex statistical models. Automatic methods of annotating the referential features, or the availability of larger corpora marked up with coreferential and inferential relations and containing a rich variety of syntactic forms, could be used to test the predictions in Section 3 more accurately.</Paragraph> <Paragraph position="5"> The technique used here for approximating discourse relations through more easily annotated features has at least two interesting potential applications. First, given the significant correlation of these features with non-canonical word order variation, the discriminative models trained here could be used as classifiers that label discourse contexts (feature vectors) with the form best suited to the context, for use in the surface realization stage of a natural language generation system.</Paragraph> <Paragraph position="6"> Second, the feature set used here could be applied to the problem of automatic classification of discourse relations. In conjunction with a relatively small set of sentence pairs for which inter-annotator agreement is high when hand-annotated for type of discourse relation, the lexical and referential features here could serve as an initial feature set for bootstrapping the development of a statistical discourse relation classifier. This application would require stipulation of a predetermined set of discourse relations--a requirement the present study wished to avoid. However, given the practical need for a statistical relation classifier, a set of relations suitable to the domain of use could be constructed.</Paragraph> </Section> </Paper>