<?xml version="1.0" standalone="yes"?> <Paper uid="W98-1125"> <Title>Discourse Parsing: A Decision Tree Approach</Title> <Section position="7" start_page="221" end_page="223" type="concl"> <SectionTitle> 4 Conclusion </SectionTitle> <Paragraph position="0"> This paper has shown that it is possible to build a discourse parser that performs reasonably well on diverse data. The parser relies crucially on (a) feature selection by a decision tree and (b) the way a discourse is encoded. While we have found that distance and length features are more prominent than lexical features, we were not able to establish the usefulness of the latter, which earlier work on discourse as well as on sentence parsing (Magerman, 1995; Collins, 1996) would lead one to expect.</Paragraph> <Paragraph position="1"> [Table: Japanese connectives with English glosses; the recoverable end of the caption reads "... the tokenizer program erroneously identified as a connective."] shikashi but, ippou whereas, daga but, soreo (.), shikamo moreover, tokoroga but, soshite and, soreni moreover, sokode incidentally, soredemo still, sore (.), tadashi provided that, soredakeni all the more because, tokini by the way, dakara so, demo but, sonoue moreover, sitagatte therefore, dewa now, nimokakawarazu despite, soredewa well, sorede and then, sorekara after that, towaie nevertheless, shitagatte therefore, tsuide while, katoitte but, dakarakoso consequently, matawa or, soretomo or else, soreto for another thing, nanishiro anyhow, omakeni in addition, sunawachi in other words, toiunowa because, naraba if, sonokawari instead, samonakuba or else, sunawachi namely, naishiwa or, sate by the way, toshite (.), toiunomo because, sorenimokakawarazu nonetheless, sorenishitemo yet, oyobi moreover, tokorode incidentally, nazenara because, tosureba if, nanishiro anyhow, otto (*), nanoni but.</Paragraph> <Paragraph position="2"> The following are some issues for future research. Building a larger corpus. Our discourse parser did not perform as well as a statistical sentence parser, which typically achieves over 80% precision. We suspect that this may be due to inconsistencies in tagging and to the size of the corpus we used.</Paragraph> <Paragraph position="3"> Parsing with Rhetorical Structure Theory. Technically, it is straightforward to turn the present parsing method into a full-fledged RST parser; this would involve changing how the classes are defined and redefining the constraints on discourse structure. The problem, however, is that human coders may find it quite difficult to assign sentences to rhetorical relations with any consistency.</Paragraph> <Paragraph position="4"> Extending to other languages. The general framework in which our parser is built does not presuppose elements specific to a particular language.</Paragraph> <Paragraph position="5"> It should therefore be possible to carry it over to other languages without significant modification.</Paragraph> </Section></Paper>