<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1081">
  <Title>An Unsupervised Model for Statistically Determining Coordinate Phrase Attachment</Title>
  <Section position="7" start_page="611" end_page="612" type="evalu">
    <SectionTitle>5 Results</SectionTitle>
    <Paragraph position="0">Decisions were deemed correct if they agreed with the decision in the corresponding Treebank data. The correct attachment was chosen 72% of the time on the 500-phrase development corpus from the WSJ Treebank. Because attachment is a forced binary decision, there are no separate measurements for recall or precision. Always choosing low attachment yields an accuracy of 64%. After further development the model will be tested on a held-out test corpus.</Paragraph>
    <Paragraph position="1">When evaluating the effectiveness of an unsupervised model, it is helpful to compare its performance to that of an analogous supervised model. The smaller the error reduction when going from the unsupervised to the supervised model, the more comparable the unsupervised model is to its supervised counterpart. To our knowledge there has been very little, if any, prior work on ambiguous CPs. In addition to the unsupervised CP disambiguation model presented here, we have developed two supervised models (one backed-off and one maximum entropy) for determining CP attachment [MG, in prep]. The backed-off model, closely based on [CB95], performs at 75.6% accuracy. The error reduction from the unsupervised model presented here to the backed-off model is 13%. This is comparable to the 14.3% error reduction found when going from [AR98] to [CB95].</Paragraph>
    <Paragraph position="2">It is interesting to note that reducing the volume of training data by half produced no drop in accuracy; in fact, accuracy remained exactly the same as the volume of data was increased from half to full. The backed-off model in [MG, in prep] was trained on only 1,380 training phrases, whereas the training corpus used in the study presented here consisted of 119,629 training phrases, so reducing this figure by half is not an overly significant cut.</Paragraph>
  </Section>
</Paper>
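For reference, the 13% figure quoted above can be reproduced from the two accuracies reported in the text; a minimal worked calculation, assuming the standard definition of relative error reduction (the notation acc_sup and acc_unsup is introduced here for illustration only):

% Error reduction from the unsupervised model (72%) to the
% supervised backed-off model (75.6%), both reported above.
\[
\text{error reduction}
  = \frac{\text{acc}_{\text{sup}} - \text{acc}_{\text{unsup}}}
         {100\% - \text{acc}_{\text{unsup}}}
  = \frac{75.6\% - 72\%}{100\% - 72\%}
  \approx 12.9\% \approx 13\%
\]

The same formula, applied to the accuracies reported in [AR98] and [CB95], is presumably the source of the 14.3% comparison figure.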