<?xml version="1.0" standalone="yes"?> <Paper uid="P99-1081"> <Title>An Unsupervised Model for Statistically Determining Coordinate Phrase Attachment</Title> <Section position="8" start_page="612" end_page="612" type="concl"> <SectionTitle> 6 Discussion </SectionTitle> <Paragraph position="0"> In an effort to make the heuristic concise and portable, we may have oversimplified it, thereby negatively affecting the performance of the model. For example, when the heuristic came upon a noun phrase consisting of more than one consecutive noun, the noun closest to the cc was extracted. In a phrase like coffee and rhubarb apple pie, the heuristic would choose rhubarb as the n3 when clearly pie should have been chosen. Also, the heuristic did not check whether a preposition occurred between either n1 and cc or cc and n3. Such cases make the CP ambiguous, thereby invalidating it as an unambiguous training example. By including annotated training data from the TreeBank set, this model could be modified to become a partially-unsupervised classifier.</Paragraph> <Paragraph position="1"> Because the model presented here is basically a straight reimplementation of [AR98], it fails to take into account attributes that are specific to the CP. For example, whereas (n1 cc n3) = (n3 cc n1), (v p n) ≠ (n p v). In other words, there is no reason to make the distinction between &quot;dog and cat&quot; and &quot;cat and dog.&quot; Modifying the model accordingly may greatly increase the usefulness of the training data.</Paragraph> </Section> </Paper>