<?xml version="1.0" standalone="yes"?> <Paper uid="W06-3312"> <Title>Postnominal Prepositional Phrase Attachment in Proteomics</Title> <Section position="3" start_page="0" end_page="82" type="intro"> <SectionTitle> 2 Background </SectionTitle> <Paragraph position="0"> Leroy et al. (2002; 2003) note the importance of noun phrases and prepositions in the capture of relational information in biomedical texts, citing the particular significance of the prepositions by, of, and in.</Paragraph> <Paragraph position="1"> Their parser can extract many different relations using few rules by relying on closed-class words (e.g.</Paragraph> <Paragraph position="2"> prepositions) instead of restricting patterns with specific predefined verbs and entities. This bottom-up approach achieves high precision (90%) and a claimed (though unquantified) high recall. However, they side-step the issue of prepositional attachment ambiguity altogether. Also, their system is targeted specifically and only toward relations. While relations do cover a considerable portion of the most relevant information in biomedical texts, there is also much relevant lower frequency information (particularly in enzymology) such as the conditions under which these relations are expressed.</Paragraph> <Paragraph position="3"> Hahn et al. (2002) point out that PPs are crucial for semantic interpretation of biomedical texts due to the wide variety of conceptual relations they introduce. They note that this is reflected in their training and test data, extracted from findings reports in histopathology, where prepositions account for about 10% of all words and more than 25% of the text is contained in PPs. The coverage of PPs in our development and test data, comprised of varied texts in proteomics, is even higher with 26% of the text occurring in postnominal PPs alone.</Paragraph> <Paragraph position="4"> Little research in the biomedical domain addresses the problem of PP attachment proper.
This is partly due to the number of systems that process text using named-entity-based templates, disregarding PPs. In fact, the only recent BioNLP system found in the literature that makes any mention of PP attachment is Medstract (Pustejovsky et al., 2002), an automated information extraction system for Medline abstracts. The shallow parsing module used in Medstract performs only limited prepositional attachment: only PPs headed by the preposition of are attached.</Paragraph> <Paragraph position="5"> There are, of course, several PP attachment systems for other domains. Volk (2001) addresses PP attachment using the frequency of co-occurrence of a PP's preposition, object NP, and possible attachment points, calculated from query results of a web-based search engine. This system was evaluated on sentences from a weekly computer magazine, scoring 74% accuracy for both VP and NP attachment. Brill & Resnik (1994) apply transformation-based learning, with added word-class information from WordNet, to the task of PP attachment. Their system achieves 81.8% accuracy on sentences from the Penn Treebank Wall Street Journal corpus.</Paragraph> <Paragraph position="6"> The main concerns of both these systems differ from the requirements for successful PP attachment in proteomics. The main attachment ambiguity in these general texts is between VP and NP attachment, where there are few NPs to choose from for a given PP. In contrast, proteomics texts, where NPs are the main information carriers, contain many NPs with long sequences of postnominal PPs. Consequently, the possible attachment points for a given PP are more numerous. By postnominal, we denote PPs following an NP, where the attachment point may be within the NP but may also precede it. In focusing on postnominal PPs, we exclude here PPs that trivially attach to the VP for lack of NP attachment points and focus on the subset of PPs with the highest degree of attachment ambiguity.</Paragraph> </Section> </Paper>