<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1634"> <Title>Automatic Construction of Predicate-argument Structure Patterns for Biomedical Information Extraction</Title> <Section position="4" start_page="0" end_page="0" type="intro"> <SectionTitle> Current Affiliation: + FUJITSU LABORATORIES LTD. </SectionTitle> <Paragraph position="0"> ++ Faculty of Informatics, Kogakuin University patterns, which is a tedious and time-consuming process, is not really practical.</Paragraph> <Paragraph position="1"> Techniques based on machine learning (Zhou et al., 2005; Hao et al., 2005; Bunescu and Mooney, 2006) are expected to alleviate this problem with manually crafted IE. However, in most cases, the cost of manually crafting patterns is simply transferred to the cost of constructing a large amount of training data, which requires a tedious amount of manual labor to annotate text.</Paragraph> <Paragraph position="2"> To systematically reduce the necessary amount of training data, we divided the task of constructing extraction patterns into a subtask that general natural language processing techniques can solve and a subtask that has specific properties according to the information to be extracted. The former subtask is full parsing (i.e., recognizing the syntactic structures of sentences), and the latter subtask is constructing specific extraction patterns (i.e., finding clue words to extract information) based on the obtained syntactic structures.</Paragraph> <Paragraph position="3"> Among the various levels of parsing, we adopted full parsing because we believe it is best suited to generalizing sentences into normalized syntactic relations. We also divided patterns into components to improve recall, and we introduced machine learning with a Support Vector Machine (SVM) to learn a prediction model using the matching results of extraction patterns. 
As an actual IE task, we extracted pairs of interacting protein names from biomedical text.</Paragraph> </Section> </Paper>