Joint Learning Improves Semantic Role Labeling

3 Local Classifiers

In the context of role labeling, we call a classifier local if it assigns a probability (or score) to the label of an individual parse tree node n_i independently of the labels of other nodes.

We use the standard separation of the task of semantic role labeling into identification and classification phases (Surdeanu et al., 2003). In identification, our task is to classify nodes of t as either ARG, an argument (including modifiers), or NONE, a non-argument. In classification, we are given a set of arguments in t and must label each one with its appropriate semantic role. Formally, let L denote a mapping of the nodes in t to a label set of semantic roles (including NONE), and let Id(L) be the mapping which collapses L's non-NONE values into ARG. Then we can decompose the probability of a labeling L into probabilities according to an identification model P_ID and a classification model P_CLS:

    P(L | t) = P_{ID}(Id(L) | t) · P_{CLS}(L | t, Id(L))    (1)

This decomposition does not encode any independence assumptions, but it is a useful way of thinking about the problem, and our local models for semantic role labeling use it. Previous work has also made this distinction because, for example, different features have been found to be more effective for the two tasks, and because it is a good way to make training and search during testing more efficient.

Here we use the same features for the local identification and classification models, but exploit the decomposition for efficiency of training: the identification models are trained to classify each node in a parse tree as ARG or NONE, while the classification models are trained to label each argument node in the training set with its specific role, so the training set for the classification models is smaller. Note that we do no hard pruning at the identification stage in testing and can find the exact labeling of the complete parse tree, which is the maximizer of Equation 1. Thus we avoid the accuracy loss of the two-pass hard-prune strategy described by Pradhan et al. (2005).

In previous work, various machine learning methods have been used to learn local classifiers for role labeling, including linearly interpolated relative frequency models (Gildea and Jurafsky, 2002), SVMs (Pradhan et al., 2004), decision trees (Surdeanu et al., 2003), and log-linear models (Xue and Palmer, 2004). In this work we use log-linear models for multi-class classification. One advantage of log-linear models over SVMs for us is that they produce probability distributions, so the identification and classification models can be chained in a principled way, as in Equation 1.
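To make this last point concrete, here is a minimal sketch of a multi-class log-linear classifier; it is not the authors' implementation, and the feature vector, weights, and class inventory are hypothetical. Class scores w_c · f(x) are exponentiated and normalized, so the output is a proper probability distribution over labels.

```python
import numpy as np

def log_linear_probs(weights, features):
    """P(c | x) proportional to exp(w_c . f(x)): a softmax over class scores."""
    scores = weights @ features           # one score per class
    scores = scores - scores.max()        # shift for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()  # normalize into a distribution

# Toy identification model: two classes (ARG, NONE) over five
# hypothetical binary features of a parse tree node.
rng = np.random.default_rng(0)
w_id = rng.normal(size=(2, 5))            # one weight row per class
f_node = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
p_arg, p_none = log_linear_probs(w_id, f_node)
print(p_arg, p_none, p_arg + p_none)      # the two probabilities sum to 1
```

This calibration is what allows P_ID and P_CLS outputs to be multiplied together as in Equation 1, which an SVM's uncalibrated margins would not directly support.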
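With per-node distributions of this kind in hand, Equation 1 can be evaluated and, because local models score nodes independently, maximized exactly node by node. The following is a sketch under that local-independence assumption, with toy dictionaries standing in for trained P_ID and P_CLS models; all names are illustrative.

```python
import math

def id_label(role):
    """Id(.): collapse any non-NONE role to ARG."""
    return "NONE" if role == "NONE" else "ARG"

def log_prob_labeling(labeling, p_id, p_cls):
    """log P(L | t) under Equation 1 with local (per-node) models."""
    total = 0.0
    for node, role in labeling.items():
        total += math.log(p_id[node][id_label(role)])  # identification term
        if role != "NONE":
            total += math.log(p_cls[node][role])       # classification term
    return total

def best_labeling(nodes, p_id, p_cls):
    """Exact maximizer of Equation 1 for local models: each node picks NONE
    or its best role; neither hypothesis is pruned away early."""
    labeling = {}
    for node in nodes:
        role, p = max(p_cls[node].items(), key=lambda kv: kv[1])
        labeling[node] = role if p_id[node]["ARG"] * p > p_id[node]["NONE"] else "NONE"
    return labeling

# Toy distributions for two nodes of a parse tree t.
p_id = {"n1": {"ARG": 0.9, "NONE": 0.1}, "n2": {"ARG": 0.2, "NONE": 0.8}}
p_cls = {"n1": {"ARG0": 0.7, "ARG1": 0.3}, "n2": {"ARG0": 0.5, "ARG1": 0.5}}
print(best_labeling(["n1", "n2"], p_id, p_cls))  # {'n1': 'ARG0', 'n2': 'NONE'}
print(log_prob_labeling({"n1": "ARG0", "n2": "NONE"}, p_id, p_cls))
```

This per-node independence is what makes the no-pruning exact search cheap; it is also the assumption the paper's joint model goes on to relax.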
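Finally, the training-set split described above amounts to a simple filtering of the gold-labeled nodes. A minimal sketch, assuming a hypothetical (feature_vector, gold_role) data format with "NONE" marking non-arguments:

```python
def make_training_sets(gold_nodes):
    """Split gold-labeled nodes into the two training sets the text describes."""
    # Identification: every node, with roles collapsed to ARG / NONE.
    id_set = [(f, "NONE" if r == "NONE" else "ARG") for f, r in gold_nodes]
    # Classification: argument nodes only -- hence its smaller training set.
    cls_set = [(f, r) for f, r in gold_nodes if r != "NONE"]
    return id_set, cls_set
```

Both sets can then be handed to the same log-linear trainer, since the text notes that identical features are used for the two models.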