<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1197">
  <Title>Semantic Role Labeling via Integer Linear Programming Inference</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Semantic parsing of sentences is believed to be an important task toward natural language understanding, and has immediate applications in tasks such as information extraction and question answering. We study semantic role labeling (SRL). For each verb in a sentence, the goal is to identify all constituents that fill a semantic role, and to determine their roles, such as Agent, Patient or Instrument, and their adjuncts, such as Locative, Temporal or Manner.</Paragraph>
    <Paragraph position="1"> The PropBank project (Kingsbury and Palmer, 2002) provides a large human-annotated corpus of semantic verb-argument relations. Specifically, we use the data provided in the CoNLL-2004 shared task of semantic-role labeling (Carreras and Màrquez, 2003), which consists of a portion of the PropBank corpus, allowing us to compare the performance of our approach with other systems.</Paragraph>
    <Paragraph position="2"> Previous approaches to the SRL task have made use of a full syntactic parse of the sentence in order to define argument boundaries and to determine the role labels (Gildea and Palmer, 2002; Chen and Rambow, 2003; Gildea and Hockenmaier, 2003; Pradhan et al., 2003; Pradhan et al., 2004; Surdeanu et al., 2003). In this work, following the CoNLL-2004 shared task definition, we assume that the SRL system takes as input only partial syntactic information, and no external lexico-semantic knowledge bases. Specifically, we assume as input resources a part-of-speech tagger, a shallow parser that can process the input to the level of base chunks and clauses (Tjong Kim Sang and Buchholz, 2000; Tjong Kim Sang and Déjean, 2001), and a named-entity recognizer (Tjong Kim Sang and De Meulder, 2003). We do not assume a full parse as input.</Paragraph>
    <Paragraph position="3"> SRL is a difficult task, and one cannot expect high levels of performance from either purely manual classifiers or purely learned classifiers. Rather, supplemental linguistic information must be used to support and correct a learning system. So far, machine learning approaches to SRL have incorporated linguistic information only implicitly, via the classifiers' features. The key innovation in our approach is the development of a principled method to combine machine learning techniques with linguistic and structural constraints by explicitly incorporating inference into the decision process.</Paragraph>
    <Paragraph position="4"> In the machine learning part, the system we present here is composed of two phases. First, a set of argument candidates is produced using two learned classifiers--one to discover beginning positions and one to discover end positions of each argument type. Hopefully, this phase discovers a small superset of all arguments in the sentence (for each verb). In a second learning phase, the candidate arguments from the first phase are re-scored using a classifier designed to determine argument type, given a candidate argument.</Paragraph>
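As an illustration of the first phase, the following minimal sketch pairs likely start positions with likely end positions to form candidate spans. The function name `candidate_spans`, the per-token probability lists, and the 0.5 threshold are assumptions made for this example only. The classifiers described above operate per argument type, which is collapsed into a single score per token here.

```python
def candidate_spans(start_scores, end_scores, threshold=0.5):
    """Pair each likely start token with each likely end token at or after it.

    start_scores[i] and end_scores[i] are hypothetical classifier outputs for
    token i; spans are returned as inclusive (start, end) index pairs.
    """
    starts = [i for i, p in enumerate(start_scores) if p >= threshold]
    ends = [j for j, p in enumerate(end_scores) if p >= threshold]
    return [(i, j) for i in starts for j in ends if j >= i]


# Toy scores for a four-token sentence (made-up numbers).
spans = candidate_spans([0.9, 0.1, 0.7, 0.2], [0.2, 0.8, 0.1, 0.6])
print(spans)  # [(0, 1), (0, 3), (2, 3)]
```

The result is deliberately a superset: the spans (0, 1) and (0, 3) overlap, and it is left to the later stages to decide which of them, if any, is kept.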
    <Paragraph position="5"> Unfortunately, it is difficult to incorporate global properties of the sentence into the learning phases.</Paragraph>
    <Paragraph position="6"> However, at the inference level it is possible to incorporate the fact that the set of possible role labelings is restricted by both structural and linguistic constraints--for example, arguments cannot structurally overlap, or, given a predicate, some argument structures are illegal. The overall decision problem must produce an outcome that is consistent with these constraints. We encode the constraints as linear inequalities, and use integer linear programming (ILP) as an inference procedure to make a final decision that is both consistent with the constraints and most likely according to the learning system. Although ILP is in general a computationally hard problem, there are efficient implementations that can handle thousands of variables and constraints. In our experiments, we used the commercial ILP package (Xpress-MP, 2003), and were able to process roughly twenty sentences per second.</Paragraph>
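To make the inference step concrete, here is a small sketch of a 0-1 program of this kind. It is not the system's actual formulation: the label set, the scores, the specific constraints, and the names `build_constraints` and `solve` are all assumptions chosen for illustration, and exhaustive enumeration stands in for a real ILP solver such as Xpress-MP, which is only practical for a handful of variables. Each variable x[(span, label)] is 1 when the span receives that label, and every constraint is a linear inequality of the form "weighted sum of variables is at least rhs".

```python
from itertools import product

LABELS = ["A0", "A1", "NULL"]  # hypothetical label set; NULL means "not an argument"


def overlaps(a, b):
    """True when two inclusive token spans share at least one token."""
    return min(a[1], b[1]) >= max(a[0], b[0])


def build_constraints(spans):
    """Return rows (coeffs, rhs), each meaning sum(coeffs[v] * x[v]) >= rhs."""
    rows = []
    for s in spans:
        # Exactly one label per candidate, written as two inequalities.
        rows.append(({(s, t): 1 for t in LABELS}, 1))
        rows.append(({(s, t): -1 for t in LABELS}, -1))
    for i, a in enumerate(spans):
        for b in spans[i + 1:]:
            if overlaps(a, b):
                # Overlapping candidates: at least one of them must be NULL.
                rows.append(({(a, "NULL"): 1, (b, "NULL"): 1}, 1))
    for t in ("A0", "A1"):
        # Each core role is assigned at most once.
        rows.append(({(s, t): -1 for s in spans}, -1))
    return rows


def solve(scores):
    """Maximize the total score over all 0-1 vectors that satisfy every row."""
    spans = sorted(scores)
    variables = [(s, t) for s in spans for t in LABELS]
    rows = build_constraints(spans)
    best, best_value = None, float("-inf")
    for bits in product((0, 1), repeat=len(variables)):
        x = dict(zip(variables, bits))
        feasible = all(
            sum(c * x[v] for v, c in coeffs.items()) >= rhs for coeffs, rhs in rows
        )
        if feasible:
            value = sum(scores[s][t] * x[(s, t)] for s, t in variables)
            if value > best_value:
                best, best_value = x, value
    return {s: t for (s, t), bit in best.items() if bit}


# Made-up classifier scores for three candidate spans.
example_scores = {
    (0, 1): {"A0": 0.6, "A1": 0.3, "NULL": 0.1},
    (0, 3): {"A0": 0.5, "A1": 0.2, "NULL": 0.3},
    (5, 6): {"A0": 0.5, "A1": 0.4, "NULL": 0.1},
}
labeling = solve(example_scores)
print(labeling)  # {(0, 1): 'A0', (0, 3): 'NULL', (5, 6): 'A1'}
```

Labeling each span independently would assign A0 to all three candidates, which violates both the overlap constraint and the at-most-one-A0 constraint. The constrained optimum instead drops the overlapping span (0, 3) and moves (5, 6) to its second-best label.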
  </Section>
</Paper>