<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-1052">
  <Title>Investigating a Generic Paraphrase-based Approach for Relation Extraction</Title>
  <Section position="3" start_page="409" end_page="410" type="intro">
    <SectionTitle>
2 Background
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="409" end_page="409" type="sub_section">
      <SectionTitle>
2.1 Unsupervised Information Extraction
</SectionTitle>
      <Paragraph position="0"> Information Extraction (IE) and its subfield Relation Extraction (RE) are traditionally performed in a supervised manner, identifying the different ways to express a specific piece of information or relation.</Paragraph>
      <Paragraph position="1"> Given that annotated data is expensive to produce, unsupervised or weakly supervised methods have been proposed for IE and RE.</Paragraph>
      <Paragraph position="2"> Yangarber et al. (2000) and Stevenson and Greenwood (2005) define methods for automatic acquisition of predicate-argument structures that are similar to a set of seed relations, which represent a specific scenario. The approach of Yangarber et al. (2000) was evaluated in two ways: (1) manually mapping the discovered patterns into an IE system and running a full MUC-style evaluation; (2) using the learned patterns to perform document filtering at the scenario level. Stevenson and Greenwood (2005) evaluated their method through document and sentence filtering at the scenario level.</Paragraph>
      <Paragraph position="3"> Sudo et al. (2003) extract dependency subtrees within relevant documents as IE patterns. The goal of the algorithm is event extraction, though performance is measured by counting argument entities rather than counting events directly.</Paragraph>
      <Paragraph position="4"> Hasegawa et al. (2004) perform unsupervised hierarchical clustering over a simple set of features. The algorithm does not extract entity pairs for a given relation from a set of documents but rather classifies all relations in a large corpus. This approach is more similar to text mining tasks than to classic IE problems.</Paragraph>
      <Paragraph position="5"> To conclude, several unsupervised approaches learn relevant IE templates for a complete scenario, but without identifying their relevance to each specific relation within the scenario. Accordingly, the evaluations of these works either did not address the direct applicability for RE or evaluated it only after further manual postprocessing.</Paragraph>
    </Section>
    <Section position="2" start_page="409" end_page="410" type="sub_section">
      <SectionTitle>
2.2 Paraphrases and Entailment Rules
</SectionTitle>
      <Paragraph position="0"> A generic model for language variability is using paraphrases, text expressions that roughly convey the same meaning. Various methods for automatic paraphrase acquisition have been suggested recently, ranging from finding equivalent lexical elements to learning rather complex paraphrases at the sentence level.</Paragraph>
      <Paragraph position="1"> More relevant for RE are &quot;atomic&quot; paraphrases between templates, text fragments containing variables, e.g. 'X buy Y ↔ X purchase Y'. Under a syntactic representation, a template is a parsed text fragment, e.g. 'X ←subj interact mod→ with pcomp-n→ Y' (based on the syntactic dependency relations of the Minipar parser). The parses include part-of-speech tags, which we omit for clarity.</Paragraph>
      <Paragraph position="2"> Dagan and Glickman (2004) suggested that a somewhat more general notion than paraphrasing is that of entailment relations. These are directional relations between two templates, where the meaning of one can be entailed from the meaning of the other, e.g. 'X bind to Y → X interact with Y'.</Paragraph>
      <Paragraph position="3"> For RE, when searching for a target relation, it is sufficient to identify an entailing template since it implies that the target relation holds as well. Under this notion, paraphrases are bidirectional entailment relations.</Paragraph>
      <Paragraph position="4"> Several methods extract atomic paraphrases by exhaustively processing local corpora (Lin and Pantel, 2001; Shinyama et al., 2002). Learning from a local corpus is bounded by the corpus scope, which is usually domain specific (both works above processed news domain corpora). To cover a broader range of domains several works utilized the Web, while requiring several manually provided examples for each input relation, e.g. (Ravichandran and Hovy, 2002). Taking a step further, the TEASE algorithm (Szpektor et al., 2004) provides a completely unsupervised method for acquiring entailment relations from the Web for a given input relation (see Section 5.1).</Paragraph>
      <Paragraph position="5"> Most of these works did not evaluate their results in terms of application coverage. Lin and Pantel (2001) compared their results to human-generated paraphrases. Shinyama et al. (2002) measured the coverage of their learning algorithm relative to the paraphrases present in a given corpus. Szpektor et al. (2004) measured &quot;yield&quot;, the number of correct rules learned for an input relation. Ravichandran and Hovy (2002) evaluated the performance of a QA system that is based solely on paraphrases, an approach resembling ours. However, they measured performance using Mean Reciprocal Rank, which does not reveal the actual coverage of the learned paraphrases.</Paragraph>
    </Section>
  </Section>
</Paper>