<?xml version="1.0" standalone="yes"?>
<Paper uid="E99-1014">
  <Title>Full Text Parsing using Cascades of Rules: an Information Extraction Perspective</Title>
  <Section position="3" start_page="102" end_page="102" type="intro">
    <SectionTitle>
2 Representation and Rules
</SectionTitle>
    <Paragraph position="0"> Every lexical element a in the input sentence w is abstractly represented by means of elementary objects, called tokens. A token T is associated with three structures: * \[T\]dep is a dependency tree for a, i.e. a tree representing syntactic dependencies between a and other lexical elements (its dependees) in w.</Paragraph>
    <Paragraph position="1"> * \[T\]leat is a feature structure representing syntactic and semantic information needed to combine a with other elements in the input.</Paragraph>
    <Paragraph position="2"> * \[T\]zy is a Quasi Logical Form (QLF) providing a semantic interpretation for the combination of a with its dependees.</Paragraph>
    <Paragraph position="3"> Rules operate on tokens, therefore they can access all the three structures above. Rules incrementally build and update the above structures. Lexical, syntactic and semantic constraints can then be used in rules at any level. The whole IE approach can be based on the same formalism and rule types, as both lexical, syntactic and semantic information can be processed uniformly.</Paragraph>
    <Paragraph position="4"> The general form of a rule is a triple (Ta~, FT, FA&gt;, where * 7c~d is a non-empty string of tokens, called the rule pattern; cr is called the rule core and is non-empty, 7, fi are called the rule context and may be empty; * FT is a set of boolean predicates, called rule test, defined over tokens in the rule pattern; * FA is a set of elementary operations, called rule action, defined over tokens in the sole rule core.</Paragraph>
    <Paragraph position="5"> The postfix, unary operators &amp;quot;,&amp;quot; (Kleene star) and &amp;quot;?&amp;quot; (optionality operator) can be used in the rule patterns.</Paragraph>
    <Paragraph position="6"> A basic data structure, called token chart, is processed and dynamically maintained. This is a directed graph whose vertices are tokens and whose arcs represent binary relations from some (finite) basic set. Initially, the token chart is a chain-like graph with tokens ordered as the corresponding lexical elements in w, i.e. arcs initially represent lexical adjacency between tokens. During the processing, arcs might be rewritten so that the token chart becomes a more general kind of graph.</Paragraph>
    <Paragraph position="7"> For a rule to apply, a path cr must be found in the token chart that, when viewed as a string of tokens, satisfies the two following conditions: (i) ~ is matched by 7a~; and (ii) all the boolean predicates in FT hold when evaluated on c~.</Paragraph>
    <Paragraph position="8"> When a rule applies, the elementary operations in FA are executed on the tokens of C/ matching the core of the rule. The effect of action execution is that \[T\]dep, IT\]lear and \[Tit/are updated for the appropriate matching tokens.</Paragraph>
    <Paragraph position="9"> Rules are grouped into cascades that are finite, ordered sequences of rules. Cascades represent elementary logical units, in the sense that all rules in a cascade deal with some specific construction (e.g., subcategorization of verbs). From a functional point of view, a cascade is composed of three segments: * sl contains rules that deal with idiosyncratic cases for the construction at hand; * s2 contains rules dealing with the regular cases; * s3 contains default rules that fire only when no other rule can be successfully applied.</Paragraph>
  </Section>
class="xml-element"></Paper>