<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0913">
  <Title>Text Understanding with GETARUNS for Q/A and Summarization</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> GETARUNS, the system for text understanding developed at the University of Venice, is equipped with three main modules: a lower module for parsing, where sentence strategies are implemented; a middle module for semantic interpretation and discourse model construction, which is cast in Situation Semantics; and a higher module where reasoning and generation take place (Delmonte &amp; Bianchi, 2002).</Paragraph>
    <Paragraph position="1"> The system is based on the LFG theoretical framework (Bresnan, 2001) and has a highly interconnected modular structure. It is a top-down, depth-first, DCG-based parser written in Prolog, which enforces a strongly deterministic policy by means of a lookahead mechanism together with a Well-Formed Substring Table (WFST) that helps recovery when failure is unavoidable because of strong attachment ambiguity.</Paragraph>
    <Paragraph position="2"> The parser is divided into a pipeline of sequential but independent modules which realize the subdivision of the parsing scheme proposed in LFG theory, where a c-structure is built before the f-structure can be projected from it by unification into a DAG. In this sense we try to apply phrase-structure rules in the order in which they appear in the grammar: whenever a syntactic constituent is successfully built, it is checked for semantic consistency, both internally, for head-spec agreement, and externally, in case a non-substantial head like a preposition dominates the lower NP constituent. Other important local semantic consistency checks are performed with modifiers such as attributive and predicative adjuncts. In case the governing predicate expects obligatory arguments to be lexically realized, these are searched for and checked for uniqueness and coherence, as LFG grammaticality principles require (Delmonte, 2002). In other words, syntactic and semantic information is accessed and used as soon as possible: in particular, both the categorial and the subcategorization information attached to predicates in the lexicon is extracted as soon as the main predicate is processed, be it an adjective, a noun or a verb, and is then used to restrict the number of possible structures to be built.</Paragraph>
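    <Paragraph> The check-then-project sequence described above can be sketched in a few lines. This is an illustrative Python sketch only; the constituent representation and the function names are our own invention, not the system's actual Prolog predicates.

```python
# Toy sketch of the pipeline: a c-structure constituent is checked for
# internal semantic consistency (head-spec agreement) before the
# f-structure is projected from it. All names are hypothetical.

def head_spec_agree(head, spec):
    """Internal consistency check: the specifier must agree with the head."""
    return spec is None or spec["num"] == head["num"]

def build_c_structure(tokens):
    """Toy c-structure: an NP with an optional determiner specifier."""
    spec = {"cat": "det", "num": "sg"} if tokens[0] == "a" else None
    head = {"cat": "n", "num": "sg"}
    return {"label": "NP", "spec": spec, "head": head}

def project_f_structure(c_structure):
    """Project an f-structure only from a consistent constituent."""
    if not head_spec_agree(c_structure["head"], c_structure["spec"]):
        return None  # agreement failure blocks the projection
    return {"PRED": "noun", "NUM": c_structure["head"]["num"]}

f = project_f_structure(build_c_structure(["a", "dog"]))
print(f)  # {'PRED': 'noun', 'NUM': 'sg'}
```

A real LFG projection would of course unify full attribute-value matrices into a DAG; the point here is only the ordering of the two steps.</Paragraph>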
    <Paragraph position="3"> Adjuncts are computed by semantic cross-compatibility tests based on the selectional restrictions of main predicates and adjunct heads. As far as parsing is concerned, we hold the view that the implementation of a sound parsing algorithm must go hand in hand with sound grammar construction: extragrammaticalities are better coped with within a solid linguistic framework than without one. Our parser is a rule-based deterministic parser in the sense that it uses a lookahead mechanism and a Well-Formed Substring Table to reduce backtracking. It also employs Finite State Automata for the task of tag disambiguation, and builds multiwords whenever lexical information allows it. The parser uses a number of parsing strategies and graceful-recovery procedures which follow a strictly parameterized approach in both their definition and their implementation.</Paragraph>
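    <Paragraph> The cross-compatibility test for adjuncts can be illustrated as follows. This is a minimal Python sketch; the predicates, adjunct classes and table contents are invented for illustration, not drawn from the system's actual resources.

```python
# An adjunct head is accepted only if its semantic class is among the
# classes the governing predicate selects for. All entries are invented.

SELECTIONAL_RESTRICTIONS = {
    "run": {"manner", "time", "location"},   # "run quickly", "run yesterday"
    "believe": {"manner", "time"},           # but not *"believe in the park"
}

ADJUNCT_CLASS = {
    "quickly": "manner",
    "yesterday": "time",
    "in_the_park": "location",
}

def adjunct_compatible(predicate, adjunct):
    """Semantic cross-compatibility test for a candidate adjunct."""
    return ADJUNCT_CLASS[adjunct] in SELECTIONAL_RESTRICTIONS[predicate]

print(adjunct_compatible("run", "in_the_park"))      # True
print(adjunct_compatible("believe", "in_the_park"))  # False
```
</Paragraph>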
    <Paragraph position="4"> Recovery procedures are also used to cope with elliptical structures and with uncommon orthographic and punctuation patterns. A shallow or partial parser, in the sense of (Abney, 1996), is also implemented and is always activated before the complete parse takes place, in order to produce the default baseline output to be used by further computation in case of total failure. In that case partial semantic mapping takes place: no Logical Form is built, and only referring expressions are asserted in the Discourse Model (but see below).</Paragraph>
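    <Paragraph> The fallback regime just described can be sketched as follows (hypothetical Python; the function names and the failure condition are invented for illustration):

```python
# The shallow parse always runs first and provides the baseline output;
# if the complete parse fails totally, computation proceeds on the
# shallow result alone.

def shallow_parse(sentence):
    """Always succeeds: chunk-level baseline in the sense of Abney (1996)."""
    return {"level": "partial", "chunks": sentence.split()}

def complete_parse(sentence):
    """May fail; failure on an elliptical fragment is simulated here."""
    if not sentence.endswith("."):
        raise ValueError("unrecoverable parse failure")
    return {"level": "complete", "tree": sentence}

def parse(sentence):
    baseline = shallow_parse(sentence)   # default output, always computed
    try:
        return complete_parse(sentence)
    except ValueError:
        return baseline                  # total failure: fall back

print(parse("The system parses this sentence.")["level"])  # complete
print(parse("And then")["level"])                          # partial
```
</Paragraph>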
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.2 The Binding Module
</SectionTitle>
      <Paragraph position="0"> The output of the grammatical modules is then fed into the Binding Module (BM), which runs an algorithm for anaphoric binding in LFG terms, using f-structures as domains and grammatical functions as entry points into the structure.</Paragraph>
      <Paragraph position="1"> Pronominals are internally decomposed into a feature matrix which is made visible to the Binding Algorithm (BA) and allows for the activation of different search strategies over f-structure domains. Antecedents for pronouns are ranked according to grammatical function, semantic role, inherent features and their position in the f-structure. Special devices are required for empty pronouns contained in a subordinate clause whose context is ambiguous, i.e. where two possible antecedents are available in the main clause. Split antecedents also trigger special search strategies in order to evaluate the set of possible antecedents in the appropriate f-structure domain. Eventually, this information is added to the original f-structure graph and passed on to the Discourse Module (DM). We show the architecture of the parser below.</Paragraph>
      <Paragraph position="2"> Fig.1 GETARUNS' LFG-Based Parser</Paragraph>
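      <Paragraph> The ranking step of the Binding Algorithm can be sketched as follows. This is an illustrative Python sketch; the candidate representation and the numeric ranking are our own simplification, covering only grammatical function and two inherent features, not the system's full criteria.

```python
# Feature-compatible candidates are ranked by grammatical function,
# subjects first. Weights and entries are invented for illustration.

GF_RANK = {"subject": 0, "object": 1, "oblique": 2, "adjunct": 3}

def rank_antecedents(pronoun, candidates):
    """Keep feature-compatible candidates, best grammatical function first."""
    compatible = [
        c for c in candidates
        if c["num"] == pronoun["num"] and c["gen"] == pronoun["gen"]
    ]
    return sorted(compatible, key=lambda c: GF_RANK[c["gf"]])

she = {"num": "sg", "gen": "fem"}
candidates = [
    {"pred": "report",  "gf": "object",  "num": "sg", "gen": "neut"},
    {"pred": "Mary",    "gf": "subject", "num": "sg", "gen": "fem"},
    {"pred": "sisters", "gf": "oblique", "num": "pl", "gen": "fem"},
]
print(rank_antecedents(she, candidates)[0]["pred"])  # Mary
```
</Paragraph>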
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.3 Lexical Information
</SectionTitle>
      <Paragraph position="0"> The grammar is equipped with a lexicon containing a list of fully specified inflected word forms, where each entry is followed by its lemma and a list of morphological features organized as attribute-value pairs. Morphological analysis for English has also been implemented and is used for out-of-vocabulary (OOV) words. The system uses a core, fully specified lexicon containing approximately the 10,000 most frequent entries of English. In addition, there are all the lexical forms provided by a fully revised version of COMLEX. In order to account for phrasal and adverbial verbal compound forms, we also use the lexical entries made available by UPenn and the TAG encoding. Their verbal syntactic codes have been adapted to our formalism and are used to generate an approximate subcategorization scheme with an approximate aspectual and semantic class associated with it. Semantic inherent features for OOV words, be they nouns, verbs, adjectives or adverbs, are provided by a fully revised version of WordNet (270,000 lexical entries), in which we used 75 semantic classes similar to those provided by CoreLex.</Paragraph>
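      <Paragraph> Lexical lookup with the morphological fallback for OOV words can be sketched as follows (a toy Python sketch; the entries and the suffix-stripping rules are invented and far cruder than the actual analyser):

```python
# Fully specified inflected forms are looked up first; out-of-vocabulary
# words fall back to a rough morphological analyser.

LEXICON = {
    "runs": {"lemma": "run", "cat": "v", "num": "sg", "pers": 3},
    "dog":  {"lemma": "dog", "cat": "n", "num": "sg"},
}

def morph_analyse(form):
    """Very rough OOV fallback: peel off -ing / -ed / -s."""
    for suffix, feats in (("ing", {"vform": "prog"}),
                          ("ed", {"tense": "past"}),
                          ("s", {"num": "sg", "pers": 3})):
        if form.endswith(suffix) and len(form) > len(suffix) + 1:
            return {"lemma": form[: -len(suffix)], "cat": "v", **feats}
    return {"lemma": form, "cat": "unknown"}

def lookup(form):
    return LEXICON.get(form) or morph_analyse(form)

print(lookup("runs")["lemma"])    # run   (core lexicon)
print(lookup("jumped")["lemma"])  # jump  (morphological fallback)
```
</Paragraph>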
      <Paragraph position="1"> Our training corpus is made up of 200,000 words and is organized as a number of texts taken from different genres: portions of the UPenn WSJ corpus, test suites for grammatical relations, and sentences taken from the COMLEX manual.</Paragraph>
      <Paragraph position="2"> To test the parser's performance we used the &amp;quot;Greval Corpus&amp;quot; made available by John Carroll and Ted Briscoe, which allows us to measure precision and recall against the data published in (Preis, 2003). We obtain a 90% F-measure, which is by far the best result reported on that corpus; other systems range around 75%.</Paragraph>
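      <Paragraph> For reference, the F-measure reported here is the harmonic mean of precision and recall; the counts below are invented solely to show the computation.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# e.g. 90 correct grammatical relations out of 100 proposed and 100 gold
precision = 90 / 100
recall = 90 / 100
print(round(f_measure(precision, recall), 3))  # 0.9
```
</Paragraph>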
      <Paragraph position="3"> Overall, almost the whole text (98%) is turned into semantically consistent structures which have already undergone pronominal binding at sentence level in their DAG structural representation. The basic difference between the complete and the partial parser is the ability of the former to ensure propositional-level semantic consistency in almost every parse, which is not the case with the latter.</Paragraph>
    </Section>
  </Section>
</Paper>