<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2005">
  <Title>EVALUATING GETARUNS PARSER WITH GREVAL TEST SUITE</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> In this paper we present the parser used by the GETARUNS system and discuss its performance on the GREVAL test suite set up by Carroll &amp; Briscoe. We also discuss the mapping algorithm from LFG to Dependency Grammatical Relations (DGRs), which we were obliged to develop in order to evaluate our parser. GREVAL is a benchmark for parser evaluation based on Grammatical Relations in a Head Dependency Structure style output, i.e. a word-based, flat Head-Dependent representation enriched with Grammatical Relation information, where each relation is represented as follows,</Paragraph>
    <Paragraph position="2"> where a deep relation is introduced basically for passive constructions, dative shift, and potentially other structures, according to the "Movement" approach invoked by Chomskyans. The annotation adopted by the authors is a surface-level GR approach where, for instance, in cases of Locative Inversion such as sentence 284, (1) Here, in the old days - when they had come to see the moon or displays of fireworks - sat the king and his court while priests, soldiers, and other members of the party lounged in the smaller alcoves between.</Paragraph>
    <Paragraph position="3"> the SUBJect NP "the king and his court" is assigned to DOBJ and then receives an additional deep-relation label, NCSUBJ, to indicate its original deep structure position. However, the same relation is "wrongly" marked as NCSUBJ in a subsequent case of Locative Inversion (the only other one, sentence 445), which we report below, (2) In his stead is a milquetoast version known as the corporation.</Paragraph>
    <Paragraph position="4"> where the inverted subject NP "a milquetoast version" is annotated straightforwardly as NCSUBJ. The inconsistency revealed by this case of double annotation extends to other types of ambiguous GRs, which we comment on below.</Paragraph>
    <Paragraph position="5"> In our experiment, for reasons already explained in (Crouch et al., 2002) and further commented on below, we restricted our mapping algorithm to "predonly" f-structures, i.e. only to semantic heads with primary GRs. This is because we assume that the most difficult task a parser faces when parsing a sentence is drawing the argument-adjunct distinction; thus the subset of GRs we take into account is the following:</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
CCOMP
</SectionTitle>
    <Paragraph position="0"> Difficulties in building a comparable version of our output include the inconsistencies present in the non-typographical text distributed for the test. We also had to give up using the evaluation tool provided with the test suite, because our system builds multiword expressions, which are almost totally absent from the annotated Gold version. Even though the authors acknowledge the need to improve this aspect, the lack of multiwords makes it impossible for real systems to use the automatic evaluation tools.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.1 Multiwords
</SectionTitle>
      <Paragraph position="0"> Multiwords range from obvious cases such as United States, which is wrongly treated in the annotated corpus as two separate words - "state" modified by "united" - to prepositional and adverbial locutions, some of which have been identified but left with SGML markup, as shown here,  e.&lt;blank&gt;g.</Paragraph>
      <Paragraph position="1"> In addition to these 15 multiwords, we produced over 220 nouns, adverbials and adjectives, which contributed significantly to disambiguating both syntactic and semantic processing and to facilitating tagging.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.2 Mapping from LFG to DGRs
</SectionTitle>
      <Paragraph position="0"> However, the most difficult part of the work was the mapping itself. The mapping from LFG to DGRs requires setting up a principled distinction between GRs.</Paragraph>
      <Paragraph position="1"> The main difficulties were due to the treatment of OBLiques vs. PP adjuncts: LFG erases the preposition of oblique arguments, which is then no longer available for comparison; by contrast, the preposition is preserved when it is semantically needed to identify the semantic role associated with a PP adjunct. In DGRs we have instead two options, reported below and taken from the online Readme document: ncmod(type, head, dependent) is the most common relation type and is used to encode PP, adjectival/adverbial modification, nominal modifiers, and so on. ncmod is also used to indicate a particle 'modifier' in a verb particle combination, e.g. Bill looked the word up is encoded as ncmod(prt, look, up); of in consist of is treated as a particle.</Paragraph>
      <Paragraph position="2"> iobj(type, head, dependent) is the relation between a predicate and a non-clausal complement introduced by a preposition, e.g. iobj(in, arrive, Spain) for "arrive in Spain", iobj(into, put, box) for "put the tools into the box", and iobj(to, give, poor) for "give to the poor". These definitions do not match the actual decisions taken in the annotation process, because only adjective and verb predicates are assigned complement labels like nsubj, dobj and iobj. Nouns are only assigned generic modifiers, apart from 9 cases of "wrongly" labeled IOBJ complements, which we comment on in more detail below. In our representation, by contrast, possessor relations and other agent-like relations in noun modification are computed as SUBJ. The remaining "of"-headed PPs are all computed as OBJ when the noun is a transitive deverbal predicate; other prepositions introduce OBLiques and ADJuncts, again according to the governing noun.</Paragraph>
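      <Paragraph> To make the three-slot notation above concrete, the following is a minimal sketch of how such relations could be read into a uniform record. The class GR, the function parse_gr and the decision to treat an underscore as an empty slot are assumptions made for illustration; they are not part of the GREVAL distribution or of our parser.

```python
import re
from typing import NamedTuple, Optional


class GR(NamedTuple):
    """One grammatical relation written as name(type, head, dependent)."""
    name: str
    type: Optional[str]
    head: str
    dependent: str


# Matches strings such as "iobj(in, arrive, Spain)".
_GR_PATTERN = re.compile(r"^\s*(\w+)\(([^()]*)\)\s*$")


def parse_gr(text: str) -> GR:
    """Parse a three-slot relation; raise ValueError on any other shape."""
    match = _GR_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a relation: {text!r}")
    name = match.group(1).lower()
    slots = [slot.strip() for slot in match.group(2).split(",")]
    if len(slots) != 3:
        raise ValueError(f"expected three slots in {text!r}")
    gr_type, head, dependent = slots
    # Assumption: an empty slot or "_" means that no type is specified.
    return GR(name, None if gr_type in ("", "_") else gr_type, head, dependent)


examples = [parse_gr(s) for s in (
    "iobj(in, arrive, Spain)",
    "iobj(into, put, box)",
    "ncmod(prt, look, up)",
)]
```

Keeping the preposition in the type slot, as in the Readme examples, is what allows an iobj and an ncmod over the same head and dependent to be compared later on.</Paragraph>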
      <Paragraph position="3"> As to verb and adjective predicates, in order to be able to sort out NCMODs from IOBJs, all subcategorization relations must be suitably encoded, whether obligatory or optional, as can be seen from the examples of IOBJs listed above and taken from the accompanying online document.</Paragraph>
      <Paragraph position="4"> In particular, ARRIVE would be computed as taking a Locative PP complement on a par with PUT, whereas only the latter requires the complement obligatorily. As to the third example, GIVE, a ditransitive verb allowing Dative Shift, its indirect object would be computed as a case of IOBJ, thus collapsing an important linguistic distinction between OBLiques and Indirect Objects.</Paragraph>
      <Paragraph position="5"> However, given the generic criteria followed by the annotators, we still wanted to verify the consistency of annotation of IOBJs, so we checked against COMLEX subcategorization frames whether each relation would be predicted or not. When collecting the IOBJs of the corpus we soon discovered that 9 IOBJs constitute cases of non-verbal complementation, which we report below, (iobj to akin future) ;;; non-verbal complementation (iobj to adherence principle)</Paragraph>
      <Paragraph position="7"> then, there is one dubious case of IOBJ: in sentence 316 the elided governing adjective predicate "atune" is done away with and the IOBJ relation is assigned to the verb BE, (3) Indeed, the old Jeffersonians were far more atune to the Hamilton oriented Whigs than they were to the Jacksonian Democrats.</Paragraph>
      <Paragraph position="9"> Searching COMLEX for the remaining 127 predicates governing IOBJ relations, we found 20 predicates lacking the preposition needed to discriminate the complement. In more than one case the choice of complement (iobj) vs. adjunct (ncmod) is highly disputable. This situation makes the comparison and evaluation of IOBJs very uncertain and bound to yield low scores, as happened with the parsers included in the test reported in (Preis, 2003). As an additional remark, from the definition it would appear that OBLiques are treated as the DOBJ of a preposition particle which is in turn itself treated as ncmod. To clarify the issue we partially report the annotation of example 244 from the corpus, where we italicize the relevant relations, (4) Meanwhile, the experts speak of wars triggered by false pre-emption, escalation, unauthorized behavior and other terms that will be discussed in this report.</Paragraph>
      <Paragraph position="10"> ncsubj(speak, expert, _), dobj(speak, war, _), ncsubj(trigger, war, obj), arg_mod(by, trigger, pre-emption, subj), arg_mod(by, trigger, escalation, subj), arg_mod(by, trigger, behaviour, subj), arg_mod(by, trigger, term, subj), ncsubj(discuss, term, obj), ncmod(prt, speak, of). So we decided that in our evaluation we treat as IOBJs those actually produced by our parser and matched directly with the Gold corpus, those which appear in the Gold corpus as a DOBJ governed by a preposition NCMOD, and also those that have been computed as NCMODs directly, as long as the head and the dependent are identical. We then identified a number of mismatches in the Gold annotation that would not receive a suitable mapping in our output, which Ted Briscoe (p.c.) then interpreted as mistakes: in particular, cases of secondary predication for the class of ECM verbs (consider, believe, term, etc.), which were treated as OBJ2, as well as the relation of DOBJ associated with complements of the verb HAVE, which we compute as a copulative verb.</Paragraph>
      <Paragraph position="11"> (5) Sunday he had added, We can love Eisenhower the man, even if we considered him a mediocre president... but there is nothing left of the Republican Party without his leadership.</Paragraph>
      <Paragraph position="12">  ncsubj(add, he, _), ncsubj(love, we, _), dobj(love, Eisenhower, _), ncsubj(consider, we, _), dobj(consider, he, _), obj2(consider, president, _).  For this reason we did not include data for OBJ2 in our comparison, since the appropriate mapping would have required too many changes in our parser architecture. As to clause-level GRs, we computed both open and closed sentential (CCOMP) and small clause (XCOMP) complements, but we did not compute open adjuncts - XMODS - which again did not seem to be easily comparable in our mapping. As a matter of fact, these clausal complements have not been separated in the published test and only figure as CLAUSAL. For the sake of comparison we had to erase all the subject relations that our LFG-based representation makes available for open predicative complements, since they are not represented in the manually annotated Gold GREVAL corpus.</Paragraph>
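      <Paragraph> The relaxed matching criterion adopted for IOBJs in this section can be sketched as follows. This is only an illustration under stated assumptions: Gold relations are normalized to hypothetical (name, type, head, dependent) tuples, a system IOBJ is a (preposition, head, dependent) triple, and the function name iobj_matches is ours; it is not the evaluation script actually used.

```python
def iobj_matches(system_iobj, gold):
    """Return True if a system IOBJ is accepted against the Gold relations.

    system_iobj: (preposition, head, dependent)
    gold: set of (name, type, head, dependent) tuples (assumed normalization)
    """
    prep, head, dep = system_iobj
    for name, _gr_type, g_head, g_dep in gold:
        # Head and dependent must be identical in every accepted case.
        if (g_head, g_dep) != (head, dep):
            continue
        # Case 1: the Gold corpus has an IOBJ; case 2: it has an NCMOD directly.
        if name in ("iobj", "ncmod"):
            return True
        # Case 3: a Gold DOBJ whose verb also carries the preposition as a
        # particle NCMOD, as in "speak of wars".
        if name == "dobj" and ("ncmod", "prt", head, prep) in gold:
            return True
    return False


# Relevant Gold relations of example (4), normalized by hand.
gold_244 = {
    ("ncsubj", None, "speak", "expert"),
    ("dobj", None, "speak", "war"),
    ("ncmod", "prt", "speak", "of"),
}
accepted = iobj_matches(("of", "speak", "war"), gold_244)
```

Here the call for "speak of wars" is accepted through the third case, whereas a pair that only matches an ncsubj in the Gold set would be rejected.</Paragraph>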
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2. Parsing and Robust Techniques
</SectionTitle>
    <Paragraph position="0"> As far as parsing is concerned, we hold the view that the implementation of sound parsing algorithms must go hand in hand with sound grammar construction. Extragrammaticalities can be coped with better within a solid linguistic framework than without one. Our parser is a rule-based deterministic parser in the sense that it uses a lookahead and a Well-Formed Substring Table to reduce backtracking. It also implements Finite State Automata in the task of tag disambiguation, and produces multiwords whenever lexical information allows it. In our parser we use a number of parsing strategies and graceful recovery procedures which follow a strictly parameterized approach to their definition and implementation. Recovery procedures are also used to cope with elliptical structures and uncommon orthographic and punctuation patterns. A shallow or partial parser, in the sense of (Abney, 1996), is also implemented and is always activated before the complete parse takes place, in order to produce the default baseline output to be used by further computation in case of total failure.</Paragraph>
    <Paragraph position="1"> The grammar is equipped with a lexicon containing a list of fully specified inflected word forms, where each entry is followed by its lemma and a list of morphological features organized in the form of attribute-value pairs. However, morphological analysis for English has also been implemented and is used for OOV words. The system uses a fully specified core lexicon, which contains approximately 10,000 of the most frequent entries of English. In addition, there are all the lexical forms provided by a fully revised version of COMLEX. In order to take into account phrasal and adverbial verbal compound forms, we also use lexical entries made available by UPenn and TAG encoding. Their grammatical verbal syntactic codes have been adapted to our formalism and are used to generate an approximate subcategorization scheme with an approximate aspectual and semantic class associated with it.</Paragraph>
    <Paragraph position="2"> Inherent semantic features for Out of Vocabulary words, be they nouns, verbs, adjectives or adverbs, are provided by a fully revised version of WordNet - 270,000 lexical entries - in which we used 75 semantic classes similar to those provided by CoreLex.</Paragraph>
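    <Paragraph> The lookup order described above (fully specified inflected forms first, morphological analysis only for OOV words) can be illustrated with a small sketch. The two lexicon entries, the feature names and the three suffix rules are invented for the example and are a crude stand-in for the actual morphological analyser, which is implemented differently.

```python
# Hypothetical entries: inflected form -> (lemma, attribute-value pairs).
LEXICON = {
    "parsed": ("parse", {"cat": "v", "tense": "past"}),
    "children": ("child", {"cat": "n", "num": "plur"}),
}

# Hypothetical suffix rules standing in for real morphological analysis.
SUFFIX_RULES = [
    ("ing", {"cat": "v", "form": "gerund"}),
    ("ed", {"cat": "v", "tense": "past"}),
    ("s", {"cat": "n", "num": "plur"}),
]


def lookup(word):
    """Return (lemma, features, source) for a word form."""
    key = word.lower()
    if key in LEXICON:
        lemma, features = LEXICON[key]
        return lemma, dict(features), "lexicon"
    # Out-of-vocabulary word: fall back on naive suffix stripping.
    for suffix, features in SUFFIX_RULES:
        if key.endswith(suffix) and len(key) > len(suffix) + 1:
            return key[: -len(suffix)], dict(features), "morphology"
    return key, {}, "unknown"


in_lexicon = lookup("Children")
out_of_vocabulary = lookup("walked")
```

The third element of the result records which component supplied the analysis, so that later stages could treat guessed features with more caution than listed ones.</Paragraph>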
    <Paragraph position="3"> All parser rules, from lexicon to c-structure to f-structure, amount to 7532 rules, subdivided as follows:  1. Calls to lexical entries - morphology and lexical forms: 3865 rules; 2. Syntactic and semantic rules in the parser proper: 2617 rules</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
rules
</SectionTitle>
      <Paragraph position="0"> 3. All semantic f-structure building rules: 1050 rules. The parser itself is made up of 51,000 lines of code.</Paragraph>
      <Paragraph position="1"> This does not take into account the code for the lexicon - with fully specified subcategorization frames - and the dictionary for morphological decomposition: 6600 entries for the lexicon and 76,000 entries for the dictionary. These are all consulted at runtime. Finally, the semantics from WordNet and other sources derived from the web make up three hash tables, 5 Mb overall, which sit on the hard disk and are accessed when needed.</Paragraph>
      <Paragraph position="2"> Our training corpus for the complete system is made up of 200,000 words and is organized into a number of texts taken from different genres: portions of the UPenn WSJ corpus, test suites for grammatical relations, narrative texts, and sentences taken from</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Lookahead and FSA
</SectionTitle>
      <Paragraph position="0"> One of the important differences we would like to highlight is the use of top-down lookahead-based parsing strategies. The following list of 14 preterminal symbols is used: 1. v=verb-auxiliary-modal-clitic-cliticized verb 2. n=noun - common, proper; 3. c=complementizer 4. s=subordinator; 5. e=conjunction 6. p=preposition-particle 7. a=adjective;  As has been reported in the literature (see Tapanainen and Voutilainen 1994; Brants and Samuelsson 1995), English is a language with a high level of homography: readings per word are around 2 (i.e. each word can be assigned on average two different tags, depending on the tagset). Lookahead in our system copes with most cases of ambiguity: however, we also had to introduce a disambiguating tool before the input string could be safely passed to the parser. Disambiguation is applied to the lookahead stack and is carried out by means of Finite State Automata. We use FSA simply because, for some important categories, English has unambiguous tags which can be used as anchors in the input string to reduce ambiguity. We are referring here to the class of determiners, which is used to tell apart words belonging to the ambiguity class [verb,noun], the most frequent in English. 
Besides, all FSA may be augmented by tests related to linguistic properties needed for disambiguation; some such tests are: - check subcategorization frame for current word: this is used for [n,v], [a,n,v] ambiguity classes followed by a preposition, or followed by "that"; - check for gerundive verb form: this is used to check for -ing endings of words; - check for auxiliary and modals: this is used to disambiguate [n,v], [a,n,v] ambiguity classes when preceded by an auxiliary or a modal; - check for noun belonging to factive class: this is used to disambiguate the "that" [a,c,r] ambiguity class when preceded by a governing noun; - check for verbs of saying: this is used to disambiguate verbs preceded or followed by punctuation marks.  3. GETARUNS: a Linguistically and  The parser is divided up into a pipeline of sequential but independent modules which realize the subdivision of a parsing scheme as proposed in LFG theory, where a c-structure is built before the f-structure can be projected by unification into a DAG. In this sense we try to apply phrase-structure rules in a given sequence as they are ordered in the grammar: whenever a syntactic constituent is successfully built, it is checked for semantic consistency, both internally for head-spec agreement, and externally, when a non-substantial head like a preposition dominates the lower NP constituent; other important local semantic consistency checks are performed with modifiers like attributive and predicative adjuncts. In case the governing predicate expects obligatory arguments to be lexically realized, they will be searched for and checked for uniqueness and coherence, as LFG grammaticality principles require. We assume that from a psycholinguistic point of view, parsing requires setting up a number of disambiguating strategies, basically to tell arguments apart from adjuncts and reduce the effects of backtracking. 
The use of strategies calls for psychologically related disambiguation processes which are strictly bound to linguistic parameters. For instance, English is a language that freely allows compless (complementizer-less) complement and relative clauses. Since the sentence is the highest recursive structural level, it is plausible that English speakers will adopt some strategy in order to avoid falling into a garden path - thus freezing the parser. Another peculiar feature of English regards the inherent ambiguity of Past Tense/Past Participle verb forms, except for irregular verbs, which however constitute only a small subset of the English verb lexicon, which in our case amounts to some 20,000 entries. Given that Reduced Relative Clauses are headed by the past participle verb form, and that Participial Adjuncts may be attached to any NP head noun quite consistently; and given also that it is very hard to apply strict subcategorization tests for a participial SUBJect - or deep OBJect in case of passives - with good enough confidence, we assume that such tests will only be performed when the parser is at the complement level of the SUBJect NP. The reason is that we need to prevent failures at the I_bar level as much as possible. To this end we pass grammatical function information down to the NP complement level.</Paragraph>
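      <Paragraph> The anchoring idea behind FSA-based disambiguation, together with the auxiliary/modal test listed above, can be sketched as a three-state automaton over the lookahead stack. The tag labels "det" and "aux", the state names and the function disambiguate are assumptions made for this illustration; the automata actually used cover more ambiguity classes and more tests.

```python
def disambiguate(tokens):
    """Resolve the [n,v] ambiguity class using unambiguous anchors.

    tokens: list of (word, set_of_tags) pairs, in sentence order.
    Returns a new list in which some tag sets have been narrowed.
    """
    resolved = []
    state = "start"
    for word, tags in tokens:
        tags = set(tags)
        if tags == {"n", "v"}:
            if state == "after_det":
                tags = {"n"}  # a determiner anchors a following noun
            elif state == "after_aux":
                tags = {"v"}  # an auxiliary or modal anchors a verb
        # State transitions are triggered by unambiguous anchors only.
        if tags == {"det"}:
            state = "after_det"
        elif tags == {"aux"}:
            state = "after_aux"
        else:
            state = "start"
        resolved.append((word, tags))
    return resolved


after_determiner = disambiguate([("the", {"det"}), ("run", {"n", "v"})])
after_modal = disambiguate([("will", {"aux"}), ("run", {"n", "v"})])
```

In this simplified version the automaton returns to its start state after any other word, so an ambiguous word separated from the determiner by an adjective is left untouched; a fuller automaton would keep the anchor alive across such material.</Paragraph>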
      <Paragraph position="1"> Whenever a given predicate has expectancies for a given argument to be realized, either optionally or obligatorily, this information is passed down to the recursive portion of the parsing: this operation allows us to implement parsing strategies like Minimal Attachment, Functional Preference and others (see Delmonte, 2000a; Delmonte, 2000b). As said above, English allows an empty Complementizer for finite complement and relative clauses, two structures which contribute a lot of indeterminacy to the parsing process. However, in our system this can be nicely accommodated by using linguistic information to prevent the rule from being entered by the parser. Syntactic and semantic information is accessed and used as soon as possible: in particular, both categorial and subcategorization information attached to predicates in the lexicon is extracted as soon as the main predicate is processed, be it adjective, noun or verb, and is used in association with local lookahead to restrict the number of possible structures to be built. Adjuncts are computed by semantic compatibility tests on the basis of the selectional restrictions of main predicates and adjunct heads.</Paragraph>
      <Paragraph position="2"> The grammar formalism implemented in our system is not fully compliant with the one suggested by LFG theory (Bresnan, 2001), in the sense that we do not use a specific Feature-Based Unification algorithm but a Prolog-based parsing scheme. On the other hand, the Prolog language gives full control of a declarative rule-based system, where information is clearly spelled out and passed on and out to higher/lower levels of computation. In addition, we find that top-down parsing policies are better suited to implementing the parsing strategies that are essential in order to cope with attachment ambiguities (but see below).</Paragraph>
      <Paragraph position="3"> We need to make clear here what we mean by "LFG-related grammar organization": as said above, we do not follow LFG theory strictly, in that the parser is not organized like constraint unification-based parsers, in which a context-free or context-augmented grammar produces constituents that are then passed to a unification mechanism to check for consistency, uniqueness and coherence in LFG terms, or simply to check for feature agreement and the satisfaction of subcategorization constraints. In our parser the grammar is organized by Grammatical Functions which call syntactic constituents. Each Grammatical Function call passes functional information to the constituent level, which is paramount to the lookahead mechanism and makes available syntactic constraints of a higher level than the constituent itself.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 The Organization of Grammar Rules
</SectionTitle>
      <Paragraph position="0"> The grammar is divided up into five main levels: - the complex utterance level, where choices are made for subordinate/coordinate, direct/indirect discourse markers, or as simple assertion; - the utterance level, where choices are made for detecting a question vs. assertion; - the simple assertion level, where we may have assertions with a sentential subject; a verbal structure as a gerundive as subject; a fronted OBJect NP as focalized constituent or a Locative Inversion sentence structure.</Paragraph>
      <Paragraph position="1"> In case none of the previous structures are detected the parser enters the CP level of canonical sentences where Aux-to-Comp may take place.</Paragraph>
      <Paragraph position="2"> All sentential fronted constituents are taken at the CP level. Adjuncts at this level may be of many different kinds, some of them conflicting with the role the same constituent may have at a lower level. For instance, a vocative NP may be present, fronted PP complements, as well as various types of parenthetical structures: here again, the parser must be told at which level of computation it is actually situated in the grammar. This is done again by passing down the corresponding grammatical function. When the parser leaves this level of computation it enters the canonical IP level, where the SUBJect NP must be computed, either as a lexically present nominal head or as a string passed in the Extraposition variable. Then again a number of ADJuncts may be present between SUBJect and verb, and they can be adverbials and parentheticals. When this level is left the parser enters the I_bar level, where it expects a verb in the input string. This can be a verb complex with a number of internal constituents, but the first item must definitely be a verb. In case there is none, a number of fail-soft and recovery strategies are tried to check whether the parser has taken a participial as ADJunct or as reduced relative clause in a previous parse, which is passed down for inspection. Also, in case there was no nominal element available at IP level, a number of recovery strategies are tried to check whether the parser has taken an Appositive or a Vocative in a previous parse. So all the previously parsed material is either passed down or is recorded in a WFST to be used at an upper level in case of failure at a lower level.</Paragraph>
      <Paragraph position="3"> The parser is a strictly top-down, depth-first, one-stage parser with backtracking: unlike most principle-based parsers presented in (Berwick, Abney, and Tenny, 1991), which are two-stage parsers, our parser computes its representations in one pass. This makes it psychologically more realistic. The final output of the parsing process is an f-structure which serves as input to the binding module and logical form: in other words, it constitutes the input to the semantic component to compute logical relations. In turn the binding module may add information as to pronominal elements present in the structure by assigning a controller/binder in case it is available, or else the pronominal expression will be available for discourse level anaphora resolution.</Paragraph>
      <Paragraph position="4"> What are the main advantages of performing a top-down lookahead-driven parse as compared to unification procedures applied to an LR parse table, as discussed in (Carroll, 2000)? First of all, our grammar is not a CF grammar, given that CF rules are multiplied by all the different sentence positions at which they may occur: an NP may be called by a SUBJect, an OBJect, an NCOMPlement, an APPosition, a VOCative etc., so that different properties and constraints may be associated with each NP realization.</Paragraph>
      <Paragraph position="5"> Consider now the well-known case of COMPlementizerLess complement clauses in English as represented by the following example (all examples are taken from Greval): (6) A Yale historian, writing a few years ago in The Yale Review, said We in New England have long since segregated our children.</Paragraph>
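      <Paragraph> The role of the WFST in limiting the cost of backtracking can be illustrated with a minimal sketch: a constituent built once at a given input position is recorded and reused when the parser re-enters the same position after a failure elsewhere. The class WFST, the toy NP builder and the tag labels are hypothetical and written in Python for the sake of the example; they do not reproduce the Prolog implementation.

```python
class WFST:
    """Well-Formed Substring Table keyed by (category, start position)."""

    def __init__(self):
        self.table = {}
        self.builds = 0  # how many times a builder was actually run

    def parse(self, category, start, builder):
        key = (category, start)
        if key not in self.table:
            self.builds += 1
            # Failures (None) are recorded too, so they are not retried.
            self.table[key] = builder(start)
        return self.table[key]


def make_np_builder(tags):
    """Toy NP recognizer: optional det, any adjectives, then a noun."""
    def build_np(start):
        pos = start
        if pos != len(tags) and tags[pos] == "det":
            pos += 1
        while pos != len(tags) and tags[pos] == "a":
            pos += 1
        if pos != len(tags) and tags[pos] == "n":
            return pos + 1  # position just after the NP
        return None
    return build_np


tags = ["det", "a", "n", "v"]
wfst = WFST()
builder = make_np_builder(tags)
first = wfst.parse("np", 0, builder)
second = wfst.parse("np", 0, builder)  # served from the table
```

After the two calls the builder has run only once, which is the saving a WFST offers when the same NP is requested again by a different grammatical function.</Paragraph>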
      <Paragraph position="6"> And now consider the case in which a sentential complement is used in subject position, as in (7) That any sort of duty was owed by his nation to other nations would have astonished a nineteenth century statesman.</Paragraph>
      <Paragraph position="7"> No rule accessing sentential complements would be used to look for subject complement clauses, which are only accessed at sentence level as a special case of sentence structure. In order to take care of compless complement clauses the parser checks subcategorization frames; then, in case the complementizer is missing, it activates a check for the semantic typology of verb predicates which allow the complementizer to be omitted, which coincides with bridge verbs or non-factive verbs.</Paragraph>
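      <Paragraph> The two-step check just described can be sketched as follows. The frame label "scomp", the small subcategorization table and the list of bridge verbs are hypothetical examples chosen for the illustration; the actual frames come from the lexicon, and the function name is ours.

```python
# Hypothetical subcategorization frames: "scomp" marks a sentential complement.
SUBCAT = {
    "say": {"scomp"},
    "tell": {"np", "scomp"},
    "regret": {"scomp"},
    "sleep": set(),
}

# Hypothetical sample of bridge (non-factive) verbs.
BRIDGE_VERBS = {"say", "tell", "think", "believe"}


def allows_sentential_complement(verb, next_word):
    """Decide whether the sentential complement rule may be entered."""
    # Step 1: the verb must subcategorize for a sentential complement.
    if "scomp" not in SUBCAT.get(verb, set()):
        return False
    # An overt complementizer needs no further check.
    if next_word == "that":
        return True
    # Step 2: with a missing complementizer, only bridge verbs qualify.
    return verb in BRIDGE_VERBS


compless_ok = allows_sentential_complement("say", "we")
compless_blocked = allows_sentential_complement("regret", "we")
```

Under these assumptions "said We in New England ..." in example (6) would pass the check, while a factive verb followed directly by a subject NP would not.</Paragraph>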
      <Paragraph position="8"> Now consider the case in which the complement clause follows the object (direct,indirect) NP, as in (8) He told the committee the measure would merely provide means of enforcing the escheat law which has been on the books since Texas was a republic.</Paragraph>
      <Paragraph position="9"> This strategy is organized in a similar way to checking for the attachment of a PP complement following a NP. The complementation information is turned into a &amp;quot;that&amp;quot; word in case of sentential complement, and into the whole set of prepositions subcategorized by the verb with PP complements.</Paragraph>
      <Paragraph position="10"> These words are used as Cues-set to prevent the NP from entering relative clause rules, or any PP headed with one of the prepositions listed in the Cues-set, after the head has been taken and the parser is in the complement block of rules. The Cues-set is passed as a list from the I_bar level down to the vp level and into the object NP if any.</Paragraph>
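      <Paragraph> A minimal sketch of how a Cues-set could be derived from a verb's complementation information and then consulted inside the object NP is given below. The frame encoding ("scomp" for a sentential complement, "pp:" followed by a preposition for a PP complement) and both function names are assumptions made for the example.

```python
def build_cues(frames):
    """Turn complementation information into a set of cue words.

    frames: iterable of strings such as "np", "scomp" or "pp:into".
    """
    cues = set()
    for frame in frames:
        kind, _, value = frame.partition(":")
        if kind == "scomp":
            cues.add("that")   # sentential complement becomes the word "that"
        elif kind == "pp":
            cues.add(value)    # each subcategorized preposition is a cue
    return cues


def np_may_attach(next_word, cues):
    """False when the next word is a cue, i.e. it is reserved for the verb."""
    return next_word not in cues


put_cues = build_cues(["np", "pp:into", "pp:in"])
tell_cues = build_cues(["np", "scomp"])
```

With these cues, an NP following "put" would decline to attach a PP headed by "into", leaving it for the verb, and an NP following "tell" would not enter relative clause rules on meeting "that".</Paragraph>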
      <Paragraph position="11"> Now consider cases in which the parser has to choose between an OBJect NP/Sentence complement SUBJect in case the verb is compless as shown in: (9) Mitchell decried the high rate of unemployment in the state and said the Meyner administration and the Republican controlled State Senate Must share the blame for this.</Paragraph>
      <Paragraph position="12"> Or this sentence where the complement is started by a comma, and a vocative, (10) A man must be able to say, Father, I have sinned, or there is no hope for him.</Paragraph>
      <Paragraph position="13"> As said above, we ascertain the verb belongs to the semantic class of non-factive verbs and then look for a finite verb ahead before allowing the Sentence complement rule to be fired. Other similarly difficult cases that can be adequately treated in our parser are shown below, (11) I told him what Liston had said and he said Liston was a double crosser and said anything he (Liston) got was through a keyhole.</Paragraph>
      <Paragraph position="14"> where both the complement clause and the following relative clause are compless. Obtaining the final results reported in Table 2 for Greval took us one person-month of work, needed to add rules and lexical entries missing in the parser. Some of these coverage problems were caused by sentences like 12 and 13 below,  (12) Wagner replied, Can't you just see the headline City Hooked for $172,000 ? (13) Yet, I responded, could not similar things be said about the art of the past ? or by an imperative/exhortative followed by a question, as in (14) Take Augustine's doctrine of grace given and grace withheld : have you pondered the dramatic qualities in this theology ? or by a subordinate clause followed by a question, as in (15) If he attaches little importance to personal liberty, why not make this known to the world ? The following sentences were hard to parse, (16) Battalion Chief Stanton M. Gladden,  42, the central figure in a representation dispute between the fire fighters association and the teamsters union, suffered multiple fractures of both ankles.</Paragraph>
      <Paragraph position="15"> (17) He bent down, a black cranelike figure, and put his mouth to the ground.</Paragraph>
      <Paragraph position="16"> where in 16 there are two long parentheticals, fairly hard to process, before the main verb comes; in 17 the appositive comes after the verb and not after the head noun, the pronoun "he".</Paragraph>
      <Paragraph position="17"> New lexical entries had to be added to account for special multiwords basically in the area of grammatical function words. We also added some new lexical multiwords which caused the FSA disambiguation problems.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4. Evaluation and Discussion
</SectionTitle>
    <Paragraph position="0"> The parser parses 89% of all text top down: then it parses 9.3% of the remaining linguistic material bottom up and adds it to the parsed portion of the current sentence. That may produce wrong results in case a list has been partially parsed by the top down parser. But it produces right results whenever any additional complete subordinate or coordinate sentence structure has been left over - which constitutes the majority of cases. Overall almost the whole text - 98.3% - is turned into semantically consistent structures which have already undergone Pronominal Binding at sentence level in their DAG structural representation.</Paragraph>
    <Paragraph position="1"> We find it very important to remark that the performance of our parser is to be appreciated mainly for its high coverage. None of the statistically and stochastically based parsers reported in (Preis, 2003) reached such a high score.</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ALL-RELS GOLD VENICE CORRECT %PRECIS RECALL/GOLD RECALL/GUESS
</SectionTitle>
    <Paragraph position="0"> For the sake of comparison we also report the main data from the table presented in (Preis, 2003), so that the reader can appreciate the results of our parser. In particular, the number of main Grammatical Relations treated in the previous test is half the number treated in ours. If we look at the easiest GR to parse, i.e. the NCSUBJ GR, we see that the number of cases found by the best parser in the previous test is far lower than in our result.</Paragraph>
    <Paragraph position="1"> The highest precision is for DOBJ in the BU parser, which reaches 88.42%, 8 points lower than our result. In absolute terms, limiting the comparison to the two most frequent GRs, the best  The recall is also accordingly lower in absolute terms: there are 277 correct DOBJs in BU against our 331, and 702 correct NCSUBJs, in this case from the BC parser, which gets the best recall, against our 883. And 277 and 702 are slightly better than chance - 67.72 and 67.63 respectively.</Paragraph>
    <Paragraph position="2"> The impression one gets from the performance of statistically and stochastically based parsers is that they are inherently unable to cope with deep linguistic information. They certainly cannot undergo substantial improvements. Rule-based parsers, on the contrary, would benefit from additional subcategorization frames, as in our case; and for all those constructions which require new peripheral rules to be set up in the grammar, they would typically increase their coverage, as did our parser.</Paragraph>
    <Paragraph position="3"> As a last comment, we started evaluating subsets of the GREVAL corpus with the online version of the &amp;quot;Connexor&amp;quot; dependency parser, on the assumption that this version would be identical to or even better than the one commercially available. We did so because this parser is regarded as the best dependency parser on the market. We tried out a subset of 50 sentences, and on a first perusal of the output we discovered that only 40 sentences contained correct and fully connected representations. The remaining 10 sentences presented either unconnected heads or heads misconnected because of wrong attachments. Some remarks on the possible reasons for that: (i) bottom-up local parsing techniques are good at coping with structures that are typically hard for a top-down parser, such as coordinate structures, but they are bad at computing long distance dependencies; (ii) they are good at computing attachment whenever it is local, but they make mistakes when there are extraposed elements; (iii) dependency parsing does not seem to obey generally accepted grammaticality principles such as the obligatoriness of SUBJect constituents, or the need to provide some landing site for extracted wh- elements in relative and interrogative clauses; (iv) control structures like small clauses for predicative complements and adjuncts are all attached locally, which is not always correct.</Paragraph>
    <Paragraph position="4"> So, even though word-level parsing may be more effective as to the number of connections (constituents) safely produced, without leaving out any fragment or skimmed fragment, it is nonetheless faced with the hard task of recomposing clause level control mechanisms which in a top-down constituency-based parser are taken for granted.</Paragraph>
    <Paragraph position="5"> The F-measure derived from our P and R according to the usual formula, $F = \frac{2 P R}{P + R}$,</Paragraph>
    <Paragraph position="7"> is 89.38%, which is far higher than the 75% reported in (Crouch et al., 2002) as the best result obtained by linguistic parsers today.</Paragraph>
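    <Paragraph> As an illustration of how an F-measure is derived from P and R, the following minimal sketch assumes that the usual formula is the balanced F-measure, i.e. the harmonic mean of precision and recall. The function name and the input values are hypothetical and are not taken from the tables of this paper.

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall.

    Assumption: both arguments are on the same scale (fractions here).
    """
    if precision + recall == 0:
        # Convention chosen for this sketch: no correct output at all.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values only, not figures reported in the paper.
p, r = 0.90, 0.80
print(round(f_measure(p, r), 4))
```

    Since the harmonic mean never exceeds the arithmetic mean, a high F-measure requires both P and R to be high at the same time.</Paragraph>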
    <Paragraph position="8"> We are currently experimenting with a &amp;quot;mildly&amp;quot; topdown/bottomup version of the parser in which, rather than starting from Clause level, we search recursively for Arguments and Adjuncts. In other words, we look for fully semantically interpreted constituents in which the choice for argumenthood has already been partially performed. In addition to collecting Arguments/Adjuncts, the new parser scatters in the output list the punctuation marks and coordinate/subordinate words which are deemed responsible for determining clause boundaries. To that aim, we devised a procedure for clause creation under the restriction that a main tensed Verb constituent complex has been found. This can be iterated on the input list, and the procedure may decide to fuse portions of the output list which have been left stranded without independent clause status, appending them to the preceding prospective clause.</Paragraph>
    <Paragraph position="9"> Interpretation then follows: subcategorization frames are recovered for the main tensed verb, and grammatical functions and semantic roles are assigned. The Clause level procedure is then followed by an Utterance level procedure that produces simple or complex utterances, the latter coordinated or subordinated, according to the Clause input list.</Paragraph>
    <Paragraph position="10"> We tested the new version of the parser on the Greval corpus and discovered that in some cases it was much slower than the fully topdown version.</Paragraph>
    <Paragraph position="11"> However, we also gained parsing time on highly ambiguous and complex sentences, where the &amp;quot;mildly&amp;quot; bottomup parser actually showed totally linear behaviour: no increase in computation time resulted, and performance is conditioned only by the number of words, the number of arguments/adjuncts and the number of clauses to build. We have not been able to compute this proportion systematically but will do so in the future.</Paragraph>
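    <Paragraph> The clause creation step described above can be pictured with the following minimal sketch. It is not the GETARUNS implementation: the item representation as (kind, text) pairs, the kind labels and the function name are all assumptions made for illustration, and boundary markers are simply dropped from the result instead of being kept in the output list.

```python
def build_clauses(items):
    """Group (kind, text) items into prospective clauses.

    Hypothetical kinds: "arg", "adjunct", "verb" (a main tensed verb
    complex) and "boundary" (punctuation or a coordinating or
    subordinating word).
    """
    # Step 1: cut the input list at every boundary marker.
    chunks, current = [], []
    for kind, text in items:
        if kind == "boundary":
            if current:
                chunks.append(current)
            current = []
        else:
            current.append((kind, text))
    if current:
        chunks.append(current)

    # Step 2: a chunk gets clause status only if it holds a tensed verb;
    # a stranded chunk is fused with the preceding prospective clause.
    # Assumption: a stranded chunk with nothing before it is kept as is.
    clauses = []
    for chunk in chunks:
        has_verb = any(kind == "verb" for kind, _ in chunk)
        if has_verb or not clauses:
            clauses.append(chunk)
        else:
            clauses[-1].extend(chunk)
    return clauses


# Sentence (17), segmented by hand for the sake of the example.
sentence = [
    ("arg", "he"), ("verb", "bent"), ("adjunct", "down"),
    ("boundary", ","), ("arg", "a black cranelike figure"),
    ("boundary", ","), ("boundary", "and"),
    ("verb", "put"), ("arg", "his mouth"), ("adjunct", "to the ground"),
]
clauses = build_clauses(sentence)
```

    In this example the appositive has no tensed verb of its own, so it is appended to the first clause, while the material after the coordinating word forms a second clause.</Paragraph>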
  </Section>
</Paper>