<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0407">
  <Title>FAME: a Functional Annotation Meta-scheme for multi-modal and multi-lingual Parsing Evaluation</Title>
  <Section position="3" start_page="0" end_page="45" type="metho">
    <SectionTitle>
2 FAME: Basics
</SectionTitle>
    <Paragraph position="0"> What we intend to offer here is not yet another off-the-shelf annotation scheme, but rather a formal framework for comparison and evaluation of existing annotation practices at the level of linguistic analysis traditionally known as &quot;functional&quot;. Hereafter, this framework will be referred to as an annotation &quot;meta-scheme&quot;.</Paragraph>
    <Section position="1" start_page="0" end_page="39" type="sub_section">
      <SectionTitle>
2.1 Why functional evaluation
</SectionTitle>
      <Paragraph position="0"> The choice of evaluating parsing systems at the functional level is largely motivated by a number of practical concerns. We contend that information about how functional relations are actually instantiated in a text is important for the following reasons:
* it is linguistically valuable, both as an end in itself and as an intermediate linguistic resource; in fact, it is sufficiently close to semantic representations to be used as an intermediate stage of analysis in systems requiring full text understanding capabilities;
* it is likely to become a more and more heavily used information asset in its own right for NLP applications: a shift of emphasis from purely pattern matching methods operating on n-word windows to functional information about word pairs has recently been witnessed both in the context of information retrieval/filtering systems (Grefenstette, 1994) and for the purposes of word sense disambiguation (see the last SENSEVAL and ROMANSEVAL evaluation campaigns);
* it is comparatively easy and &quot;fair&quot; to evaluate since it overcomes some of the shortcomings of constituency-based evaluation (Carroll and Briscoe, 1996; Carroll et al., 1998; Sampson, 1998; Lin, 1998);
* it represents a very informative &quot;lowest common ground&quot; of a variety of different syntactic annotation schemes (Lin, 1998);
* it is naturally multi-lingual, as functional relations probably represent the most significant level of syntactic analysis at which cross-language comparability makes sense;
* it permits joint evaluation of systems dealing with both spoken and written language. Spoken data are typically fraught with cases of disfluency, anacoluthon, syntactic incompleteness and any sort of non-canonical syntactic structure (Antoine, 1995): the level of functional analysis naturally reflects a somewhat standardized representation, which abstracts away from the surface realization of syntactic units in a sentence, thus being relatively independent of, and unconcerned with, disfluency phenomena and phrase partials (Klein et al., 1998);
* it is &quot;lexical&quot; enough in character to make provision for partial and focused annotation: since a functional relation always involves two lexical heads at a time, as opposed to complex hierarchies of embedded constituents, it is comparatively easy to evaluate an annotated text only relative to a subset of the actually occurring headwords, e.g. those carrying a critical information weight for the intended task and/or specific domain.</Paragraph>
    </Section>
    <Section position="2" start_page="39" end_page="40" type="sub_section">
      <SectionTitle>
2.2 Why an annotation meta-scheme
</SectionTitle>
      <Paragraph position="0"> FAME is designed to meet the following desiderata:
* provide not only a measure of coverage but also of the utility of the covered information as opposed to missing information;
* make explicit, through annotation, information which is otherwise only indirectly derivable from the parsed text;
* factor out linguistically independent (but possibly correlated) primitive dimensions of functional information.</Paragraph>
      <Paragraph position="1"> All these requirements serve the main purpose of making evaluation open to both annotation-dependent and task-dependent parameterization. This is felt to be important since the definition of closeness to a standard, and the utility of an analysis that is less than perfect along some dimension, can vary from task to task and, perhaps more crucially, from annotation scheme to annotation scheme.</Paragraph>
      <Paragraph position="2"> The basic idea underpinning the design of the annotation meta-scheme is that information about how functional relations are actually instantiated in context can be factored out into linguistically independent levels. In many cases, this can in fact be redundant, as information at one level can be logically presupposed by a piece of information encoded at another level: for example, &quot;nominative case&quot; is often (but not always) a unique indicator of &quot;subjecthood&quot;, and the same holds for grammatical agreement. Yet, there is a general consensus that redundancy should not be a primary concern in the design of a standard representation, as syntactic schemes often differ from each other in the way levels of information are mutually implied, rather than in the intrinsic nature of these levels (Sanfilippo et al., 1996). By assuming that all levels are, in a sense, primitive, rather than some of them being derivative of others, one provides considerable leeway for radically different definitions of functional relations to be cast into a common, albeit redundant, core of required information. We will return to this point in section 3 of the paper.</Paragraph>
      <Paragraph position="3"> To be more concrete, a binary functional relationship can be represented formally as consisting of the following types of information:
i. the unordered terms of the relationship (i.e. the linguistic units in text which enter a given functional relationship): example (give, Mary);
ii. the order relationship between the terms considered, conveying information about the head and the dependent: example &lt;give, Mary&gt;;
iii. the type of relationship involved: for example, the functional relation of the pair (give, Mary) in the sentence John gave the book to Mary is &quot;indirect object&quot;;
iv. morpho-syntactic features associated with the dependent and the head; e.g. the dependent in the pair (give, Mary) is &quot;non-clausal&quot;;
v. the predicate-argument status of the terms involved: for example give(John, book, Mary) in John gave the book to Mary.</Paragraph>
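      <Paragraph> The information types listed above can be pictured as separate fields of a single record. What follows is a minimal sketch in Python, not part of FAME itself: the class name FunctionalRelation, its field names and the encoding of features as (name, value) pairs are all assumptions made for illustration, and level v (predicate-argument status) is deliberately left out. The labels iobj and synt_real=nc are borrowed from section 2.3.</Paragraph>
      <Paragraph><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionalRelation:
    """One binary functional relation, kept as separately comparable levels.

    Hypothetical container: names and layout are illustrative assumptions.
    """
    head: str
    dependent: str
    dep_type: str                # level iii: type of the relationship
    head_features: tuple = ()    # level iv: (name, value) pairs for the head
    dep_features: tuple = ()     # level iv: (name, value) pairs for the dependent

    def terms(self):
        """Level i: the unordered terms of the relationship."""
        return frozenset((self.head, self.dependent))

    def ordered(self):
        """Level ii: the ordered pair, head first and dependent second."""
        return (self.head, self.dependent)

    def features(self):
        """Level iv: features of both terms as plain dictionaries."""
        return {"head": dict(self.head_features),
                "dependent": dict(self.dep_features)}


# The pair (give, Mary) from "John gave the book to Mary".
rel = FunctionalRelation("give", "Mary", "iobj",
                         dep_features=(("synt_real", "nc"),))
```
]]></Paragraph>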
      <Paragraph position="4"> Most available tag taxonomies for functional annotation (such as those provided by, e.g., Karlsson's Constraint Grammar (Karlsson et al., 1995), or the SPARKLE annotation scheme (Carroll et al., 1996), to mention but two of them) typically collapse the levels above into one level only, for reasons ranging from a theoretical bias towards a maximally economic description of the phenomena in question or a particular view of the way syntactic phenomena are mutually implied from a logical standpoint, to choices chiefly motivated by the intended application. A typical example of this is the tag xcomp in the SPARKLE scheme, which (following LFG) covers all subcategorized open predicates: namely, traditional predicative complements (whether subject or object predicative), and unsaturated clausal complements, such as embedded infinitival and participial clauses (as opposed to, e.g., that-clauses). In Constraint Grammar, predicative nominal and adjectival phrases are tagged as &quot;subject complement&quot; or &quot;object complement&quot;, while, say, controlled infinitive clauses, as in Mary wants to read, are marked functionally as an &quot;object&quot; of the main verb. Any context-free attempt to map SPARKLE xcomp onto a Constraint Grammar tag would inevitably be one-to-many and not necessarily information-preserving. Clearly, both these aspects make it very hard to provide any sort of fair baseline for comparing a SPARKLE annotated text against the same text tagged with Constraint Grammar labels.</Paragraph>
      <Paragraph position="5"> The design of a meta-scheme is intended to tackle these difficulties by spelling out the levels of information commonly collapsed into each tag. More concretely, SPARKLE xcomp (want, leave), for the sentence She wants to leave, appears to convey two sorts of information: (a) that leave is a complement of want, (b) that leave is an open predicate. Both pieces of information can be evaluated independently against levels i, ii, iii and v above.</Paragraph>
      <Paragraph position="6"> Of course, a translation into FAME is not guaranteed to always be information preserving. For example, xcomp(want,leave) can also be interpreted as conveying information about the intended functional control of leave, given some (lexical) information about the main verb want, and some (contextual) information concerning the absence of a direct object in the sentence considered. However, this sort of context-sensitive translation would involve a more or less complete reprocessing of the entire output representation.[1] In our view, a partial context-free translation into FAME represents a sort of realistic compromise between a fairly uninformative one-to-many mapping and the complete translation of the information conveyed by one scheme into another format.</Paragraph>
      <Paragraph position="7"> [1] In fact, the SPARKLE annotation scheme annotates control information explicitly, as illustrated later in the paper: the point here is simply that this information cannot be derived directly from xcomp(want,leave).</Paragraph>
    </Section>
    <Section position="3" start_page="40" end_page="42" type="sub_section">
      <SectionTitle>
2.3 Information layers in FAME
</SectionTitle>
      <Paragraph position="0"> To date, FAME covers levels i-iv only. The building blocks of the proposed annotation scheme are functional relations, where a functional relation is an asymmetric binary relation between a word called HEAD and another word called DEPENDENT. We assume only relations holding between lexical or full words. Therefore, we exclude functional relations involving grammatical elements such as determiners, auxiliaries, complementizers, prepositions, etc. The information concerning these elements is conveyed through features, as described below in section 2.3.3.</Paragraph>
      <Paragraph position="1"> Each functional relation is expressed as follows:
dep_type(lex_head.&lt;head_features&gt;, dependent.&lt;dep_features&gt;)
Dep_type specifies the relationship holding between the lexical head (lex_head) and its dependent (dependent). The head and the dependent of the relation are further specified through a (possibly empty) list of valued features (respectively head_features and dep_features), which complement functional information.</Paragraph>
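      <Paragraph> The notation above is regular enough to be read mechanically. The sketch below, in Python, is one possible reader for it and not a tool distributed with FAME: the function names, the returned dictionary layout and the tolerance for optional spaces are assumptions, and the input is expected with its angle brackets already decoded from XML entities.</Paragraph>
      <Paragraph><![CDATA[
```python
import re

# A term is a word optionally followed by ".<feature list>".
_TERM = re.compile(r"\s*([^\s.,()<>]+)\s*(?:\.\s*<([^>]*)>)?\s*$")


def parse_features(text):
    """Turn 'introducer="to", synt_real=x' into a dict; no text gives {}."""
    features = {}
    for item in filter(None, (part.strip() for part in (text or "").split(","))):
        name, _, value = item.partition("=")
        features[name.strip()] = value.strip().strip('"')
    return features


def split_terms(body):
    """Split 'head.<..>, dep.<..>' at the first comma outside angle brackets."""
    depth = 0
    for i, ch in enumerate(body):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            return body[:i], body[i + 1:]
    raise ValueError("expected two comma-separated terms: " + body)


def parse_term(text):
    """Return (word, feature dict) for one term of the relation."""
    match = _TERM.match(text)
    if match is None:
        raise ValueError("malformed term: " + text)
    return match.group(1), parse_features(match.group(2))


def parse_relation(text):
    """Parse 'dep_type(head.<...>, dependent.<...>)' into a dictionary."""
    match = re.match(r"\s*(\w+)\s*\((.*)\)\s*$", text)
    if match is None:
        raise ValueError("malformed relation: " + text)
    head_text, dep_text = split_terms(match.group(2))
    head, head_features = parse_term(head_text)
    dependent, dep_features = parse_term(dep_text)
    return {"dep_type": match.group(1),
            "head": head, "head_features": head_features,
            "dependent": dependent, "dep_features": dep_features}
```
]]></Paragraph>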
      <Paragraph position="2">  Dep_types are hierarchically structured to make provision for underspecified representations of highly ambiguous functional analyses (see further below).</Paragraph>
      <Paragraph position="3"> The hierarchy of relations is given in figure 1 below. In the hierarchy, the function subj (for &quot;subject&quot;) is opposed to other grammatical relations by being assigned a higher prominence in the taxonomy, as customary in contemporary grammar theories (e.g. HPSG, GB).</Paragraph>
      <Paragraph position="4"> Moreover, modifiers and arguments are subsumed under the same comp node (mnemonic for complement), allowing for the possibility of leaving underspecified the distinction between an adjunct and a subcategorised argument in those cases where the distinction is difficult to draw in practice. In turn, the node arg (for argument) is split into pred, subsuming all and only classical predicative complements, and non-pred, further specified into dobj (for direct objects), iobj (for indirect objects) and oblobj (for oblique arguments).</Paragraph>
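      <Paragraph> Since the figure itself is not reproduced in this text, the following Python sketch rebuilds the hierarchy from the prose description only, so its exact shape is an assumption. The table PARENT and the helper names are hypothetical, and the label is spelled nonpred, as in the definitions given further on. The sketch shows how an underspecified tag can be tested for compatibility with a more specific one.</Paragraph>
      <Paragraph><![CDATA[
```python
# Child -> parent links, reconstructed from the prose (assumed, not official).
PARENT = {
    "subj": "dep", "comp": "dep",
    "mod": "comp", "arg": "comp",
    "pred": "arg", "nonpred": "arg",
    "dobj": "nonpred", "iobj": "nonpred", "oblobj": "nonpred",
}


def ancestors(label):
    """Return the label followed by every more generic label above it."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def subsumes(general, specific):
    """True if `general` is `specific` itself or one of its ancestors."""
    return general in ancestors(specific)


def compatible(a, b):
    """True if one label is an underspecified version of the other."""
    return subsumes(a, b) or subsumes(b, a)
```
]]></Paragraph>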
      <Paragraph position="5"> The hierarchy of figure 1 is a revision of the SPARKLE functional hierarchy (Carroll et al., 1996), in the light of the methodological points raised in section 2.2. The main point of departure can be found under the node comp, which, in SPARKLE, dominates the nodes obj and clausal, thus reflecting a view of predicative complements as small clauses, to be assimilated to other unsaturated clausal constructions such as infinitival and participial clauses. This is in clear conflict with another grammatical tradition that marks clausal complements with the functional relations also assigned to non-clausal complements, when the latter appear to be in a parallel distribution with the former, as in I accept his position and I accept that he leaves, where both his position and that he leaves are tagged as objects (Karlsson et al., 1995). This is a typical example of how functions may differ due to a difference in the levels of the linguistic information taken to be criterial for tag assignment. As we will see in more detail in section 2.3.2, the FAME hierarchy circumvents the problem by assigning all non-subject clausal complements the tag arg, which subsumes both traditional predicatives (pred) and non-clausal arguments (non-pred), thus granting sentential complements a kind of ambivalent (underspecified) functional status.</Paragraph>
      <Paragraph position="6">  In what follows we sketchily define each functional relation; examples are provided for non generic nodes of the hierarchy only.</Paragraph>
      <Paragraph position="7"> dep(head,dependent) is the most generic relation between a head and a dependent, subsuming the distinction between a subject and a complement.
subj(head,dependent) is the relation between a verb predicate and its subject:
subj(arrive,John) John arrived in Paris
subj(employ,IBM) IBM employed 10 C programmers
subj(employ,Paul) Paul was employed by IBM
Subj refers to the superficial subject of a verb, regardless of whether the verb is used in the active or passive voice. Moreover, it can also be used to mark subject control relations and, possibly, raising to object/subject phenomena, as exemplified below:
subj(leave,John) John promised Mary to leave
subj(leave,Mary) John ordered Mary to leave
subj(be,her) John believes her to be intelligent
subj(be,John) John seems to be intelligent
Clausal subjects are also marked as subj:
subj(mean,leave) that Mary left meant she was sick
subj(require,win) to win the America's Cup requires heaps of cash
comp(head,dependent) is the most generic relation between a head and a complement, whether a modifier or a subcategorized argument.</Paragraph>
      <Paragraph position="8"> mod(head,dependent) holds between a head and its modifier, whether clausal or non-clausal; e.g.</Paragraph>
      <Paragraph position="9"> mod(flag,red) a red flag
mod(walk,slowly) walk slowly
mod(walk,John) walk with John
mod(Picasso,painter) Picasso the painter
mod(walk,talk) walk while talking
Mod is also used to encode the relation between an event noun (including deverbal nouns) and its participants, and the relation between a head and a semantic argument which is syntactically realised as a modifier (as in the passive construction), e.g.:
mod(destruction,city) the destruction of the city
mod(kill,Brutus) he was killed by Brutus
arg(head,dependent) is the most generic relation between a head and a subcategorized argument; besides functional underspecification, it is used to tag the syntactic relation between a verbal head and a non-subject clausal argument (see section 2.3.1 above):
arg(say,accept) He said that he will accept the job
pred(head,dependent) is the relation which holds between a head and a predicative complement, be it subject or object predicative, e.g.</Paragraph>
      <Paragraph position="10"> pred(be,intelligent) John is intelligent
pred(consider,genius) John considers Mary a genius
nonpred(head,dependent) is the relation which holds between a head and a non-predicative complement.
dobj(head,dependent) is the relation between a predicate and its direct object (always non-clausal), e.g.:
dobj(read,book) John read many books
iobj(head,dependent) is the relation between a predicate and the indirect object, i.e. the complement expressing the recipient or beneficiary of the action expressed by the verb, e.g.</Paragraph>
      <Paragraph position="11"> iobj(speak,Mary) John speaks to Mary
iobj(give,Mary) John gave Mary the contract
iobj(give,Mary) John gave the contract to Mary
oblobj(head,dependent) is the relation between a predicate and a non-direct, non-clausal complement, e.g.</Paragraph>
      <Paragraph position="12"> oblobj(live,Rome) John lives in Rome
oblobj(inform,run) John informed me of his run
In order to represent conjunctions and disjunctions, FAME avails itself of the two symmetric relations conj and disj, lying outside the dependency hierarchy. Consider, for instance, the FAME representation of the following sentence, containing a conjoined subject: John and Mary arrived</Paragraph>
      <Paragraph position="14"> The FAME representation of the sentence John or Mary arrived differs from the previous one only in the type of relation linking John and Mary: namely, disj (John,Mary).</Paragraph>
      <Paragraph position="15">  In FAME, a crucial role is played by the features associated with both elements of the relation. Dep(endent)_features are as follows: * Intro(ducer): it refers to the grammatical word (a preposition, a conjunction etc.) which possibly introduces the dependent in a given functional relation, e.g.</Paragraph>
      <Paragraph position="16"> iobj(give,Mary.&lt;intro=&quot;to&quot;&gt;) give to</Paragraph>
    </Section>
    <Section position="4" start_page="42" end_page="45" type="sub_section">
      <SectionTitle>
Mary
</SectionTitle>
      <Paragraph position="0"> arg(say,accept.&lt;intro=&quot;that&quot;&gt;) Paul said that he accepts his offer
* Case: it encodes the case of the dependent, e.g.
iobj(dare,gli.&lt;case=DAT&gt;) dargli 'give to him'
* Synt_real: it refers to a broad classification of the syntactic realization of a given dependent, with respect to its being clausal or non-clausal, or with respect to the type of clausal structure (i.e. whether it is an open function or a closed function). Possible values of this feature are:
- x: a subcategorized argument or modifier containing an empty argument position which must be controlled by a constituent outside it, e.g.</Paragraph>
      <Paragraph position="1"> arg(decide,leave.&lt;synt_real=x&gt;) John decided to leave
- c: a subcategorized argument or modifier which requires no control by a constituent outside it, e.g.</Paragraph>
      <Paragraph position="2"> arg(say,leave.&lt;synt_real=c&gt;) John said he left
- nc: a non-clausal argument or modifier, e.g.</Paragraph>
      <Paragraph position="3"> dobj(eat,pizza.&lt;synt_real=nc&gt;) John ate a pizza
Head_features are as follows:
* Diath: it specifies the diathesis of a verbal head, e.g.</Paragraph>
      <Paragraph position="4"> subj(employ.&lt;diath=passive&gt;, Paul) Paul was employed by IBM
subj(employ.&lt;diath=active&gt;, IBM) IBM employed Paul
* Person: it specifies the person of a verbal head, e.g.</Paragraph>
      <Paragraph position="5"> subj(eat.&lt;person=3&gt;, he) he eats a pizza
* Number: it specifies the number of a verbal head, e.g.</Paragraph>
      <Paragraph position="6"> subj(eat.&lt;number=sing&gt;, he) he eats a pizza
* Gender: it specifies the gender of a head, e.g.
subj(arrivare.&lt;gender=fem&gt;, Maria) Maria è arrivata 'Maria has come'
3 FAME at work
Theory-neutrality
Theory-neutrality is an often emphasised requirement for reference annotation schemata to be used in evaluation campaigns (see GRACE; Adda et al., 1998). The problem with theory neutrality in this context is that, although some agreement can be found on a set of basic labels, problems arise as soon as these labels have to be defined. For example, the definition of &quot;subject&quot; as a noun constituent marked with nominative case is not entirely satisfactory, since a system might want to analyse the accusative pronoun in John believes her to be intelligent as the subject of the verb heading the embedded infinitival clause (as customary in some linguistic analyses of this type of complement). Even agreement, often invoked as a criterial property for subject identification, may be equally tricky and too theory-loaded for purposes of parser comparison and evaluation.</Paragraph>
      <Paragraph position="7"> The approach of FAME to these issues is to separate the repertoire of functional relation types (labels) from the set of morpho-syntactic features associated with the head and dependent, as shown in the examples below:
subj(be,she.&lt;case=accusative&gt;) John believes her to be intelligent
subj(be,she.&lt;case=nominative&gt;) She seems to be intelligent
In this way, emphasis is shifted from theory-neutrality (an almost unattainable goal) to modularity of representation: a functional representation is articulated into different information levels, each factoring out different but possibly inter-related linguistic facets of functional annotation.
Intertranslatability
A comparative evaluation campaign has to take into account that participant systems may include parsers based on rather different approaches to syntax (e.g. dependency-based, constituency-based, HPSG-like, LFG-like, etc.) and applied to different languages and test corpora. For a comparative evaluation to be possible, it is therefore necessary to take into account the specificity of a system, while at the same time guaranteeing the feasibility and effectiveness of a mapping of the system output format onto the reference annotation scheme. It is important to bear in mind at this stage that:
* most broad-coverage parsers are constituency-based;
* the largest syntactic databases (treebanks) use constituency-based representations.</Paragraph>
      <Paragraph position="8"> It is then crucial to make sure that constituency-based representations, or any variants thereof, can be mapped onto the functional reference annotation meta-scheme. The same point is convincingly argued for by Lin (1998), who also provides an algorithm for mapping a constituency-based representation onto a dependency-based format. To show that the requirement of intertranslatability is satisfied by FAME, we consider here four different analyses for the sentence John tried to open the window together with their translation equivalent in the FAME format:
Let us suppose now that the reference analysis for the evaluation of the same sentence in FAME is as follows:
subj(try,John)
arg(try,open.&lt;introducer=&quot;to&quot;,synt_real=x&gt;)
subj(open,John)
dobj(open,window)
Notice that this representation differs from the output of the ANLT Parser and of the Finite State Constraint Grammar Parser mainly because they both give no explicit indication of the control relationship between the verb in the infinitive clause and the matrix subject. This information is marked in the output of both the Fast Partial Parser and the PENN predicate-argument tagging. Note further that the Fast Partial Parser gives a different interpretation of the infinitival complement, which is marked as being modified by try, rather than being interpreted as a direct object of try.</Paragraph>
      <Paragraph position="9"> FAME does justice to these subtle differences as follows. First, recall that the FAME equivalents given above are in fact shorthand representations. Full representations are distributed over four levels, and precision and recall are to be gauged jointly relative to all such levels. To be concrete, let us first show a full version of the FAME standard representation for the sentence John tried to open the window (cf. Section 2.2):
i. (try,John) ii. &lt;try,John&gt; iii. subj
i. (try,open) ii. &lt;try,open&gt; iii. arg iv. open.&lt;introducer=&quot;to&quot;,synt_real=x&gt;
i. (open,John) ii. &lt;open,John&gt; iii. subj
i. (open,window) ii. &lt;open,window&gt; iii. dobj
Note first that information about the unsaturated clausal complement to open is separately encoded as synt_real=x in the standard representation. The failure of ANLT and the Constraint Grammar Parser to annotate this piece of information explicitly will then be penalised in terms of recall, but will not affect precision. By the same token, the subject control relation between John and open is recovered only by the Fast Partial Parser and PENN, and left untagged in the remaining schemes, thus lowering recall. The somewhat unorthodox functional dependency between try and open proposed by the Fast Partial Parser will receive the following full-blown FAME translation:
mod(try,open) &lt;open,try&gt;
When compared with the standard representation, this translation is a hit at the level of identification of the unordered dependency pair (try,open), although both the order of elements in the pair (&lt;open,try&gt;) and their functional dependency (mod) fail to match the standard. On this specific dependency, thus, recall will be 1/2.
As a more charitable alternative to this evaluation, it can be suggested that the difference between the FAME standard and the Fast Partial Parser output is the consequence of theory-internal assumptions concerning the analysis of subject-control structures, and that this difference should eventually be leveled out in the translation into FAME. This may yield a fairer evaluation, but has the drawback, in our view, of obscuring an important difference between the two representations.</Paragraph>
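      <Paragraph> The idea of gauging precision and recall jointly over all levels can be illustrated with a small Python sketch. It rests on assumptions that the text does not fix: every level of every relation counts as one item, each dependent feature counts as a separate level iv item, head features are ignored, and the system output used here is hypothetical rather than the output of any of the parsers discussed. It is therefore not meant to reproduce the figures quoted in the text.</Paragraph>
      <Paragraph><![CDATA[
```python
def level_items(dep_type, head, dependent, dep_features=None):
    """Break one relation into separately scorable items for levels i to iv."""
    pair = frozenset((head, dependent))
    items = {
        ("i", pair),                # unordered terms
        ("ii", (head, dependent)),  # head/dependent order
        ("iii", pair, dep_type),    # relation type
    }
    for name, value in (dep_features or {}).items():
        items.add(("iv", pair, name, value))  # one item per dependent feature
    return items


def score(gold, system):
    """Return (precision, recall) over the pooled level items."""
    gold_items = set().union(*(level_items(*rel) for rel in gold))
    system_items = set().union(*(level_items(*rel) for rel in system))
    hits = len(gold_items & system_items)
    precision = hits / len(system_items) if system_items else 0.0
    recall = hits / len(gold_items) if gold_items else 0.0
    return precision, recall


# Reference analysis of "John tried to open the window".
gold = [
    ("subj", "try", "John"),
    ("arg", "try", "open", {"introducer": "to", "synt_real": "x"}),
    ("subj", "open", "John"),
    ("dobj", "open", "window"),
]
# Hypothetical output with no control relation and no features.
system = [
    ("subj", "try", "John"),
    ("arg", "try", "open"),
    ("dobj", "open", "window"),
]
precision, recall = score(gold, system)
```
]]></Paragraph>
      <Paragraph> Under these assumptions, everything the hypothetical system asserts is found in the reference analysis, so precision is unaffected, while the missing control relation and the missing features only lower recall.</Paragraph>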
      <Paragraph position="10"> Evaluation of dialogue systems Dialogue management systems have to be able to deal with both syntactic and semantic information at the same time. These two levels of information are usually dealt with separately for reasons of higher ease of representation, and ease of change, updating and adaptation to different domains and different languages. Nonetheless, the formalisms used for syntax and semantics must have a certain degree of similarity and some additional knowledge about the relationships between syntax and semantics is necessary. An example is provided by what has been done in the ESPRIT SUNDIAL project (Peckam, 1991), where Syntax is defined using a dependency grammar augmented with morphological agreement rules; Semantics is declared through case frames (Fillmore, 1968; Fillmore, 1985) using a conceptual graph formalism. An additional bulk of knowledge, called mapping knowledge, specifies possible links between the symbols of the dependency grammar and the concepts of case frames. In this way syntactic and semantic controls are performed at the same time, avoiding the generation of parse trees that must afterwards be validated semantically. The FAME meta-scheme fits in comparatively well with this approach to parsing, as (a) functional annotation is readily translatable into dependency-like tags, and (b) the scheme makes provision for integration of syntactic and semantic information.</Paragraph>
      <Paragraph position="11"> Furthermore, the lexical character of FAME functional analysis as a dependency between specific headwords makes annotation at the functional level compatible with score-driven, middle-out parsing algorithms, whereby parsing may &quot;jump&quot; from one place to another of the sentence, beginning, for example, with the best-scored word, expanding it with adjacent words in accordance with the language model (Giachin, 1997). Scoring can be a function of the reliability of speech recognition in the word lattice, so that the parser can start off from the most-reliably recognized word(s). Alternatively, higher scores can be assigned to the most relevant content words in the dialogue, given a specific domain/task at hand, thus reducing the complexity space of parses.</Paragraph>
      <Paragraph position="12"> Use of underspecification
FAME's hierarchical organization of functional relations makes it possible to resort to underspecified tags for notoriously hard cases of functional disambiguation. For example, both Gianni and Mario can be subject or object in the Italian sentence Mario, non l'ha ancora visto, Gianni, which can mean both 'Mario has not seen Gianni yet' and 'Gianni has not seen Mario yet'. In this case, the parser could leave the ambiguity unresolved by using the underspecified functional relation dep, e.g. dep(vedere,Mario) and dep(vedere,Gianni). Similarly, the underspecified relation comp comes in handy for those cases where it is difficult to draw a line between adjuncts and subcategorized elements. This is a crucial issue if one considers the wide range of variability in the subcategorization information contained in the lexical resources used by participant systems. Given the sentence John pushed the cart to the station, for example, a comp relation is compatible both with an analysis where to the station is tagged as a modifier, and with an analysis which considers it an argument. We already considered (section 2.3.1) the issue of tagging sentential complements as arg, as a way to circumvent the theoretical issue of whether the functional relations of clauses should be defined on the basis of their predicative status, or, alternatively, of their syntactic distribution. To sum up, underspecification guarantees a more flexible and balanced evaluation of the system outputs, especially relative to those constructions whose syntactic analysis is controversial.</Paragraph>
    </Section>
  </Section>
</Paper>