<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1904">
  <Title>Putting FrameNet Data into the ISO Linguistic Annotation Framework</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 A Data Category Specification for Frame Semantics in RDF
</SectionTitle>
    <Paragraph position="0"> The World Wide Web (WWW) contains a large amount of information that is expanding at a rapid rate. Most of that information is currently represented using the Hypertext Markup Language (HTML), which is designed to allow web developers to display information in a way that is accessible to humans viewing it in web browsers. While HTML allows us to visualize the information on the web, it does not provide much capability to describe the information in ways that help software programs find or interpret it. The World Wide Web Consortium (W3C) has developed the Extensible Markup Language (XML), which allows information to be described more accurately using tags. As an example, the word crawl on a web site might represent an offline search process (as in web crawling) or an exposition of a type of animate motion. The use of XML to provide metadata markup, such as for crawl, makes the meaning of the word unambiguous. However, XML has only a limited capability to describe the relationships (schemas or ontologies) among objects. Ontologies provide a very powerful way to describe objects and their relationships to other objects. The DAML language was developed as an extension to XML and the Resource Description Framework (RDF). The latest release of the language, DAML+OIL (http://www.daml.org), provides a rich set of constructs with which to create ontologies and to mark up information so that it is machine readable and understandable.</Paragraph>
    <Paragraph position="1"> FrameNet 1 has been translated into DAML+OIL.</Paragraph>
    <Paragraph position="2"> We developed an automatic translator from FrameNet to DAML+OIL, which is being updated to reflect FrameNet 2 data. With periodic updates as the FrameNet data grows, we expect it to become useful for various applications on the Semantic Web. DAML+OIL is written in RDF (http://www.w3.org/TR/daml+oil-walkthru/#RDF1), i.e., DAML+OIL markup is a specific kind of RDF markup. RDF, in turn, is written in XML, using XML Namespaces (http://www.w3.org/TR/daml+oil-walkthru/#XMLNS) and URIs. Thus, our FrameNet declaration begins with an RDF start tag including several namespace declarations of the form:</Paragraph>
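    <Paragraph> As a minimal sketch, assuming the namespace pattern of the DAML+OIL walkthrough cited above, such a start tag could look as follows. The URIs for the RDF Schema and XML Schema datatype namespaces are assumptions here and may differ from those in the actual FrameNet file.
<![CDATA[
```xml
<!-- Illustrative sketch only, not the original FrameNet header. -->
<rdf:RDF
  xmlns:rdf ="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:xsd ="http://www.w3.org/2000/10/XMLSchema#"
  xmlns:daml="http://www.daml.org/2001/03/daml+oil#">
  <!-- ontology header, class definitions, and property definitions go here -->
</rdf:RDF>
```
]]>
    </Paragraph>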
    <Paragraph position="4"> So in this document, the rdf: prefix should be understood as referring to things drawn from the namespace called http://www.w3.org/1999/02/22-rdf-syntax-ns#. This is a conventional RDF declaration appearing verbatim at the beginning of almost every RDF document. The second and third declarations make similar statements about the RDF Schema and XML Schema datatype namespaces. The fourth declaration says that in this document, elements prefixed with daml: should be understood as referring to things drawn from the namespace called http://www.daml.org/2001/03/daml+oil#. This again is a conventional DAML+OIL declaration. We use the XML entity model to define shortcuts for referring to the URIs (footnote 2). The other DAML+OIL ontologies used in the FrameNet description include the DAML-S (http://www.daml.org/services) service ontologies, the OpenCYC DAML ontology (http://www.cyc.com/2002/04/08/cyc.daml), and the SRI time ontology (http://www.ai.sri.com/daml/ontologies/sribasic/1-0/Time.daml), which is currently being revised as part of the new DAML+OIL time ontology effort.</Paragraph>
    <Paragraph position="5"> http://www.icsi.berkeley.edu/~snarayan/frame-2.daml has a complete namespace and imported ontology list.</Paragraph>
    <Paragraph position="6"> The most general object of interest is a frame. We define the FRAME class as a daml:Class. We then define a number of bookkeeping properties on the FRAME class. An example of the name property is shown below.</Paragraph>
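    <Paragraph> A minimal sketch of such a class and property declaration, assuming the identifiers Frame and name and an XML Schema string range (none of which are confirmed by the text), might read:
<![CDATA[
```xml
<!-- Illustrative only: the identifiers and the range are assumptions. -->
<daml:Class rdf:ID="Frame">
  <rdfs:comment>The most general class in the frame ontology</rdfs:comment>
</daml:Class>

<!-- A bookkeeping property whose value is a plain string -->
<daml:DatatypeProperty rdf:ID="name">
  <rdfs:domain rdf:resource="#Frame"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
</daml:DatatypeProperty>
```
]]>
    </Paragraph>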
    <Paragraph position="7">  entire path has to be specified.</Paragraph>
    <Paragraph position="8"> Roles are relations defined on frames, ranging over the specific type of the filler. We use daml:ObjectProperty to define the roles of a frame. The domain of a role is its frame. We leave the type of the filler unrestricted at this level, allowing specific roles to specialize it further. Note that we use the daml:samePropertyAs relation to specify synonyms. The fragment below specifies that Frame Element, Role, and FE are synonyms.</Paragraph>
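    <Paragraph> A minimal sketch of such a declaration, assuming the hypothetical identifiers role, frameElement, and FE and an unrestricted filler type of daml:Thing, could be:
<![CDATA[
```xml
<!-- Illustrative only: identifiers are assumptions, not the FrameNet names. -->
<daml:ObjectProperty rdf:ID="role">
  <rdfs:domain rdf:resource="#Frame"/>
  <!-- filler type left unrestricted at this level -->
  <rdfs:range rdf:resource="http://www.daml.org/2001/03/daml+oil#Thing"/>
</daml:ObjectProperty>

<!-- Synonyms of role, declared with daml:samePropertyAs -->
<daml:ObjectProperty rdf:ID="frameElement">
  <daml:samePropertyAs rdf:resource="#role"/>
</daml:ObjectProperty>

<daml:ObjectProperty rdf:ID="FE">
  <daml:samePropertyAs rdf:resource="#role"/>
</daml:ObjectProperty>
```
]]>
    </Paragraph>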
    <Paragraph position="9">  We use the various constructs daml:maxCardinality, daml:minCardinality, daml:cardinalityQ, etc. from DAML to specify cardinality restrictions on the fillers of a role property. The markup fragment below shows the specification of a single valued role.</Paragraph>
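    <Paragraph> One way to express a single-valued role in DAML+OIL is a cardinality restriction attached to the frame class. The sketch below assumes a hypothetical role named singleValuedRole and is not taken from the FrameNet encoding.
<![CDATA[
```xml
<!-- Illustrative only: singleValuedRole is a hypothetical property name. -->
<daml:Class rdf:about="#Frame">
  <rdfs:subClassOf>
    <!-- every Frame has at most one filler for this role -->
    <daml:Restriction daml:maxCardinality="1">
      <daml:onProperty rdf:resource="#singleValuedRole"/>
    </daml:Restriction>
  </rdfs:subClassOf>
</daml:Class>
```
]]>
    Declaring the property itself as a daml:UniqueProperty would be an alternative that makes the role single valued globally rather than per class.
    </Paragraph>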
    <Paragraph position="10">  The relation between frames (such as ARREST and CRIMINAL PROCESS) is often captured by a set of bindings between frame elements (for example, the arrested person is the same individual as the person charged, who is the same individual as the defendant in a criminal process). To capture such bindings, we introduce a special relation called bindingRelation whose domain and range are roles (either from the same or different frames).</Paragraph>
    <Paragraph position="12"> By far the most important binding relation is the identification of roles (i.e., they refer to the same value or object). This can be specified through the relation identify, which is a subProperty of bindingRelation. Note that in order to do this, we have to extend the DAML+OIL language, which does not allow properties to be defined over other properties. We use the DAML-S ontology primitive daml-s:sameValuesAs to specify the identify relations.</Paragraph>
    <Paragraph position="13">  In FrameNet, a frame may inherit (A ISA B) from other frames or be composed of a set of subframes (which are frames themselves). For instance, the frame CRIMINAL PROCESS has subframes that correspond to various stages (ARREST, ARRAIGNMENT, CHARGE, etc.). Subframe relations are represented using the  Note that the temporalOrdering property only says it is transitive, not that it is a transitive version of precedes. DAML+OIL does not currently allow us to express this relation (see http://www.daml.org/2001/03/daml+oil-walkthru#properties). Frame Elements may also inherit from each other. We use rdfs:subPropertyOf to specify these dependencies. For example, the following markup in DAML+OIL specifies that the role (Frame Element) MOTHER inherits from the role (Frame Element) PARENT. Note that we can add further restrictions to the new role. For instance, we may want to restrict the filler of MOTHER to be female (as opposed to animal for PARENT).</Paragraph>
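    <Paragraph> A minimal sketch of this kind of role inheritance, assuming hypothetical identifiers mother, parent, and Female (the last standing in for whatever class restricts the filler), might be:
<![CDATA[
```xml
<!-- Illustrative only: all three identifiers are assumptions. -->
<daml:ObjectProperty rdf:ID="mother">
  <!-- MOTHER inherits from PARENT -->
  <rdfs:subPropertyOf rdf:resource="#parent"/>
  <!-- further restriction on the new role: the filler must be female -->
  <rdfs:range rdf:resource="#Female"/>
</daml:ObjectProperty>
```
]]>
    </Paragraph>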
    <Paragraph position="15"> With these basic frame primitives defined, we are ready to look at an example using the Criminal Process frames.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 An Example: The Criminal Process Frame
</SectionTitle>
      <Paragraph position="0"> The basic frame is the CRIMINAL PROCESS Frame. It is a type of background frame. CP is used as a shorthand for this frame.</Paragraph>
      <Paragraph position="1"> 3 The subFrameOf relation has a direct translation to a richer semantic representation that is able to model and reason about complex processes (such as buying, selling, reserving tickets) and services on the web. While the details of the representation are outside the scope of this paper, the interested reader can look at (Narayanan and McIlraith, 2002) for an exposition of the markup language and its operational semantics.</Paragraph>
      <Paragraph position="2">  The CRIMINALPROCESS frame has a set of associated roles. These roles include COURT, DEFENDANT, PROSECUTION, DEFENSE, JURY, and CHARGES. Each of these roles may have a filler with a specific semantic type restriction. FrameNet does not specify the world knowledge and ontology required to reason about Frame Element filler types. We believe that one of the possible advantages of encoding FrameNet data in DAML+OIL is that as and when ontologies become available on the web (such as OpenCYC), we can link to them for this purpose.</Paragraph>
      <Paragraph position="3"> In the example fragment below we use the CYC Court-Judicial collection to specify the type of the COURT and the CYC Lawyer definition to specify the type restriction on the frame element DEFENSE. For illustrative purposes, the DAML+OIL markup below shows the use of a different ontology (from CYC) to restrict the defendant to be of type PERSON as defined in the example ontology. This restriction uses the DAML+OIL example from  The set of binding relations involves a set of role identification statements that specify that a role of a frame (subframe) has the same value (is bound to the same object) as the role of a subframe (frame). We could specify these constraints either a) as anonymous subclass restrictions on the criminal process class (see http://www.daml.org/2001/03/daml+oil-ex for examples) or b) by naming each individual constraint (and thus obtaining a handle onto that property). We chose the latter method in our DAML+OIL encoding of FrameNet to allow users and programs to query (or modify) any specific constraint. Note also that the dot notation (A.b) for specifying paths through simple and complex frames is not fully supported in DAML+OIL (see http://www.daml.org/services/daml-s/2001/10/rationale.html and also (Narayanan and McIlraith, 2002) for more information).</Paragraph>
      <Paragraph position="4">  To specify the relation precedes(Arrest, Arraignment) we restrict the property precedes within (the domain of) the ARREST frame to have as one of its range values the frame (class) ARRAIGNMENT. This is done using the property restriction feature of DAML+OIL as follows.</Paragraph>
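      <Paragraph> A minimal sketch of such a property restriction, assuming the identifiers Arrest, Arraignment, and precedes, could use daml:hasClass, which requires at least one value of the property to belong to the named class:
<![CDATA[
```xml
<!-- Illustrative only: identifiers are assumptions, not the FrameNet names. -->
<daml:Class rdf:about="#Arrest">
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#precedes"/>
      <!-- at least one thing that Arrest precedes is an Arraignment -->
      <daml:hasClass rdf:resource="#Arraignment"/>
    </daml:Restriction>
  </rdfs:subClassOf>
</daml:Class>
```
]]>
      </Paragraph>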
      <Paragraph position="5">  With this markup of the ontology, we can create annotation instances for examples with targets that belong to the CRIMINALPROCESS (or its associated) frames.</Paragraph>
      <Paragraph position="6"> At the current stage, we have converted all of FrameNet 1 data (annotations and frame descriptions) to DAML+OIL. The translator has also been updated to handle the more complex semantic relations (both frame and frame element based) in FrameNet 2. We plan to release both the XML and the RDF-based DAML+OIL versions of all FrameNet 2 releases.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="6" type="metho">
    <SectionTitle>
4 Examples of Annotated Sentences
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Basic Annotation of Verb Arguments and
Complements as Triplets
</SectionTitle>
      <Paragraph position="0"> Consider the following sentence, which is annotated for the target nab, a verb in the ARREST frame; the frame elements represented are the arresting AUTHORITIES, the SUSPECT and the TIME when the event took place: [ Authorities Police] nabbed [ Suspect the man], who was out on licence from prison, [ Time when he returned home].</Paragraph>
      <Paragraph position="1"> The phrase who was out on licence from prison provides additional information about the SUSPECT, but it is not syntactically an argument or complement of the target verb nab, nor semantically an element of the ARREST frame, so it is not annotated.</Paragraph>
      <Paragraph position="2"> How do we intend to represent this in XML conforming to the proposed standards? The header of the file will refer to the FrameNet Data Category specification discussed in the last section, but hereafter we will omit the domain name space specifications and use a more human-readable style of XML. The conversion to the full ISO style should be straightforward.</Paragraph>
      <Paragraph position="3">  The entity &lt;lexunit-annotation&gt;, which comprises the rest of the file, includes attributes giving the name of the lexical unit (nab), the name of the frame (ARREST), and the part of speech of the lemma (verb).</Paragraph>
      <Paragraph position="4"> The first included element is a definition of the lemma within the frame, seen on line 4.</Paragraph>
      <Paragraph position="5"> The entities contained within the lexunit-annotation are called subcorpora; each represents a particular syntactic pattern, combination of collocates, etc. In the case of nab, there are so few instances of the word that we have lumped them all into one subcorpus, as indicated by the subcorpus name &quot;all&quot; on line 5. It might seem logical that the entities within the subcorpus should be sentences, but in fact, we recognize the possibility that one sentence might be annotated several times, for several targets. There might even be several instances of the same target lemma in the same sentence in the same frame (e.g. The FBI nabbed Jones in NYC, while the Mounties nabbed Smith in Toronto), each with its own set of FEs. Therefore, the next smaller entity is the annotation set (line 6).</Paragraph>
      <Paragraph position="6"> The annotation set (footnote 4), shown below, consists of the &lt;sentence&gt;, which contains only the &lt;text&gt; of the sentence, and a set of layers, each consisting of a set of labels. Each label has attributes start and end, giving the starting and ending positions in the text to which it is applied. This sentence is typical of the basic FrameNet annotation style, in that there are three main layers, one for frame elements (&quot;FE&quot;, line 8), one for the phrase type (PT) of each FE (line 22), and one for the grammatical function (GF) of each FE (line 15). In each case, there are three coextensive labels; thus the word Police, in text positions 0-5, expresses the FE AUTHORITIES (line 10), has the phrase type &quot;NP&quot; (line 24), and is the subject of the verb nab, which we refer to as external argument &quot;Ext&quot; (line 17). The other two frame elements are shown by similar triplets, SUSPECT-NP-Obj and TIME-Swh-Comp, the latter meaning a complement of the verb consisting of a clause (S-node) introduced by a WH-relative.</Paragraph>
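      <Paragraph> As a hedged sketch of the structure just described, an annotation set for this sentence could be laid out as follows. The element nesting and the name attribute are assumptions; the start and end values are inclusive character offsets computed from the sentence text (Police at 0-5, as stated above), and the line numbers cited in the text refer to the original listing, not to this sketch.
<![CDATA[
```xml
<!-- Illustrative only: element and attribute names are assumptions. -->
<annotationSet status="MANUAL">
  <sentence>
    <text>Police nabbed the man, who was out on licence from prison, when he returned home.</text>
  </sentence>
  <layers>
    <layer name="FE">
      <labels>
        <label name="Authorities" start="0" end="5"/>
        <label name="Suspect" start="14" end="20"/>
        <label name="Time" start="59" end="79"/>
      </labels>
    </layer>
    <layer name="GF">
      <labels>
        <label name="Ext" start="0" end="5"/>
        <label name="Obj" start="14" end="20"/>
        <label name="Comp" start="59" end="79"/>
      </labels>
    </layer>
    <layer name="PT">
      <labels>
        <label name="NP" start="0" end="5"/>
        <label name="NP" start="14" end="20"/>
        <label name="Swh" start="59" end="79"/>
      </labels>
    </layer>
    <layer name="Target">
      <labels>
        <!-- the target word "nabbed" -->
        <label name="Target" start="7" end="12"/>
      </labels>
    </layer>
  </layers>
</annotationSet>
```
]]>
      </Paragraph>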
      <Paragraph position="7">  resentation being distributed by FrameNet, which includes attributes on each label giving an ID number, the date and time of creation, the name of the annotator, etc. In these examples, we use several XML tags without defining them. Without going into unnecessary detail, we note here that they can be defined in the DCS and the Dialect specification as described in (Ide and Romary, 2001a). We are also using a condensed notation with multiple attributes on entities for reasons of space, although proper RDF requires that they be split out.</Paragraph>
      <Paragraph position="8"> There are three other layers shown in the example, none of which contain labels, called Sentence, Verb, and Other. The layer Target contains the single label Target; the fact that nab is the target word is indicated in the same way as the information about FEs.</Paragraph>
      <Paragraph position="9"> Note that this XML format is &quot;standoff&quot; annotation in the sense that the labels refer to text locations by character positions (allowing any number of labels on various layers, overlapping labels, etc.), but that the text and the annotations appear in the same document. This is contrary to the general sense of the ISO standard, which uses indirect pointers to an entirely separate document containing the primary data. The indirect approach has certain advantages, and where the primary data is audio or video, is virtually unavoidable. But in the case of the current FrameNet data, where the annotations all apply to individual sentences, there seem to be some advantages, at least for human readers, of having the text of the sentence and the annotation contained within a fairly low-level XML entity, allowing the reader to glance back and forth between them (footnote 5). In formulating standards for linguistic annotation, it might be wise to take these advantages and disadvantages into consideration; perhaps either situation might be allowable under the standard.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="6" type="sub_section">
      <SectionTitle>
4.2 Other Types of Annotation
</SectionTitle>
      <Paragraph position="0"> As the basic unit of annotation is the label, which can be applied to anything ranging from a single character to an entire sentence, and there are no a priori constraints on labels overlapping, a great variety of information can be represented in this way. We will not be able to demonstrate all the possibilities here, but we will give some representative examples.</Paragraph>
      <Paragraph position="1"> In FrameNet, event nouns are annotated in the same frame (and hence with the same FEs) as the corresponding verbs; the main differences are that the syntactic patterns for the FEs of nouns are more varied, and (with rare exceptions) no FEs of nouns are required to be expressed. Consider the noun arrest, also in the ARREST frame, in the sentence: Two witnesses have come forward with information that could lead to [ Suspect the killer's] arrest.</Paragraph>
      <Paragraph position="2"> In this case the SUSPECT is expressed as a possessive (the killer's); it could equally well have been in a PP headed by of (the arrest of the killer).</Paragraph>
      <Paragraph position="3"> &lt;annotationSet status=&quot;MANUAL&quot;&gt; 5 The location of the sentences in the original corpora is still recoverable from the aPos attribute, which gives the absolute position from which the sentence was abstracted. The name of the corpus is given in another attribute which has been omitted in the example.</Paragraph>
      <Paragraph position="4">  In addition to marking the FE SUSPECT from ARREST, we could also annotate the same sentence again in the CAUSATION frame with the target lead, which would create an annotation set listed under the LU lead to: Two witnesses have come forward with [ Cause information that] could lead [ Effect to the killer's arrest].</Paragraph>
      <Paragraph position="5"> The same sentence would be annotated in two different frames, and the semantics of the two frames could (in theory) be combined compositionally to get the semantics of the phrase information that could lead to the killer's arrest. Similar processes of annotating in multiple frames with targets come forward (and possibly witness as well) should yield a full semantics of the sentence.</Paragraph>
    </Section>
  </Section>
</Paper>