<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-2009">
  <Title>Construction of Conceptual Graph representation of texts</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 System overview
</SectionTitle>
    <Paragraph position="0"> We use a two-step approach for conceptual graph representation of texts: first, by using VerbNet and WordNet, we identify the semantic roles in a sentence, and second, using these semantic roles and a set of syntactic/semantic rules we construct a conceptual graph.</Paragraph>
    <Paragraph position="1"> The general architecture of the system is represented in Figure 1.</Paragraph>
    <Paragraph position="2"> To apply our algorithms we use documents from two corpora in different domains. The first corpus is the freely available Reuters-21578 text categorization test collection (Reuters, 1987). The other corpus we use is the collection of aviation incident reports provided by the Irish Air Accident Investigation Unit (AAIU) (2004).</Paragraph>
    <Paragraph position="3"> All documents are converted to XML format and sentence boundaries are identified. The documents are then parsed using Eugene Charniak's maximum-entropy-inspired parser (Charniak, 2000). This probabilistic parser produces Penn Treebank style trees and achieves 90.1% average accuracy for sentences of at most 40 words and 89.5% for sentences under 100 words when trained and tested on the Wall Street Journal treebank.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Semantic role identification
</SectionTitle>
    <Paragraph position="0"> The problem of automatic semantic role identification is an important part of many natural language processing systems, and while recent syntactic parsers can correctly label over 95% of the constituents of a sentence, finding a representation in terms of semantic roles is still unsatisfactory. There are a number of quite different existing approaches to identifying semantic roles. The traditional parsing approaches, such as HPSG grammars and Lexical Functional Grammars, to a certain extent all suggest semantic relationships corresponding to the syntactic ones. They rely strongly on manually developed grammars and lexicons, which must encode all possible realisations of the semantic roles. Developing such grammars is a time-consuming and tedious process, and such systems usually work well only within limited domains.</Paragraph>
    <Paragraph position="1"> The data-driven approach is an alternative, based on filling semantic templates. Applying such a model to information extraction, Riloff (1993) builds in AutoSlog a list of patterns for filling semantic slots in a specific domain, as well as a method for the automatic acquisition of case frames (Riloff and Schmelzenbach, 1998). In the domain of the Air Traveler Information System, Miller et al. (1996) apply statistical methods to compute the probability that a constituent fills a semantic slot within a semantic frame.</Paragraph>
    <Paragraph position="2"> Gildea and Jurafsky (2000, 2002) describe a statistical approach to semantic role labelling using data collected from FrameNet. They investigate the influence of the following features on the identification of a semantic role: phrase type, grammatical function (the relationship of the constituent to the rest of the sentence), position in the sentence, voice and head word, as well as combinations of features. They also describe a model for estimating the probability that a phrase is assigned a specific semantic role.</Paragraph>
    <Paragraph position="3"> The approach we propose for semantic role identification uses information about each verb's behaviour, provided in VerbNet, and the WordNet taxonomy when deciding whether a phrase can be a suitable match for a semantic role.</Paragraph>
    <Paragraph position="4"> VerbNet (Kipper et al., 2000) is a computational verb lexicon, based on Levin's verb classes (Levin, 1993), that contains syntactic and semantic information for English verbs. Each VerbNet class defines a list of members, a list of possible thematic roles, and a list of frames (patterns) of how these semantic roles can be realized in a sentence. WordNet (Fellbaum, 1998) is an English lexical database containing about 120 000 entries of nouns, verbs, adjectives and adverbs, hierarchically organized in synonym groups (called synsets), and linked with relations, such as hypernym, hyponym, holonym and others.</Paragraph>
    <Paragraph position="5"> The algorithm for semantic role identification in a sentence that we propose consists of the following three steps:
1. For each clause in the sentence, identify the main verb and build a sentence pattern using the parse tree;
2. For each verb in the sentence, extract a list of possible semantic frames from VerbNet, together with the selectional restrictions for each semantic role;
3. Match the sentence pattern to each of the available semantic frames, taking into account the semantic roles' constraints.
As a result we are presented with a list of all possible semantic role assignments, from which we have to identify the correct one.</Paragraph>
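    <Paragraph> The three steps above can be sketched as a minimal Python pipeline. The dictionary layout for patterns, classes and frames is invented for this sketch; the real system reads Charniak parse trees and VerbNet class files.

```python
# Self-contained sketch of the three-step algorithm. All data
# structures are illustrative stand-ins, not the system's actual model.

def lookup_frames(verb, verbnet):
    # Step 2: collect the frames of every class the verb belongs to.
    frames = []
    for cls in verbnet.values():
        if verb in cls["members"]:
            frames.extend(cls["frames"])
    return frames

def match_frame(pattern, frame):
    # Step 3: match the constituents before/after the verb to the role
    # slots before/after the verb; fail when there are too few.
    pre, post = frame["before"], frame["after"]
    if len(pre) > len(pattern["before"]) or len(post) > len(pattern["after"]):
        return None
    roles = dict(zip(pre, pattern["before"]))
    roles.update(zip(post, pattern["after"]))
    return roles

def identify_roles(pattern, verbnet):
    # Step 1 (building `pattern` from the parse tree) is assumed done.
    candidates = []
    for frame in lookup_frames(pattern["verb"], verbnet):
        match = match_frame(pattern, frame)
        if match is not None:
            candidates.append(match)
    return candidates  # the correct assignment is then chosen among these

verbnet = {"get-13.5.1": {
    "members": ["buy", "get"],
    "frames": [{"before": ["Agent"], "after": ["Theme"]}]}}
pattern = {"verb": "buy",  # lemma of "bought"
           "before": ["USAir"],
           "after": ["Piedmont", "for 69 dlrs cash per share"]}
```

For the example sentence, the single get-13.5.1 frame yields the assignment Agent = USAir, Theme = Piedmont.
</Paragraph>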
    <Paragraph position="6"> These steps are described in more detail in the following sub-sections.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Constructing sentence patterns for the verbs in a sentence
</SectionTitle>
      <Paragraph position="0"> As mentioned earlier, during the pre-processing stage we produce a parse tree for each sentence using the Charniak parser. From this parse tree, for each clause of the sentence, we construct a sentence pattern, which is a flat parse representation that identifies the main verb and the other main categories of the clause. For example, from the parse tree for the sentence USAir bought Piedmont for 69 dlrs cash per share we construct the following pattern:</Paragraph>
      <Paragraph position="2"> As a sentence can have subordinate clauses, we may have more than one syntactic pattern per sentence. Each such pattern is processed individually.</Paragraph>
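      <Paragraph> A minimal sketch of the pattern construction, with the parse tree encoded as nested tuples; this encoding and the helper name are illustrative, not the parser's actual output format.

```python
# Flatten one clause (an S node) into the main verb plus the category
# labels of the constituents before and after it.
def flatten_clause(tree):
    label, children = tree[0], tree[1:]
    assert label == "S"
    before, verb, after = [], None, []
    for child in children:
        tag = child[0]
        if tag == "VP":
            verb = child[1][1]                 # first VP child is the main verb
            after = [c[0] for c in child[2:]]  # categories after the verb
        elif verb is None:
            before.append(tag)
    return verb, before, after

# "USAir bought Piedmont for 69 dlrs cash per share" (simplified)
tree = ("S",
        ("NP", ("NNP", "USAir")),
        ("VP", ("VBD", "bought"),
               ("NP", ("NNP", "Piedmont")),
               ("PP", ("IN", "for"),
                      ("NP", ("CD", "69"), ("NNS", "dlrs")))))
```

Applied to the example tree, this yields the verb bought with an NP before it and an NP and a PP after it.
</Paragraph>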
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Extracting VerbNet semantic role frames
</SectionTitle>
      <Paragraph position="0"> Each verb can be described in VerbNet as a member of more than one class (for example, the verb make is listed as a member of the verb classes dub-29.3 and build-26.1, each of which corresponds to a different verb sense), and therefore the list of its possible semantic frames is a combination of the semantic frames defined in each of the classes in which it participates. (Currently we do not distinguish between different verb senses and therefore do not process the WordNet sense information attached to each verb class member.)</Paragraph>
      <Paragraph position="1"> We extract all the semantic frames in a class and consider them to be possible semantic frames for each of the verbs that are members of this class. For example, for all the verbs that are members of the VerbNet class get-13.5.1 (including the verb buy) we extract the semantic frames shown in Figure 2.
[Figure 2: the semantic frames extracted for the verbs in class get-13.5.1]
The verb classes also define a list of selectional constraints each semantic role should satisfy. For example, the roles defined in the VerbNet class get-13.5.1 should satisfy the restrictions shown in Figure 3.</Paragraph>
      <Paragraph position="2"> Some frames define additional restrictions local to the frame. In this case the frame-local restrictions are combined with the restrictions defined at the class level.</Paragraph>
      <Paragraph position="3"> [Figure 3: the selectional restrictions defined in class get-13.5.1]</Paragraph>
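      <Paragraph> The extraction step can be sketched as follows, assuming a simplified dictionary layout for VerbNet classes; the restriction values and the merge behaviour are illustrative assumptions, not VerbNet's actual file format.

```python
# Hypothetical layout: each class lists its members, its frames, and its
# class-level selectional restrictions; a frame may add local ones.
def candidate_frames(verb, classes):
    out = []
    for cls in classes:
        if verb not in cls["members"]:
            continue
        for frame in cls["frames"]:
            merged = dict(cls["restrictions"])            # class-level constraints
            merged.update(frame.get("restrictions", {}))  # frame-local additions
            out.append({"roles": frame["roles"], "restrictions": merged})
    return out

get_13_5_1 = {
    "members": ["buy", "get"],
    "frames": [
        {"roles": ["Agent", "V", "Theme"]},
        {"roles": ["Agent", "V", "Theme", "Prep(from)", "Source"],
         "restrictions": {"Source": "+concrete"}},
    ],
    "restrictions": {"Agent": "+animate or +organization"},
}
```

Every frame of every class the verb belongs to becomes a candidate, each carrying the combined restrictions.
</Paragraph>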
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Matching algorithm
</SectionTitle>
      <Paragraph position="0"> The matching algorithm matches the sentence pattern against each of the possible semantic role frames extracted from VerbNet. We independently match the constituents before and after the verb in the sentence pattern to the semantic roles before and after the verb in the semantic role frame.</Paragraph>
      <Paragraph position="1"> If the number of the available constituents in the sentence pattern is less than the number of the required slots in the frame, the match fails.</Paragraph>
      <Paragraph position="2"> If there is more than one constituent available to fill a slot in a semantic frame, they are assigned priorities using heuristic rules. For example, in the cases where we have a choice of a few possible role fillers for the Agent, a higher weight is given to noun phrases, especially if they are marked as proper nouns (NNP) or contain at least one proper noun.</Paragraph>
      <Paragraph position="3"> If, for a semantic frame, we find a constituent for each of the semantic role slots that complies with the selectional constraints, the algorithm considers this a possible match. Currently, if the algorithm returns more than one match, we manually select the best one.</Paragraph>
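      <Paragraph> The priority heuristic for competing fillers can be sketched as a small scoring function. The numeric weights are invented for illustration; only the preference ordering reflects the text: noun phrases over other categories, and NPs containing at least one proper noun highest of all.

```python
# Hypothetical scoring of candidate role fillers. A constituent is a
# (category, POS-tags) pair; higher scores are preferred.
def filler_priority(constituent):
    category, tags = constituent
    score = 0
    if category == "NP":
        score += 2                 # noun phrases are preferred...
        if "NNP" in tags:
            score += 2             # ...especially with a proper noun inside
    return score

def best_filler(candidates):
    return max(candidates, key=filler_priority)
```

For an Agent slot, an NP made of proper nouns thus wins over a plain NP, which in turn wins over a PP.
</Paragraph>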
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 Selectional constraints check
</SectionTitle>
      <Paragraph position="0"> The selectional constraints check verifies if a candidate constituent for a thematic role fulfills the selectional constraints assigned to this role. For example, a common requirement for a constituent to fill the role of Agent is to be of type animate or organization.</Paragraph>
      <Paragraph position="1"> The selectional constraints check is implemented using one or a combination of the following techniques: hypernym relations defined in WordNet, pattern matching techniques, syntactic rules and some heuristics.</Paragraph>
      <Paragraph position="2"> For example, the restriction machine is a type restriction and is fulfilled if the word represented by the constituent is a member of a synset that is a hyponym of the synset containing the word machine.</Paragraph>
      <Paragraph position="3"> Other restrictions, like infinitival and sentential, are resolved only by checking the syntactic parse structure of the parse tree.</Paragraph>
      <Paragraph position="4"> Restrictions such as animate and organization are resolved by applying a combination of the synset hierarchy in WordNet and pre-compiled lists of organization and personal names, and if no satisfactory answer is found, using heuristics to identify if the phrase contains proper nouns.</Paragraph>
      <Paragraph position="5"> We also check for a suitable preposition before the constituent to be matched. For example, for the frame Agent V Topic Prep(to) Recipient the constituent filling the semantic role of Recipient should be a prepositional phrase headed by the preposition to (e.g. Bob said a few words to Mary).</Paragraph>
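      <Paragraph> The type-restriction check can be sketched as a walk up a hypernym chain. The toy hierarchy below stands in for WordNet; the real check consults WordNet synsets rather than single words.

```python
# Toy hypernym chains (word -> its hypernym), standing in for WordNet.
HYPERNYM = {"jet": "airplane", "airplane": "aircraft",
            "aircraft": "vehicle", "vehicle": "machine",
            "pilot": "person", "person": "organism"}

def satisfies_type(word, restriction):
    """True if `word` lies at or below `restriction` in the hierarchy."""
    node = word
    while node is not None:
        if node == restriction:
            return True
        node = HYPERNYM.get(node)
    return False
```

Here jet satisfies the machine restriction (jet is a hyponym of machine via airplane, aircraft and vehicle), while pilot does not.
</Paragraph>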
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Building conceptual graphs
</SectionTitle>
    <Paragraph position="0"> The previous section describes the process of identifying the semantic roles of the constituents in a sentence.</Paragraph>
    <Paragraph position="1"> These roles are used to build a conceptual graph representation of the sentence by applying a series of transformations, starting with more generic concepts and relations and replacing them with more specific ones.</Paragraph>
    <Paragraph position="2"> The conceptual graph is built through the following steps: Step 1 - For each of the constituents of the sentence we build a conceptual graph representation. Each phrase (part of the sentence) is represented by a conceptual graph. This is done recursively by analysing the syntactic structure of the phrase.</Paragraph>
    <Paragraph position="3"> Step 2 - Link all the conceptual graphs representing the constituents into a single graph. All the conceptual graphs built during the previous step are attached to the concept representing the verb, thus creating a conceptual graph representation of the complete sentence.</Paragraph>
    <Paragraph position="4"> Step 3 - Resolve the unknown relations. This step attempts to identify all generic labels assigned during the previous two steps. This is done by using a list of relation correction rules.</Paragraph>
    <Paragraph position="5"> Each of these steps is described in more detail in the following sub-sections.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Building a conceptual graph representation of a phrase
</SectionTitle>
      <Paragraph position="0"> This step involves building a conceptual graph for a phrase. Our general assumption is that each lexeme in the sentence is represented using a separate concept; therefore all nouns, adjectives, adverbs and pronouns are represented using concepts, while the determiners and numbers are used as referents of the relevant concept (thus further specifying the concept).</Paragraph>
      <Paragraph position="1"> Here we will outline the process of building a conceptual graph for a phrase depending on the part of speech category of the phrase.</Paragraph>
      <Paragraph position="2">  The list of some of the most common syntactic patterns for noun phrases is shown in Table 1.</Paragraph>
      <Paragraph position="3">
Syntactic pattern      % AAIU   % Reuters
(1)  NP -> DT NN       20.42%    9.10%
(2)  NP -> NP PP       12.99%   14.17%
(3)  NP -> DT JJ NN     5.32%    2.49%
(4)  NP -> NN           5.18%    4.01%
(5)  NP -> NNP          4.59%    6.09%
(6)  NP -> PRP          3.57%    4.47%
(7)  NP -> NNP NNP      3.22%    2.15%
(8)  NP -> CD NNS       2.88%    1.81%
(9)  NP -> DT NN NN     2.20%    1.17%
(10) NP -> NP SBAR      0.88%    1.29%
Table 1: the most common syntactic patterns for noun phrases
Each of these cases is resolved individually. For example, for pattern (1) we create a concept for the NN with a referent corresponding to the type of the determiner (an existential quantifier referent if the word marked as DT is the, a defined quantifier if it is every, or none if it is a). For pattern (3) we create concepts representing the adjective and the noun and link them by an Attribute relation. Pattern (10) represents phrases where the noun is further specified by the SBAR (for example, The co-pilot, who was acting as a main pilot, landed the plane.). For these patterns a conceptual graph is built for the SBAR, and the head concept, which could be a WHNP phrase (e.g. which or who) or a WHADVP (e.g. where), is replaced by the concept created for the NP (see also Table 3).</Paragraph>
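      <Paragraph> The per-pattern handling can be sketched for patterns (1) and (3). The referent markers and the dictionary encoding of a graph node are invented for this illustration.

```python
# Hypothetical referent markers: "#" for an existential quantifier
# referent ("the"), "@every" for a defined quantifier ("every"),
# None for "a". The markers themselves are stand-ins for this sketch.
DETERMINER_REFERENT = {"the": "#", "every": "@every", "a": None}

def np_to_graph(tokens):
    """tokens: list of (POS-tag, word) pairs for one noun phrase."""
    tags = [t for t, _ in tokens]
    words = [w for _, w in tokens]
    if tags == ["DT", "NN"]:                  # pattern (1): DT NN
        return {"concept": words[1].upper(),
                "referent": DETERMINER_REFERENT.get(words[0]),
                "relations": []}
    if tags == ["DT", "JJ", "NN"]:            # pattern (3): DT JJ NN
        return {"concept": words[2].upper(),
                "referent": DETERMINER_REFERENT.get(words[0]),
                "relations": [("Attribute", words[1].upper())]}
    raise ValueError("pattern not handled in this sketch")
```

For the main pilot, this yields a PILOT concept with an existential referent and an Attribute link to MAIN.
</Paragraph>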
      <Paragraph position="4"> The conceptual graph representation of prepositional phrases, similarly to noun phrases, depends on their syntactic structure. A list of the most common syntactic patterns for prepositional phrases is shown in Table 2.
[Table 2: the most common syntactic patterns for prepositional phrases]
The two most common patterns consist of a preposition followed by a noun phrase. For such prepositional phrases we construct a conceptual graph representing the noun phrase. We also keep track of the preposition heading the prepositional phrase, as it is used to mark the relation between this phrase and the relevant phrases in the rest of the sentence.</Paragraph>
      <Paragraph position="5">  The list of the most common syntactic patterns for phrases representing subordinate clauses (and marked as SBAR) is shown in Table 3.</Paragraph>
      <Paragraph position="6">
Syntactic pattern     % AAIU   % Reuters
(1) SBAR -> IN S      52.76%   24.33%
(2) SBAR -> WHNP S    18.90%   12.57%
(3) SBAR -> WHADVP S  12.60%    2.53%
(4) SBAR -> S          3.94%   56.34%
Table 3: the most common syntactic patterns for subordinate clauses
In all these cases the embedded clause S is treated as an independent sentence, and we recursively create a conceptual graph for it. To link the resulting graph to the main graph we either use a relation whose label is derived from the preposition marked as IN (case (1)), or replace the concept representing the WHNP or WHADVP node with the concept representing the node it refers to.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Attaching all constituents to the verb
</SectionTitle>
      <Paragraph position="0"> After building separate graphs for each of the constituents, we link them together into a single conceptual graph. As each of them describes some aspect of the concept represented by the verb, we link them to that concept. Here we use the term main node to denote the node (concept) in the conceptual graph representing the head of the constituent. We identify the head using syntactic information about the constituent. For example, if the constituent is a noun phrase consisting of a noun phrase followed by a prepositional phrase, its head is the head of the inner noun phrase and the PP is a modifier. Alternatively, if the constituent is a noun phrase that consists of an adjective followed by a noun, the noun is the head and the adjective is a modifier.</Paragraph>
      <Paragraph position="1"> If the constituent already has a semantic role attached to it, the same relation is used when constructing the conceptual graph between the CG representing the constituent and the verb.</Paragraph>
      <Paragraph position="2"> If the constituent does not have any semantic roles attached to it, a relation with a generic label is used. Using a generic type of relation allows us to build the structure of the CG, concentrating on the concepts involved, and to resolve the remaining relations later. If the constituent is not a prepositional phrase (this includes NP, SBAR, etc.), we use a generic label REL.</Paragraph>
      <Paragraph position="3"> If the constituent is a prepositional phrase (PP) headed by a preposition prep, we use a generic label REL prep.</Paragraph>
      <Paragraph position="4"> For example, for the phrase a flight from Dublin we create a concept of a flight and a concept of a city, called Dublin and link them with a generic relation REL from.</Paragraph>
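      <Paragraph> The labelling rule can be sketched as follows; the edge-list encoding of the graph and the constituent dictionary are assumptions of the sketch.

```python
# Attach each constituent's graph to the verb concept, reusing the
# semantic role as the edge label when one was identified, and a
# generic REL / "REL prep" label otherwise.
def attach(verb_concept, constituents):
    edges = []
    for c in constituents:
        if c.get("role"):                      # role identified in Section 3
            label = c["role"]
        elif c.get("category") == "PP":
            label = "REL " + c["preposition"]  # e.g. "REL from"
        else:
            label = "REL"                      # generic, resolved later
        edges.append((verb_concept, label, c["head"]))
    return edges
```

A constituent with a known role keeps that label; a PP without one gets REL plus its heading preposition, as in the flight from Dublin example.
</Paragraph>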
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Resolving unknown relations
</SectionTitle>
      <Paragraph position="0"> This is the final step in the conceptual graph construction, where we resolve the unknown (generic) relations in the conceptual graph.</Paragraph>
      <Paragraph position="1"> We keep a database of the most common syntactic realisations of relations between concepts with specific types. Figure 4 shows some of the relation correction rules we use for the documents in the AAIU corpus. The left part of a rule represents the two concepts linked with a generic relation, while the right side represents this graph after the correction. For example, the first pattern states that if our graph contains concepts Runway and Airport linked with the relation REL at, we replace the relation with Location.</Paragraph>
      <Paragraph position="2">  Building the relation correction rules database is a challenging task. Currently, the process is semi-automated by scanning the corpus for commonly occurring syntactic patterns. Such patterns are then manually evaluated and the semantic relations are identified.</Paragraph>
      <Paragraph position="3"> Here is an example of applying a relation correction rule: for the NP the flight from Dublin, in step 2 we create the conceptual graph [FLIGHT:*a] -> (REL from) -> [City:Dublin]. Using correction rule 3 we substitute the relation REL from with Source to produce the graph [FLIGHT:*a] -> (Source) -> [City:Dublin]. This is a useful approach for resolving relations between nouns, as no such information is available in VerbNet.</Paragraph>
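      <Paragraph> The rule application can be sketched directly from the two examples above (Runway/Airport and the flight from Dublin); the triple encoding of rules and edges is an illustrative assumption.

```python
# Each rule maps (concept, generic relation, type of second concept)
# to a specific relation; unmatched edges keep their generic label.
CORRECTION_RULES = {
    ("RUNWAY", "REL at", "Airport"): "Location",
    ("FLIGHT", "REL from", "City"): "Source",
}

def correct(edges):
    """edges: list of (concept1, relation, concept2) triples."""
    fixed = []
    for c1, rel, c2 in edges:
        c2_type = c2.split(":")[0]   # e.g. "City" from "City:Dublin"
        new_rel = CORRECTION_RULES.get((c1, rel, c2_type), rel)
        fixed.append((c1, new_rel, c2))
    return fixed
```

Applying the rules to the Dublin example replaces REL from with Source, while edges with no matching rule are left for later resolution.
</Paragraph>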
    </Section>
  </Section>
</Paper>