<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2057">
  <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics. A FrameNet-based Semantic Role Labeler for Swedish</Title>
  <Section position="5" start_page="436" end_page="441" type="metho">
    <SectionTitle>
2 Automatic Annotation of a Swedish Training Corpus
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="436" end_page="436" type="sub_section">
      <SectionTitle>
2.1 Training an English Semantic Role Labeler
</SectionTitle>
      <Paragraph position="0"> We selected the 150 most frequent frames in FrameNet and applied the Collins parser (Collins, 1999) to the example sentences for these frames.</Paragraph>
      <Paragraph position="1"> We built a conventional FrameNet parser for English using 100,000 of these sentences as a training set and 8,000 as a development set. The classifiers were based on Support Vector Machines that we trained using LIBSVM (Chang and Lin, 2001) with the Gaussian kernel. When testing the system, we did not assume that the frame was known a priori. We used the available semantic roles for all senses of the target word as features for the classifier.</Paragraph>
      <Paragraph position="2"> On a test set from FrameNet, we estimated that the system had a precision of 0.71 and a recall of 0.65 using a strict scoring method. The result is slightly lower than the best systems at Senseval-3 (Litkowski, 2004), possibly because we used a larger set of frames, and we did not assume that the frame was known a priori.</Paragraph>
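      <Paragraph> The strict scoring method mentioned above can be made concrete with a small sketch: a predicted frame element counts as correct only if both its span and its role label exactly match a gold annotation. The tuple layout and the toy data below are our own illustration, not the paper's evaluation code.

```python
def strict_prf(gold, predicted):
    """Strict scoring: a predicted frame element is correct only if both
    its span and its semantic role label exactly match a gold annotation.
    Annotations are (sentence_id, start, end, role) tuples (our own
    illustrative format, not the paper's data structures)."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set.intersection(pred_set))
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    return precision, recall

# One prediction has a one-token boundary error, so it counts as wrong
# under strict scoring even though the role label is right.
gold = [(0, 0, 1, "Speaker"), (0, 3, 5, "Message"), (1, 0, 2, "Agent")]
pred = [(0, 0, 1, "Speaker"), (0, 3, 4, "Message"), (1, 0, 2, "Agent")]
p, r = strict_prf(gold, pred)
```

Under this regime a near-miss on the boundary costs both precision and recall, which is why strict scores run lower than partial-overlap scores.</Paragraph>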
    </Section>
    <Section position="2" start_page="436" end_page="437" type="sub_section">
      <SectionTitle>
2.2 Transferring the Annotation
</SectionTitle>
      <Paragraph position="0"> We produced a Swedish-language corpus annotated with FrameNet information by applying the SRL system to the English side of Europarl (Koehn, 2005), which is a parallel corpus that is derived from the proceedings of the European Parliament. We projected the bracketing of the target words and the frame elements onto the Swedish side of the corpus by using the Giza++ word aligner (Och and Ney, 2003). Each word on the English side was mapped by the aligner onto a (possibly empty) set of words on the Swedish side.</Paragraph>
      <Paragraph position="1"> We used the maximal span method to infer the bracketing on the Swedish side, which means that the span of a projected entity was set to the range from the leftmost projected token to the rightmost.</Paragraph>
      <Paragraph position="2"> Figure 2 shows an example of this process.</Paragraph>
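      <Paragraph> The maximal span method can be sketched as follows; the alignment format (source token index mapped to a set of target token indices) is an assumed simplification of Giza++ output.

```python
def project_span(src_span, alignment):
    """Maximal span projection: gather every Swedish token aligned to a
    token inside the English span and bracket from the leftmost to the
    rightmost of them. `alignment` maps each source token index to a
    (possibly empty) set of target token indices."""
    start, end = src_span
    targets = [t for s in range(start, end + 1) for t in alignment.get(s, ())]
    if not targets:
        return None   # no counterpart found; such sentences are discarded
    return (min(targets), max(targets))

# Toy alignment: English tokens 3-5 touch Swedish tokens 2 and 4,
# so the projected frame element spans Swedish tokens 2..4.
alignment = {0: {0}, 1: set(), 3: {2}, 5: {4}}
span = project_span((3, 5), alignment)
```

Note that the maximal span silently absorbs unaligned Swedish tokens lying between the leftmost and rightmost projected tokens, which is exactly what makes the heuristics of the next paragraphs necessary.</Paragraph>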
      <Paragraph position="3"> To make the brackets conform to the FrameNet annotation practices, we applied a small set of heuristics. The FrameNet conventions specify that linking words such as prepositions and subordinating conjunctions should be included in the bracketing. However, since constructions are not isomorphic in the sentence pair, a linking word on the target side may be missed by the projection method because it is not present on the source side.</Paragraph>
      <Paragraph position="4"> For example, the sentence the doctor was answering an emergency phone call is translated into Swedish as doktorn svarade på ett larmsamtal, which uses a construction with a preposition på 'to/at/on' that has no counterpart in the English sentence. The heuristics that we used are specific to Swedish, although they would probably be very similar for any other language that uses a similar set of prepositions and connectives, i.e. most European languages.</Paragraph>
      <Paragraph position="6"> We used the following heuristics: * When there was only a linking word (preposition, subordinating conjunction, or infinitive marker) between the FE and the target word, it was merged with the FE.</Paragraph>
      <Paragraph position="7"> * When a Swedish FE was preceded by a linking word and the corresponding English FE started with such a word, the linking word was merged with the FE.</Paragraph>
      <Paragraph position="8"> * We used a chunker and adjusted the FE brackets to include only complete chunks.</Paragraph>
      <Paragraph position="9"> * When a Swedish FE crossed the target word, we used only the part of the FE that was on the right side of the target.</Paragraph>
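      <Paragraph> As an illustration of the first heuristic, here is a minimal sketch over token indices; the part-of-speech tag names (PREP, SUB_CONJ, INF_MARKER) are hypothetical, not the tagset used in the paper.

```python
# Hypothetical tag names; the paper does not specify its tagset.
LINKING_POS = {"PREP", "SUB_CONJ", "INF_MARKER"}

def merge_linking_word(fe_span, target_index, pos_tags):
    """First heuristic: if exactly one linking word stands between the
    FE and the target word, extend the FE over it, since FrameNet
    bracketing conventions include such words in the FE."""
    start, end = fe_span
    if start == target_index + 2 and pos_tags[target_index + 1] in LINKING_POS:
        return (target_index + 1, end)    # FE follows the target
    if end == target_index - 2 and pos_tags[target_index - 1] in LINKING_POS:
        return (start, target_index - 1)  # FE precedes the target
    return fe_span

# 'doktorn svarade på ett larmsamtal': the FE 'ett larmsamtal'
# (tokens 3-4) is separated from the target 'svarade' (token 1)
# only by the preposition 'på' (token 2), which is merged in.
pos = ["NOUN", "VERB", "PREP", "DET", "NOUN"]
```

Applied to the example, the projected FE (3, 4) grows to (2, 4), bringing på inside the bracket as the conventions require.</Paragraph>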
      <Paragraph position="10"> In addition, some bad annotations were discarded: we obviously could not use sentences where no counterpart for the target word could be found. Additionally, we used only the sentences where the target word was mapped to a noun, verb, or adjective on the Swedish side.</Paragraph>
      <Paragraph position="11"> Because of homonymy and polysemy, applying an SRL system without knowing target words and frames a priori necessarily introduces noise into the automatically created training corpus. There are two kinds of word sense ambiguity that are problematic in this case: "internal" ambiguity, i.e. the fact that there may be more than one frame for a given target word; and "external" ambiguity, where frequently occurring word senses are not listed in FrameNet. To sidestep the problem of internal ambiguity, we used the available semantic roles for all senses of the target word as features for the classifier (as described above). Solving the problem of external ambiguity was outside the scope of this work.</Paragraph>
      <Paragraph position="12"> Some potential target words had to be ignored since their sense ambiguity was too difficult to overcome. This category includes auxiliaries such as be and have, as well as verbs such as take and make, which frequently appear as support verbs for nominal predicates.</Paragraph>
    </Section>
    <Section position="3" start_page="437" end_page="438" type="sub_section">
      <SectionTitle>
2.3 Motivation
</SectionTitle>
      <Paragraph position="0"> Although the meaning of the two sentences in a sentence pair in a parallel corpus should be roughly the same, a fundamental question is whether it is meaningful to project semantic markup of text across languages. Equivalent words in two different languages sometimes exhibit subtle but significant semantic differences. However, we believe that a transfer makes sense, since the nature of FrameNet is rather coarse-grained. Even though the words that evoke a frame may not have exact counterparts, it is probable that the frame itself does.</Paragraph>
      <Paragraph position="1"> For the projection method to be meaningful, we must make the following assumptions: * The complete frame ontology in the English FrameNet is meaningful in Swedish as well, and each frame has the same set of semantic roles and the same relations to other frames.</Paragraph>
      <Paragraph position="2"> * When a target word evokes a certain frame in English, it has a counterpart in Swedish that evokes the same frame.</Paragraph>
      <Paragraph position="3"> * Some of the FEs on the English side have counterparts with the same semantic roles on the Swedish side.</Paragraph>
      <Paragraph position="4">  In addition, we made the (obviously simplistic) assumption that the contiguous entities we project are also contiguous on the target side.</Paragraph>
      <Paragraph position="5"> These assumptions may all be put into question. Above all, the second assumption will fail in many cases because the translations are not literal, which means that the sentences in the pair may express slightly different information. The third assumption may be invalid if the information expressed is realized by radically different constructions, which means that an argument may belong to another predicate or change its semantic role on the Swedish side. Padó and Lapata (2005) avoid this problem by using heuristics based on a target-language FrameNet to select sentences that are close in meaning. Since we have no such resource to rely on, we are forced to accept that this problem introduces a certain amount of noise into the automatically annotated corpus.</Paragraph>
      <Paragraph position="6"> 3 Training a Swedish SRL System
Using the transferred FrameNet annotation, we trained an SRL system for Swedish text. Like most previous systems, it consists of two parts: an FE bracketer and a classifier that assigns semantic roles to FEs. Both parts are implemented as SVM classifiers trained using LIBSVM. The semantic role classifier is rather conventional and is not described in this paper.</Paragraph>
      <Paragraph position="7"> To construct the features used by the classifiers, we used the following tools:  We constructed shallow parse trees using the clause trees and the chunks. Dependency and shallow parse trees for a fragment of a sentence from our test corpus are shown in Figures 3 and 4, respectively. This sentence, which was translated from an English sentence that read the doctor was answering an emergency phone call, comes from the English FrameNet example corpus.</Paragraph>
      <Paragraph position="8"> doktorn svarade pa ett larmsamtal</Paragraph>
    </Section>
    <Section position="4" start_page="438" end_page="441" type="sub_section">
      <SectionTitle>
3.1 Frame Element Bracketing Methods
</SectionTitle>
      <Paragraph position="0"> We created two redundancy-based FE bracketing algorithms based on binary classification of chunks as starting or ending the FE. This is somewhat similar to the chunk-based system described by Pradhan et al. (2005a), which uses a segmentation strategy based on IOB2 bracketing. However, our system still exploits the dependency parse tree during classification.</Paragraph>
      <Paragraph position="1"> We first tried the conventional approach to the problem of FE bracketing: applying a parser to the sentence, and classifying each node in the parse tree as being an FE or not. We used a dependency parser since there is no constituent-based parser available for Swedish. This proved unsuccessful because the spans of the dependency subtrees frequently were incompatible with the spans defined by the FrameNet annotations. This was especially the case for non-verbal target words and when the head of the argument was above the target word in the dependency tree. To be usable, this approach would require some sort of transformation, possibly a conversion into a phrase-structure tree, to be applied to the dependency trees to align the spans with the FEs. Preliminary investigations were unsuccessful, and we left this to future work.</Paragraph>
      <Paragraph position="2"> We believe that the methods we developed are more suitable in our case, since they base their decisions on several parse trees (in our case, two clause-chunk trees and one dependency tree). This redundancy is valuable because the dependency parsing model was trained on a treebank of just 100,000 words, which makes it less robust than Collins' or Charniak's parsers for English. In addition, the methods do not implicitly rely on the common assumption that every FE has a counterpart in a parse tree. Recent work in semantic role labeling, see for example Pradhan et al. (2005b), has focused on combining the results of SRL systems based on different types of syntax. Still, all  systems exploiting recursive parse trees are based on binary classification of nodes as being an argument or not.</Paragraph>
      <Paragraph position="3"> The training sets used to train the final classifiers consisted of one million training instances for the start classifier, 500,000 for the end classifier, and 272,000 for the role classifier. The features used by the classifiers are described in Subsection 3.2, and the performance of the two FE bracketing algorithms is compared in Subsection 4.2.</Paragraph>
      <Paragraph position="4"> The first FE bracketing algorithm, the greedy start-end method, proceeds through the sequence of chunks in one pass from left to right. For each chunk opening bracket, a binary classifier decides if an FE starts there or not. Similarly, another binary classifier tests chunk end brackets for ends of FEs. To ensure compliance with the FrameNet annotation standard (matching brackets and no FE crossing the target word), the algorithm inserts additional end brackets where appropriate. Pseudocode is given in Algorithm 1.</Paragraph>
      <Paragraph position="5"> Algorithm 1 Greedy Bracketing
Input: A list L of chunks and a target word t; binary classifiers starts and ends
Output: The sets S and E of start and end brackets
Split L into the sublists Lbefore, Ltarget, and Lafter, which correspond to the parts of the list before, at, and after the target word, respectively.
Initialize chunk-open to FALSE
for Lsub in {Lbefore, Ltarget, Lafter} do
  for c in Lsub do
    if starts(c) then
      if chunk-open then
        Add an end bracket before c to E</Paragraph>
      <Paragraph position="7"> Figure 5 shows an example of this algorithm, applied to the example fragment. The small brackets correspond to chunk boundaries, and the large brackets to FE boundaries that the algorithm inserts. In the example, the algorithm inserts an end bracket after the word doktorn 'the doctor', since no end bracket was found before the target word svarade 'was answering'.</Paragraph>
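      <Paragraph> The greedy start-end method can be sketched in Python as follows; the classifier interfaces are simplified to plain predicates over chunks, and the bracket-repair details of the truncated pseudocode above are our reading of the prose description.

```python
def greedy_bracketing(chunks, target, starts, ends):
    """Greedy start-end bracketing (Algorithm 1), sketched in Python.
    `chunks` is the chunk list, `target` the index of the chunk holding
    the target word, and `starts`/`ends` stand in for the two binary
    classifiers. Returns the sets S and E of chunk indices where FEs
    start and end."""
    S, E = set(), set()
    # The chunk list is split into the parts before, at, and after the
    # target word; an FE is never allowed to stay open across a boundary,
    # so no FE crosses the target.
    sublists = [range(0, target),
                range(target, target + 1),
                range(target + 1, len(chunks))]
    fe_open = False
    for sub in sublists:
        for c in sub:
            if starts(chunks[c]):
                if fe_open:
                    E.add(c - 1)   # close the pending FE before c
                S.add(c)
                fe_open = True
            if fe_open and ends(chunks[c]):
                E.add(c)
                fe_open = False
        if fe_open:                # insert the missing end bracket
            E.add(sub[-1])
            fe_open = False
    return S, E

# The running example: an end bracket is inserted after 'doktorn'
# because none was predicted before the target 'svarade'.
chunks = ["doktorn", "svarade", "på ett larmsamtal"]
starts = lambda c: c in {"doktorn", "på ett larmsamtal"}
ends = lambda c: c == "på ett larmsamtal"
S, E = greedy_bracketing(chunks, 1, starts, ends)
```

The end-of-sublist repair is what realizes both constraints at once: brackets always match, and no FE spans the target word.</Paragraph>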
      <Paragraph position="8"> The second algorithm, the globally optimized start-end method, maximizes a global probability score over each sentence. For each chunk opening and closing bracket, probability models assign the probability that an FE starts (or ends, respectively) at that chunk. The probabilities are estimated using the built-in sigmoid fitting methods of LIBSVM. Making the somewhat unrealistic assumption that the brackets are independent, the global probability score to maximize is defined as the product of all start and end probabilities. We added a set of constraints to ensure that the segmentation conforms to the FrameNet annotation standard. The constrained optimization problem is then solved using the JaCoP finite domain constraint solver (Kuchcinski, 2003). We believe that an n-best beam search method would produce similar results. The pseudocode for the method can be seen in Algorithm 2. The definitions of the predicates no-nesting and no-crossing, which should be obvious, are omitted.</Paragraph>
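      <Paragraph> As a sketch of the globally optimized start-end method, the following brute-force search maximizes the same product of probabilities over legal bracketings only; the paper uses a finite domain constraint solver instead, and our legality predicate is an assumed rendering of the no-nesting and no-crossing constraints.

```python
from itertools import product

def legal(s, e, target):
    """Assumed legality check: brackets match pairwise, FEs never nest,
    and no FE crosses the target chunk (an FE confined to the target
    chunk itself is allowed)."""
    depth = 0
    for c in range(len(s)):
        if s[c]:
            depth += 1
            if depth > 1:
                return False        # nested FEs
        if c == target and depth > 0 and not (s[c] and e[c]):
            return False            # an FE crosses the target
        if e[c]:
            depth -= 1
            if depth == -1:
                return False        # unmatched end bracket
    return depth == 0

def best_bracketing(p_start, p_end, target):
    """Pick start/end brackets per chunk maximizing the product of the
    classifiers' start/end probabilities over legal bracketings only.
    Exhaustive search; the paper solves the same constrained problem
    with a finite domain constraint solver."""
    n = len(p_start)
    best, best_score = None, -1.0
    for s in product([0, 1], repeat=n):
        for e in product([0, 1], repeat=n):
            if not legal(s, e, target):
                continue
            score = 1.0
            for c in range(n):
                score *= p_start[c] if s[c] else 1 - p_start[c]
                score *= p_end[c] if e[c] else 1 - p_end[c]
            if score > best_score:
                best, best_score = (s, e), score
    return best, best_score

# Toy probabilities for three chunks with the target at index 1:
# the best legal bracketing puts one FE on each side of the target.
(s, e), score = best_bracketing([0.9, 0.0, 0.9], [0.9, 0.0, 0.9], target=1)
```

Exhaustive search is exponential in the number of chunks, which is why a constraint solver (or the n-best beam search mentioned above) is the practical choice; the sketch behaves the same on short sentences.</Paragraph>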
      <Paragraph position="9"> In the example of the globally optimized start-end method, the global probability score is maximized by a bracketing that is illegal because the FE starting at doktorn is not closed before the target (0.8 * 0.6 * 0.6 * 0.7 * 0.8 * 0.7 = 0.11). The solution of the constrained problem is a bracketing that contains an end bracket before the target (0.8 * 0.4 * 0.6 * 0.7 * ...).
Most of the features that we use have been used by almost every system since the first well-known description (Gildea and Jurafsky, 2002). The following features are used by all classifiers:  In addition, all classifiers use the set of allowed semantic role labels as a set of boolean features. This is needed to constrain the output to a label that is allowed by FrameNet for the current frame. This feature has also proven useful for the FE bracketing classifiers to distinguish between event-type and object-type frames. For event-type frames, dependencies are often long-distance, while for object-type frames, they are typically restricted to chunks very near the target word. The part of speech of the target word alone is not enough to distinguish these two classes, since many nouns belong to event-type frames.</Paragraph>
      <Paragraph position="10"> For the phrase/chunk type feature, we use slightly different values for the bracketing case and the role assignment case: for bracketing, the value of this feature is simply the type of the current chunk; for classification, it is the type of the largest chunk or clause that starts at the leftmost token of the FE. For prepositional phrases, the preposition is attached to the phrase type (for example, the second FE in the example fragment starts with the preposition på 'at/on', which causes the value of the phrase type feature to be PP-på).</Paragraph>
      <Paragraph position="11"> Similarly to the chunk-based PropBank argument bracketer described by Pradhan et al. (2005a), the start-end methods use the head word, head POS, and chunk type of chunks in a window of size 2 on both sides of the current chunk to classify it as being the start or end of an FE.</Paragraph>
      <Paragraph position="12"> Parse tree path features have been shown to be very important for argument bracketing in several studies. All classifiers used here use a set of such features: * Dependency tree path from the head to the target word. In the example text, the first chunk (consisting of the word doktorn) has the value SUB-↑ for this feature. This means that to go from the head of the chunk to the target in the dependency graph (Figure 3), you traverse a SUB (subject) link upwards.</Paragraph>
      <Paragraph position="13"> Similarly, the last chunk (ett larmsamtal) has the value PR-↑-ADV-↑.</Paragraph>
      <Paragraph position="14"> * Shallow path from the chunk containing the head to the target word. For the same chunks as above, these values are both NG_nom-↑-Clause-↓-VG_fin, which means that to traverse the shallow parse tree (Figure 4) from the chunk to the target, you start at an NG_nom node, go upwards to a Clause node, and finally down to the VG_fin node.</Paragraph>
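      <Paragraph> A sketch of the dependency path feature: we render upward and downward steps with ↑ and ↓, and the head-array tree encoding and label names are our own illustration of the example analysis.

```python
def dep_path(heads, labels, source, target):
    """Dependency tree path feature, sketched. `heads[i]` is the head of
    token i (-1 for the root) and `labels[i]` the label of the arc from
    token i to its head. Upward steps are written ↑, downward steps ↓."""
    def ancestors(i):
        chain = [i]
        while heads[chain[-1]] != -1:
            chain.append(heads[chain[-1]])
        return chain
    up, down = ancestors(source), ancestors(target)
    common = next(i for i in up if i in down)   # lowest common ancestor
    path = "".join(labels[i] + "↑" for i in up[:up.index(common)])
    path += "".join(labels[i] + "↓" for i in reversed(down[:down.index(common)]))
    return path

# Assumed analysis of 'doktorn svarade på ett larmsamtal':
# svarade is the root, doktorn its SUB, på its ADV, larmsamtal PR under på.
heads = [1, -1, 1, 4, 2]
labels = ["SUB", "ROOT", "ADV", "DET", "PR"]
```

With this encoding, the path from doktorn (token 0) to the target svarade (token 1) is a single upward SUB step, and the path from larmsamtal climbs PR and then ADV, mirroring the feature values quoted above.</Paragraph>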
      <Paragraph position="15"> The start-end classifiers additionally use the full set of paths (dependency and shallow paths) to the target word from each node starting (or ending, respectively) at the current chunk, and the greedy end classifier also uses the path from the current chunk to the start chunk.</Paragraph>
    </Section>
  </Section>
</Paper>