<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4037">
  <Title>A Lightweight Semantic Chunking Model Based On Tagging</Title>
  <Section position="5" start_page="9978025" end_page="9978025" type="evalu">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"> All experiments were carried out using Sections 15-18 of the PropBank data, holding out Sections 00 and 23 for development and test, respectively. We used chunklink to flatten the syntactic trees. Then, using the predicate-argument annotation, we obtained a new corpus with the tree structure introduced in Section 2. All SVM classifiers (for POS tagging, syntactic phrase chunking, and semantic argument labeling) were realized using TinySVM with a polynomial kernel of degree 2 and the general-purpose SVM-based chunker YamCha. The results were evaluated using precision and recall along with the F-metric. Table 1 compares the W-by-W and P-by-P approaches. The base features described in Section 4 were used along with two additional predicate-specific features: the lemma of the predicate and a binary feature that indicates whether the word is before or after the predicate. In these experiments the accuracy of the POS tagger was 95.5% and the F-metric of the phrase chunker was 94.5%. The figures in parentheses are for the gold standard (i.e., POS and phrase features derived from hand-annotated trees); the others show the performance of the sequential bottom-up tagging scheme described in Section 4. We experimented with a reduced set of PropBank arguments, containing the 19 most frequent arguments in the corpus.</Paragraph>
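The chunk-level precision, recall, and F-metric used above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes CoNLL-style scoring in which a predicted chunk counts as correct only when its label and both boundaries exactly match a gold chunk. The chunk spans and labels below are hypothetical.

```python
def f_metric(gold_chunks, pred_chunks):
    """Chunk-level precision, recall, and F1.

    Chunks are (label, start, end) tuples; a prediction is correct
    only if label and boundaries match a gold chunk exactly.
    """
    gold = set(gold_chunks)
    pred = set(pred_chunks)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example: two gold argument chunks, two predictions,
# one of which has the right label but the wrong right boundary.
gold = [("A0", 0, 2), ("A1", 4, 6)]
pred = [("A0", 0, 2), ("A1", 4, 5)]
p, r, f = f_metric(gold, pred)  # each is 0.5
```

Note that boundary errors are penalized twice under this scheme: the mismatched prediction both lowers precision and leaves a gold chunk unrecovered, lowering recall.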
    <Paragraph position="1"> It is interesting to note the huge drop in performance for &amp;quot;chunked&amp;quot; semantic analysis compared with the mid-90s performance of the syntactic and lexical analyses. This clearly shows that extracting even the &amp;quot;chunked&amp;quot; semantics of a text is a very difficult task, and much remains to be done to bridge the gap. This is partly due to the difficulty of producing consistent semantic annotations, partly due to missing information/features for word senses and usages, partly due to the absence of world knowledge, and partly due to the relatively small size of the training set. Our other experiments clearly show that with more training data and additional features it is possible to improve performance by 10-15% absolute (Hacioglu et al., 2004). Feature engineering for semantic chunking is open-ended, and its discussion is beyond the scope of this short paper. Here, we have illustrated that the P-by-P approach is a promising alternative to the recently proposed W-by-W approach (Hacioglu and Ward, 2003).</Paragraph>
  </Section>
</Paper>