<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4037">
  <Title>A Lightweight Semantic Chunking Model Based On Tagging</Title>
  <Section position="3" start_page="9978025" end_page="9978025" type="metho">
    <SectionTitle>
2 Representation of Language
</SectionTitle>
    <Paragraph position="0"> We assume a flat, non-overlapping (or chunked) representation of language at the lexical, syntactic and semantic levels. In this representation a sentence is a sequence of base phrases at a syntactic level. A base phrase is a phrase that does not dominate another phrase. At a semantic level, the chosen predicate has a number of arguments attached to it. The arguments are filled by a sequence of base phrases that span sequences of words tagged with their part of speech. We propose to organize this flat structure in a lexicalized tree as illustrated in Fig 1. The root is the standard non-terminal S lexicalized with the predicate. One level below, arguments attached to the predicate are organized in a flat structure and lexicalized with headwords. The next level is organized in terms of the syntactic chunks spanned by each argument. The lower levels consist of the part of speech tags and the words. The lower level can also be extended to include flat morphological representations of words to deal with morphologically rich languages like Arabic, Korean and Turkish. One can introduce a relatively deeper structure using a small set of rules at the phrasal level under each semantic non-terminal. For example, the application of simple rules in order on THEME's chunks, such as (1) combine flat PP NP into a right branching PP and then (2) combine flat NP with PP into a recursive NP, will result in a relatively deeper tree. Although the main focus of the paper is on the structure presented in Figure 1, we note that a deeper structure obtained by using a small number of simple hand-crafted rules on syntactic chunks (applied in a bottom-up manner) is worthy of further research.</Paragraph>
  </Section>
  <Section position="4" start_page="9978025" end_page="9978025" type="metho">
    <SectionTitle>
3 Model for Tree Decomposition
</SectionTitle>
    <Paragraph position="0"> The tree structure introduced in the preceding section can be generated as a unique sequence of derivation actions in many different ways. We propose a model that decomposes the tree into a sequence of tagging actions at the word, phrase and argument levels. In this model the procedure is a bottom up derivation of the tree that is accomplished in several steps. Each step consists of a number of actions. The first step is a sequence of actions to tag the words with their Part-Of-Speech (POS). Then the words are tagged as inside a phrase (I), outside a phrase (O) or beginning of a phrase (B) (Ramhsaw and Marcus, 1995). For example, in Figure 1, the word For is tagged as B-PP, fiscal is tagged as B-NP, 1989 is tagged as I-NP, etc. This step is followed by a sequence of join actions. A sequence that starts with a B-tag and continues with zero or more I-tags of the same type is joined into a single tag that represents the type of the phrase (e.g. NP, PP etc.). The next step tags phrases as inside an argument, outside an argument or beginning of an argument. Finally, we join IOB argument tags as we did for base phrases.</Paragraph>
    <Paragraph position="1">  with a good g increasing the context window, adding new sentence level and predicate dependent features, and introducing alternate organizations of the input. An alternative to our approach is the W-by-W approach proposed in (Hacioglu and Ward, 2003). We show it below: Here the labeling is carried out in a word-by-word basis. We note that the Phrase-by-Phrase (P-by-P) tagging classifies larger units, ignores some of the words  hree components that are sequentially applied to the t text for a chosen predicate to determine its argus. These components are POS, base phrase and antic taggers/chunkers. In the following, each coment will be described along the dimensions of its (i) t, (ii) decision context, (ii) features, (iv) classifier ) output.</Paragraph>
    <Paragraph position="2"> In the first stage, the input is the sequence of words are processed from left-to-right. The context is ded to be a fixed-size window centered around the n focus. The features are derived from a set of fic features and previous tag decisions that r in the context. A Support Vector Machine ) (Vapnik, 1995) as a multi-class classifier is used el words with their POS tags  . In the second e, the input is the sequence of word/tag pairs. Conis defined in the same way as in the first stage. The ures are the word/tag pairs and previous phrase IOB that appear in the context. An SVM classifier is to classify the base phrase IOB label. This is very ilar to the set up in (Kudo and Matsumato, 2000). In stage (the major contribution of the paper) we p the input, context, features and decisions as wn below.</Paragraph>
    <Paragraph position="3"> The input is the base-phrase labels and headwords g with their part of speech tags and positions in the phrase. The context is -2/+2 window centered at ase phrase in question. An SVM classifies the base to semantic role tags in an IOB representation a context including the two previous semantic tag It is possible to enrich the set of features by  Although not limited to, SVMs are selected because of ability to manage a large number of overlapping features eneralization performance.</Paragraph>
    <Paragraph position="4"> (modifiers), uses effectively a wider linguistic context for a given window size and performs tagging in a smaller number of steps.</Paragraph>
  </Section>
class="xml-element"></Paper>