<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0408">
  <Title>tion with known sentiment terms</Title>
  <Section position="5" start_page="58" end_page="59" type="metho">
    <SectionTitle>
3 Experimental Setup
</SectionTitle>
    <Paragraph position="0"> Our experiments were performed as follows. We started with a small set of manually selected and annotated seed terms: 4 positive and 6 negative. We chose a few more negative seed words because the inherent positive skew in the data makes the identification of negative sentences particularly hard. The terms we used are: positive: negative: There was no tuning of the initial seed set; the 10 words were originally chosen intuitively, as words that we observed frequently when manually inspecting the data.</Paragraph>
    <Paragraph position="1"> We then used these seed terms in two basic ways: (1) We used them as seeds for a Turney-style determination of the semantic orientation of words in the corpus (semantic orientation, or SO, method). As mentioned above, this process is based on the assumption that terms of similar orientation tend to co-occur. (2) We used them to mine sentiment vocabulary from the unlabeled data, using the additional assumption that sentiment terms of opposite orientation tend not to co-occur at the sentence level (sentiment mining, or SM, method). This method yields a set of sentiment terms, but no orientation for those terms. We then apply the SO method to find the semantic orientation of this set of sentiment terms, effectively using SM as a feature selection method for sentiment terminology.</Paragraph>
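The Turney-style SO computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `so_scores`, the sentence-level co-occurrence window, and the choice to treat unseen feature-seed pairs as PMI 0 are all assumptions made for the sake of a runnable example.

```python
import math
from collections import Counter

def so_scores(sentences, pos_seeds, neg_seeds):
    """Semantic orientation from sentence-level co-occurrence:
    SO(f) = sum_s PMI(f, s) over positive seeds minus the same sum
    over negative seeds, with PMI estimated from co-occurrence counts."""
    n = len(sentences)
    term_count = Counter()   # sentences each term appears in
    pair_count = Counter()   # sentences where a term co-occurs with a seed
    seeds = set(pos_seeds) | set(neg_seeds)
    for sent in sentences:
        toks = set(sent)
        term_count.update(toks)
        for t in toks:
            for s in toks & seeds:
                if t != s:
                    pair_count[(t, s)] += 1

    def pmi(f, s):
        joint = pair_count[(f, s)]
        if joint == 0:
            return 0.0  # smoothing assumption: unseen pairs treated as independent
        return math.log2(joint * n / (term_count[f] * term_count[s]))

    return {f: sum(pmi(f, s) for s in pos_seeds) - sum(pmi(f, s) for s in neg_seeds)
            for f in term_count if f not in seeds}
```

On a toy corpus, terms that co-occur with positive seeds receive positive SO scores and terms that co-occur with negative seeds receive negative ones.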
    <Paragraph position="2"> Pseudo-code for the SO and SM approaches is provided in Figure 1 and Figure 2. As a first step for both the SO and SM methods (not shown in the pseudo-code), PMI needs to be calculated for each pair (f, s) of feature f and seed word s over the collection. In the first scenario (using straightforward SO), features F range over all observed features in the data (modulo the aforementioned count cutoff of 10). In the second scenario (SM + SO), features F range over the n% of features with the lowest PMI scores with respect to any of the seed words, as identified by the sentiment mining technique in Figure 2.</Paragraph>
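The SM feature-selection step, taking the n% of features with the lowest PMI to any seed word, might look like the following sketch. The function name, the `(feature, seed)` dictionary representation of precomputed PMI values, and the default of 0.0 for missing pairs are assumptions; the paper's actual pseudo-code is in Figure 2.

```python
def select_sm_features(pmi_table, features, seeds, n_percent):
    """Keep the n% of features with the lowest PMI to any seed word,
    per the SM assumption that sentiment terms of opposite orientation
    tend not to co-occur. pmi_table maps (feature, seed) -> PMI score."""
    ranked = sorted(
        features,
        key=lambda f: min(pmi_table.get((f, s), 0.0) for s in seeds),
    )
    k = max(1, int(len(ranked) * n_percent / 100))
    return ranked[:k]
```

The selected features would then be fed to the SO step to obtain their orientation, as the text describes.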
    <Paragraph position="3"> The result of both SO and SM+SO is a list of unigram features, each with an associated semantic orientation score indicating its sentiment orientation: the higher the score, the more &quot;positive&quot; a term, and vice versa.</Paragraph>
    <Paragraph position="4"> This list of features and associated scores can be used to construct a simple classifier: for each sentence with unknown sentiment, we take the sum of the semantic orientation scores for all of the unigrams in that sentence. This overall score determines the classification of the sentence as positive, neutral, or negative. The two thresholds used in classification need to be determined empirically, taking the distribution of class values in the corpus into account. For our experiments we simply took the distribution of class labels in the 400-sentence development test set as an approximation of the overall class label distribution: 15.5% negative, 21.5% neutral, and 63.0% positive sentences.</Paragraph>
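The simple sum-and-threshold classifier can be sketched in a few lines. The function name and the convention that unseen unigrams contribute a score of 0.0 are assumptions for illustration; `lo` and `hi` stand for the two empirically determined thresholds discussed above.

```python
def classify_sentence(tokens, so, lo, hi):
    """Sum the SO scores of a sentence's unigrams; thresholds lo < hi
    split the score range into negative / neutral / positive."""
    score = sum(so.get(t, 0.0) for t in tokens)  # unseen tokens contribute 0
    if score <= lo:
        return "negative"
    if score >= hi:
        return "positive"
    return "neutral"
```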
    <Paragraph position="5"> Scores for all sentence vectors in the corpus are then collected using the scoring part of the algorithm in Figure 3. The scores are sorted, and the thresholds are determined as the cutoffs for the top 63% and bottom 15.5% of scores, respectively.</Paragraph>
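The percentile-based threshold determination can be sketched as follows, assuming scores are cut so that the bottom 15.5% fall below the negative threshold and the top 63% at or above the positive one. The function name and the exact index arithmetic at the cut points are assumptions; the defaults reflect the class distribution reported above.

```python
def thresholds_from_distribution(scores, neg_frac=0.155, pos_frac=0.63):
    """Pick cutoffs so the bottom neg_frac of sorted scores is classified
    negative and the top pos_frac positive, matching the development-set
    class distribution (15.5% negative, 63.0% positive)."""
    ranked = sorted(scores)
    n = len(ranked)
    lo = ranked[int(n * neg_frac)]        # scores below lo -> negative
    hi = ranked[int(n * (1 - pos_frac))]  # scores at or above hi -> positive
    return lo, hi
```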
  </Section>
</Paper>