<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-3005">
  <Title>Identifying Perspectives at the Document and Sentence Levels Using Statistical Models</Title>
  <Section position="5" start_page="227" end_page="227" type="metho">
    <SectionTitle>
3 Corpus
</SectionTitle>
    <Paragraph position="0"> Our corpus consists of articles published on the bitterlemons website. The website is set up to &quot;contribute to mutual understanding [between Palestinians and Israelis] through the open exchange of ideas&quot;. Every week an issue concerning the Israeli-Palestinian conflict is selected for discussion, for example, &quot;Disengagement: unilateral or coordinated?&quot;, and a Palestinian editor and an Israeli editor each contribute an article addressing the issue. In addition, the Israeli and Palestinian editors invite or interview one Israeli and one Palestinian to express their views, resulting in a total of four articles in each weekly edition.</Paragraph>
    <Paragraph position="1"> We evaluate the subjectivity of each sentence using the patterns automatically extracted from foreign news documents (Riloff and Wiebe, 2003), and find that 65.6% of Palestinian sentences and 66.2% of Israeli sentences are classified as subjective. The high but almost equal percentages of subjective sentences from the two perspectives support our observation in Section 2 that perspective is largely expressed in subjective language, but that the subjectivity ratio is not necessarily indicative of the perspective of a document.</Paragraph>
  </Section>
  <Section position="6" start_page="227" end_page="228" type="metho">
    <SectionTitle>
4 Statistical Modeling of Perspectives
</SectionTitle>
    <Paragraph position="0"> We approach the problem of learning perspectives in a statistical learning framework. Denote a training corpus as pairs of documents Wn and their perspective labels Dn, n = 1,...,N, where N is the total number of documents in the corpus. Given a new document W̃ with an unknown document perspective D̃, identifying its perspective amounts to calculating the following conditional probability,</Paragraph>
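    <Paragraph position="1"> Written out with the notation just defined (a reconstruction under that notation, not a verbatim copy of the original display), the quantity later referred to as (5) is the probability of the new label given the new document and the labeled training corpus. Pairing each training document with its label as (W_n, D_n) is an assumed grouping:

```latex
P\left(\tilde{D} \mid \tilde{W}, \{(W_n, D_n)\}_{n=1}^{N}\right) \tag{5}
```
</Paragraph>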
    <Paragraph position="2"> We are interested in how strongly each sentence in the document conveys perspective. Denote the intensity of the m-th sentence of the n-th document as a binary random variable Sm,n, m = 1,...,Mn, where Mn is the total number of sentences in the n-th document. Evaluating how strongly a sentence conveys a particular perspective amounts to calculating the following conditional probability,</Paragraph>
    <Paragraph position="4"/>
    <Section position="1" start_page="228" end_page="228" type="sub_section">
      <SectionTitle>
4.1 Document Perspective Models
</SectionTitle>
      <Paragraph position="0"> The process of generating documents from a particular perspective is modeled as follows,</Paragraph>
      <Paragraph position="2"> The model is known as the naïve Bayes (NB) model, which has been widely used for NLP tasks such as text categorization (Lewis, 1998). Calculating (5) under NB in a fully Bayesian manner is, however, complicated, so we instead employ Markov chain Monte Carlo (MCMC) methods to simulate samples from the posterior distributions.</Paragraph>
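      <Paragraph position="3"> To make the document-level classification concrete, the following is a minimal sketch of a naïve Bayes perspective classifier. It is a simplification, not the method described above: it uses add-one smoothed point estimates in place of the full Bayesian treatment with MCMC, and the function names, the smoothing constant alpha, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Estimate log class priors and smoothed per-class log word probabilities."""
    classes = sorted(set(labels))
    vocab = sorted({w for doc in docs for w in doc})
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    word_logp = {}
    for c in classes:
        # Pool the word counts of all training documents labeled c.
        counts = Counter(w for doc, y in zip(docs, labels) if y == c for w in doc)
        total = sum(counts.values()) + alpha * len(vocab)
        word_logp[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return prior, word_logp


def posterior(doc, prior, word_logp):
    """Return the normalized probability of each perspective for one document."""
    scores = {}
    for c in prior:
        # Words outside the training vocabulary are skipped.
        scores[c] = prior[c] + sum(word_logp[c][w] for w in doc if w in word_logp[c])
    top = max(scores.values())  # subtract the maximum for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


# Hypothetical toy corpus: tokenized documents with perspective labels.
toy_docs = [
    ["security", "fence", "terror"],
    ["occupation", "settlement", "wall"],
    ["security", "terror", "attack"],
    ["occupation", "refugee", "settlement"],
]
toy_labels = ["israeli", "palestinian", "israeli", "palestinian"]
model = train_nb(toy_docs, toy_labels)
print(posterior(["security", "terror"], *model))
```

The point estimates stand in for the posterior distributions over the parameters; a fully Bayesian version would average the prediction over samples of those parameters instead.</Paragraph>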
    </Section>
    <Section position="2" start_page="228" end_page="228" type="sub_section">
      <SectionTitle>
4.2 Latent Sentence Perspective Models
</SectionTitle>
      <Paragraph position="0"> We introduce a new binary random variable, S, to model how strongly a perspective is expressed at the sentence level. The value of S is either s1 or s0, where s1 means the sentence is written strongly from a perspective, and s0 means it is not. The whole generative process is modeled as follows,</Paragraph>
      <Paragraph position="2"> π and θ carry the same semantics as those in NB.</Paragraph>
      <Paragraph position="3"> S is naturally modeled as a binary variable, where τ is the parameter of S and represents how likely a perspective is to be strongly expressed in a sentence given the overall document perspective. We call this model the Latent Sentence Perspective Model (LSPM), because S is never directly observed in either training or testing documents and needs to be inferred. Calculating (6) under LSPM is difficult, so we again resort to MCMC methods to simulate samples from the posterior distributions.</Paragraph>
    </Section>
  </Section>
</Paper>