File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/00/w00-1302_metho.xml

Size: 27,938 bytes

Last Modified: 2025-10-06 14:07:25

<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1302">
  <Title>What's yours and what's mine: Determining Intellectual Attribution in Scientific Text</Title>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
OTHER WORK:
</SectionTitle>
    <Paragraph position="0"> Recently, Researcher-4 has suggested the following solution to this problem \[...\].</Paragraph>
    <Paragraph position="1"> WEAKNESS/CONTRAST: But this solution cannot be used to interpret the following Japanese examples: \[...\]</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="9" type="metho">
    <SectionTitle>
OWN CONTRIBUTION:
</SectionTitle>
    <Paragraph position="0"> We propose a solution which circumvents this prow while retaining the explanatory power of Researcher-4's approach.</Paragraph>
    <Paragraph position="1"> Figure h Fictional introduction section across different articles. Subject matter, on the contrary, is not constant, nor are writing style and other factors.</Paragraph>
    <Paragraph position="2"> We work with a corpus of scientific papers (80 computational linguistics conference articles (ACL, EACL, COLING or ANLP), deposited on the CMP_LG archive between 1994 and 1996). This is a difficult test bed due to the large variation with respect to different factors: subdomain (theoretical linguistics, statistical NLP, logic programming, computational psycholinguistics), types of research (implementation, review, evaluation, empirical vs. theoretical research), writing style (formal vs. informal) and presentational styles (fixed section structure of type Introduction-Method-Results-Conclusion vs. more idiosyncratic, problem-structured presentation). null One thing, however, is constant across all articles: the argumentative aim of every single article is to show that the given work is a contribution to science (Swales, 1990; Myers, 1992; Hyland, 1998). Theories of scientific argumentation in research articles stress that authors follow well-predictable stages of argumentation, as in the fictional introduction in figure 1.</Paragraph>
    <Paragraph position="3">  Are the scientific statements expressed in this sentence attributed to the authors, the general field, or specific other n work / Other Work Does this sentence contain material that describes the specific aim of the paper? Does this sentence make reference to the external structure of the paper?</Paragraph>
    <Paragraph position="5"> D.~s it describe.a negative aspect of me omer worK, or a contzast or comparison of the own work to it?</Paragraph>
  </Section>
  <Section position="7" start_page="9" end_page="10" type="metho">
    <SectionTitle>
CONTRAST: Does this sentence mention the other work as basis of
</SectionTitle>
    <Paragraph position="0"> or support for own work?  Our hypothesis is that a segmentation based on regularities of scientific argumentation and on attribution of intellectual ownership is one of the most stable and generalizable dimensions which contribute to the structure of scientific texts. In the next section we will describe an annotation scheme which we designed for capturing these effects. Its categories are based on Swales' (1990) CARS model.</Paragraph>
    <Section position="1" start_page="9" end_page="9" type="sub_section">
      <SectionTitle>
1.1 The scheme
</SectionTitle>
      <Paragraph position="0"> As our corpus contains many statements talking about relations between own and other work, we decided to add two classes (&amp;quot;zones&amp;quot;) for expressing relations to the core set of OWN, OTHER and BACKGROUND, namely contrastive statements (CONTRAST; comparable to Swales' (1990) move 2A/B) and statements of intellectual ancestry (BAsis; Swales' move 2D). The label OTHER is thus reserved for neutral descriptions of other work. OWN segments are further subdivided to mark explicit aim statements (AIM; Swales' move 3.1A/B), and explicit section previews (TEXTUAL; Swales' move 3.3). All other statements about the own work are classified as OwN. Each of the seven category covers one sentence.</Paragraph>
      <Paragraph position="1"> Our classification, which is a further development of the scheme in Teufel and Moens (1999), can be described procedurally as a decision tree (Figure 2), where five questions are asked about each sentence, concerning intellectual attribution, author stance and continuation vs. contrast. Figure 3 gives typical example sentences for each zone. The intellectual-attribution distinction we make is comparable with Wiebe's (1994) distinction into subjective and objective statements. Subjectivity is a property which is related to the attribution of authorship as well as to author stance, but it is just one of the dimensions we consider.</Paragraph>
    </Section>
    <Section position="2" start_page="9" end_page="10" type="sub_section">
      <SectionTitle>
1.2 Use of Argumentative Zones
</SectionTitle>
      <Paragraph position="0"> Which practical use would segmenting a paper into argumentative zones have? Firstly, rhetorical information as encoded in these zones should prove useful for summarization. Sentence extracts, still the main type of summarization around, are notoriously contextinsensitive. Context in the form of argumentative relations of segments to the overall paper could provide a skeleton by which to tailor sentence extracts to user expertise (as certain users or certain tasks do not require certain types of information).</Paragraph>
      <Paragraph position="1"> A system which uses such rhetorical zones to produce task-tailored extracts for medical articles, albeit on the basis of manually-segmented texts, is given by Wellons and Purcell (1999).</Paragraph>
      <Paragraph position="2"> Another hard task is sentence extraction from long texts, e.g. scientific journal articles of 20 pages of length, with a high compression. This task is hard because one has to make decisions about how the extracted sentences relate to each other and how they relate to the overall message of the text, before one can further compress them.</Paragraph>
      <Paragraph position="3"> Rhetorical context of the kind described above is very likely to make these decisions easier.</Paragraph>
      <Paragraph position="4"> Secondly, it should also help improve citation indexes, e.g. automatically derived ones like Lawrence et al.'s (1999) and Nanba and Okumura's (1999). Citation indexes help organize scientific online literature by linking cited (outgoing) and citing (incoming) articles with a given text.</Paragraph>
      <Paragraph position="5"> But these indexes are mainly &amp;quot;quantitative&amp;quot;, listing other works without further qualifying whether a reference to another work is there to extend the  AIM &amp;quot;We have proposed a method of clustering words based on large corpus data.&amp;quot; TEXTUAL &amp;quot;Section $ describes three unification-based parsers which are... &amp;quot; OWN &amp;quot;We also compare with the English language and draw some conclusions on the benefits of our approach.&amp;quot; BACKGROUND &amp;quot;Part-of-speech tagging is the process of assigning grammatical categories to individual words in a corpus.&amp;quot; CONTRAST &amp;quot;However, no method for extracting the relationships from superficial linguistic expressions was described in their paper.&amp;quot;  earlier work, correct it, point out a weakness in it, or just provide it as general background. This &amp;quot;qualitative&amp;quot; information could be directly contributed by our argumentative zones.</Paragraph>
      <Paragraph position="6"> In this paper, we will describe the algorithm of an argumentative zoner. The main focus of the paper is the description of two features which are particularly useful for attribution determination: prototypical agents and actions.</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="10" end_page="10" type="metho">
    <SectionTitle>
2 Human Annotation of
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="10" end_page="10" type="sub_section">
      <SectionTitle>
Argumentative Zones
</SectionTitle>
      <Paragraph position="0"> We have previously evaluated the scheme empirically by extensive experiments with three subjects, over a range of 48 articles (Teufel et al., 1999).</Paragraph>
      <Paragraph position="1"> We measured stability (the degree to which the same annotator will produce an annotation after 6 weeks) and reproducibility (the degree to which two unrelated annotators will produce the same annotation), using the Kappa coefficient K (Siegel and Castellan, 1988; Carletta, 1996), which controls agreement P(A) for chance agreement P(E):</Paragraph>
      <Paragraph position="3"> Kappa is 0 for if agreement is only as would be expected by chance annotation following the same distribution as the observed distribution, and 1 for perfect agreement. Values of Kappa surpassing .8 are typically accepted as showing a very high level of agreement (Krippendorff, 1980; Landis and Koch, 1977).</Paragraph>
      <Paragraph position="4"> Our experiments show that humans can distinguish own, other specific and other general work with high stability (K=.83, .79, .81; N=1248; k=2, where K stands for the Kappa coefficient, N for the number of items (sentences) annotated and k for the number of annotators) and reproducibility (K=.78, N=4031, k=3), corresponding to 94%, 93%, 93% (stability) and 93% (reproducibility) agreement.</Paragraph>
      <Paragraph position="5"> The full distinction into all seven categories of the annotation scheme is slightly less stable and reproducible (stability: K=.82, .81, .76; N=1220; k=2 (equiv. to 93%, 92%, 90% agreement); reproducibility: K=.71, N=4261, k=3 (equiv. to 87% agreement)), but still in the range of what is generally accepted as reliable annotation. We conclude from this that humans can distinguish attribution and full argumentative zones, if trained. Human annotation is used as trMning material in our statistical classifier.</Paragraph>
    </Section>
  </Section>
  <Section position="9" start_page="10" end_page="11" type="metho">
    <SectionTitle>
3 Automatic Argumentative
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="10" end_page="11" type="sub_section">
      <SectionTitle>
Zoning
</SectionTitle>
      <Paragraph position="0"> As our task is not defined by topic coherence like the related tasks of Morris and Hirst (1991), Hearst (1997), Kan et al. (1998) and Reynar (1999), we predict that keyword-based techniques for automatic argumentative zoning will not work well (cf. the results using text categorization as described later). We decided to perform machine learning, based on sentential features like the ones used by sentence extraction. Argumentative zones have properties which help us determine them on the surface: * Zones appear in typical positions in the article (Myers, 1992); we model this with a set of location features.</Paragraph>
      <Paragraph position="1"> * Linguistic features like tense and voice correlate with zones (Biber (1995) and Riley (1991) show correlation for similar zones like &amp;quot;method&amp;quot; and &amp;quot;introduction&amp;quot;). We model this with syntactic features.</Paragraph>
      <Paragraph position="2"> * Zones tend to follow particular other zones (Swales, 1990); we model this with an ngram model operating over sentences.</Paragraph>
      <Paragraph position="3"> * Beginnings of attribution zones are linguistically marked by meta-discourse like &amp;quot;Other researchers claim that&amp;quot; (Swales, 1990; Hyland, 1998); we model this with a specialized agents and actions recognizer, and by recognizing formal citations.</Paragraph>
      <Paragraph position="4"> * Statements without explicit attribution are interpreted as being of the same attribution as previous sentences in the same segment of attribution; we model this with a modified agent feature which keeps track of previously recognized agents.</Paragraph>
    </Section>
    <Section position="2" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
3.1 Recognizing Agents and Actions
</SectionTitle>
      <Paragraph position="0"> Paice (1981) introduces grammars for pattern matching of indicator phrases, e.g. &amp;quot;the aim/purpose of this paper/article/study&amp;quot; and &amp;quot;we conclude/propose&amp;quot;. Such phrases can be useful indicators of overall importance. However, for our task, more flexible meta-diiscourse expressions need to be determined. The ,description of a research tradition, or the stateraent that the work described in the paper is the continuation of some other work, cover a wide range of syntactic and lexical expressions and are too hard to find for a mechanism like simple pattern matching.</Paragraph>
      <Paragraph position="1">  We suggest that the robust recognition of prototypical agents and actions is one way out of this dilemma. The agents we propose to recognize describe fixed role-players in the argumentation. In Figure 1, prototypical agents are given in bold-face (&amp;quot;Researchers in knowledge representation, &amp;quot;Researcher-4&amp;quot; and &amp;quot;we&amp;quot;). We also propose prototypical actions frequently occurring in scientific discourse (shown underlined in Figure 1): the researchers &amp;quot;agree&amp;quot;, Researcher-4 &amp;quot;suggested&amp;quot; something, the solution &amp;quot;cannot be used&amp;quot;.</Paragraph>
      <Paragraph position="2"> We will now describe an algorithm which recognizes and classifies agents and actions. We use a manually created lexicon for patterns for agents, and a manually clustered verb lexicon for the verbs. Figure 4 lists the agent types we distinguish. The main three types are US_aGENT, THEM-AGENT and GENERAL.AGENT. A fourth type is US.PREVIOUS_AGENT (the authors, but in a previous paper).</Paragraph>
      <Paragraph position="3"> Additional agent types include non-personal agents like aims, problems, solutions, absence of solution, or textual segments. There are four equivalence classes of agents with ambiguous reference (&amp;quot;this system&amp;quot;), namely REF_US_AGENT, THEM-PRONOUN_AGENT, AIM.-REF-AGENT, REF_AGENT. The total of 168 patterns in the lexicon expands to many more as we use a replace mechanism (@WORK_NOUN is expanded to &amp;quot;paper, article, study, chapter&amp;quot; etc).</Paragraph>
      <Paragraph position="4"> For verbs, we use a manually created the action lexicon summarized in Figure 6. The verb classes are based on semantic concepts such as similarity, contrast, competition, presentation, argumentation and textual structure. For example, PRESENTATION..ACTIONS include communication verbs like &amp;quot;present&amp;quot;, &amp;quot;report&amp;quot;, &amp;quot;state&amp;quot; (Myers, 1992; Thompson and Yiyun, 1991), RE-SEARCH_ACTIONS include &amp;quot;analyze&amp;quot;, &amp;quot;conduct&amp;quot; and &amp;quot;observe&amp;quot;, and ARGUMENTATION_ACTIONS &amp;quot;argue&amp;quot;, &amp;quot;disagree&amp;quot;, &amp;quot;object to&amp;quot;. Domain-specific actions are contained in the classes indicating a problem ( &amp;quot;.fail&amp;quot;, &amp;quot;degrade&amp;quot;, &amp;quot;overestimate&amp;quot;), and solution-contributing actions (&amp;quot; &amp;quot;circumvent', solve&amp;quot;, &amp;quot;mitigate&amp;quot;).</Paragraph>
      <Paragraph position="5"> The main reason for using a hand-crafted, genre-specific lexicon instead of a general resource such as WordNet or Levin's (1993) classes (as used in Klavans and Kan (1998)), was to avoid polysemy problems without having to perform word sense disambiguation. Verbs in our texts often have a specialized meaning in the domain of scientific argumentation, which our lexicon readily encodes.</Paragraph>
      <Paragraph position="6"> We did notice some ambiguity problems (e.g. &amp;quot;follow&amp;quot; can mean following another approach, or it can mean follow in a sense having nothing to do with presentation of research, e.g. following an arc in an algorithm). In a wider domain, however, ambiguity would be a much bigger problem.</Paragraph>
      <Paragraph position="7"> Processing of the articles includes transformation from I~TEX into XML format, recognition of formal citations and author names in running text, tokenization, sentence separation and POStagging. The pipeline uses the TTT software provided by the HCRC Language Technology Group (Grover et al., 1999). The algorithm for determining agents in subject positions (or By-PPs in passive sentences) is based on a finite automaton which uses POS-input; cf. Figure 5.</Paragraph>
      <Paragraph position="8"> In the case that more than one finite verb is found in a sentence, the first finite verb which has agents and/or actions in the sentences is used as a value for that sentence.</Paragraph>
    </Section>
  </Section>
  <Section position="10" start_page="11" end_page="12" type="metho">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> We carried out two evaluations. Evaluation A tests whether all patterns were recognized as intended by the algorithm, and whether patterns were found that should not have been recognized.</Paragraph>
    <Paragraph position="1"> Evaluation B tests how well agent and action recognition helps us perform argumentative zoning automatically.</Paragraph>
    <Section position="1" start_page="11" end_page="12" type="sub_section">
      <SectionTitle>
4.1 Evaluation A: Correctness
</SectionTitle>
      <Paragraph position="0"> We first manually evaluated the error level of the POS-Tagging of finite verbs, as our algorithm crucially relies on finite verbs. In a random sample of 100 sentences from our corpus (containing a total of 184 finite verbs), the tagger showed a recall of  1. Start from the first finite verb in the sentence. 2. Check right context of the finite verb for verbal forms of interest which might make up more complex tenses. Remain within the assumed clause boundaries; do not cross commas or other finite verbs. Once the main verb of that construction (the &amp;quot;semantic&amp;quot; verb) has been found, a simple morphological analysis determines its lemma; the tense and voice of the construction follow from the succession of auxiliary verbs encountered.</Paragraph>
      <Paragraph position="1"> 3. Look up the lemma of semantic verb in Action Lexicon; return the associated Action Class if successful. Else return Action 0.</Paragraph>
      <Paragraph position="2"> 4. Determine if one of the 32 fixed negation words contained in the lexicon (e.g. &amp;quot;not, don't, neither&amp;quot;) is present within a fixed window of 6 to the right of the finite verb. 5. Search for the agent either as a by-PP to the right, or as a subject-NP to the left, depending on the voice of the construction as determined in step 2. Remain within assumed clause boundaries. 6. If one of the Agent Patterns matches within that area in the sentence, return the Agent Type. Else return Agent 0.</Paragraph>
      <Paragraph position="3"> 7. Repeat Steps 1-6 until there are no more finite verbs left.</Paragraph>
      <Paragraph position="4">  we hope to improve our results we argue against a model of we are not aware of attempts our system outperforms ...</Paragraph>
      <Paragraph position="5"> we extend &lt;CITE/&gt;'s algorithm null we tested our system against...</Paragraph>
      <Paragraph position="6"> we follow &lt;REF/&gt; ...</Paragraph>
      <Paragraph position="7"> our approach differs from ...</Paragraph>
      <Paragraph position="8"> we intend to improve ...</Paragraph>
      <Paragraph position="9"> we are concerned with ...</Paragraph>
    </Section>
  </Section>
  <Section position="11" start_page="12" end_page="14" type="metho">
    <SectionTitle>
POSSESSION
</SectionTitle>
    <Paragraph position="0"> this approach, however, lacks...</Paragraph>
    <Paragraph position="1"> we present here a method for.. .</Paragraph>
    <Paragraph position="2"> this approach fails...</Paragraph>
    <Paragraph position="3"> we collected our data from...</Paragraph>
    <Paragraph position="4"> our approach resembles that of we solve this problem by...</Paragraph>
    <Paragraph position="5"> the paper is organize&amp;..</Paragraph>
    <Paragraph position="6"> we employ &lt;REF/&gt; 's method...</Paragraph>
    <Paragraph position="7"> our goal ~ to...</Paragraph>
    <Paragraph position="8"> we have three goals...</Paragraph>
    <Paragraph position="9">  We found that for the 174 correctly determined finite verbs (out of the total 184), the heuristics for negation worked without any errors (100% accuracy). The correct semantic verb was determined in 96% percent of all cases; errors are mostly due to misrecognition of clause boundaries. Action Type lookup was fully correct, even in the case of phrasal verbs and longer idiomatic expressions (&amp;quot;have to&amp;quot; is a NEED..ACTION; &amp;quot;be inspired by&amp;quot; is a, CONTINUE_ACTION). There were 7 voice errors, 2 of which were due to POS-tagging errors (past participle misrecognized). The remaining 5 voice errors correspond to a 98% accuracy. Figure 7 gives an example for a voice error (underlined) in the output of the action/agent determination. Correctness of Agent Type determination was tested on a random sample of 100 sentences containing at least one agent, resulting in 111 agents. No agent pattern that should have been identified was missed (100% recall). Of the 111 agents, 105 cases were completely correct: the agent pattern covered the complete grammatical subject or by-PP intended (precision of 95%). There was one complete error, caused by a POS-tagging error. In 5 of the 111 agents, the pattern covered only part At the point where John &lt;ACTION  of a subject NP (typically the NP in a postmodifying PP), as in the phrase &amp;quot;the problem with these approaches&amp;quot; which was classified as REF_AGENT. These cases (counted as errors) indeed constitute no grave errors, as they still give an indication which type of agents the nominal phrase is associated with.</Paragraph>
    <Section position="1" start_page="13" end_page="14" type="sub_section">
      <SectionTitle>
4.2 Evaluation B: Usefulness for
Argumentative Zoning
</SectionTitle>
      <Paragraph position="0"> We evaluated the usefulness of the Agent and Action features by measuring if they improve the classification results of our stochastic classifier for argumentative zones.</Paragraph>
      <Paragraph position="1"> We use 14 features given in figure 8, some of which are adapted from sentence extraction techniques (Paice, 1990; Kupiec et eL1., 1995; Teufel and Moens, 1999).</Paragraph>
      <Paragraph position="2">  All features except Citation Location and Citation Type proved helpful for classification. Two different statistical models were used: a Naive Bayesian model as in Kupiec et al.'s (1995) experiment, cf. Figure 9, and an ngram model over sentences, cf. Figure 10. Learning is supervised and training examples are provided by our previous human annotation. Classification preceeds sentence by sentence. The ngram model combines evidence from the context (Cm-1, Cm-2) and from I sententiai features (F,~,o...Fmj-t), assuming that those two factors are independent of each other. It uses the same likelihood estimation as the Naive Bayes, but maximises a context-sensitive prior using the Viterbi algorithm. We received best results for n=2, i.e. a bigram model.</Paragraph>
      <Paragraph position="3"> The results of stochastic classification (presented in figure 11) were compiled with a 10-fold cross-validation on our 80-paper corpus, containing a total of 12422 sentences (classified items). As the first baseline, we use a standard text categorization method for classification (where each sentence is considered as a document*) Baseline 1 has an accuracy of 69%, which is low considering that the most frequent category (OWN) also coyerrs 69% of all sentences. Worse still, the classifier classifies almost all sentences as OWN and OTHER segments (the most frequent categories). Recall on the rare categories but important categories AIM, TEXTUAL, CONTRAST and BASIS is zero or very low. Text classification is therefore not a solution. *We used the Rainbow implementation of a Naive Bayes tf/idf method, 10-fold cross-validation.</Paragraph>
      <Paragraph position="4"> Baseline 2, the most frequent category (OWN), is a particularly bad baseline: its recall on all categories except OWN is zero. We cannot see this bad performance in the percentage accuracy values, but only in the Kappa values (measured against one human annotator, i.e. k=2). As Kappa takes performance on rare categories into account more, it is a more intuitive measure for our task.</Paragraph>
      <Paragraph position="5"> In figure 11, NB refers to the Naive Bayes model, and NB+ to the Naive Bayes model augmented with the ngram model. We can see that the stochastic models obtain substantial improvement over the baselines, particularly with respect to precision and recall of the rare categories, raising recall considerably in all cases, while keeping precision at the same level as Baseline 1 or improving it (exception: precision for BASIS drops; precision for AIM is insignificantly lower).</Paragraph>
      <Paragraph position="6"> If we look at the contribution of single features (reported for the Naive Bayes system in figure 12), we see that Agent and Action features improve the overall performance of the system by .02 and .04 Kappa points respectively (.36 to .38/.40).</Paragraph>
      <Paragraph position="7"> This is a good performance for single features.</Paragraph>
      <Paragraph position="8"> Agent is a strong feature beating both baselines.</Paragraph>
      <Paragraph position="9"> Taken by itself, its performance at K=.08 is still weaker than some other features in the pool, e.g.</Paragraph>
      <Paragraph position="10"> the Headline feature (K=.19), the Citation feature (K=.I8) and the Absolute Location Feature (K=.17). (Figure 12 reports classification results only for the stronger features, i.e. those who are better than Baseline 2). The Action feature, if considered on its own, is rather weak: it shows a slightly better Kappa value than Baseline 2, but does not even reach the level of random agreement (K=0). Nevertheless, if taken together with the other features, it still improves results.</Paragraph>
      <Paragraph position="11"> Building on the idea that intellectual attribution is a segment-based phenomena, we improved the Agent feature by including history (feature SAgent). The assumption is that in unmarked sentences the agent of the previous attribution is still active. Wiebe (1994) also reports segment-based agenthood as one of the most successful features.</Paragraph>
      <Paragraph position="12"> SAgent alone achieved a classification success of K=.21, which makes SAgent the best single features available in the entire feature pool. Inclusion of SAgent to the final model improved results to K=.43 (bigram model).</Paragraph>
      <Paragraph position="13"> Figure 12 also shows that different features are better at disambiguating certain categories. The Formulaic feature, which is not very strong on its own, is the most diverse, as it contributes to the disambiguation of six categories directly. Both Agent and Action features disambiguate cate-, gories which many of the other 12 features cannot disambiguate (e.g. CONTRAST), and SAgent additionally contributes towards the determination of BACKGROUND zones (along with the Fo~ulaic and the Absolute Location feature).</Paragraph>
      <Paragraph position="15"> index of sentence (ruth sentence in text) number of features considered target category associated with sentence at index m Probability that sentence rn has target category Cm, given its feature values Fro,o, ..., Fmj-1 and given its context Co, ...C,~-1; Probability that sentence rn has target category C, given the categories of the two previous sentences; Probability of feature-value pair Fj occu~ing within target category C at position m; Probability of feature value Fmj; Figure 10: Bigram Model</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>