<?xml version="1.0" standalone="yes"?> <Paper uid="W98-0307"> <Title>Meta-discourse markers and problem-structuring in scientific articles</Title> <Section position="5" start_page="44" end_page="44" type="metho"> <SectionTitle> 3 Meta-discourse markers </SectionTitle> <Paragraph position="0"> In this paper we focus on the linguistic realisations of rhetorical moves, i.e. the surface signals of the argumentative status of a given sentence. Consider the strings in boldface on the right-hand side of Figure 8. They cover the activities of reporting about research (reporting and presenting verbs) and the problem-solving process (problems, solutions, tasks); they also include other semantic links such as necessity, causality and contrast. Because of their explicit or implicit argumentation, many of these strings are evaluative ("efficient, elegant, innovative, insightful" vs. "impossible, inadequate, inconclusive, insufficient").</Paragraph> <Paragraph position="1"> We call them meta-comments because they talk about information units, as opposed to being subject matter (scientific content). Such meta-comments are very frequent in our collection.</Paragraph> <Paragraph position="2"> Our meta-comments are similar to Paice's (1981) indicator phrases (he was the first to use such phrases for abstracting); they are less similar to cue phrases, the discourse markers usually studied in discourse analysis, because they are not sentence connectives (with some exceptions), and because they are typically considerably longer and far more varied.</Paragraph> <Paragraph position="3"> The fact that the computational linguistics texts stem from an unmoderated medium (i.e. they are neither chosen for publication nor edited by a central authority) means that there were no external restrictions on how exactly to say things. Authors use an idiosyncratic style, which can vary from formal to informal. There are meta-comments that tend to be used in a fixed, formulaic way, but interestingly, we observed a wide range of linguistic variability in the realisation of some of the meta-comments (whereas their semantics is usually perfectly unambiguous). This effect makes the meta-comments in this text type interesting linguistic objects to study.</Paragraph> <Paragraph position="4"> We observed that certain meta-comments are restricted to certain moves, mostly the evaluative and contrastive phrases and the phrases occurring in moves of type I (Explicit mention). Others occur frequently across moves, particularly general argumentative phrases and relevance markers such as "important" and "in this paper, we". Argumentative phrases like "we argue that" appeared with solution- and problem-related moves almost as often as with claims and conclusions.</Paragraph> <Paragraph position="5"> These phrases seemed to be the ones that were most formulaic/fixed across texts.</Paragraph> <Paragraph position="6"> Our goal is to automatically find meta-markers and associate them with rhetorical moves (where this makes sense). In the next section, we report on a first experiment in that direction.</Paragraph> </Section> <Section position="6" start_page="44" end_page="47" type="metho"> <SectionTitle> 4 Our experiment </SectionTitle> <Paragraph position="0"> If it is true that most meta-comments are formulaic, recurring expressions, then frequency information should help us separate meta-comments from the domain-specific parts of the sentence. Those strings which occur rarely in the corpus will most likely be domain-specific and will appear low on frequency listings of strings, whereas meta-comments should appear high on the lists.</Paragraph>
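The frequency hypothesis amounts to nothing more than ranking sentence-internal n-grams by corpus count. The following is a minimal sketch with hypothetical toy sentences; the tokenisation, n-gram range and cut-off are illustrative assumptions, not the exact procedure used in the experiment:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list (sentence-internal only)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical corpus: a list of pre-tokenised sentences.
sentences = [
    "in this paper we describe a parser".split(),
    "in this paper we present a new tagger".split(),
    "the parser uses a chart".split(),
]

counts = Counter(g for s in sentences for n in (2, 3, 4) for g in ngrams(s, n))

# Formulaic meta-comments ("in this paper we") float to the top of the
# frequency listing; domain-specific strings ("a chart") stay near the bottom.
for phrase, freq in counts.most_common(5):
    print(freq, phrase)
```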
<Paragraph position="1"> We also used a lexicon of 433 lexical seeds. Lexical seeds are words which are semantically related to the activities of reporting, problem-solving, arguing or evaluating, or expressions of deixis ("we...") or other textual cues (e.g. literature references in the text were marked up using the symbol [REF], which is a signal for mentions of other researchers' solutions, tasks or problems).</Paragraph> <Paragraph position="2"> The computational linguistics corpus was drawn from the computation and language archive (http://xxx.lanl.gov/cmp-lg) and contains 123 articles; the 129 articles of the cardiology corpus appeared in the American Heart Journal. The medical corpus is smaller in overall size (436,909 words vs. 654,477; 14,770 sentences vs. 23,072).</Paragraph> <Paragraph position="3"> For the computational linguistics corpus, we additionally had a collection of 948 sentences that had been identified as relevant by a human annotator in a prior experiment (Teufel & Moens 1997). A human judge annotated these with respect to the 23 rhetorical moves introduced in Figure 8.</Paragraph> <Section position="1" start_page="44" end_page="46" type="sub_section"> <SectionTitle> 4.1 Filtering </SectionTitle> <Paragraph position="0"> First, we compiled from the two corpora all bigrams, trigrams, 4-grams, 5-grams and 6-grams which did not cross sentence boundaries. We worked with a short stop-list compiled from the corpus (the 60 most frequent words), from which we had excluded those words which we expected to be important in an argumentative domain, e.g. personal and demonstrative pronouns. We lowercased all words and counted punctuation (including brackets) as a full word.</Paragraph> <Paragraph position="1"> We then filtered the n-grams through our seed lexicon, i.e. we retained those expressions which contain at least one of the words of the seed lexicon (or a morphological variant of it). We also compiled and counted n-grams for the 948 computational linguistics target sentences, to see how similar the phrases from the annotated parts were to the filtered or unfiltered n-grams from the entire corpus.</Paragraph>
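The filtering step can be pictured along the following lines. The seed list, stop-list and prefix-based variant matching below are small illustrative stand-ins for the 433-entry seed lexicon, the 60-word stop-list and whatever morphological matching was actually used:

```python
from collections import Counter

STOP = {"the", "of", "a", "and", "in"}   # stand-in for the 60-word stop-list
SEEDS = {"present", "argue", "problem", "solution", "we", "show"}  # stand-in for the 433 seeds

def seed_variant(token):
    """Crude morphological matching: a token matches a seed if the seed is a
    prefix (so 'presented' and 'arguing' count as variants of their seeds)."""
    return any(token.startswith(seed) for seed in SEEDS)

def keep(ngram):
    """Retain an n-gram only if at least one non-stop token matches a seed."""
    return any(seed_variant(t) for t in ngram.split() if t not in STOP)

# Hypothetical n-gram counts; punctuation counts as a full word, as in the text.
counts = Counter({
    "in this paper we": 212,
    "we have presented": 87,
    ", we argue that": 41,
    "the wall street journal": 35,
})
filtered = {g: f for g, f in counts.items() if keep(g)}
print(filtered)   # the domain phrase "the wall street journal" is dropped
```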
<Paragraph position="2"> The list of 4-grams for the computational linguistics corpus shows a typical picture of the outcome of this process (Figure 2). Before filtering, the frequent corpus n-grams contain general comments and expressions like "for example", but content-specific expressions are already filtered out. (Very rarely, some content-specific phrases like "natural language" appear in the lists; this is because the corpus, even though interdisciplinary in nature, is composed of papers focusing on language.) After filtering, the meta-comments on the lists have two properties: (a) they are frequent, and (b) they contain lexical items that could be related to argumentation about problems, research tasks and solutions. In the computational linguistics corpus, these conditions seem to be enough to produce expressions that are good candidates for meta-comments.</Paragraph> <Paragraph position="3"> Unfortunately, condition (a) means that a large number of meta-comments were lost, because they were of low frequency.</Paragraph> <Paragraph position="4"> How similar are the lists for the annotated text and the entire corpus in the computational linguistics domain? Table 3 shows that they look very similar apart from minor differences, e.g. the list gained from annotated data contains more [CREF] items (internal cross-references like "section [CREF]"), which tend to appear frequently in the sentences where authors state the organisation of the paper. The expressions used in such organisation statements are typically formulaic and reused. In the cardiology corpus, by contrast, frequency alone does less well at separating content matter from meta-discourse: in these lists, there are phrases pertaining to statistical analyses ("p < 0.01") and several domain-specific phrases. Filtering (right-hand side of Figure 4) forces the few meta-comments that are being used to the top of the list; they are linguistically invariant. For instance, "study" seems to be the only acceptable expression used for the current research, whereas the range is much wider in the other corpus ("paper, article, study, work, research..."), and all the meta-comment candidates in the top part of the list belonged to one single rhetorical move, viz. Explicit Mention of the Research Task (Ex-T in Figure 8).</Paragraph> <Paragraph position="5"> A certain amount of noise has been introduced through the seed lexicon because word senses were not disambiguated: "failure" was included in the seed lexicon to indicate mentions of failures of other researchers' solutions. Because this term obviously has the different meaning of "heart failure" in the cardiology context, the desired distinction between subject-matter strings and meta-comments got lost; similarly, "New York" was included because the word "new" could potentially point to novel approaches. This might mean that it is necessary to use different stop-lists and/or seed lexicons for different domains.</Paragraph> <Paragraph position="6"> As we have seen before, associating meta-comments with rhetorical moves is a more difficult task for some meta-comments than for others. We tried to anchor the probable rhetorical move of a phrase in the lexical seed it contains, a simplification we are forced to make due to the small amount of annotated text we have available (which is reflected in the low numbers). We are therefore working on a larger-scale annotation.</Paragraph> <Paragraph position="7"> We used the human judgements to count how often each word contained in the target sentences appears with a certain rhetorical move. If the difference in frequency between the best-scoring moves for that word was large enough, we assumed it was a good indicator for the highest-scoring move, and we then manually associated the given rhetorical move with the word if it was contained in the seed lexicon, or with semantically similar seeds.</Paragraph>
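A sketch of this counting step is given below. The annotated sentences, move labels and margin threshold are hypothetical; the real experiment used the 23 moves of Figure 8 and the 948 annotated target sentences:

```python
from collections import defaultdict

# Hypothetical annotated target sentences: (tokens, rhetorical move).
annotated = [
    ("our method performs better than [REF]".split(), "OWN_SOLUTION_BETTER"),
    ("results are significantly better".split(),      "OWN_SOLUTION_BETTER"),
    ("we argue that this is better".split(),          "CLAIM"),
]

# Count, for each word, how often it co-occurs with each move.
word_move = defaultdict(lambda: defaultdict(int))
for tokens, move in annotated:
    for w in tokens:
        word_move[w][move] += 1

MARGIN = 1  # minimum lead of the best move over the runner-up (assumed value)
for word, moves in word_move.items():
    ranked = sorted(moves.items(), key=lambda kv: -kv[1])
    best, best_n = ranked[0]
    second_n = ranked[1][1] if len(ranked) > 1 else 0
    # Only treat the word as an indicator when the margin is large enough.
    if best_n - second_n >= MARGIN:
        print(f"{word!r} -> {best}")   # e.g. 'than' -> OWN_SOLUTION_BETTER
```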
<Paragraph position="8"> For example, the seeds most likely to be associated with the OWN SOLUTION BETTER move (53 examples of this move in the target sentences) were "than" (39), "better" (36), "results" (21), "method" (19), "using" (15), "significantly" (14), "outperforms" (12) and "more" (12). Filtered meta-comments are then assigned the rhetorical move predicted by the first seed they contain. Figure 5 shows the meta-comments filtered for the seed "better" from both corpora. In the medical corpus, there is less argument about methodology/solutions, and as a result the phrases found are unfortunately not meta-comments but contain medical terminology.</Paragraph> <Paragraph position="9"> Also, we observed that it is not easy to predict the optimal length of a meta-comment which is indicative of a certain rhetorical move. For moves concerning other researchers' problems/solutions/tasks, the very short string "[REF]" is contained in all successful meta-comments, whereas for the explicit mention of research goals, the maximum length of 6 which we chose for these experiments might even be too short. As another example, for the STEP move ("goal is achieved by doing solution"), the best indicator we found was "in order to".</Paragraph> </Section> <Section position="2" start_page="46" end_page="47" type="sub_section"> <SectionTitle> 4.2 Evaluation </SectionTitle> <Paragraph position="0"> We evaluate the quality of these automatically generated meta-comment lists by comparing them to a manually created meta-comment list used by a summarisation system, cf. (Teufel & Moens To Appear). The performance of the system, with each of the two meta-comment lists, is measured by precision and recall values of co-selection with the target extracts defined by the human annotators mentioned earlier. The summarisation process consists of two consecutive steps, sentence extraction and rhetorical classification, and uses other heuristics like location and term frequency.</Paragraph> <Paragraph position="1"> The summarisation system requires a list of meta-comments of arbitrary length, containing for each phrase a quality score which estimates how predictive the phrase is in pointing to extract-worthy sentences, and the most likely rhetorical label for sentences containing it.</Paragraph> [Figure 6: excerpt of the automatically generated meta-comment list, e.g. "this paper presents a", "in order to", "in this paper , we", "unlike [REF]", ", we propose a method", "we argue that", "we show that the", "the wall street journal"] <Paragraph position="2"> We automatically built the meta-comment list in Figure 6 (containing 318 entries). We started from all n-grams compiled from the target sentences and took the following heuristics into account: Firstly, choose phrases with a high ratio of target frequency to corpus frequency, because these are indicative phrases; set the quality value accordingly. Secondly, exclude phrases with a low overall frequency, or decrease their quality score, because including/overestimating them might construct a model that is over-fitted to the data. Thirdly, associate each phrase with its most likely rhetorical move, by taking into account the ratio between the frequency in each rhetorical class and the frequency of the rhetorical label itself; if this falls below a certain threshold, don't associate any move at all (e.g. "paper ," in Figure 6).</Paragraph>
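The three heuristics can be read as a scoring function roughly like the following; the thresholds, the damping factor and the example numbers are assumptions for illustration, not the values used in the experiment:

```python
def score_phrase(target_freq, corpus_freq, move_counts, move_totals,
                 min_freq=5, move_threshold=0.01):
    """Quality score and move label for one candidate phrase.

    target_freq  -- occurrences in the 948 annotated target sentences
    corpus_freq  -- occurrences in the whole corpus
    move_counts  -- {move: freq of the phrase in sentences with that move}
    move_totals  -- {move: total number of sentences carrying that move label}
    """
    # Heuristic 1: indicative phrases have a high target/corpus ratio.
    quality = target_freq / corpus_freq
    # Heuristic 2: damp rare phrases to avoid over-fitting the small sample.
    if corpus_freq < min_freq:
        quality *= 0.5
    # Heuristic 3: pick the move with the highest in-class ratio, or no move
    # at all if even the best ratio falls below the threshold.
    best_move, best_ratio = None, 0.0
    for move, n in move_counts.items():
        ratio = n / move_totals[move]
        if ratio > best_ratio:
            best_move, best_ratio = move, ratio
    if best_ratio < move_threshold:
        best_move = None          # e.g. "paper ," keeps no move label
    return quality, best_move

print(score_phrase(12, 40, {"Ex-T": 9, "CLAIM": 1}, {"Ex-T": 60, "CLAIM": 120}))
# -> (0.3, 'Ex-T')
```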
<Paragraph position="3"> The manual meta-comment list, in contrast, was compiled in an extremely labour-intensive manner and refined over several months. It consists of 1791 meta-comments (some of which are much longer than the maximum of 6 words that the automatic phrases consisted of), along with their most plausible rhetorical moves and quality scores.</Paragraph> <Paragraph position="4"> As Figure 7 shows, using the automatic meta-comment list instead of the manually created one decreased the summariser's performance from 66.4% to 52.5% precision and recall for extraction, and from 66.3% to 54.3% precision and recall for classification.</Paragraph> </Section> </Section> <Section position="7" start_page="47" end_page="47" type="metho"> <SectionTitle> 5 Discussion and further work </SectionTitle> <Paragraph position="0"> The evaluation indicates that the quality of the automatic meta-comment list is not yet high enough to replace the manual list in our summarisation system.</Paragraph> <Paragraph position="1"> However, a look at the automatic list itself shows that, even though it is far from perfect, most of the high-frequency strings found are plausible candidates for meta-comments (or parts of meta-comments). In most cases, subject matter can be successfully filtered out.</Paragraph> <Paragraph position="2"> We regard the simple method for automatic meta-comment identification discussed in this paper as a baseline for further work. We have simplified the problem of finding meta-comments enormously by considering only verbatim substrings. By doing so, we have ignored discontinuous strings, morphological variation and statistical interaction between the words in a string. In addition, the phrases considered so far have been short, and we have not collected many of them, because we wanted to rely only on those with reasonably high frequencies.</Paragraph> <Paragraph position="3"> The biggest problem for now is that highly indicative but infrequent meta-comments cannot be found with a simple method like ours. Therefore, it is essential to perform some generalisation over similar phrases. One way would be the automatic clustering of similar concepts, e.g. to find out that "argue" and "show" are presentational verbs with similar semantics. Another idea would be to allow for more flexible patterns consisting of short n-grams and other words, in order to skip over intervening words like adjectives and adverbs. This might avoid the data-sparseness problems encountered with the longer n-grams.</Paragraph> </Section> <Section position="8" start_page="47" end_page="47" type="metho"> <SectionTitle> 6 Summary </SectionTitle> <Paragraph position="0"> We have presented some baseline results from our ongoing work on the automatic filtering of meta-comments (indicator phrases) from scientific papers.</Paragraph> <Paragraph position="1"> Meta-comments can vary considerably from one domain to another, as the comparison of the two corpora we considered shows.
In the computational linguistics articles, authors argue explicitly about problems, solutions and research tasks, whereas this is less the case in the medical domain, where meta-comments are less frequent and more formulaic.</Paragraph> <Paragraph position="2"> We have shown that lists of meta-comments acquired by a simple automatic process can be used to automatically identify a shallow document structure in scientific text, albeit with a certain quality loss when compared to manually constructed resources.</Paragraph> </Section> </Paper>