<?xml version="1.0" standalone="yes"?>
<Paper uid="E99-1033">
  <Title>Investigating NLG Architectures: taking style into consideration</Title>
  <Section position="3" start_page="237" end_page="237" type="metho">
    <SectionTitle>
2 Investigating Style
</SectionTitle>
    <Paragraph position="0"> We use the term style to signify the variability in the use of features of a language that can be correlated with certain types of situation, where situation can be regarded &quot;as the context within which interaction of 'the speech event' occurs&quot; (Brown and Fraser, 1979; p. 34), involving the participants, the setting and the purposes of the communication.</Paragraph>
    <Paragraph position="1"> In order to put this definition to work on our problem, we first need to know how to obtain the set of stylistic parameters, i.e., the parameters that, when varied, are responsible for producing texts in different styles. Second, we need a way to group stylistically similar texts into sets, so that we can study the architectural aspects of each set in isolation and then obtain a more general view of how style and NLG architectures interact by cross-comparing those isolated analyses.</Paragraph>
    <Paragraph position="2"> 5 &quot;The specific pragmatic [and stylistic] features used by PAULINE are but a first step. (...) The strategies PAULINE uses to link its pragmatic [and stylistic] features to the actual generator decisions, being dependent on the definitions of the features, are equally primitive&quot; (Hovy, 1990; p. 193).</Paragraph>
    <Paragraph position="3"> For the first part, there are two approaches in the literature that link stylistic parameters to characteristics of texts. We have already cited Hovy (1988) and the main problem with his approach: its lack of formality. DiMarco's (1990) approach was to construct a 'stylistic grammar' using the notion of norm and deviation from the norm (see, e.g., Enkvist (1973)). While this approach is sufficient to obtain the stylistic parameters 6, we think that this characterisation poses a problem for grouping texts into sets: it can create only two sets to study, the texts that conform to the norm and those that do not. This is a problem because, although the texts that follow the norm will form a set of similar texts, those in the 'deviant' set can be so dissimilar from one another that any analysis based on them (and, consequently, its interpretation in architectural terms) is probably doomed to failure.</Paragraph>
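A toy numerical illustration of this concern may help. It uses made-up scores on two hypothetical stylistic parameters, with the norm placed at the origin and an assumed radius deciding what counts as conforming: the conforming texts form a compact set, while texts that deviate in different directions all fall into one 'deviant' set whose members are far from each other.

```python
import itertools
import math


def mean_pairwise_distance(points):
    """Average Euclidean distance over all pairs of points."""
    pairs = list(itertools.combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def split_norm_deviant(texts, norm, radius):
    """Two-way split: texts within `radius` of the norm versus all the others."""
    conforming = [t for t in texts if radius >= math.dist(t, norm)]
    deviant = [t for t in texts if math.dist(t, norm) > radius]
    return conforming, deviant


# Made-up scores on two hypothetical stylistic parameters; the norm is the origin.
TEXTS = [(0.1, 0.2), (-0.2, 0.1), (0.0, -0.1),   # close to the norm
         (3.0, 0.0), (-3.0, 0.0), (0.0, 3.0)]    # deviating in different directions

if __name__ == "__main__":
    conforming, deviant = split_norm_deviant(TEXTS, (0.0, 0.0), radius=1.0)
    print("conforming spread:", mean_pairwise_distance(conforming))
    print("deviant spread:", mean_pairwise_distance(deviant))
```

The deviant set's average internal distance is many times that of the conforming set, even though both are treated as a single group by the norm/deviation split.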
    <Paragraph position="4"> In our approach we avoid these problems by following a methodology that provides two things: (1) a characterisation of the stylistic parameters of a corpus 7, and (2) a partition of a corpus into sets of linguistically similar texts. The methodology we adopt is Biber's (1988).</Paragraph>
    <Paragraph position="5"> From a corpus tagged with a comprehensive set of linguistic features of English, we obtain the frequency count of each feature 8. Using a statistical factor analysis, we group the linguistic features that co-occur, considering each group</Paragraph>
  </Section>
  <Section position="4" start_page="237" end_page="237" type="metho">
    <SectionTitle>
6 DiMarco refers to them as 'stylistic goals'.
7 Our corpus comprises texts written for two different
</SectionTitle>
    <Paragraph position="0"> audiences (patients and doctors) of more than 250 pharmaceutical products; in total, the corpus contains more than 500 texts.</Paragraph>
    <Paragraph position="1"> 8 There are two levels of tagging. First, the corpus is tagged using Brill's (1994) tagger. Second, programs that count specific configurations of tags are run over the tagged corpus. The process is fully automatic.</Paragraph>
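The second tagging level can be sketched as follows. This is a minimal illustration, not the programs described above: the three feature detectors (conjuncts from a small word list, modal verbs, and a form of BE followed by a past participle as a rough passive), the word lists, and the normalisation to a rate per 1,000 tokens are all assumptions made for the example. The input is a list of (word, tag) pairs with Penn Treebank tags, the tag set used by Brill's tagger.

```python
from collections import Counter

# Hypothetical word lists; a real feature inventory would be far larger.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}
CONJUNCTS = {"however", "therefore", "thus", "furthermore", "consequently"}


def count_features(tagged):
    """Count illustrative tag configurations in one tagged text.

    tagged: list of (word, Penn Treebank tag) pairs; punctuation tokens
    are counted in the total, as a simplification.
    Returns a dict mapping feature name to its rate per 1,000 tokens
    (features that never occur are simply absent).
    """
    tokens = [(word.lower(), tag) for word, tag in tagged]
    counts = Counter()
    for word, tag in tokens:
        if word in CONJUNCTS:
            counts["conjunct"] += 1
        if tag == "MD":  # modal verb
            counts["modal"] += 1
    # Two-token configuration: a form of BE immediately followed by a past participle.
    for (word, _), (_, next_tag) in zip(tokens, tokens[1:]):
        if word in BE_FORMS and next_tag == "VBN":
            counts["passive"] += 1
    n = len(tokens) or 1  # avoid division by zero on empty input
    return {feature: 1000 * c / n for feature, c in counts.items()}


if __name__ == "__main__":
    text = [("The", "DT"), ("tablets", "NNS"), ("are", "VBP"), ("taken", "VBN"),
            ("daily", "RB"), (".", "."), ("However", "RB"), (",", ","),
            ("you", "PRP"), ("may", "MD"), ("stop", "VB"), (".", ".")]
    print(count_features(text))
```

Normalising by text length makes texts of different sizes comparable before the statistical analysis.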
    <Paragraph position="2">  a stylistic parameter 9 that can then be analysed in functional terms 10. Several stylistic parameters can emerge from a corpus, each text of the corpus having a specific value for each of them (see an example with three stylistic parameters in Figure 1-A).</Paragraph>
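The grouping of co-occurring features into stylistic parameters can be sketched with NumPy. This is a simplified stand-in, not Biber's exact procedure: factors are extracted by a plain eigendecomposition of the feature correlation matrix (principal components, with no rotation), and each text is scored on a parameter by adding the standardised rates of features with salient positive loadings and subtracting those with salient negative loadings. The 0.35 salience cutoff and the function names are assumptions for this example.

```python
import numpy as np


def extract_factors(rates, n_factors):
    """Group co-occurring features into factors.

    rates: (n_texts, n_features) array of normalised feature frequencies.
    Features with zero variance should be removed beforehand.
    Returns the standardised rates and an (n_features, n_factors) loading matrix.
    """
    z = (rates - rates.mean(axis=0)) / rates.std(axis=0)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)        # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_factors]    # indices of the largest ones
    loadings = eigvecs[:, top] * np.sqrt(eigvals[top])
    return z, loadings


def dimension_scores(z, loadings, cutoff=0.35):
    """Give each text a value on each factor (stylistic parameter).

    Adds the z-scores of features whose loading is at least +cutoff and
    subtracts those whose loading is at most -cutoff.
    """
    positive = (loadings >= cutoff).astype(float)
    negative = (-loadings >= cutoff).astype(float)
    return z @ positive - z @ negative
```

Each column of the loading matrix plays the role described for arrow A in Figure 2: a measure of how important each linguistic feature is for a stylistic parameter. The rows of the score matrix are the per-text values illustrated in Figure 1-A.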
    <Paragraph position="3"> Our interest in Biber's work is also related to his definition of text types: &quot;the texts within each type are maximally similar with respect to their linguistic characteristics, while the types are maximally distinct with respect to their linguistic characteristics&quot; (Biber (1995), p. 10). To obtain the text types, a cluster analysis is applied that partitions the corpus: texts with similar values for all the stylistic parameters are grouped in the same partition (see Figure 1-B). Following this procedure will allow us to analyse aspects of architecture for each text type (i.e., each partition) in isolation and, more importantly, to make cross-comparisons among these analyses.</Paragraph>
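The partitioning step can be sketched as k-means clustering over the per-text stylistic-parameter values. The choice of k-means, the number of clusters and the random initialisation are assumptions for this example; the text above states only that a cluster analysis is used.

```python
import numpy as np


def _assign(scores, centroids):
    """Index of the nearest centroid for every text."""
    # Squared distance from every text to every centroid: shape (n_texts, k).
    dists = ((scores[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def cluster_text_types(scores, k, n_iter=100, seed=0):
    """Partition texts into k text types with Lloyd's k-means.

    scores: (n_texts, n_parameters) float array, one row per text.
    Returns (labels, centroids); labels[i] is the text type of text i.
    """
    rng = np.random.default_rng(seed)
    centroids = scores[rng.choice(len(scores), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(scores, centroids)
        # Move each centroid to the mean of its texts; keep it if the cluster is empty.
        new = np.array([
            scores[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _assign(scores, centroids), centroids
```

Each resulting label corresponds to one partition in Figure 1-B, and the centroids summarise the typical parameter values of each text type.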
  </Section>
  <Section position="5" start_page="237" end_page="237" type="metho">
    <SectionTitle>
3 Relating Style to Architecture
</SectionTitle>
    <Paragraph position="0"> We use NLG tasks as the basis of our approach to relating style to NLG architectures. We work with a set of core NLG tasks that we have found to be stable: each of them occurred in almost all the systems we surveyed (Paiva, 1998). The set comprises the following tasks: content determination, rhetorical structuring, lexicalisation, intra- and inter-sentential ordering, referring expression generation, aggregation, segmentation, and linguistic realisation (for an explanation of these tasks, see Cahill et al., 1999) 11. Part of the process for relating style to architecture is depicted in Figure 2. As shown, we start by analysing the NLG tasks that are responsible for the presence of a specific linguistic feature (arrow B). The association of stylistic parameters with linguistic features obtained in the corpus analysis (arrow A) will be used to observe which NLG tasks are responsible for specific values of a stylistic parameter (arrow C) 12. Then we will observe the combinations of NLG tasks for each text type (partition) obtained in the corpus analysis. This will give us an idea of which NLG tasks are most responsible for (the linguistic features associated with) the different text types; it will also show how the tasks operate, through their links to the linguistic features (see Figure 2).
9 Biber refers to this as a 'dimension of register variation'.
10 The assumption is that strong co-occurrence patterns of linguistic features mark underlying functional dimensions (Biber, 1988; p. 13). Notice that the name of each stylistic parameter, per se, means nothing; what matters are the linguistic features grouped in each stylistic parameter. Nonetheless, it is easier to refer to a stylistic parameter by its name than to the set of linguistic features it represents. So below we say that a certain text is formal when, in fact, we mean that it has certain linguistic features such as passives, formal words, conjuncts, etc.
11 We are aware that some of these tasks can be subdivided and that some authors use different names for the same task. If necessary, we will extend this set.</Paragraph>
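The composition of arrows A and B into arrow C can be illustrated with a small sketch. Everything in it is hypothetical: the loadings, the feature names, and especially the assignment of features to tasks (arrow B), which in the approach described above comes from analysing the NLG tasks rather than from a fixed table. As an assumption, a task's weight for a parameter is taken to be the sum of the loadings of the features it is responsible for, so the sign shows which pole of the parameter the task pushes towards.

```python
# Arrow A (hypothetical): stylistic parameter -> {linguistic feature: loading}.
LOADINGS = {
    "formality": {"passive": 0.6, "formal_word": 0.7, "conjunct": 0.4, "pronoun": -0.5},
}

# Arrow B (hypothetical): NLG task -> linguistic features it is responsible for.
TASK_FEATURES = {
    "lexicalisation": {"formal_word"},
    "aggregation": {"conjunct"},
    "linguistic realisation": {"passive"},
    "referring expression generation": {"pronoun"},
}


def tasks_for_parameter(loadings, task_features, parameter):
    """Arrow C: rank tasks by how strongly their features load on a parameter.

    Returns (task, weight) pairs sorted by absolute weight, strongest first.
    """
    weights = loadings[parameter]
    ranked = {
        task: sum(weights.get(feature, 0.0) for feature in features)
        for task, features in task_features.items()
    }
    return sorted(ranked.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    for task, weight in tasks_for_parameter(LOADINGS, TASK_FEATURES, "formality"):
        print(f"{task:32s} {weight:+.2f}")
```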
    <Paragraph position="1"> The result of this process will be a set of NLG tasks for each text type.</Paragraph>
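Continuing the hypothetical illustration, the set of NLG tasks for each text type could be read off the cluster centroids: a text type is treated as marked on a parameter when the absolute value of its mean score reaches a threshold, and the strongest tasks for each marked parameter are collected. The threshold, the number of tasks kept per parameter, and all names and numbers below are assumptions.

```python
def tasks_per_text_type(centroids, task_rankings, threshold=1.0, top_n=2):
    """Collect the NLG tasks most associated with each text type.

    centroids: text type -> {parameter: mean score of its texts}.
    task_rankings: parameter -> [(task, weight), ...], strongest first.
    Returns text type -> set of tasks.
    """
    result = {}
    for text_type, means in centroids.items():
        tasks = set()
        for parameter, mean in means.items():
            if abs(mean) >= threshold:  # the type is marked on this parameter
                tasks.update(task for task, _ in task_rankings[parameter][:top_n])
        result[text_type] = tasks
    return result


# Hypothetical centroids and rankings, for illustration only.
CENTROIDS = {
    "type_1": {"formality": -1.5, "involvement": 0.2},
    "type_2": {"formality": 1.8, "involvement": -1.2},
}
RANKINGS = {
    "formality": [("lexicalisation", 0.7), ("linguistic realisation", 0.6), ("aggregation", 0.4)],
    "involvement": [("referring expression generation", 0.8), ("content determination", 0.3)],
}

if __name__ == "__main__":
    for text_type, tasks in tasks_per_text_type(CENTROIDS, RANKINGS).items():
        print(text_type, sorted(tasks))
```

Comparing the resulting sets across text types is one concrete form the cross-comparison among text types could take.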
    <Paragraph position="2">  Our work will then be to observe the NLG tasks (first inside each text type, then making cross-comparisons among text types) in order to address the questions related to architecture, i.e., 'which kind of modularisation and interaction between modules is necessary/appropriate', 'which resources are used', 'what kind of data the modules/tasks exchange', etc.</Paragraph>
    <Paragraph position="3"> We will investigate architectural decisions at three different levels:
* at the task level: how can a certain task be made sensitive to the values of stylistic parameters?
* at the level of task interaction: is there a natural ordering of tasks for a certain type of text? 13
* at the global level: assuming that tasks are normally encapsulated inside modules, what characteristics of texts force the interaction between modules to be more intense?
12 The statistical method by which arrow A in Figure 2 is derived gives a measure of how important the linguistic feature is for a certain stylistic parameter.</Paragraph>
    <Paragraph position="4"> 13 See Danlos (1984) for examples of how the order in which tasks are executed can favour one textual result over another.</Paragraph>
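The task-interaction question can be made concrete with a topological sort. In this sketch, which is an assumption rather than part of the methodology, the ordering constraints observed for a text type are written as a precedence graph; Python's standard `graphlib.TopologicalSorter` (Python 3.9+) then yields a valid pipeline order, and a cycle, reported as `graphlib.CycleError`, is read as a sign that no fixed order exists and the tasks involved must interact. The constraints shown are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def task_order(precedences):
    """Return a pipeline order for NLG tasks, or None if none exists.

    precedences: task -> set of tasks that must run before it.
    """
    try:
        return list(TopologicalSorter(precedences).static_order())
    except CycleError:
        return None  # circular dependency: no fixed pipeline order


# Hypothetical constraints for one text type: aggregation before referring expressions.
TYPE_A = {
    "rhetorical structuring": {"content determination"},
    "aggregation": {"rhetorical structuring"},
    "referring expression generation": {"aggregation"},
    "linguistic realisation": {"referring expression generation"},
}
# Hypothetical constraints for another type: each task needs the other's output.
TYPE_B = {
    "aggregation": {"referring expression generation"},
    "referring expression generation": {"aggregation"},
}

if __name__ == "__main__":
    print(task_order(TYPE_A))
    print(task_order(TYPE_B))
```

A text type whose constraints admit a single order fits a pipeline; one that produces a cycle points towards the more intense interaction between modules raised at the global level.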
    <Paragraph position="5"> Proceedings of EACL '99
Based on this classification, we will propose solutions that can be used in the specification of an architecture that supports the generation of texts in different styles. We expect these solutions to lead to useful guidelines that help designers of NLG systems choose the appropriate architecture for the type of text they want their system to generate.</Paragraph>
  </Section>
  <Section position="6" start_page="237" end_page="237" type="metho">
    <SectionTitle>
4 Discussion
</SectionTitle>
    <Paragraph position="0"> One may ask why we are repeating Biber's experiment when he has already obtained a set of stylistic parameters and a set of text types. Applying his methodology to our corpus may yield different results, and the only way to find out is to redo the analysis. It is also possible that we obtain a subset of his results, which would at least make our task more manageable.</Paragraph>
    <Paragraph position="1"> We believe that our results will be of general utility. Although the precise set of stylistic parameters may depend on the corpus used, we expect the set of valid task interaction patterns to be restricted, and the text types emerging from our study to encompass most of the valid patterns. Our programs for counting the linguistic features will be made available for others to use.</Paragraph>
  </Section>
</Paper>