<?xml version="1.0" standalone="yes"?>
<Paper uid="J95-1004">
  <Title>Rochemont, Michael S., and Culicover,</Title>
  <Section position="2" start_page="83" end_page="83" type="metho">
    <SectionTitle>
2. Communicative Dynamism and Word Order
</SectionTitle>
    <Paragraph position="0"> To be able to characterize the procedure determining some of the main points of TFA and to illustrate the output language of our parser, we have to add a brief discussion of certain issues concerning word order.</Paragraph>
    <Paragraph position="1"> The word order of natural languages is determined not only by SO but also by other factors. If an item occurs in the topic, it may be placed further to the left than SO would require; the specific order of the elements of the topic is influenced by the speaker's discourse strategy. There are also grammatical rules, such as those concerning the position of the verb (e.g., the &quot;second position&quot; in German), the placement of an adjective or other modifier before or after the head noun in a noun group, and the position of clitics. Cases in which the intonation center has a secondary (non-final) position must also be considered.</Paragraph>
    <Paragraph position="2"> The interplay of word order and these other factors makes it possible to specify the scale of communicative dynamism (CD). This scale is responsible for the &quot;dynamic&quot; progression of parts of the sentence, from topic proper through intermediate parts to</Paragraph>
  </Section>
  <Section position="3" start_page="83" end_page="90" type="metho">
    <SectionTitle>
4 In German and in most Slavonic languages the situation differs in that Objective and Effect follow
</SectionTitle>
    <Paragraph position="0"> several of the adverbial modifications.</Paragraph>
    <Paragraph position="1">  Eva Hajičová et al. Topic-Focus Identification focus proper as the most dynamic element (carrying the intonation center). 5 CD is semantically relevant for the scopes of quantifiers, as illustrated by example (10). (10) (a) It was JOHN who talked to few girls about many problems. (b) It was JOHN who talked about many problems to few girls. This example differs from (1) in that the two groups containing the relevant quantifiers (few girls and many problems) are both in the topic of the sentence, whereas in (1)(a) and (b) at least one of them belongs to the focus on all readings. Thus, on the basis of (1) alone, it could be claimed that only the boundary between topic and focus is responsible for the different distribution of quantifier scopes; however, (10) shows that the individual degrees on the scale of CD also influence the meaning of the sentence: even if both quantifiers are contained in the topic, the one contained in the less dynamic sentence part has wide scope on the preferred reading. 6 Another point shows the importance of including CD in the syntactic representations of sentences: on the scale of CD, there is always a certain step dividing the sentence (its syntactic representation) into the topic and the focus, as the less dynamic and the more dynamic parts of the sentence, respectively. Therefore, in our syntactic representations of sentences, we treat the scale of CD as the underlying word order. An alternative would be to mark the scale of CD by specific indexing of the lexical occurrences in the sentence.</Paragraph>
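A minimal Python sketch of the scope preference just described, assuming each quantified group has already been assigned a numeric CD rank (lower meaning less dynamic). The function name `preferred_scope_order` and the ranks are hypothetical, and the ranks used for (10) assume that the two groups keep their surface order within the topic.

```python
def preferred_scope_order(cd_rank):
    """Order quantified groups from widest to narrowest scope.

    cd_rank maps each group to a hypothetical CD rank (lower = less
    dynamic); on the preferred reading the less dynamic group takes
    the wider scope.
    """
    return sorted(cd_rank, key=cd_rank.get)


# (10)(a): few girls precedes many problems in the topic
print(preferred_scope_order({"few girls": 1, "many problems": 2}))
# ['few girls', 'many problems']
# (10)(b): the order within the topic is reversed
print(preferred_scope_order({"many problems": 1, "few girls": 2}))
# ['many problems', 'few girls']
```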
    <Paragraph position="2"> We can now formulate Rule 1 from Section 1 more precisely, as Rule 1′, referring to the underlying word order (CD) rather than to the surface one. Rule 1′ If a sentence part A precedes another one, B, under SO, and both A and B are in the focus of a sentence S, then A precedes B in the underlying word order of S. It follows from Rule 1′ that B can be less dynamic than A (i.e., B can precede A in the underlying word order) in a sentence S only if B belongs to the topic of S. As mentioned above, in the topic the underlying word order often differs from SO, mainly as a result of the speaker's discourse strategies. The speaker chooses the topic proper (the least dynamic element) from among the items assumed to be most salient in the hearer's memory. Often this is what was referred to by the focus proper of the preceding utterance. 7 Now we can see why the (b) examples in Section 1 lack the ambiguity present in the (a) sentences. For example, in (1)(a) the underlying (and surface) order of the two 5 Like many other linguistic notions, that of the intonation center is far from clear. Since it is not possible to discuss this issue in depth here (it has been the subject of extensive discussion), we can only characterize our standpoint as follows: (a) in a sentence having more than one sentence stress, we understand the last (rightmost) one as the intonation center, and (b) we assume that the prototypical (unmarked) position of the intonation center is (in English) on the last word of the sentence. We are aware that these formulations do not cover all possible cases, but the more or less marginal exceptions must be left aside for the purposes of the present paper.</Paragraph>
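Rule 1′ can be read as a well-formedness check on a single reading. The following Python sketch assumes a hypothetical numeric encoding of SO and labels such as "Addr" and "Obj"; ranking the Addressee before the Objective follows what this section says about (1)(a), and the function name `satisfies_rule_1_prime` is illustrative only.

```python
def satisfies_rule_1_prime(underlying, so_rank, focus):
    """Check Rule 1' for one reading.

    underlying: complementation labels in underlying word order (CD).
    so_rank: hypothetical map from label to its position under SO.
    focus: set of labels assigned to the focus in this reading.
    """
    for i, b in enumerate(underlying):
        for a in underlying[i + 1:]:
            # b is less dynamic than a; if a precedes b under SO,
            # the two must not both lie in the focus.
            if so_rank[b] > so_rank[a] and a in focus and b in focus:
                return False
    return True


so = {"Addr": 1, "Obj": 2}
# (1)(b) with the Objective in the topic is admissible ...
print(satisfies_rule_1_prime(["Obj", "Addr"], so, {"Addr"}))
# ... but not with both complementations in the focus.
print(satisfies_rule_1_prime(["Obj", "Addr"], so, {"Obj", "Addr"}))
```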
    <Paragraph position="3"> 6 We cannot discuss here the issues concerning other possible interpretations of sentences such as (1) and (10). Their acceptability often depends on the lexical setting of the sentence and on pragmatic factors. This also concerns the cases of &quot;group reading&quot; (e.g., in The three men built those two houses) or of J. Hintikka's &quot;branching quantifiers.&quot; 7 More precisely, the topic proper refers to one of those items that, at the given time point, are most salient in the stock of knowledge shared by the speaker and (according to the speaker's assumption) by the hearer. The set of highly salient items (called &quot;established&quot; in our earlier writings) can be compared to the &quot;focus list&quot; of Grosz (1977).</Paragraph>
    <Paragraph position="4">  Computational Linguistics Volume 21, Number 1 rightmost complementations (to few girls and about many problems, i.e., Addressee and Objective) is in accordance with SO, but in (1)(b) this is not so: the Objective, which was most dynamic (rightmost in underlying word order) in (a), does not occupy this position in (b). This means that it is included in the topic of sentence (b) on all its readings (i.e., in all syntactic representations of the sentence). The same holds for examples (3)-(6), and in (2) the shift of the intonation center plays the same role as the change of word order in the other examples.</Paragraph>
    <Paragraph position="5"> On the other hand, the ambiguity of the (a) sentences is due to the fact that the scale of CD here is in accordance with SO, so that one of the complementations belongs to the topic in some readings and to the focus in others. For example, in (1)(a) the group to few girls is in such an ambiguous position: in some readings, the boundary between topic and focus precedes this group; in others, it follows it. In both (a) and (b), the most dynamic complementation belongs to the focus on all readings.</Paragraph>
    <Paragraph position="6"> The dichotomy of topic and focus concerns the sentence as a whole. In sentences with items embedded more deeply than the immediate complementations of the main verb, it is necessary to characterize the positions of individual word occurrences in the sentence more specifically. We therefore work with the distinction between contextually bound (CB) and non-bound (NB) lexical occurrences. Operational criteria for distinguishing these two values can again be found in the question test and similar procedures. For example, only CB items can have the shape of weak pronouns or be deleted (thus, in He LEFT the subject is CB, whereas in HE left it is NB).</Paragraph>
    <Paragraph position="7"> A CB item is always considered to be less dynamic than its head and than its NB sister nodes (i.e., nodes depending on the same head). Thus, in He left YESTERDAY, the subject is CB and thus less dynamic than its head, the verb, and also than its NB sister, the adverb. This implies that the main verb is always more dynamic than all its CB complementations and less dynamic than the NB ones; i.e., in the scale of CD the verb stands immediately after or before the boundary between topic and focus.</Paragraph>
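For one head, the ordering stated above can be sketched as follows in Python; the function `cd_order` and its input format are hypothetical. Within the CB and NB groups the given order is kept unchanged, which stands in for the speaker's discourse strategy and for SO, respectively.

```python
def cd_order(head, dependents):
    """Place a head and its dependents on the scale of CD.

    dependents: (label, is_cb) pairs. CB items are less dynamic than
    the head, NB items more dynamic; the relative order inside each
    group is taken as given.
    """
    cb = [label for label, is_cb in dependents if is_cb]
    nb = [label for label, is_cb in dependents if not is_cb]
    return cb + [head] + nb


# He left YESTERDAY: the CB subject precedes the verb, the NB adverb follows it.
print(cd_order("leave", [("he", True), ("yesterday", False)]))
```

Because every CB complementation lands before the verb and every NB one after it, the verb always ends up adjacent to the topic/focus boundary, as the text notes.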
    <Paragraph position="8"> To illustrate the notion of contextual boundness, we present two additional examples: 8  (11) (How do you find your neighborhood?) Our(CB) new(NB) neighbor(CB) has stolen(NB) my(CB) CAR(NB).</Paragraph>
    <Paragraph position="9"> (12) (Which teacher do you mean?) I(CB) mean(CB) our(CB) teacher(CB) of  CHEMISTRY(NB).</Paragraph>
    <Paragraph position="10"> These sentences can also be used to exemplify how, on the basis of the dichotomy  of CB and NB items, the notions of topic and focus can be defined more exactly (for a more explicit formulation, see Sgall et al. 1986, Chapter 3): (i) The main verb and its immediate complementations belong to the topic if they are CB and to the focus if they are NB.</Paragraph>
    <Paragraph position="11"> (ii) More deeply embedded items belong to the topic (focus) if their head words (in the framework of dependency syntax) belong there.</Paragraph>
    <Paragraph position="12"> 8 In our syntactic representations we do not handle the correlates of function words as (labels of) separate nodes; they have the shape of indices accompanying auto-semantic lexical units (see footnote 2). This appears to be more adequate, since both their semantic and syntactic properties differ substantially from auto-semantic words. Furthermore, it is not economical to enlarge the number of nodes beyond necessity, adding special nodes for prepositions or articles, which can accompany only  their nouns, or for conjunctions and auxiliary verbs, which can accompany only lexical verbs and which do not accept any (other) arguments or modifications of their own.</Paragraph>
    <Paragraph position="13"> (iii) If the verb and all its immediate complementations (in other words, all elements of the center of the sentence) are CB, then only the NB item(s) embedded under the most dynamic element of the center constitute the focus, with the rest of the sentence belonging to its topic.</Paragraph>
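Points (i)-(iii) amount to a small tree walk. The Python sketch below is our own illustration, not the authors' implementation: `Node` and `topic_focus` are hypothetical names, lemmas are assumed unique within a sentence, and for point (iii) it assumes that the complementations are listed in CD order, so that the last one is taken as the most dynamic element of the center. The checks reproduce the analyses of (11) and (12) discussed next.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    lemma: str
    cb: bool  # True: contextually bound (CB); False: non-bound (NB)
    deps: list["Node"] = field(default_factory=list)  # assumed to be in CD order


def embedded(node):
    """Yield every node embedded, at any depth, under `node`."""
    for dep in node.deps:
        yield dep
        yield from embedded(dep)


def topic_focus(verb):
    """Label each lemma 't' (topic) or 'f' (focus) following points (i)-(iii)."""
    center = [verb, *verb.deps]
    if all(n.cb for n in center):
        # Point (iii); assumption: the last complementation is the most
        # dynamic element of the center (the verb if there is none).
        top = verb.deps[-1] if verb.deps else verb
        labels = {n.lemma: "t" for n in [verb, *embedded(verb)]}
        labels.update({n.lemma: "f" for n in embedded(top) if not n.cb})
        return labels
    labels = {}
    for n in center:
        side = "t" if n.cb else "f"  # point (i)
        labels[n.lemma] = side
        if n is not verb:
            for d in embedded(n):  # point (ii): follow the head word
                labels[d.lemma] = side
    return labels


# (11) Our(CB) new(NB) neighbor(CB) has stolen(NB) my(CB) CAR(NB).
s11 = Node("steal", False, [
    Node("neighbor", True, [Node("our", True), Node("new", False)]),
    Node("car", False, [Node("my", True)]),
])
print(topic_focus(s11))
```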
    <Paragraph position="14"> In (11) the noun neighbor, being CB (as a definite subject noun usually is), belongs to the topic, according to (i), and so does new as its modifier, according to (ii), although it is NB. The verb and the noun car both belong to the focus, according to (i), and so does my, according to (ii).</Paragraph>
    <Paragraph position="15"> In (12) all the CB words belong to the topic, according to (i) or (in the case of our) 9 according to (ii); (iii) then determines the adjunct of chemistry as the focus of (12).</Paragraph>
    <Paragraph position="16"> The (underlying) syntactic representations of sentences in our framework can now be illustrated (with several simplifications) in the form of linearized dependency trees. In this notation, every dependent item is enclosed in its own pair of parentheses, labeled by the corresponding syntactic symbol. This symbol occurs as the label of the edge in the tree, or as a subscript following a parenthesis in the linearized representation: 10 (13) A neighbor gave a boy a book.</Paragraph>
    <Paragraph position="17"> (13') (neighbor.Indef)Act give.Pret (boy.Indef)Addr (book.Indef)Obj (14) A painter arrived at a French village on a nice September day.</Paragraph>
    <Paragraph position="18"> (14') (painter.Indef)Act (village.Indef (French)Gener)Dir arrive.Pret (day.Indef (September)Gener (nice)Gener)Time (15) The neighbor met him yesterday.</Paragraph>
    <Paragraph position="19"> (15') (neighbor)Act (he)Obj meet.Pret^t (yesterday)Time  Most of our symbols (for Indefinite, Preterite, Actor, Addressee, Objective, Directional) should be self-explanatory; Gener(al Relationship) is the free modification typical of an adjectival modifier of a noun. In (15'), the superscript t (written here as ^t) marks the verb as belonging to the topic (being CB), although this does not correspond directly to its position in the surface word order. In English, word order is grammatically restricted; thus in (14), too, the verb occupies the position after the subject in the surface order, although it is followed by a CB item. Typically, the position of the verb in TFA (and often also that of a complementation) is ambiguous, and in the present examples we give only one of the possible readings of each sentence. The unmarked case, in which the verb belongs to the focus, is left without a specific notation mark here. The TFA positions of the complementations are indicated by their positions in the underlying word order, i.e., in CD: those belonging to the focus stand to the right of the head verb, and those in the topic stand to the left of it.</Paragraph>
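The linearized notation of (13')-(15') can be generated mechanically from a tree. In this Python sketch (hypothetical names; the superscript t is printed as ^t), each node keeps its less dynamic dependents in `left` and its more dynamic ones in `right`, and only marked grammatical values are stored, as in the notation above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    lemma: str
    functor: str = ""  # edge label: "Act", "Addr", "Obj", "Dir", "Time", "Gener", ...
    marked: list[str] = field(default_factory=list)  # marked values only, e.g. "Indef"
    left: list["Node"] = field(default_factory=list)  # less dynamic dependents
    right: list["Node"] = field(default_factory=list)  # more dynamic dependents
    cb_verb: bool = False  # verb belonging to the topic (superscript t)


def linearize(node, root=True):
    """Render a dependency tree in the parenthesized notation of (13')-(15')."""
    head = ".".join([node.lemma, *node.marked])
    if node.cb_verb:
        head += "^t"
    parts = [linearize(d, root=False) for d in node.left]
    parts.append(head)
    parts += [linearize(d, root=False) for d in node.right]
    body = " ".join(parts)
    return body if root else f"({body}){node.functor}"


# (13) A neighbor gave a boy a book.
s13 = Node("give", marked=["Pret"],
           left=[Node("neighbor", "Act", ["Indef"])],
           right=[Node("boy", "Addr", ["Indef"]), Node("book", "Obj", ["Indef"])])
print(linearize(s13))
# (neighbor.Indef)Act give.Pret (boy.Indef)Addr (book.Indef)Obj
```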
    <Paragraph position="20"> Let us note that, for example, the written shape of (14) may also be pronounced with a secondary placement of the intonation center, as in (16), with another TFA. This pronunciation is unlikely but possible, for example after a co-text such as (17):  9 The contextual boundness of this pronoun in the given position is derived from its indexical character and its associative link with the speaker. For this reason, such an item can always be referred to as &quot;established,&quot; &quot;recoverable,&quot; or &quot;identifiable&quot; in the terminology of Halliday (1967) or Chafe (1976). 10 A detailed discussion of dependency trees, of the labels of their edges (the syntactic values, i.e., kinds of arguments and modifications), and of their nodes (the values of morphological categories) was presented by Sgall, Hajičová, and Panevová (1986, Chapter 2). In the notation presented here, the morphological categories are handled so that only their marked values are indicated. Unmarked (prototypical) values such as Singular, Present, and Definite are assumed &quot;by default.&quot;  In the autumn, painters often look for nice sceneries in most different environments.</Paragraph>
    <Paragraph position="21"> Similarly, with (15) there is a less probable pronunciation (possible only in specific contexts) with the pronoun HIM stressed. Many, though not all, such marked cases are accounted for by the parser described in Section 3. The output language of this parser has been illustrated by examples (13')-(16').</Paragraph>
    <Paragraph position="22"> Sections 1 and 2 have introduced our treatment of topic and focus. Naturally, these comments and examples cannot cover all possible sentence structures. The interested reader can find a more general and precise characterization of the basic notions we work with in Sgall et al. (1986), Hajičová and Sgall (1987), and Petkevič (in preparation). The main objective of this paper is to present a procedure specifying the TFA of a sentence. At this stage, however, not all combinations of marginal phenomena are covered by our algorithm.</Paragraph>
    <Paragraph position="23"> 3. A Procedure for the Identification of TFA An automatic identification of topic, focus, and degrees of communicative dynamism, discussed in a preliminary way by Hajičová and Sgall (1985), can be based on the following considerations: 11 Languages with a high degree of &quot;free&quot; word order (such as most Slavonic ones) differ from English or French in that a secondary position of the intonation center is frequent there only in spoken discourse. 12 On the other hand, in technical texts (which are typically written), there is a strong tendency to arrange the words so that the intonation center falls on the last word of the sentence (where it need not be phonetically manifested), with the exception, of course, of enclitic words. This usage, occasionally recommended by manuals and textbooks on, for example, the stylistics of Czech or Russian, makes it possible to read such a text aloud without paying much attention to where the intonation center should be placed.</Paragraph>
    <Paragraph position="24"> A general procedure for determining TFA in such languages can then be based on the following points: (i) All complementations preceding the verb are CB and thus belong to the topic. As for the complementations following the verb, Rule 2 may be stated: Rule 2 The boundary between topic (to the left) and focus (to the right) can be drawn between any two elements following the verb, provided that those belonging to the focus are arranged in the surface word order in accordance with SO (see Section 1).</Paragraph>
    <Paragraph position="25"> 11 As usual in computational linguistics, it is impossible to handle all marginal and exceptional cases by a relatively simple, general procedure. Natural language processing always requires solutions that cover the typical (or most frequent) cases first and only then more complex procedures accounting for peripheral phenomena. Thus, the present paper also does not aim at a complete solution that would handle all possible cases appropriately.</Paragraph>
    <Paragraph position="26"> 12 Note that one can specify the position of the intonation center even with a written sentence: the sentence can be read aloud either correctly (in accordance with the author's intention) or incorrectly. The fact that there are also cases in which different placements of the intonation center are suitable for the given context is not immediately relevant.</Paragraph>
    <Paragraph position="27"> (ii) The verb is ambiguous as to its position in the topic or in the focus.</Paragraph>
    <Paragraph position="28"> (iii) If a spoken utterance (with its intonation center identified) is analyzed, then (i) and (ii) hold for sentences with normal intonation (intonation center at the end).  However, if a non-final element carries the intonation center, then all the complementations standing after this element belong to the topic; for the rest of the sentence, (i) and (ii) hold; the bearer of the intonation center belongs to the focus.</Paragraph>
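Points (i)-(iii) and Rule 2 can be sketched in Python as an enumeration of admissible topic/focus boundaries. The function `tfa_readings` and the integer SO ranks are hypothetical; the verb is left out of the output because point (ii) leaves its position ambiguous in every case, and the reordering of items within the topic is not modeled.

```python
def tfa_readings(before_verb, after_verb, ic=None):
    """Enumerate (topic, focus) splits for a sentence in a language with
    'free' word order.

    before_verb, after_verb: (label, so_rank) pairs in surface order.
    ic: index in after_verb of a non-final intonation center, or None.
    """
    topic = [label for label, _ in before_verb]  # point (i): all CB
    if ic is not None:
        # Point (iii): everything after the intonation center is topic,
        # and its bearer is the rightmost member of the focus.
        topic += [label for label, _ in after_verb[ic + 1:]]
        after_verb = after_verb[:ic + 1]
    readings = []
    for k in range(len(after_verb)):
        focus = after_verb[k:]
        ranks = [rank for _, rank in focus]
        if ranks == sorted(ranks):  # Rule 2: the focus must follow SO
            t = topic + [label for label, _ in after_verb[:k]]
            readings.append((tuple(t), tuple(label for label, _ in focus)))
    return readings


print(tfa_readings([("Act", 0)], [("Addr", 1), ("Obj", 2)]))
print(tfa_readings([("Act", 0)], [("Obj", 2), ("Addr", 1)]))
```

With the Addressee ranked before the Objective, the order Addr Obj yields two readings and the inverted order Obj Addr only one, mirroring the contrast between (1)(a) and (1)(b).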
    <Paragraph position="29"> In English, the surface word order is determined by grammatical rules to a large extent, so that intonation plays a more decisive role than in the Slavonic languages. The written shape of the sentence does not suffice here to determine TFA to the degree that it does in Czech, for example. Rule 2 also applies, but otherwise only certain important regularities can be stated on the basis of word order and grammatical values (in particular, a definite noun group is often CB, and an indefinite one is regularly NB). To reduce the ambiguity of the written shape of the sentence as much as possible, it is necessary to take certain semantic clues into account.</Paragraph>
    <Paragraph position="30"> Especially with Locative and Temporal modifications, it is important to distinguish between specific information (e.g., on a nice September day, on October 22, 1991, seven months ago) and items that express only a general setting (e.g., always) or are directly determined by the utterance itself, such as the indexicals today and this year. The latter usually belong to the topic, whereas the former typically occur in the focus.</Paragraph>
    <Paragraph position="31"> As for the verb, it is important to have access to the verb of the preceding utterance and to use a systematic semantic classification of verbs. If the main verb of sentence n has the same meaning as that of sentence n - 1 (or a meaning included in it, in the sense of hyponymy), then it belongs to the topic. Also, verbs with very general lexical meanings (such as be, have, happen, carry out, and become) may be handled as belonging to the topic. Otherwise (i.e., in the unmarked case), the verb typically belongs to the focus (in which case no special mark is used in our representations).</Paragraph>
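The semantic clues above can be approximated by simple lookups. In this Python sketch the word lists contain only the examples named in the text, and `included_in` is a hypothetical stand-in for a real semantic classification of verbs; `setting_label` and `verb_label` are illustrative names, and a practical system would need a proper lexicon.

```python
# Hypothetical word lists built only from the examples given in the text.
GENERAL_VERBS = {"be", "have", "happen", "carry out", "become"}
GENERAL_OR_INDEXICAL = {"always", "today", "this year"}


def setting_label(phrase):
    """Temporal/Locative clue: general or indexical settings usually
    belong to the topic, specific ones typically to the focus."""
    return "t" if phrase in GENERAL_OR_INDEXICAL else "f"


def verb_label(verb, previous_verb=None, included_in=None):
    """Verb clue. `included_in` maps a verb to the set of verbs whose
    meaning includes its meaning (a hypothetical hyponymy table)."""
    included_in = included_in or {}
    if previous_verb is not None and (
            verb == previous_verb or previous_verb in included_in.get(verb, set())):
        return "t"  # same meaning as, or included in, the verb of sentence n - 1
    if verb in GENERAL_VERBS:
        return "t"  # very general lexical meaning
    return "f"  # the unmarked case


print(setting_label("today"), setting_label("on a nice September day"))
print(verb_label("happen"), verb_label("steal"))
```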
    <Paragraph position="32"> An algorithmic procedure has been formulated by H. Skoumalová that complements the parsing of a written English sentence by identifying its TFA. Many ambiguities remain in the output of this procedure, but sentences (even in their spoken shape) are often ambiguous as to their TFA, so it should be regarded as a good result if the procedure identifies such an ambiguity. In its present form, however, the algorithm has several limitations. It can process only simple sentences. It determines whether an element belongs to the topic or to the focus, but does not specify CD within the topic. It also handles only the verb and its complementations; more deeply embedded elements are left aside for the time being.</Paragraph>
    <Paragraph position="33"> The algorithm has been formulated as follows: (a) After the dependency structure of the sentence has been identified by the parser, so that also the underlying dependency relations (valency positions) of the complementations (to the governing verb) are known, the verb and all the complementations are first assumed to be NB, i.e., to belong to the focus, which we denote by f.</Paragraph>
    <Paragraph position="34">  (b) If the verb occupies the rightmost position in the sentence and its subject is (ba) definite (including noun groups with this, with one of the, etc.), then the verb is NB, i.e., f, and its subject is CB, belonging to the topic, which we denote as t; (bb) indefinite, then the subject is f and the verb is t. In either case, the other complementations are handled according to (cb) below.</Paragraph>
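Steps (a) and (b) can be sketched in Python as follows. The `Comp` record and the function names are hypothetical, and step (a) is folded into the initial values of the labels.

```python
from dataclasses import dataclass


@dataclass
class Comp:
    functor: str  # "Act" for the subject; "Addr", "Obj", "Temp", "Loc", ...
    definite: bool = True  # definite noun group (incl. "this", "one of the")
    specific: bool = False  # specific Temporal: neither general nor indexical


def preverbal_label(comp):
    """Point (cb): label of a complementation standing before the verb."""
    if comp.functor == "Act" and not comp.definite:
        return "t/f"  # indefinite subject
    if comp.functor == "Temp" and comp.specific:
        return "t/f"  # specific Temporal complementation
    return "t"


def verb_final(subject, others):
    """Steps (a)-(b) for a sentence whose verb stands rightmost.

    Returns the labels of the verb, of the subject, and of the other
    complementations (all of which precede the verb).
    """
    verb_label, subject_label = "f", "f"  # step (a): everything NB at first
    if subject.definite:
        subject_label = "t"  # (ba): the subject is CB, the verb stays f
    else:
        verb_label = "t"  # (bb): the subject stays f, the verb is t
    return verb_label, subject_label, [preverbal_label(c) for c in others]
```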
    <Paragraph position="35"> (c) If the verb does not occupy the rightmost position, then: (ca) the verb itself is understood as t if it has a very general lexical meaning (see above), as f if its meaning is very specific, or else as ambiguous (t/f); (cb) the complementations preceding the verb are denoted as t, with the exception of an indefinite subject and of a specific (i.e., neither general nor indexical; see above) Temporal complementation; either of the latter two is characterized as t/f; (cc) to the right of the verb, (i) if there is a single complementation, and this is a definite noun group or a personal pronoun, it is t/f; (ii) if the rightmost complementation is Temp or Loc, and it is specific, it is f; otherwise it is t (i.e., it is understood as standing to the left of the verb in the underlying word order and is shifted there); (iii) if A is the left item of the rightmost pair that now (after the possible change of word order carried out according to (ii) above) fails to follow SO (see Section 1 and Rule 2), then A belongs to the topic (t), and so do all the complementations between A and the verb; the rightmost complementation of the whole sentence is f (only a personal pronoun following another one (including those of the third person) is t/f in this position), and all those standing between A and the rightmost one are t/f; (iv) if neither (ii) nor (iii) is met and the rightmost complementation is indefinite, it is f; (v) all remaining complementations to the right of the verb are t/f.</Paragraph>
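Step (c) is the most intricate part of the procedure. The Python sketch below is one possible reading of it, not the authors' code: `Comp`, `step_c`, and the integer SO ranks are hypothetical, the pairs in (iii) are taken to be adjacent items, and an item shifted by (ii) is treated as no longer standing to the right of the verb. The example ranks place the Temporal before the Addressee and the Objective, an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Comp:
    functor: str  # "Act", "Addr", "Obj", "Temp", "Loc", ...
    so: int  # hypothetical position under systemic ordering
    definite: bool = True
    surf: str = "ng"  # "ng" noun group, "pron" personal pronoun, "index" indexical
    specific: bool = False  # Temp/Loc that is neither general nor indexical


def step_c(verb_sem, before, after):
    """Step (c) for a verb that is not rightmost (so `after` is non-empty).

    Returns the labels of the verb, of the complementations before it,
    and of those after it, each list in surface order.
    """
    verb = {"general": "t", "specific": "f"}.get(verb_sem, "t/f")  # (ca)
    pre = ["t/f" if (c.functor == "Act" and not c.definite)
           or (c.functor == "Temp" and c.specific) else "t"
           for c in before]  # (cb)
    only = after[0]
    if len(after) == 1 and (only.surf == "pron" or (only.surf == "ng" and only.definite)):
        return verb, pre, ["t/f"]  # (cc)(i)
    post = [None] * len(after)
    rest = list(range(len(after)))  # items still understood as right of the verb
    last = after[-1]
    applied = False
    if last.functor in ("Temp", "Loc"):  # (cc)(ii)
        applied = True
        if last.specific:
            post[-1] = "f"
        else:
            post[-1] = "t"
            rest.pop()  # shifted to the left of the verb
    # (cc)(iii): adjacent pairs (our assumption) that fail to follow SO
    bad = [j for j in range(len(rest) - 1) if after[rest[j]].so > after[rest[j + 1]].so]
    if bad:
        applied = True
        a = bad[-1]  # position of A within `rest`
        for j in rest[:a + 1]:
            post[j] = "t"  # A and everything between it and the verb
        for j in rest[a + 1:-1]:
            post[j] = "t/f"
        r, prev = rest[-1], rest[-2]
        pron_pair = after[r].surf == "pron" and after[prev].surf == "pron"
        post[r] = "t/f" if pron_pair else "f"
    if not applied and not last.definite:  # (cc)(iv)
        post[-1] = "f"
    return verb, pre, [p or "t/f" for p in post]  # (cc)(v)
```

Under these assumptions, the object in met a boy on a nice September day comes out as t and the Temporal as f, while in met the boy yesterday the object is t/f and yesterday is t before step (d) applies, in line with the results listed in Section 4.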
    <Paragraph position="36"> (d) If all the complementations have been determined as t or t/f, then (da) if the verb was t/f after point (ca) and the rightmost complementation is a definite noun group, an indexical word, or pronoun, then this rightmost element gets t(f), which denotes a specific kind of ambiguity: this element is to be understood as having f only in case there is no other f in the reading of the sentence; (db) if (da) does not apply, then both the rightmost element of the sentence and its verb get t/f.</Paragraph>
    <Paragraph position="37"> (e) The remaining representations containing no f are deleted. 13  We are aware that our procedure does not cover all possibilities occurring in English sentences. Deeper embedded elements have not yet been properly analyzed, and different pronominal forms should be classified in a much more detailed way. Other cases, assumed to occur with low probability (such as, for example, The neighbor GAVE the boy a book, or The neighbor gave HIM the book), are not taken into account. 13 We assume that at least one reading of the sentence has been assigned an f (NB element) by now. The readings without a focus are not valid representations of sentences, since one of the basic assumptions is that every sentence contains a focus.</Paragraph>
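Steps (d) and (e), together with the t/f and t(f) conventions explained in Section 4, can be sketched in Python. The function names are hypothetical, expanding every t/f label independently is a simplification that may yield more readings than a full analysis would admit, and the sketch assumes the verb is not rightmost, so that the rightmost element is a complementation.

```python
from itertools import product


def step_d(verb, comps, rightmost_refers):
    """Step (d). `verb` is the verb's label after (ca); `comps` holds the
    complementation labels in surface order, the last being the rightmost
    element; `rightmost_refers` says whether that element is a definite
    noun group, an indexical word, or a pronoun."""
    if all(c in ("t", "t/f") for c in comps):
        if verb == "t/f" and rightmost_refers:
            comps = comps[:-1] + ["t(f)"]  # (da)
        else:
            comps = comps[:-1] + ["t/f"]  # (db)
            verb = "t/f"
    return verb, comps


CHOICES = {"t": ["t"], "f": ["f"], "t/f": ["t", "f"], "t(f)": ["t(f)"]}


def readings(labels):
    """Step (e): spell out the readings and delete those with no f.

    't(f)' counts as f only when no other item of the reading is f.
    """
    result = []
    for combo in product(*(CHOICES[lab] for lab in labels)):
        resolved = "t" if "f" in combo else "f"
        combo = tuple(resolved if x == "t(f)" else x for x in combo)
        if "f" in combo:
            result.append(combo)
    return result
```

For The neighbor met the boy yesterday, with labels in surface order (subject, verb, object, Temporal), step (d) turns yesterday into t(f), and `readings(["t", "t/f", "t/f", "t(f)"])` yields four readings whose rightmost f falls on yesterday, boy, met, and boy, respectively, matching the placements of the intonation center in (3)(H3), (H2), (H1), and (H2).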
    <Paragraph position="38"> When implemented (together with a simplified parser), 14 the algorithm was checked with a set of sentences having our examples (1)-(3) as its core, and it yielded the expected results, as presented in Section 4.</Paragraph>
  </Section>
  <Section position="4" start_page="90" end_page="91" type="metho">
    <SectionTitle>
4. Examples
</SectionTitle>
    <Paragraph position="0"> Let us first reproduce here examples (13)-(15) from Section 2 (with a changed numbering), accompanied by the corresponding input strings of our program, in which the word forms are already supplemented with lexical data. The program presupposes that each word form occurring in the text has undergone lexical (and morphemic) analysis, so that it has been assigned the relevant data found in the lexicon. These include word class and sem(antic features) such as hum(an). The verbs are accompanied by their valency frames (grids), which also include data on the surface shape of the individual kinds of complementations, so that it is easy to reconstruct the original sentence at the end of the procedure: 15  basis of our procedure is founded on dependency syntax and covers only the simple shapes of English sentences. Its lexical scope can easily be enlarged if the added lexical items are accompanied by appropriate grammatical data, especially by valency (case) frames specifying the optional and obligatory arguments (Actor, Addressee, Objective, Origin, and Effect, with verbs). Each preposition is analyzed in only one or two of its meanings.</Paragraph>
    <Paragraph position="1"> 15 The notation differs slightly here from that of Section 2; the complex symbols are reflected here by subtrees in which the nodes for function words are still present. The symbol topic denotes here whether the given item belongs to the topic or to the focus, touch records whether the complementation has already been determined, sem is the semantic information about the verb (general, specific, intermediate), and ltree and rtree are the left and right subtrees in the dependency tree. The word form is stored under label, so contains information about the position of the complementation in systemic ordering, and surf is the surface form (noun group, personal pronoun, indexical word, etc.). The other symbols are self-explanatory.</Paragraph>
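The record fields described in footnote 15 can be mirrored by a small data structure. In this Python sketch the field names follow the footnote, but their types, the `Item` class, and the `surface` helper are assumptions; the helper illustrates how keeping function-word nodes and left/right subtrees makes it easy to read the original word forms back.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    label: str  # the word form
    topic: Optional[str] = None  # "t", "f", "t/f", ... once decided
    touch: bool = False  # has this complementation been determined yet?
    sem: str = ""  # verbs: "general", "specific", or "intermediate"
    so: Optional[int] = None  # position in systemic ordering
    surf: str = ""  # "ng", "pron", "index", ...
    ltree: list["Item"] = field(default_factory=list)  # left subtrees
    rtree: list["Item"] = field(default_factory=list)  # right subtrees


def surface(item):
    """Read the word forms back in order: left subtrees, node, right subtrees."""
    words = []
    for sub in item.ltree:
        words += surface(sub)
    words.append(item.label)
    for sub in item.rtree:
        words += surface(sub)
    return words


s = Item("met", sem="intermediate",
         ltree=[Item("neighbor", surf="ng", ltree=[Item("the")])],
         rtree=[Item("boy", surf="ng", ltree=[Item("the")]), Item("yesterday", surf="index")])
print(" ".join(surface(s)))
```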
    <Paragraph position="2">  To illustrate how our procedure works for sentences differing from (1)-(3) in the values of delimiting features (definite-indefinite), in word order, and so on, we add a list of these sentences with simplified, perspicuous results of the procedure, i.e., with the values t and f produced by our algorithm added to the autonomous (autosemantic) lexical occurrences. Ambiguity is denoted here in an abbreviated way, so that &quot;t/f&quot; means &quot;t in some readings and f in others&quot; (in combination with the values of other words in the sentence), and &quot;t(f)&quot; means &quot;obtaining f only in case there is no other f in the sentence.&quot; In this way it is easy to check whether the decision points in the algorithm, which are illustrated by the examples, have been handled adequately.</Paragraph>
    <Paragraph position="3">  met(t/f) a boy(t) on a nice september
met(t/f) a boy(t/f) yesterday(t(f))
met(t/f) the boy(t) on a nice september
met(t/f) the boy(t/f) yesterday(t(f))
met(t/f) a boy(t) on a nice september
met(t/f) a boy(t/f) yesterday(t(f))
met(t/f) the boy(t) on a nice september
met(t/f) the boy(t/f) yesterday(t(f))
met(t/f) him(t/f) yesterday(t(f))
We assume that the sentences are pronounced so that the intonation center is carried by the rightmost sentence part bearing an index f. Thus, for instance, (3)(H) corresponds to the following sentences: (3) (H1) The neighbor MET the boy yesterday.</Paragraph>
    <Paragraph position="4"> (H2) The neighbor met the BOY yesterday.</Paragraph>
    <Paragraph position="5"> (H3) The neighbor met the boy YESTERDAY.</Paragraph>
  </Section>
</Paper>