<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0112">
  <Title>Reference Hashed</Title>
  <Section position="4" start_page="102" end_page="102" type="metho">
    <SectionTitle>
3 Hashing lists
</SectionTitle>
    <Paragraph position="0"> The following section describes how a hashing list works before the subsequent section shows how this data structure can be used for discourse processing.</Paragraph>
    <Section position="1" start_page="102" end_page="102" type="sub_section">
      <SectionTitle>
3.1 Data hashed
</SectionTitle>
      <Paragraph position="0"> One of the main problems in the design of computer systems is the question of how data is stored and efficiently accessed. Hashing lists are often used for this purpose, since this data structure is specifically designed for easy retrieval of stored data.</Paragraph>
      <Paragraph position="1"> I will first describe the data structure in more detail and then I will give an example of how data can be retrieved from a hashing list.</Paragraph>
      <Paragraph position="2">  Hashing lists. The basic data structure for a hashing list is an array A[min..max] (i.e. an indexed list A that has a preset length of n elements). An array with the name year could be defined as follows: TYPE hash = ARRAY[0..99] of integer; The random access structure of this data type allows the programmer to assign a single cell of the array directly (e.g. hash[99] := 9;). This is an advantage over other data structures such as trees. Hashing functions. A function has to be designed that tells us how to store data on the hashing list. This function takes the item to be stored and gives back an appropriate key k. The item can now be stored at the right place on the list.</Paragraph>
      <Paragraph position="3"> Suppose we want the program to store the integer 2000 on the hashing list year defined earlier. A hashing function H(i) has to be chosen such that this function gives back an index k. With this information the assignment hash[k] := 2000; can take place. A hashing function for integers may be the modulo function. For the given example the key k would be 20 (i.e. 2000 mod 99 = 20).</Paragraph>
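The storing step described above can be sketched in Python. This is a minimal sketch, not code from the paper: the names TABLE_SIZE, DIVISOR, modulo_key and store are assumptions introduced for illustration, and the divisor 99 is simply taken from the worked example in the text.

```python
# Illustrative sketch only; all names here are hypothetical.
TABLE_SIZE = 100   # mirrors ARRAY[0..99] of integer
DIVISOR = 99       # the worked example computes 2000 mod 99

def modulo_key(item: int) -> int:
    """Hashing function H(i): map an integer item to an index k."""
    return item % DIVISOR

def store(table: list, item: int) -> int:
    """Store item in the cell chosen by the hashing function; return k."""
    k = modulo_key(item)
    table[k] = item
    return k

table = [None] * TABLE_SIZE
k = store(table, 2000)
print(k, table[k])  # prints: 20 2000
```

Note that this version simply overwrites a cell that is already occupied, so it does not yet handle two items with the same key.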
      <Paragraph position="4"> The hashing function can also give back an index k for a new item that has already been taken by another item (e.g. 119 has the same key). In the case of a collision a special treatment is required. The most common one is the administration of an overflow area. The single places on the hashing list are lists that would handle colliding items. Figure 2 shows a part of the hash list hash[19..21] with two items 2000 and 119 inserted.</Paragraph>
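The collision treatment can be illustrated with a small Python sketch in which every cell of the array is itself a list. The class ChainedHashList and its methods are hypothetical names chosen for this illustration; the paper gives no implementation.

```python
# Illustrative sketch only; class and method names are hypothetical.
class ChainedHashList:
    """Hashing list whose cells are overflow lists for colliding items."""

    def __init__(self, size: int = 100, divisor: int = 99):
        self.divisor = divisor
        self.cells = [[] for _ in range(size)]

    def key(self, item: int) -> int:
        return item % self.divisor

    def insert(self, item: int) -> int:
        k = self.key(item)
        self.cells[k].append(item)   # colliding items share one cell
        return k

    def contains(self, item: int) -> bool:
        return item in self.cells[self.key(item)]

hash_list = ChainedHashList()
hash_list.insert(2000)
hash_list.insert(119)
print(hash_list.cells[19:22])  # prints: [[], [2000, 119], []]
```

The printed slice corresponds to the part hash[19..21] of the list that the text describes for figure 2.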
      <Paragraph position="5">  tion for two items</Paragraph>
    </Section>
    <Section position="2" start_page="102" end_page="102" type="sub_section">
      <SectionTitle>
3.2 Discourse hashed
</SectionTitle>
      <Paragraph position="0"> I now show how a hashing list can be employed as a data structure for linguistic data. This may not be obvious after using only integers for storing on a hashing list.</Paragraph>
      <Paragraph position="1"> Domains of referents. Natural language processing requires a richer data structure than storing integers. However, in the end a hashing list for linguistic data will also consist of an array.</Paragraph>
      <Paragraph position="2"> Considering the different types of discourse referents, we can assume at least the following list of referents to be relevant: singular male, singular female, singular neuter, plural and event referents (see footnote 4). We now take these conceivable referents and reserve each of them a slot in the domain array: domain[sgM, sgF, sgN, pl, ev]. Note that this way of writing the hashing list is actually only syntactic sugar for a normal definition such as domain[1..5].</Paragraph>
      <Paragraph position="3"> Referent function. A function is needed that can assign a cell on the array domain to a newly introduced discourse referent. The semantic and syntactic information that comes with a discourse referent gives us the key for this. Take for example a proper name such as Peter. The information that comes with it could be encoded as a feature value matrix such as proposed by Dale (1992) (see figure 3). The hashing function takes the information under the agreement feature AGR and checks the values for number and gender. The function returns sgM (or 1) as key for the array domain in the example given.</Paragraph>
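A referent function of this kind might look as follows in Python. This is a sketch under stated assumptions: the slot names come from the text, but the feature names SEM and TYPE, the value spellings (sing, plur, male, female, neuter) and the function name referent_key are hypothetical and not taken from the paper or from Dale (1992).

```python
# Illustrative sketch only; feature names and value spellings are assumed.
SLOTS = ("sgM", "sgF", "sgN", "pl", "ev")

def referent_key(features: dict) -> str:
    """Return the slot of the domain array for a feature structure."""
    if features.get("TYPE") == "event":
        return "ev"
    agr = features["AGR"]            # agreement features: number and gender
    if agr["NUM"] == "plur":
        return "pl"
    return {"male": "sgM", "female": "sgF", "neuter": "sgN"}[agr["GEND"]]

peter = {"SEM": "peter", "AGR": {"NUM": "sing", "GEND": "male"}}
key = referent_key(peter)
print(key, SLOTS.index(key) + 1)  # prints: sgM 1
```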
      <Paragraph position="4"> Summing up, a hashing list was proposed to store discourse referents while processing natural language discourse. This kind of list contains several "slots" that await discourse referents described by a discourse. The grammatical features of gender and number distinguish the different referents.</Paragraph>
      <Paragraph position="5"> The following section discusses how this data structure is embedded into a discourse grammar.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="102" end_page="104" type="metho">
    <SectionTitle>
4 Referents in discourse
</SectionTitle>
    <Paragraph position="0"> The linguistic data presented earlier demonstrate the need for hierarchical constraints on anaphora resolution. But the data also show that previous approaches such as SDRT overemphasise this restriction. Rejecting all discourse structure constraints, on the other hand, does not seem to be appropriate either. A cache storage that stores the frequently used discourse referents does not account for the data that were explainable by (S)DRT.</Paragraph>
    <Paragraph position="1"> This section describes how a hashing list can be used for the storage of discourse referents. The list is integrated into an SDRT framework. The information about the discourse segments is kept in order to cover data that is explainable by the hierarchical discourse structure. In addition, the insight that a sentence has a center, as proposed by CT, is also reflected by the theory proposed. The discourse referents are ordered according to centering preference. [Figure 3: feature structure for Peter, with the agreement feature AGR holding NUM: sing and GEND: male.] [Footnote 4: This is only a first list of very fundamental referents. But the list can easily be extended by more differentiated plural types, speech acts or types of referents.]</Paragraph>
    <Paragraph position="2"> The following sections describe in more detail how the different concepts are integrated in the system proposed by this paper. First, the way discourse referents are stored via a hashing list is explained.</Paragraph>
    <Paragraph position="3"> Second, the ordering regarding the centering preference is imposed on the slots of the hashing list. And finally, a tree structure is presented that binds all the components together.</Paragraph>
    <Section position="1" start_page="103" end_page="103" type="sub_section">
      <SectionTitle>
4.1 Referents hashed
</SectionTitle>
      <Paragraph position="0"> In the system proposed, a hashing list stores the referents introduced by the discourse. The hashing list contains at least the following slots: sgM, sgF, sgN, pl, ev. Since the basic formalism is DRT, we need to incorporate the hashing list into the formalism. In DRT, a DRS consists of the domain of discourse referents and the set of conditions imposed on the referents. A sentence such as Peter sighs is represented by the box notation as follows:</Paragraph>
    </Section>
    <Section position="2" start_page="103" end_page="104" type="sub_section">
      <SectionTitle>
4.2 Referents re-centered
</SectionTitle>
      <Paragraph position="0"> After blending a DRT representation with a hashing list for a structured representation of discourse referents, I will introduce the centering feature into the formalism. The different slots already contain the ordering of the referents regarding the centering preference. An apparent advantage over the centering approach should become clear: the referents are already separated from each other.</Paragraph>
      <Paragraph position="1"> A discourse such as (4) without any competing antecedents for the pronoun she is formalised by a HDRS as follows:</Paragraph>
      <Paragraph position="3"> A hashing list substitutes for the set of discourse referents, offering different slots for the discourse referents to be stored in:</Paragraph>
      <Paragraph position="5"> A more complex sentence such as Peter gave John a book, which contains several discourse referents, is represented by the following DRS:</Paragraph>
      <Paragraph position="7"> This Hashed DRS (HDRS) contains a complex domain sub-box. The slot for male and singular discourse referents is filled with the two items x1 and x2. The two referents are on a collision list as described earlier. Additionally, the list reflects the ordering for the centering list. The subject NP Peter was processed before the object NP John and is therefore the first entry on the preferred centering list. Note that only referents that share the same grammatical features are listed in the same slot.</Paragraph>
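The domain sub-box described here can be pictured with a short Python sketch. The class HDRS, its fields, the referent labels y1 and e1, and the condition strings are all assumptions made for illustration; only the slot names and the ordering of the two male referents follow the text.

```python
# Illustrative sketch only; the HDRS class and its fields are hypothetical.
class HDRS:
    """A DRS whose set of discourse referents is replaced by a hashing list."""

    SLOTS = ("sgM", "sgF", "sgN", "pl", "ev")

    def __init__(self):
        self.domain = {slot: [] for slot in self.SLOTS}
        self.conditions = []

    def introduce(self, referent: str, slot: str) -> None:
        # Appending keeps the processing order, which the text equates
        # with the preferred centering order inside a slot.
        self.domain[slot].append(referent)

hdrs = HDRS()
hdrs.introduce("x1", "sgM")   # Peter, subject NP, processed first
hdrs.introduce("x2", "sgM")   # John, collides with x1 in the sgM slot
hdrs.introduce("y1", "sgN")   # a book (label assumed)
hdrs.introduce("e1", "ev")    # the giving event (label assumed)
hdrs.conditions += ["peter(x1)", "john(x2)", "book(y1)", "e1: give(x1, x2, y1)"]
print(hdrs.domain["sgM"])  # prints: ['x1', 'x2']
```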
      <Paragraph position="8">  (4) (a) Peter gave Mary a book. (b) It was about sailboats. (c) She was thrilled.</Paragraph>
      <Paragraph position="10"> CT would predict for (4b) that the book is the preferred forward looking center Cp. The backward looking center of (4c) is Mary. This is called a rough shift in CT. A continuation of the center (Cp(Ui) = Cb(Ui+1)) is the preferred and most coherent constellation according to this theory. However, contrary to what CT would predict, it is no problem to read (4).</Paragraph>
      <Paragraph position="11"> The HDRS format seems to work fine with pronominal references to persons or objects. But we run into problems when the slot regarding the described events and states is considered. The following example (5) illustrates that a simple flat list representation, as indicated above by e1, e2, e3, is not sufficient for more complex anaphoric expressions such as event anaphora (Allen 1995):</Paragraph>
      <Paragraph position="12">  (5) (a) When Jack entered the room, everyone threw balloons at him. (b) In retaliation, he picked up the ladle and started throwing punch at everyone. (c) Just then, the chairman walked into the room. (d) Jack hit him with a ladleful, right in the face. (e) Everyone talked about it for years afterwards.</Paragraph>
      <Paragraph position="15"> The pronoun it in (5e) may refer to the entire situation described by (5a) through (5d). But this is not the only conceivable antecedent for it. The situation described by (5d) may be referred to by it as well, if we consider an alternative to (5e) such as the following: (5e') It was a foolish thing to do.</Paragraph>
      <Paragraph position="16"> Note that the situation in (5d) is the only situation available from the sequence (5a-d). The list structure for the event slot does not reflect the structure of the discourse. A segmented discourse structure is needed here.</Paragraph>
    </Section>
    <Section position="3" start_page="104" end_page="104" type="sub_section">
      <SectionTitle>
4.3 Discourse segments
</SectionTitle>
      <Paragraph position="0"> The derivation of discourse structure used in this account is that proposed by SDRT. This discourse grammar, as well as others, claims that discourse segments originate from the derivation of so-called discourse relations (e.g. Narration, Elaboration etc.) due to our background or world knowledge.</Paragraph>
      <Paragraph position="1"> The account proposed by this paper assumes that HDRSs are grouped together with respect to their discourse segment. Consider now the following sequence (6) with the possible continuation (6e) with a male or female pronoun (depending on whether a male or female protagonist was introduced by the first sentence).  (6) (a) Mary/Mark once organised a party. (b) Tom wrote the invitation cards. (c) Peter bought the booze.</Paragraph>
      <Paragraph position="2"> (e) She/He was glad that everything worked out so nicely.</Paragraph>
      <Paragraph position="3">  The first continuation does not cause any problems, although the antecedent for she was introduced by the first sentence of the sequence. Since no other competing discourse referents have been mentioned, the resolution process works without problem. However, substituting a male protagonist called Mark for the female protagonist in the first sentence does cause problems for the understanding of (6). In this case, it is unclear who was meant by he. Note furthermore that only two antecedents are available, even though three male antecedents have been introduced. Only the one in the last sentence (i.e. Peter), or the one introduced by the first sentence (i.e. Mark) are conceivable antecedents. A different continuation does not show this ambiguity: (6e') He decided just to buy beer.</Paragraph>
      <Paragraph position="4"> The continuation in (6e') is an elaboration of the last sentence. Hence Peter, who was responsible for the booze, is the only possible antecedent.</Paragraph>
      <Paragraph position="5"> The following sentence is the last piece of evidence that the discourse segment allows only antecedents that are available on the so-called right frontier. It shows that it is not possible to refer to Tom, who wrote the cards, with the last sentence: (6e'') #He decided to use thick blue paper.</Paragraph>
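The right frontier restriction can be illustrated with a small Python sketch. The tree below is a simplification assumed for this illustration (the node for (6a) directly dominates the segment formed by (6b) and (6c)); the class Node and the function right_frontier are hypothetical names, not part of SDRT or of the paper.

```python
# Illustrative sketch only; the Node class and the tree shape are assumptions.
class Node:
    def __init__(self, label, referents=(), children=()):
        self.label = label
        self.referents = list(referents)   # referents introduced at this node
        self.children = list(children)     # ordered left to right

def right_frontier(root):
    """Collect the nodes on the path from the root to the rightmost leaf."""
    frontier, node = [], root
    while node is not None:
        frontier.append(node)
        node = node.children[-1] if node.children else None
    return frontier

# (6a) is elaborated by the segment formed by (6b) and (6c).
s2 = Node("6b", ["Tom"])
s3 = Node("6c", ["Peter"])
segment = Node("6b-c", [], [s2, s3])
top = Node("6a", ["Mark"], [segment])

accessible = [r for n in right_frontier(top) for r in n.referents]
print(accessible)  # prints: ['Mark', 'Peter']
```

Tom does not appear in the result because the node for (6b) is a left sister and therefore not on the right frontier, which matches the judgement for (6e'').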
    </Section>
  </Section>
  <Section position="6" start_page="104" end_page="105" type="metho">
    <SectionTitle>
5 Formalisation
</SectionTitle>
    <Paragraph position="0"> This section is an introduction to the formalism used. The formalism consists of the following parts: DRT Standard DRT is used to obtain a semantic representation for the meaning of a clause (Kamp and Reyle, 1993). However, the set of discourse referents is more structured than in the standard approach. It also goes beyond the approach by Asher and Wada (1988) (see below for further details).</Paragraph>
    <Paragraph position="1"> Hashing lists The data structure of hashing lists is used to divide the set of discourse referents up into different slots. Each slot contains only referents of the same type, namely singular male, female, or neuter referents, plural entities, and events.</Paragraph>
    <Paragraph position="2"> SDRT A hierarchical discourse structure is needed to explain anaphoric expressions that refer back over segment boundaries. In addition, a theory is needed that takes into account world knowledge for the derivation of discourse relations (Asher, 1993).</Paragraph>
    <Paragraph position="4"> discussed that show an ambiguity regarding the discourse structure. In order to express the ambiguity formally, an underspecification mechanism is employed (Schilder, 1998).</Paragraph>
    <Paragraph position="5"> I will now present the derivation of the sequence in (6).</Paragraph>
  </Section>
  <Section position="7" start_page="105" end_page="108" type="metho">
    <SectionTitle>
5.1 Elaboration
</SectionTitle>
    <Paragraph position="0"> First, a HDRS representation is to be derived for the first sentence. The HDRS for (6a) looks like a normal DRS; the only difference is the hashing list that contains the discourse referents in different slots.</Paragraph>
    <Paragraph position="1"> Second, a HDRS for the second sentence is derived and, in addition, a discourse relation is inferred from our world knowledge. An elaboration relation links the two HDRSs in the given case. Within an underspecified version of SDRT this discourse structure is represented as shown in figure 4.</Paragraph>
    <Paragraph position="2">  The nodes in the tree are labels for (Segmented) HDRSs. The two labels s1 and s2 denote the semantic content of the first two sentences, respectively. The label KR1 refers to the derived relation elaboration that holds between its two daughter segments. Note that the left daughter node of KR1 is already determined, because it is set equal to s1. The right daughter node, however, is left open.</Paragraph>
    <Paragraph position="3"> This is indicated by the dotted line between the right daughter node and s2. This line expresses graphically the dominance relation between tree nodes (&lt;*), in contrast to the straight line that indicates an immediate dominance relation (&lt;). The underspecification of the tree structure allows us to define where possible attachment points are on the right frontier of the discourse structure. The tree structure in figure 4 possesses two attachment points: one is between the right daughter node and s2, and the other one is adjacent to the topic node. The topic node denotes the topic of the current discourse segment, which is the situation described by s1 for (6a-b). [Footnote 5: I follow here the description of a tree logic such as that used by Kallmeyer (1996) or Muskens (1995).]</Paragraph>
    <Paragraph position="5"> Two further remarks are to be made regarding the representation in figure 5 before continuing with the sequence. First, an additional condition is added to the topic node. The information about the temporal relation between the situation e1 and the subordinated situation is added on this level of the discourse tree. Note that it is still open which event e will finally show up in the subordinated node. Only after closing off this discourse segment will it be clear which event(s) elaborated the situation e1.</Paragraph>
    <Paragraph position="6"> Second, a plural entity Z1 is stored in the topic node. This entity combines the singular entities into a plural one. A more elaborate mechanism is needed here in order to combine only entities of the same type (e.g. persons). For the time being, all plural entities are stored in this one slot. [Footnote: ... can be described as a pointer to the plural slot of ...]</Paragraph>
    <Section position="1" start_page="106" end_page="106" type="sub_section">
      <SectionTitle>
5.2 Continuing the thread
</SectionTitle>
      <Paragraph position="0"> A list relation can be derived for the sentences (6b) and (6c) (see figure 6). The semantic content s3 is added to the discourse tree, linked by the discourse relation, and furthermore a common topic is added at KTR2. The topic information has to be an abstract representation of the two HDRSs s2 and s3. In order to achieve that, two new discourse referents are introduced: a plural entity Z4 comprising x2 and x3 (i.e. Tom and Peter) and a complex situation e4 temporally covering e2 and e3.</Paragraph>
    </Section>
    <Section position="2" start_page="106" end_page="108" type="sub_section">
      <SectionTitle>
5.3 Looking back
</SectionTitle>
      <Paragraph position="0"> After the third sentence has been processed, the next sentence contains a pronoun. In the case of she, the pronoun looks for a female singular antecedent. The appropriate discourse referent is found on the right frontier in the appropriate slot. Alternatively, if sequence (6) contains the male protagonist named Mark in the first sentence instead of Mary, the personal pronoun he could have two possible antecedents: Peter or Mark. How can that be explained by the formalism?  [...] discourse (6a-c), only showing the hashing lists on the right frontier. The dotted arrows indicate the hashing list as it is distributed over the right frontier of the discourse structure. There is only one entry for a female singular antecedent over the levels of nodes on the right frontier. However, if there were a male protagonist, the hashing list for the referents in the highest node would contain a discourse referent in the first slot. The list of possible antecedents for he would be x3 and x1.</Paragraph>
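The lookup over the hashing lists on the right frontier can be sketched in Python as follows. This is only a sketch under assumptions: the pronoun table, the function antecedents, the three-node frontier and the referent labels x1, x3 and Z4 are chosen for illustration, and the search order (lowest node first) is an assumption rather than something the paper specifies.

```python
# Illustrative sketch only; data layout, labels and search order are assumed.
PRONOUN_SLOT = {"he": "sgM", "she": "sgF", "it": "sgN", "they": "pl"}

def antecedents(pronoun, frontier):
    """Gather candidates from the matching slot of every hashing list
    on the right frontier, lowest node first."""
    slot = PRONOUN_SLOT[pronoun]
    found = []
    for hashing_list in reversed(frontier):
        found.extend(hashing_list.get(slot, []))
    return found

# Right frontier for (6a-c), from the top node down to the node for (6c).
frontier_mary = [{"sgF": ["x1"]}, {"pl": ["Z4"]}, {"sgM": ["x3"]}]
frontier_mark = [{"sgM": ["x1"]}, {"pl": ["Z4"]}, {"sgM": ["x3"]}]

print(antecedents("she", frontier_mary))  # prints: ['x1']
print(antecedents("he", frontier_mark))   # prints: ['x3', 'x1']
```

With a female protagonist the sgF slot holds a single entry, so she resolves without competition; with a male protagonist two sgM entries are reachable, which is the ambiguity the text describes.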
      <Paragraph position="1"> The separation of different reference types also allows us to explain sequences such as (1). The discourse continues on the highest level, but it is possible to refer to discourse referents that got introduced on a lower level of the discourse structure. The link between two situations can be made via a rhetorical relation, and at the same time the slots for the other referents at the right frontier are still accessible. The hashing list also models a hashed right frontier. Past approaches always collapsed discourse attachment with the restriction regarding possible antecedents for anaphora (cf. the stack mechanism in CT or the tree representation for (S)DRT).</Paragraph>
      <Paragraph position="2"> The formalisation can also provide an explanation of why competing antecedents can cause an ambiguity for the pronoun resolution. The accessibility of hashing lists on different levels of the discourse structure explains why, in this example, a female antecedent can be used as an antecedent even over several intervening sentences. It is important to highlight the difference between the account presented here and past approaches: the discourse referents are grouped together according to their agreement features. The DRT account by Asher and Wada, for instance, stores the discourse referents in a tree according to the accessibility conditions imposed by DRT and singles out the appropriate antecedents according to number and gender information as well as other criteria. There the agreement information is used to "weed out" possible antecedents, whereas within a HDRS the discourse referents are already stored accordingly.</Paragraph>
      <Paragraph position="3"> It should also be clear that an embedded discourse cannot be extended indefinitely, as shown by the cache approach. A restriction has to be imposed on the number of levels where an antecedent can be looked for. Future research, however, has to clarify this issue further.</Paragraph>
      <Paragraph position="4"> Note that this formalisation capitalises on the insight gained from the cache approach. An elaboration cannot be continued for too long, since the working memory of the reader might lose track of the protagonist(s) introduced on the highest level.</Paragraph>
      <Paragraph position="5"> On the other hand, this formalisation also covers more data than the cache approach. A text comprehension theory that employs a cache storage cannot account for a discourse such as (6) with a non-competing female protagonist. The discourse referent for Mary would have been stored in long term memory, because no differentiation is made between the grammatical types of possible antecedents according to the cache approach.</Paragraph>
    </Section>
  </Section>
</Paper>