<?xml version="1.0" standalone="yes"?> <Paper uid="W99-0107"> <Title>user's domain knowledge. In R. Dale, C. Mellish &</Title> <Section position="3" start_page="0" end_page="58" type="metho"> <SectionTitle> 1 Mo~a~n </SectionTitle> <Paragraph position="0"> To examine the phenomenon of reference in discourse, and to analyze how discourse structure and reference interact, we need a tool,which allows several kinds of functionality including mark-up, visualization, and evaluation. Before desitming slach a tool, we must ~y analyze the kinds of information each application requires.</Paragraph> <Paragraph position="1"> Three applications have driven the design of the system. These are: 1) the creation of training data for automatic derivation of reference resolution algorithms (/.e., machine learning), 2) the formation ofa testhed for evaluating proposed reference generation and anaphera resolution theories, and 3) the development of theories about understanding reference in dialog. The influence that these three areas have upon the functional requirements of an annotation system are discussed below.</Paragraph> <Paragraph position="2"> In this paper we fn~t describe the requirements that each of these three related applications demand from a discourse annotation tool geared to aid in answering questions concerning reference. We next discuss some of the theoretical implications and decisions concerning the tool development that have arisen from these requirements. Next we describe the tool itself. F'mally, we discuss related work, future directions of this work, and some conclusions.</Paragraph> <Section position="1" start_page="0" end_page="54" type="sub_section"> <SectionTitle> 1.1 Machine Learning </SectionTitle> <Paragraph position="0"> Consider a learning task in which we will present the learner with a sequence of triples of the form (E, F, U), where: * E is a pair of text expressions EA and EB, , F is vector of features describing the expressions, and * U is the classification: + ff EA and Es co-refer,. otherwise.</Paragraph> <Paragraph position="1"> (Two expressions co-refer when ~ey denote the same discourse entity (DE).) A successful learner will output a model which, when given only (E, F), can predict the value of G'. that is, classify the instance, as positive or negative. We intend to use the annotation tool to produce a set of such instances and features.</Paragraph> <Paragraph position="2"> The first requirement for a tool which would help us generate such a body of data is that it must allow us to mark all the potential referring expressions. This simply means that the us~ will have the ability to delineate any span of text which represents a DE, and Ireat~that span as a single entity. Of course, this is a time-consuming and ermr-proue process and thus it is helpful to automate as much as possible. In the training phase, the learner must be given all the potential antecedents for an anaphoric reference, so that it will know how to distinguish the proper antecedent from all the other candidates. For the testing phase, the correct antecedent's span must he included as a marked entity in the corptis, or the learner has no chance of getting that instance of co-reference right. The other crucial function of an annotation tool is to let the user associate attributes, or f~h~re values, with the</Paragraph> <Paragraph position="4"> marked expression. During the training phase, a learning algorithm is trying to find correlations between the features F and the classification 6'. 
<Paragraph position="4"> Choosing the set of features to include in the learning phase is a very difficult task. The set must be sufficiently rich so as to include all of those features which might affect a referring expression's resolution. On the other hand, since the learner will likely find that only certain features predict co-reference, we do not want to burden the learner with many useless features that will bog it down with computational complexity. Also, a less restricted set of features permits more opportunity for inconsistency in a given coder's markings and disagreement among coders (Condon & Cech, 1995).</Paragraph>
<Paragraph position="5"> We cannot know (before training) exactly which features are most predictive of co-reference. So, we will try to mark a set of features which is a superset of the necessary features. Drawing on the feature sets used in Connolly et al. (1997) and Ge et al. (1998), we believe the following factors might indicate co-reference:
* Syntactic role (e.g. Subject, Object, Prepositional Object, ...),
* Pronominalization (yes or no),
* Distance between EA and EB (an integer),
* Definiteness (yes or no),
* Semantic role (e.g. indicating location, manner, time, ...),
* Nesting depth of an NP (an integer),
* Information status (as defined by Strube (1998)) of the DE,
* Gender, Number, Animacy.</Paragraph>
<Paragraph position="6"> The tool must allow the coder to assign values for these features to each marked expression, but should not demand that every expression has a value assigned for every feature.</Paragraph>
<Paragraph position="7"> Since we cannot claim that this set of features is exhaustive, the tool must allow further features to be added by the user. Since reliability of feature assignment is important, the tool should have the ability to extract as many features as possible automatically (for example, from a parsed corpus). In addition, since some features must be hand-marked, the tool must have the ability to compare feature marking between two coders for the same text.</Paragraph> </Section>
<Section position="2" start_page="54" end_page="55" type="sub_section"> <SectionTitle> 1.2 Evaluating Anaphora Generation and Resolution Algorithms </SectionTitle>
<Paragraph position="0"> Our discourse annotation and visualization tool also fulfills the role of a testbed in which we can examine theories of generating and resolving anaphoric expressions.</Paragraph>
<Paragraph position="1"> From the generation perspective, we look for answers to questions concerning when it is appropriate to generate a pronoun versus some other anaphoric expression (e.g., a definite description or name; see McCoy & Strube (1999a), McCoy & Strube (1999b)). Some researchers have looked at the question of when to generate a pronoun (versus some other description) (e.g., McDonald (1980), McKeown (1983), McKeown (1985), Appelt (1981)). In this work the decision was based on a notion of focus of attention (Sidner, 1979): if an entity was the focus of the previous sentence and is the focus of the current sentence, then use a pronoun.</Paragraph>
<Paragraph position="2"> To evaluate such claims, not only must co-reference relations be marked in a text, but information concerning focusing data structures must be kept.</Paragraph>
<Paragraph position="3"> Dale (1992) discussed the generation of pronouns in the context of work on generating referring expressions (Appelt, 1985; Reiter, 1990).
Dale suggests the principles of efficiency and adequacy, which favor generating the smallest referring expression that distinguishes the object in question from all others in the context. This notion was somewhat altered in Dale & Reiter (1995) to more adequately reflect human-generated referring expressions and to be more computationally tractable.</Paragraph>
<Paragraph position="4"> Other researchers have suggested that a notion of discourse structure must be taken into account when generating referring expressions. In particular, Grosz & Sidner (1986) and Reichman (1985) both suggest that a full noun phrase might be generated at discourse segment boundaries when a pronoun might have been adequate (in Dale's sense). Passonneau (1996b) argues for the use of the principles of information adequacy and economy. Her algorithm takes discourse segmentation into account through the use of focus spaces which are associated with discourse segments. Passonneau argues that a fuller description might be used at a boundary because the set of accessible objects changes at discourse segment boundaries.</Paragraph>
<Paragraph position="5"> Passonneau's work suggests additional features which must be marked in a text to evaluate referring expression generation algorithms. These include discourse segment boundaries and sets of "confusable" DE's contained in the focus space. Thus the definition of what constitutes a discourse segment is another item which is open to research; our tool should allow for alternative markings of discourse segments so that various algorithms can be evaluated. For example, in our current work we look at changes in time as segment boundaries. Other definitions are possible. So, the tool must be able to keep information for various alternative algorithms.</Paragraph>
<Paragraph position="6"> While it is intuitively appealing that notions of discourse segmentation affect pronoun generation, the above work fails to identify how a discourse segment should be defined to a generation algorithm - thus it is not clear how this work can be applied to the generation process.</Paragraph>
<Paragraph position="7"> Given this previous work, we need a tool that will allow us to specify (1) alternative definitions of discourse segmentation, and (2) alternative algorithms for pronoun versus definite description generation (and anaphora resolution). The tool must have the ability to then calculate statistics so that the alternative definitions and algorithms can be compared.</Paragraph>
<Paragraph position="8"> Thus, this application requires the ability to specify co-reference relations, associate various features with referring expressions (both syntactic and discourse-relevant), calculate the results of certain well-specified algorithms on the referring expressions, and tabulate the results of such algorithms. In addition to this information on referring expressions themselves, the tool must allow the marking of arbitrary features over arbitrary pieces of text (e.g., for alternative definitions of discourse segments).
Because this work is exploratory in nature, the tool should allow a researcher to easily find places where various algorithms fail so that they can be examined and the algorithms updated as needed.</Paragraph> </Section>
<Section position="3" start_page="55" end_page="56" type="sub_section"> <SectionTitle> 1.3 Understanding Spoken Dialog </SectionTitle>
<Paragraph position="0"> The evaluation of algorithms for anaphora resolution in spoken dialog requires annotation of discourse structure on several levels. This is because spoken dialog shows more complex phenomena than written discourse. Problematic issues in spoken dialog include
* the determination of the center of attention in multi-party discourse;
* utterances with no discourse entities;
* abandoned or partial utterances, interruptions, speech repairs;
* the determination of utterance boundaries;
* the high frequency of discourse deictic and vague anaphora (Eckert & Strube, 1999).</Paragraph>
<Paragraph position="1"> In order to capture the complexity of anaphora resolution in spoken dialog, the annotation requires a multitude of steps.</Paragraph>
<Paragraph position="2"> Dialog Acts. To determine the domain of anaphoric antecedents, the dialog must be divided into short pieces. We have chosen to use units based on dialog acts for this task. Therefore, turns have to be segmented into dialog act units. Our study of anaphoric expressions reveals that in a dialog between two participants A and B, the DE's introduced by A are not added to the shared discourse memory model until A's contribution has been acknowledged by B. Thus this segmentation is important for resolution algorithms.</Paragraph>
<Paragraph position="3"> As in all coding schemes, intercoder reliability (here, of the dialog act units) must be questioned. For the purpose of applying the Kappa (κ) statistic, the segmentation task must be turned into a classification task. So, we view boundaries between dialog acts as one class and non-boundaries as the other (see Passonneau & Litman (1997) for a similar practice). The next step is to classify dialog act units as particular dialog acts. For this task the κ statistic is also appropriate.</Paragraph>
<Paragraph position="4"> Individual and Abstract Object Anaphora. Since spoken dialog shows a high number of discourse deictic and vague anaphora, pronouns and demonstratives have to be classified accordingly. Thus an additional feature, anaphor type, must be marked in the corpus.</Paragraph>
<Paragraph position="5"> Co-Indexation of Anaphora and Antecedents. Vague pronouns do not have a particular antecedent in the text. Hence, they cannot be co-indexed with an antecedent. The co-indexation of individual object anaphora in spoken dialog does not differ from written discourse. However, the high number of discourse deictic pronouns requires a second set of markables, since discourse deictic pronouns can co-specify with propositions, sentences, and even discourse segments.</Paragraph>
<Paragraph position="6"> Therefore, the reliability of the annotation depends on (1) the marking of the correct text span and (2) whether the correct antecedent is linked with the pronoun. Determining the reliability of marking spans of text is difficult when any span can be marked, since this means almost any word boundary is a candidate segment boundary. Here, the κ statistic does not seem meaningful because of the huge disparity in the number of non-boundaries and boundaries. This highly skewed distribution seems to overwhelm κ.</Paragraph>
<Paragraph position="7"> Thus we are exploring more appropriate measures of intercoder reliability on this task. At the moment, our approach to this problem is to use κ, but restrict the annotators, so that they are allowed to mark only certain contiguous linguistic objects like verb phrases, sentences, or a well-defined segment spanning more than one turn.</Paragraph>
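As an illustrative sketch of recasting segmentation as classification for κ (the encoding below is an assumption for exposition, not the annotation scheme itself), each gap between adjacent units receives one of two labels:

```python
def boundary_labels(num_units, boundaries):
    """One label per gap between adjacent units, so a segmentation
    becomes a two-class labeling over which kappa can be computed."""
    return ["boundary" if i in boundaries else "non-boundary"
            for i in range(1, num_units)]

# Two coders segmenting the same 10-unit turn; they differ at one gap.
coder_a = boundary_labels(10, {3, 7})
coder_b = boundary_labels(10, {3, 6})
```

Such parallel label sequences are exactly what the κ statistic is computed over.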
2 Annotating a Parsed Corpus
<Paragraph position="10"> All of the applications discussed in section 1 depend on having a corpus of reliably marked expressions, features, and relations. In order to determine that these dimensions have been "reliably marked", we need to measure agreement between two coders marking the same text.</Paragraph>
<Paragraph position="11"> One way to increase the reliability of the coding (regardless of the method used to measure reliability) is to automate part of the coding process. Our system can extract a number of markings, features, and relations from the parsed, part-of-speech-tagged corpora of the type found in the Penn Treebank 2 (Marcus et al., 1994).</Paragraph>
<Paragraph position="12"> Use of the Treebank data means we can find most of the markables and many of the necessary features before giving the task to a human coder. We do not try to extract any of the co-reference information from the parsed corpora.</Paragraph> </Section>
<Section position="4" start_page="56" end_page="56" type="sub_section"> <SectionTitle> 2.1 Extracting Markables </SectionTitle>
<Paragraph position="0"> In this context, a markable is a text span representing a discourse entity which can be anaphorically referred to in a text or dialog. The majority of markables are noun phrases. Because the Treebank is a fully-parsed and well-defined representation of the text, it is trivial to determine the boundaries of all of the NP's in the text. However, the full set of NP's found by the Treebank parse is too inclusive for our purposes (i.e., it is a superset of the NP markables). While the Treebank delineates all NP's at all levels of embedding, it is not the case that each such NP contributes a distinct DE. Consider the following example containing three NP's in the parsed Treebank:
(1) (NP (NP different parts) (PP of (NP Europe)))
We want to mark both "different parts of Europe" and "Europe", since they both contribute distinct DE's. However, notice that "different parts" does not contribute a DE since it is not possible to refer to this subexpression alone in subsequent discourse.</Paragraph>
<Paragraph position="1"> To avoid finding such undesirable NP's, our system has a heuristic (H1) which says: Pass over any NP which is a leftmost child of a top-level NP. This heuristic is too drastic, though, eliminating constructions like (2).
(2) (NP (NP the inner brain) and (NP the eyes))
To avoid losing these examples, we include another heuristic (H2) which says: H1 does not apply when the NP is a sibling of another NP. A third heuristic must be added to overrule H1 in the case of a possessor in a possessive construction, such as:
(3) (NP (NP Chicago 's) South Side)
where we should extract both "Chicago" and "Chicago's South Side". So, the heuristic H3 is introduced: H1 does not apply when the NP is a possessive form.</Paragraph>
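The interplay of the three heuristics can be pictured with a small sketch over bracketed parses; the list-based tree encoding and the possessive test below are simplifying assumptions rather than the system's actual Treebank interface:

```python
# A parse node is a list [label, child, ...]; leaves are token strings.
EX1 = ["NP", ["NP", "different", "parts"], ["PP", "of", ["NP", "Europe"]]]

def np_markables(node, parent=None, leftmost=False, out=None):
    """Collect candidate NP markables, applying heuristics H1-H3."""
    if out is None:
        out = []
    if isinstance(node, str):          # token leaf: nothing to collect
        return out
    children = node[1:]
    if node[0] == "NP":
        h1 = parent is not None and parent[0] == "NP" and leftmost
        h2 = h1 and any(isinstance(c, list) and c is not node and c[0] == "NP"
                        for c in parent[1:])
        h3 = h1 and isinstance(children[-1], str) and children[-1] in ("'s", "'")
        if (not h1) or h2 or h3:       # H1 skips; H2 and H3 override H1
            out.append(node)
    for i, child in enumerate(children):
        np_markables(child, node, i == 0, out)
    return out

print(len(np_markables(EX1)))  # 2: "different parts of Europe" and "Europe"
```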
<Paragraph position="2"> Even with heuristics eliminating the NP's which we do not need to consider, there are some NP's that will be found by the system which cannot be eliminated automatically. Copular constructions such as (4) introduce unnecessary NP's.
(4) John is a doctor.
"John" and "a doctor" are syntactically NP's, but the second does not contribute a unique DE.</Paragraph>
<Paragraph position="3"> Also, idiomatic expressions such as (5) must be eliminated by hand:
(5) Ned kicked the bucket.
The syntactic NP "the bucket" refers to no DE and cannot be the antecedent of any future referring expression, so it should not be marked.</Paragraph>
<Paragraph position="4"> At this time, we do not have a way for the expression extracting system to detect and avoid these examples. As a result, we must introduce a correction phase in which a human corrects the markings, eliminating those that are superfluous, and adjusting those that may have been mismarked. The goal is to have a set of expressions which is as close as possible to the set of expressions necessary and sufficient for the applications. For example, if there are many extraneous expressions in the machine learning task, they will act as distractors - examples which decrease the accuracy of the learned model by diluting the highly correlative data with noise.</Paragraph> </Section>
<Section position="5" start_page="56" end_page="56" type="sub_section"> <SectionTitle> 2.2 Extracting Features </SectionTitle>
<Paragraph position="0"> In addition to extracting many markables themselves, the parsed corpora contain information from which many of the features can be automatically derived. Some features' values are marked explicitly in the corpus while others can be automatically extracted by examining the tree structure. The simplest source of feature values is the Treebank "functional tags". For example, the grammatical functions (syntactic subject, topicalization, logical subject of passives, etc.) of phrases and the semantic role (vocative, location, manner, etc.) are marked in the corpus.</Paragraph>
<Paragraph position="1"> Other features must be found by walking the tree structure provided in the Treebank. The form of the NP (whether the NP is realized as a personal pronoun, demonstrative pronoun, or definite description) is a function of the part-of-speech tags assigned to the words in the NP. Whether the NP is definite, indefinite, or indeterminable depends on whether an article begins the NP. If the article is "a", "an", or "some", we assume the NP is indefinite. "The" indicates definiteness; otherwise, we assign a value of "none", which simply indicates that there is no simple way of classifying this instance. The case of an NP is usually determined by its position in the tree. Any child of a VP is marked as an "object". Children of PP's are marked "prep-adjunct" unless the PP was tagged "PP-put" (PP's using "in", "on", or "around" are sometimes marked PP-put), which indicates that the PP acts as a complement to the verb. In this case we tag the NP as "prep-complement".</Paragraph> </Section>
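A minimal sketch of the definiteness rule just described, assuming an NP is available as a simple list of tokens (the function name and interface are ours, not the tool's):

```python
def definiteness(np_tokens):
    """Classify an NP as definite/indefinite/none from its initial article."""
    first = np_tokens[0].lower()
    if first in ("a", "an", "some"):
        return "indefinite"
    if first == "the":
        return "definite"
    return "none"  # no simple way of classifying this instance

print(definiteness(["the", "inner", "brain"]))  # definite
print(definiteness(["some", "peaches"]))        # indefinite
print(definiteness(["Europe"]))                 # none
```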
<Section position="6" start_page="56" end_page="57" type="sub_section"> <SectionTitle> 2.3 Relations between Expressions </SectionTitle>
<Paragraph position="0"> We allow two classes of relations to hold between markable entities: the co-reference relation and an open class of user-definable directional relations. A co-reference relation holds between A and B when A and B are expressions which both refer to the same discourse entity.</Paragraph>
<Paragraph position="1"> Since co-reference is a symmetric, reflexive, and transitive relation, it divides the set of markables into equivalence classes. Within a given equivalence class, all members refer to the same DE. Intuitively, our co-reference relation is a set of undirected links connecting all co-referring expressions. The symmetric property implies that it is not meaningful to store the direction of a relation. However, we do store each markable's antecedent when the user defines a co-reference link, so that we can later reconstruct the co-reference chain if necessary.</Paragraph>
<Paragraph position="2"> The other kind of link is directional. We allow the user to define any number of relations which are not symmetric, reflexive, or transitive. The only restriction on these relations is that they hold between exactly two entities. Initially, we postulate four such relations which are necessary to handle indirect co-reference relations, also called bridging relations (see also Passonneau (1996a)):
* Attribute-of: (6) [The car]i won't start because [the engine]j is missing.
* Propositional-inference: (7) [The man has a gun.]i [That]j scares me.
* Contains: (8) [The peaches]i are in a basket. Give me [the biggest]j.
* Member-of: (9) [Jack]i and Jill went up the hill. [They]j were never seen again.</Paragraph>
<Paragraph position="3"> Clearly, these must be directional (i.e., not symmetric) since, for example, if Member-of(A,B), then we should not assume Member-of(B,A). The user is not prevented, however, from defining two such links, one in each direction. In fact, Contains and Member-of are logical duals; that is, Contains(a,b) iff Member-of(b,a). However, we are always interested in the relation of a referring expression to its potential antecedents and so require that the referring expression be the first argument and the antecedent the second. In (8), Member-of(the biggest, the peaches), but in (9), Contains(They, Jack) and Contains(They, Jill).</Paragraph> </Section>
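Because co-reference is an equivalence relation while bridging links stay directed, one natural in-memory representation pairs a union-find structure with a list of typed directed links. The sketch below is an illustrative assumption about such a data structure, not REFEREE's actual internals:

```python
class ReferenceGraph:
    def __init__(self):
        self.parent = {}      # union-find parents: co-reference classes
        self.antecedent = {}  # stored antecedent, for chain reconstruction
        self.bridges = []     # directed links: (relation, referring, antecedent)

    def find(self, x):
        """Return the representative of x's co-reference equivalence class."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def corefer(self, expr, antecedent):
        """Symmetric link: merge the two classes, but remember which
        expression the user treated as the antecedent."""
        self.antecedent[expr] = antecedent
        self.parent[self.find(expr)] = self.find(antecedent)

    def bridge(self, relation, expr, antecedent):
        """Directional link; referring expression first, antecedent second."""
        self.bridges.append((relation, expr, antecedent))

g = ReferenceGraph()
g.corefer("it", "the engine")
g.bridge("Attribute-of", "the engine", "the car")
print(g.find("it") == g.find("the engine"))  # True: same equivalence class
```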
<Section position="7" start_page="57" end_page="58" type="sub_section"> <SectionTitle> 2.4 Measuring Agreement </SectionTitle>
<Paragraph position="0"> All of the annotation discussed in the above sections is prone to error when a human is involved. The best way to combat these errors is to have several coders annotate the same corpus according to a coding manual. (The intent is that the coders will achieve a high degree of consistency if the manual is clear; then, if the manual accurately represents the desired coding style, consistency among coders implies accuracy of all the codings.) A high measure of agreement between these coders gives us more confidence in the reliability of the data. Therefore, we must be able to measure agreement between two codings of the same text. (Agreement among a set of n > 2 coders is usually calculated as a function of the pairwise agreements, so we will discuss only the pairwise case here, realizing that the full computation is straightforward.)</Paragraph>
<Paragraph position="1"> The first kind of agreement that we need to measure is agreement of two sets of markables. Since we expect a few of the markables found by the system to need human editing, we may not assume that two coders working on the same text will have the same set of markables after the correction phase. We define agreement of two sets of markables S1 and S2 as

Agreement(S1, S2) = 2c / (a + b)

where a = |S1|, b = |S2|, and c = the number of expressions marked in S1 that were marked with exactly the same boundaries in S2. When agreement of markables is found to be less than 1, the coders are shown the expressions on which they disagree and can come to agreement (by referring to the coding manual and remarking those passages). We are developing a function of the tool which will simultaneously display the two versions of the text and highlight the expressions which are not common to the two codings. This will make it easier to visualize the differences between the codings and reach perfect agreement of markables.</Paragraph>
<Paragraph position="2"> The second kind of agreement measures agreement between two coders' co-reference codings. We require that the two coders have the same set of markables before comparing their co-reference annotations, so achieving markable agreement of 1 is a prerequisite for this calculation. As discussed in section 2.3, the co-reference relation divides the set of markables into equivalence classes. A model-theoretic algorithm proposed by Vilain et al. (1995) uses these co-reference classes to define a precision and recall metric which yields intuitively plausible results and is easy to calculate. The method depends on counting how many co-reference links must be added to one coder's equivalence classes to transform the set into that found by the other coder. We adopt this method and enable the tool to perform the computation between any two codings which fully agree on the underlying set of markables.</Paragraph>
<Paragraph position="3"> Finally, we can measure feature-value agreement by viewing the feature assignment task as a kind of classification task and then computing Kappa (κ), which measures how well the coders agree compared to their random expected agreement (Carletta, 1996). We conform to the method proposed in Poesio & Vieira (1998) for computing actual and expected agreement. (Again we assume the coders have already agreed on the set of markables.) Suppose we are considering a given feature f, which was marked by two coders on each of N expressions in a corpus. Percent agreement is simply the fraction of expressions out of N for which the two coders assigned the same value to f. Expected agreement is not computed by assuming that each value is equally likely, though. We compute the expected agreement based on the actual distribution of values, as follows. For two coders, if f takes on values from V,

P(E) = Σ_{v in V} (c1(v, f) / N) × (c2(v, f) / N)

where ci(v, f) is the number of times coder i assigned value v to feature f. Thus, if the coders have used the values in a perfectly even distribution among the |V| values, P(E) = 1/|V|. Any distribution which is not perfectly even will have an expected agreement higher than this.</Paragraph>
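The sketch below implements the two measures above, under the assumptions that markables are represented as (start, end) offset pairs and that feature values come as parallel lists over the same N expressions; it is an illustration of the formulas, not the tool's code:

```python
from collections import Counter

def markable_agreement(s1, s2):
    """Agreement(S1, S2) = 2c / (a + b); spans are (start, end) pairs."""
    c = len(set(s1) & set(s2))  # expressions with exactly the same boundaries
    return 2 * c / (len(s1) + len(s2))

def kappa(values1, values2):
    """Kappa for one feature f marked by two coders on N expressions."""
    n = len(values1)
    p_a = sum(v1 == v2 for v1, v2 in zip(values1, values2)) / n  # percent agreement
    c1, c2 = Counter(values1), Counter(values2)                  # value distributions
    p_e = sum((c1[v] / n) * (c2[v] / n) for v in set(c1) | set(c2))
    return (p_a - p_e) / (1 - p_e)

print(markable_agreement([(0, 7), (12, 20)], [(0, 7), (12, 18)]))  # 0.5
print(kappa(["def", "indef", "def"], ["def", "indef", "none"]))    # 0.5
```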
<Paragraph position="4"> As with measuring markable agreement, we measure feature-value agreement to ensure that we have reliable features before using the data for one of the applications discussed in section 1. Therefore, coders can ask the system to show the examples for which they disagree on a specified feature. Again, the coders have the opportunity to recode those examples to achieve perfect agreement before passing the data to the application.</Paragraph> </Section> </Section>
<Section position="4" start_page="58" end_page="59" type="metho"> <SectionTitle> 3 REFEREE: The Discourse Annotation Tool </SectionTitle>
<Paragraph position="0"> We have built a discourse annotation and visualization tool which is designed according to the issues discussed in section 1 and which has all the capabilities described in section 2. REFEREE is a graphical interface tool written in Tcl/Tk. This makes it highly portable and easily extensible.</Paragraph>
<Section position="1" start_page="58" end_page="59" type="sub_section"> <SectionTitle> 3.1 Annotation Modes </SectionTitle>
<Paragraph position="0"> The tool has three "modes" - reference mode, segment mode, and dialog mode. In reference mode, the user can mark expressions, associate features with any expression, and assign co-reference (or other kinds of reference) links. Clicking on an expression with the mouse displays the features of that expression and highlights all other expressions in the text which co-refer with it. At this point, the user can update the co-reference or feature information or type some notes to be stored with the expression. (These notes are shown with the features whenever this expression is clicked on in the future.) Easy visualization of the co-reference equivalence classes could aid the user as he clicks through the text and sees how the co-reference chains thread through discourse.</Paragraph>
<Paragraph position="1"> A byproduct of the built-in flexibility of REFEREE is the ability to use different feature "masks" in case the user only wants to consider some subset of the complete set of marked features. For example, the user can configure the tool to display and allow changes to only the pronominalization feature. Then, the irrelevant features are not displayed and cannot be changed until the tool is reconfigured. This is also useful for associating different feature sets with different kinds of expressions.</Paragraph>
<Paragraph position="2"> Segment mode allows the user to break the text into arbitrarily nesting and overlapping segments. (These do not have to correspond to any certain definition of discourse segment or text segment.) This allows the user the freedom to choose any degree of constraints upon the structure. When the user selects a region and clicks on the "mark" button, a new segment is created spanning that region. Thus, we can build up a list of start and end points of segments, and automatically determine which segments are contained in or overlap with which other segments. A separate window displays graphically the start and end point of each segment.</Paragraph>
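With segments stored as start and end points, containment and overlap can be computed directly; a minimal sketch under that assumption (the offset representation is ours):

```python
def contains(a, b):
    """True if segment a contains segment b; segments are (start, end) offsets."""
    return a[0] <= b[0] and b[1] <= a[1]

def overlaps(a, b):
    """True if segments a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]

segs = [(0, 40), (10, 25), (20, 50)]
print(contains(segs[0], segs[1]))  # True: (10, 25) nests inside (0, 40)
print(overlaps(segs[0], segs[2]))  # True: arbitrary overlap is allowed
```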
<Paragraph position="3"> At first glance, this seems to replicate the functionality of the reference mode, since both modes allow unconstrained marking (of contiguous text spans). The important difference is that in reference mode, the user delineates referrable entities, while in segment mode, the user is marking spans which represent the structure of discourse. So, a user could have many spans marked as segments which exactly coincide with markables in reference mode; this simply represents the fact that the user believes it is possible for the text to refer to the segments or the propositions they express. Still, the segment markings are not superfluous. They impose a structure on top of the reference mode markables, even if some of them coincide. (While this could be simulated in the reference mode by adding a binary feature for segmenthood, the visualization of segments would be lost, as would the decoupling of the two kinds of spans we mark.)</Paragraph>
<Paragraph position="4"> The last mode of interaction with REFEREE is dialog mode. This allows the user to code a dialog by breaking it into turns. Each dialog participant's turn can be broken into utterances which may be labeled as initiation or response units. (In some cases, there is overlap between these two dialog acts.) The most important function of dialog mode, as it relates to understanding reference in spoken language, is to allow segmentation of the dialog into turns assigned to one speaker or the other. Recall that the proper closure of a turn is crucial for determining which DE's are in the shared discourse model.</Paragraph> </Section>
<Section position="2" start_page="59" end_page="59" type="sub_section"> <SectionTitle> 3.2 Interface and Implementation Notes </SectionTitle>
<Paragraph position="0"> Each of the three modes (reference, segment, and dialog) has one main window in which a page of text is displayed. For example, Figure 1 shows the main screen for reference mode. In this figure, dark text represents the NP's that have been marked or extracted from the Treebank. The "current expression" is highlighted and co-referring expressions are underlined. Though this scheme is perhaps visually unpleasing on paper, note that on the computer, the application uses vivid colors and easily differentiable typefaces. Furthermore, elements of the color scheme are customizable by the user.</Paragraph>
[Figure 1: REFEREE's main screen in reference mode, showing marked expressions and co-reference highlighting.]
<Paragraph position="1"> The tool saves the user's annotations in several data files while leaving the original text file unchanged. Other annotation programs have embedded the annotations into the text using a sublanguage of XML. Files generated under either method are equally capable of representing the desired levels of annotation; we separate the text from the annotations in order to simplify the parsing of the data. In case a REFEREE user should want to port some marked text to a new annotation system, it is straightforward to automatically generate a text-and-annotation file to conform to any XML-style definition.</Paragraph> </Section> </Section>
<Section position="5" start_page="59" end_page="59" type="metho"> <SectionTitle> 4 Previous Work </SectionTitle>
<Paragraph position="0"> Previous systems were designed for different purposes, and therefore do not provide all of the functionality that our applications require. For example, MITRE's Alembic Workbench (Day et al., 1997) builds up an annotated corpus from scratch, under a mixed-initiative paradigm (in which some markings are given by the user, and some are automatically inserted by the computer). Learning an information extraction system was a primary function of this system. While the associated Alembic NLP system does incorporate some discourse level information into the system, the user may not impose an arbitrarily complex discourse structure whose structure the system can represent.</Paragraph>
<Paragraph position="1"> The Discourse Tagging Tool (Aone & Bennett, 1994) was designed for tagging multilingual corpora, and also does not allow complex marking of discourse structure. Furthermore, the tag sets and relations are fixed and may not be elaborated by the user. Also, this work was not concerned with dialogs.</Paragraph> </Section>
<Section position="6" start_page="59" end_page="59" type="metho"> <SectionTitle> 5 Future Work and Conclusions </SectionTitle>
<Paragraph position="0"> We are beginning annotation of a parsed corpus using REFEREE. We have found that it is much easier to code a corpus and get reliable results when the system has already found the majority of the markables. We intend to improve the tool by providing more functionality and better visualization of patterns in the data. We hope to add more complex feature-extraction rules that search the parse tree more extensively for syntactic features that are evident from the tree structure. We are also interested in using a lexicalized knowledge base to find semantic relationships between the marked expressions.</Paragraph>
<Paragraph position="1"> We believe that the requirements of the intended applications dictate the design of a novel and unique tool for the analysis of the relationship between discourse structure and reference. REFEREE fills this niche, and greatly reduces the workload placed on the human users. Furthermore, the open design of REFEREE makes it flexible, extensible, and applicable to any number of other applications.</Paragraph> </Section>
</Paper>