File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/04/c04-1008_metho.xml
Size: 16,306 bytes
Last Modified: 2025-10-06 14:08:41
<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1008"> <Title>Annotating and measuring temporal relations in texts</Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Evaluating annotations </SectionTitle> <Paragraph position="0"> What we want to annotate is something close to the temporal model built by a human reader of a text; as such, it may involve some form of reasoning, based on various cues (lexical or discursive), and may be expressed in several ways. As noted by (Setzer, 2001), it is difficult to reach good agreement between human annotators, as they can express relations between events in different, yet equivalent, ways. For instance, one annotator can say that an event e1 happens during another one e2, and that e2 happens before e3, leaving implicit that e1 too is before e3, while another might list all relations explicitly. One option could be to ask for a relation between all pairs of events in a given text, but this would demand a lot from human subjects, since they would be asked for $n(n-1)/2$ judgments, most of which would be hard to make explicit. Another option, followed by (Setzer, 2001) (and, in a very simplified way, by (Katz and Arosio, 2001)), is to use a few rules of inference (similar to the example above) and to compare the closures (with respect to these rules) of the human annotations. Such rules are of the form &quot;if r1 holds between x and y, and r2 holds between y and z, then r3 holds between x and z&quot;. One can then measure the agreement between annotations with classical precision and recall on the set of triplets (event x, event y, relation). This is certainly an improvement, but (Setzer, 2001) points out that humans still forget available information, so that it is necessary to help them spell out completely the information they should have annotated. 
Setzer estimates that an hour is needed on average for a text containing 15 to 40 events.</Paragraph> <Paragraph position="1"> However, this method has two shortcomings.</Paragraph> <Paragraph position="2"> The first concerns the choice of temporal relations proposed to annotators, i.e. &quot;before&quot;, &quot;after&quot;, &quot;during&quot;, and &quot;simultaneously&quot;. The last of these is all the more difficult to judge as it lacks a precise semantics, being defined as &quot;roughly at the same time&quot; ((Setzer, 2001), p.81). The second problem is that the inferential model considered is only partial. Even though the exact mental processing of such information is still beyond reach, and thus any claim to cognitive plausibility is questionable, there are more precise frameworks for reasoning about temporal information, for instance the well-studied algebra of Allen relations (see Figure 2). Here, relations between two time intervals are derived from all the possibilities for the respective positions of those intervals' endpoints (before, after or same), yielding 13 relations. This framework can also express more general relations between events, such as disjunctive relations (the relation between event 1 and event 2 is relation A or relation B), and supports reasoning on such knowledge. We think it is important at least to relate annotation relations to a clear temporal model, even if this model is not directly used.</Paragraph> <Paragraph position="3"> Besides, we believe that measuring agreement on the basis of a more complete &quot;event calculus&quot; will be more precise, provided we accept inferring disjunctive relations. In that case we want to give a better score to the annotation &quot;A or B&quot; when A is true than to an annotation where nothing is said. 
Section 5 gives more details about this problem.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 A method for annotating temporal relations </SectionTitle> <Paragraph position="0"> We now present our method for automatically annotating event relations. It has been tested on a small set of French newswire texts from Agence France-Presse. The starting point was raw text plus its broadcast date. We then applied the following steps: part-of-speech tagging with TreeTagger (Schmid, 1994), with some post-processing to locate some lexicalised prepositional phrases; partial parsing with a cascade of regular expression analyzers (cf. (Abney, 1996); we also used Abney's Cass software to apply the rules) [footnote 1]. This was done to extract dates, temporal adjuncts and various temporal markers, to achieve a somewhat coarse clause splitting (one finite verb in each clause), and to attach temporal adjuncts to the appropriate clause (this is of course a potentially large source of errors). Relative clauses are extracted and put at the end of their sentence of origin, in a way similar to (Filatova and Hovy, 2001). Table 1 gives an idea of the kind of temporal information defined and extracted at this step, and for which potentially different temporal interpretations are given (for now, the temporal focus is always the previously detected event; this is obviously an over-simplification).</Paragraph> <Paragraph position="1"> date computation, to make precise the temporal locations of events associated with explicit, yet imprecise, temporal information, such as dates relative to the time of the text (e.g. last Monday). 
for each event associated with a temporal adjunct, a temporal relation is established (with a date when possible).</Paragraph> <Paragraph position="2"> a set of discourse rules is used to establish possible relations between two events appearing consecutively in the text, according to the tenses of the verbs introducing the events.</Paragraph> <Paragraph position="3"> These rules for French are similar to rules for English proposed in (Grover et al., 1995; Song and Cohen, 1991; Kameyama et al., 1993), but are expressed with Allen relations instead of a set of ad hoc relations (see Table 1 for a subset of the rules). These rules are only applied when no temporal marker indicates a specific relation between the two events.</Paragraph> <Paragraph position="4"> Footnote 1 (partial parsing step): We have defined 89 rules, divided into 29 levels.</Paragraph> <Paragraph position="5"> the last step consists in computing a fixed point on the graph of relations between the events recognized in the text and the dates. We used a classical path-consistency algorithm (Allen, 1984). More explanation is given in Section 4.</Paragraph> <Paragraph position="6"> Allen relations are illustrated in Figure 2. In the following (and in Table 1) they are abbreviated by their first letters, with an added &quot;i&quot; for their inverse relations. So, for instance, &quot;before&quot; is &quot;b&quot; and &quot;after&quot; is &quot;bi&quot; ($b(x,y) \Leftrightarrow bi(y,x)$). Table 1 gives the disjunction of possible relations between an event e1 with tense X and an event e2 with tense Y following e1 in the text. This is considered a first, very simplified discourse model. It only tries to list plausible relations between two consecutive events when there is no marker that could make that relation explicit. For instance, a simple past e1 can be related by e, b, m, s, d, f or o to a following simple past event e2 in such a context (roughly saying that e1 is before or during e2, or meets or overlaps it). 
This crude model is only intended as a basis, which will be refined once we have a larger set of annotated texts. It will be enriched later with a notion of temporal focus, following for instance (Kameyama et al., 1993; Song and Cohen, 1991), and a notion of temporal perspective necessary to capture more complex tense interactions.</Paragraph> <Paragraph position="7"> The path consistency algorithm is detailed in the next section.</Paragraph> <Paragraph position="8"> [Figure 2: ... and Y (time flows from left to right)]
We have argued in favor of the use of Allen relations for defining the temporal relations used in annotation, not only because they have a clear semantics, but also because a lot of work has been done on inference procedures over constraints expressed with these relations. We therefore believe that a good way of avoiding the pitfalls of choosing relations for human annotation and of defining inference patterns for these relations is to define them from Allen relations and to use relation algebra computations to infer all possible relations between the events of a text (that is, to saturate the constraint graph, see below), both from a human annotation and from an annotation given by a system, and then to compare the two. In this perspective, any event is considered to correspond to a convex time interval.</Paragraph> <Paragraph position="9"> The set of all relations between pairs of events is then seen as a graph of constraints, which can be completed with inference rules. The saturation of the graph of relations is not done with a few hand-crafted rules of the form (relation between e1 and e2) + (relation between e2 and e3) gives (a simple relation between e1 and e3) (Setzer, 2001; Katz and Arosio, 2001), but with the full algebra of Allen relations. 
This yields a more complete description of temporal information, and also gives a way to detect inconsistencies in an annotation.</Paragraph> <Paragraph position="10"> An algebra of relations can be defined on any set of relations that are mutually exclusive (two relations cannot hold at the same time between two entities) and exhaustive (at least one relation must hold between two given entities). The algebra starts from a set of base relations $U = \{r_1, r_2, \ldots\}$, and a general relation is a subset of $U$, interpreted as a disjunction of the relations it contains. From there we can define union and intersection of relations as classical set union and intersection of the base relations they consist of. Moreover, one can define a composition of relations as follows: $x\,(r_1 \circ r_2)\,z \Leftrightarrow \exists y\ (x\, r_1\, y \wedge y\, r_2\, z)$.</Paragraph> <Paragraph position="12"> By computing beforehand the $13 \times 13$ compositions of base relations of $U$, we can compute the composition of any two general relations (because $r \cap r' = \emptyset$ for any two distinct base relations $r$ and $r'$, and composition distributes over union).</Paragraph> <Paragraph position="14"> [Table legend fragment] date(1/2): non-absolute date (&quot;March 25th&quot;, &quot;in June&quot;).
[Table 1: possible relations between e1 and a following e2, by tense]
e1/e2 | imp | pp | pres | sp
imp | o, e, s, d, f, si, di, fi | bi, mi, oi | e, b | o, d, s, f, e, si, di, fi
pp | b, m, o, e, s, d, f | b, m, o, e, s, d, f, bi, mi | e, b | b, m, o
pres | U | U | b, m, o, si, di, fi, e | U
sp | b, m, o, e, s, d, f | e, s, d, f, bi, mi | e, b | e, b, m, s, d, f, o
Saturating the graph of temporal constraints means applying these rules to all compatible pairs of constraints in the graph and iterating until a fixpoint is reached. The following so-called &quot;path-consistency&quot; algorithm (Allen, 1984) ensures this fixpoint is reached:
1. changed = 0
2. for all pairs of nodes $(i,j) \in N \times N$ and for all $k \in N$ such that $((i,k) \in A \wedge (k,j) \in A)$:
(a) $R^1_{i,j} = (R_{i,k} \circ R_{k,j})$
(b) if no edge (a relation $R^2_{i,j}$) existed before between $i$ and $j$, then $R^2_{i,j} = U$
(c) intersect: $R_{i,j} = R^1_{i,j} \cap R^2_{i,j}$</Paragraph> <Paragraph position="16"> Note that this algorithm is correct: if it detects an inconsistency then there really is one, but it is incomplete in general (it does not necessarily detect an inconsistent situation). There are sub-algebras for which it is also complete, but it remains to be seen whether any of them is sufficient for our purpose here.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Measuring success </SectionTitle> <Paragraph position="0"> In order to validate our method, we have compared the results given by the system with a &quot;manual&quot; annotation. It is not realistic to ask humans (whether experts or not) for Allen relations between events: these relations are too numerous, some are too precise to be useful alone, and it is probably dangerous to ask for disjunctive information.</Paragraph> <Paragraph position="1"> But we still want annotation relations with a clear semantics, which we could link to Allen's algebra to infer and compare information about temporal situations. So we have chosen relations similar to those of (Bruce, 1972) (as in (Li et al., 2001)), who inspired Allen; these relations are equivalent to certain sets of Allen relations, as shown in Table 2. 
We found them rather intuitive; they seem to have an appropriate level of granularity, and since three of them are enough to describe situations (the other three being the converse relations), they are not too hard for naive annotators to use.</Paragraph> <Paragraph position="2"> To abstract away from the particulars of a given annotation for some text, and thus to be able to compare the underlying temporal model described by an annotation, we try to measure a similarity between annotations given by a system and human annotations, from the saturated graph of detected temporal relations in each case (the human graph is saturated after annotation relations have been translated into equivalent disjunctions of Allen relations). We do not want to limit the comparison to &quot;simple&quot; (base) relations, as in (Setzer, 2001), because it makes the evaluation very dependent on the choice of relations, and we also want to have a gradual measure of the imprecision of the system annotation. For instance, finding that there is a &quot;before or during&quot; relation between two events is better than proposing &quot;after&quot; if the human put down &quot;before&quot;, and it is less good than the correct answer &quot;before&quot;.
[Table 2: annotation relations as sets of Allen relations]
BEFORE | $\forall i\, \forall j\ (i\ \mathrm{before}\ j \Leftrightarrow ((i\ \mathrm{b}\ j) \vee (i\ \mathrm{m}\ j)))$
AFTER | $\forall i\, \forall j\ (i\ \mathrm{after}\ j \Leftrightarrow ((i\ \mathrm{bi}\ j) \vee (i\ \mathrm{mi}\ j)))$
OVERLAPS | $\forall i\, \forall j\ (i\ \mathrm{overlaps}\ j \Leftrightarrow (i\ \mathrm{o}\ j))$
IS_OVERLAPPED | $\forall i\, \forall j\ (i\ \mathrm{is\_overlapped}\ j \Leftrightarrow (i\ \mathrm{oi}\ j))$
INCLUDES | $\forall i\, \forall j\ (i\ \mathrm{includes}\ j \Leftrightarrow ((i\ \mathrm{di}\ j) \vee (i\ \mathrm{si}\ j) \vee (i\ \mathrm{fi}\ j) \vee (i\ \mathrm{e}\ j)))$
IS_INCLUDED | $\forall i\, \forall j\ (i\ \mathrm{is\_included}\ j \Leftrightarrow ((i\ \mathrm{d}\ j) \vee (i\ \mathrm{s}\ j) \vee (i\ \mathrm{f}\ j) \vee (i\ \mathrm{e}\ j)))$</Paragraph> <Paragraph position="3"> Actually we are after two different notions. The first one is the consistency of the system's annotation with the human's: the information in the text is compatible with the system's annotation, i.e. the former implies the latter. The second notion is how precise the information given by the system is. 
Highly disjunctive information is less precise than simple information; for instance, (a or b or c) is less precise than (a or b) if the correct answer is (a).</Paragraph> <Paragraph position="4"> In order to measure these, we propose two elementary comparison functions between two sets of relations S and H, where S is the annotation proposed by the system and H is the annotation inferred from what was proposed by the human.</Paragraph> <Paragraph position="5"> $\mathrm{finesse} = \frac{|S \cap H|}{|S|}$, $\mathrm{coherence} = \frac{|S \cap H|}{|H|}$. The global finesse score of an annotation is the average of this measure over all edges that carry information according to the human annotation (this excludes edges labelled with the universal disjunction U) once the graph is saturated, while coherence is averaged over the set of edges that carry information according to the system annotation.</Paragraph> <Paragraph position="6"> Finesse is intended to measure the quantity of information the system gets, while coherence gives an estimate of the errors the system makes with respect to the information in the text. Finesse and coherence are thus somewhat similar to recall and precision respectively, but we decided to use new terms to avoid confusion (&quot;precision&quot; being an ambiguous term when dealing with gradual measures, as it could mean how close the measure is to the maximum 1).</Paragraph> <Paragraph position="7"> Obviously, if S = H on all edges, all measures are equal to 1. If the system gives no information at all, S is a disjunction of all relations, so $H \subseteq S$, $H \cap S = H$ and coherence = 1, but then finesse is very low.</Paragraph> <Paragraph position="8"> These measures can of course be used to estimate agreement between annotators.</Paragraph> </Section> </Paper>
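As a rough illustration of the two measures defined in Section 5, the following Python sketch computes finesse and coherence for a single edge and averages them over the informative edges of two saturated graphs. The function names, the dictionary representation of a graph (an ordered event pair mapped to a set of Allen relation abbreviations), the convention that an absent edge stands for the universal relation U, and the toy data are all assumptions made for this example; they are not taken from the system described above.

```python
# Abbreviations of the 13 Allen base relations, as used in Section 3.
ALLEN = frozenset("b bi m mi o oi s si d di f fi e".split())


def finesse(system, human):
    """Share of the system's candidate relations that the human also allows."""
    return len(system.intersection(human)) / len(system)


def coherence(system, human):
    """Share of the human's candidate relations that the system also allows."""
    return len(system.intersection(human)) / len(human)


def average(values):
    """Mean of a list, or None when there is nothing to average."""
    return sum(values) / len(values) if values else None


def global_scores(system_graph, human_graph):
    """Return (finesse, coherence) averaged over two saturated graphs.

    Assumptions of this sketch: a graph is a dict mapping an ordered event
    pair to a non-empty set of Allen relations (an empty set would signal an
    inconsistency and is not handled here), and an absent edge stands for
    the universal relation U, i.e. no information.
    """
    # Finesse is averaged over edges informative for the human annotation.
    human_edges = [e for e, rel in human_graph.items() if rel != ALLEN]
    # Coherence is averaged over edges informative for the system annotation.
    system_edges = [e for e, rel in system_graph.items() if rel != ALLEN]
    fin = [finesse(system_graph.get(e, ALLEN), human_graph[e]) for e in human_edges]
    coh = [coherence(system_graph[e], human_graph.get(e, ALLEN)) for e in system_edges]
    return average(fin), average(coh)


# Toy graphs (invented for illustration): the human annotated BEFORE
# (b or m) between e1 and e2 and OVERLAPS (o) between e2 and e3; the
# system only proposed "b" between e1 and e2.
human = {("e1", "e2"): frozenset({"b", "m"}), ("e2", "e3"): frozenset({"o"})}
system = {("e1", "e2"): frozenset({"b"})}
print(global_scores(system, human))
```

On this toy input the edge (e2, e3), where the system says nothing, contributes only 1/13 to finesse and is skipped when averaging coherence, so the pair of scores comes out at roughly 0.54 and 0.5. Representing each edge as a set keeps the per-edge measures to one intersection each, which mirrors the set-based definition of general relations used in the algebra.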