<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0904"> <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics A Pilot Study on Acquiring Metric Temporal Constraints for Events</Title> <Section position="4" start_page="23" end_page="24" type="metho"> <SectionTitle> 2 Annotation Scheme and Corpora </SectionTitle> <Paragraph position="0"> TimeML (Pustejovsky et al. 2005) (www.timeml.org) is an annotation scheme for the markup of events, times, and their qualitative temporal relations in news articles. The TimeML scheme flags tensed verbs, adjectives, and nominals with EVENT tags carrying various attributes, including the class of event, tense, grammatical aspect, polarity (negative or positive), any modal operators which govern the event being tagged, and the cardinality of the event if it is mentioned more than once. Likewise, time expressions are flagged and their values normalized, based on an extension of the ACE (2004) (tern.mitre.org) TIMEX2 annotation scheme (called TIMEX3).</Paragraph> <Paragraph position="1"> For temporal relations, TimeML defines a TLINK tag that links tagged events to other events and/or times. For example, given sentence (4), a TLINK tag will anchor the event instance of announcing to the time expression Tuesday (whose normalized value will be inferred from context), with the relation IS_INCLUDED. This is shown in (5).</Paragraph> <Paragraph position="2"> The representation of time expressions in TimeML uses TIMEX3, which is an extension of the TIMEX2 scheme (Ferro et al. 2005). It represents three different kinds of time values: points in time (answering the question &quot;when?&quot;), durations (answering &quot;how long?&quot;), and frequencies (answering &quot;how often?&quot;).</Paragraph> <Paragraph position="3"> TimeML uses 14 temporal relations in the TLINK relTypes. Among these, the 6 inverse relations are redundant. 
In order to have a non-hierarchical classification, SIMULTANEOUS and IDENTITY are collapsed, since IDENTITY is a subtype of SIMULTANEOUS. (An event or time is SIMULTANEOUS with another event or time if they occupy the same time interval; X and Y are IDENTICAL if they are simultaneous and coreferential.) DURING and IS_INCLUDED are collapsed, since DURING is a subtype of IS_INCLUDED that anchors events to times that are durations. (An event or time INCLUDES another event or time if the latter occupies a proper subinterval of the former.) IBEFORE (immediately before) corresponds to MEETS in Allen's interval calculus (Allen 1984). Allen's OVERLAPS relation is not represented in TimeML. The above considerations allow us to collapse the TLINK relations to a disjunctive classification of 6 temporal relations TRels = {SIMULTANEOUS, IBEFORE, BEFORE, BEGINS, ENDS, INCLUDES}. These 6 relations and their inverses map one-to-one to 12 of Allen's 13 basic relations (Allen 1984).</Paragraph> <Paragraph position="4"> Formally, each TLINK is a constraint of the general form x R y, where x and y are intervals, and R is a disjunction of relations in TRels.</Paragraph> <Paragraph position="6"> (Our representation, using t3 and t4, grounds the fuzzy primitive P1Q3, i.e., a period of one 3rd quarter, to specific months, though this is an application-specific step. In analyzing our data, we normalize P1Q3 as P3M, i.e., a period of 3 months. For conciseness, we omit TimeML EVENT and TIMEX3 attributes that aren't relevant to the discussion.) In annotating a document for TimeML, the annotator adds a TLINK iff she can commit to the TLINK relType being unambiguous, i.e., having exactly one relType r. Two human-annotated corpora have been released based on TimeML: TimeBank 1.2 (Pustejovsky et al. 2003) with 186 documents and 64,077 words of text, and the Opinion Corpus (www.timeml.org), with 73 documents and 38,709 words. 
TimeBank 1.2 (we use 1.2.a) was created in the early stages of TimeML development, and was partitioned across five annotators with different levels of expertise. The Opinion Corpus was developed recently, and was partitioned across just two highly trained annotators; it could therefore be expected to be less noisy. In our experiments, we merged the two datasets to produce a single corpus, called OTC.</Paragraph> </Section> <Section position="5" start_page="24" end_page="25" type="metho"> <SectionTitle> 3 Translation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="24" end_page="24" type="sub_section"> <SectionTitle> 3.1 Introduction </SectionTitle> <Paragraph position="0"> The first step is to translate a TimeML representation with qualitative relations into one where metric constraints are added. This translation needs to produce a consistent metric representation. When there are no unknowns, the temporal extents of events, and the temporal distances between events, can be read off from the metric representation.</Paragraph> <Paragraph position="1"> The problem, however, is that the representation may have unknowns, and the extents may not be minimal.</Paragraph> </Section> <Section position="2" start_page="24" end_page="24" type="sub_section"> <SectionTitle> 3.2 Mapping to Metric Representation </SectionTitle> <Paragraph position="0"> Let each event or time interval x be represented as a pair of start and end time points <x1, x2>. For example, given sentence (4), and the TimeML representation shown in (5), let x be fall and y be announce.</Paragraph> <Paragraph position="2"> Then, we have x1 = 19970701T00, x2 = 19970930T23:59, y1 = 19980108Tn1, and y2 = 19980108Tn2 (where T represents time of day in hours).</Paragraph> <Paragraph position="3"> To add metric constraints, given a pair of events or times x and y, where x=<x1, x2> and y=<y1, y2>, we need to add, based on the qualitative relation between x and y, constraints of the general form (xi - yj) ≤ n, for 1 ≤ i, j ≤ 2. 
We follow the 'Allen-to-metric' method of (Kautz and Ladkin 1991), which defines metric constraints for each relation R in TRels. For example, here is a qualitative relation and its metric constraints: (6) x is BEFORE y iff (x2-y1) < 0.</Paragraph> <Paragraph position="4"> More details can be found at timeml.org.</Paragraph> <Paragraph position="5"> In our example, where x is fall and y is announce, we are given the qualitative relation that x is BEFORE y, so the metric constraint (x2-y1) < 0 can be asserted.</Paragraph> <Paragraph position="6"> Consider another qualitative relation and its metric constraints: (7) z INCLUDES y iff (z1-y1) < 0 and (y2-z2) < 0.</Paragraph> <Paragraph position="7"> Let y be announce in (4), as before, and let z=<z1, z2> be the time of Tuesday, where z1 = 19980108T00, and z2 = 19980108T23:59. Since we are given the qualitative relation y IS_INCLUDED z, the metric constraints (z1-y1) < 0 and (y2-z2) < 0 can be asserted.</Paragraph> </Section> <Section position="3" start_page="24" end_page="25" type="sub_section"> <SectionTitle> 3.3 Consistency Checking </SectionTitle> <Paragraph position="0"> We now turn to the general problem of checking consistency. The set of TLINKs for a document constitutes a graph, where the nodes are events or times, and the edges are TLINKs.</Paragraph> <Paragraph position="1"> Given such a TimeML-derived graph for a document, a temporal closure algorithm (Verhagen 2005) carries out a transitive closure of the graph. The transitive closure algorithm was inspired by (Setzer and Gaizauskas 2000) and is based on Allen's interval algebra, taking into account the limitations on that algebra that were pointed out by (Vilain et al. 1990). It is basically a constraint propagation algorithm that uses a transitivity table to model the compositional behavior of all pairs of relations in a document. The algorithm's transitivity table is represented by 745 axioms. 
An example axiom is shown in (8): (8) If relation(A, B) = BEFORE && relation(B, C) = INCLUDES, then infer relation(A, C) = BEFORE.</Paragraph> <Paragraph position="2"> In propagating constraints, links added by closure can have a disjunction of one or more relations in TRels. When the algorithm terminates, any TLINK with more than one disjunct is discarded. Thus, a closed graph is consistent and has a single relType r in TRels for each TLINK edge. The algorithm runs in time polynomial in the number of intervals.</Paragraph> <Paragraph position="3"> The closed graph is augmented so that whenever input edges a r1 b and b r2 c are composed to yield the output edge a r3 c, where r1, r2, and r3 are in TRels, the metric constraints for r3 are added to the output edge. To continue our example, since the fall x is BEFORE the Tuesday z and z INCLUDES y (announce), we can infer, using axiom (8), that x is BEFORE y, i.e., that fall precedes announce. Using rule (6), we can again assert that (x2-y1) < 0.</Paragraph> </Section> <Section position="4" start_page="25" end_page="25" type="sub_section"> <SectionTitle> 3.4 Reading off Temporal Extents </SectionTitle> <Paragraph position="0"> We now have the metric constraints added to the graph in a consistent manner. It remains to compute, given each event or time x=<x1, x2>, the values for x1 and x2. In our example, we have fall x=<19970701T00, 19970930T23:59>, announce y=<19980108Tn1, 19980108Tn2>, and Tuesday z=<19980108T00, 19980108T23:59>, and the added metric constraints that (x2-y1), (z1-y1), and (y2-z2) are all negative. Graphically, this can be pictured as in Figure 1. As can be seen in Figure 1, there are still unknowns (n1 and n2): we aren't told exactly how long announce lasted -- it could be anywhere up to a day. We therefore need to acquire information about how long events last when the text doesn't tell us. We now turn to this problem. 
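The translation and closure machinery of Sections 3.2-3.3 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the relation names and the two metric rules, (6) and (7), come from the text, while the function names, variable names, and the tiny two-entry composition table standing in for the 745 axioms are our own.

```python
# A minimal sketch (not the authors' implementation) of the
# qualitative-to-metric translation and one composition step.

def metric_constraints(relation, x, y):
    """Map a TRels relation between intervals x = <x1, x2> and y = <y1, y2>
    to difference constraints (a, b), each meaning a - b < 0."""
    x1, x2 = x
    y1, y2 = y
    if relation == "BEFORE":    # rule (6): x BEFORE y iff (x2 - y1) < 0
        return [(x2, y1)]
    if relation == "INCLUDES":  # rule (7): x INCLUDES y iff (x1 - y1) < 0 and (y2 - x2) < 0
        return [(x1, y1), (y2, x2)]
    raise ValueError("relation not covered in this sketch")

# A two-entry fragment of the transitivity table (the full table has 745 axioms).
COMPOSE = {
    ("BEFORE", "INCLUDES"): "BEFORE",   # axiom (8)
    ("BEFORE", "BEFORE"): "BEFORE",
}

# The running example: fall x BEFORE Tuesday z, and z INCLUDES announce y.
fall, announce = ("x1", "x2"), ("y1", "y2")
inferred = COMPOSE[("BEFORE", "INCLUDES")]           # fall BEFORE announce
print(inferred)                                      # BEFORE
print(metric_constraints(inferred, fall, announce))  # [('x2', 'y1')]
```

Running the sketch on the example reproduces the inference in the text: composing BEFORE with INCLUDES yields BEFORE, whose metric constraint is (x2-y1) < 0.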
</Paragraph> </Section> </Section> <Section position="6" start_page="25" end_page="25" type="metho"> <SectionTitle> 4 Acquisition </SectionTitle> <Paragraph position="0"> We started with the 4593 event-time TLINKs we found in the unclosed human-annotated OTC.</Paragraph> <Paragraph position="1"> From these, we restricted ourselves to those where the times involved were of type TIMEX3 DURATION. We augmented the TimeBank data with information from the raw (un-annotated) British National Corpus. We tried a variety of search patterns to elicit durations, finally converging on the single pattern &quot;lasted&quot;. There were 1325 hits for this query in the BNC. (The public web interface to the BNC only shows 50 random results at a time, so we had to iterate.) The retrieved hits (sentences and fragments of sentences) were then processed with components from the TARSQI toolkit (Verhagen et al. 2005) to provide automatic TimeML annotations. The TLINKs between events and times that were TIMEX3 DURATIONS were then extracted.</Paragraph> <Paragraph position="2"> These links were corrected and validated by hand and then added to the OTC data to form an integrated corpus. An example from the BNC is shown in (9).</Paragraph> <Paragraph position="3"> Next, the resulting data was subjected to morphological normalization in a semi-automated fashion to generate more counts for each event. Hyphens were removed, plurals were converted to singular forms, finite verbs to infinitival forms, and gerundive nominals to verbs. Derivational endings on nominals were stripped and the corresponding infinitival verb form generated. These normalizations are rather aggressive and can lead to loss of important distinctions. For example, sets of events (e.g., storms or bombings) as a whole can have much longer durations compared to individual events. 
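The normalization steps above can be approximated with a few string rules. The rules below are assumed for illustration (the paper does not specify its exact procedure) and are deliberately aggressive, in the spirit of the description:

```python
# An illustrative approximation (assumed rules, not the paper's exact
# procedure) of the semi-automated normalization: remove hyphens,
# singularize plurals, strip gerundive and derivational endings.

def normalize_event(word):
    w = word.lower().replace("-", "")  # hyphen removal ("sell-off" -> "selloff")
    if w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]                     # naive plural -> singular ("bombings" -> "bombing")
    if w.endswith("ing") and len(w) > 5:
        w = w[:-3]                     # gerundive nominal -> verb ("bombing" -> "bomb")
    if w.endswith("ment"):
        w = w[:-4]                     # derivational nominal -> verb ("announcement" -> "announce")
    return w

for form in ["bombings", "announcement", "sell-off"]:
    print(form, "->", normalize_event(form))
```

Note how the first example exhibits exactly the loss of distinction the text warns about: "bombings" (a set of events) collapses onto the single event "bomb".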
In addition, no word-sense disambiguation was carried out, so different senses of a given verb or event nominal may be conflated.</Paragraph> </Section> <Section position="7" start_page="25" end_page="69" type="metho"> <SectionTitle> 5 Results </SectionTitle> <Paragraph position="0"> The resulting dataset had 255 distinct events, with the number of durations for each event shown in the frequency distribution in Table 1.</Paragraph> <Paragraph position="1"> The granularities found in news corpora such as OTC and mixed corpora such as BNC are dominated by quarterly reports, which reflect the influence of specific information pinpointing the durations of financial events. This explains the fact that 12 of the top 13 events in Table 1 are financial ones, with the reporting verb say being the only non-financial event in the top 13.</Paragraph> <Paragraph position="2"> The durations for the most frequent event, represented by the verb to lose, are shown in Table 2. Most losses are during a quarter, or a year, because financial news tends to quantize losses for those periods.</Paragraph> <Paragraph position="3"> Ideally, we would be able to generalize over the duration values, grouping them into classes. Table 3 shows some hand-aggregated duration classes for the data. These classes are ranges of durations. It can be seen that the temporal span of events across the data is dominated by granularities of weeks and months, extending into small numbers of years.</Paragraph> <Paragraph position="4"> Interestingly, 67 events in the data correspond to 'achievement' verbs, whose main characteristic is that they can have a near-instantaneous duration (though of course they can be iterated or extended to have other durations). We obtained a list of achievement verbs from the LCS lexicon of (Dorr and Olsen 1997). 
Achievements can be marked as having durations of PTXS, i.e., an unspecified number of seconds. Such values don't reinforce any of the observed values, instead extending the set of durations to include much smaller durations. As a result, these hidden values are not shown in our data.</Paragraph> </Section> <Section position="8" start_page="69" end_page="69" type="metho"> <SectionTitle> 6 Estimating Duration Probabilities </SectionTitle> <Paragraph position="0"> Given a distribution of durations for events observed in corpora, one of the challenges is to arrive at an appropriate value for a given event (or class of events). Based on data such as Table 2, we could estimate the probability P(lose, P3M) ≈ 0.346, while P(lose, P1D) ≈ 0.038, which is nearly ten times less likely. Table 2 reveals peaks at 3 months, 6 months, and 9 months, with roughly uniform probabilities for all others. Further, we can estimate the probability that losses will be during periods of 2 months, 3 months, or 9 months as ≈ 0.46. Of course, we would prefer a much larger sample to get more reliable estimates.</Paragraph> <Paragraph position="1"> One could also infer a max-min time range, but the maximum or minimum may not always be likely, as in the case of lose, which has a relatively low probability of extending for &quot;P1D&quot; or &quot;P1E&quot;. Turning to earnings, we find that P(earn,</Paragraph> <Paragraph position="3"> So far, we have considered durations to be discrete, falling into a fixed number of categories. These categories could be atomic TimeML DURATION values, as in the examples of durations in Table 2, or they could be aggregated in some fashion, as in Table 3. (The LCS lexicon mentioned in Section 5 is documented at www.umiacs.umd.edu/~bonnie/LCS_Database_Documentation.html.) In the discrete view, unless we have a category of 4 months, the probability of a loss extending over 4 months is undefined. 
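The discrete estimates above amount to maximum-likelihood probabilities over duration categories. The sketch below uses hypothetical counts (not the actual Table 2 data), chosen only to illustrate the computation of P(event, d) = count(d) / total:

```python
from collections import Counter

# Hypothetical duration counts for the verb "lose" (illustrative values,
# NOT the actual Table 2 data), used to show maximum-likelihood
# estimation of P(event, d) = count(d) / total.
counts = Counter({"P3M": 9, "P6M": 5, "P1Y": 4, "P9M": 2, "P1M": 2,
                  "P2M": 1, "P1W": 1, "P1D": 1, "P1E": 1})

def p(duration):
    """Maximum-likelihood probability of a duration category."""
    return counts[duration] / sum(counts.values())

print(round(p("P3M"), 3))                        # 0.346
print(round(p("P1D"), 3))                        # 0.038
# probability of a loss lasting 2, 3, or 9 months
print(round(p("P2M") + p("P3M") + p("P9M"), 3))  # 0.462
```

As in the text, estimates for a disjunction of categories simply sum the category probabilities; in the discrete view, any category absent from the counts (such as 4 months) gets no probability mass at all.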
Viewed this way, the problem is one of classification, namely providing the probability that an event has a particular duration category. The second view takes duration to be continuous, so the duration of an event can have any subinterval as a value. The problem here is one of regression. We can re-plot the data in Table 2 as Figure 2, where we have plotted durations in days on the x-axis in a natural log scale, and frequency on the y-axis. Since we have plotted the durations as a curve, we can interpolate and extrapolate durations, so that we can obtain the probability of a loss for 4 months. Of course, we would like to fit the best curve possible, and, as always, the more data points we have, the better.</Paragraph> </Section> <Section position="9" start_page="69" end_page="69" type="metho"> <SectionTitle> 7 Possible Enhancements </SectionTitle> <Paragraph position="0"> One of the basic problems with this approach is data sparseness, with few examples for each event. This makes it difficult to generalize about durations. In this section, we discuss enhancements that can address this problem.</Paragraph> <Section position="1" start_page="69" end_page="69" type="sub_section"> <SectionTitle> 7.1 Converting points to durations </SectionTitle> <Paragraph position="0"> More durations can be inferred from the OTC by coercing TIMEX3 DATE and TIME expressions to DURATIONS; for example, if someone announced something in 1997, the maximum duration would be one year. Whether this leads to reliable heuristics or not remains to be seen.</Paragraph> </Section> <Section position="2" start_page="69" end_page="69" type="sub_section"> <SectionTitle> 7.2 Event class aggregation </SectionTitle> <Paragraph position="0"> A more useful approach might be to aggregate events into classes, as we have done implicitly with financial events. Reporting verbs are already identified as a TimeML subclass, as are aspectual verbs such as begin, continue and finish. 
Arriving at an appropriate set of classes, based on distributional data or resource-derived classes (e.g., TimeML, VerbNet, WordNet), remains to be explored.</Paragraph> </Section> <Section position="3" start_page="69" end_page="69" type="sub_section"> <SectionTitle> 7.3 Expanding the corpus sample </SectionTitle> <Paragraph position="0"> Last but not least, we could substantially expand the search patterns and the size of the corpus searched against. In particular, we could emulate the approach used in VerbOcean (Chklovski and Pantel 2004). This resource consists of lexical relations mined from Google searches. The mining uses a set of lexical and syntactic patterns to test for pairs of verbs strongly associated on the Web in a particular semantic relation. For example, the system discovers that marriage happens-before divorce, and that tie happens-before untie. Such results are based on estimating the probability of the joint occurrence of the two verbs and the pattern. One can imagine a similar approach being used for durations. Bootstrapping of patterns may also be possible.</Paragraph> </Section> </Section> </Paper>