<?xml version="1.0" standalone="yes"?> <Paper uid="N04-1020"> <Title>Inferring Sentence-internal Temporal Relations</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 The Model </SectionTitle> <Paragraph position="0"> Given a main clause and a subordinate clause attached to it, our task is to infer the temporal marker linking the two clauses. Formally, P(S_M, S_S, t_j) represents the probability that a marker t_j relates a main clause S_M and a subordinate clause S_S. We aim to identify which marker t_j in the set of possible markers T maximises P(S_M, S_S, t_j): t* = argmax_{t_j ∈ T} P(S_M, S_S, t_j) (2) = argmax_{t_j ∈ T} P(S_M, S_S) P(t_j | S_M, S_S) (3)</Paragraph> <Paragraph position="4"> We ignore the term P(S_M, S_S) in (3) as it is a constant and use Bayes' Rule to derive P(S_M | t_j) P(S_S | S_M, t_j) P(t_j): t* = argmax_{t_j ∈ T} P(S_M | t_j) P(S_S | S_M, t_j) P(t_j) (4)</Paragraph> <Paragraph position="6"> We will further assume that the likelihood of the subordinate clause S_S is conditionally independent of the main clause S_M (i.e., P(S_S | S_M, t_j) is approximated by P(S_S | t_j)). The assumption is clearly a simplification but makes the estimation of the probabilities P(S_M | t_j) and P(S_S | t_j) more reliable in the face of sparse data. The decision rule thus becomes: t* = argmax_{t_j ∈ T} P(t_j) P(S_M | t_j) P(S_S | t_j) (5)</Paragraph> <Paragraph position="12"> The clauses S_M and S_S are represented by the features a_{M,1} ... a_{M,n} and a_{S,1} ... a_{S,n} characteristic of the propositions occurring with the marker t_j (our features are described in detail in Section 3.2). By making the simplifying assumption that these features are conditionally independent given the temporal marker, the probability of observing the conjunctions of features factorises over the individual features: t* = argmax_{t_j ∈ T} P(t_j) ∏_i P(a_{M,i} | t_j) P(a_{S,i} | t_j) (6)</Paragraph> <Paragraph position="14"> We effectively treat the temporal interpretation problem as a disambiguation task. From the (confusion) set T of temporal markers {after, before, while, when, as, once, until, since}, we select the one that maximises (6).</Paragraph> <Paragraph position="15"> We compiled a list of temporal markers from Quirk et al. (1985). 
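The disambiguation rule in (6) — pick the marker t_j maximising P(t_j) times the product of smoothed feature likelihoods — can be sketched as a few lines of Python. The counts, feature names, and smoothing constant below are invented for illustration only; they are not the authors' data or features.

```python
import math

# Toy training counts: how often each (clause, feature, value) triple
# co-occurs with each temporal marker. Invented numbers for illustration.
counts = {
    "after": {("M", "tense", "past"): 30, ("S", "tense", "past"): 25},
    "while": {("M", "tense", "past"): 10, ("S", "tense", "prog"): 20},
}
marker_freq = {"after": 40, "while": 25}  # raw marker frequencies

def log_score(marker, features, k=0.5, vocab=50):
    """Naive Bayes score: log P(t) + sum_i log P(a_i | t), add-k smoothed."""
    total = sum(counts[marker].values())
    score = math.log(marker_freq[marker] / sum(marker_freq.values()))
    for f in features:
        c = counts[marker].get(f, 0)
        score += math.log((c + k) / (total + k * vocab))
    return score

def best_marker(features):
    # argmax over the confusion set T of temporal markers
    return max(counts, key=lambda t: log_score(t, features))
```

Working in log space avoids underflow when the product in (6) runs over many features, and the add-k term keeps unseen feature values from zeroing out a marker's score.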
Markers with corpus frequency less than 10 per million were excluded from our confusion set (see Section 3.1 for a description of our corpus).</Paragraph> <Paragraph position="16"> The model in (6) is simplistic in that the relationships between the features across the clauses are not captured directly. However, if two values of these features for the main and subordinate clauses co-occur frequently with a particular marker, then the conditional probability of these features on that marker will approximate the right biases. Also note that some of these markers are ambiguous with respect to their meaning: one sense of while denotes overlap, another contrast; since can indicate either a sequence of events in which the main clause occurs after the subordinate clause, or a causal relation; as indicates overlap or cause; and when can denote overlap, a sequence of events, or contrast. Our model selects the appropriate markers on the basis of distributional evidence while remaining agnostic to their specific meaning when they are ambiguous.</Paragraph> <Paragraph position="17"> For the sentence fusion task, the identity of the two clauses is unknown, and our task is to infer which clause contains the marker. This can be expressed as: p* = argmax_{p ∈ {S_M, S_S}} ∏_i P(a_{p,i} | t) (7) where p is, generally speaking, a sentence fragment to be realised as a main or subordinate clause (p ∈ {S_M, S_S}) and t is the marker linking the two clauses.</Paragraph> <Paragraph position="22"> We can estimate the parameters for the models in (6) and (7) from a parsed corpus. We first identify clauses in a hypotactic relation, i.e., main clauses of which the subordinate clause is a constituent. Next, in the training phase, we estimate the probabilities P(a_{M,i} | t) and P(a_{S,i} | t) by counting how often each feature value co-occurs with marker t. For features with zero counts, we use add-k smoothing (Johnson, 1932), where k is a small number less than one. 
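The fusion decision described above — given the marker, infer which fragment surfaces as the main clause and which as the subordinate — amounts to scoring both assignments of the two fragments to the clause roles and keeping the argmax. A minimal sketch follows; the feature probabilities are invented for illustration and stand in for the smoothed estimates obtained in training.

```python
import math

# Hypothetical smoothed probabilities P(feature | clause role) for a fixed
# marker t; "M" = main clause, "S" = subordinate clause. Invented numbers.
prob = {
    ("past", "M"): 0.6, ("past", "S"): 0.3,
    ("prog", "M"): 0.2, ("prog", "S"): 0.5,
}

def order_score(main_feats, sub_feats):
    """Log-probability of realising the first fragment as the main clause."""
    score = 0.0
    for a in main_feats:
        score += math.log(prob[(a, "M")])
    for a in sub_feats:
        score += math.log(prob[(a, "S")])
    return score

def fuse(frag1, frag2):
    """Pick the clause ordering that maximises the score."""
    orderings = [(frag1, frag2), (frag2, frag1)]
    return max(orderings, key=lambda o: order_score(o[0], o[1]))
```

Because only two orderings are possible, both can be scored exhaustively; the same conditional probability tables serve the interpretation model and the fusion model.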
In the testing phase, all occurrences of the relevant temporal markers are removed for the interpretation task, and the model must decide which member of the confusion set to choose. For the sentence fusion task, it is the temporal order of the two clauses that is unknown and must be inferred. A similar approach has been advocated for the interpretation of discourse relations by Marcu and Echihabi (2002). They train a set of naive Bayes classifiers on a large corpus (on the order of 40 M sentences) representative of four rhetorical relations, using word bigrams as features. The discourse relations are read off from explicit discourse markers, thus avoiding time-consuming hand coding. Apart from the fact that we present an alternative model, our work differs from Marcu and Echihabi (2002) in two important ways. First, we explore the contribution of linguistic information to the inference task using considerably smaller data sets; secondly, we apply the proposed model to a generation task, namely information fusion.</Paragraph> </Section> </Paper>