<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0907">
  <Title>Marking Time in Developmental Biology: Annotating Developmental Events and their Links with Molecular Events</Title>
  <Section position="4" start_page="46" end_page="47" type="metho">
    <SectionTitle>
2 Notions of Time
</SectionTitle>
    <Paragraph position="0"> As previously mentioned, there are different ways of calibrating developmental stages, and they cannot simply be mapped to one another. The two most common stage notations for mouse development are Theiler stages (TS) and Embryonic days (E, equivalent to days post coitum, d.p.c.). The latter are self-explanatory in that they denote the 24-hour day and can be considered real-time staging.</Paragraph>
    <Paragraph position="1">  The convention was originally that E11 would represent the 24-hour period of the 11th day. It is, however, now common to find E11.5 representing the same time period, but this is merely a change in convention due to standard practices of experimentation. A Theiler stage, on the other hand, represents a non-fixed relative time period defined by the progress of development rather than directly in terms of the passage of time. Theiler stages (Theiler, 1989) divide mouse development into 26 prenatal and 2 postnatal stages. In general, Theiler used external features that can be directly assessed by visual inspection of the live embryo as developmental landmarks to define stages. The Edinburgh Mouse Atlas Project (EMAP) uses Theiler stages to organise anatomical terms in its Mouse Atlas Nomenclature (MAN). EMAP gives a brief description of each Theiler stage; for TS25, for example, it reads as follows: Skin wrinkled. The skin has thickened and formed wrinkles and the subcutaneous veins are less visible. The fingers and toes have become parallel and the umbilical hernia has disappeared. The eyelids have fused. Whiskers are just visible.</Paragraph>
    <Paragraph position="2"> Absent: ear extending over auditory meatus, long whiskers.</Paragraph>
    <Paragraph position="3"> An embryo is in TS25 at approximately 17 d.p.c.</Paragraph>
    <Paragraph position="4"> As can be seen in Figure 2, an embryo at E11 could be considered in Theiler stage 17, 18 or 19, i.e. Theiler stages can overlap one another with respect to Embryonic day. Indeed, here, TS17 can fully encompass TS18 in the dpc timeline.</Paragraph>
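The overlap described above can be pictured as a lookup over intervals on the d.p.c. timeline. The following Python sketch is only an illustration: the d.p.c. ranges given for TS17, TS18 and TS19 are assumed values chosen to agree with the description of Figure 2, not figures taken from this paper, and the names STAGE_RANGES_DPC, stages_at and encompasses are hypothetical.

```python
# Illustrative d.p.c. ranges (start, end) for three Theiler stages.
# ASSUMPTION: these numbers are placeholders, not values from the paper.
STAGE_RANGES_DPC = {
    "TS17": (10.0, 11.25),
    "TS18": (10.5, 11.25),
    "TS19": (11.0, 12.25),
}


def stages_at(dpc):
    """Return every Theiler stage whose assumed range contains `dpc`."""
    return [
        stage
        for stage, (start, end) in STAGE_RANGES_DPC.items()
        if dpc >= start and end >= dpc
    ]


def encompasses(outer, inner):
    """True if the range of `outer` fully contains the range of `inner`."""
    outer_start, outer_end = STAGE_RANGES_DPC[outer]
    inner_start, inner_end = STAGE_RANGES_DPC[inner]
    return inner_start >= outer_start and outer_end >= inner_end


print(stages_at(11.0))              # ['TS17', 'TS18', 'TS19']
print(encompasses("TS17", "TS18"))  # True
```

Because one Embryonic day can fall into several stages, the lookup returns a list rather than a single stage, which is why the two notations cannot be converted one-to-one.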
    <Paragraph position="5"> The development of internal structures is approximately correlated with external developments, so except for fine temporal differences, the Theiler stages can be assumed to apply to the whole embryo. Theiler stages provide only gross temporal resolution of developmental events, and the development of internal structures often takes place within the boundaries of one of these stages or overlaps stage boundaries. Thus, internal developmental processes can also have their own finer relative timeline or staging.</Paragraph>
    <Paragraph position="6"> There is no ontology or reference book that comprehensively specifies this finer staging, and the knowledge of the biologist as the reader of articles is relied upon. This work will contribute to making these deeper staging criteria explicit. [Figure caption, start missing: ...notated with the two standard staging notations for mouse development. At E10.5 the Wolffian duct invades the metanephric mesenchyme, forming the ureteric bud around E11. The bud then branches around E11.5 and continues to do so until birth, forming the ultimate functional units of the kidney - the nephrons. TS = Theiler Stage, E = Embryonic day/dpc. This image is adapted from http://www.sciencemuseum.org.uk/]</Paragraph>
  </Section>
  <Section position="5" start_page="47" end_page="49" type="metho">
    <SectionTitle>
3 Annotation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="47" end_page="47" type="sub_section">
      <SectionTitle>
3.1 Event Classification
</SectionTitle>
      <Paragraph position="0"> As a first step, a Gold Standard corpus of 988 sentences was developed with each sentence being classified as containing the description of a developmental and/or molecular event or not. 385 sentences were classified as positive, with 603 negative. Named entities within all these sentences were also annotated. Among these element types were stage, process and tissue. A Naive Bayes automatic classifier for sentence classification was developed using this Gold Standard resulting in a balanced F-score of 72.3% for event classification. (A manual rule-based approach resulted in an F-score of 86.6%, but this has yet to be fully investigated for automation. Guessing positive for all sentences would give a balanced F-score of</Paragraph>
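The all-positive baseline mentioned above can be derived from the corpus counts alone. The Python sketch below assumes the standard balanced F-score (the harmonic mean of precision and recall); the function name is hypothetical, and the resulting figure is computed here from the stated counts rather than quoted from the paper.

```python
def balanced_f_score(true_pos, false_pos, false_neg):
    """Balanced F-score: harmonic mean of precision and recall."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Corpus counts from the text: 385 positive and 603 negative sentences.
POSITIVE, NEGATIVE = 385, 603

# Labelling every sentence positive finds all 385 positives (no false
# negatives) but also mislabels all 603 negatives (false positives).
baseline = balanced_f_score(true_pos=POSITIVE, false_pos=NEGATIVE, false_neg=0)
print(f"{baseline:.1%}")  # 56.1%
```

Under these assumptions the classifier score of 72.3% and the rule-based score of 86.6% both sit above the trivial baseline.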
    </Section>
    <Section position="2" start_page="47" end_page="49" type="sub_section">
      <SectionTitle>
3.2 Event Specifications
</SectionTitle>
      <Paragraph position="0"> Two event types are of interest in this work: molecular and tissue events. The former involve the action (and possible effect) of molecules during development and the latter involve the development of the tissues themselves. A description of an event can be expected to contain the following elements:
      • molecular or tissue event type (e.g. expression, inhibition)
      • stage or temporal expression (e.g. after X, subsequent to X, E11)
      • at least one of: molecule name, anatomical term, biological process term
      The informational elements included within an event description can then be used to relate events to each other. Specifically, processes involve known tissues and are known to happen during certain stages, just as the relative order of processes, tissue formations and stages is known. While an initial specification of an event may be associated with a single sentence, clause or phrase, not all the elements of relevance to this work may be specified there. In particular, an informational element of the event may be explicitly and fully stated in this initial event specification, or it may be underspecified or it may be missing. For those that are underspecified or missing, background knowledge about other elements and events may need to be taken into consideration in order for them to be fully resolved (see Section 4.2).</Paragraph>
      <Paragraph position="1"> The following is a straightforward example where the given sentence specifies all the main elements required for a molecular event.</Paragraph>
      <Paragraph position="2">  1. At E11, the integrin a8 subunit was expressed throughout the mesenchyme of the nephrogenic cord.</Paragraph>
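One way to picture the elements listed above is as a simple record with a completeness check. The Python sketch below encodes Example 1; the class and field names (EventDescription, missing_elements and so on) are hypothetical and are not the annotation scheme used in this work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventDescription:
    """Hypothetical container for the elements of one event description."""
    event_type: Optional[str] = None   # e.g. expression, inhibition
    stage: Optional[str] = None        # stage or temporal expression
    molecule: Optional[str] = None
    tissue: Optional[str] = None       # anatomical term
    process: Optional[str] = None      # biological process term

    def missing_elements(self):
        """List which expected elements are absent from the description."""
        missing = []
        if self.event_type is None:
            missing.append("event_type")
        if self.stage is None:
            missing.append("stage")
        # At least one of molecule, tissue or process must be present.
        if self.molecule is None and self.tissue is None and self.process is None:
            missing.append("molecule/tissue/process")
        return missing


# Example 1: every main element is stated in the sentence itself.
example_1 = EventDescription(
    event_type="expression",
    stage="E11",
    molecule="integrin a8 subunit",
    tissue="mesenchyme of the nephrogenic cord",
)
print(example_1.missing_elements())  # []

# A made-up description in which no stage is stated.
no_stage = EventDescription(event_type="expression",
                            tissue="metanephric mesenchyme")
print(no_stage.missing_elements())   # ['stage']
```

A record like this only flags elements that are absent; deciding whether a stated element is underspecified would need the background knowledge discussed in Section 4.2.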
      <Paragraph position="3">  specify more than one event.</Paragraph>
      <Paragraph position="4"> 2. Prior to formation of the ureteric bud,  no a8 expression was evident within the mesenchyme that separates the urogenital ridge from the metanephric mesenchyme and within the metanephric mesenchyme itself.</Paragraph>
      <Paragraph position="5">  EVENT-0 is not the focus of this sentence, but rather a reference event. Its attributes need to be recorded so that the stage of the other events can be determined.</Paragraph>
      <Paragraph position="5"> TimeML (Pustejovsky et al., 2004) is a specification language designed for the annotation of temporal and event information. Although TimeML is not currently being used as a method of representation for this work, Example 1 above could be represented as follows:  nephrogenic cord can be considered a signal of type tissue as it does not exist throughout the whole of development and so can indicate or rule out time periods for this event description.</Paragraph>
    </Section>
    <Section position="3" start_page="49" end_page="49" type="sub_section">
      <SectionTitle>
3.3 Event Time-Stamping
</SectionTitle>
      <Paragraph position="0"> The relative timing of any biological processes mentioned in the event descriptions first needs to be determined before we can work out when the actual events described are taking place.</Paragraph>
      <Paragraph position="1"> Schilder and Habel (2001) looked beyond the core temporal expressions and into prepositional phrases that contained temporal relations (e.g. before, during), and introduced the notion of noun phrases as event-denoting expressions. An event that is described as occurring after the election does not have an explicit time-stamp attached to it, but the knowledge about the timing of the election mentioned gives the reader a notion of when in absolute time the event occurred. This is similar to Example 2 above, where Event-0 is the reference event; thus biological processes can be considered event-denoting expressions.</Paragraph>
      <Paragraph position="2"> While Schilder and Habel rely on prepositional phrases to designate their event-denoting noun phrases, for this work prepositional phrases are not necessarily required. The mention of a noun phrase by itself may be enough. In developmental biology, tissues may only be extant for a limited period before they form into some other tissue and these can also be used as event-denoting expressions - for example, comma-shaped bodies are structures within the developing kidney that are only in existence for a relatively short time period before the existence of the S-shaped bodies and after epithelialization. Therefore the mention of tissues as well as processes can help to pinpoint the timing of the event being described. While they may not ultimately bring us to the exact stage the event is occurring in, they can at least rule out some spans of time. We discuss this further in Section 4.2.</Paragraph>
      <Paragraph position="3"> In order for events to be linked to one another, it is necessary to uniquely index each event and its elements. Mapping across indices will be utilised so that known relationships between elements can be represented. For example, E10 comes before E12, tubulogenesis occurs during kidney morphogenesis, and the proximal tubule is part of the nephron.</Paragraph>
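The indexing and mapping described above could, for instance, be held as relation triples over element identifiers. This is only a Python sketch: the identifier scheme (STAGE-1, PROC-1 and so on) and the relation names are assumptions made for illustration, not the representation chosen in this work.

```python
# Hypothetical index: each element gets a unique identifier.
ELEMENTS = {
    "STAGE-1": "E10",
    "STAGE-2": "E12",
    "PROC-1": "tubulogenesis",
    "PROC-2": "kidney morphogenesis",
    "TISSUE-1": "proximal tubule",
    "TISSUE-2": "nephron",
}

# Known relationships from the text, stored as (subject, relation, object).
RELATIONS = {
    ("STAGE-1", "before", "STAGE-2"),
    ("PROC-1", "during", "PROC-2"),
    ("TISSUE-1", "part_of", "TISSUE-2"),
}


def related(subject, relation, obj):
    """Check whether a relationship between two indexed elements is known."""
    return (subject, relation, obj) in RELATIONS


def describe(triple):
    """Render an indexed triple using the surface terms of its elements."""
    subject, relation, obj = triple
    return f"{ELEMENTS[subject]} {relation} {ELEMENTS[obj]}"


print(related("STAGE-1", "before", "STAGE-2"))        # True
print(describe(("TISSUE-1", "part_of", "TISSUE-2")))  # proximal tubule part_of nephron
```

Keeping relations on identifiers rather than on surface strings means that two mentions of the same element share its relationships once they are mapped to one index.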
      <Paragraph position="4"> Of the element types listed in Section 3.2, only the molecule element cannot be used to resolve developmental stage, while tissue, process, stage and, of course, temporal expression can. Other elements are also of interest to the biologist and integral to development and molecular function; however, they are not of use in the grounding of events in time.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="49" end_page="51" type="metho">
    <SectionTitle>
4 Initial Investigations
</SectionTitle>
    <Paragraph position="0"> This section demonstrates that one must look beyond the sentence in order to resolve the temporal aspects of events.</Paragraph>
    <Section position="1" start_page="49" end_page="50" type="sub_section">
      <SectionTitle>
4.1 Evidence for Developmental Stage
</SectionTitle>
      <Paragraph position="0"> Evidence sufficient to resolve developmental stage can come from many places. 314 positive sentences from the Gold Standard corpus and their context were examined, and the evidence required to resolve developmental stage for each of the events mentioned there was determined as shown in Table 1.</Paragraph>
      <Paragraph position="1"> As can be seen from the table, only 48 out of the 314 event sentences (i.e. 15%) have the developmental stage in which the event is occurring explicitly stated in the given sentence (e.g. Example 1 in Section 3). So other means need to be explored in order to ground events with respect to developmental stage. An event sentence may be a continuation of a topic, and so the specific developmental stage involved may well be stated in the immediately surrounding or related text.</Paragraph>
      <Paragraph position="2"> Information in the immediately surrounding text (rows labelled Following Sentence, Previous Sentence and Current Paragraph) resolves the developmental stage of the event in 64 cases (i.e. 21%). This most commonly occurs by looking for the immediately previously mentioned stage, and in one case the next encountered stage.</Paragraph>
      <Paragraph position="3"> Event sentences also often refer to figures, and so the stage being described in the caption (i.e.</Paragraph>
      <Paragraph position="4"> legend) of the referenced figure will often be the same as the one relevant to the sentence. (This was true of all sentences looked at that referenced a figure.) Figures, however, are generally only found in the Results sections and so this type of evidence is not often going to be of use for sentences found in other sections of an article.</Paragraph>
      <Paragraph position="5"> Similarly, events can be described within the figure legends themselves. The concise and simple way in which legends are generally written means that the explicit stage is commonly referred to, and so stage can be resolved using this referenced information (43 out of 47 cases, i.e. 91%).</Paragraph>
      <Paragraph position="6">  Irrelevant indicates that the event being described is not time critical, i.e. the event is a constant over the developmental timeline, or an end result. Prior knowledge means that temporal information other than that found in the current paragraph but associated with the current event, such as tissue and process, is required for temporal resolution. This may be found in the current article or in previously curated information (assuming accurate terminology mapping). Text from outside the current paragraph cannot be relied upon to be relevant to the current sentence without additional information. Time not resolved means the stage could not be pinpointed using the figure legend. Not relevant indicates that although an explicit stage was referred to within the sentence, this was not relevant to the event being described, e.g. event and stage in different clauses of the sentence.</Paragraph>
      <Paragraph position="7"> Table 2 shows a similar table to Table 1, but deals only with those sentences found within figure legends. It shows where within the figure legend the required evidence for developmental stage can be found. As can be seen, in 80% of these cases the relevant developmental stage can be ascertained directly from the legend. It should be noted that figure legends in biological articles tend to be much lengthier than those from NLP articles.</Paragraph>
      <Paragraph position="8"> In 21% of the event sentences, a specific developmental stage is not relevant to the fact being described (first row of Table 1), e.g. the kidneys of the double mutants were located more caudal and medial than normal. This sentence is describing an end result, i.e. an affected or normal kidney at birth (although this could, of course, be considered a developmental stage). Alternatively, the time-irrelevant event being described could be a non-event, e.g. the fact that a gene is never expressed in a particular tissue. Similarly, this could be considered as the developmental stage range from conception to birth.</Paragraph>
      <Paragraph position="9"> The very small proportion of event sentences located in Abstracts (24 of 314 total event sentences, less than 8%) demonstrates the need to use full text. Even where an event is described within an Abstract, it is rarely accompanied by associated processes or tissues specific enough to suggest the stage of development, let alone an explicit timestamp, as the Abstract is, by necessity, only generally describing the whole article. The majority of BioNLP work is being done with the use of Abstracts only. This is because of their relative ease of access compared with full text, but methods developed using Abstracts only will not necessarily be as effective when applied to full text.</Paragraph>
      <Paragraph position="10"> As can be seen, the majority of temporally underspecified event sentences are situated in the Results section of the articles. Indeed, this is the section where most event sentences are to be found. This work is initially focussing on event descriptions found in Results sections of articles, as these will focus on the work done by the authors and their findings and will not generally include modality in the event descriptions as Introduction and Discussion sections might. As shown above, the Methods section rarely contains event descriptions, and when it does they are usually about what the experiment aims to show and so this should be repeated in the Results section.</Paragraph>
    </Section>
    <Section position="2" start_page="50" end_page="51" type="sub_section">
      <SectionTitle>
4.2 Prior Knowledge
</SectionTitle>
      <Paragraph position="0"> [Table caption fragment: ...figure legends. Rows as in Table 1, with Current Paragraph being equal to the whole of the legend.] As mentioned earlier, if none of the above sources reveal the relevant stage of an event, then other elements within the sentence, such as tissue or process, need to be looked at so that prior knowledge about those elements can be exploited for developmental stage to be resolved. For example, given the sentence Prior to formation of the ureteric bud, no a8 expression was evident within the mesenchyme that separates the urogenital ridge from the metanephric mesenchyme and within the metanephric mesenchyme itself.</Paragraph>
      <Paragraph position="1"> the developmental stage can be resolved if we know when the ureteric bud forms (TS17/E10.5).</Paragraph>
      <Paragraph position="2"> It could also be the case that the other tissues or processes mentioned have a specific lifetime within development and these could help to further pinpoint the timeline involved for the lack of a8 expression. For example, Pax2 was initiating in the metanephric mesenchyme undergoing induction.</Paragraph>
      <Paragraph position="3"> It is not so straightforward to assign a stage here, since the mesenchyme is constantly being induced from E11 (TS18) until birth (TS26), but we have at least discounted E1-E10 (TS1-TS17) as relevant stages.</Paragraph>
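The two examples above amount to narrowing a set of candidate stages using prior knowledge. The Python sketch below is a minimal illustration that uses only the stage facts stated in the text; the function names are hypothetical, and reading "prior to" as "in a strictly earlier Theiler stage" is an assumption.

```python
# Prenatal Theiler stages TS1 to TS26, held as integers.
ALL_STAGES = set(range(1, 27))


def before(stage):
    """Stages strictly earlier than the stage of a reference event.

    ASSUMPTION: "prior to X" is read as "in a stage earlier than X".
    """
    return {s for s in ALL_STAGES if stage > s}


def within(first, last):
    """Stages inside the known lifetime of a tissue or process."""
    return {s for s in ALL_STAGES if s >= first and last >= s}


# "Prior to formation of the ureteric bud": the bud forms at TS17 (E10.5).
example_2_candidates = before(17)

# "metanephric mesenchyme undergoing induction": induced from TS18 (E11)
# until birth (TS26), so every earlier stage can be discounted.
pax2_candidates = within(18, 26)
ruled_out = ALL_STAGES - pax2_candidates

print(min(example_2_candidates), max(example_2_candidates))  # 1 16
print(min(ruled_out), max(ruled_out))                        # 1 17
```

When a sentence mentions several tissues or processes, the candidate sets could be intersected, so that each additional mention can only shrink the remaining span.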
      <Paragraph position="4"> Resources such as the Mouse Atlas Nomenclature (MAN) (Ringwald et al., 1994) will provide the initial prior knowledge in order to resolve developmental stage of events. This describes the different stages of development and the tissues in evidence at each stage, giving what is known as the abstract mouse. From this abstract mouse, we can ascertain the normal stage ranges where tissues exist and use this knowledge for temporal resolution, taking care not to assume that tissues necessarily exist within the same stage range in mutant mice as in wild-type. The prior knowledge databank can be recursively added to with facts from events already extracted from papers for use in further event extraction and their anchoring in time.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="51" end_page="52" type="metho">
    <SectionTitle>
5 Future Work
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="51" end_page="52" type="sub_section">
      <SectionTitle>
5.1 Term Normalisation
</SectionTitle>
      <Paragraph position="0"> There is no point extracting event descriptions if we cannot relate the events and their elements to each other. The event-denoting expressions identified need to be normalised so that it can be recognised when two terms are referring to the same element. Inconsistent terminology in the biomedical field is a known problem (Sinclair et al., 2002). One gene can have several names (synonymy) just as the same name can be used for more than one gene (homonymy). Very often the synonyms bear no relation to one another since they were perhaps concurrently discovered in different laboratories and named. For example, the gene insomnia can also be known as cheap date, since experiments found that organisms without this gene have a tendency to fall asleep and are particularly susceptible to alcohol. The same anatomical part can also be referred to by different terms, e.g. the Wolffian duct is also known as the nephric duct, and the metanephros is another name for the kidney. There is also a lineage issue, where a tissue with one name (or perhaps more) develops into something with another name (e.g. the intermediate mesoderm gives rise to both the Wolffian duct and the metanephric mesenchyme, which in turn both develop into the metanephros). The MAN includes this type of information.</Paragraph>
      <Paragraph position="1"> Term normalisation is particularly important for the process and tissue elements. If these terms are not normalised, temporal knowledge about the terms may not be exploited and it may not be determined that events involving them are linked.</Paragraph>
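A minimal form of the normalisation described above is a synonym table that maps each surface term to one canonical term. The Python sketch below uses only the synonym pairs given in the text; the choice of which name counts as canonical, and the names SYNONYMS, normalise and same_element, are assumptions made for illustration.

```python
# Hypothetical synonym table built from the examples in the text.
# Keys are lower-cased surface forms; values are the chosen canonical term.
SYNONYMS = {
    "wolffian duct": "Wolffian duct",
    "nephric duct": "Wolffian duct",
    "metanephros": "metanephros",
    "kidney": "metanephros",
    "insomnia": "insomnia",
    "cheap date": "insomnia",
}


def normalise(term):
    """Map a surface term to its canonical form, or return it unchanged."""
    key = " ".join(term.lower().split())
    return SYNONYMS.get(key, term)


def same_element(term_a, term_b):
    """True if two surface terms normalise to the same canonical term."""
    return normalise(term_a) == normalise(term_b)


print(same_element("nephric duct", "Wolffian Duct"))  # True
print(same_element("kidney", "nephron"))              # False
```

A flat table of this kind handles synonymy only; homonymy and the lineage relationships mentioned above would need context and a structured resource such as the MAN.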
    </Section>
    <Section position="2" start_page="52" end_page="52" type="sub_section">
      <SectionTitle>
5.2 Event Elements
</SectionTitle>
      <Paragraph position="0"> If the elements required to fully describe an event are explicitly stated within a simple sentence, then temporal grounding will be straightforward. However, this is unlikely to be the case often. More complex sentences will dictate the need for dependency relations to be determined so that each event's elements can be identified. Methods for dealing with missing or underspecified elements that are not resolved within the event description itself will be investigated.</Paragraph>
      <Paragraph position="1"> A naive approach will first be investigated to fill these gaps: find the closest appropriate element in the previous context (varying the size of the window for how far back to look, such as current paragraph or last 3 sentences). An error analysis on this simple method will help to guide the amount of further work necessary to achieve equal success across all elements. For those elements for which this method is ineffective, other methods will be developed incorporating features such as sensitivity to syntax, event type and location within article. Similarly, it will be established whether different techniques are required for missing information than for underspecified information. They will first be treated in the same manner, with analysis determining whether they should be treated differently.</Paragraph>
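The naive approach described above can be sketched as a backwards search over a window of preceding sentences. The Python code below is an illustration only: plain substring matching stands in for real element recognition, the example sentences are made up for the sketch, and the function name and parameters are hypothetical.

```python
def find_previous_element(sentences, event_index, candidates, window=3):
    """Return the closest candidate term mentioned before the event sentence.

    sentences   -- list of sentence strings in document order
    event_index -- position of the event sentence in `sentences`
    candidates  -- terms that could fill the gap (e.g. stage names)
    window      -- how many preceding sentences to search
    """
    first = max(0, event_index - window)
    # Walk backwards from the sentence just before the event.
    for index in range(event_index - 1, first - 1, -1):
        sentence = sentences[index]
        # Within a sentence, prefer the mention nearest its end.
        hits = [(sentence.rfind(term), term) for term in candidates
                if term in sentence]
        if hits:
            return max(hits)[1]
    return None


# Made-up context; the last sentence is the event with no stage stated.
sentences = [
    "At E10.5 the Wolffian duct invades the metanephric mesenchyme.",
    "The ureteric bud forms around E11.",
    "The bud then branches.",
    "Expression was detected in the branching bud.",
]
stages = ["E10.5", "E11"]

print(find_previous_element(sentences, 3, stages, window=3))  # E11
print(find_previous_element(sentences, 3, stages, window=1))  # None
```

Varying the window argument corresponds to the choice between, for example, the last 3 sentences and the whole current paragraph, which is what the planned error analysis would compare.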
    </Section>
  </Section>
</Paper>