Overlapping transcription initiation codes and promoter interpretation in vertebrate development and differentiation
Not peer reviewed
A core promoter is a minimal region sufficient to direct the accurate initiation of transcription. Various core promoter elements have been discovered that recruit and position the transcriptional machinery, which then initiates transcription at individual transcription start sites (TSSs); however, no universal promoter code has been established. The methods and results presented in this thesis focus on innovative analysis of precise transcription initiation data to reveal the sequence and chromatin features underlying core promoters and their dynamic usage in development and differentiation.
Cap analysis of gene expression (CAGE) provides a single base-pair resolution map of TSSs and their relative usage, and it is a powerful tool for studying promoter structure and function. It has led to the discovery of major promoter classes that differ in transcription initiation patterns: “sharp” promoters, in which the majority of transcription starts at one clearly dominant TSS, and “broad” promoters, with multiple equally used TSS positions distributed along a wider region. By applying CAGE to a developmental time-course of zebrafish (Danio rerio), we created the first comprehensive map of transcription initiation during vertebrate embryogenesis and revealed widespread dynamics in promoter usage at all levels, from alternative promoters to individual TSSs. We found that thousands of promoters are utilized differently by the oocyte and the embryo, uncovering two independent codes that drive dynamic changes in TSS usage and promoter shape. Maternal TSS selection is guided by an A/T-rich W-box motif positioned at a fixed spacing from the TSS, producing a sharp promoter architecture, whereas zygotic selection is restricted by the position of the first downstream nucleosome and produces a broad promoter architecture with the dominant TSS aligned to inter- and intranucleosomal sequence positioning signals. The two grammars co-exist in close proximity or in physical overlap at promoters genome-wide.
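The distinction between sharp and broad promoters can be illustrated with a minimal sketch. It assumes that a promoter is given as per-position CAGE tag counts, that its width is measured between the positions where the cumulative signal reaches 10% and 90% of the total, and that a width of at most 10 bp counts as sharp. The function name `promoter_shape`, the quantile bounds and the 10 bp cut-off are illustrative assumptions, not values taken from the thesis.

```python
from itertools import accumulate


def promoter_shape(tss_counts, lower=0.1, upper=0.9, sharp_max_width=10):
    """Return (dominant_tss, interquantile_width, shape) for one promoter.

    tss_counts maps genomic position -> CAGE tag count at that position.
    The quantile bounds and the width cut-off are illustrative assumptions.
    """
    positions = sorted(tss_counts)
    counts = [tss_counts[p] for p in positions]
    total = sum(counts)
    if total <= 0:
        raise ValueError("promoter has no CAGE signal")

    # Dominant TSS: the position with the highest tag count
    # (the most upstream one if several positions tie).
    dominant = max(positions, key=lambda p: tss_counts[p])

    # Positions at which the cumulative signal first reaches each quantile.
    cumulative = list(accumulate(counts))
    q_low = next(p for p, c in zip(positions, cumulative) if c >= lower * total)
    q_high = next(p for p, c in zip(positions, cumulative) if c >= upper * total)

    width = q_high - q_low + 1
    shape = "sharp" if width <= sharp_max_width else "broad"
    return dominant, width, shape


if __name__ == "__main__":
    # Hypothetical promoters: one dominated by a single TSS, one spread evenly.
    sharp_example = {100: 2, 101: 90, 102: 8}
    broad_example = {pos: 10 for pos in range(200, 240)}
    print(promoter_shape(sharp_example))
    print(promoter_shape(broad_example))
```

Measuring width between quantiles of the cumulative signal, rather than between the outermost TSSs, keeps a few stray tags at the edges of a promoter from inflating its apparent width.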
We further showed that a tight association between the dominant TSS in broad promoters and nucleosome positioning also exists in human and mouse. Alignment of the intranucleosomal dinucleotide frequency patterns downstream of the TSS revealed that a well-positioned +1 nucleosome is a key determinant of TSS preference in broad promoters. The presence of this association in both zebrafish and mammals suggests that the underlying nucleosome-associated TSS selection mechanism is evolutionarily conserved.
Precise TSS localisation is crucial for promoter-centred analyses of any genome-wide data. To facilitate the reuse of high-resolution and context-specific TSSs derived from a growing resource of CAGE data, we developed CAGEr, an R/Bioconductor software package for promoterome mining. CAGEr provides easy access to the majority of published CAGE datasets, offers a comprehensive workflow for processing, visualisation and analysis of precise promoter data, and allows integration of these data with other genome data types.
Taken together, the work presented in this thesis reveals unexpected dynamics in core promoter usage at the TSS level and demonstrates that promoter type is not an inherent property of the genomic locus, but rather depends on the regulatory context. The existence of overlapping transcription initiation codes has important implications for future analyses of promoter content and function.