Title: Statistical Tools for Accessing Transcriptional Regulation
1 Statistical Tools for Accessing Transcriptional Regulation
Christine Steinhoff
Computational Molecular Biology
2 Aspects of Regulation
- Conserved blocks in noncoding upstream regions (regulatory candidates)
- Integrative analysis of different data types on different biological levels (genomic sequence, RNA, protein, ...)
- Pathway modelling
- Regulatory aspects of repetitive elements ...
- Regulatory aspects of epigenetic features
- ...
3 Outline: Examples from Our Group
1. Theory/Statistics
- Normalization issues
- GO category overrepresentation
- Integrative analysis of different data types
- ChIP analysis
2. Implementation issues: R/Bioconductor
3. Application and databases: CORG database and upstream region analysis
4 Normalization of (microarray) data
Variance stabilization
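The slide does not say which variance-stabilizing transformation is meant. As a minimal sketch, assuming the arsinh-type transformation commonly used for microarray intensities (as in the Bioconductor vsn package), the idea can be illustrated as follows. The function name and the parameters `offset` and `scale` are hypothetical; in practice they would be estimated from the data.

```python
import math

def arsinh_transform(intensity, offset=0.0, scale=1.0):
    """Variance-stabilizing transform h(x) = arsinh(offset + scale * x).

    Assumption: an additive-multiplicative error model for the raw
    intensities. offset and scale are placeholders, not fitted values.
    """
    return math.asinh(offset + scale * intensity)

# For large intensities arsinh(x) approaches log(2x), so the transform
# behaves like a log transform there, while it stays finite and roughly
# linear near zero, where a plain log would diverge.
low = arsinh_transform(0.0)
high = arsinh_transform(1e6)
```

Unlike the logarithm, this transform is defined for zero and negative background-corrected intensities.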
5 GO Category Overrepresentation
- Subgroup of genes from any kind of biological experiment
- Is any functional group overrepresented?
- Problem: shortcomings of existing approaches, strong dependencies between the categories
- New statistical approach
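For orientation, the conventional per-category test can be sketched as a one-sided hypergeometric test. This is not the new approach mentioned on the slide, whose details are not given here. It is the standard baseline, which tests each category on its own and therefore ignores the dependencies between categories. The function and argument names are illustrative.

```python
from math import comb

def hypergeom_enrichment_p(n_genes, n_in_category, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_genes:       size of the gene universe
    n_in_category: genes in the universe annotated with the category
    n_selected:    size of the subgroup from the experiment
    n_overlap:     subgroup genes annotated with the category
    """
    total = comb(n_genes, n_selected)
    upper = min(n_in_category, n_selected)
    tail = sum(
        comb(n_in_category, k) * comb(n_genes - n_in_category, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 10 genes, 5 in the category, 5 selected, all 5 overlap.
p = hypergeom_enrichment_p(10, 5, 5, 5)
```

When many categories are tested this way, the p-values are correlated because categories share genes, which is one reason a multiple-testing correction that assumes independence can be misleading.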
6 Integrative analysis of different data types
[Figure; labels: patients, Genomic localization]
7 Integrative analysis of different data types
8 ChIP analysis
- Tiling arrays: varying affinities from probe to probe
- Unspecific binding: establish physical model (function of stability of DNA complexes)
- Physical model explains high percentage of variance in the data
9 ChIP analysis
Probes with larger intensity than predicted are assumed to carry evidence for TF binding.
Score of evidence, which is a function of:
- Signal intensity
- Non-specific binding signal
- Optical background
- Standard deviation of all scores in a bin of predicted intensities containing unspecific binding
ChIP-chip experiments can be used to identify
mammalian in vivo TF binding sites with high
confidence if the probe-specific behavior is
explicitly taken into account.
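The exact score formula is not preserved in the slide text. A minimal sketch of one plausible form is given below, under a loud assumption: the score is the residual (signal minus predicted non-specific signal minus optical background) divided by the standard deviation of the residuals of all probes in the same bin of predicted intensity. The function name, the equal-count binning, and the handling of zero variance are all illustrative choices, not the group's implementation.

```python
from statistics import pstdev

def evidence_scores(signal, nonspecific, optical_bg, n_bins=2):
    """Standardized excess intensity per probe (assumed functional form).

    signal:      observed probe intensities
    nonspecific: non-specific binding signal predicted by the physical model
    optical_bg:  optical background (a single constant here, an assumption)
    """
    # Excess of the observed intensity over the prediction.
    resid = [s - (n + optical_bg) for s, n in zip(signal, nonspecific)]
    # Bin probes by predicted intensity (equal-count bins, an assumption).
    order = sorted(range(len(signal)), key=lambda i: nonspecific[i])
    size = -(-len(order) // n_bins)  # ceiling division
    scores = [0.0] * len(signal)
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        sd = pstdev([resid[i] for i in idx])
        for i in idx:
            # A bin without any spread carries no scale; score it as 0.
            scores[i] = resid[i] / sd if sd > 0 else 0.0
    return scores

# Hypothetical toy data: probe 3 is far brighter than its prediction.
scores = evidence_scores(
    signal=[10, 12, 11, 30, 100, 102, 101, 99],
    nonspecific=[9, 11, 10, 10, 95, 97, 96, 94],
    optical_bg=1,
)
```

Standardizing within bins of predicted intensity makes scores comparable between probes with weak and strong predicted non-specific binding, which is the probe-specific behavior the slide refers to.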
10 Results: de novo motif discovery
[Workflow diagram; labels: MEME, Sequence set, Motif set, CLOVER, sequences chromosome 21 22, Significantly enriched motifs, Comparison with known motifs, Candidate motifs]
11 Results: de novo motif discovery
P53-DO1 GST control found in 23 of 23
SP1 GST control found in 37 of 37
SP1 INPUT control found in 53 of 53
P53-DO1 INPUT control found in 24 of 24
JASPAR
P53-FL GST control found in 11 of 12
JASPAR
12 What is R/Bioconductor?
- Free software environment for statistical computing and graphics
- Runs on UNIX platforms, Windows, and MacOS
- http://www.r-project.org/
- Open source and open development software project for the analysis and comprehension of genomic data
- Started in the fall of 2001
- Bioconductor short courses
- All course materials are available on the WWW
- http://www.bioconductor.org/
13 CORG database
- CORG: COmparative Regulatory Genomics
- Conservation of non-coding DNA segments across multiple homologous genomic sequences
- Pairwise alignments, as well as multiple alignments based on the pairwise ones, are available
- Basis for upstream region exploration
- gene structure
- transcriptional start sites
- comparative information
- transcription factor motif annotation
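To illustrate what a conserved non-coding block means operationally, the sketch below scans a pairwise alignment with a sliding window and merges windows whose identity reaches a threshold. This is a simplified illustration, not CORG's actual procedure. The function name, the window size, and the identity threshold are assumptions.

```python
def conserved_blocks(aln_a, aln_b, window=5, min_identity=0.8):
    """Return merged (start, end) alignment-column ranges, end exclusive,
    covered by windows whose fraction of identical columns is at least
    min_identity. Gap columns ('-') never count as identical."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    match = [a == b and a != "-" for a, b in zip(aln_a, aln_b)]
    blocks = []
    for start in range(len(match) - window + 1):
        if sum(match[start:start + window]) / window >= min_identity:
            end = start + window
            if blocks and start <= blocks[-1][1]:
                # Overlaps or touches the previous block: extend it.
                blocks[-1] = (blocks[-1][0], end)
            else:
                blocks.append((start, end))
    return blocks

# Hypothetical aligned upstream sequences: 8 identical columns, then none.
blocks = conserved_blocks("ACGTACGTAAAA", "ACGTACGTTTTT",
                          window=4, min_identity=1.0)
```

Blocks found this way in upstream regions would then be candidates for annotation with transcription factor motifs.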
14 CORG database
15 EuTRACC
Repository -> ArrayExpress
- Storage
- Management
- Retrieval
- Integrative analysis
- Data mining
- Visualization
- What kind of ChIP chips? Design issues? Tiling arrays?
- Development of R package for unified analysis: normalisation issues/implementation of physical model
- Further statistical issues: overrepresentation, differential behavior, cobehavior
- Integrative approaches: expression, ChIP, regulation, covariates
16 Members of the Regulation Group
Martin Vingron (Head)
Abha Singh Bais (PhD)
Ho-Ryun Chung (Postdoc)
Szymon M. Kielbasa (Postdoc)
Holger Klein (PhD)
Ho-Joon Lee (PhD)
Thomas Manke (Postdoc)
Utz Pape (PhD)
Paz Polak (PhD)
Hugues Richard (Postdoc)
Marcel Schulz (PhD)
Ewa Szczurek (PhD)
Christine Steinhoff (Postdoc)
Tomasz Zemoitel (Postdoc)
Thank you for your attention