Transcriptional and post-transcriptional regulation of gene expression - PowerPoint PPT Presentation

Slides: 52
Provided by: Olivi91
Transcript and Presenter's Notes

Title: Transcriptional and post-transcriptional regulation of gene expression


1
Transcriptional and post-transcriptional
regulation of gene expression
protein
Translation Localization Stability
mRNA
3' UTR
Pol II
DNA
Activation Repression
2
  • Where does each transcription factor bind in the
    genome, in each cell type, at a given time ? Near
    which genes ?
  • What is the cis-regulatory code of each factor ?
    Does it require any co-factors ?

DNA
Activation Repression
3
ChIP-seq
Transcription factor of interest
Antibody
Genome Analyzer II (Solexa)
4
Control input DNA
Genome Analyzer II (Solexa)
5
ACCAATAACCGAGGCTCATGCTAAGGCGTTAGCCACAGATGGAAGTCCGA
CGGCTTGATCCAGAATGGTGTGTGGATTGCCTTGGAACTGATTAGTGAAT
TC
TGGTTATTGGCTCCGAGTACGATTCCGCAATCGGTGTCTACCTTCAGGCT
GCCGAACTAGGTCTTACCACACACCTAACGGAACCTTGACTAATCACTTA
AG
Average length 250bp
6
25-40bp
ACCAATAACCGAGGCTCATGCTAAGGCGTTAGCCACAGATGGAAGTCCGA
CGGCTTGATCCAGAATGGTGTGTGGATTGCCTTGGAACTGATTAGTGAAT
TC
TGGTTATTGGCTCCGAGTACGATTCCGCAATCGGTGTCTACCTTCAGGCT
GCCGAACTAGGTCTTACCACACACCTAACGGAACCTTGACTAATCACTTA
AG
Average length 250bp
8
BCL6 ChIP-seq
  • Lymphoma cell line (OCI-Ly1)
  • Solexa/Illumina
  • 6 lanes for ChIP, 1 for input DNA, 1 for QC
  • 36nt long sequences
  • 32 Million reads
  • Aligned/mapped to hg18 with Eland

Melnick lab at WCMC
9
Read mapping with Eland
Solexa Read
AAAATACGCGTATTCTCCCAAAACAATATC
AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATC
TTACAAGATGTAAATATACCCAAGATG
Reference Human Genome (hg18)
10
Read mapping with Eland
Solexa Read
AAAATACGCCTATTCTCCCAAAACAATATC
AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATC
TTACAAGATGTAAATATACCCAAGATG
Reference Human Genome (hg18)
11
Read mapping with Eland
Solexa Read
AAAATACGCCTATTCTCCCATAACAATATC
AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATC
TTACAAGATGTAAATATACCCAAGATG
Reference Human Genome (hg18)
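
The three mapping slides show the same read placed on the reference with zero, one and two mismatches. As a rough illustration only, the sketch below slides each read along the reference fragment from the slides and reports every placement with at most two mismatches. It is a naive scan, not Eland's actual algorithm; the function names `hamming` and `map_read` are hypothetical, and joining the two reference lines into one string is an assumption.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, max_mismatches=2):
    """Return (position, mismatches) for every placement within the limit."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        d = hamming(read, reference[pos:pos + len(read)])
        if d <= max_mismatches:
            hits.append((pos, d))
    return hits


# Reference fragment and reads copied from the slides.
reference = ("AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATC"
             "TTACAAGATGTAAATATACCCAAGATG")
reads = [
    "AAAATACGCGTATTCTCCCAAAACAATATC",  # exact match
    "AAAATACGCCTATTCTCCCAAAACAATATC",  # one mismatch
    "AAAATACGCCTATTCTCCCATAACAATATC",  # two mismatches
]

for read in reads:
    print(read, map_read(read, reference))
```

A scan like this costs time proportional to read length times genome length per read, so real aligners rely on indexing instead.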
12
Reads can map to multiple locations/chromosomes
Solexa Read 1
Solexa Read 2
Reference Human Genome (hg18)
13
Reads map to one strand or the other
Solexa Read 1
Solexa Read 2
hg18
14
>HWI-EAS83_30UCEAAXX129151011 AGGTCACAAAACAAGTCCTAACAAATTTAAGAGTAT U0 1 13 62 chr8.fa 59699745 R DD
>HWI-EAS83_30UCEAAXX128261245 GTCAGAAAAATCCTTTTTATTATATAAACAATACAT U2 0 0 1 chr5.fa 121195098 F DD 15G 20G
>HWI-EAS83_30UCEAAXX12900945 GTCATCAAACTCCAAGGATTCTGTTTTCAACATACT U0 1 1 0 chr18.fa 8914049 R DD
>HWI-EAS83_30UCEAAXX1210371118 GAAAGTGATTAGCAGATTGTCATTTAATAATTGTCT U2 0 0 1 chr1.fa 97496963 F DD 18G 28G
>HWI-EAS83_30UCEAAXX12898874 GATAAATTTTTTCCTACAATCTTAAATTATTACACA U1 0 1 0 chr3.fa 95643444 R DD 10C
>HWI-EAS83_30UCEAAXX12918928 AAAAATTAAACAATTCTAAAAATATTTTTATCTTAA U2 0 0 1 chr2.fa 177727639 R DD 18C 31G
>HWI-EAS83_30UCEAAXX1213244 GCACATGTCATACTCTTTCTAGCTCTCTTATTTTTC U0 1 0 0 chr8.fa 79132719 R DD
>HWI-EAS83_30UCEAAXX128991015 AAATTAATGTAAAAAATAGGATACTGAATTGTGATA U1 0 1 0 chr10.fa 69774166 F DD 30G
>HWI-EAS83_30UCEAAXX12909926 GTAGTTAACAATAATTTATTTTATACTTCAAAATTC U1 0 1 17 chrX.fa 26496842 R DD 7A
>HWI-EAS83_30UCEAAXX127011702 GTCAGAATTAATTAATCAAAACACCAAATGTACTTC U0 1 0 0 chr12.fa 72700465 F DD
>HWI-EAS83_30UCEAAXX129961003 ATTTTGACTTTATTATTTTTTCTTCAATGTTTTTAA NM 0 0 0
>HWI-EAS83_30UCEAAXX128841090 GAAAGTACATCAAATACATATTATATACTTTACATA R2 0 0 2
>HWI-EAS83_30UCEAAXX12911937 AATCCATATACATTTCTTTTTAATCATTTCCTCTTT U1 0 1 0 chr11.fa 94204222 F DD 20G
>HWI-EAS83_30UCEAAXX121517330 GTGAGTTTCTTAATCCTGAGTTCTAATTTTATTTCA R0 29 255 255
>HWI-EAS83_30UCEAAXX129041031 ACATTTTATAAATTTTTAATTTCATTTTAATTTATA NM 0 0 0
>HWI-EAS83_30UCEAAXX1212911469 GTTTTTAAAATCAACACTTTTATTATAGAAGTAGCA U0 1 0 1 chr12.fa 62166701 R DD
>HWI-EAS83_30UCEAAXX121697828 GTACTGATGTAAACTTGGTAAAAACATTGACATAAA U0 1 0 0 chr14.fa 65160857 F DD
>HWI-EAS83_30UCEAAXX121415583 GAAGAAAATGACTATGTCAAAATATTATCTCTCAAT U0 1 0 0 chr5.fa 97782464 F DD
>HWI-EAS83_30UCEAAXX1215611653 GTTTTACTGATTTTCTTACTTACTAAACTACCTGTT U0 1 0 0 chr7.fa 133200265 F DD
>HWI-EAS83_30UCEAAXX121579943 AATGATACGGCGACCACCGACAGGTTCAGAGTTCTA NM 0 0 0
>HWI-EAS83_30UCEAAXX121705268 GAGAATTATTCAGAAGTCAAATCTGTGCTTAGTTTA U2 0 0 1 chr5.fa 162472124 R DD 3G 7C
>HWI-EAS83_30UCEAAXX121489318 GTATGTATCATATATATTTATGTATCATATATATTT R1 0 3 2
>HWI-EAS83_30UCEAAXX1210031113 GATTGCTCCATTATTTGTTAAAAACATAGTAAAATA NM 0 0 0
>HWI-EAS83_30UCEAAXX128951072 ATGAGATCAGTACTTCAAAGAGATATCTGCACTCCC U0 1 1 9 chr12.fa 33830898 R DD
>HWI-EAS83_30UCEAAXX128531178 GTTAGTCCCAATATTCCATTAATCCCAATAAATATA U2 0 0 1 chr6.fa 110722427 F DD 15G 19G
>HWI-EAS83_30UCEAAXX121432972 GAGATAATAATAGCAGTTATGGCATCGAGATAATTT U0 1 0 0 chr2.fa 47305609 R DD
>HWI-EAS83_30UCEAAXX121718341 GTAGAGGGCACACATCACAAACAAGTTTCTGAGAAT R2 0 0 3
>HWI-EAS83_30UCEAAXX121171302 GAATATCCACTTGCAGACTTTACAAACAAATTTTTT R2 0 0 4
>HWI-EAS83_30UCEAAXX1210551126 GGCAGATGAAACTTCTATACACTATATTTTAGCCAG U0 1 0 0 chr13.fa 90021137 F DD
>HWI-EAS83_30UCEAAXX129711371 GAAAGAAAAACTATTGAAAAAATAGTTACTTTCCAA U0 1 0 0 chr1.fa 74303257 R DD
>HWI-EAS83_30UCEAAXX121774614 GTGTAGATGATATCGAGGGCATTAGAAGTAAATAGC U0 1 0 0 chr5.fa 16031200 F DD
>HWI-EAS83_30UCEAAXX121207808 GAGAGGAAATAATAAAGATAAAAGTAGAAAAAGTGA U0 1 0 0 chr1.fa 187326417 F DD
>HWI-EAS83_30UCEAAXX121680815 GATAATTATGTTGTTGTAATTATTGTTTGTTTTTTT U0 1 0 0 chr15.fa 46739015 R DD
>HWI-EAS83_30UCEAAXX121688260 GTTGACAATCCAGCTGTCATAGAAACTGACTATTTT U0 1 0 0 chr12.fa 38910133 R DD
>HWI-EAS83_30UCEAAXX121051916 AAAAATTCTCCCAAAACAACAAGATGTAAATATACC U0 1 0 0 chr3.fa 101625712 R DD
>HWI-EAS83_30UCEAAXX121771308 GTTCTTACACTGATATGAAGAAATACCTGAGACTGG U0 1 2 67 chr2.fa 214128537 R DD
>HWI-EAS83_30UCEAAXX12911917 GAGAAACACACATATTTTTGTAAGTGCCATCACATC U1 0 1 0 chr7.fa 13668652 R DD 18C
>HWI-EAS83_30UCEAAXX121105348 GTATTATCTAACACACAAGATGATGTTTGTTTTTAT NM 0 0 0
>HWI-EAS83_30UCEAAXX121048857 GAGTGTAGAAAATTTTCTGCCCTAAAATATTTGTTA U1 0 1 0 chr6.fa 74625385 F DD 13G
>HWI-EAS83_30UCEAAXX127431729 GTATCCTAAAGTGTATCTTATGTTTTTTCATCTTCT U1 0 1 0 chr12.fa 7400023 R DD 9C
>HWI-EAS83_30UCEAAXX12128764 AATAAAACAAATTCCAATGGCTTAGATTCTACTTAA U2 0 0 1 chr10.fa 98020799 R DD 15C 20C
>HWI-EAS83_30UCEAAXX129401059 AAATGGTCATACTTCCCAAAGCGATCTACAGATTCA U1 0 1 29 chr3.fa 50834510 R DD 19C
>HWI-EAS83_30UCEAAXX128981061 ACATTTCCACATTTCTGTGGAAGCCTCACAATCATT R2 0 0 2
>HWI-EAS83_30UCEAAXX12913932 ATTAATCAACAGCAACATTAATCAACTGAATCAACA U0 1 0 0 chr2.fa 46078825 R DD
>HWI-EAS83_30UCEAAXX12431647 GAATAAATAATCAAAACATATAATACATTTTTTTAT U1 0 1 0 chr5.fa 41496935 F DD 32G
>HWI-EAS83_30UCEAAXX121412731 ATATACACATATATATACATATATATATACACATAT R0 47 255 255
>HWI-EAS83_30UCEAAXX1213891196 GAGAAGGAAATGTGTTTTCTAAGTTTCTTTATCTTC U1 0 1 0 chr4.fa 188020201 F DD 32G
>HWI-EAS83_30UCEAAXX1212641479 GTGTAGGAAAGAAAAAAGGAGGTTGTGTAGAAAAGA U0 1 0 0 chr2.fa 192227804 F DD
>HWI-EAS83_30UCEAAXX1238890 TTTATTTAAATCTTTTAAAAANTTTTTTCCAACAAA NM 0 0 0
>HWI-EAS83_30UCEAAXX1213411065 GATACATATACACAAAGTAAAACTATTCAGCCTCTA U0 1 0 0 chr17.fa 51416321 F DD
>HWI-EAS83_30UCEAAXX121132929 GAGTTGTATTAATCTTAAATTGATAATTTACCATAT U1 0 1 0 chr10.fa 2376138 F DD 24G
>HWI-EAS83_30UCEAAXX121758275 GCATTTTAACAAAATCACCATATCTGGGTAACCATT U1 0 1 0 chr21.fa 27648337 R DD 18C
>HWI-EAS83_30UCEAAXX129141000 GAAAGCACTTTATAATAAAACAACATTGGAGCACCT U1 0 1 0 chr8.fa 67496303 F DD 16G
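
Each Eland record above appears to hold a read ID, the read sequence, a match type (U0, U1, U2, R0, R1, R2, NM), three hit counts, and, for uniquely placed reads, a chromosome file, a position, a strand, a two-letter flag and any mismatch positions. The sketch below parses a few records transcribed from the slide under that reading of the columns. The field names, the interpretation of the three counts as hits with 0, 1 and 2 mismatches, and the function `parse_eland_line` are assumptions made for illustration.

```python
from collections import Counter


def parse_eland_line(line):
    """Split one whitespace-separated Eland record into named fields."""
    f = line.split()
    rec = {
        "id": f[0].lstrip(">"),
        "seq": f[1],
        "type": f[2],
        # Assumed meaning: number of hits with 0, 1 and 2 mismatches.
        "hits": tuple(int(n) for n in f[3:6]),
    }
    if len(f) > 6:  # only uniquely placed reads carry a location
        rec["chrom"] = f[6][:-3] if f[6].endswith(".fa") else f[6]
        rec["pos"] = int(f[7])
        rec["strand"] = f[8]
        rec["mismatches"] = f[10:]  # f[9] is the two-letter flag (DD)
    return rec


sample = """\
>HWI-EAS83_30UCEAAXX129151011 AGGTCACAAAACAAGTCCTAACAAATTTAAGAGTAT U0 1 13 62 chr8.fa 59699745 R DD
>HWI-EAS83_30UCEAAXX128261245 GTCAGAAAAATCCTTTTTATTATATAAACAATACAT U2 0 0 1 chr5.fa 121195098 F DD 15G 20G
>HWI-EAS83_30UCEAAXX129961003 ATTTTGACTTTATTATTTTTTCTTCAATGTTTTTAA NM 0 0 0
>HWI-EAS83_30UCEAAXX121517330 GTGAGTTTCTTAATCCTGAGTTCTAATTTTATTTCA R0 29 255 255
"""

records = [parse_eland_line(line) for line in sample.splitlines()]
type_counts = Counter(r["type"] for r in records)
print(type_counts)
```

Counting the `type` field over a whole file in this way would give a per-type tally of reads.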
15
Number of reads per Eland type
  • U0 21019702 (65%)
  • U1 3280059 (10%)
  • U2 1007173 (3%)
  • R0 3661054 (11%)
  • R1 815275 (2%)
  • R2 406002 (1%)
  • NM 2050499 (6%)
  • QC 306352 (1%)

19
Peak detection
  • Calculate read count at each position (bp) in
    genome
  • Determine if read count is greater than expected
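
One way to picture the first step is to extend every mapped read to the average fragment length (250 bp on the earlier slides) in the direction of its strand and count how many extended fragments cover each base. The sketch below does this with a difference array. It is an illustration under stated assumptions, not ChIPseeqer's code: the function `coverage`, its arguments, the 0-based coordinates and the 36-nt default read length are all hypothetical choices.

```python
from itertools import accumulate


def coverage(reads, chrom_length, read_len=36, fraglen=250):
    """Per-base counts of reads extended to the fragment length.

    reads: iterable of (start, strand), where start is the 0-based leftmost
    base of the read and strand is '+' or '-'.
    """
    diff = [0] * (chrom_length + 1)
    for start, strand in reads:
        if strand == "+":
            lo, hi = start, start + fraglen   # extend to the right
        else:
            hi = start + read_len             # right end of the read
            lo = hi - fraglen                 # extend to the left
        lo, hi = max(lo, 0), min(hi, chrom_length)
        if lo < hi:
            diff[lo] += 1
            diff[hi] -= 1
    # Prefix sums turn the +1/-1 marks into per-base counts.
    return list(accumulate(diff[:-1]))


cov = coverage([(100, "+"), (300, "-")], chrom_length=1000)
print(cov[90], cov[100], cov[336], cov[350])
```

The difference array keeps the cost at one update per read plus one pass over the chromosome, rather than 250 increments per read.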

20
Peak detection
  • We need to correct for input DNA reads (control)
  • - non-uniformly distributed (form peaks too)
  • - vastly different numbers of reads between ChIP and input

21
Peak detection using ChIPseeqer
22
genome
Expected read count = total number of reads × extended fragment length / chr length
23
Is the observed read count at a given genomic
position greater than expected ?
Frequency
x = observed read count, λ = expected read count
Read count
The Poisson distribution
24
Is the observed read count at a given genomic
position greater than expected ?
x = 10 reads (observed), λ = 0.5 reads (expected)
genome
P(X ≥ 10) = 1.7 × 10^-10
log10 P(X ≥ 10) = -9.77
-log10 P(X ≥ 10) = 9.77
The Poisson distribution
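
The worked example on this slide can be checked numerically. The sketch below computes the upper tail of a Poisson distribution, P(X ≥ k), by summing terms directly, using an expected count of 0.5 and an observed count of 10 as on the slide. The helper names and the illustrative totals (2000 reads, 250 bp fragments, a 1 Mb chromosome, chosen only so that the expected count comes out at 0.5) are assumptions, not values from the presentation.

```python
import math


def expected_read_count(total_reads, fraglen, chr_len):
    """Expected reads covering one position if reads were spread uniformly."""
    return total_reads * fraglen / chr_len


def poisson_tail(k, lam, n_terms=200):
    """P(X >= k) for X ~ Poisson(lam), summed term by term from k upward."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam) * lam ** k / math.factorial(k)
    total = 0.0
    for i in range(k, k + n_terms):
        total += term
        term *= lam / (i + 1)  # next Poisson term from the previous one
    return total


lam = expected_read_count(total_reads=2000, fraglen=250, chr_len=1_000_000)
p = poisson_tail(10, lam)
print(lam, p, -math.log10(p))  # p is about 1.7e-10, -log10(p) about 9.77
```

Summing the tail directly, instead of computing 1 minus the cumulative probability, avoids losing all precision when the tail is this small. The fixed number of terms is adequate for small expected counts such as this one.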
25
Read count
Expected read count
-Log(p)
Expected read count = total number of reads × extended frag len / chr len
26
Read count
Expected read count
-Log(p)
Expected read count = total number of reads × extended frag len / chr len
Input reads
27
ChIP panel: read count and expected read count along genome positions (bp), -Log(Pc)
INPUT panel: read count and expected read count along genome positions (bp), -Log(Pi)
Threshold
Log(Pc) - Log(Pi)
28
Normalized Peak score (at each bp)
R = -log10 [ P(X_ChIP) / P(X_input) ]
Will detect peaks with high read counts in ChIP, low in Input.
Works when no input DNA !
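
The normalized score can be sketched as the difference of two log10 Poisson tail probabilities, one for the ChIP sample and one for the input, each computed against its own expected count so that different sequencing depths are accounted for. The code below is an assumption-laden illustration, not ChIPseeqer's implementation: the function names are hypothetical, the tail is floored at 1e-300 to avoid taking the log of zero, and treating a missing input as P(X_input) = 1 is my reading of "works when no input DNA".

```python
import math


def poisson_tail(k, lam, n_terms=200):
    """P(X >= k) for X ~ Poisson(lam); adequate for small expected counts."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam) * lam ** k / math.factorial(k)
    total = 0.0
    for i in range(k, k + n_terms):
        total += term
        term *= lam / (i + 1)
    return total


def log10_p(count, total_reads, fraglen, chr_len):
    """log10 of the tail probability for one sample at one position."""
    lam = total_reads * fraglen / chr_len
    return math.log10(max(poisson_tail(count, lam), 1e-300))


def peak_score(chip_count, chip_total, input_count, input_total,
               fraglen=250, chr_len=1_000_000):
    """R = -log10(P_chip / P_input); with no input reads, P_input is taken as 1."""
    log_pc = log10_p(chip_count, chip_total, fraglen, chr_len)
    log_pi = log10_p(input_count, input_total, fraglen, chr_len) if input_total else 0.0
    return -(log_pc - log_pi)


no_input = peak_score(10, 2000, 0, 0)
with_input = peak_score(10, 2000, 3, 4000)
print(no_input, with_input)
```

In this toy case the position has 10 ChIP reads against 0.5 expected, and 3 input reads against 1.0 expected, so the input enrichment lowers the score.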
29
Non-mappable fraction of the genome
We enumerated all 30-mers, counted occurrences,
calculated non-unique fraction of genome
  • chr18 9369067/76117153 0.123087459668913 (12%)
  • chr2 33849240/242951149 0.139325292921335
  • chr3 27854877/199501827 0.139622164963933
  • chr4 27090014/191273063 0.141630052737745
  • chr6 24330283/170899992 0.142365618132972
  • chr8 20932821/146274826 0.143106107677065
  • chr5 26029902/180857866 0.143924633059643
  • chr12 19382853/132349534 0.14645199279659
  • chr11 20039443/134452384 0.149044906485258
  • chr20 10017788/62435964 0.160449000194824
  • chr7 26182588/158821424 0.164855517225434
  • chr10 22968951/135374737 0.169669404417753
  • chr17 14496284/78774742 0.184021980040252
  • chrX 31269270/154913754 0.201849540099583
  • chr1 55186693/247249719 0.223202247602959
  • chr13 28668063/114142980 0.251159230291692
  • chr16 23552340/88827254 0.265147676410215
  • chr14 29689825/106368585 0.279122120502026
  • chrM 4628/16571 0.279283084907368
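
The non-unique fraction can be illustrated on a toy sequence by counting every k-mer and summing the positions whose k-mer occurs more than once. This is a minimal sketch only: the function `non_unique_fraction` is hypothetical, it ignores the reverse strand, it divides by the number of k-mer start positions rather than by chromosome length, and holding all 30-mers of a real chromosome in a Python `Counter` would need far more memory than this toy example suggests.

```python
from collections import Counter


def non_unique_fraction(seq, k=30):
    """Fraction of k-mer start positions whose k-mer occurs more than once."""
    n_positions = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n_positions))
    non_unique = sum(c for c in counts.values() if c > 1)
    return non_unique / n_positions


# Toy example with k=4: ACGT occurs twice among 7 start positions.
fraction = non_unique_fraction("ACGTACGTTT", k=4)
print(fraction)
```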

30
Peak detection
  • Determine all genomic regions with R > 15
  • Merge peaks separated by less than 100 bp
  • Output all peaks with length > 100 bp
  • Process 23M reads in < 7 min
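
The thresholding, merging and length filter listed on this slide can be sketched on a per-base score array as follows. The function `call_peaks`, the half-open (start, end) coordinates and the strict inequalities are assumptions for illustration; the slide does not say whether ChIPseeqer uses strict or inclusive comparisons.

```python
def call_peaks(scores, threshold=15, max_gap=100, min_length=100):
    """Return merged (start, end) regions, half-open, passing the filters."""
    # 1. Contiguous runs of positions scoring above the threshold.
    regions, start = [], None
    for i, s in enumerate(scores):
        if s > threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(scores)))

    # 2. Merge runs separated by fewer than max_gap bases.
    merged = []
    for st, en in regions:
        if merged and st - merged[-1][1] < max_gap:
            merged[-1] = (merged[-1][0], en)
        else:
            merged.append((st, en))

    # 3. Keep only peaks longer than min_length.
    return [(st, en) for st, en in merged if en - st > min_length]


scores = [0] * 1000
for lo, hi in [(100, 160), (200, 300), (600, 650)]:
    for i in range(lo, hi):
        scores[i] = 20

print(call_peaks(scores))  # [(100, 300)]
```

Here the first two runs are 40 bp apart and merge into one 200 bp peak, while the isolated 50 bp run is dropped by the length filter.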

31
BCL6: 18,814 peaks
ChIP reads
Input reads
Detected Peaks
80% are within < 20 kb of a known gene
32
  • Where does each transcription factor bind in the
    genome, in each cell type, at a given time ? Near
    which genes ?
  • What is the cis-regulatory code of each factor ?
    Does it require any co-factors ?

DNA
Activation Repression
33
Regulatory Sequence Discovery using FIRE
34
Discovering regulatory sequences associated with
peak regions
True TF binding peak?
Yes
Yes
Target regions
Yes
Yes
Yes
Yes

35
Motif Search Algorithm
36
Optimizing k-mers into more informative
degenerate motifs
True TF binding peak?
ATCCGTACA
Yes
Yes
Target regions
Yes
Yes
Yes
Yes

ATCC[C/G]TACA
which character increases the mutual information
by the largest amount ?
37
Optimizing k-mers into more informative
degenerate motifs
True TF binding peak?
Yes
Yes
Target regions
Yes
Yes
Yes
Yes

ATCC[C/G]TACA
. . .
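
The idea of making a k-mer degenerate when that raises its mutual information with the peak/non-peak label can be sketched on a toy data set. The code below computes the mutual information (in bits) between motif presence and the label, first for the exact k-mer ATCCGTACA and then for the degenerate form with C or G at position 5. It is not FIRE's implementation: the functions, the use of a regular expression for the degenerate motif, and the four made-up sequences are assumptions, and the reverse strand is ignored.

```python
import math
import re
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete lists."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def motif_mi(pattern, sequences, labels):
    """MI between presence of a motif (regex) and the peak label."""
    present = [bool(re.search(pattern, s)) for s in sequences]
    return mutual_information(present, labels)


# Toy data: two peak sequences (label 1) and two random sequences (label 0).
sequences = ["TTATCCGTACATT", "GGATCCCTACAGG", "TTTTTTTTTTTTT", "GGGGGGGGGGGGG"]
labels = [1, 1, 0, 0]

mi_exact = motif_mi("ATCCGTACA", sequences, labels)
mi_degenerate = motif_mi("ATCC[CG]TACA", sequences, labels)
print(mi_exact, mi_degenerate)
```

On this toy set the exact k-mer matches only one of the two peak sequences, while the degenerate motif matches both and none of the random ones, so its mutual information is higher.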
38
Mutual information
change
Similarity to ChIP-chip RAP1 motif
Motif Conservation with S. bayanus
39
Motifs optimized so far
k-mer MI
CTCATCG 0.0618
TCATCGC 0.0485
AAAATTT 0.0438
GCTCATC 0.0434
AAAAATT 0.0383
ATGAGCT 0.0334
TTGCCAC 0.0322
TGCCACC 0.0298
ATCTCAT 0.0265
...
MI = 0.081
Highly informative k-mers
MI = 0.045
optimize ?
Only optimize k-mer if I(k-mer; expression | motif) is large enough (for all motifs optimized so far)
Conditional mutual information I(X;Y|Z)
40
Motif co-occurrence analysis
Discovered Motifs
Enrichment
Depletion
FIRE automatically compares discovered motifs to
known motifs in TRANSFAC and JASPAR
41
ChIPseeqer: an integrated framework for ChIP-seq data analysis
  • ChIPseeqer (peak detection)
  • ChIPseeqer2Track (for Genome Browser)
  • ChIPseeqer2FIRE (motif analysis)
  • ChIPseeqer2iPAGE (pathway analysis)
  • ChIPseeqer2cons (conservation analysis)

42
Installing and setting up programs
  • Install ChIPseeqer and FIRE
  • http://physiology.med.cornell.edu/faculty/elemento/lab/chipseq.shtml
  • http://tavazoielab.princeton.edu/FIRE/
  • Execute the following commands
  • export FIREDIR=/Applications/FIRE-1.1
  • export PATH=$PATH:$FIREDIR
  • export CHIPSEEQERDIR=/Applications/ChIPseeqer-1.0
  • export PATH=$PATH:$CHIPSEEQERDIR:$CHIPSEEQERDIR/SCRIPTS
  • chmod +x $CHIPSEEQERDIR/ChIP*
  • chmod +x $CHIPSEEQERDIR/SCRIPTS/*.pl

43
Peak Detection
  • - Input file: CTCF.bed
  • cd /Desktop/elemento
  • Or download from
  • http://physiology.med.cornell.edu/faculty/elemento/lab/files/chipseq/
  • - 2947043 U0 reads in BED format
  • (check by typing wc -l CTCF.bed)
  • (view by typing more CTCF.bed, and q to exit)
  • - No input DNA for this experiment

44
Peak Detection
  • Step 1. Split big read file into one file per chromosome
  • split_bed_or_mit_files.pl CTCF.bed  
  • Expected output
  • Opening CTCF.bed
  • Current directory .
  • Creating ./reads.chr1

45
Peak Detection
  • Step 2. Detect peaks  
  • ChIPseeqer --chipdir=. --t=15 --fraglen=250 --format=bed --outfile=CTCF_peaks_t15.txt
  • Expected output
  • Processing reads in chrY ... done.
  • Processing reads in chrX ... done.
  • Processing reads in chr9 ... done.
  • Processing reads in chr8 ... done.
  • Step 3. Count how many peaks were found
  • wc -l CTCF_peaks_t15.txt

46
Making a Genome Browser track
  • Command lines
  • cd JuliaChild
  • wc -l CTCF_peaks_t15.txt
  • ChIPseeqer2track --targets=CTCF_peaks_t15.txt --trackname="CTCF peaks"
  • Expected output
  • CTCF_peaks_t15.txt.wgl.gz created.
  • To check that the file was created
  • ls

47
Making a Genome Browser track
http://genome.ucsc.edu/cgi-bin/hgGateway
48
Making FIRE input files
  • Command line (type the instructions below as one single line)
  • ChIPseeqer2FIRE --targets=CTCF_peaks_t15.txt --genome=wg.fa --suffix=CTCF_peaks_t15_FIRE
  • wg.fa is also available from
  • http://physiology.med.cornell.edu/faculty/elemento/lab/files/chipseq/
  • (decompress with gunzip wg.fa.gz)
  • Expected output
  • Extracting sequences ... Done.
  • Extracting randomly selected sequences ... Done.
  • CTCF_peaks_t15_FIRE.txt and CTCF_peaks_t15_FIRE.seq have been generated.

49
FIRE analysis
  • Command line (type the instructions below as one single line)
  • fire.pl --expfile=CTCF_peaks_t15_FIRE.txt --fastafile_dna=CTCF_peaks_t15_FIRE.seq --nodups=1 --minr=2 --species=human --dorna=0 --dodnarna=0
  • Expected output
  • Extracting sequences ... Done.
  • Extracting randomly selected sequences ... Done.
  • CTCF_peaks_t15_FIRE.txt and CTCF_peaks_t15_FIRE.seq have been generated.

50
FIRE main output file
open CTCF_peaks_t15_FIRE.txt_FIRE/DNA/CTCF_peaks_t15_FIRE.txt.summary.pdf
Randomly selected sequences
Peak sequences