Title: Program of Study Dan Smith
1. Program of Study: Dan Smith
2. Undergraduate Classes
3. Graduate Classes
- Core Curriculum (finished)
  - MCB 511 Research Perspectives
  - MCB 525 Techniques in Molecular and Cellular Biology
  - MCB 553 Structure and Function of Eukaryotic Cells
  - MCB 554 Genome Organization, Structure, and Maintenance
  - MCB 555 Genome Expression and Regulation
  - MCB 556 Cell Signaling and Development
  - MCB 557 Scientific Skills and Ethics
  - MCB 610 Internship (rotations)
- Electives Taken So Far
  - BB 650 Protein Evolution
  - MB 534 Virology
4. Thesis
- Chapter 1: Prediction of Transcription Factor Binding Sites
- Chapters 2 and 3: Undecided
  - A few ideas
5. Chapter 1: Regulons
- Goal
  - Identify novel transcription factor binding sites in silico, then verify the predictions with wet-lab experiments.
- Why
  - A common binding site may imply a common expression pattern, allowing:
    - metabolic modeling
    - insight into protein coordination.
6. Chapter 1: Regulons
- Site Criteria
  - Within 300 nt upstream of a gene.
  - Common to at least three genes.
  - Similarly conserved in environmental fragments.
- Example

Gene        Position  Site                 Annotation
C134_0635   -16       AGAAC(AAAACA)AGAAC   SOS transcriptional repressor; 0634 umuC
C134_0921   -40       AGAAC(ACTTGT)AGAAC   repressor LexA
C134_1160   -68       AGAAC(ATTTAT)AGAAC   phosphomannomutase
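All three example sites share one pattern: AGAAC, six variable nucleotides, AGAAC. The sketch below shows how the first two site criteria could be checked for that pattern. It is a minimal illustration under stated assumptions, not the method used in this work: each upstream region is assumed to be a plain string ending immediately before the gene's first base, the reported position is assumed to be the (negative) distance from the gene start to the site start, and the helper names (`find_sites`, `shared_sites`) and toy sequences are hypothetical. The third criterion (conservation in environmental fragments) is not modeled.

```python
import re

# Hypothetical motif: the direct repeat shared by the example sites,
# AGAAC + 6 variable nucleotides + AGAAC. The lookahead also reports overlapping hits.
MOTIF = re.compile(r"(?=(AGAAC[ACGT]{6}AGAAC))")
MAX_UPSTREAM = 300  # criterion: within 300 nt upstream of the gene
MIN_GENES = 3       # criterion: common to at least three genes


def find_sites(upstream: str) -> list[tuple[int, str]]:
    """Return (offset, site) pairs for one upstream region.

    Assumes `upstream` ends right before the gene's first base, so the
    offset is the negative distance from the gene start to the site start.
    """
    region = upstream.upper()[-MAX_UPSTREAM:]  # keep only the last 300 nt
    return [(m.start() - len(region), m.group(1)) for m in MOTIF.finditer(region)]


def shared_sites(upstreams: dict[str, str]) -> dict[str, list[tuple[int, str]]]:
    """Keep the motif only if it occurs upstream of at least MIN_GENES genes."""
    hits = {gene: find_sites(seq) for gene, seq in upstreams.items()}
    hits = {gene: sites for gene, sites in hits.items() if sites}
    return hits if len(hits) >= MIN_GENES else {}


# Toy upstream regions: invented flanking sequence around the example sites.
toy = {
    "gene_a": "T" * 50 + "AGAACAAAACAAGAAC",
    "gene_b": "T" * 50 + "AGAACACTTGTAGAAC" + "T" * 24,
    "gene_c": "T" * 50 + "AGAACATTTATAGAAC" + "T" * 52,
    "gene_d": "T" * 100,
}

for gene, sites in shared_sites(toy).items():
    print(gene, sites)
```

With these toy inputs the reported offsets are -16, -40, and -68, and `gene_d` is dropped because it has no site. Using a regular expression keeps the spacer flexible, which matches the example sites having different six-nucleotide spacers.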
7. Chapter 1: Regulons
- Wet-Lab Experimental Validation
A likely transcription factor, with an added His-tag, is bound to a column.
8. Chapter 1: Regulons
- Wet-Lab Experimental Validation
DNA containing the predicted binding sites is added, and the column is washed.
9. Chapter 1: Regulons
- Wet-Lab Experimental Validation
The bound DNA is analyzed for length, which indicates where the binding site lies on it.
10. Chapters 2 and 3
- Some work already done on assembling environmental fragments.
  - Longest contig so far: 19,214 nt
  - Combined 357 fragments, 18x coverage
- Similar to the "extreme assembly" method of Rusch, Venter, et al. (PLoS Biology, March 2007)
  - tBLASTx, translated comparison (me) vs. Celera, nucleotide comparison (them)
  - Mate pairs (me) vs. no mate pairs (them)
  - They obtained contigs up to 900,000 nt long.
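As a quick consistency check on the contig numbers, assuming the usual definition of coverage (number of fragments times mean fragment length, divided by contig length), the figures above imply a mean fragment length of roughly 970 nt. The variable names are illustrative only.

```python
# Coverage = (fragments x mean fragment length) / contig length  (assumed definition),
# rearranged to recover the implied mean fragment length.
contig_length_nt = 19_214
fragments = 357
coverage = 18  # "18x"; probably rounded, so the result is approximate

implied_mean_fragment_nt = coverage * contig_length_nt / fragments
print(f"implied mean fragment length: {implied_mean_fragment_nt:.0f} nt")
```

Because 18x is likely rounded, treat the result as an estimate rather than a measured read length.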
11. Chapters 2 and 3
13. Supplementary Slides
14. How Predictions are Made
- Align environmental fragments of the upstream region for each SAR11 gene.

[Figure: (1) gene of interest in the SAR11 genome; (2) tBLASTn (1e-50) and BLASTn (1e-5) searches against an environmental fragment; (3) fragment added to the alignment]
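The slide gives two E-value cutoffs but not exactly how the searches are combined. One plausible reading, sketched here as an assumption rather than the actual pipeline: a fragment is accepted only if the gene's protein sequence hits it by tBLASTn at E ≤ 1e-50 and the gene's upstream nucleotide region also hits it by BLASTn at E ≤ 1e-5. The `Hit` record, `fragments_to_align`, and the toy hits are hypothetical stand-ins for parsed BLAST output.

```python
from typing import NamedTuple

TBLASTN_MAX_E = 1e-50  # gene (protein) vs. environmental fragments
BLASTN_MAX_E = 1e-5    # upstream region (nucleotide) vs. the same fragments


class Hit(NamedTuple):
    query: str     # SAR11 gene ID (hypothetical)
    subject: str   # environmental fragment ID (hypothetical)
    evalue: float


def fragments_to_align(gene: str, tblastn_hits: list[Hit], blastn_hits: list[Hit]) -> list[str]:
    """Fragments that pass both searches for `gene` (assumed combination rule)."""
    protein_ok = {h.subject for h in tblastn_hits
                  if h.query == gene and h.evalue <= TBLASTN_MAX_E}
    upstream_ok = {h.subject for h in blastn_hits
                   if h.query == gene and h.evalue <= BLASTN_MAX_E}
    return sorted(protein_ok & upstream_ok)


tblastn = [Hit("geneX", "frag1", 1e-80), Hit("geneX", "frag2", 1e-60), Hit("geneX", "frag3", 1e-20)]
blastn = [Hit("geneX", "frag1", 1e-12), Hit("geneX", "frag2", 0.01), Hit("geneX", "frag3", 1e-30)]
print(fragments_to_align("geneX", tblastn, blastn))
```

Here only `frag1` passes: `frag2` fails the upstream BLASTn cutoff and `frag3` fails the stricter protein-level cutoff.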
15. How Predictions are Made
- Make a position-specific scoring matrix (PSSM)

Aligned upstream sequences:
Seq1  T G A A C A G A A C C
Seq2  A G A A C T G A A C C
Seq3  A G A A C G G A A C T
Seq4  T G A A C A G A A C C
Seq5  C G A A C A G A A C T

PSSM (fraction of the five sequences with each nucleotide at each position):
   1  2  3  4  5  6  7  8  9 10 11
A .4  0  1  1  0 .6  0  1  1  0  0
T .4  0  0  0  0 .2  0  0  0  0 .4
C .2  0  0  0  1  0  0  0  0  1 .6
G  0  1  0  0  0 .2  1  0  0  0  0
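The matrix can be computed directly from the five aligned sequences. This minimal sketch assumes, as the numbers on the slide show, that each entry is a plain per-position frequency (every column sums to 1); the helper name `frequency_matrix` is hypothetical.

```python
from collections import Counter

# The five aligned upstream sequences from the slide.
alignment = [
    "TGAACAGAACC",
    "AGAACTGAACC",
    "AGAACGGAACT",
    "TGAACAGAACC",
    "CGAACAGAACT",
]

BASES = "ATCG"  # row order used on the slide


def frequency_matrix(seqs: list[str]) -> dict[str, list[float]]:
    """Fraction of sequences carrying each base at each alignment position."""
    n = len(seqs)
    counts = [Counter(column) for column in zip(*seqs)]  # one Counter per position
    return {b: [c[b] / n for c in counts] for b in BASES}


pssm = frequency_matrix(alignment)
for base in BASES:
    print(base, " ".join(f"{p:g}" for p in pssm[base]))
```

Many motif tools go one step further and convert such a frequency matrix into log-odds scores against background nucleotide frequencies, usually after adding pseudocounts; the matrix on the slide is the frequency form.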
16. How Predictions are Made
- Align PSSMs from different genes (all vs. all)

PSSM from another gene:
   1  2  3  4  5  6  7  8  9 10 11
A .2  0  1  1  0 .2  0  1  1  0  0
T  0  0  0  0  0 .2  0  0  0  0 .4
C .2  0  0  0  1  0  0  0  0  1 .4
G .6  1  0  0  0 .6  1  0  0  0 .2

PSSM from the previous slide:
   1  2  3  4  5  6  7  8  9 10 11
A .4  0  1  1  0 .6  0  1  1  0  0
T .4  0  0  0  0 .2  0  0  0  0 .4
C .2  0  0  0  1  0  0  0  0  1 .6
G  0  1  0  0  0 .2  1  0  0  0  0

Difference per position:
  .6  0  0  0  0 .4  0  0  0  0 .2

No difference between the two PSSMs at 8 of the 11 nucleotide positions (2-5 and 7-10).
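The per-position difference row (.6 0 0 0 0 .4 0 0 0 0 .2) can be reproduced from the two matrices. This sketch assumes the measure is half the summed absolute difference in each column (total variation distance); taking the largest absolute difference per column gives the same values for these two matrices, so the slide does not settle which measure was used. It also assumes the matrices are compared at a fixed alignment with no offsets or gaps. The function names and gene labels are hypothetical.

```python
from itertools import combinations

BASES = "ATCG"

# The two matrices compared on the slide (rows A, T, C, G; 11 positions).
pssm_1 = {
    "A": [.2, 0, 1, 1, 0, .2, 0, 1, 1, 0, 0],
    "T": [0, 0, 0, 0, 0, .2, 0, 0, 0, 0, .4],
    "C": [.2, 0, 0, 0, 1, 0, 0, 0, 0, 1, .4],
    "G": [.6, 1, 0, 0, 0, .6, 1, 0, 0, 0, .2],
}
pssm_2 = {
    "A": [.4, 0, 1, 1, 0, .6, 0, 1, 1, 0, 0],
    "T": [.4, 0, 0, 0, 0, .2, 0, 0, 0, 0, .4],
    "C": [.2, 0, 0, 0, 1, 0, 0, 0, 0, 1, .6],
    "G": [0, 1, 0, 0, 0, .2, 1, 0, 0, 0, 0],
}


def column_differences(p: dict[str, list[float]], q: dict[str, list[float]]) -> list[float]:
    """Half the summed absolute difference per position, rounded to hide float noise."""
    length = len(p["A"])
    return [round(sum(abs(p[b][i] - q[b][i]) for b in BASES) / 2, 6) for i in range(length)]


def all_vs_all(pssms: dict[str, dict[str, list[float]]]) -> dict[tuple[str, str], int]:
    """Count identical positions for every pair of genes."""
    result = {}
    for a, b in combinations(sorted(pssms), 2):
        diffs = column_differences(pssms[a], pssms[b])
        result[(a, b)] = sum(1 for d in diffs if d == 0)
    return result


diffs = column_differences(pssm_1, pssm_2)
identical = [i + 1 for i, d in enumerate(diffs) if d == 0]  # 1-based positions
print(diffs)
print(all_vs_all({"gene_1": pssm_1, "gene_2": pssm_2}))
```

With more genes, `all_vs_all` simply compares every pair; pairs with many identical positions are candidates for sharing a binding site.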