Spaghetti Code, Soupy Logic adventures in gene expression - PowerPoint PPT Presentation

1
Spaghetti Code, Soupy Logic: adventures in gene
expression & genome annotation
Jim Kent, University of California Santa Cruz
2
A Challenge Every Speaker Faces
  • Who is the audience?
  • Bioinformaticians
  • Biologists with bigger, better databases?
  • Geeks trading bits for bases?
  • Leading edge interdisciplinary super scientists?

3
Top 5 Reasons Biologists Go Into Bioinformatics
  • 5 - Microscopes and biochemistry are so 20th
    century.
  • 4 - Got started purifying proteins, but it turns
    out the cold room is really COLD.
  • 3 - After 23 years of school wanted to make MORE
    than $23,000/year in a postdoc.
  • 2 - Like to swear, @ttracted to $_ Perl !!
  • 1 - Getting carpal tunnel from pipetting.

4
Top 5 Reasons Computer People go into
Bioinformatics
  • 5 - Bio courses have some females.
  • 4 - Human genome stabler than Windows XP
  • 3 - Having mastered binary trees, quad trees, and
    parse trees ready for phylogenic trees.
  • 2 - Missing heady froth of the internet bubble.
  • 1 - Must augment humanity to defeat evil
    artificially intelligent robots.

5
The Paradox of Genomics
How does a long, static, one-dimensional string
of DNA turn into the remarkably complex, dynamic,
and three-dimensional human body?
GTTTGCCATCTTTTGCTGCTCTAGGGAATCCAGCAGCTGTCACCATG
TAAACAAGCCCAGGCTAGACCAGTTACCCTCATCATCTTAGCTGATA
GCCAGCCAGCCACCACAGGCATGAGT
6
Models and Metaphors
  • When trying to understand something we like to
    build up metaphors and models.
  • Computer programs are complex systems that
    are ultimately built up of 0s and 1s; perhaps
    they are a model for a genome built of A, C, G,
    and T?
  • Human genome lacks documentation, has accumulated
    3 billion years of cruft, and does not believe in
    local variables.
  • Therefore we must look to less-than-straightforward
    software programs as guides.

7
Bioperl CORBA module
sub new {
    my ($class, @args) = @_;
    my $self = $class->SUPER::new(@args);
    my ($idl, $ior, $orbname) =
        $self->_rearrange([qw(IDL IOR ORBNAME)], @args);
    $self->{'_ior'}     = $ior || 'biocorba.ior';
    $self->{'_idl'}     = $idl || $ENV{BIOCORBAIDL} || 'biocorba.idl';
    $self->{'_orbname'} = $orbname || 'orbit-local-orb';
    $CORBA::ORBit::IDL_PATH = $self->{'_idl'};
    my $orb = CORBA::ORB_init($orbname);
    my $root_poa = $orb->resolve_initial_references("RootPOA");
    $self->{'_orb'}     = $orb;
    $self->{'_rootpoa'} = $root_poa;
    return $self;
}
8
Obfuscated C
(Slide: a listing of deliberately obfuscated C, dense with #define macros, terminal escape sequences, setitimer() and signal() calls, and a main() that switches on getchar() input. The code did not survive text extraction.)
9
Microsoft Windows
Diagram labels: mouse, blue screen of death, Windows XP, keyboard, network, elaborate proprietary process
10
Looks like metaphor is not enough; must study actual
cells & DNA
11
How DNA is Used by the Cell
12
Promoter Tells Where to Begin
Different promoters activate different genes
in different parts of the body.
13
A Computer in Soup
Idealized promoter for a gene involved in making
hair. Proteins that bind to specific DNA
sequences in the promoter region together turn a
gene on or off. These proteins are themselves
regulated by their own promoters leading to a
gene regulatory network with many of the same
properties as a neural network.
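The neural-network analogy can be made concrete with a toy threshold model. The sketch below is hypothetical and not taken from the talk: the three genes, their weights, and their thresholds are invented. Each gene is on (1) or off (0); each regulator contributes a signed weight (positive for an activator, negative for a repressor); and a gene is on at the next step when the weighted sum of its active regulators reaches its threshold, much like a unit in a simple neural network.

```python
from typing import Dict, List

# Hypothetical three-gene network: target -> {regulator: weight}.
# Positive weights stand for activators, negative weights for repressors.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "A": {"C": -1.0},           # C represses A; A is on unless C is on
    "B": {"A": 1.0},            # A activates B
    "C": {"A": 1.0, "B": 1.0},  # C needs both A and B to switch on
}
# Hypothetical thresholds: a gene is on when its weighted input reaches this.
THRESHOLDS: Dict[str, float] = {"A": 0.0, "B": 1.0, "C": 2.0}


def step(state: Dict[str, int]) -> Dict[str, int]:
    """Update every gene at once from the current on (1) / off (0) state."""
    new_state = {}
    for gene, inputs in WEIGHTS.items():
        # Weighted sum of active regulators, as in a perceptron unit.
        total = sum(w * state[reg] for reg, w in inputs.items())
        new_state[gene] = 1 if total >= THRESHOLDS[gene] else 0
    return new_state


def run(state: Dict[str, int], steps: int) -> List[Dict[str, int]]:
    """Return the sequence of states, starting with the initial one."""
    trajectory = [state]
    for _ in range(steps):
        state = step(state)
        trajectory.append(state)
    return trajectory
```

Running `run({"A": 1, "B": 0, "C": 0}, 5)` gives a trajectory that returns to its starting state after five steps: A turns on B, A and B together turn on C, and C then shuts off A, its own upstream activator. In this toy, negative feedback makes the network cycle instead of settling.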
14
Genes can be transcription factors that
activate or repress other genes, leading to
regulatory networks such as this one from the
development of the central nervous system. (Image
from D'Haeseleer & Somogyi 1999)
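A network like this maps naturally onto a directed graph whose edges carry a sign. As a minimal sketch (the genes and edges below are hypothetical, not the CNS network in the figure), a breadth-first search finds every gene a transcription factor can reach, how many regulatory steps away it is, and the net sign along the path found: a repressor of a repressor acts, on balance, as an activator.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical signed edges: regulator -> [(target, sign)],
# where +1 means "activates" and -1 means "represses".
NETWORK: Dict[str, List[Tuple[str, int]]] = {
    "tf1": [("tf2", +1), ("geneX", -1)],
    "tf2": [("tf3", -1)],
    "tf3": [("geneY", -1)],
    "geneX": [],
    "geneY": [],
}


def downstream(network: Dict[str, List[Tuple[str, int]]],
               source: str) -> Dict[str, Tuple[int, int]]:
    """Breadth-first search from source.

    Returns {gene: (steps, net_sign)} for every gene reachable from source.
    net_sign is the product of edge signs along the first (shortest) path
    found; other paths to the same gene may carry a different sign.
    """
    result: Dict[str, Tuple[int, int]] = {}
    seen = {source}
    queue = deque([(source, 0, +1)])
    while queue:
        gene, steps, sign = queue.popleft()
        for target, edge_sign in network.get(gene, []):
            if target not in seen:
                seen.add(target)
                result[target] = (steps + 1, sign * edge_sign)
                queue.append((target, steps + 1, sign * edge_sign))
    return result
```

For `tf1`, the search reports `geneY` three steps away with net sign +1, since tf1 activates tf2, tf2 represses tf3, and tf3 represses geneY.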
15
The Decisions of a Cell
  • When to reproduce?
  • When to migrate and where?
  • What to differentiate into?
  • When to secrete something?
  • When to make an electrical signal?

Faster decisions are usually made via the cell
membrane and second messengers; longer-acting
decisions are usually made in the nucleus.
16
Nucleus Used to Appear Simple
  • Cheek cells stained with basic dyes. Nuclei are
    readily visible.

17
Mammalian Nuclei Stained in Various Ways
Image from Tom Misteli lab
18
Artist's rendition of the nucleus
Image from nuclear protein database
19
Chromatin
20
Turning on a gene
  • Getting DNA into the right compartment of the
    nucleus (may involve very diffuse signals in DNA
    over very long distances)
  • Loosening up chromatin structure (this involves
    activators and repressors, which can act over
    relatively long distances)
  • Attracting RNA Polymerase II to the transcription
    start site (this involves factors bound relatively
    close to the start site, both upstream and
    downstream of it).

21
Methods for Studying Transcription
  • Genetics in model organisms
  • Promoters hooked to reporter genes
  • Gel shifts and DNase footprinting.
  • Phylogenic footprinting
  • Motif searches in clusters of coregulated genes.
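The last approach rests on a simple intuition: if a set of genes is switched on together, their promoters may share binding sites that turn up less often elsewhere. The sketch below is a deliberately naive illustration under that assumption, with made-up sequences: for each k-mer it compares the fraction of cluster promoters containing it against the fraction in a background set. Practical motif finders also handle degenerate sites and statistical significance, which this does not.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Distinct k-mers of one sequence, uppercased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_fraction(seqs: List[str], k: int) -> Dict[str, float]:
    """For each k-mer, the fraction of sequences containing it at least once."""
    if not seqs:
        return {}
    counts: Dict[str, int] = {}
    for s in seqs:
        for kmer in kmers(s, k):
            counts[kmer] = counts.get(kmer, 0) + 1
    return {kmer: c / len(seqs) for kmer, c in counts.items()}


def enriched_kmers(cluster: List[str], background: List[str], k: int,
                   pseudo: float = 0.05) -> List[Tuple[str, float]]:
    """Rank k-mers by cluster fraction / (background fraction + pseudocount).

    The pseudocount keeps k-mers absent from the background from
    scoring infinitely high.
    """
    fg = presence_fraction(cluster, k)
    bg = presence_fraction(background, k)
    scores = {kmer: f / (bg.get(kmer, 0.0) + pseudo) for kmer, f in fg.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, with cluster promoters `GGTATAATCC`, `ATATAATGGC`, and `CCGTATAATA` and a background that lacks `TATAAT`, `enriched_kmers(cluster, background, 6)` ranks `TATAAT` first, since it is the only 6-mer present in all three cluster sequences.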

22
Drosophila Genetics
Antennapedia mutant
normal
23
Reporter Gene Constructs
promoter to study
easily seen gene
Drosophila embryo transfected with the ftz promoter
hooked up to a lacZ reporter gene, creating stripes
where the ftz promoter is active.
24
Biochemical Footprinting Assays
Gel showing selective protection of DNA from
nuclease digestion where a transcription factor is
bound.
Txn factor footprint
25
Pseudogenes
26
Creative Chaos Genome
27
Finding Transcription Start
28
Phylogenic Footprinting
29
Mouse Paints Some Promoters
Track labels: RefSeq, Spliced EST, Mouse, Fish, Repeat
Crystallin - a gene expressed in the eye. Coding
regions are very similar to crystallins in the
liver, but the promoter is different.
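The mouse track highlights promoters because functional DNA tends to stay conserved between species while surrounding noncoding DNA drifts; this is the idea behind phylogenic footprinting. A minimal sketch, assuming a human/mouse pairwise alignment is already in hand as two equal-length strings with '-' for gaps: slide a window along the alignment and keep windows whose percent identity meets a cutoff. The 10-column window and 70% cutoff are illustrative choices, not values from the talk.

```python
from typing import List, Tuple


def window_identity(human: str, mouse: str, window: int) -> List[float]:
    """Percent identity of each full window along a pairwise alignment.

    Both strings are rows of the same alignment, so they must be equal
    length. A column counts as identical only if the bases match
    (case-insensitive) and the column is not a gap.
    """
    if len(human) != len(mouse):
        raise ValueError("alignment rows must have equal length")
    same = [
        1 if h.upper() == m.upper() and h != "-" else 0
        for h, m in zip(human, mouse)
    ]
    return [
        100.0 * sum(same[i:i + window]) / window
        for i in range(len(same) - window + 1)
    ]


def conserved_windows(human: str, mouse: str, window: int = 10,
                      cutoff: float = 70.0) -> List[Tuple[int, float]]:
    """(start column, percent identity) for windows at or above the cutoff."""
    return [
        (start, pid)
        for start, pid in enumerate(window_identity(human, mouse, window))
        if pid >= cutoff
    ]
```

Windows that pass the cutoff are candidate functional regions (promoter, coding, or otherwise); changing the window size or cutoff trades sensitivity against noise.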
30
Normalized eScores
31
Mouse/Human Chrom 7 Synteny
32
Motifs in Coregulated Genes
33
Conservation Levels of Regulatory Regions
34
Transition from Private Research Interests to
Role in Genome Project
35
Assembly War Story
36
Building a Better Browser
37
Pretty Adventurous Programming
38
Genome Browser, BLAT, Gene Sorter, Table Browser,
Service Organization
39
Parasol and Kilo Cluster
  • UCSC cluster has 1000 CPUs running Linux
  • 1,000,000 BLASTZ jobs in 25 hours for mouse/human
    alignment
  • We wrote the Parasol job scheduler to keep up.
  • Very fast and free.
  • Jobs are organized into batches.
  • Error checking at job and at batch level.
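The mouse/human figures imply a steep job rate. As a back-of-envelope check, assuming all 1000 CPUs were busy for the full 25 hours (an idealization):

```latex
\frac{10^{6}\ \text{jobs}}{25\ \text{h}} = 4\times10^{4}\ \text{jobs/h},
\qquad
\frac{4\times10^{4}\ \text{jobs/h}}{1000\ \text{CPUs}} = 40\ \text{jobs per CPU-hour},
\qquad
\frac{3600\ \text{s}}{40\ \text{jobs}} = 90\ \text{s per job}.
```

Across the cluster that is roughly 40,000 / 3600 ≈ 11 jobs started and finished every second on average, which suggests why a scheduler for this workload needs low per-job overhead.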

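The batch-and-error-checking design can be illustrated on a single machine. This is a hypothetical sketch using Python's standard `concurrent.futures` thread pool, not Parasol's code or interface: each job is a function, a job that raises is recorded rather than stopping the batch (job-level checking), and the batch report says whether every job succeeded and which ones failed (batch-level checking).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class BatchReport:
    """Batch-level view: results of jobs that finished, errors of jobs that failed."""
    results: Dict[str, Any] = field(default_factory=dict)
    failures: Dict[str, str] = field(default_factory=dict)

    @property
    def ok(self) -> bool:
        """True only if every job in the batch succeeded."""
        return not self.failures


def run_batch(jobs: Dict[str, Callable[[], Any]], workers: int = 4) -> BatchReport:
    """Run all jobs in parallel; a failing job is recorded, not fatal to the batch."""
    report = BatchReport()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in jobs.items()}
        for name, future in futures.items():
            try:
                # Job-level check: result() re-raises any exception from the job.
                report.results[name] = future.result()
            except Exception as exc:
                report.failures[name] = repr(exc)
    return report
```

Because `report.failures` names exactly the jobs that failed, a rerun only needs to resubmit those, e.g. `run_batch({name: jobs[name] for name in report.failures})`.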
40
Acknowledgements
  • Individuals
  • Institutions

NHGRI, The Wellcome Trust, HHMI, Taxpayers in the
US and worldwide. Whitehead, Sanger, Wash U,
Baylor, Stanford, DOE, and the international
sequencing centers. NCBI, Ensembl, Genoscope,
The SNP Consortium, UCSC, Softberry, Affymetrix.
David Haussler, Chuck Sugnet, Francis Collins,
Bob Waterston, Eric Lander, John Sulston, Richard
Gibbs, Lincoln Stein, Sean Eddy, Olivier
Jaillon, David Kulp, Victor Solovyev, Ewan
Birney, Greg Schuler, Deanna Church, Asif
Chinwalla, Kim Worley, the Gene Cats. Everyone
else!
41
THE END
42
Coloring CRYGD Start
gctcgttcaggggtaaaggtgtattctagatCCACAACAAGCCCCGTGGT
CTAGCACAGC AAAGAGAAAAAAAGAGAACACGAAAATGCCCTTGCTCCC
CTCCGGGGGCCCCTTTTGTGC GGTTCTTGCCAACGCAGCAGCCCTCCTG
CTATATAGCCCGCCGCGCCgCAGCCCCACCCG
CTCAGCGCCGCCGCCCCACCAGCTCAGCACCGCCGTGCGCCCAGCCAGCC
ATGGGGAAGG TGAGCCCAGCCTGCGCCCCGGGACCCCGGAGCTTCCTCC
ATCGCGGGGGCCAGAGACTGG GGCAGGAGCAGGCCTGTGAGACCTCGCC
TTGTCCCGCCTTGCCTTGCAGATCACCCTCTA
CGAGGACCGGGGCTTCCAGGGCCGCCACTATGAATGCAGCAGCGACCACC
CCAACCTGCA GCCCTACTTGAGCCGCTGCAACTCGGCGCGCGTGGACAG
CGGCTGCTGGATGCTCTATGA GCAGCCCAACTACTCGGGCCTCCAGTAC
TTCCTGCGCCGCGGCGACTATGCCGACCACCA
GCAGTGGATGGGCCTCAGCGACTCGGTCCGCTCCTGCCGCCTCATCCCCC
ACGTGAGTAC ATCCTCAAGTCAGGACCCAGGCCCTCAGGACACTCACTG
GAtgGTTTCAAGCAAAAGTTA AACATTAGAAGTAGTGATCAGTcacaat
aaCTGAGAGTGGACAAAAGATGAACTATAGTG
GATTAAGTCAATAGagttTGCTCCCCACATAAGCAAAGTATTACCCAGAC
AcCAGTTAAT caCAATTAATCCACAAATATGTATTGAGTAGGAATGTGT
CTCCTGCCctAGGGGTTGTAT
43
Trends in Society & Biology
Decade | Society | Biology
50s | Cars are good | Mitochondria and metabolism
60s | Recording | DNA as recording media of genes
70s | Birth control | Working out the cell cycle
80s | Yuppies | Start of serious genetic engineering
90s | Microsoft rules | Incyte, Celera race to patent genome
2000s
44
(No Transcript)
45
(The NEED for Bioinformatics)
  • 200 million bases of DNA are sequenced every
    day.
  • Not much use without assembly.
  • Protein and non-sequence data also being
    generated at a prodigious rate.
  • How to store it and find the parts you want?
  • Making models that are simple enough to
    understand, but rich enough to reflect the
    biology.
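"Not much use without assembly" is easy to picture with a toy. Assembly means stitching short overlapping reads back into the longer sequence they came from. The sketch below is a textbook greedy overlap merger, not the assembler used for the human genome: it repeatedly joins the two sequences with the longest suffix/prefix overlap. It assumes error-free reads, ignores the reverse strand, and is easily fooled by repeats, one of the things that makes real assembly hard.

```python
from itertools import permutations
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b, if >= min_len; else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Merge the best-overlapping pair until no overlap >= min_len remains.

    Returns the resulting contigs (more than one if the reads do not all join).
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads wholly contained in another read; they add no new sequence.
    contigs = [r for r in contigs
               if not any(r != other and r in other for other in contigs)]
    while len(contigs) > 1:
        best_len, best_pair = 0, None
        for a, b in permutations(contigs, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_pair = olen, (a, b)
        if best_pair is None:
            break  # nothing overlaps enough; leave separate contigs
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_len:])
    return contigs
```

For example, `greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTAG"])` returns `["ATGGCGTACGTTAG"]`: the first two reads overlap by `GCGT`, and that merged contig overlaps the third read by `TACG`.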

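For "how to store it and find the parts you want", one widely used answer is an index of short words. BLAT, for instance, indexes the non-overlapping k-mers of the genome and then looks up the k-mers of each query. The sketch below is a simplified, hypothetical illustration of that idea, not BLAT's code: it records where each non-overlapping k-mer occurs in a target, then counts query hits by the target start position they imply, so hits lying on the same alignment diagonal pile up together.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def build_index(target: str, k: int) -> Dict[str, List[int]]:
    """Map each non-overlapping k-mer of target (step k) to its start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(0, len(target) - k + 1, k):
        index[target[pos:pos + k]].append(pos)
    return index


def candidate_starts(query: str, index: Dict[str, List[int]], k: int) -> Counter:
    """Look up every overlapping k-mer of the query.

    A hit of query offset q at target position t implies the query starts
    at t - q; counting hits per implied start groups hits on one diagonal.
    """
    starts: Counter = Counter()
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], []):
            starts[t - q] += 1
    return starts
```

Indexing only every k-th position keeps the index about k times smaller than indexing every position, while an exactly matching query of at least 2k - 1 bases is still guaranteed to contain one indexed k-mer.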
46
(My Road to a Bio PhD)
  • Liked bio, but too many prerequisites!
  • Had fun doing graphics/animation programming in
    the 80s and early 90s.
  • Bored with endlessly shifting Microsoft APIs.
  • Community college and UC Extension to get a bio BA
    equivalent in '97-'98.
  • UC Santa Cruz bio grad school 1999
  • Interested in developmental biology and how a
    cell makes decisions.

47
Perhaps Must Study Actual Cells
48
Spaghetti Code or Soupy Logic
Steaming fresh modules on sourceforge.net
Combinatorial assembly of transcription factors
in the cell.