Title: BIONF/BENG 203: Functional Genomics
1 BIONF/BENG 203: Functional Genomics
Sources of Functional Data (Lectures 1 and 2)
- Lecture TI 1
- Trey Ideker
- UCSD Department of Bioengineering
2 Grading
- 40% Problem Sets (best 4 of 5)
- 30% Midterm
- 30% Final Project
3 Outline of the course
Biological data sources (2)
Data pre-processing (6)
Total of 17 lectures
Project Presentations (2)
4 Functional Genomics Data
- Expression
  - mRNA, protein
- Molecular interactions
  - Protein, mRNA, small molecules
- Knockout phenotypes
  - 1st, 2nd, higher orders
- SNP sequence (polymorphism) data
- Imaging data
  - Sub-cellular localization
  - Cell morphology
- Gene ontology
5 Dividing the data into two classes of information: Biological Networks and Network States
1) Directly observe the network wires themselves
  - Protein-protein interactions
    - Two-hybrid system, coIP, protein antibody arrays
    - BIND, DIP
  - Protein-DNA interactions
    - Chromatin IP
    - BIND, Transfac, SCPD
  - Other types not yet possible
    - e.g., protein-small molecule
2) Observe molecular states that result from the interaction wiring
  - DNA/RNA: gene expression
    - DNA microarrays, SAGE
  - Protein levels, locations, and modifications
    - Mass spectrometry, fluorescence microscopy, protein arrays
  - Gross phenotypes
    - e.g., growth rates of single and double deletion strains
6 High-throughput methods for measuring cellular states
- Gene expression levels: RT-PCR, arrays
- Protein levels, modifications: mass spec
- Protein locations: fluorescent tagging
- Metabolite levels: NMR and mass spec
- Systematic phenotyping
7 The transcriptome and proteome
- The transcriptome is the full complement of RNA molecules produced by a genome
- The proteome is the full complement of proteins enabled by the transcriptome
- DNA → RNA → protein
- Genome → transcriptome → proteome
- 30,000 genes → ??? RNAs → ??? proteins?
- For example, the Drosophila gene Dscam can generate about 40,000 distinct transcripts through alternative splicing.
- What is the minimum number of exons that would be required?
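One way to approach the exon question, as a rough sketch only: if every alternative exon were an independent include-or-skip choice (an assumption made here for illustration, not a description of how Dscam is actually spliced), n such exons would give 2^n transcripts, so the minimum n is the smallest integer with 2^n ≥ 40,000. For comparison, the commonly cited Dscam layout instead picks one exon from each of four mutually exclusive clusters of 12, 48, 33, and 2 alternatives.

```python
import math

TARGET = 40_000  # transcript count quoted on the slide

# Assumption: each alternative exon is independently included or skipped,
# so n such exons yield 2**n distinct transcripts.
min_cassette_exons = math.ceil(math.log2(TARGET))

# Commonly cited Dscam layout: one exon chosen from each of four
# mutually exclusive clusters.
cluster_sizes = [12, 48, 33, 2]
dscam_isoforms = math.prod(cluster_sizes)
alternative_exons_used = sum(cluster_sizes)

print("minimum independent exons:", min_cassette_exons)
print("isoforms from exclusive clusters:", dscam_isoforms)
print("alternative exons in those clusters:", alternative_exons_used)
```

The contrast is the point: independent binary choices reach the target with far fewer exons than mutually exclusive clusters do.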
8 Expression: High-throughput approaches
- RNA
  - DNA Microarrays
  - cDNA / EST sequencing
  - RT-PCR
  - Differential display
  - SAGE
  - Massively parallel signature sequencing (MPSS)
- Proteins
  - 2D PAGE
  - Mass spectrometry
9 Gene expression arrays
- They are really, really, really, really, really, really, really, really, really, really, really, really, really important
10 Microarrays
- Monitor the expression level of each gene
  - Is it turned on or off in a particular biological condition?
  - Is this on/off state different between two biological conditions?
- A microarray is a rectangular grid of spots printed on a glass microscope slide, where each spot contains DNA for a different gene
11 Two-color DNA microarray design
Reverse Transcription
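In a two-color design, the question of whether a gene's state differs between two conditions is usually read off as a per-spot ratio of the two channel intensities. The sketch below is a minimal illustration; the channel names, the gene names, and every intensity value are hypothetical, and real pipelines add normalization steps not shown here.

```python
import math

def log2_ratio(red, green, background_red=0.0, background_green=0.0):
    """Return log2(red / green) after subtracting each channel's background."""
    r = red - background_red
    g = green - background_green
    if r <= 0 or g <= 0:
        raise ValueError("background-subtracted intensities must be positive")
    return math.log2(r / g)

# Hypothetical spot intensities: (red channel, green channel)
spots = {
    "geneA": (8000.0, 2000.0),
    "geneB": (1500.0, 1500.0),
    "geneC": (500.0, 4000.0),
}

ratios = {gene: log2_ratio(red, green) for gene, (red, green) in spots.items()}
for gene, value in ratios.items():
    print(gene, round(value, 2))
```

A log scale is used so that a 4-fold increase and a 4-fold decrease sit symmetrically around zero.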
12 cDNA-chip of brain glioblastoma
13 Types of microarrays
- Spotted (cDNA)
  - Robotic transfer of cDNA clones or PCR products
  - Spotting on nylon membranes or glass slides coated with poly-lysine
- Synthetic (oligo)
  - Direct oligo synthesis on solid microarray substrate
  - Uses photolithography (Affymetrix) or ink-jet printing (Agilent)
- All configurations assume the DNA on the array is in excess of the hybridized sample; thus the kinetics are linear and the spot intensity reflects the amount of hybridized sample.
- Labeling can be radioactive, fluorescent (one-color), or two-color
14 Microarray Spotter
15 Affymetrix High Density Arrays
16 Microarrays (continued)
- Imaging
  - Radioactive ³²P labeling: autoradiography or phosphorimager
  - Fluorescent labeling: confocal microscope (invented by Marvin Minsky!!)
- Feature density
  - Nylon membrane macroarrays: ~100-1000 features
  - Glass slide spotted array: ~5,000 features / cm²
  - Synthesized arrays: ~50,000 features / cm²
17 Microarray confocal scanner
- Collects sharply defined optical sections from which 3D renderings can be created
- The key is spatial filtering to eliminate out-of-focus light or glare in specimens whose thickness exceeds the immediate plane of focus.
- Two lasers for excitation
- Two-color scan in less than 10 minutes
- High resolution, 10 micron pixel size
18 cDNA / EST sequencing projects
- cDNA: complementary or copy DNA
- EST: Expressed Sequence Tag
- The microarray could be described as a closed system because information about RNAs is limited by the targets available for hybridization. RNAs not represented on the array are not interrogated.
- Direct sequencing of cDNAs (yielding ESTs) overcomes this problem by large-scale random sampling of sequences from a whole-cell RNA extract
- Statistical counting of distinct sequences provides an estimate of expression level
- Conversely, a cDNA library can be normalized to capture rare messages
- Requires large-scale sequencing to get statistical significance
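The need for large-scale sequencing can be made concrete with a simple sampling model. The sketch below assumes clones are drawn independently and at random from the mRNA pool (an idealization; the function names and the example abundance are hypothetical) and asks how many clones must be sequenced before a rare message is likely to be seen at least once.

```python
import math

def prob_seen(fraction, n_clones):
    """P(at least one EST) for a transcript making up `fraction` of the mRNA pool."""
    return 1.0 - (1.0 - fraction) ** n_clones

def clones_needed(fraction, confidence=0.95):
    """Smallest number of random clones for which prob_seen >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction))

rare_fraction = 1e-5  # hypothetical: one message in 100,000
n_required = clones_needed(rare_fraction)

print("clones needed for a 95% chance of one hit:", n_required)
print("chance of a hit after 10,000 clones:", round(prob_seen(rare_fraction, 10_000), 3))
```

Seeing a message once is far short of estimating its level; a usable count needs many hits, which pushes the required depth higher still.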
19 cDNA / EST Sequencing: Preparation of a cDNA library in phage λ vector
20 Serial Analysis of Gene Expression (SAGE) Technology
- Takes the idea of sequence sampling to the extreme
- Generates short ESTs (9-14 nt) which are joined into long concatemers and then sequenced
- 4^9 is 262,144, 5-fold the number of human genes
- The count of each type of tag estimates RNA copy number
- >50X more efficient than cDNA sequencing because many RNAs are represented in a single sequencing run
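The tag-space arithmetic quoted above is easy to check, and it shows how quickly the number of possible tags grows over the 9-14 nt range. This is a small illustrative calculation; the function name is hypothetical.

```python
def tag_space(length):
    """Number of distinct DNA sequences of the given length (4 bases per position)."""
    return 4 ** length

sizes = {length: tag_space(length) for length in range(9, 15)}
for length, size in sizes.items():
    print(f"{length} nt: {size:,} possible tags")
```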
21 Steps to SAGE
- Copy mRNA → ds cDNA using biotinylated oligo(dT)
- Cleave with anchoring enzyme (AE), which cleaves within 250 bp of the poly-A tail at the 3′ end.
- Capture this segment on streptavidin beads
- Ligate to linkers containing a type IIs restriction site; the type IIs enzyme cleaves DNA 14 bp away from this site.
- Ligate sequences to each other and PCR amplify
- Cleave with AE to remove linkers
- Concatenate, clone, and sequence
22 Velculescu et al. Science (1995)
WHY DI-TAGS? Ditags are used to detect bias in the PCR amplification step. Because tags are joined at random, the probability of any particular pair of tags being coupled in the same ditag is small. Biased amplification can therefore be detected as many ditags that always carry the same 2 tags.
[Figure: SAGE schematic; labels A, B, Primer A, Primer B]
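The ditag argument suggests a simple check: a ditag that turns up many times is more likely an amplification artifact than a chance re-pairing of the same two tags. The sketch below is one plausible way to flag and collapse such repeats; the ditag strings are hypothetical and shortened for readability, and the policy of counting each distinct ditag once is an assumption.

```python
from collections import Counter

def collapse_ditags(ditags):
    """Return (distinct ditags in first-seen order, ditags observed more than once)."""
    counts = Counter(ditags)
    repeated = {ditag: n for ditag, n in counts.items() if n > 1}
    return list(counts), repeated

# Hypothetical ditags, written as "tag1-tag2"
observed = ["AAAC-GGTT", "CCAT-TTGA", "AAAC-GGTT", "AAAC-GGTT", "GATC-CCAT"]
unique, repeated = collapse_ditags(observed)

print("distinct ditags:", unique)
print("possible amplification bias:", repeated)
```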
23 SAGE (continued)
Example of a concatemer:
CATGACCCACGAGCAGGGTACGATGATACATGGAAACCTATGCACCTTGGGTAGCACATG
(Figure labels: TAG1, TAG2, TAG3, TAG4)
Counting the tags
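Tag counting on the example concatemer can be sketched as follows. The code assumes that CATG is the anchoring-enzyme site punctuating the sequence, that each stretch between two sites is one ditag, and that each tag is 10 nt long with the second tag of a ditag read as a reverse complement. These are assumptions for illustration; the slide does not state the tag length or how TAG1-TAG4 are delimited.

```python
from collections import Counter

CONCATEMER = "CATGACCCACGAGCAGGGTACGATGATACATGGAAACCTATGCACCTTGGGTAGCACATG"
ANCHOR = "CATG"  # assumed anchoring-enzyme site
TAG_LEN = 10     # assumed tag length

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]

def extract_tags(concatemer, anchor=ANCHOR, tag_len=TAG_LEN):
    """Split on the anchor site and take one tag from each end of every ditag."""
    tags = []
    for ditag in concatemer.split(anchor):
        if len(ditag) < 2 * tag_len:
            continue  # skip empty ends and pieces too short to hold two tags
        tags.append(ditag[:tag_len])
        tags.append(reverse_complement(ditag[-tag_len:]))
    return tags

tags = extract_tags(CONCATEMER)
tag_counts = Counter(tags)
for tag, count in tag_counts.items():
    print(tag, count)
```

Splitting on the anchor site is the simplest possible parser; it would mis-split a ditag that happens to contain CATG internally, so a length-aware parser would be needed for real data.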
24 Proteomics
25 An example SDS-PAGE
How many proteins are in a band?
Protein stains: silver, copper, Coomassie Blue
26 2D-PAGE
Dimension 1: isoelectric focusing gel
Dimension 2: size
27 2D gel from macrophage phagosomes
28 Mass spectrometry
- Mass spectrometers consist of three essential parts
  - Ionization source: converts peptides into gas-phase ions (MALDI, ESI)
  - Mass analyzer: separates ions by mass-to-charge (m/z) ratio (ion trap, time of flight, quadrupole)
  - Ion detector: current over time indicates amount of signal at each m/z value
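To make the m/z axis concrete, the sketch below computes where a peptide would appear at different charge states. It uses standard monoisotopic residue masses and, as a worked example, the unmodified peptide EACDPLR (without any label or modification); the function names are hypothetical.

```python
PROTON = 1.007276  # mass of a proton, Da
WATER = 18.010565  # monoisotopic mass of H2O, Da

# Monoisotopic residue masses (Da) for the amino acids used in the example
RESIDUE_MASS = {
    "A": 71.03711, "C": 103.00919, "D": 115.02694, "E": 129.04259,
    "L": 113.08406, "P": 97.05276, "R": 156.10111,
}

def peptide_mass(sequence):
    """Neutral monoisotopic mass: residue masses plus one water for the termini."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

def mz(mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral mass."""
    return (mass + charge * PROTON) / charge

neutral = peptide_mass("EACDPLR")
for charge in (1, 2, 3):
    print(f"{charge}+ ion: m/z {mz(neutral, charge):.3f}")
```

The same peptide therefore shows up at several places on the m/z axis, one per charge state, which is why the analyzer separates by m/z and not by mass alone.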
29 MS/MS Overview
30 MS/MS Overview
33 A raw fragmentation spectrum
By calculating the molecular weight difference between adjacent ions of the same type (for example, consecutive b ions or consecutive y ions), the sequence can be determined. SEQUEST uses the fragmentation pattern to search through a complete protein database to identify the sequence which best fits the pattern.
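The mass-difference idea can be illustrated on an idealized b-ion ladder. The sketch below builds singly charged b ions for a hypothetical peptide and reads the sequence back from consecutive differences. It is a toy de novo reading, not SEQUEST's database search, and it assumes a complete, noise-free ladder with a small residue table chosen to avoid isobaric pairs such as Leu/Ile.

```python
PROTON = 1.007276  # Da

# Monoisotopic residue masses (Da); a deliberately small, unambiguous table
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "D": 115.02694, "E": 129.04259,
}

def b_ions(peptide):
    """Singly charged b-ion m/z values for every prefix of the peptide."""
    ions, total = [], PROTON
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        ions.append(total)
    return ions

def read_ladder(ions, tolerance=0.01):
    """Infer residues from mass differences between consecutive b ions."""
    residues, previous = [], PROTON
    for ion in sorted(ions):
        diff = ion - previous
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - diff) <= tolerance]
        residues.append(matches[0] if matches else "?")
        previous = ion
    return "".join(residues)

ladder = b_ions("GAVDES")   # hypothetical peptide
decoded = read_ladder(ladder)
print(decoded)
```

Real spectra have missing and extra peaks, which is one reason a database search against predicted fragmentation patterns is used in practice.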
34 Tandem Mass Spec (MS/MS)
35 Typical nanoelectrospray source
36 Isotope Coded Affinity Tags (ICAT)
Mass spec based method for measuring relative protein abundances between two samples
Heavy reagent: d8-ICAT (X = deuterium)
Normal reagent: d0-ICAT (X = hydrogen)
ICAT Reagents
[Figure: reagent structure with three parts: biotin tag, linker (d0 or d8), thiol-specific reactive group]
37 Protein Quantification and Identification via ICAT Strategy
- Mixture 1 and Mixture 2: ICAT-labeled cysteines (light and heavy)
- Combine and proteolyze (trypsin)
- Affinity separation (avidin)
- Quantitation: light and heavy peaks, relative intensity (0-100) vs. m/z (550-580)
- Protein identification: NH2-EACDPLR-COOH, relative intensity (0-100) vs. m/z (200-800)
ICAT Flash animation: http://occawlonline.pearsoned.com/bookbind/pubbooks/bc_mcampbell_genomics_1/medialib/method/ICAT/ICAT.html
38 ICAT continued
- The heavy (blue) and light (gray) peptides are separated and quantified to produce a ratio for each peptide; here, a single peptide ratio is shown
- Each peptide is subjected to CID fragmentation in the second MS stage in order to identify it
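The heavy/light pairing can be sketched numerically. Assuming the d8 reagent differs from d0 only by eight hydrogen-to-deuterium substitutions per labeled cysteine, the expected gap between the two peaks depends on the number of cysteines and the charge state, and the abundance ratio is the ratio of the two peak intensities. The intensities below are hypothetical, as are the function names.

```python
# Mass gained per hydrogen-to-deuterium substitution (Da)
DEUTERIUM_SHIFT = 2.014102 - 1.007825

def icat_pair_spacing(n_cysteines, charge, n_deuterium=8):
    """Expected m/z gap between the heavy and light forms of one peptide."""
    return n_cysteines * n_deuterium * DEUTERIUM_SHIFT / charge

def abundance_ratio(heavy_intensity, light_intensity):
    """Relative abundance (heavy-labeled sample / light-labeled sample)."""
    if light_intensity <= 0:
        raise ValueError("light intensity must be positive")
    return heavy_intensity / light_intensity

spacing = icat_pair_spacing(n_cysteines=1, charge=2)
ratio = abundance_ratio(heavy_intensity=60.0, light_intensity=100.0)

print(f"expected peak spacing: {spacing:.3f} m/z")
print(f"heavy/light ratio: {ratio:.2f}")
```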
39 Metabolomic measurements
- 2D NMR or mass spectrometry
- Currently not global and in less widespread use than microarrays, but have tremendous potential
40 Gene knockout and RNAi libraries for model species: Example from yeast
- Replacement of yeast ORFs with kanMX gene flanked by unique oligo barcodes (Yeast Deletion Project Consortium)
41 YFP tagging for protein localization
YFP is green, transmitted light is red
NIC96: Nuclear pore
TUB1: Tubulin cytoskeleton
HHF2: Histone, nucleus
BNI4: Bud neck
Images courtesy T. Davis lab. See also recent work by Weissman and O'Shea labs at UCSF
42 Systematic phenotyping
Deletion strains and their barcodes (UPTAG): CTAACTC, TCGCGCA, TCATAAT
Growth conditions: 6 hrs in minimal media (how many doublings?); rich media
Harvest and label genomic DNA
43 Systematic phenotyping with a barcode array (Ron Davis and friends)
- These oligo barcodes are also spotted on a DNA microarray
- Growth time in minimal media
  - Red: 0 hours
  - Green: 6 hours
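The barcode readout can be sketched as a per-strain log ratio of the 6-hour (green) signal to the 0-hour (red) signal, alongside the doublings question raised for the 6-hour growth period. The 2-hour doubling time and all signal values below are assumptions for illustration only; the three barcodes are the ones shown in the figure.

```python
import math

def doublings(hours, doubling_time_hours):
    """Number of population doublings in the given time."""
    return hours / doubling_time_hours

def barcode_log_ratio(red_0h, green_6h):
    """log2 change in a strain's barcode signal between 0 h (red) and 6 h (green)."""
    return math.log2(green_6h / red_0h)

n_doublings = doublings(6.0, 2.0)  # assumes a 2 h doubling time

# Hypothetical (red at 0 h, green at 6 h) signals for the three barcodes
signals = {
    "CTAACTC": (1000.0, 1000.0),
    "TCGCGCA": (1000.0, 250.0),
    "TCATAAT": (1000.0, 2000.0),
}
scores = {barcode: barcode_log_ratio(red, green) for barcode, (red, green) in signals.items()}

print("doublings in 6 h:", n_doublings)
for barcode, score in scores.items():
    print(barcode, round(score, 2))
```

In a pooled experiment a strongly negative score marks a deletion strain that drops out of the pool, which is the growth phenotype the array is designed to reveal.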
44 Molecular Interactions
- Among proteins, mRNA, small molecules, and so on
46 Like sequence data, protein interaction data are growing exponentially
[Figure: DIP database growth (total interactions) and EMBL database growth (total nucleotides, gigabases, 0-10), 1980-2000]
(As are the false positives!!!)
47 High-throughput methods for measuring interaction networks
- 2-hybrid
- co-immunoprecipitation w/ mass spec
- ChIP-on-chip
- systematic genetic analysis
48 Yeast two-hybrid method
Fields and Song
49 Detection of protein interactions with antibody arrays
MacBeath and Schreiber
50 Kinase-target interactions
Mike Snyder and colleagues
51 High-throughput methods for measuring networks
- 2-hybrid
- co-immunoprecipitation w/ mass spec
- ChIP-on-chip
- systematic genetic analysis
52 Protein interactions by protein immunoprecipitation followed by mass spectrometry
TEV: Tobacco Etch Virus proteolytic site
CBP: Calmodulin binding peptide
Protein A: IgG binding, from Staphylococcus
Gavin / Cellzome
53 TAP purification
Image courtesy of Bertrand Seraphin
54 High-throughput methods for measuring networks
- 2-hybrid
- co-immunoprecipitation w/ mass spec
- ChIP-on-chip
- systematic genetic analysis
55 ChIP-chip measurement of protein-DNA interactions
From Figure 1 of Simon et al. Cell 2001
56 High-throughput methods for measuring networks
- 2-hybrid
- co-immunoprecipitation w/ mass spec
- ChIP-on-chip
- systematic genetic analysis
57 Genetic interactions: synthetic lethals and suppressors
- Genetic Interactions
  - Widespread method used by geneticists to discover pathways in yeast, fly, and worm
  - Implications for drug targeting and drug development for human disease
  - Thousands are now reported in literature and systematic studies
  - As with other types, the number of known genetic interactions is exponentially increasing
Adapted from Tong et al., Science 2001
58 Most recorded genetic interactions are synthetic lethal relationships
[Figure: genotypes A B, A ΔB, ΔA B, and ΔA ΔB]
Adapted from Hartman, Garvik, and Hartwell, Science 2001
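The synthetic lethal relationship has a compact logical form: each single deletion is viable, but the double deletion is not. The sketch below encodes that test; the viability table is hypothetical, and representing a genotype as a pair of booleans is an illustrative choice.

```python
def is_synthetic_lethal(viable):
    """True if wild type and both single deletions are viable but the double is not.

    `viable` maps (a_deleted, b_deleted) tuples to True/False.
    """
    return (
        viable[(False, False)]
        and viable[(True, False)]
        and viable[(False, True)]
        and not viable[(True, True)]
    )

# Hypothetical viability calls for one gene pair
pair_ab = {
    (False, False): True,   # A B
    (True, False): True,    # ΔA B
    (False, True): True,    # A ΔB
    (True, True): False,    # ΔA ΔB
}
# A pair where deleting A alone is already lethal (A is essential)
pair_essential = dict(pair_ab)
pair_essential[(True, False)] = False

print(is_synthetic_lethal(pair_ab))
print(is_synthetic_lethal(pair_essential))
```

The second case shows why the single-mutant checks matter: a dead double mutant is only informative about an interaction when both single mutants survive.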
59 Synthetic-lethal protein interaction
[Figure labels: A, B, X]
Suppressor protein interaction
[Figure labels: A, B, ΔB, X]
60 Interpretation of genetic interactions (Guarente, T.I.G. 1990)
Parallel Effects (Redundant or Additive)
Sequential Effects (Additive)
GOAL: Identify downstream physical pathways
Single A or B mutations typically abolish their
biochemical activities
Single A or B mutations typically reduce their
biochemical activities