Title: Transcriptome
1. Transcriptome
- Gene Discovery
- Quantitation of Gene Expression
BIO520 Bioinformatics, Jim Lund
2. Why?
- The genes expressed determine the state of the cell:
- Signaling.
- Metabolic capabilities.
- Differentiation state (cell type).
- Response to changes in environment.
- Verifies gene predictions.
- Transcriptional regulation
- Normal vs. abnormal
- Conditional expression
3. Transcriptome Analysis
- Gene (transcript) discovery
- transcripts
- alternative splicing/processing
- Transcript assays
- Promoter analysis
- Transcription Factors
- Cellular control networks
4. Gene Discovery
- Inference from genomic DNA
- Works well for prokaryotes and fungi (few or no introns)
- cDNA characterization
- EST
- SAGE
5. EST (Expressed Sequence Tag)
- Sequence cDNA libraries
- proportional libraries
- subtracted or normalized libraries
- Which end?
- 5', 3', or whole cDNA
6. Library Type
- regular or proportional
- Subtracted
- May miss alternative transcripts
- normalized
- Tissue
- Primer
- Oligo(dT) vs. random primers
7. Ideal cDNAs
8. Real cDNAs
9. Which end?
- Whole cDNA
- Best, but hardest (long)
- 3' end
- Technically consistent, but limited information
- 5' end
- Highest coding-sequence content
- 5' and 3' ends
- Good, but a technical and informatics challenge
11. EST Data Analyses
- Clustering Analysis
- Assemble ESTs into genes.
- Alternative splicing forms
- Find coding SNPs.
- Truncated, unspliced, and junk ESTs can be misleading.
- Project: UniGene
- Program: stackPACK
- Frequency analysis
- Digital Differential Display
- DDD is a computational method for comparing
sequence-based gene representation profiles among
individual cDNA libraries or pools of libraries.
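Digital differential display boils down to asking whether a gene's share of ESTs differs between two libraries by more than sampling noise would allow. The sketch below is a minimal illustration under stated assumptions, not NCBI's DDD code: it applies a two-sided Fisher exact test to each gene's 2x2 count table (ESTs from the gene vs. all other ESTs, in library 1 vs. library 2). The function names, gene name, and counts are hypothetical.

```python
from math import comb
from typing import Dict, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are libraries; columns are (ESTs from this gene, all other ESTs).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability that library 1 holds x of the gene's col1 ESTs.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = prob(a)
    total = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(x)
        # Tables no more likely than the observed one count as "at least as extreme".
        if p <= p_observed * (1 + 1e-9):
            total += p
    return min(total, 1.0)


def digital_differential_display(
    counts1: Dict[str, int], total1: int,
    counts2: Dict[str, int], total2: int,
) -> Dict[str, Tuple[float, float, float]]:
    """For each gene: (fraction of ESTs in library 1, fraction in library 2, p-value)."""
    results = {}
    for gene in sorted(set(counts1) | set(counts2)):
        a, c = counts1.get(gene, 0), counts2.get(gene, 0)
        p = fisher_exact_two_sided(a, total1 - a, c, total2 - c)
        results[gene] = (a / total1, c / total2, p)
    return results


# Hypothetical example: geneX has 30 of 10,000 ESTs in library 1 but 5 of 10,000 in library 2.
print(digital_differential_display({"geneX": 30}, 10_000, {"geneX": 5}, 10_000))
```

Comparing pools of libraries works the same way: sum the per-gene counts and the library totals within each pool before testing.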
12. EST Results (old)
- Known genes (30%)
- Similarities to other ORFs, ESTs (30%)
- Infer function?
- Novel class (30%, decreasing with time)
13. Typical Progress/Results
- Humans
- 6,694,833 ESTs
- 124,179 clusters (sets)
- 29,000 sets contain both EST and mRNA sequences.
- CGAP EST library plateau broken by
- different tissues, different states
- normalized libraries
14. Data Quality Considerations
- 99% correct data (1% errors!).
- Frameshifts: effects depend on tools
- BLASTX: tool to find them
- How sensitive?
- TBLASTX, TBLASTN: to use ESTs in other projects
- How sensitive?
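The 99% figure is easier to appreciate as errors per read. This minimal sketch assumes each base is miscalled independently with probability 0.01 (real error rates vary along a read), and the read lengths are illustrative. Because a single inserted or deleted base in a coding region shifts the reading frame, a few errors per read can break a naive translation, which is why translated searches such as BLASTX, which align in all reading frames, are used to spot them.

```python
def expected_errors(length_bp: int, accuracy: float = 0.99) -> float:
    """Expected number of base-call errors in a read of the given length."""
    return length_bp * (1.0 - accuracy)


def prob_any_error(length_bp: int, accuracy: float = 0.99) -> float:
    """Probability that a stretch of length_bp contains at least one error,
    assuming errors occur independently at each base."""
    return 1.0 - accuracy ** length_bp


for length in (100, 300, 500):
    print(f"{length} bp: {expected_errors(length):.1f} expected errors, "
          f"P(>=1 error) = {prob_any_error(length):.3f}")
```

At 99% accuracy a 500-bp read carries about 5 errors, and a 300-bp coding stretch has roughly a 95% chance of containing at least one.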
15. Gene Expression Assay
- EST (Poor method)
- SAGE
- Microarray Hybridization
- Transcriptional Fusions
- GFP, LacZ fusions
16. Serial Analysis of Gene Expression (SAGE)
- Collect mRNA
- Isolate short oligomers from each transcript.
- Ligate together the oligomers and clone them.
- Sequence thousands of clones.
- Map the 1x10^4 to 1x10^5 oligomers to their genes.
- Find which genes are transcribed and their relative expression levels.
- http://www.sagenet.org (Vogelstein at JHU)
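Mapping tags to genes and counting them can be sketched in a few lines. This minimal Python illustration assumes classic SAGE conventions: the anchoring enzyme is NlaIII (site CATG), and each transcript's tag is the 10 bases just 3' of its 3'-most CATG, the fragment that stays bound through the biotinylated 3' end. The function names, transcript sequences, and gene names are hypothetical, and tags shared by several genes are simply skipped as ambiguous.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

ANCHOR = "CATG"  # NlaIII recognition site (the anchoring enzyme)
TAG_LEN = 10     # classic SAGE tag length, excluding the CATG


def expected_tag(transcript: str) -> Optional[str]:
    """Tag predicted for a transcript: the TAG_LEN bases just 3' of its 3'-most CATG."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1:
        return None  # no NlaIII site, so no tag
    tag = seq[pos + len(ANCHOR):pos + len(ANCHOR) + TAG_LEN]
    return tag if len(tag) == TAG_LEN else None


def build_tag_map(transcripts: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each predicted tag to the gene(s) that would produce it."""
    tag_map: Dict[str, List[str]] = {}
    for gene, seq in transcripts.items():
        tag = expected_tag(seq)
        if tag is not None:
            tag_map.setdefault(tag, []).append(gene)
    return tag_map


def count_expression(observed_tags: Iterable[str],
                     tag_map: Dict[str, List[str]]) -> Tuple[Counter, int, int]:
    """Count sequenced tags per gene; returns (gene counts, unmapped tags, total tags)."""
    tag_counts = Counter(observed_tags)
    gene_counts: Counter = Counter()
    unmapped = 0
    for tag, n in tag_counts.items():
        genes = tag_map.get(tag)
        if genes is None:
            unmapped += n
        elif len(genes) == 1:
            gene_counts[genes[0]] += n
        # Tags shared by several genes are ambiguous and skipped in this sketch.
    return gene_counts, unmapped, sum(tag_counts.values())


transcripts = {  # hypothetical sequences
    "geneA": "GGGGCATGAAAAACCCCCTTTCATGGATCGATCGAAAAAAAA",
    "geneB": "TTCATGCCGGTTAACCAAAAAAAA",
}
tags = ["GATCGATCGA"] * 3 + ["CCGGTTAACC", "TTTTTTTTTT"]
counts, unmapped, total = count_expression(tags, build_tag_map(transcripts))
print({gene: n / total for gene, n in counts.items()}, unmapped)
```

Dividing each gene's tag count by the total number of tags gives its relative expression level.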
17. SAGE technique
- Prepare biotin-labeled cDNA
- Cleave with anchoring enzyme (NlaIII)
18. SAGE technique
- Ligate on linkers
- Cleave with tagging enzyme (BsmFI)
19. SAGE technique
- Ligate, PCR, and gel purify ditags (102 bp).
- Recleave with anchoring enzyme (NlaIII), ligate to form concatemers.
- Size select, clone, and sequence concatemers.
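Reading tags back out of sequenced concatemers is the informatics half of this step. In this minimal sketch the concatemer is split at each NlaIII site (CATG); each piece between two sites is a ditag whose first 10 bases are one tag and whose last 10 bases, reverse-complemented, are its partner, since the two tags were ligated tail to tail. The accepted ditag length window (20 to 24 bp) is an assumption, and dropping repeated ditags as likely PCR duplicates follows common SAGE practice rather than anything stated on the slide. Function names are hypothetical.

```python
from typing import Iterable, List

ANCHOR = "CATG"                 # NlaIII site left at each ditag boundary
TAG_LEN = 10                    # classic SAGE tag length
MIN_DITAG, MAX_DITAG = 20, 24   # assumed acceptable ditag lengths between anchors

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def ditags(concatemer: str) -> List[str]:
    """Split a sequenced concatemer at CATG and keep plausibly sized ditags."""
    parts = concatemer.upper().split(ANCHOR)
    # The first and last pieces are vector/partial flanks, not complete ditags.
    return [p for p in parts[1:-1] if MIN_DITAG <= len(p) <= MAX_DITAG]


def tags_from_concatemers(concatemers: Iterable[str]) -> List[str]:
    """Extract both tags from every unique ditag, each read away from its CATG."""
    seen = set()
    tags: List[str] = []
    for concatemer in concatemers:
        for ditag in ditags(concatemer):
            if ditag in seen:
                continue  # repeated ditags are treated as PCR duplicates
            seen.add(ditag)
            tags.append(ditag[:TAG_LEN])            # tag next to the left CATG
            tags.append(revcomp(ditag[-TAG_LEN:]))  # partner tag, on the other strand
    return tags
```

The resulting tag list is what gets matched against the tag-to-gene map and counted.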
20. Colon cancer vs. normal colon epithelium (SAGE)
21. Microarray Hybridization
- Determine gene expression by parallel hybridization of labeled cDNA to DNA attached to a fixed support.
- http://cmgm.stanford.edu/pbrown/
22. Microarray Hybridization
- Producing chips
- Producing probes / reading arrays
- Analyzing and interpreting data
23. Transcriptional Array
[Figure: ORFs (orf 1, orf 2, orf 3, ...) spotted in a grid (3 cm, 200 spots; or 40,000 dots/9 cm, more than all human genes), hybridized with mRNA from Condition 1 and Condition 2.]
24. Transcriptional Array-1
[Figure: spotted ORF array (3 cm, 200 spots; or 40,000 dots/9 cm, more than all human genes), hybridized with three mRNA samples (Condition 1, Condition 2, Condition 2).]
25. Transcriptional Array-2
[Figure: spotted ORF array (3 cm, 200 spots; or 40,000 dots/9 cm, more than all human genes), hybridized with mRNA from Condition 1 and Condition 2.]
26. Microarray Technologies
- Spotted arrays (Brown et al.)
- Spot arrays on glass slides
- PCR fragments
- Long (50-70 nt) oligo arrays
- Synthesis
- Affymetrix (www.affymetrix.com)
- High-density array of 25-mer oligos
- Made using light-directed oligonucleotide synthesis and photolithography
- Agilent, CombiMatrix
- Made using light-directed oligonucleotide synthesis and mirrors.
27. Spotted Arrays
28. Print Quill
29. Spotted microarray image
30. Affymetrix photolithographic technology
- Lithographic masks are used to either block or transmit light onto specific locations of the array.
- The surface is then flooded with a solution containing either adenine, thymine, cytosine, or guanine, and coupling occurs only in those regions on the glass that have been deprotected through illumination.
- The coupled nucleotide also bears a light-sensitive protecting group, so the cycle can be repeated.
- The microarray is built as the probes are synthesized through repeated cycles of deprotection and coupling.
- Synthesis typically ends at 25 bases (25-mer probes).
- Current arrays have 1.3 million unique features per array.
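The deprotection/coupling cycle can be simulated to see why the number of masks scales with probe length. This minimal Python sketch assumes nucleotides are flooded in a fixed A, C, G, T order in every round (real mask sequences can be optimized) and skips any mask that would illuminate no feature; the function name is hypothetical. Each round extends every unfinished probe by at least one base, so 25-mer probes need at most 4 × 25 = 100 masks.

```python
from typing import List, Tuple

BASES = "ACGT"  # assumed fixed order of nucleotide floods within each round


def synthesize(probes: List[str]) -> Tuple[List[str], int]:
    """Simulate mask-directed synthesis; return (built probes, number of masks used)."""
    for p in probes:
        if set(p) - set(BASES):
            raise ValueError(f"probe contains non-ACGT bases: {p}")
    built = [""] * len(probes)
    masks = 0
    while any(len(b) < len(p) for b, p in zip(built, probes)):
        for base in BASES:
            # Mask: illuminate (deprotect) only features whose next base is `base`.
            lit = [i for i, (b, p) in enumerate(zip(built, probes))
                   if len(b) < len(p) and p[len(b)] == base]
            if not lit:
                continue  # no feature needs this base now, so no mask is used
            masks += 1
            # Flood with `base`: it couples only at the deprotected features,
            # and the added base carries a new light-sensitive protecting group.
            for i in lit:
                built[i] += base
    return built, masks


print(synthesize(["ACGT", "TTTT"]))
```

For the two toy probes above, "ACGT" is finished in the first round while "TTTT" gains one base per round, so seven masks are used in total.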
31. GeneChip Expression Assay Design
32. Affymetrix GeneChip Expression Analysis
- Available for humans and model organisms.
- Made only by Affymetrix.
- Chip designs change slowly.
- GeneChips
- Human: 50,000 RefSeq genes and ESTs
- C. elegans: 22,500 genes (12/00 genome annotation)
- Rat 230: 30,000 genes, ESTs
- Yeast: 6,100-gene set
- Tiling arrays for model organisms
- http://affymetrix.com
33. Quantitation of fluorescence signals (image to data)
- Hybridize, then scan in the chip image.
- Gridding
- Determine where the spots are.
- Spot intensity and local background determination.
- Normalization
- Adjust so that the total red and green signal intensities are the same.
- Gene expression ratio
- Red channel / green channel.
- Programs
- ScanAlyze, http://rana.lbl.gov/EisenSoftware.htm
- GenePix, http://www.axon.com/gn_GenePixSoftware.html
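The background-subtraction, normalization, and ratio steps above can be sketched as follows. This minimal Python illustration assumes global (total-intensity) normalization, matching the "same total signal" rule, and clamps background-subtracted intensities at a floor of 1.0 so the logarithm is defined; the function name, the floor, and the three-spot data are assumptions, and ScanAlyze and GenePix implement these steps in their own ways.

```python
import math
from typing import List, Sequence


def log_ratios(red_fg: Sequence[float], red_bg: Sequence[float],
               green_fg: Sequence[float], green_bg: Sequence[float],
               floor: float = 1.0) -> List[float]:
    """Background-subtract, globally normalize, and return log2(red/green) per spot."""
    # Local background subtraction; the floor keeps intensities positive for the log.
    red = [max(f - b, floor) for f, b in zip(red_fg, red_bg)]
    green = [max(f - b, floor) for f, b in zip(green_fg, green_bg)]
    # Global normalization: rescale green so both channels have the same total signal.
    scale = sum(red) / sum(green)
    green = [g * scale for g in green]
    # Expression ratio (red channel / green channel) on a log2 scale.
    return [math.log2(r / g) for r, g in zip(red, green)]


# Hypothetical three-spot example (foreground and local background per channel).
print(log_ratios([1100, 600, 600], [100, 100, 100], [350, 600, 350], [100, 100, 100]))
# prints [1.0, -1.0, 0.0]
```

Working in log2 makes a 2-fold increase (+1) and a 2-fold decrease (-1) symmetric around 0.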
34. Microarray data
Big tables of numbers!
35. Viewing microarray data
- Clustergram
- Scatter plot: log(ch1) vs. log(ch2)
- M vs. A plot: expression change (M) vs. signal (A)
- Volcano plot: log(expression ratio) vs. -log10(p-value)
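For the M vs. A plot, each spot's normalized red (R) and green (G) intensities are conventionally transformed as below, so M is the log expression ratio and A is the average log signal. For example, R = 1000 and G = 500 give M = 1 and A ≈ 9.47. A volcano plot usually puts the same log ratio on the x-axis and -log10 of the p-value on the y-axis (the -log10 transform is the common convention, assumed here), so the most significant changes sit at the top.

```latex
M = \log_2\frac{R}{G}, \qquad A = \frac{1}{2}\log_2(R\,G), \qquad
\text{volcano point: } \left(\log_2\frac{R}{G},\; -\log_{10} p\right)
```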