Title: Gene Expression Analysis Unit 19
1Gene Expression Analysis Unit 19
- BIOL221T Advanced Bioinformatics for
Biotechnology
Irene Gabashvili, PhD
2Major challenge in biology
- Gain an understanding of the workings of the cell
by integrating available information from the
various fields of molecular and cellular biology
and physiology into an accurate model to generate
hypotheses for testing
- In previous lectures we mostly talked about
Genome (Sequence) informatics
3-Omes -Omics
- Genome: all the genes of an organism
- Transcriptome: all the transcripts (mRNAs) of an
organism
- Proteome: all the proteins of an organism
- Metabolome: all metabolites (low molecular
weight molecules participating in general
metabolic reactions required for the maintenance,
growth) of an organism
4Sequencing Successes
- T7 bacteriophage: completed in 1983; 39,937 bp, 59
coded proteins
- Escherichia coli: completed in 1996; 4,639,221 bp,
4,293 ORFs
- Saccharomyces cerevisiae: completed in 1996;
12,069,252 bp, 5,800 genes
5Completed sequences
- 1995: First complete bacterial genomes
- 2002: About 35 bacterial genomes; 0.5-5 Mb, hundreds to 2000 genes
- 1996 April: Yeast (Saccharomyces cerevisiae); 12 Mb, 5,500 genes
- 1998 Dec.: Worm (Caenorhabditis elegans); 97 Mb, 19,000 genes
- 2000 March: Fly (Drosophila melanogaster); 137 Mb, 13,500 genes
- 2000 Dec.: Mustard (Arabidopsis thaliana); 125 Mb, 25,498 genes
- 2000 June: Human (Homo sapiens), 1st rough draft
- 2001 Feb 15/16: Human, working draft; 3000 Mb, 35,000-40,000 genes
- Mouse, rat, chimp
6BAC-by-BAC shotgun (public sequence)
Total shotgun from the BAC ends (Celera)
7No prerequisites
Clone contig is a prerequisite
8Human Genome Organization
HUMAN GENOME
- Nuclear genome: 3000 Mb; gene-count estimates 25-35-40-65-80K genes
- Genes and gene-related sequences (about 30% of the nuclear genome); unique or moderately repetitive
- Coding DNA (about 10% of genes and gene-related sequences)
- Noncoding DNA (about 90%): pseudogenes, gene fragments, introns, untranslated sequences, etc.
- Extragenic DNA (about 70% of the nuclear genome)
- Unique or low copy number (about 80% of extragenic DNA)
- Moderate to highly repetitive (about 20%): tandemly repeated or clustered repeats, interspersed repeats
- Mitochondrial genome: 16.6 kb, 37 genes
- Two rRNA genes
- 22 tRNA genes
- 13 polypeptide-encoding genes
9Human RNA genes (non-coding RNA transcripts)
- About 100,000 RNA genes in the human genome (rough estimate)
- rRNA
- tRNA
- Small nuclear RNA
- Small nucleolar RNA
- SRP RNA
- MicroRNA
- Antisense RNA
- Non-coding gene mRNA isoforms
- RNAs from transcribed pseudogenes
10Human pseudogenes
Non-processed pseudogenes
- Contain introns
- Arise by duplications
- Frequency of transfer depends on chromosomal
context (pericentromeric fragments are transferred
more often)
Processed pseudogenes
- Do not contain introns
- Arise by retrotransposition
- Frequency of transfer depends on the initial level
of gene expression (highly expressed genes are
transferred more often)
Copies may be partial or complete.
Both types of pseudogenes are raw material for
evolution.
11Molecular Biology Tools
- Northern/Southern Blotting
- Differential Display
- RNAi (RNA interference by small RNAs)
- Serial Analysis of Gene Expression (SAGE)
- DNA Microarrays or Gene Chips
- Yeast two-hybrid analysis
- Immuno-precipitation/pull-down
- GFP Tagging and Microscopy
12SAGE
- Every mRNA molecule is converted into a short
(10-14 base), unique tag. This is equivalent to
reducing all the people in a city to a telephone
book of surnames
- After the tags are created, they are assembled or
concatenated into a long list
- The list can be read using a DNA sequencer and
compared to a database to identify genes or
proteins and their frequency
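The tag-and-count idea can be sketched in a few lines. This is a minimal illustration, not a SAGE analysis pipeline: it assumes the tag is the 10 bases immediately 3' of the last CATG (the NlaIII site) in each cDNA, and the function names and toy sequences are hypothetical.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # NlaIII recognition sequence


def extract_tag(cdna, tag_len=10):
    """Return the tag_len bases after the 3'-most CATG, or None if unavailable."""
    pos = cdna.rfind(ANCHOR_SITE)
    if pos == -1:
        return None  # no anchoring site, so no tag
    start = pos + len(ANCHOR_SITE)
    tag = cdna[start:start + tag_len]
    # A site too close to the 3' end cannot yield a full-length tag.
    return tag if len(tag) == tag_len else None


def count_tags(cdnas, tag_len=10):
    """Count how often each tag occurs across a collection of cDNA sequences."""
    tags = (extract_tag(seq, tag_len) for seq in cdnas)
    return Counter(tag for tag in tags if tag is not None)


# Toy library: two copies of one transcript, one copy of another.
seq_a = "GGGCATGAAAAACCCCCTTTT"
seq_b = "CATGTTTTTTTTTTCATGACGTACGTACGG"
tag_counts = count_tags([seq_a, seq_a, seq_b])
```

In a real experiment the tags come from sequenced concatemers rather than full transcripts, and each tag is then looked up in a reference tag database.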
13SAGE
- Convert mRNA to dsDNA
- Digest with NlaIII
- Split into 2 aliquots
- Attach linkers
14SAGE
- Linkers have PCR primer and tagging endonuclease (TE) sites
- Cut with the TE BsmFI
- Mix both aliquots
- Blunt-end ligate to make ditags
- Concatenate
- Sequence
15Hybridization
- Nucleic acid hybridization is a fundamental tool
in molecular genetics. It takes advantage of
complementary base pairing between strands: DNA to
DNA, RNA to DNA, or even RNA to RNA.
- Nucleic acid probes are used extensively in many
different diagnostic tests.
- Hybridization is also used in cloning and PCR.
16Principles of hybridization
- A probe is added to a complex mixture of
target DNA, and the mix is incubated under
conditions that promote the formation of hydrogen
bonds between complementary strands.
- Factors that affect hybridization characteristics:
- Strand length
- Base composition
- Chemical environment
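Complementarity itself is easy to show in code. The sketch below finds where a probe could pair perfectly with a single-stranded target; it assumes uppercase DNA with only A, C, G, T, allows no mismatches, and uses hypothetical function names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the strand that would base-pair with seq, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_probe_sites(probe, target):
    """Start positions on target where the probe can pair without mismatches."""
    site = reverse_complement(probe)  # what the probe must "see" on the target
    hits = []
    pos = target.find(site)
    while pos != -1:
        hits.append(pos)
        pos = target.find(site, pos + 1)
    return hits


hits = find_probe_sites("GCAT", "TTATGCTTATGC")
```

Real hybridization tolerates some mismatches depending on stringency, which exact string matching does not model.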
17Principles of nucleic acid hybridization
18Types of probes
19Stringency
- Strand length: the longer the probe, the more stable the duplex
- Base composition: GC base pairs are more stable than AT base pairs
- Chemical environment: Na+ ions stabilize the duplex; chemical
denaturants (formamide or urea) destabilize hydrogen bonds
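The three factors can be combined into a rough melting-temperature estimate. The sketch below uses one commonly quoted empirical approximation for long DNA duplexes, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 500/N - 0.61(%formamide). The coefficients differ between references, so the defaults here are assumptions exposed as parameters, and the function names are hypothetical.

```python
import math


def gc_percent(seq):
    """Percentage of G and C bases in seq."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def duplex_tm(seq, na_molar=0.3, formamide_percent=0.0,
              length_coeff=500.0, formamide_coeff=0.61):
    """Approximate Tm in degrees C; coefficients are assumed, not authoritative."""
    return (81.5
            + 16.6 * math.log10(na_molar)            # salt stabilizes the duplex
            + 0.41 * gc_percent(seq)                 # GC pairs are more stable
            - length_coeff / len(seq)                # short duplexes melt sooner
            - formamide_coeff * formamide_percent)   # denaturant lowers Tm


probe = "ATGC" * 50  # toy 200-base probe, 50% GC
tm_standard = duplex_tm(probe)
tm_stringent = duplex_tm(probe, na_molar=0.05, formamide_percent=50.0)
```

Lowering salt or adding formamide lowers the estimated Tm, which is how wash conditions are made more stringent.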
20Reassociation Kinetics
- Reassociation kinetics describes the speed at which
complementary single strands form duplexes. When
double-stranded DNA is denatured by heat, the speed
at which the strands re-form double-stranded DNA
depends on the starting concentration of DNA: the
higher the concentration of complementary DNA, the
shorter the time required.
- The two parameters are the starting concentration
(Co) and the time (t) in seconds; their product is
Cot.
- As a consequence, single-copy genes hybridize more
slowly than multicopy sequences and therefore give
weaker signals on a Southern blot.
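The dependence on Co and t can be sketched with the ideal second-order rate law, C/Co = 1 / (1 + k Co t), where C is the concentration still single-stranded. The rate constant k and the numbers below are illustrative assumptions, not measured values.

```python
def fraction_single_stranded(c0, t, k):
    """Fraction of strands still unpaired after time t (ideal second-order kinetics)."""
    return 1.0 / (1.0 + k * c0 * t)


def cot_half(k):
    """Cot value at which half of the strands have reassociated."""
    return 1.0 / k


k = 2.0          # hypothetical rate constant
t = 100.0        # seconds
single_copy = fraction_single_stranded(c0=0.001, t=t, k=k)
multi_copy = fraction_single_stranded(c0=0.1, t=t, k=k)  # 100x more copies
```

At the same time point the multicopy sequence has reassociated further (smaller unpaired fraction), consistent with single-copy sequences hybridizing more slowly.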
21Bioinformatics Hybridization Techniques
- Software tools to design probes and calculate
melting temperature, GC content, stability, and
folding
- Tools to design primers: short DNA sequences used
to initiate the synthesis of DNA
- Tools to design probes: sequences of DNA or RNA
used to detect complementary sequences by
hybridization
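A minimal sketch of the kind of check such tools perform on a candidate primer: GC content plus the Wallace rule, Tm = 2(A+T) + 4(G+C), which is a rule of thumb for short oligonucleotides only. The acceptance range for GC content and the function names are hypothetical choices for illustration.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Wallace rule estimate of Tm in degrees C for a short oligonucleotide."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def primer_report(primer, gc_range=(0.4, 0.6)):
    """Summarize basic primer properties; gc_range is an assumed threshold."""
    gc = gc_fraction(primer)
    return {
        "length": len(primer),
        "gc_fraction": gc,
        "tm_wallace": wallace_tm(primer),
        "gc_ok": gc_range[0] <= gc <= gc_range[1],
    }


report = primer_report("ATGCATGCATGCATGCATGC")
```

Dedicated design tools also score secondary structure, primer-dimer formation, and specificity, none of which this sketch attempts.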
22Tools to design Primers, Probes, and cloning
strategies
- Matlab
- VectorNTI
- MacVector www.macvector.com/
- http://array.iis.sinica.edu.tw/ups/
- http://frodo.wi.mit.edu/
- http://genome.jouy.inra.fr/cgi-bin/CloneIt/CloneIt
23Tools to design Primers and Probes
- In-Silico PCR - search for a pair of primers
- http://genome.ucsc.edu/cgi-bin/hgPcr
- http://bioinfo.ut.ee/index.php?pid1
- http://bioinfo.ut.ee/mprimer3/
- http://bioinfo.ut.ee/genometester/
- http://bioinfo.ut.ee/maphdesigner/
- https://vectordesigner.invitrogen.com/
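The idea behind an in-silico PCR search can be sketched with exact string matching. This is not the algorithm used by any of the tools listed above: it assumes perfect primer matches, searches only one orientation of the template, and uses hypothetical names and a toy template.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse, max_size=4000):
    """Return (start, end, amplicon) for each exact primer pair match."""
    rev_site = reverse_complement(reverse)  # reverse primer as seen on this strand
    products = []
    f = template.find(forward)
    while f != -1:
        r = template.find(rev_site, f + len(forward))
        while r != -1:
            end = r + len(rev_site)
            if end - f <= max_size:
                products.append((f, end, template[f:end]))
            r = template.find(rev_site, r + 1)
        f = template.find(forward, f + 1)
    return products


template = "GGGG" + "ATGCCG" + "TTTTTTTT" + "CCGTAA" + "GGGG"
products = in_silico_pcr(template, forward="ATGCCG", reverse="TTACGG")
```

A fuller version would also search the opposite strand and allow a few mismatches away from the primer 3' end.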
24Dot blot or slot blot
25Southern Blot
26Northern Blot
27Mutation detection by RFLP
28Assay of RFLP (restriction site polymorphism)
This has a variety of applications, including VNTR
RFLPs and DNA fingerprinting.
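An RFLP arises when a sequence change creates or destroys a restriction site, so the same enzyme gives different fragment lengths for different alleles. A minimal sketch, assuming a linear molecule and EcoRI (G^AATTC, cutting after the first base of the site); the allele sequences are invented for illustration.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Fragment lengths after cutting a linear sequence at every occurrence of site."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)  # cut position within the recognition site
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


allele_a = "A" * 10 + "GAATTC" + "A" * 20   # site present
allele_b = "A" * 10 + "GAGTTC" + "A" * 20   # point mutation destroys the site
fragments_a = digest(allele_a)
fragments_b = digest(allele_b)
```

On a Southern blot the two alleles would show different band patterns with a probe that spans the region.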
29Detection of gene deletions by restriction mapping
30In situ hybridization
- Chromosome in situ hybridization
- Metaphase or prometaphase chromosomes are
probed with labeled DNA. The DNA can be labeled
with a fluorochrome (FISH).
- Tissue in situ hybridization
- Sliced or whole-mounted preparations can be
probed with RNA probes to detect mRNA expression
31Hybridization Summary
- Hybridization is due to complementarity of DNA
strands.
- DNA can be labeled in various ways.
- Hybridization can detect identical or similar
sequences.
- A variety of techniques utilize hybridization of
DNA or RNA probes: Southern blot, RFLP, VNTRs,
mutation detection, deletion detection, Northern
blot, tissue-specific expression, in situ
hybridization.
- Microarrays are miniaturized hybridization
platforms.
32DNA Microarrays
- The principle is to analyze gene (mRNA) or protein
expression through large-scale non-radioactive
Northern (RNA) or Southern (DNA) hybridization
analysis
- The brighter the spot, the more DNA
- Microarrays are like Velcro chips made of DNA
fragments attached to a substrate
- Requires a robotic arraying device and a
fluorescence microarray reader
33Microarrays
- Probe: single-stranded DNA with a defined
identity tethered to a solid medium
- Target: the labeled DNA or RNA
34Microarrays
36History and Types of Arrays
- The first arrays, created in the mid 80s, were
called macro arrays. They were fabricated by
spotting DNA probes on a membrane-type material
with spot sizes of about 300 microns, which
limited the density of the spots to about 2000
probes. They mostly were used for DNA clones, PCR
products or oligonucleotides and typically were
used with radioactively-labeled targets.
37History and Types of Arrays
- Next came microarrays, which were created by
using pin spotters. These are pin-based robotic
systems that can dispense an accurate volume of a
DNA solution in a spot of about 150 microns onto
a glass slide. DNA clones, PCR products or
pre-synthesized oligonucleotides can be bound to
the glass surface to create high-density arrays
38History and Types of Arrays
- By the mid 90s, researchers were using two-channel
microarrays. Templates for genes of interest were
obtained and amplified by PCR. Following
purification and quality control, aliquots were
printed on coated glass microscope slides. Total
RNA from both the test and reference samples was
fluorescently labeled with either Cy3-dUTP or
Cy5-dUTP using a single round of reverse
transcription. The fluorescent targets were
pooled and allowed to hybridize, under stringent
conditions, to the clones on the array.
39History and Types of Arrays
- Rather than making arrays in the laboratory using
spotters, oligonucleotides can be synthesized in
situ on a surface, creating high-density arrays
with up to 500,000 probe sequences. The first
company to commercialize this type of technology
was Affymetrix, which uses a proprietary
light-directed oligonucleotide synthesis approach
(Affy GeneChips).
40History and Types of Arrays
- Agilent Technologies uses inkjet printing
technology to build the oligonucleotides on
standard-format glass slides using
phosphoramidite chemistry.
- Nanogen developed an electronic microarray,
utilizing the natural charge of the DNA.
- Illumina BeadChips: the Sentrix BeadChip
technology is set up to perform multiple
hybridizations in parallel. Probes: 50-mer
oligonucleotides.
41BO Chapter 16
- Annotating array probes
- Designing the Experiment
- Data Collection and Management
- Image processing
- Measures of Expression
- Normalization
- Finding Significant Genes
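For two-channel arrays, the measure-then-normalize steps in this outline can be sketched as follows. This is a simplified illustration: background is subtracted per spot, net intensities are floored to avoid taking the log of zero or negative values (the floor value is a hypothetical choice), and global median centering assumes that most genes are not differentially expressed.

```python
import math
from statistics import median


def log_ratios(red, green, red_bg, green_bg, floor=1.0):
    """Background-corrected log2(red/green) for each spot."""
    ratios = []
    for r, g, rb, gb in zip(red, green, red_bg, green_bg):
        r_net = max(r - rb, floor)   # floor guards against non-positive values
        g_net = max(g - gb, floor)
        ratios.append(math.log2(r_net / g_net))
    return ratios


def median_normalize(values):
    """Shift log ratios so that their median is zero."""
    center = median(values)
    return [v - center for v in values]


raw = log_ratios(red=[500, 300, 1100], green=[300, 300, 300],
                 red_bg=[100, 100, 100], green_bg=[100, 100, 100])
normalized = median_normalize(raw)
```

Intensity-dependent methods such as lowess are often preferred in practice; median centering is shown here only because it is the simplest case.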
42BO Chapter 16, cont
- Expression Vectors
- Clustering Approaches
- Beyond Statistical Significance and Clustering
- The Classification Problem
- Distances
- Fisher Exact Test
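As one concrete instance from this list, a one-sided Fisher exact test asks whether a functional category is over-represented in a gene list. The sketch below computes it directly from the hypergeometric distribution using only the standard library (Python 3.8+ for `math.comb`); the table layout and function name are assumptions made for illustration.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided p-value for over-representation in a 2x2 table.

    a: in list and in category      b: in list, not in category
    c: not in list, in category     d: in neither
    """
    n_total = a + b + c + d
    in_list = a + b
    in_category = a + c
    denom = comb(n_total, in_list)
    p = 0.0
    # Sum the probability of every table at least as extreme as the observed one.
    for x in range(a, min(in_list, in_category) + 1):
        p += comb(in_category, x) * comb(n_total - in_category, in_list - x) / denom
    return p


p_value = fisher_exact_greater(3, 1, 1, 3)
```

When many categories are tested at once, the resulting p-values need a multiple-testing correction.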
43The starting point: Annotating Array Probes
- Approaches to construct DNA arrays: in situ
synthesis, randomly assembled bead-based arrays,
mechanically spotted arrays (cDNA clone,
PCR-amplified amplicon or other material)
- Annotation resources: SOURCE, DRAGON, DAVID,
RESOURCERER, TIGR Gene Indices, EGO databases
(some no longer exist, see ex. links)
- Mapping software tools: IPA
45Designing the Experiment
- 2-color microarrays
- Plenty of RNA sample: direct comparison with dye
swap (flip dye pairs)
- Limited sample: balanced block design
- More than 2 samples are compared: reference
design (common reference needed)
- One color
- Power calculations for statistically significant
measures of gene expression
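Why a dye swap helps can be shown with a small calculation. Assume each gene's measured log ratio is the true log2(test/reference) plus an additive dye bias that is the same on both arrays; on the swapped array the true effect changes sign but the bias does not. Under that assumption, half the difference recovers the effect and half the sum estimates the bias. The function name and numbers are hypothetical.

```python
def combine_dye_swap(m_forward, m_swapped):
    """Return (effects, biases) per gene from a dye-swap pair of log ratios."""
    effects, biases = [], []
    for m1, m2 in zip(m_forward, m_swapped):
        effects.append((m1 - m2) / 2)  # dye bias cancels in the difference
        biases.append((m1 + m2) / 2)   # true effect cancels in the sum
    return effects, biases


# Gene 1: true effect 1.0; gene 2: true effect -0.5; dye bias 0.3 for both.
forward = [1.0 + 0.3, -0.5 + 0.3]
swapped = [-1.0 + 0.3, 0.5 + 0.3]
effects, biases = combine_dye_swap(forward, swapped)
```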
46Bioinformatics of Gene Expression
- Data Collection and Management (MIAME, MAGE-ML)
- Estimating Background
- Measures of Expression (log ratio)
- Normalization (2-channel arrays)
- Filtering
- Finding Significant Genes
- Clustering
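Clustering starts from a distance between expression vectors, and the choice of distance changes which genes look similar. A minimal sketch comparing Euclidean distance with a Pearson correlation distance (1 - r); the gene profiles are invented, and the correlation distance is undefined for constant profiles, which this sketch does not handle.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson_distance(u, v):
    """1 - Pearson correlation: 0 for identical shape, 2 for opposite shape."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return 1.0 - cov / (sd_u * sd_v)


gene1 = [1.0, 2.0, 3.0]      # rising profile
gene2 = [10.0, 20.0, 30.0]   # same shape, larger magnitude
gene3 = [3.0, 2.0, 1.0]      # opposite shape, similar magnitude
```

Here gene1 and gene2 are close under the correlation distance but far apart under Euclidean distance, while gene1 and gene3 show the reverse pattern.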
47Internet Resources
- Expression Databases
- ArrayExpress www.ebi.ac.uk/arrayexpress/
- CIBEX cibex.nig.ac.jp/
- Gene Expression Omnibus www.ncbi.nlm.nih.gov/geo/
- Annotation
- The Source database source.stanford.edu
- DAVID http://david.abcc.ncifcrf.gov/
- Gene Ontology Database, KEGG
48Internet Resources
- Expression Software
- http://david.abcc.ncifcrf.gov/
- BASE base.thep.lu.se
- Bioconductor bioconductor.org
- TM4 software http://www.tm4.org/
- SAM http://www-stat.stanford.edu/tibs/SAM/
- Cluster/Treeview http://bonsai.ims.u-tokyo.ac.jp/mdehoon/software/cluster/software.htm
- HCE http://www.cs.umd.edu/hcil/hce/
- http://ihome.cuhk.edu.hk/b400559/arraysoft_mining_specific.html
49Commercial Software
- Spotfire spotfire.tibco.com
- GeneSpring www.genespring.com/
- Partek Pro http://www.partek.com/
- IPA http://www.ingenuity.com/
50Beyond Statistics
- IPA looks for significant functional
associations, GO- and literature-based
associations, canonical pathways, and predefined
and custom gene lists; it creates networks,
reconstructs significant processes, and finds
biomarkers
51IPA, How to
- Upload Data
- Analyze Gene Expression Data
- Compare Gene Expression Experiments
- Interpret results
- Functions, Diseases, Pathways, Networks, Lists,
Molecules
- Explore Networks
- Highlight, Overlay, Merge, Export, Share
52Learning IPA
- Workshop at Stanford on April 21st
http://lane.stanford.edu/howto/index.html?id_2608
53Learning IPA
- Merging Networks
- Simple search for genes/proteins/chemicals
- Using Node View Pages
- Q&A
- Hands-on exercises
54From Previous Lecture
- Intermolecular Interactions
- Interaction and Pathway Databases
- Search and Explore in IPA (Simple search for
genes/proteins/chemicals, Advanced Search for
diseases, molecule types, locations)
- Finding interaction partners and closest path in
networks, in IPA
- Quick functional assessments in IPA