Title: Chip arrays and gene expression data
1Chip arrays
Chip arrays and gene expression data
2Chip arrays
With chip array technology, one can
- measure the expression of 10,000 (all) genes at once
- answer questions such as:
1. Which genes are expressed in a muscle cell?
2. Which genes are expressed during the first week of pregnancy in the mother? In the fetus?
3. Which genes are expressed in cancer?
3Chip arrays
Classical chip array questions (continued)
4. If one mutates a transcription factor (TF), which genes are no longer expressed following this change?
5. Which genes are not expressed in the brain of a baby with an intellectual disability?
6. Which genes are expressed when a person is asleep versus when the same person is awake?
4Chip arrays
DNA chip: in each cell there is a specific DNA
molecule (a probe). Upon hybridization with an mRNA
molecule (or a cDNA one), the intensity of the
hybridization can be quantified by measuring the emitted light.
5Chip arrays
Various technologies
The two most common companies:
- Affymetrix (uses photolithography)
- Agilent (uses phosphoramidite chemistry)
6Chip arrays
Affymetrix
Affymetrix: each probe is 25 bp, a part of an exon.
The reader
The chip itself
In one cm², more than 10^6 different oligos
7Chip arrays
Affymetrix
Affymetrix: each probe is 25 nucleotides long. Above
this length a technological problem exists: the
synthesis becomes inaccurate. With such short
probes, each mRNA can hybridize to more than one
probe. The solution: each gene is covered by
several distinct probes.
8Chip arrays
Affymetrix
Affymetrix: one can buy ready-made chips (human
genome, mouse genome) or design (print)
a custom chip (more expensive).
9Chip arrays
Affymetrix
Detection:
- mRNA is isolated from the tissue (cells, viruses)
- cDNA is synthesized
- The cDNA is fluorescently labeled
- Sometimes, the cDNA is amplified using PCR
- The labeled cDNA is hybridized to the chip
- The intensity in each cell (probe) is measured by the reader
10Chip arrays
Agilent
Agilent developed DNA printers: in each spot,
picoliters of nucleotides are added. They can
make probes of up to 60-mers. (Agilent was spun off
from Hewlett-Packard.)
Standard phosphoramidite chemistry
11Chip arrays
Agilent
Hybridization to Agilent probes is more
accurate. If there is hybridization to a
probe, the gene it represents is probably
expressed.
12Chip arrays
Agilent
But it is impossible to know how many probe molecules are
in each cell, so absolute fluorescence intensities
are meaningless.
13Chip arrays
Agilent
Solution: in the same experiment, hybridize
samples from two conditions, e.g., healthy cells versus
tumor cells, each labeled with a different color.
The Agilent reader will give the ratio of the two colors.
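The two-color ratio can be sketched as follows. This is a minimal illustration, not the reader's actual software: the function name, the intensity floor, and the spot values are hypothetical, and reporting the ratio on a log2 scale is a common convention assumed here rather than something the slides state.

```python
import math

def log2_ratio(tumor_intensity, healthy_intensity, floor=1.0):
    """Return log2(tumor / healthy) for one spot.

    Intensities below `floor` are raised to `floor` so the ratio is
    always defined (an assumption made for this sketch).
    """
    t = max(tumor_intensity, floor)
    h = max(healthy_intensity, floor)
    return math.log2(t / h)

# Hypothetical spot intensities: (tumor channel, healthy channel)
spots = {
    "geneA": (800.0, 200.0),  # higher in tumor
    "geneB": (150.0, 600.0),  # higher in healthy
    "geneC": (300.0, 300.0),  # unchanged
}
ratios = {gene: log2_ratio(t, h) for gene, (t, h) in spots.items()}
```

A positive value means the gene is higher in the tumor sample, a negative value means it is higher in the healthy sample, and zero means no change.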
14Chip arrays
Stanford cDNA chips
In this approach, long cDNA sequences (> 300 bp)
are produced in a cell (a clone) and are linked
to each chip cell. This produces long probes and
saves synthesizing them one nucleotide at a time
(cheaper!). As in the case of Agilent, it is
impossible to control the number of probe molecules in
each cell.
15Chip arrays
Output
Each cell yields a measurement, which is the output
of an optical scanner.
16Chip arrays
Output
Each gene is represented by several cells
(usually distributed in various places around the
chip)
Gene 2
Gene 1
Gene 3
17Chip arrays
Output
Programs specific to each technology convert the
data from oligos to genes
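A minimal sketch of the oligo-to-gene step, assuming the simplest possible summary: the median intensity of all probes that map to the same gene. The function name, the probe identifiers, and the use of the median are assumptions for illustration; the vendor programs use more elaborate models.

```python
from statistics import median

def summarize_by_gene(probe_intensity, probe_to_gene):
    """Collapse probe-level intensities into one value per gene (median)."""
    per_gene = {}
    for probe, value in probe_intensity.items():
        gene = probe_to_gene[probe]
        per_gene.setdefault(gene, []).append(value)
    return {gene: median(values) for gene, values in per_gene.items()}

# Hypothetical probes: three for gene1, two for gene2
probe_to_gene = {"p1": "gene1", "p2": "gene1", "p3": "gene1",
                 "p4": "gene2", "p5": "gene2"}
probe_intensity = {"p1": 100.0, "p2": 120.0, "p3": 900.0,
                   "p4": 50.0, "p5": 70.0}
gene_level = summarize_by_gene(probe_intensity, probe_to_gene)
```

The median is used here because a single cross-hybridizing probe (p3 in the example) does not dominate the gene-level value.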
18Chip arrays
Technical noise
Microarray data are noisy because of technical issues:
- Variation introduced during sample preparation
- Array manufacture (variation between supposedly identical arrays)
- Hybridization (variation in the amount of sample), and more
19Chip arrays
Normalization
Microarray data are normalized to remove
technical noise. This step is done both within an
array and among arrays
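As one simple illustration of normalization among arrays, the sketch below rescales every array so that all arrays share the same median intensity. Median scaling is an assumption chosen for brevity; the slides do not say which normalization method is used, and the array names and values are hypothetical.

```python
from statistics import median

def median_scale(arrays):
    """Rescale each array so that all arrays have the same median.

    `arrays` maps an array name to its list of intensities. The common
    target is the median of the per-array medians.
    """
    medians = {name: median(values) for name, values in arrays.items()}
    target = median(medians.values())
    return {
        name: [v * target / medians[name] for v in values]
        for name, values in arrays.items()
    }

# Hypothetical example: chip2 is uniformly twice as bright as chip1
arrays = {"chip1": [10, 20, 30], "chip2": [20, 40, 60]}
normalized = median_scale(arrays)
```

After scaling, the two hypothetical chips give identical values, so the purely technical brightness difference no longer looks like a change in expression.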
20Chip arrays
Repeats
A repeat can be either a technical repeat (the same
sample on a different chip) or a real biological
repeat (a different sample).
21Chip arrays
Differential expression
Genes 1 and 3 are not expressed at the same level in wt
versus treatment -> they are differentially
expressed. Statistically, a t-test and/or ANOVA is
used to test whether a specific gene is differentially
expressed.
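The test statistic for a single gene can be sketched as below. This uses Welch's unequal-variance form of the t-test, which is an assumption (the slides only say "t-test"), and the expression values are hypothetical. Only the statistic and its degrees of freedom are computed; turning them into a P value requires the t distribution, for example via `scipy.stats.ttest_ind` with `equal_var=False`.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t statistic, degrees of freedom) for two samples.

    Each sample needs at least two values and non-zero total variance.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical expression of one gene in three wt and three treated repeats
wt = [1.0, 2.0, 3.0]
treated = [7.0, 8.0, 9.0]
t_stat, dof = welch_t(wt, treated)
```

In a real experiment the same computation is repeated for every gene on the chip, which is why the resulting P values need the multiple-testing correction discussed next.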
22Chip arrays
Correcting for multiple tests
- Because there are thousands of genes in each chip
array experiment, even if none of the genes is
differentially expressed, many false positive
predictions are expected
- Two approaches for correction:
- Bonferroni: divide the P value cutoff by the
number of genes -> many potential genes may be
missed
- False Discovery Rate (FDR): allow a certain
percentage of false discoveries (e.g., 5%)
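Both corrections can be sketched in a few lines. FDR control is illustrated with the Benjamini-Hochberg procedure, which the slides do not name, so that choice is an assumption; the function names and P values are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Flag P values that pass the cutoff alpha / (number of tests)."""
    cutoff = alpha / len(pvalues)
    return [p <= cutoff for p in pvalues]

def benjamini_hochberg(pvalues, fdr=0.05):
    """Flag P values declared significant at the given FDR level.

    Sort the P values, find the largest rank k such that
    p_(k) <= k / m * fdr, and accept the k smallest P values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * fdr:
            max_k = rank
    significant = [False] * m
    for i in order[:max_k]:
        significant[i] = True
    return significant

# Hypothetical P values for five genes
pvalues = [0.045, 0.001, 0.5, 0.012, 0.025]
by_bonferroni = bonferroni(pvalues)
by_fdr = benjamini_hochberg(pvalues)
```

With these values Bonferroni keeps one gene while the FDR approach keeps three, which illustrates why Bonferroni may miss many potential genes.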
23Chip arrays
Expression profiles
Genes 1 and 2 show the same expression profile.
Same is true for genes 3 and 4 (highly expressed)
24Chip arrays
Expression profiles
Genes with the same expression profile ->
suggestive of a functional linkage (in this
example, g1 and g3 may be specific to the brain,
rather than being housekeeping genes that
are highly expressed in all tissues, like g2).
25Chip arrays
Clustering
In general, we want to find all the genes that
share the same expression profile -> suggestive
of a functional linkage. This is done by
clustering the genes with the same profile.
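A minimal sketch of grouping genes by expression profile. It uses a greedy rule based on Pearson correlation (a gene joins the first cluster whose first member it correlates with above a threshold). This is a deliberately simple stand-in chosen for illustration, not hierarchical clustering or k-means, and the threshold and profiles are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster_by_correlation(profiles, threshold=0.9):
    """Greedy grouping of genes whose profiles correlate with a cluster seed."""
    clusters = []  # each cluster is a list of gene names; element 0 is the seed
    for gene, profile in profiles.items():
        for cluster in clusters:
            if pearson(profile, profiles[cluster[0]]) >= threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])  # no similar seed found: start a new cluster
    return clusters

# Hypothetical expression of four genes across four conditions
profiles = {
    "g1": [1, 2, 8, 9],
    "g2": [2, 4, 16, 18],  # same shape as g1, twice the level
    "g3": [9, 8, 2, 1],
    "g4": [10, 9, 3, 2],   # same shape as g3
}
clusters = cluster_by_correlation(profiles)
```

Correlation compares the shape of a profile rather than its absolute level, so g1 and g2 end up together even though g2 is expressed at twice the level.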
26Chip arrays
Clustering
Clustering of the conditions can suggest, for example, two
types of brain tumors (bt). Bi-clustering: clustering both
on the conditions and on the genes.
27Chip arrays
Applications
Think of increasing the glucose concentration given to
E. coli and running a chip array at each of these
concentrations: one can potentially discover all
genes in the glycolysis pathway. Knocking out a
gene -> discover all genes that interact with it.
28Chip arrays
Applications
Analyzing expression of genes can help reveal the
gene network of a given organism
29Chip arrays
Gene network
30Chip arrays
Classification (clinical)
Do I have a brain tumor?
31Chip arrays
Presentation: heat map
500 genes from (14 chips of) normal and (32 of)
ischemic human hearts.
32Chip arrays
From a list of genes for characterization
It is often impossible to make sense of a gene list just by
reading the names or functions of the genes. The Gene
Ontology (GO) project makes it possible to find whether
these differentially expressed genes share
something in common. GO is a controlled
vocabulary that describes all annotated genes.
33Chip arrays
From a list of genes for characterization
One can test whether a GO category, for example
"extracellular", is more prevalent among the
differentially expressed genes than it is
among all genes.
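This comparison is commonly framed as a hypergeometric (one-sided Fisher) test; that framing is an assumption here, since the slides do not name a test. The sketch computes the probability of seeing at least the observed number of category members among the differentially expressed genes if those genes were drawn at random. The function name and all counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(total_genes, genes_in_category,
                           selected_genes, selected_in_category):
    """P(at least `selected_in_category` category genes among the selected).

    Models drawing `selected_genes` genes without replacement from
    `total_genes`, of which `genes_in_category` carry the GO term.
    """
    upper = min(genes_in_category, selected_genes)
    denom = comb(total_genes, selected_genes)
    tail = sum(
        comb(genes_in_category, k)
        * comb(total_genes - genes_in_category, selected_genes - k)
        for k in range(selected_in_category, upper + 1)
    )
    return tail / denom

# Tiny hypothetical case: 10 genes, 4 "extracellular", 3 selected, all 3 in category
p_small = hypergeom_enrichment_p(10, 4, 3, 3)

# Larger hypothetical case: 10,000 genes, 500 in the category,
# 100 differentially expressed, 15 of them in the category
p_large = hypergeom_enrichment_p(10000, 500, 100, 15)
```

A small P value suggests the category is over-represented in the list. Because many GO categories are usually tested at once, these P values also need a multiple-testing correction.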
34Chip arrays
Using chips to study evolution of expression
It is very problematic to use a human chip to
study gene expression in gorilla. Observed
differences in expression may reflect true
differences in expression levels. However, they
may also reflect a bias introduced by the fact that
many gorilla mRNAs differ in sequence from
the human mRNAs, resulting in different levels of
hybridization.
35Chip arrays
Using chips to study evolution of expression
36Chip arrays
Using chips to study evolution of expression
Compared expression levels between humans,
chimpanzees, orangutans, and rhesus monkeys using
chips specially designed for each
genome. Concluded that for most genes the
expression level is conserved among primates
(this is expected, since levels that are too high
or too low should be selected against).
37Chip arrays
Using chips to study evolution of expression
Found shifts in the expression level of TFs specific
to human (TFs are more highly expressed in humans
than in other primates). This supports the
theory that most of the significant differences
between human and chimp are in gene regulation
rather than in protein sequences.
38Chip arrays
Using chips to study evolution of expression
Genes that are highly expressed evolve slowly
39Chip arrays
Sequence by hybridization
It was thought that the following procedure could
work for sequencing a genome:
- Make a chip containing all x-mers (e.g., x = 25)
- Hybridize a genome to the chip
- By analyzing all the hybridizations with their
overlaps, assemble the genome
Problem: it doesn't work
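One well-known difficulty with this idea, given here as a likely reason rather than the slides' stated one, is that repeats make the assembly ambiguous: different sequences can light up exactly the same set of probes. The sketch below shows this with two short hypothetical sequences and x = 3.

```python
def spectrum(sequence, k):
    """Return the set of all k-mers in `sequence`, i.e. the probes that would hybridize."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

# Two different sequences in which the 2-mer "AT" occurs three times
seq_a = "ATGATCAT"
seq_b = "ATCATGAT"

same_signal = spectrum(seq_a, 3) == spectrum(seq_b, 3)
```

Both sequences contain exactly the 3-mers ATG, TGA, GAT, ATC, TCA and CAT, so a chip of all 3-mers cannot tell them apart. Genomes are rich in repeats much longer than any practical probe, so the same ambiguity arises at realistic probe lengths.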
40Chip arrays
ChIP-chip
41Chip arrays
ChIP-chip
ChIP-chip: a method for measuring protein-DNA
interactions.
Proteins that bind DNA include:
- Those responsible for transcription regulation: transcription factors (TFs)
- Replication proteins
- Histones
42Chip arrays
ChIP-chip
ChIP-chip: the first "ChIP" stands for Chromatin
ImmunoPrecipitation and the second "chip" stands for
DNA microarray. The method is used mostly to
detect TF binding sites.
43Chip arrays
ChIP-chip
- There must be an antibody (Ab) to the TF
- The DNA is chemically linked (cross-linked) to the TF
- The DNA is broken into fragments
- The DNA-TF complex is precipitated using the Ab
- The bonds between DNA and TF are removed
- The DNA is identified by hybridization to the DNA chip
44Chip arrays
ChIP-chip
Control with irrelevant antibodies
Reverse cross-linking and DNA extraction
Amplification and labeling
45Chip arrays
Tiling arrays
Here the chip array should include not only
protein coding genes but also control regions, or
simply the entire genome
46Chip arrays
Deep sequencing
Solexa and other methods for deep
sequencing. Today, technologies are being
developed that can sequence a lot of data at an
incredible rate. A variant of the ChIP-chip method
is to sequence rather than hybridize the DNA (ChIP-seq).
47Chip arrays
Protein-protein interactions
48Chip arrays
Protein-protein interactions
Databases of protein-protein interactions: DIP,
IntAct, MINT, MIPS, iHOP
49Chip arrays
Protein-protein interactions
Protein-protein interactions are fundamental for
functional annotation. If X interacts with Y, and Y
is known to be related to muscle development,
maybe X is also related to muscle
development: guilt by association.
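Guilt by association can be sketched as a majority vote over the annotated interaction partners of a protein. The voting rule, the function name, and the toy interaction list are assumptions for illustration; they are not taken from any of the databases listed above.

```python
from collections import Counter

def predict_function(protein, interactions, annotations):
    """Return the most common annotation among a protein's partners, or None.

    `interactions` is a list of undirected (protein_a, protein_b) pairs;
    `annotations` maps already-characterized proteins to a function label.
    """
    partners = set()
    for a, b in interactions:
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    votes = Counter(annotations[p] for p in partners if p in annotations)
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical network: X interacts with Y, Z and W
interactions = [("X", "Y"), ("X", "Z"), ("W", "X"), ("Q", "R")]
annotations = {
    "Y": "muscle development",
    "Z": "muscle development",
    "W": "DNA repair",
}
guess_for_x = predict_function("X", interactions, annotations)
guess_for_q = predict_function("Q", interactions, annotations)
```

Here X is tentatively assigned to muscle development because two of its three annotated partners carry that label, while Q gets no prediction because its only partner is unannotated. Such a prediction is a hypothesis to test, not an annotation.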