Gene expression data: Questions, answers and statistics

1
Gene expression data: Questions, answers and
statistics
  • Terry Speed and Yee Hwa Yang
  • Department of Statistics, UC Berkeley
  • Genetics and Bioinformatics, Walter & Eliza Hall
    Institute of Medical Research

2
Overview
  • Questions involving microarray data.
  • Different experimental designs
  • Case studies, including
  • Olfactory epithelium,
  • Olfactory bulb,
  • Identification of differentially expressed genes,
  • Pattern searching.

3
Questions and answers: a point of view
  • Biological questions first, then statistical
    methods (design, analysis) and thinking, leading
    to tentative answers, together with an assessment
    of the uncertainty in those answers,
  • rather than beginning with
  • purely exploratory analyses, or modelling
    either processes or data.
  • Something of each of the last two comes into
    most statistical analyses, but only after
    focussing on biological questions.

4
Biological question (differentially expressed genes, sample class prediction, etc.)
  • Experimental design
  • Microarray experiment
  • 16-bit TIFF files
  • Image analysis: (Rfg, Rbg), (Gfg, Gbg)
  • Normalization: R, G
  • Estimation, testing, clustering, discrimination
  • Biological verification and interpretation
5
Which genes are (relatively) up/down regulated?
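  • Samples: liver tissue from each of two kinds of
    mice, e.g. KO vs. WT, or mutant vs. WT.
  • Design (diagram): T compared directly with C on n replicate slides.
  • For each gene form the t statistic
    $$t = \frac{\bar{M}_{\mathrm{trt}}}{\sqrt{\tfrac{1}{n}\, s_{\mathrm{trt}}^{2}}}$$
    where $\bar{M}_{\mathrm{trt}}$ is the average of the n trt Ms and $s_{\mathrm{trt}}$ is their SD.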
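A minimal NumPy sketch of this per-gene statistic, assuming the log-ratios are held in a genes × slides array; the function name `one_sample_t` is ours, for illustration only:

```python
import numpy as np

def one_sample_t(m):
    """Per-gene t statistic for a direct T vs. C design.

    m: array of shape (genes, n) with one log-ratio M per gene per slide.
    Returns mean(M) / sqrt(s^2 / n), where s is the sample SD across slides.
    """
    m = np.asarray(m, dtype=float)
    n = m.shape[1]
    mean = m.mean(axis=1)
    sd = m.std(axis=1, ddof=1)        # sample SD (n - 1 divisor)
    return mean / np.sqrt(sd ** 2 / n)

t = one_sample_t([[1.0, 1.2, 0.8],    # consistently positive M
                  [0.1, -0.1, 0.0]])  # M centred on zero
```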

6
Which genes are (relatively) up/down regulated?
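  • Samples: as before, but also pooled control
    liver tissue.
  • Design (diagram): T vs. pooled control on n slides, and C vs. pooled control on n slides.
  • For each gene form the t statistic
    $$t = \frac{\bar{M}_{\mathrm{trt}} - \bar{M}_{\mathrm{ctl}}}{\sqrt{\tfrac{1}{n}\left(s_{\mathrm{trt}}^{2} + s_{\mathrm{ctl}}^{2}\right)}}$$
    where $\bar{M}$ and $s$ are the average and SD of the n trt Ms or the n ctl Ms.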
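The same idea for the pooled-control design, as a sketch assuming equal numbers n of treatment and control slides (as in the formula above); `reference_design_t` is a hypothetical name. The pooled control cancels in the difference of means:

```python
import numpy as np

def reference_design_t(m_trt, m_ctl):
    """Per-gene t statistic when T and C are each compared with a pooled control.

    m_trt, m_ctl: arrays of shape (genes, n) holding log2(T / pool) and
    log2(C / pool). The difference of means estimates log2(T / C).
    """
    m_trt = np.asarray(m_trt, dtype=float)
    m_ctl = np.asarray(m_ctl, dtype=float)
    n = m_trt.shape[1]
    diff = m_trt.mean(axis=1) - m_ctl.mean(axis=1)
    var_sum = m_trt.var(axis=1, ddof=1) + m_ctl.var(axis=1, ddof=1)
    return diff / np.sqrt(var_sum / n)

t = reference_design_t([[1.0, 1.2, 0.8]], [[0.0, 0.2, -0.2]])
```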

7
Multiple comparisons of interest
Design (diagram): treatments T1, T2, T3 and T4, each compared with the control C on 2 slides.
  • Samples: liver tissue from mice treated with
    cholesterol-modifying drugs.
  • Question 1: find genes that respond differently
    between a treatment and the control.
  • Question 2: find genes that respond similarly
    across two or more treatments relative to control.
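One simple way to screen for Question 2, sketched under assumptions: per-treatment t statistics against control are already computed, and a gene is flagged when at least two treatments exceed a threshold in the same direction. The threshold of 3 and the rule itself are hypothetical illustrations, not the authors' method:

```python
import numpy as np

def shared_responders(t_stats, threshold=3.0, min_treatments=2):
    """Flag genes responding in the same direction to several treatments.

    t_stats: array (genes, treatments) of t statistics vs. control.
    A gene is flagged if at least `min_treatments` treatments have
    t > threshold, or at least `min_treatments` have t < -threshold.
    """
    t = np.asarray(t_stats, dtype=float)
    up = (t > threshold).sum(axis=1)
    down = (t < -threshold).sum(axis=1)
    return (up >= min_treatments) | (down >= min_treatments)

t = np.array([[4.0, 5.1, 0.2, 3.5],    # up with T1, T2, T4
              [-4.2, 0.1, -3.3, 0.0],  # down with T1, T3
              [4.0, -4.0, 0.0, 0.0]])  # opposite directions
flags = shared_responders(t)
```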

8
Interaction?
  • Samples: treated cell lines at 4 time points
    (30 minutes, 1 hour, 4 hours, 24 hours).
  • Question: which genes contribute to the enhanced
    inhibitory effect of OSM when it is combined with
    EGF? What is the role of time?
  • Design (diagram), at each of the 4 times: ctl, OSM, EGF, OSM + EGF.
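For a 2 × 2 layout like this, the interaction on the log scale is (OSM+EGF − ctl) − (OSM − ctl) − (EGF − ctl). The sketch below assumes each treatment has been measured as a log2 ratio against the untreated control at one time point; repeating it per time point would address the role of time. The function name is hypothetical:

```python
import numpy as np

def interaction_contrast(m_osm, m_egf, m_both):
    """Per-gene OSM x EGF interaction at one time point.

    Inputs: log2 ratios of OSM, EGF and OSM + EGF against control (assumed design).
    A negative value means the combined treatment gives lower expression than
    the sum of the separate effects on the log scale.
    """
    return (np.asarray(m_both, dtype=float)
            - np.asarray(m_osm, dtype=float)
            - np.asarray(m_egf, dtype=float))

inter = interaction_contrast([-1.0, 0.5], [-0.5, 0.0], [-2.5, 0.5])
```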
9
Gene Expression Data
  • Gene expression data on genes 1, 2, 3, 4, 5, ... for 5
    slides

Genes     Slide (experiment)
          slide1  slide2  slide3  slide4  slide5
1          0.46    0.30    0.80    1.51    0.90
2         -0.10    0.49    0.24    0.06    0.46
3          0.15    0.74    0.04    0.10    0.20
4         -0.45   -1.03   -0.79   -0.56   -0.32
5         -0.06    1.06    1.35    1.09   -1.09

Gene expression level of gene i on slide j
= log2(red intensity / green intensity).
Sometimes a common reference (e.g. green) is used,
sometimes not.
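A small sketch of how such log-ratios can be computed from the image-analysis output (Rfg, Rbg), (Gfg, Gbg). It assumes plain background subtraction, R = Rfg − Rbg and G = Gfg − Gbg, which is only one possible choice; it also returns the average log-intensity A = log2 √(RG) used in the later plots:

```python
import numpy as np

def m_a_values(r_fg, r_bg, g_fg, g_bg):
    """Return per-spot M = log2(R/G) and A = log2(sqrt(R*G)).

    Assumes simple background subtraction; spots whose corrected
    intensity is not positive get NaN instead of a log.
    """
    r = np.asarray(r_fg, dtype=float) - np.asarray(r_bg, dtype=float)
    g = np.asarray(g_fg, dtype=float) - np.asarray(g_bg, dtype=float)
    r = np.where(r > 0, r, np.nan)
    g = np.where(g > 0, g, np.nan)
    m = np.log2(r / g)           # log-ratio
    a = 0.5 * np.log2(r * g)     # average log-intensity
    return m, a

m, a = m_a_values([1200, 900], [200, 100], [600, 900], [100, 100])
```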
10
Molecular development of sensory maps
11
Olfactory epithelium
  • Goal: exploratory study to identify genes with
    altered expression between zone 1 and zone 4 of
    the olfactory epithelium of newborn (P0) and
    adult (A) mice.
  • Tissue samples:
  • P01: zone 1 of epithelium from P0 mouse.
  • P04: zone 4 of epithelium from P0 mouse.
  • A1: zone 1 of epithelium from adult mouse.
  • A4: zone 4 of epithelium from adult mouse.
  • Probes: 19,000 mouse cDNAs.

12
The red-stained region is the olfactory epithelium.
13
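Factorial design as completed
[Diagram: the four samples P01, P04, A1 and A4, arranged by zone effect (zone 1 vs. zone 4) and age effect (P0 vs. adult), linked by hybridizations numbered 1 to 5.]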
14
Layout of the cDNA microarrays
  • Made in the Ngai lab, UC Berkeley.
  • Mouse ESTs, 19,200 spots.
  • Two different print groups, each with
  • 4 x 4 grids, each with
  • 25 x 24 spots.
  • Controls on the first 2 rows of each grid.

[Figure: array layout showing print groups pg1 and pg2]
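The layout numbers are consistent: 2 print groups × 16 grids × (25 × 24) spots = 19,200 spots. The sketch below maps a spot address to a linear index; the row-major ordering, the reading of "25 x 24" as 25 rows by 24 columns, and all names are assumptions for illustration:

```python
from typing import NamedTuple

PRINT_GROUPS = 2
GRID_ROWS, GRID_COLS = 4, 4
SPOT_ROWS, SPOT_COLS = 25, 24   # assumed: 25 rows x 24 columns per grid
CONTROL_ROWS = 2                # controls on the first 2 rows of each grid

class SpotAddress(NamedTuple):
    print_group: int  # all fields 0-based
    grid_row: int
    grid_col: int
    spot_row: int
    spot_col: int

def spot_index(addr: SpotAddress) -> int:
    """Hypothetical row-major index: print group, then grid, then spot."""
    per_grid = SPOT_ROWS * SPOT_COLS
    per_group = GRID_ROWS * GRID_COLS * per_grid
    grid = addr.grid_row * GRID_COLS + addr.grid_col
    spot = addr.spot_row * SPOT_COLS + addr.spot_col
    return addr.print_group * per_group + grid * per_grid + spot

def is_control(addr: SpotAddress) -> bool:
    return addr.spot_row < CONTROL_ROWS

TOTAL_SPOTS = PRINT_GROUPS * GRID_ROWS * GRID_COLS * SPOT_ROWS * SPOT_COLS
```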
15
Two slides
P04 vs. P01 (pg2)
A1 vs. P01 (pg2)
16
Preprocessing - Image Analysis
1. Addressing: locate spot centers.
2. Segmentation: classify pixels as either
signal or background (using seeded region
growing).
3. Information extraction: for each spot of the
array, calculate signal intensity pairs,
background and quality measures.
Results from SRG (seeded region growing) for P04 vs. P01
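To show the general idea of seeded region growing, here is a simplified sketch, not the implementation in the Spot package: seed pixels are labelled signal or background, and unlabelled neighbours are absorbed one at a time, always taking the pixel whose intensity is closest to the current mean of the adjacent region. Distances are computed when a pixel is queued, which is a simplification:

```python
import heapq
import numpy as np

def seeded_region_growing(image, seeds):
    """Simplified seeded region growing on a 2-D intensity array.

    seeds: dict mapping a label (e.g. 1 = signal, 2 = background) to a list
    of (row, col) seed pixels. Returns an int array of labels (0 = unreached).
    """
    img = np.asarray(image, dtype=float)
    labels = np.zeros(img.shape, dtype=int)
    sums, counts = {}, {}
    heap = []  # entries: (distance to region mean, row, col, label)

    def push_neighbours(r, c, lab):
        mean = sums[lab] / counts[lab]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < img.shape[0] and 0 <= cc < img.shape[1] and labels[rr, cc] == 0:
                heapq.heappush(heap, (abs(img[rr, cc] - mean), rr, cc, lab))

    for lab, pixels in seeds.items():          # label seeds, start region means
        for r, c in pixels:
            labels[r, c] = lab
            sums[lab] = sums.get(lab, 0.0) + img[r, c]
            counts[lab] = counts.get(lab, 0) + 1
    for lab, pixels in seeds.items():
        for r, c in pixels:
            push_neighbours(r, c, lab)

    while heap:                                # grow, closest pixel first
        _, r, c, lab = heapq.heappop(heap)
        if labels[r, c] != 0:                  # already claimed by a region
            continue
        labels[r, c] = lab
        sums[lab] += img[r, c]
        counts[lab] += 1
        push_neighbours(r, c, lab)
    return labels

img = np.full((5, 5), 10.0)
img[1:4, 1:4] = 100.0                          # bright 3 x 3 "spot"
seg = seeded_region_growing(img, {1: [(2, 2)], 2: [(0, 0)]})
```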
17
Preprocessing after image analysis
  • Where necessary, we carry out
  • Colour normalization (location and scale)
    within slides, possibly within pin-groups,
  • Scale normalization between slides,
  • A variety of other adjustments, e.g. to remove
    spatial artifacts.
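A sketch of location and scale normalization within pin groups for one slide. For simplicity it subtracts each group's median M (practical pipelines often use intensity-dependent lowess fits instead) and rescales group i by a_i = MAD_i / (∏_j MAD_j)^(1/I), one common choice of scale factor; it assumes every group has a nonzero MAD. Applying the same scaling across slides gives between-slide scale normalization:

```python
import numpy as np

def normalize_within_groups(m, groups, scale=True):
    """Median-centre M within each pin group, then optionally equalize spread.

    m: 1-D array of log-ratios for one slide; groups: pin-group label per spot.
    """
    m = np.asarray(m, dtype=float).copy()
    groups = np.asarray(groups)
    labels = np.unique(groups)
    for g in labels:                                   # location
        idx = groups == g
        m[idx] -= np.median(m[idx])
    if scale:                                          # scale
        mads = np.array([np.median(np.abs(m[groups == g])) for g in labels])
        geo_mean = np.exp(np.mean(np.log(mads)))
        for g, mad in zip(labels, mads):
            m[groups == g] /= mad / geo_mean
    return m

normalized = normalize_within_groups([1, 2, 3, 10, 20, 30], ["a"] * 3 + ["b"] * 3)
```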

18
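Factorial design
[Diagram: the four samples with their expected log expression levels, P01 = m, A1 = m + a, P04 = m + z and A4 = m + z + a + za, linked by hybridizations 1 to 5.]
Different ways of estimating parameters, e.g. the zone effect z:
$$\text{slide 1:}\quad (m + z) - m = z$$
$$\text{slide 2} - \text{slide 5:}\quad \big((m + a) - m\big) - \big((m + a) - (m + z)\big) = a - (a - z) = z$$
A further estimate of z uses slides 4, 3 and 5.
How do we combine the information?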
19
Regression analysis
Define a matrix X so that $E(M) = X\beta$. Use
least-squares estimates of z, a and za for each
gene.
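A sketch of the least-squares step with β = (z, a, za); m cancels in every log-ratio. Rows for slides 1, 2 and 5 follow the algebra on the factorial-design slide. The transcript does not say which samples slides 3 and 4 compare, so those rows are ASSUMED (A4 vs. A1 and A4 vs. P01); with that choice, slide 4 − slide 3 − slide 5 = z:

```python
import numpy as np

# Columns: zone effect z, age effect a, interaction za.
X = np.array([
    [1, 0, 0],   # slide 1: P04 vs P01 -> z
    [0, 1, 0],   # slide 2: A1 vs P01 -> a
    [1, 0, 1],   # slide 3 (assumed): A4 vs A1 -> z + za
    [1, 1, 1],   # slide 4 (assumed): A4 vs P01 -> z + a + za
    [-1, 1, 0],  # slide 5: A1 vs P04 -> a - z
], dtype=float)

def fit_effects(m):
    """Least-squares (z, a, za) for every gene; m has shape (slides, genes)."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(m, dtype=float), rcond=None)
    return beta  # shape (3, genes)

true = np.array([[1.0, 0.0],    # z for two genes
                 [0.5, -1.0],   # a
                 [0.0, 2.0]])   # za
beta_hat = fit_effects(X @ true)
```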
20
Estimates of zone effects, log(zone 4 / zone 1), vs. average A
[Plot: estimated zone effect against A = average log √(RG); genes A and B highlighted]
21
Estimates of zone effects vs. SE
  • Z effect: $\hat{\beta}$
  • $t = \hat{\beta} / \mathrm{SE}$
[Plot: $\hat{\beta}$ against $\log_2(\mathrm{SE})$]
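A sketch of how each gene's SE and t can be obtained from the regression, using the standard OLS variance σ²(XᵀX)⁻¹ with σ² estimated from the residuals on n − p degrees of freedom. With X a single column of ones it reduces to the one-sample t statistic from the KO vs. WT slide:

```python
import numpy as np

def ols_t_statistics(X, m):
    """Coefficients, standard errors and t = estimate / SE for every gene.

    X: design matrix (slides, parameters); m: log-ratios (slides, genes).
    """
    X = np.asarray(X, dtype=float)
    m = np.asarray(m, dtype=float)
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ m                       # (p, genes)
    resid = m - X @ beta
    sigma2 = (resid ** 2).sum(axis=0) / (n - p)    # per-gene residual variance
    se = np.sqrt(np.outer(np.diag(xtx_inv), sigma2))
    return beta, se, beta / se

beta, se, t = ols_t_statistics(np.ones((4, 1)), [[1.0], [1.2], [0.8], [1.0]])
```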
22
Estimates of age effects vs. estimates of zone
effects
[Plot panels: Zone, Age, Zone × Age]
23
Top 50 genes from each effect
[Venn diagram of the top 50 genes for the Zone, Age and Zone × Age interaction effects; region counts 19, 0, 48, 29, 2, 0, 19]
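Overlaps like these can be tallied by taking the k genes with the largest |t| for each effect and counting genes by their exact membership pattern. A sketch, with hypothetical function names and toy t statistics (k = 2 instead of 50):

```python
import numpy as np

def top_k_sets(t_stats, names, k=50):
    """Indices of the k genes with largest |t| for each effect (rows of t_stats)."""
    t = np.abs(np.asarray(t_stats, dtype=float))
    return {name: set(np.argsort(-row)[:k].tolist()) for name, row in zip(names, t)}

def venn_counts(sets):
    """Number of genes in each Venn region, keyed by the effects containing them."""
    names = list(sets)
    counts = {}
    for gene in set().union(*sets.values()):
        key = tuple(n for n in names if gene in sets[n])
        counts[key] = counts.get(key, 0) + 1
    return counts

t_stats = [[5, 4, 0, 0, 0],      # Zone
           [0, 4.5, 5, 0, 0],    # Age
           [0, 0, 0, 3, 2]]      # Zone x Age
sets = top_k_sets(t_stats, ["Zone", "Age", "Zone x Age"], k=2)
counts = venn_counts(sets)
```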
24
In situ hybridization image
Gene A (up-regulated in zone 4)
25
Gene B (up-regulated in zone 1)
26
27
Continuation: the mouse olfactory bulb
28
A year-old statement by our collaborator:
  • Comparison of large regions of olfactory bulb
    fails to yield molecular differences.
  • Molecules involved in target recognition may be
    expressed in a limited subset of cells.
  • A new approach is required that possesses high
    sensitivity and throughput of analysis.

29
The olfactory bulb experiments
[Diagram of the olfactory bulb regions: anterior (A), posterior (P), dorsal (D), ventral (V), medial (M) and lateral (L)]
  • Samples: tissue from different regions of the
    olfactory bulb.
  • Question 1: find differences between
    regions.
  • Question 2: identify genes with pre-specified
    patterns across regions.
  • Note the novel design (controversial?).

30
Regression analysis
Define a matrix X so that $E(M) = X\beta$. Use least-squares
estimates for A-L, P-L, D-L, V-L and M-L.
31
Contrasts
  • We can estimate all 15 different comparisons
    directly and/or indirectly,
  • e.g. D - M = (D - L) - (M - L).
  • For every gene we have a pattern based on the
    15 different comparisons,
  • e.g. gene 5699.
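Because every difference X − Y equals (X − L) − (Y − L), the 15 pairwise comparisons among the six regions follow directly from the five estimates relative to L. A sketch for one gene, with made-up estimates and a hypothetical function name:

```python
from itertools import combinations

REGIONS = ["A", "P", "D", "V", "M", "L"]

def all_contrasts(est):
    """All 15 pairwise region differences for one gene.

    est: least-squares estimates relative to L, e.g. {"A": A-L, ..., "M": M-L}.
    """
    rel = dict(est, L=0.0)  # L - L = 0
    return {f"{x}-{y}": rel[x] - rel[y] for x, y in combinations(REGIONS, 2)}

pattern = all_contrasts({"A": 0.4, "P": -0.2, "D": 1.0, "V": 0.1, "M": 0.3})
```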

32
Genes that share the same pattern
  • Find genes with the smallest Euclidean distance to
    gene 5699 (what that gene is, is another story).
  • The second gene is a replicate of the first.
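A sketch of this search: compute the Euclidean distance from every gene's contrast pattern to that of the query gene and list the closest ones. The function name and toy 2-D patterns are illustrative; real patterns would have 15 components:

```python
import numpy as np

def nearest_patterns(patterns, query_index, k=5):
    """(index, distance) of the k genes closest to the query gene's pattern.

    patterns: array (genes, comparisons); the query gene itself is excluded.
    """
    p = np.asarray(patterns, dtype=float)
    d = np.sqrt(((p - p[query_index]) ** 2).sum(axis=1))
    order = [i for i in np.argsort(d) if i != query_index][:k]
    return [(int(i), float(d[i])) for i in order]

neighbours = nearest_patterns([[0, 0], [3, 4], [1, 0], [0, 0.5]], 0, k=2)
```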

33

34
35
36
How the question got refined
  • After designing and carrying out the
    experiment, doing the initial analysis and
    following up with in situ hybridizations to confirm our
    findings, we realized we had failed to perceive
    the most interesting question,
  • which was to
  • find genes whose expression patterns show
    (spatial) restriction across the bulb, i.e. not
    just gradients (differential expression), but
    localization.


37
38
Acknowledgments
  • Statistical collaborators
  • Yee Hwa Yang (Berkeley)
  • Sandrine Dudoit (Stanford)
  • Ingrid Lönnstedt (Uppsala)
  • Natalie Thorne (WEHI)
  • CSIRO Image Analysis Group
  • Michael Buckley
  • Ryan Lagerstrom
  • Ngai Lab (Berkeley)
  • Cynthia Duggan
  • Jonathan Scolnick
  • Dave Lin
  • Vivian Peng
  • Percy Luu
  • Elva Diaz
  • John Ngai
  • LBNL
  • Matt Callow

39
  • Some web sites
  • Technical reports, talk, software etc.
  • http://www.stat.berkeley.edu/users/terry/zarray/Html/
  • Statistical software R (GNU's S)
    http://lib.stat.cmu.edu/R/CRAN/
  • Packages within the R environment
  • -- Spot http://www.cmis.csiro.au/iap/spot.htm
  • -- SMA (statistics for microarray analysis)
    http://www.stat.berkeley.edu/users/terry/zarray/Html