Title: Microarray Workshop Series
1- Microarray Workshop Series
- What we have done so far
- Introduction to R syntax (Cathal Seoighe)
- BASE and storage of microarray data (Johan van Heerden)
- Microarray Image capture (SallyAnn Walford)
- Normalization (Excel, DNMAD, MIDAS) (Shane Murray)
- linear regression normalization
- LOWESS print tip normalization
- Microarray experimental design (Linda Haines)
2What is on the menu for the 19th-20th May workshop?
- Preprocessing of microarray data (Nicci Illing)
- Identification of differentially expressed genes
- Theory behind t-tests and ANOVA (Francesca Little)
- Working with microarray data (Katherine Denby)
- Datamining (FatiGO) (Nicky Mulder)
3Preprocessing of Microarray Data
Reference: GEPAS tutorial, http://base.mcb.uct.ac.za/gepas or http://gepas.bioinfo.cnio.es/
4Preprocessing of Microarray Data
- Scale transformation
- Replicate handling
- Missing value handling
- Flat pattern filtering
- Unknown gene removing
- Pattern standardization
Pre-analysis Module
- dimensions of data set
- scale of expression patterns
Interface (File formats)
Exercises
5Preprocessing of Microarray Data
Data = signal intensity (green channel) / signal intensity (red channel)
6Preprocessing of Microarray Data
A: Different representations of the expression patterns of two genes over 9 experimental data points
- Gene A (Cy3) / ref sample (Cy5)
- Gene B (Cy3) / ref sample (Cy5)
B: Gene expression patterns on a colour scale (green: high values, red: low values)
7Preprocessing of Microarray Data
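Raw expression ratios are on an asymmetrical scale: induction spans 1 to infinity, while repression is compressed between 0 and 1.
Convert expression ratios to a log scale (log2), so that a two-fold increase and a two-fold decrease become +1 and -1.
Options for the base: 2, e or 10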
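The scale transformation can be sketched in a few lines of Python. This is an illustration only, not the GEPAS implementation; the function name `log_transform` and the choice to return `None` for missing or non-positive ratios are assumptions made for the example.

```python
import math

def log_transform(ratios, base=2):
    """Log-transform expression ratios; base may be 2, math.e or 10."""
    if base == 2:
        log = math.log2
    elif base == 10:
        log = math.log10
    elif base == math.e:
        log = math.log
    else:
        raise ValueError("base must be 2, math.e or 10")
    # A missing or non-positive ratio has no logarithm: keep it as None
    # (an assumption of this sketch).
    return [log(r) if r is not None and r > 0 else None for r in ratios]

# Four-fold induction and four-fold repression become symmetric around 0.
print(log_transform([4.0, 2.0, 1.0, 0.5, 0.25]))  # [2.0, 1.0, 0.0, -1.0, -2.0]
```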
8Preprocessing of Microarray Data
2. Replicate handling
It is common to have several spots for one cDNA on a slide, which gives several expression values for that cDNA.
One value is needed for analysis of results: either the average or the median value.
But what about inconsistencies among replicates?
The GEPAS preprocessor has a function to remove inconsistent replicates based on the maximum distance (threshold) of a data point from the median.
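A minimal sketch of this replicate handling, assuming that replicates further than a threshold from the median are dropped and the remainder is summarised by the median or the average. The function name `merge_replicates` and its parameters are hypothetical and do not reflect the GEPAS code.

```python
from statistics import mean, median

def merge_replicates(values, max_distance=None, how="median"):
    """Collapse replicate spots of one cDNA into a single value.

    values: replicate expression values (None marks a missing spot).
    max_distance: replicates further than this from the median are
    discarded before merging; None keeps all replicates.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return None
    if max_distance is not None:
        med = median(observed)
        observed = [v for v in observed if abs(v - med) <= max_distance]
        if not observed:  # every replicate was inconsistent
            return None
    return median(observed) if how == "median" else mean(observed)

# The replicate at 4.0 lies 2.9 from the median (1.1) and is removed.
print(merge_replicates([1.0, 1.2, 0.9, 4.0], max_distance=1.0))  # 1.0
```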
9Preprocessing of Microarray Data
2. Replicate handling
10Preprocessing of Microarray Data
3. Missing value handling
Gene expression data are often characterised by missing values, which can be a problem for standard hierarchical clustering methods.
Principal component analysis cannot deal with missing values.
- What can you do to salvage the situation?
- Two options
- Remove patterns with excess of missing values
- Impute missing values
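The first option, removing patterns with an excess of missing values, might look like the sketch below. The 30% cutoff and the name `drop_sparse_patterns` are illustrative assumptions; the slides do not give a threshold.

```python
def drop_sparse_patterns(patterns, max_missing_fraction=0.3):
    """Keep only patterns whose fraction of missing values is acceptable.

    patterns: dict mapping gene name -> list of values (None = missing).
    """
    kept = {}
    for gene, values in patterns.items():
        if not values:
            continue  # an empty pattern carries no information
        missing = sum(1 for v in values if v is None)
        if missing / len(values) <= max_missing_fraction:
            kept[gene] = values
    return kept

patterns = {
    "geneA": [0.5, 1.2, None, 0.8],    # 25% missing: kept
    "geneB": [None, None, 0.3, None],  # 75% missing: removed
}
print(sorted(drop_sparse_patterns(patterns)))  # ['geneA']
```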
11Preprocessing of Microarray Data
3. Missing value handling
- Remove patterns with excess of missing values
- Impute missing values
K-nearest neighbour
- K: user-defined parameter
- Define Euclidean distance to determine the nearest neighbour
- Need nearly complete patterns
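K-nearest-neighbour imputation can be sketched as follows. This is a simplified illustration, not the GEPAS routine: it assumes neighbours are chosen only among complete patterns, that distance is measured over the columns observed in the incomplete pattern, and that the missing value is filled with the unweighted mean of the K neighbours. The name `knn_impute` is hypothetical.

```python
import math

def knn_impute(patterns, k=2):
    """Fill missing values (None) using the k nearest complete patterns."""
    complete = {g: v for g, v in patterns.items() if None not in v}
    result = {}
    for gene, values in patterns.items():
        missing = [i for i, v in enumerate(values) if v is None]
        if not missing:
            result[gene] = list(values)
            continue
        observed = [i for i, v in enumerate(values) if v is not None]
        # Euclidean distance to every complete pattern, on observed columns only.
        distances = sorted(
            (math.sqrt(sum((values[i] - other[i]) ** 2 for i in observed)), name)
            for name, other in complete.items()
        )
        neighbours = [complete[name] for _, name in distances[:k]]
        filled = list(values)
        if neighbours:  # leave the gap if there is nothing to borrow from
            for i in missing:
                filled[i] = sum(n[i] for n in neighbours) / len(neighbours)
        result[gene] = filled
    return result

patterns = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [1.0, 2.0, 5.0],
    "g3": [10.0, 10.0, 10.0],
    "g4": [1.0, 2.0, None],
}
print(knn_impute(patterns, k=2)["g4"])  # [1.0, 2.0, 4.0]
```

Because the neighbours must themselves be complete, the approach only works when most patterns are nearly complete, which is the requirement noted on the slide.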
12Preprocessing of Microarray Data
4. Flat pattern filtering
13Preprocessing of Microarray Data
5. Unknown gene removing
Useful if you want to concentrate your analysis
on a particular set of genes (read from an
external file)
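As a rough sketch, restricting the analysis to a gene set read from an external file could be done like this. The one-identifier-per-line file layout and the names `parse_gene_list` and `keep_listed_genes` are assumptions for the example.

```python
def parse_gene_list(lines):
    """Read gene identifiers, one per line, skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}

def keep_listed_genes(patterns, wanted):
    """Keep only the patterns whose gene name is in the wanted set."""
    return {gene: values for gene, values in patterns.items() if gene in wanted}

patterns = {"geneA": [0.1, 0.4], "EST_123": [1.0, -1.0], "geneB": [0.3, 0.2]}
# With a real file: with open("genes.txt") as fh: wanted = parse_gene_list(fh)
wanted = parse_gene_list(["geneA\n", "\n", "geneB\n"])
print(sorted(keep_listed_genes(patterns, wanted)))  # ['geneA', 'geneB']
```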
14Preprocessing of Microarray Data
- Scale transformation
- Replicate handling
- Missing value handling
- Flat pattern filtering
- Unknown gene removing
- Pattern standardization
Subtract the average value of the pattern from
each value, and divide the result by the stddev
15 6. Pattern standardization
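Pattern standardization (subtract the average of the pattern from each value, then divide by the standard deviation) can be sketched as below. The slides do not say whether the sample or the population standard deviation is meant; this example assumes the sample version (`statistics.stdev`). The name `standardize` is hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    """Return the pattern with mean 0 and standard deviation 1.

    Missing values (None) are ignored in the statistics and kept as None.
    """
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values")
    centre = mean(observed)
    spread = stdev(observed)  # sample standard deviation (assumption)
    if spread == 0:
        raise ValueError("flat pattern: standard deviation is zero")
    return [None if v is None else (v - centre) / spread for v in values]

print(standardize([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```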
16Preprocessing of Microarray Data
- Scale transformation
- Replicate handling
- Missing value handling
- Flat pattern filtering
- Unknown gene removing
- Pattern standardization
Pre-analysis Module
- dimensions of data set
- scale of expression patterns
Interface (File formats)
Exercises
17Pre-analysis Module: dimensions of data set, scale of expression patterns
Checks for:
File format
- do all patterns have the same number of conditions?
- does each pattern have a valid identifier?
- indicates which symbols are interpreted as missing values
Dimensions of data set
- server expects to have more rows than columns
Scale of expression values
- server plots a histogram of the values found in the data set and looks for negative values (finding none is a sign that the data are not log transformed)
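The file-format and dimension checks can be illustrated with a small validator. This is a sketch of the idea only; the function name `check_table` and the wording of the messages are invented for the example, and the real server may apply further rules.

```python
def check_table(rows):
    """Return a list of problems found in a table of expression patterns.

    rows: list of rows, each a list whose first element is the gene
    identifier and whose remaining elements are the condition values.
    """
    problems = []
    widths = {len(row) - 1 for row in rows}
    if len(widths) > 1:
        problems.append("patterns differ in number of conditions")
    if any(not row or not row[0].strip() for row in rows):
        problems.append("pattern without a valid identifier")
    n_conditions = max(widths) if widths else 0
    if len(rows) <= n_conditions:
        problems.append("expected more rows (genes) than columns (conditions)")
    return problems

good = [["g1", "0.1", "0.2"], ["g2", "0.3", "0.4"], ["g3", "0.5", "0.6"]]
bad = [["g1", "0.1", "0.2"], ["", "0.3"]]
print(check_table(good))  # []
print(len(check_table(bad)))  # 3
```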
18Pre-analysis Module: dimensions of data set, scale of expression patterns
Scale of expression values
- server plots a histogram of the values found in the data set and looks for negative values (finding none is a sign that the data are not log transformed)
- in that case the server automatically log transforms the data and continues with the analysis
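The scale check relies on a simple observation: raw intensity ratios are always positive, so negative values can only appear once the data have been log transformed. A hedged sketch of that heuristic follows; the function names and the choice of log2 are assumptions, not the server's code.

```python
import math

def looks_log_transformed(values):
    """Heuristic: any negative value implies the data are already on a log scale."""
    return any(v is not None and v < 0 for v in values)

def ensure_log_scale(values):
    """Log2-transform the data unless they already look log transformed."""
    if looks_log_transformed(values):
        return list(values)
    # Missing or zero values cannot be transformed and are kept as None.
    return [math.log2(v) if v is not None and v > 0 else None for v in values]

print(ensure_log_scale([4.0, 1.0, 0.5]))   # [2.0, 0.0, -1.0]
print(ensure_log_scale([2.0, 0.0, -1.0]))  # [2.0, 0.0, -1.0]
```

The heuristic can be fooled by a log-transformed data set that happens to contain no repressed genes, so the histogram remains a useful visual check.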
19Pre-analysis Module: dimensions of data set, scale of expression patterns
Checks for: Replicated genes
- server looks for replicated genes and merges them
- reports a list of replicated genes (number displayed)
- plots a histogram of the distances of replicates to their median
- useful for selecting thresholds to remove inconsistent replicates
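To see how such a histogram helps in choosing a threshold, the sketch below collects the distance of every replicate to the median of its group and bins the distances. The bin width of 0.5 and the function names are arbitrary choices for illustration.

```python
from collections import Counter
from statistics import median

def distances_to_median(replicate_groups):
    """Distance of each replicate value to the median of its gene."""
    distances = []
    for values in replicate_groups.values():
        med = median(values)
        distances.extend(abs(v - med) for v in values)
    return distances

def histogram(distances, bin_width=0.5):
    """Map the lower edge of each bin to the number of distances in it."""
    counts = Counter(int(d // bin_width) for d in distances)
    return {round(b * bin_width, 10): counts[b] for b in sorted(counts)}

groups = {"geneA": [1.0, 1.0, 3.0], "geneB": [0.2, 0.4, 0.6]}
print(histogram(distances_to_median(groups)))
```

A long tail in the histogram suggests where a cutoff would separate consistent replicates from outliers.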
20Pre-analysis Module: dimensions of data set, scale of expression patterns
Checks for: Missing values
- server reports the number of missing values and suggests the best imputation method
21Pre-analysis Module: dimensions of data set, scale of expression patterns
Checks for: Flat pattern filtering
Server reports on the number of patterns that will be removed if
- filter by number of peaks is selected
- filter by RMS is selected
- filter by stddev is selected
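The three flat-pattern filters can be sketched as follows. The slides do not define the criteria, so the definitions here are assumptions: stddev is the sample standard deviation of the pattern, RMS is the root mean square of its values, and a "peak" is a value whose absolute size reaches a user-chosen `peak_threshold`. All names are hypothetical.

```python
import math
from statistics import stdev

def rms(values):
    """Root mean square of the values."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def count_peaks(values, peak_threshold):
    """Number of values at least peak_threshold away from zero."""
    return sum(1 for v in values if abs(v) >= peak_threshold)

def is_flat(values, method, cutoff, peak_threshold=1.0):
    """A pattern is flat when its score falls below the cutoff."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        return True  # too little data to show any variation
    if method == "stddev":
        score = stdev(observed)
    elif method == "rms":
        score = rms(observed)
    elif method == "peaks":
        score = count_peaks(observed, peak_threshold)
    else:
        raise ValueError("method must be 'stddev', 'rms' or 'peaks'")
    return score < cutoff

def filter_flat(patterns, method, cutoff, peak_threshold=1.0):
    """Return the patterns that are not flat under the chosen filter."""
    return {g: v for g, v in patterns.items()
            if not is_flat(v, method, cutoff, peak_threshold)}

patterns = {"flat": [0.1, -0.1, 0.0, 0.1], "variable": [2.0, -2.0, 1.5, -1.5]}
kept = filter_flat(patterns, "stddev", 0.5)
print(len(patterns) - len(kept), "pattern(s) would be removed")
```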
22number of patterns that will be removed if filter
by no of peaks is selected
23number of patterns that will be removed if filter
by RMS is selected
24number of patterns that will be removed if filter
by stddev is selected
25Preprocessing of Microarray Data
- Scale transformation
- Replicate handling
- Missing value handling
- Flat pattern filtering
- Unknown gene removing
- Pattern standardization
Pre-analysis Module
- dimensions of data set
- scale of expression patterns
Interface (File formats)
Exercises
26Preprocessing of Microarray Data
Interface (File formats)
- plain text file
- table with columns separated by tabs
- each line corresponds to a gene, each column to an experimental condition
- first column must contain the gene name
- missing values should be left empty (no special characters)
- all lines beginning with # are ignored
- server does not need to know the names of conditions, but they can be added in for your own convenience; first element must be NAMES
- class labels can be added by starting a line with LABELS
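A reader for a file laid out this way might look like the sketch below. It is not the GEPAS parser. It assumes `#` as the comment character, and it accepts the NAMES and LABELS keywords with or without a leading `#`; both points are assumptions of this example. The name `parse_expression_table` is hypothetical.

```python
def parse_expression_table(text, comment_prefix="#"):
    """Parse a tab-separated expression table.

    Returns (names, labels, patterns): condition names and class labels
    (None when absent) and a dict mapping gene name -> list of floats,
    with None for empty (missing) cells. comment_prefix must be non-empty.
    """
    names, labels, patterns = None, None, {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        keyword = fields[0].lstrip(comment_prefix).strip()
        if keyword == "NAMES":
            names = fields[1:]
        elif keyword == "LABELS":
            labels = fields[1:]
        elif line.startswith(comment_prefix):
            continue  # comment line
        else:
            patterns[fields[0]] = [float(f) if f.strip() else None
                                   for f in fields[1:]]
    return names, labels, patterns

example = (
    "# demo data set\n"
    "NAMES\tcontrol\ttreated\n"
    "LABELS\tA\tB\n"
    "gene1\t1.5\t-0.5\n"
    "gene2\t\t2.0\n"
)
names, labels, patterns = parse_expression_table(example)
print(names, labels, patterns["gene2"])
```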
27Preprocessing of Microarray Data
- Scale transformation
- Replicate handling
- Missing value handling
- Flat pattern filtering
- Unknown gene removing
- Pattern standardization
Pre-analysis Module
- dimensions of data set
- scale of expression patterns
Interface (File formats)
Exercises
28Preprocessing of Microarray Data
Exercises: GEPAS tutorial, http://base.mcb.uct.ac.za/gepas