Microarray Workshop Series


Provided by: nicola75


1

Microarray Workshop Series
What we have done so far:
Introduction to R syntax (Cathal Seoighe)
BASE and storage of microarray data (Johan van Heerden)
Microarray image capture (SallyAnn Walford)
Normalization (Excel, DNMAD, MIDAS) (Shane Murray)
  linear regression normalization
  LOWESS print-tip normalization
Microarray experimental design (Linda Haines)

2
What is on the menu for the 19th-20th May workshop?

Preprocessing of microarray data (Nicci Illing)
Identification of differentially expressed genes
Theory behind t-tests and ANOVA (Francesca Little)
Working with microarray data (Katherine Denby)
Data mining (FatiGO) (Nicky Mulder)

3
Preprocessing of Microarray Data
Reference: GEPAS tutorial, http://base.mcb.uct.ac.za/gepas or http://gepas.bioinfo.cnio.es/
4
Preprocessing of Microarray Data

1. Scale transformation
2. Replicate handling
3. Missing value handling
4. Flat pattern filtering
5. Unknown gene removing
6. Pattern standardization

Pre-analysis module: dimensions of data set, scale of expression patterns
Interface (file formats)
Exercises
5
Preprocessing of Microarray Data
Data: signal intensity of the green channel and signal intensity of the red channel
6
Preprocessing of Microarray Data
A: Different representations of the expression patterns of two genes over 9 experimental data points: gene A (Cy3)/reference sample (Cy5) and gene B (Cy3)/reference sample (Cy5).
B: Gene expression patterns on a colour scale: green for high values, red for low values.
7
Preprocessing of Microarray Data

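1. Scale transformation

Raw expression ratios are on an asymmetrical scale: up-regulation gives ratios between 1 and infinity, while down-regulation is squeezed between 0 and 1. Convert expression ratios to a log scale (usually log2), so that a 2-fold increase (+1) and a 2-fold decrease (-1) are symmetric around 0.
Options for the base: 2, e or 10.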
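A minimal Python sketch of the scale transformation, assuming each value is a positive expression ratio such as Cy3/Cy5. The function name log_transform is hypothetical, not part of GEPAS.

```python
import math

def log_transform(ratio: float, base: float = 2.0) -> float:
    """Convert a raw expression ratio to a log scale (base 2, e or 10)."""
    if ratio <= 0:
        raise ValueError("only positive ratios can be log-transformed")
    if base == 2:
        return math.log2(ratio)
    if base == 10:
        return math.log10(ratio)
    return math.log(ratio, base)  # e.g. base=math.e for natural logs

# A 4-fold increase and a 4-fold decrease are asymmetric as raw ratios
# (4.0 vs 0.25) but symmetric around 0 after the log2 transform.
print(log_transform(4.0), log_transform(0.25))  # 2.0 -2.0
```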
8
Preprocessing of Microarray Data
2. Replicate handling
It is common to have several spots for one cDNA on a slide, which gives several expression values for one gene, but only one value is used for the analysis of results: either the average or the median value. But what about inconsistencies among replicates? The GEPAS preprocessor has a function to remove inconsistent replicates, based on the maximum distance (threshold) of a data point to the median.
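The replicate rule can be sketched in Python as follows. This is an illustration of the idea, not GEPAS's own code: merge_replicates is a hypothetical name, and the fallback for the case where every replicate exceeds the threshold is an assumption.

```python
from statistics import mean, median

def merge_replicates(values: list[float], threshold: float, use_median: bool = False) -> float:
    """Collapse the replicate spots of one cDNA into a single expression value.

    Replicates farther than `threshold` from the median of all replicates are
    treated as inconsistent and dropped; the remaining values are averaged,
    or their median is taken when use_median is True.
    """
    if not values:
        raise ValueError("no replicate values given")
    centre = median(values)
    kept = [v for v in values if abs(v - centre) <= threshold]
    if not kept:
        # Can happen with an even number of replicates and a very small
        # threshold: fall back to the median of all replicates.
        return centre
    return median(kept) if use_median else mean(kept)

# Three consistent spots and one outlier (log2 values): the outlier is dropped.
print(merge_replicates([1.0, 1.2, 1.1, 4.0], threshold=0.5))
```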
9
Preprocessing of Microarray Data
2. Replicate handling
10
Preprocessing of Microarray Data
3. Missing value handling
Gene expression data are often characterised by missing values. These can be a problem for standard hierarchical clustering methods, and principal component analysis cannot deal with missing values at all.

What can you do to salvage the situation?
Two options:
Remove patterns with an excess of missing values
Impute missing values
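The first option can be sketched in Python, assuming each pattern is a list of values with None marking a missing value. The cut-off parameter max_missing_fraction and the function name are hypothetical; the slides do not say how GEPAS defines "excess".

```python
from typing import Optional

Pattern = list[Optional[float]]  # None marks a missing value

def drop_sparse_patterns(data: dict[str, Pattern], max_missing_fraction: float) -> dict[str, Pattern]:
    """Keep only patterns whose fraction of missing values is at most the cut-off."""
    kept: dict[str, Pattern] = {}
    for gene, pattern in data.items():
        n_missing = sum(v is None for v in pattern)
        if pattern and n_missing / len(pattern) <= max_missing_fraction:
            kept[gene] = pattern
    return kept

data = {"g1": [1.0, None, 2.0, 3.0], "g2": [None, None, None, 1.0]}
print(list(drop_sparse_patterns(data, 0.25)))  # ['g1']
```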

11
Preprocessing of Microarray Data
3. Missing value handling

Remove patterns with excess of missing values
Impute missing values

Imputation by K-nearest neighbours: K is a user-defined parameter, and the Euclidean distance between patterns is used to determine the nearest neighbours. This requires nearly complete patterns.
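A simplified K-nearest-neighbour imputation in Python, as a sketch rather than the GEPAS implementation. Assumptions: only complete patterns serve as neighbours, the distance is computed over the conditions observed in both patterns, and a missing value is replaced by the plain mean of that condition in the K nearest neighbours. The function names are hypothetical.

```python
import math
from typing import Optional

Pattern = list[Optional[float]]  # None marks a missing value

def euclidean(a: Pattern, b: Pattern) -> float:
    """Euclidean distance over the conditions observed in both patterns."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))

def knn_impute(data: dict[str, Pattern], k: int) -> dict[str, Pattern]:
    """Fill each missing value with the mean of that condition in the k nearest complete patterns."""
    complete = {g: p for g, p in data.items() if None not in p}
    if k < 1 or len(complete) < k:
        raise ValueError("k must be at least 1 and no larger than the number of complete patterns")
    imputed: dict[str, Pattern] = {}
    for gene, pattern in data.items():
        if gene in complete:
            imputed[gene] = list(pattern)
            continue
        nearest = sorted(complete, key=lambda g: euclidean(pattern, complete[g]))[:k]
        imputed[gene] = [
            v if v is not None else sum(complete[g][i] for g in nearest) / k
            for i, v in enumerate(pattern)
        ]
    return imputed
```

Because distances are only computed on observed conditions, a pattern with many gaps has few values to compare, which is one reason the method needs nearly complete patterns.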
12
Preprocessing of Microarray Data
4. Flat pattern filtering
13
Preprocessing of Microarray Data
5. Unknown gene removing
Useful if you want to concentrate your analysis
on a particular set of genes (read from an
external file)
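A Python sketch of this filter, assuming the external file lists one gene identifier per line. The function names are hypothetical and the file layout is an assumption, not the documented GEPAS format.

```python
from pathlib import Path
from typing import Iterable

def parse_gene_list(lines: Iterable[str]) -> set[str]:
    """One identifier per line; blank lines are skipped."""
    return {line.strip() for line in lines if line.strip()}

def read_gene_list(path: str) -> set[str]:
    return parse_gene_list(Path(path).read_text().splitlines())

def keep_known_genes(data: dict[str, list[float]], wanted: set[str]) -> dict[str, list[float]]:
    """Drop every pattern whose gene is not in the list of interest."""
    return {gene: pattern for gene, pattern in data.items() if gene in wanted}
```

Typical use would be keep_known_genes(data, read_gene_list("genes_of_interest.txt")), where the file name is only an example.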
14
Preprocessing of Microarray Data

1. Scale transformation
2. Replicate handling
3. Missing value handling
4. Flat pattern filtering
5. Unknown gene removing
6. Pattern standardization

Pattern standardization: subtract the average value of the pattern from each value, and divide the result by the standard deviation of the pattern.
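Pattern standardization in a few lines of Python. The slide does not say whether the sample (n - 1) or population (n) standard deviation is used; this sketch assumes the sample version, and standardize is a hypothetical name.

```python
from statistics import mean, stdev

def standardize(pattern: list[float]) -> list[float]:
    """Subtract the pattern mean from each value and divide by the pattern's standard deviation."""
    sd = stdev(pattern)  # sample standard deviation (n - 1); needs at least two values
    if sd == 0:
        raise ValueError("flat pattern: standard deviation is zero")
    mu = mean(pattern)
    return [(x - mu) / sd for x in pattern]

print(standardize([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```

After standardization every pattern has mean 0 and standard deviation 1, so patterns are compared by shape rather than by absolute level.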
15
6. Pattern standardization
17
Pre-analysis module: dimensions of data set, scale of expression patterns
Checks for:
File format: do all patterns have the same number of conditions? Does each pattern have a valid identifier? The module also indicates which symbols are interpreted as missing values.
Dimensions of data set: the server expects to have more rows (genes) than columns (conditions).
Scale of expression values: the server plots a histogram of the values found in the data set and looks for negative values; their absence is a sign that the data are not log transformed.
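The file-format and dimension checks can be sketched in Python for a table already split into fields, with the gene identifier in the first field. Treating "valid identifier" as "non-empty" is an assumption, and the function name and warning texts are hypothetical.

```python
def format_and_dimension_checks(rows: list[list[str]]) -> list[str]:
    """Return warnings for a parsed table (first field = gene identifier)."""
    warnings: list[str] = []
    n_conditions = {len(r) - 1 for r in rows}
    if len(n_conditions) > 1:
        warnings.append("patterns do not all have the same number of conditions")
    if any(not r or not r[0].strip() for r in rows):
        warnings.append("some patterns have no valid identifier")
    n_columns = max(n_conditions, default=0)
    if len(rows) <= n_columns:
        warnings.append("not more rows than columns: are genes in rows and conditions in columns?")
    return warnings

rows = [["g1", "1.0", "2.0"], ["g2", "0.5", ""], ["g3", "3.0", "4.0"]]
print(format_and_dimension_checks(rows))  # []
```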
18
Pre-analysis module: dimensions of data set, scale of expression patterns
Scale of expression values: the server plots a histogram of the values found in the data set and looks for negative values; their absence is a sign that the data are not log transformed.
In that case, the server automatically log transforms the data and continues with the analysis.
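A Python sketch of this decision, assuming that the presence of any negative value means the data are already on a log scale and that log2 is the default base. ensure_log_scale is a hypothetical name, not a GEPAS function.

```python
import math

def ensure_log_scale(values: list[float], base: float = 2.0) -> tuple[list[float], bool]:
    """Return (values on a log scale, whether a transform was applied)."""
    if any(v < 0 for v in values):
        # Negative values cannot be raw ratios: assume already log-transformed.
        return list(values), False
    if any(v == 0 for v in values):
        raise ValueError("zero values cannot be log-transformed")
    return [math.log(v, base) for v in values], True

print(ensure_log_scale([1.0, 4.0]))
```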
19
Pre-analysis module: dimensions of data set, scale of expression patterns
Checks for replicated genes: the server looks for replicated genes and merges them. It reports the list of replicated genes (with their number) and plots a histogram of distances to the median of the replicates, which is useful for selecting thresholds to remove inconsistent replicates.
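The replicate report can be approximated in Python, assuming the input is a list of (gene, value) spots. The text histogram stands in for the server's plot; the function names and the bin_width parameter are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import median

def replicate_report(spots: list[tuple[str, float]]) -> tuple[list[str], list[float]]:
    """Return the replicated genes and every replicate's distance to its gene's median."""
    by_gene: dict[str, list[float]] = defaultdict(list)
    for gene, value in spots:
        by_gene[gene].append(value)
    replicated = [g for g, vals in by_gene.items() if len(vals) > 1]
    distances: list[float] = []
    for g in replicated:
        centre = median(by_gene[g])
        distances.extend(abs(v - centre) for v in by_gene[g])
    return replicated, distances

def text_histogram(distances: list[float], bin_width: float) -> list[str]:
    """One line per non-empty bin, e.g. '0.00-0.50: ##'."""
    counts = Counter(int(d // bin_width) for d in distances)
    return [f"{b * bin_width:.2f}-{(b + 1) * bin_width:.2f}: {'#' * counts[b]}" for b in sorted(counts)]

genes, dists = replicate_report([("g1", 1.0), ("g1", 1.2), ("g1", 3.0), ("g2", 0.5)])
print(genes)
print("\n".join(text_histogram(dists, 0.5)))
```

A long tail in the histogram suggests where a threshold for removing inconsistent replicates could be set.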
20
Pre-analysis module: dimensions of data set, scale of expression patterns
Checks for missing values: the server reports the number of missing values and suggests the best imputation method.
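The counting part of this check is sketched below in Python, with None marking a missing value. The slides do not say how the server chooses the imputation method, so no suggestion rule is attempted; missing_value_report is a hypothetical name.

```python
from typing import Optional

def missing_value_report(data: dict[str, list[Optional[float]]]) -> dict[str, float]:
    """Summarise missing values across all patterns."""
    n_values = sum(len(p) for p in data.values())
    n_missing = sum(v is None for p in data.values() for v in p)
    n_incomplete = sum(any(v is None for v in p) for p in data.values())
    return {
        "missing_values": n_missing,
        "missing_fraction": n_missing / n_values if n_values else 0.0,
        "patterns_with_missing": n_incomplete,
    }

print(missing_value_report({"g1": [1.0, None], "g2": [2.0, 3.0], "g3": [None, None]}))
```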
21
Pre-analysis module: dimensions of data set, scale of expression patterns
Checks for flat pattern filtering: the server reports the number of patterns that will be removed if
filter by number of peaks is selected
filter by RMS is selected
filter by standard deviation is selected
22
Number of patterns that will be removed if filter by number of peaks is selected
23
Number of patterns that will be removed if filter by RMS is selected
24
Number of patterns that will be removed if filter by standard deviation is selected
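A Python sketch of the three flat-pattern filters and the counts the server reports. Assumptions, since the slides give no definitions: a "peak" is a value whose absolute value reaches peak_threshold, RMS is the root mean square of the pattern, and the standard deviation is the sample version. All parameter and function names are hypothetical.

```python
import math
from statistics import stdev

def n_peaks(pattern: list[float], threshold: float) -> int:
    return sum(abs(x) >= threshold for x in pattern)

def rms(pattern: list[float]) -> float:
    return math.sqrt(sum(x * x for x in pattern) / len(pattern))

def flat_filter_report(data: dict[str, list[float]], peak_threshold: float,
                       min_peaks: int, min_rms: float, min_sd: float) -> dict[str, int]:
    """How many patterns each filter would remove (patterns need at least two values)."""
    patterns = list(data.values())
    return {
        "peaks": sum(n_peaks(p, peak_threshold) < min_peaks for p in patterns),
        "rms": sum(rms(p) < min_rms for p in patterns),
        "stddev": sum(stdev(p) < min_sd for p in patterns),
    }

data = {"flat": [0.1, -0.1, 0.0, 0.1], "varied": [2.0, -1.5, 0.5, 3.0]}
print(flat_filter_report(data, peak_threshold=1.0, min_peaks=1, min_rms=0.5, min_sd=0.5))
```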
26
Preprocessing of Microarray Data
Interface (file formats)

plain text file
table separated by tabs
each line corresponds to a gene, each column to an experimental condition
the first column must contain the gene name
missing values should be left empty (no special characters)
all lines beginning with the comment symbol are ignored
the server does not need to know the names of the conditions, but they can be added for your own convenience; the first element of that line must be NAMES
class labels can be added by starting a line with LABELS
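A Python sketch of a reader for this format. Assumptions: "#" is the comment symbol, a NAMES or LABELS line may optionally carry that symbol in front, and empty fields are missing values. parse_expression_table is a hypothetical name, not a GEPAS tool.

```python
from typing import Optional

def parse_expression_table(text: str, comment: str = "#"):
    """Return (condition names, class labels, {gene: values}) with None for missing values."""
    names: Optional[list[str]] = None
    labels: Optional[list[str]] = None
    data: dict[str, list[Optional[float]]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        key = fields[0].lstrip(comment)
        if key == "NAMES":
            names = fields[1:]
        elif key == "LABELS":
            labels = fields[1:]
        elif line.startswith(comment):
            continue  # comment line: ignored
        else:
            # First column is the gene name; empty cells are missing values.
            data[fields[0]] = [float(v) if v.strip() else None for v in fields[1:]]
    return names, labels, data

example = "NAMES\tc1\tc2\nLABELS\tA\tB\n# a comment\ng1\t1.5\t\ng2\t-0.5\t2.0\n"
print(parse_expression_table(example))
```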

28
Preprocessing of Microarray Data
Exercises: GEPAS tutorial, http://base.mcb.uct.ac.za/gepas
