Microarray Workshop Series - PowerPoint PPT Presentation

Provided by: nicola75

Transcript and Presenter's Notes

Title: Microarray Workshop Series


1
  • Microarray Workshop Series
  • What we have done so far
  • Introduction to R syntax (Cathal Seoighe)
  • BASE and storage of microarray data (Johan van
    Heerden)
  • Microarray Image capture (SallyAnn Walford)
  • Normalization (Excel, DNMAD, MIDAS) (Shane
    Murray)
  • linear regression normalization
  • LOWESS print tip normalization
  • Microarray experimental design (Linda Haines)

2
What is on the menu for the 19th-20th May workshop?
  • Preprocessing of microarray data (Nicci Illing)
  • Identification of differentially expressed genes
  • Theory behind t-tests and ANOVA (Francesca Little)
  • Working with microarray data (Katherine Denby)
  • Data mining (FatiGO) (Nicky Mulder)

3
Preprocessing of Microarray Data
Reference: GEPAS tutorial, http://base.mcb.uct.ac.za/gepas
or http://gepas.bioinfo.cnio.es/
4
Preprocessing of Microarray Data
  • Scale transformation
  • Replicate handling
  • Missing value handling
  • Flat pattern filtering
  • Unknown gene removing
  • Pattern standardization

Pre-analysis module: dimensions of data set, scale of expression patterns
Interface (file formats)
Exercises
5
Preprocessing of Microarray Data
Data: signal intensity of the green channel and
signal intensity of the red channel
6
Preprocessing of Microarray Data
A: Different representations of the expression
patterns of two genes over 9 experimental data
points: Gene A (Cy3) / ref sample (Cy5) and
Gene B (Cy3) / ref sample (Cy5)
B: Gene expression patterns on a colour scale:
green for high values, red for low values
7
Preprocessing of Microarray Data
1. Scale transformation
Raw expression ratios are on an asymmetrical
scale. Convert expression ratios to log2 scale.
Options for the log base: 2, e or 10
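
To illustrate why the log scale is preferred, here is a minimal Python sketch. The function name `log_ratio` and the example intensities are made up for illustration; the three base options follow the slide.

```python
import math

# The three base options named on the slide.
LOG_FUNCTIONS = {"2": math.log2, "e": math.log, "10": math.log10}

def log_ratio(sample, reference, base="2"):
    """Return the log of the expression ratio sample/reference."""
    return LOG_FUNCTIONS[base](sample / reference)

# A two-fold increase and a two-fold decrease are asymmetrical as raw
# ratios (2.0 versus 0.5) but symmetrical around zero on the log2 scale.
up = log_ratio(2000.0, 1000.0)
down = log_ratio(1000.0, 2000.0)
print(up, down)
```

On the log2 scale the two values have the same magnitude and opposite signs, which is what makes induction and repression directly comparable.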
8
Preprocessing of Microarray Data
2. Replicate handling
It is common to have several spots for one cDNA on
a slide, giving several values for expression. One
value is needed for analysis of results: either the
average or the median value. But what about
inconsistencies among replicates? The GEPAS
preprocessor has a function to remove inconsistent
replicates, based on the maximum distance
(threshold) of a data point to the median.
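
The idea can be sketched in Python as follows. This is not the GEPAS implementation, whose exact rule is not given on the slide; the function name `merge_replicates` and the choice to return the median of the surviving values are assumptions.

```python
from statistics import median

def merge_replicates(values, max_distance):
    """Collapse replicate spot values into one value.

    Values further than max_distance from the median of all replicates
    are treated as inconsistent and dropped; the median of the remaining
    values is returned together with the values that were kept.
    """
    centre = median(values)
    kept = [v for v in values if abs(v - centre) <= max_distance]
    if not kept:
        # Possible with an even number of widely spread replicates.
        return None, []
    return median(kept), kept

# Four replicate log ratios for one cDNA; the last one is an outlier.
merged, kept = merge_replicates([1.0, 1.2, 0.9, 3.5], max_distance=0.5)
print(merged, kept)
```

With a threshold of 0.5 the value 3.5 is discarded and the remaining three replicates are summarised by their median.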
9
Preprocessing of Microarray Data
2. Replicate handling
10
Preprocessing of Microarray Data
3. Missing value handling
Gene expression data is often characterised by
missing values. These can be a problem for standard
hierarchical clustering methods, and principal
component analysis cannot deal with missing values.
  • What can you do to salvage the situation?
  • Two options
  • Remove patterns with excess of missing values
  • Impute missing values
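
The first option can be sketched in a few lines of Python. The data layout (a dict mapping gene names to lists, with `None` marking a missing value), the function name and the cutoff are all assumptions made for illustration.

```python
def drop_sparse_patterns(patterns, max_missing_fraction):
    """Keep only patterns whose fraction of missing values is acceptable."""
    kept = {}
    for gene, values in patterns.items():
        missing = sum(v is None for v in values)
        if missing / len(values) <= max_missing_fraction:
            kept[gene] = values
    return kept

patterns = {
    "geneA": [0.5, None, 1.2, 0.8],    # 25% missing
    "geneB": [None, None, None, 0.1],  # 75% missing
}
filtered = drop_sparse_patterns(patterns, max_missing_fraction=0.5)
print(sorted(filtered))
```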

11
Preprocessing of Microarray Data
3. Missing value handling
  • Remove patterns with excess of missing values
  • Impute missing values

K-nearest neighbour imputation: K is a user-defined
parameter. Euclidean distance is used to determine
the nearest neighbours. Nearly complete patterns
are needed.
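
A minimal Python sketch of K-nearest neighbour imputation, under stated assumptions: missing values are `None`, only fully complete patterns are used as neighbours, and a missing value is replaced by the plain average of the neighbours' values. GEPAS may differ in all of these details.

```python
import math

def knn_impute(patterns, k):
    """Fill missing values (None) from the k nearest complete patterns."""
    complete = [v for v in patterns.values() if None not in v]
    imputed = {}
    for gene, values in patterns.items():
        missing = [i for i, v in enumerate(values) if v is None]
        filled = list(values)
        if missing and complete:
            present = [i for i, v in enumerate(values) if v is not None]

            def distance(other):
                # Euclidean distance over the conditions observed in this gene.
                return math.sqrt(sum((values[i] - other[i]) ** 2 for i in present))

            neighbours = sorted(complete, key=distance)[:k]
            for i in missing:
                filled[i] = sum(n[i] for n in neighbours) / len(neighbours)
        imputed[gene] = filled
    return imputed

patterns = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [1.1, 2.1, 2.9],
    "g3": [-1.0, -2.0, -3.0],
    "g4": [1.0, 2.0, None],
}
result = knn_impute(patterns, k=2)
print(result["g4"])
```

Here g1 and g2 are the two nearest neighbours of g4, so the missing value is filled with the average of their third values. The sketch also shows why nearly complete patterns are needed: the distance can only be computed over the conditions that are actually observed.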
12
Preprocessing of Microarray Data
4. Flat pattern filtering
13
Preprocessing of Microarray Data
5. Unknown gene removing
Useful if you want to concentrate your analysis
on a particular set of genes (read from an
external file)
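
As a rough Python sketch, restricting the data set to a gene list could look like this. The list format (one gene identifier per line) and the function name are assumptions; the slide only says the genes are read from an external file.

```python
def keep_listed_genes(patterns, gene_list_text):
    """Keep only the patterns whose gene name appears in the list."""
    wanted = {line.strip() for line in gene_list_text.splitlines() if line.strip()}
    return {gene: values for gene, values in patterns.items() if gene in wanted}

patterns = {"geneA": [0.1, 0.2], "geneB": [1.5, -0.3], "EST123": [0.0, 0.4]}
gene_list_text = "geneA\ngeneB\n"  # would normally be read from the external file
selected = keep_listed_genes(patterns, gene_list_text)
print(sorted(selected))
```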
14
Preprocessing of Microarray Data
  • Scale transformation
  • Replicate handling
  • Missing value handling
  • Flat pattern filtering
  • Unknown gene removing
  • Pattern standardization

Subtract the average value of the pattern from
each value, and divide the result by the standard
deviation of the pattern.
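
The standardization step can be sketched in Python as below. The use of the sample standard deviation (`statistics.stdev`) is an assumption, since the slide does not say which estimator is meant.

```python
from statistics import mean, stdev

def standardize(pattern):
    """Centre a pattern on its mean and scale it by its standard deviation."""
    m = mean(pattern)
    s = stdev(pattern)  # assumption: sample standard deviation
    return [(v - m) / s for v in pattern]

standardized = standardize([1.0, 2.0, 3.0])
print(standardized)
```

A completely flat pattern has a standard deviation of zero and cannot be standardized this way, which is one reason to filter flat patterns first.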
15
6. Pattern standardization
16
Preprocessing of Microarray Data
  • Scale transformation
  • Replicate handling
  • Missing value handling
  • Flat pattern filtering
  • Unknown gene removing
  • Pattern standardization

Pre-analysis module: dimensions of data set, scale of expression patterns
Interface (file formats)
Exercises
17
Pre-analysis module: dimensions of data set, scale of expression patterns
Checks for:
  • File format: do all patterns have the same
    number of conditions? Does each pattern have a
    valid identifier? The server also indicates
    which symbols are interpreted as missing values.
  • Dimensions of data set: the server expects to
    have more rows than columns.
  • Scale of expression values: the server plots a
    histogram of the values found in the data set
    and looks for negative values (their absence is
    a sign that the data is not log transformed).
18
Pre-analysis module: dimensions of data set, scale of expression patterns
Scale of expression values: the server plots a
histogram of the values found in the data set and
looks for negative values (their absence is a sign
that the data is not log transformed).
The server then automatically log transforms the
data and continues with the analysis.
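
The checks described for the pre-analysis module can be approximated with a short Python sketch. The function name, the report keys and the data layout (dict of gene name to list, `None` for missing) are illustrative assumptions, not the server's actual interface. Raw ratios are always positive, so a data set without any negative values is probably not log transformed.

```python
def pre_analysis_report(patterns):
    """Run simple format, dimension and scale checks on a non-empty data set."""
    lengths = {len(values) for values in patterns.values()}
    observed = [x for values in patterns.values() for x in values if x is not None]
    return {
        "same_number_of_conditions": len(lengths) == 1,
        "more_rows_than_columns": len(patterns) > max(lengths),
        "has_negative_values": any(x < 0 for x in observed),
    }

raw_ratios = {
    "geneA": [2.0, 0.5],
    "geneB": [1.1, 0.9],
    "geneC": [4.0, 0.25],
}
report = pre_analysis_report(raw_ratios)
print(report)
```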
19
Pre-analysis module: dimensions of data set, scale of expression patterns
Checks for replicated genes: the server looks for
replicated genes and merges them, reports the list
of replicated genes (number displayed), and plots
a histogram of distances to the median of
replicates, which is useful for selecting
thresholds to remove inconsistent replicates.
20
Pre-analysis module: dimensions of data set, scale of expression patterns
Checks for missing values: the server reports the
number of missing values and suggests the best
imputation method.
21
Pre-analysis module: dimensions of data set, scale of expression patterns
Checks for flat pattern filtering: the server
reports the number of patterns that will be
removed if
  • filter by number of peaks is selected
  • filter by RMS is selected
  • filter by stddev is selected
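
Two of these filters can be sketched in Python. The thresholds, function names and the rule "remove a pattern if its measure is below the threshold" are assumptions. The peak-count filter is left out because the slides do not define what counts as a peak.

```python
import math
from statistics import stdev

def rms(pattern):
    """Root mean square of the values in a pattern."""
    return math.sqrt(sum(v * v for v in pattern) / len(pattern))

def count_flat(patterns, measure, threshold):
    """Count the patterns that a filter on the given measure would remove."""
    return sum(1 for values in patterns.values() if measure(values) < threshold)

patterns = {
    "flat": [0.1, -0.1, 0.0, 0.1],
    "variable": [2.0, -1.5, 0.2, 1.8],
}
removed_by_rms = count_flat(patterns, rms, threshold=0.5)
removed_by_stddev = count_flat(patterns, stdev, threshold=0.5)
print(removed_by_rms, removed_by_stddev)
```

Reporting the counts before filtering, as the server does, makes it easier to pick a threshold that does not discard most of the data set.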
22
number of patterns that will be removed if filter
by number of peaks is selected
23
number of patterns that will be removed if filter
by RMS is selected
24
number of patterns that will be removed if filter
by stddev is selected
25
Preprocessing of Microarray Data
  • Scale transformation
  • Replicate handling
  • Missing value handling
  • Flat pattern filtering
  • Unknown gene removing
  • Pattern standardization

Pre-analysis module: dimensions of data set, scale of expression patterns
Interface (file formats)
Exercises
26
Preprocessing of Microarray Data
Interface (File formats)
  • plain text file
  • table with columns separated by tabs
  • each line corresponds to a gene, each column to
    an experimental condition
  • first column must contain the gene name
  • missing values should be left empty (no special
    characters)
  • all lines beginning with # are ignored
  • server does not need to know the names of the
    conditions, but they can be added for your own
    convenience: the first element of that line
    must be NAMES
  • class labels can be added by starting a line
    with LABELS
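
A reader for this file format can be sketched in Python as follows. The function name is made up, and the ignored-line prefix is passed as a parameter: its default of `#` is an assumption, and NAMES and LABELS lines are accepted with or without that prefix.

```python
def parse_expression_table(text, comment_prefix="#"):
    """Parse a tab-separated expression table into names, labels and genes."""
    names, labels, genes = None, None, {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        key = fields[0].lstrip(comment_prefix)
        if key == "NAMES":
            names = fields[1:]    # optional condition names
        elif key == "LABELS":
            labels = fields[1:]   # optional class labels
        elif line.startswith(comment_prefix):
            continue              # ignored line
        else:
            # First column is the gene name; empty fields are missing values.
            genes[fields[0]] = [float(v) if v.strip() else None for v in fields[1:]]
    return names, labels, genes

example = (
    "#an ignored line\n"
    "NAMES\tc1\tc2\tc3\n"
    "geneA\t1.0\t\t-0.5\n"
    "geneB\t0.2\t0.3\t0.4\n"
)
names, labels, genes = parse_expression_table(example)
print(names, genes["geneA"])
```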

27
Preprocessing of Microarray Data
  • Scale transformation
  • Replicate handling
  • Missing value handling
  • Flat pattern filtering
  • Unknown gene removing
  • Pattern standardization

Pre-analysis module: dimensions of data set, scale of expression patterns
Interface (file formats)
Exercises
28
Preprocessing of Microarray Data
Exercises: GEPAS tutorial,
http://base.mcb.uct.ac.za/gepas