Overview of Affy Analysis Pipeline

Slides: 48

Provided by: PMO70

1
Overview of Affy Analysis Pipeline

Help Pages
http//affy

2
(No Transcript)
3

Main page for starting an analysis run
Three main stop/start points:
Group files
Normalize data
Analyze data

Choose Project
Start New Analysis
Previous analysis sessions
Click to view files with a Group
4

Group CEL files from one or multiple projects

View files already selected
Select projects to choose Affy CEL files from
View all CEL files from selected Projects
5

Once the files are selected, group the files into
Sample Groups
You can add or delete sample groups
If you order the samples your data will be
reported in the same order
Select a reference sample. This sample will be
compared to all the other sample groups during
the analysis phase

6
Make sure all CEL files are in the correct
Sample Group
7

Start the normalization run
Data is processed using Affymetrix-specific
Bioconductor libraries
justRMA and justGCRMA are the main methods

Select processing method
8

The actual processing is moved off to the
Batch Scheduler, so large runs should not be a
problem
After analysis is complete, follow the link to
view the normalized data file and to start
differential expression analysis

After normalization a single file is produced,
with four annotation columns and one column of
log 2 expression values for each CEL file analyzed.
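
As a rough illustration of that layout, the sketch below reads such a file into annotation fields and per-CEL values. The tab delimiter, the annotation columns coming first, and every column name and value in the sample input are assumptions made for the example, not details taken from the pipeline.

```python
import csv
import io

ANNOTATION_COLUMNS = 4  # assumed to be the leading columns of each row


def read_normalized(handle):
    """Split each row into annotation fields and per-CEL log2 values."""
    reader = csv.reader(handle, delimiter="\t")  # tab delimiter is an assumption
    header = next(reader)
    cel_names = header[ANNOTATION_COLUMNS:]
    annotations, expression = {}, {}
    for row in reader:
        row_id = row[0]
        annotations[row_id] = row[:ANNOTATION_COLUMNS]
        values = [float(v) for v in row[ANNOTATION_COLUMNS:]]
        expression[row_id] = dict(zip(cel_names, values))
    return annotations, expression


# Made-up two-array input; every name and value here is illustrative.
example = (
    "Probe_set_id\tGene_Symbol\tGene_Title\tUnigene\tA1.CEL\tB1.CEL\n"
    "probe_1\tGeneA\texample title\tXx.1\t9.81\t10.02\n"
)
annotations, expression = read_normalized(io.StringIO(example))
print(expression["probe_1"])
```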

To view previous normalization runs, click the
Normalization tab

Click to view details
11

Start differential expression analysis
Two choices:
MEV - MultiExperiment Viewer from TIGR
Multtest - collection of multiple-testing
algorithms from Bioconductor

Multtest
MEV
12

Start MEV via Java Web Start
Data is pre-processed and a link is produced to
start MEV

Start SAM analysis or run t-test
Must have at least 2 replicates for each sample
group

14
SAM analysis

SAM analysis description from Tusher et al. [1]:
"SAM identifies genes with statistically
significant changes in expression by assimilating
a set of gene-specific t tests. Each gene is
assigned a score on the basis of its change in
gene expression relative to the standard
deviation of repeated measurements for that gene.
Genes with scores greater than a threshold are
deemed potentially significant. The percentage of
such genes identified by chance is the false
discovery rate (FDR). To estimate the FDR,
nonsense genes are identified by analyzing
permutations of the measurements. The threshold
can be adjusted to identify smaller or larger
sets of genes, and FDRs are calculated for each
set."
[1] Tusher VG, Tibshirani R, Chu G. Significance
analysis of microarrays applied to the ionizing
radiation response. Proc Natl Acad Sci U S A.
2001 Apr 24;98(9):5116-21. Epub 2001 Apr 17.
Erratum in: Proc Natl Acad Sci U S A. 2001 Aug
28;98(18):10515.
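
The per-gene score in that description can be sketched as follows. This is a minimal illustration of the relative-difference statistic from the cited paper, not the code the pipeline runs; the function name and the default s0 value are chosen for the example. The replicate check mirrors the requirement of at least 2 replicates per sample group.

```python
import math


def sam_score(x, y, s0=0.0):
    """Relative difference d = (mean_x - mean_y) / (s + s0) for one gene.

    x and y hold the replicate measurements of the gene in the two sample
    groups. s0 is the small positive constant SAM adds to the denominator;
    the 0.0 default here is only a placeholder for illustration.
    """
    n1, n2 = len(x), len(y)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least 2 replicates in each sample group")
    mean_x = sum(x) / n1
    mean_y = sum(y) / n2
    sum_sq = sum((v - mean_x) ** 2 for v in x) + sum((v - mean_y) ** 2 for v in y)
    a = (1.0 / n1 + 1.0 / n2) / (n1 + n2 - 2)
    s = math.sqrt(a * sum_sq)  # gene-specific scatter
    return (mean_x - mean_y) / (s + s0)  # undefined if s + s0 is zero


print(sam_score([3.0, 5.0], [1.0, 3.0]))
print(sam_score([3.0, 5.0], [1.0, 3.0], s0=1.0))
```

A larger s0 shrinks the score of every gene, and shrinks it most for genes whose own scatter s is small.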
The analysis run will automatically compare each
sample group to the reference sample
Example: four sample groups A, B, C, D with A as
the reference would produce 3 comparisons:
A_vs_B, A_vs_C, A_vs_D
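
A small sketch of how those comparison names follow from the sample groups and the chosen reference. The function name is hypothetical; only the A_vs_B naming pattern comes from the slide.

```python
def reference_comparisons(groups, reference):
    """Pair the reference sample group with every other sample group."""
    if reference not in groups:
        raise ValueError("reference must be one of the sample groups")
    return [f"{reference}_vs_{group}" for group in groups if group != reference]


print(reference_comparisons(["A", "B", "C", "D"], "A"))
# prints ['A_vs_B', 'A_vs_C', 'A_vs_D']
```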

15
Calculating the FDR

For each condition, the calculation is run over a
series of Delta cutoffs
At each step the Delta cutoff is plugged into an
equation that gives back a list of significant
genes and the FDR.
The program keeps track of which genes are
significant at each step and what the FDR is
for that group of genes.
At low Delta cutoffs lots of genes will be
returned but they will have a high FDR.
At higher Delta cutoffs a smaller list of genes
should be returned with a much lower FDR.
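
The stepping behaviour can be sketched as below. This is a deliberately simplified stand-in: it thresholds the absolute score directly and estimates the FDR as the average number of permuted scores passing the cutoff divided by the number of genes called, whereas SAM itself defines Delta relative to the expected ordered scores. All names and numbers are made up for the example.

```python
def sweep_cutoffs(observed, permuted, cutoffs):
    """Return (cutoff, significant genes, estimated FDR) for each cutoff.

    observed: dict mapping gene name to its score.
    permuted: list of score lists, one per permutation of the sample labels.
    """
    results = []
    for cutoff in sorted(cutoffs):
        called = [gene for gene, d in observed.items() if abs(d) >= cutoff]
        if not called:
            break  # nothing passes this cutoff or any higher one
        false_counts = [sum(1 for d in perm if abs(d) >= cutoff) for perm in permuted]
        expected_false = sum(false_counts) / len(false_counts)
        fdr = min(1.0, expected_false / len(called))
        results.append((cutoff, called, fdr))
    return results


observed = {"g1": 5.0, "g2": 3.0, "g3": 1.0, "g4": 0.5}
permuted = [[1.2, 0.4, 0.3, 0.1], [0.9, 0.6, 0.2, 3.5]]
for cutoff, genes, fdr in sweep_cutoffs(observed, permuted, [0.5, 2.0, 4.0]):
    print(cutoff, genes, fdr)
```

With these toy numbers the gene list shrinks from four genes to one as the cutoff rises, and the estimated FDR falls from 0.5 to 0.0.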

16
Gene Name FDR
Gene_1 50
Gene_2 50
Gene_3 50
Gene_4 50
Gene_5 50
Gene_6 50
Gene_7 50
Gene_8 50
Gene_9 50
Gene_10 50
Gene_11 50
Gene_12 50
Gene_13 50
Gene_14 50
Gene_15 50
Gene_16 50
Gene_17 50
Gene_18 50
Gene_19 50
Gene_20 50
Gene_21 50
Gene_22 50
Gene_23 50
Gene_24 50
Genes Returned at First Delta Cutoff
17
Gene Name FDR
Gene_1 10
Gene_2 10
Gene_3 10
Gene_4 10
Gene_5 10
Gene_6 10
Gene_7 10
Gene_8 10
Gene_9 10
Gene_10 10
Gene_11 10
Gene_12 50
Gene_13 50
Gene_14 50
Gene_15 50
Gene_16 50
Gene_17 50
Gene_18 50
Gene_19 50
Gene_20 50
Gene_21 50
Gene_22 50
Gene_23 50
Gene_24 50
Genes Returned at Second Delta Cutoff
18
Gene Name FDR
Gene_1 5
Gene_2 5
Gene_3 5
Gene_4 10
Gene_5 10
Gene_6 10
Gene_7 10
Gene_8 10
Gene_9 10
Gene_10 10
Gene_11 10
Gene_12 50
Gene_13 50
Gene_14 50
Gene_15 50
Gene_16 50
Gene_17 50
Gene_18 50
Gene_19 50
Gene_20 50
Gene_21 50
Gene_22 50
Gene_23 50
Gene_24 50
Genes Returned at Last Delta Cutoff
19

For each Delta cutoff a list of genes is
returned. Record the lowest FDR at which each
gene is found.
Rank the output according to the FDRs. The data
can then be queried on the FDR, which will return
a population of genes with a known FDR
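
The bookkeeping described here can be sketched with the numbers from the three example tables (24 genes at FDR 50, 11 at FDR 10, 3 at FDR 5). The function names are hypothetical, and this is an illustration rather than the pipeline's own code.

```python
def lowest_fdr_per_gene(steps):
    """steps: list of (genes returned at a cutoff, FDR of that gene list)."""
    best = {}
    for genes, fdr in steps:
        for gene in genes:
            if gene not in best or fdr < best[gene]:
                best[gene] = fdr
    # Rank genes by the lowest FDR at which they were found.
    return sorted(best.items(), key=lambda item: item[1])


def genes_below(ranked, max_fdr):
    """Query the ranked output for genes found at or below max_fdr."""
    return [gene for gene, fdr in ranked if fdr <= max_fdr]


names = [f"Gene_{i}" for i in range(1, 25)]
steps = [(names, 50), (names[:11], 10), (names[:3], 5)]
ranked = lowest_fdr_per_gene(steps)
print(genes_below(ranked, 10))
```

Querying at an FDR of 10 returns Gene_1 through Gene_11, matching the second and third tables.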
For each condition 2-3 data files will be created:
HTML file: contains a list of the genes with the
lowest FDRs, links to external annotation, and a
false color representation of the log 2
expression data.
Text file with all the ratio data. Columns
include:
Probe_set_id
Gene_Symbol, Gene_Title, Unigene, LocusLink,
Public_ID
FDR
SAM_ratio
mu_X
mu_Y
Log_2_Ratio
Log_10_Ratio
All log 2 expression values for the CEL files
used in the analysis
Text file of updated canonical names: produced
only if the data is uploaded into the Get
Expression table in SBEAMS. It tries to convert
each canonical name to a RefSeq protein accession
number or a LocusLink ID; if neither exists it
keeps the DNA accession number provided by
Affymetrix.
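
The canonical name fallback can be sketched as a simple preference chain. Trying the RefSeq protein accession before the LocusLink ID is an assumption (the slide does not state the order between the two), and the function name and the identifiers in the example are made up.

```python
def canonical_name(refseq_protein, locus_link_id, dna_accession):
    """Pick the best available identifier for a probe set.

    Assumed preference order: RefSeq protein accession, then LocusLink ID,
    then the DNA accession supplied by Affymetrix.
    """
    if refseq_protein:
        return refseq_protein
    if locus_link_id:
        return str(locus_link_id)
    return dna_accession


# Placeholder identifiers, not real accessions.
print(canonical_name("NP_000000", "12345", "AB000000"))
print(canonical_name(None, "12345", "AB000000"))
print(canonical_name(None, None, "AB000000"))
```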

20
Launch the data into Excel
View the web page directly
21
Example of the text output: all 45,000 rows for
mouse
Example of HTML output of the top genes
Results: genes found with less than 6% FDR
Sample Groups: Cast_none_Clean_Brain_vs_SJL_4wks_Infected_Brain
Number of differentially expressed genes: 18
Number of false positives: 1
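
As a quick check on how those figures relate, assuming the reported number of false positives is the expected count among the genes called, one false positive among 18 genes corresponds to an FDR just under the 6 threshold:

```python
called_genes = 18     # differentially expressed genes reported in the example
false_positives = 1   # false positives reported in the example

fdr = false_positives / called_genes
print(f"{fdr:.1%}")  # prints 5.6%
```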
22
Add data to Get Expression Table
23
Click button to upload data
24
Warning if Condition already exists in SBEAMS
Click Checkbox to ignore warning
25
Viewing expression data in Cytoscape

The goal was to add expression data to Cytoscape
with a minimal amount of effort.
Different protein networks could be added as
needed
We would also like to look at the data in a
graphical format to see which differentially
expressed genes are shared between different
conditions
Making a Cytoscape Expression Network
Query the data from the Get Expression page. The
following options must be selected:
Data Columns to Display: Log 10 Ratio, False
Discovery Rate
Display Options: Show all conditions, Pivot
Conditions as columns
Behind the scenes the program will sort each
condition by the FDR and take the top 100 genes.
If you select a False Discovery Constraint (which
is recommended) the gene must also fall below the
cutoff
Each condition will be a large diamond-shaped
node and each gene will be a smaller circle.
An edge is drawn between a condition and a gene
if the gene meets the above criteria.
Once this is done, a web page is presented with
four Java Web Start links
We are utilizing the Gaggle version of Cytoscape
from Paul S.
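
The edge selection described on this slide can be sketched as follows. It is an illustration only: the row layout, the function name, and the toy values are assumptions, and the real selection runs behind the Get Expression query.

```python
def build_edges(rows, top_n=100, fdr_cutoff=None):
    """rows: iterable of (condition, gene, fdr) tuples.

    For each condition, sort genes by FDR, keep the top_n, and draw an edge
    only if the gene also falls below the optional FDR cutoff.
    """
    by_condition = {}
    for condition, gene, fdr in rows:
        by_condition.setdefault(condition, []).append((fdr, gene))
    edges = []
    for condition, scored in by_condition.items():
        for fdr, gene in sorted(scored)[:top_n]:
            if fdr_cutoff is None or fdr < fdr_cutoff:
                edges.append((condition, gene, fdr))
    return edges


rows = [
    ("A_vs_B", "g1", 5), ("A_vs_B", "g2", 10), ("A_vs_B", "g3", 50),
    ("A_vs_C", "g1", 10), ("A_vs_C", "g4", 50),
]
edges = build_edges(rows, top_n=2, fdr_cutoff=20)
print(edges)
```

In this toy input g1 ends up connected to both condition nodes, which is the kind of shared gene the network view is meant to expose.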

26
Select Project
Select Conditions
Select required display fields: Log10 ratio and
False Discovery Rate
Select required query options: Show all Conditions
and Pivot Conditions
27
Number of genes selected.
Click to start Gaggle version of Cytoscape
28
(No Transcript)
29

The Gaggle Boss window manages all the programs
that can talk to one another

Launching Cytoscape loads the expression network
Each condition is a large diamond-shaped node
Genes are circle nodes
Edges indicate that the gene is differentially
expressed within the condition it is connected to
Node colors are mapped to the Log10 expression
ratio
Condition A_vs_B:
Green: overexpression in sample A
Red: overexpression in sample B
Edge colors are mapped to the significance value
(the false discovery rate in this example)

The data matrix browser brings in all the
expression data
Currently loads the Significance values and Log
10 ratios for each condition

Click the folder first
Both the Log10 ratio and p-value will be loaded.
Click the load data icon second
p-value is currently used to hold any
significance measurement
32

All the expression data found in the SQL query
will be loaded
Remember that even if only one condition fell
below the significance cutoff for a particular
gene, the data for all conditions will be shown
Only the top 100 genes (currently) will be shown
on the Cytoscape map
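
The "any condition passes, all conditions shown" behaviour can be sketched like this. The row layout, function name, and values are hypothetical; the actual filtering happens in the SQL query.

```python
def pivot_conditions(rows, fdr_cutoff):
    """rows: iterable of (gene, condition, log10_ratio, fdr) tuples.

    Pivot conditions into columns and keep a gene if at least one condition
    falls below the cutoff; kept genes retain the data for every condition.
    """
    table = {}
    for gene, condition, ratio, fdr in rows:
        table.setdefault(gene, {})[condition] = (ratio, fdr)
    return {
        gene: columns
        for gene, columns in table.items()
        if any(fdr < fdr_cutoff for _, fdr in columns.values())
    }


rows = [
    ("g1", "A_vs_B", 0.8, 5), ("g1", "A_vs_C", 0.1, 50),
    ("g2", "A_vs_B", 0.05, 50), ("g2", "A_vs_C", 0.02, 50),
]
pivoted = pivot_conditions(rows, fdr_cutoff=10)
print(pivoted)
```

Here g1 is kept with both of its conditions even though only A_vs_B passes the cutoff, while g2 is dropped entirely.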

33
Select a condition to map the expression data
onto the Cytoscape map
Run a movie to alternate between the different
conditions' expression values
34
Broadcast the selected genes to all the programs
listening on the Gaggle
Select Genes of interest
35

Six genes were broadcast from the Cytoscape
window to the DMV (the data matrix browser)
To view the expression profile of the selected
genes across all conditions, click the Graph button

Number of genes Currently selected
36

View the Graph with or without the condition names

37
(No Transcript)
38
Select a single gene from the profile
Broadcast the single selection
39
To find genes with a similar expression profile
use the correlation finder
Select the threshold and click Select in browser
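
One way such a correlation search could work is sketched below, using Pearson correlation against the selected gene's profile. Whether the correlation finder actually uses Pearson correlation is an assumption, and the function names and profiles are made up for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # flat profile: correlation is undefined, treat as no match
    return cov / math.sqrt(var_a * var_b)


def similar_profiles(profiles, query, threshold):
    """Genes whose profile correlates with the query gene at or above threshold."""
    target = profiles[query]
    return [
        gene
        for gene, profile in profiles.items()
        if gene != query and pearson(target, profile) >= threshold
    ]


profiles = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [2.0, 4.0, 6.0],
    "g3": [3.0, 2.0, 1.0],
    "g4": [1.0, 1.0, 1.0],
}
print(similar_profiles(profiles, "g1", 0.9))
```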
40
(No Transcript)
41
Use the Volcano plot to get an overview of the
data: significance value vs. expression value.
Select the condition to graph
42
Select genes of interest and broadcast to the
other programs
43
Highlight genes by their GO terms
Click the annotation type to view
Start Annotation viewer
44
Choose a GO annotation level
Click the Button
45
Select the GO term of interest. The genes will
be highlighted in the Cytoscape Window.
46
Make a custom view to display data in Cytoscape
in a spreadsheet
47
(No Transcript)
