Overview of Affy Analysis Pipeline - PowerPoint PPT Presentation

Slides: 48
Provided by: PMO70
1
Overview of Affy Analysis Pipeline
  • Help Pages
  • http//affy

2
(No Transcript)
3
  • Main starting point for an analysis run
  • Three main stop/starting points
  • Group files
  • Normalize data
  • Analyze data

Choose Project
Start New Analysis
Previous analysis sessions
Click to view files within a group
4
  • Group CEL files from one or multiple projects

View files already selected
Select projects to choose Affy CEL files from
View all CEL files from selected Projects
5
  • Once the files are selected, group the files into
    Sample Groups
  • You can add or delete sample groups
  • If you order the samples your data will be
    reported in the same order
  • Select a reference sample. This sample will be
    compared to all the other sample groups during
    the analysis phase

6
Make sure all CEL files are in the correct
sample group
7
  • Start the normalization run
  • Data is analyzed using Affymetrix-specific Bioconductor
    libraries
  • justRMA or justGCRMA are the main methods

Select processing method
8
  • The actual processing is moved off to the
    batch scheduler, so large runs should not be a
    problem
  • After analysis is complete, follow the link to
    view the normalized data file and to start
    differential expression analysis

9
  • After normalization a single file is produced.
    It contains four annotation columns and one column
    of log 2 expression values for each CEL file analyzed.
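
A minimal sketch of reading a file with this layout. It assumes the file is tab-delimited with a header row and that the four annotation columns come first; the column names and values in the example are made up for illustration and may not match the real output.

```python
import csv
import io

N_ANNOTATION_COLS = 4  # assumption: the annotation columns come first


def read_normalized(handle):
    """Split each row into annotation fields and per-CEL log2 values."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    cel_names = header[N_ANNOTATION_COLS:]
    rows = []
    for fields in reader:
        annotation = fields[:N_ANNOTATION_COLS]
        log2_values = [float(v) for v in fields[N_ANNOTATION_COLS:]]
        rows.append((annotation, dict(zip(cel_names, log2_values))))
    return cel_names, rows


# Hypothetical two-CEL example; real column names may differ.
example = (
    "Probe_set_id\tGene_Symbol\tGene_Title\tPublic_ID\ta.CEL\tb.CEL\n"
    "probe_1\tGENE1\texample gene\tACC1\t7.25\t8.5\n"
)
cel_names, rows = read_normalized(io.StringIO(example))
```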

10
  • To view previous normalization runs, click the
    Normalization tab

Click to view details
11
  • Start Differential expression analysis
  • Two Choices
  • MeV: MultiExperiment Viewer from TIGR
  • Multtest: collection of algorithms from
    Bioconductor

Multtest
MEV
12
  • Start MEV via Java Web Start
  • Data is pre-processed and a link is produced to
    start MEV

13
  • Start SAM analysis or run t-test
  • Must have at least 2 replicates for each sample
    group
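
The replicate requirement can be checked before a run is started. This is only a sketch: the dictionary layout (sample group name mapped to its CEL files) and the file names are assumptions, not the pipeline's actual data model.

```python
MIN_REPLICATES = 2  # from the slide: at least 2 replicates per sample group


def groups_lacking_replicates(sample_groups):
    """Return the names of sample groups with too few CEL files."""
    return sorted(
        name
        for name, cel_files in sample_groups.items()
        if len(cel_files) < MIN_REPLICATES
    )


# Hypothetical grouping: group B has only one CEL file.
sample_groups = {"A": ["a1.CEL", "a2.CEL"], "B": ["b1.CEL"]}
too_small = groups_lacking_replicates(sample_groups)
```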

14
SAM analysis
  • SAM analysis description from Tusher et al. [1]:
    "SAM identifies genes with statistically
    significant changes in expression by assimilating
    a set of gene-specific t tests. Each gene is
    assigned a score on the basis of its change in
    gene expression relative to the standard
    deviation of repeated measurements for that gene.
    Genes with scores greater than a threshold are
    deemed potentially significant. The percentage of
    such genes identified by chance is the false
    discovery rate (FDR). To estimate the FDR,
    nonsense genes are identified by analyzing
    permutations of the measurements. The threshold
    can be adjusted to identify smaller or larger
    sets of genes, and FDRs are calculated for each
    set."
    [1] Significance analysis of microarrays
    applied to the ionizing radiation response. Proc
    Natl Acad Sci U S A. 2001 Apr 24;98(9):5116-21.
    Epub 2001 Apr 17. Erratum in: Proc Natl Acad Sci
    U S A 2001 Aug 28;98(18):10515.
  • The analysis run will automatically compare each
    sample group to the reference sample
  • Example: four sample groups A, B, C, D, with A as
    the reference, would produce 3 comparisons:
    A_vs_B, A_vs_C, A_vs_D
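
The comparison naming in the example can be sketched as below. The function name is hypothetical, and the sketch assumes A is the selected reference group, which is what the A_vs_B style names suggest.

```python
def reference_comparisons(groups, reference):
    """Pair the reference group with every other sample group."""
    if reference not in groups:
        raise ValueError(f"unknown reference group: {reference}")
    return [f"{reference}_vs_{group}" for group in groups if group != reference]


comparisons = reference_comparisons(["A", "B", "C", "D"], "A")
```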

15
Calculating the FDR
  • For each condition, the calculation is run in steps
  • At each step the Delta cutoff is plugged into an
    equation that gives back a list of significant
    genes and the FDR.
  • The program keeps track of which genes are
    significant at each step and what the FDR is
    for that group of genes.
  • At low Delta cutoffs lots of genes will be
    returned but they will have a high FDR.
  • At higher Delta cutoffs a smaller list of genes
    should be returned with much lower FDRs.

16
Gene Name FDR
Gene_1 50
Gene_2 50
Gene_3 50
Gene_4 50
Gene_5 50
Gene_6 50
Gene_7 50
Gene_8 50
Gene_9 50
Gene_10 50
Gene_11 50
Gene_12 50
Gene_13 50
Gene_14 50
Gene_15 50
Gene_16 50
Gene_17 50
Gene_18 50
Gene_19 50
Gene_20 50
Gene_21 50
Gene_22 50
Gene_23 50
Gene_24 50
Genes Returned at First Delta Cutoff
17
Gene Name FDR
Gene_1 10
Gene_2 10
Gene_3 10
Gene_4 10
Gene_5 10
Gene_6 10
Gene_7 10
Gene_8 10
Gene_9 10
Gene_10 10
Gene_11 10
Gene_12 50
Gene_13 50
Gene_14 50
Gene_15 50
Gene_16 50
Gene_17 50
Gene_18 50
Gene_19 50
Gene_20 50
Gene_21 50
Gene_22 50
Gene_23 50
Gene_24 50
Genes Returned at Second Delta Cutoff
18
Gene Name FDR
Gene_1 5
Gene_2 5
Gene_3 5
Gene_4 10
Gene_5 10
Gene_6 10
Gene_7 10
Gene_8 10
Gene_9 10
Gene_10 10
Gene_11 10
Gene_12 50
Gene_13 50
Gene_14 50
Gene_15 50
Gene_16 50
Gene_17 50
Gene_18 50
Gene_19 50
Gene_20 50
Gene_21 50
Gene_22 50
Gene_23 50
Gene_24 50
Genes Returned at Last Delta Cutoff
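
The three tables above can be reproduced with a small sketch that keeps, for each gene, the lowest FDR at which it was returned. The input layout (one gene list and one FDR per Delta cutoff) is an assumption made for illustration.

```python
def lowest_fdr_per_gene(cutoff_results):
    """cutoff_results: list of (genes returned, FDR) pairs, one per Delta cutoff."""
    best = {}
    for genes, fdr in cutoff_results:
        for gene in genes:
            if gene not in best or fdr < best[gene]:
                best[gene] = fdr
    return best


def first_genes(n):
    """Gene_1 .. Gene_n, matching the names used in the tables."""
    return [f"Gene_{i}" for i in range(1, n + 1)]


# First cutoff: 24 genes at FDR 50; second: 11 genes at 10; last: 3 genes at 5.
cutoff_results = [(first_genes(24), 50), (first_genes(11), 10), (first_genes(3), 5)]
best = lowest_fdr_per_gene(cutoff_results)

# Rank by FDR so the list can be queried for genes below a chosen FDR.
ranked = sorted(best.items(), key=lambda item: item[1])
```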
19
  • For each Delta cutoff a list of genes is
    returned. Record the lowest FDR a gene is found
    at.
  • Rank the output according to the FDRs. The
    data can then be queried on the FDR, which returns
    a population of genes with a known FDR.
  • For each condition 2-3 data files will be created:
  • HTML file: contains a list of genes with the
    lowest FDRs. Also provides links to external
    annotation, plus a false color representation of
    the log 2 expression data.
  • Text file with all the ratio data. Columns
    include:
      • Probe_set_id
      • Gene_Symbol, Gene_Title, Unigene, LocusLink,
        Public_ID
      • FDR
      • SAM_ratio
      • mu_X
      • mu_Y
      • Log_2_Ratio
      • Log_10_Ratio
      • All log 2 expression values for the CEL files
        used in the analysis
  • Text file of updated canonical names:
      • This file is produced if the data is uploaded
        into the Get Expression table in SBEAMS
      • Tries to turn all the canonical names into RefSeq
        protein accession numbers or a LocusLink ID;
        if neither exists, it keeps the DNA accession
        number provided by Affymetrix
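
The fallback order for canonical names can be sketched as follows. The function name and the identifier values are hypothetical, and the preference of RefSeq over LocusLink is an assumption based on the order in which they are listed; how the pipeline actually looks the identifiers up is not described here.

```python
def canonical_name(refseq_protein, locuslink_id, dna_accession):
    """Prefer RefSeq protein, then LocusLink, else the Affymetrix DNA accession."""
    for candidate in (refseq_protein, locuslink_id):
        if candidate:
            return candidate
    return dna_accession


# Made-up identifiers, for illustration only.
with_refseq = canonical_name("NP_EXAMPLE", "12345", "DNA_EXAMPLE")
with_locuslink = canonical_name(None, "12345", "DNA_EXAMPLE")
dna_only = canonical_name(None, None, "DNA_EXAMPLE")
```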

20
Launch the data into Excel
View the web page directly
21
Example of the text output: all 45,000 rows for
mouse
Example of HTML output of the top genes
Results: genes found with an FDR of less than 6
Sample Groups: Cast_none_Clean_Brain_vs_SJL_4wks_Infected_Brain
Number of differentially expressed genes: 18
Number of false positives: 1
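
The numbers on this slide are consistent with reading the FDR as the expected false positives divided by the genes called, expressed as a percentage. A small worked check under that assumption (the function name is hypothetical):

```python
def fdr_percent(false_positives, genes_called):
    """FDR as a percentage of the genes called significant."""
    if genes_called == 0:
        raise ValueError("no genes were called significant")
    return 100.0 * false_positives / genes_called


# 1 false positive among 18 differentially expressed genes.
fdr = fdr_percent(1, 18)
below_cutoff = fdr < 6
```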
22
Add data to Get Expression Table
23
Click button to upload data
24
Warning if Condition already exists in SBEAMS
Click Checkbox to ignore warning
25
Viewing expression data in Cytoscape
  • We wanted to add expression data to Cytoscape with a
    minimal amount of effort.
  • Different protein networks could be added as
    needed.
  • We would also like to look at the data in a
    graphical format to see which differentially
    expressed genes are shared between different
    conditions.
  • Making a Cytoscape expression network:
  • Query the data from the Get Expression page. The
    following options must be selected:
  • Data Columns to Display: Log 10 Ratio, False
    Discovery Rate
  • Display Options: Show all conditions, Pivot
    Conditions as columns
  • Behind the scenes the program sorts each
    condition by FDR and takes the top 100 genes.
    If you select a False Discovery Constraint (which
    is recommended), the gene must also fall below the
    cutoff.
  • Each condition is a large diamond-shaped
    node and each gene is a smaller circle.
    An edge is drawn between a condition and a gene if the
    gene meets the above criteria.
  • Once this is done, a web page is presented with 4
    Java Web Start links.
  • We are utilizing the Gaggle version of Cytoscape
    from Paul S.
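
The selection described above (sort each condition by FDR, take the top 100 genes, apply the optional FDR constraint, then draw condition-to-gene edges) can be sketched as follows. The data layout, function name, and example values are assumptions; this is not the pipeline's own code.

```python
from collections import Counter

TOP_N = 100  # from the slide: top 100 genes per condition


def build_edges(fdr_by_condition, fdr_cutoff=None, top_n=TOP_N):
    """Return (condition, gene, fdr) edges for genes that meet the criteria."""
    edges = []
    for condition, gene_fdr in fdr_by_condition.items():
        ranked = sorted(gene_fdr.items(), key=lambda item: item[1])[:top_n]
        for gene, fdr in ranked:
            # With a constraint set, the gene must fall below the cutoff.
            if fdr_cutoff is None or fdr < fdr_cutoff:
                edges.append((condition, gene, fdr))
    return edges


# Hypothetical FDR values for two conditions.
fdr_by_condition = {
    "A_vs_B": {"g1": 5, "g2": 10, "g3": 50},
    "A_vs_C": {"g1": 10, "g4": 5},
}
edges = build_edges(fdr_by_condition, fdr_cutoff=20)

# Genes connected to more than one condition are the shared ones.
gene_counts = Counter(gene for _, gene, _ in edges)
shared = sorted(gene for gene, count in gene_counts.items() if count > 1)
```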

26
Select Project
Select Conditions
Select required display fields: Log10 ratio and
False Discovery Rate
Select required query options: Show all Conditions
and Pivot Conditions
27
Number of genes selected.
Click to start Gaggle version of Cytoscape
28
(No Transcript)
29
  • The Gaggle Boss window manages all the programs
    that can talk to one another

30
  • Launching Cytoscape loads the expression network
  • Each condition is the large diamond shaped node
  • Genes are circle nodes
  • Edges indicate that the gene is expressed within
    the condition it is connected to
  • Node colors are mapped to the Log10 expression
    ratio
  • Condition: A_vs_B
  • Green: overexpression in sample A
  • Red: overexpression in sample B
  • Edge colors are mapped to the significance value
    (the false discovery rate in this example)

31
  • The data matrix browser brings in all the
    expression data
  • Currently loads the Significance values and Log
    10 ratios for each condition

Click the folder first
Both the Log10 ratio and p-value will be loaded.
Click the load data icon second
p-value is currently used to hold any
significance measurement
32
  • All the expression data will be loaded that was
    found in the SQL query
  • Remember that if only one condition fell below
    the significance cutoff for a particular gene,
    the data for all conditions will be shown
  • Only the top 100 genes (currently) will be shown
    on the Cytoscape map
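
The loading rule above (a gene is kept with its data for every condition as soon as any one condition falls below the significance cutoff) can be sketched like this. The table layout and values are hypothetical.

```python
def rows_to_load(fdr_table, fdr_cutoff):
    """Keep a gene, with all of its conditions, if any condition passes the cutoff."""
    return {
        gene: per_condition
        for gene, per_condition in fdr_table.items()
        if any(fdr < fdr_cutoff for fdr in per_condition.values())
    }


# Made-up FDR values: g1 passes in one condition only, g2 passes in none.
fdr_table = {
    "g1": {"A_vs_B": 5, "A_vs_C": 50},
    "g2": {"A_vs_B": 50, "A_vs_C": 50},
}
loaded = rows_to_load(fdr_table, fdr_cutoff=10)
```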

33
Select a condition to map the expression data
onto the Cytoscape map
Run a movie to alternate between the different
conditions' expression values
34
Broadcast the selected genes to all the programs
listening on the Gaggle
Select Genes of interest
35
  • Six genes were broadcast from the Cytoscape
    window to the DMV
  • To view the expression profile of the selected
    genes across all conditions, click the Graph button

Number of genes Currently selected
36
  • View the Graph with or without the condition names

37
(No Transcript)
38
Select a single gene from the profile
Broadcast the single selection
39
To find genes with a similar expression profile
use the correlation finder
Select the threshold and click Select in browser
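
One way to picture a correlation-based search like this is a Pearson correlation between the selected gene's profile and every other profile, keeping genes at or above the threshold. Whether the correlation finder actually uses Pearson correlation is an assumption, and the profiles below are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def similar_genes(profiles, query, threshold):
    """Genes whose profile correlates with the query at or above the threshold."""
    target = profiles[query]
    return sorted(
        gene
        for gene, profile in profiles.items()
        if gene != query and pearson(target, profile) >= threshold
    )


# Hypothetical Log10 ratios across three conditions.
profiles = {
    "g1": [0.1, 0.5, 0.9],
    "g2": [0.2, 1.0, 1.8],  # same shape as g1
    "g3": [0.9, 0.5, 0.1],  # opposite shape
}
matches = similar_genes(profiles, "g1", threshold=0.9)
```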
40
(No Transcript)
41
Use the Volcano plot to get an overview of the
data: significance value vs. expression value.
Select the condition to graph
42
Select genes of interest and broadcast to the
other programs
43
Highlight genes by their GO ontology
Click the annotation type to view
Start Annotation viewer
44
Choose a GO annotation level
Click the button
45
Select the GO term of interest. The genes will
be highlighted in the Cytoscape Window.
46
Make a custom view to display data in Cytoscape
in a spreadsheet
47
(No Transcript)