Title: PUMAdb: Data Analysis Tutorial
1 PUMAdb Data Analysis Tutorial
2 User Help: Help, Tutorials and Workshops
- Help and FAQ
- http://puma.princeton.edu/help/
- http://puma.princeton.edu/help/FAQ.shtml
- Tutorials (regularly scheduled)
- Welcome tutorial
- Data analysis, normalization and clustering
- Interested? Email array_at_genomics.princeton.edu
- Hybridization and scanning: individual instruction
- Email dstorton_at_molbio.princeton.edu
3 PUMAdb Data Analysis
- Data Analysis Background
- Data normalization
- Clustering algorithms
- Data centering
- Using the Database's Analysis Pipeline
- Gene Selection and Annotation
- Data Filtering
- Data Retrieval
- Gene Filtering
- Clustering and Image Generation
4 Data Analysis Background
- Data normalization
- Transforms data for cross-array comparison by eliminating or compensating for some biases.
- Clustering algorithms
- Identify and reveal patterns within the data.
- Data centering
- Transforms data for within-array comparison.
5 What is data normalization?
- Normalization is an attempt to correct for systematic bias in data.
- Normalization allows you to compare data from one array to another.
- In practice, we do not always understand the data; inevitably, some biology will be removed too (or at least not revealed).
6 [Figure panels: Tumor; Pool of Cell Lines]
7 Such biases have consequences
- Plotting the frequency of un-normalized intensities reveals the differential effect between the two channels.
8 How do we deal with this?
- Normalization
- In general, an assumption is made that the average gene does not change.
- You need to understand your data to know whether that is an appropriate assumption.
- The number of reporters (clones or genes) you are assaying will affect this.
9 Normalization
10 Effect on log ratios
[Figure: frequency of log ratios, un-normalized vs. normalized]
11 Total Intensity Normalization
- For those spots that are thought to be well measured, calculate the mean or median log ratio.
- Use this as a normalization factor to adjust all log ratios.
- Equivalent to assuming the same total intensity in both channels.
- Our current software
- provides two simple methods for selecting well-measured spots: pixel-by-pixel regression, and foreground over background intensity.
- calculates normalized values for all channel 2 measurements and ratios.
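The calculation on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not PUMAdb's implementation: the function name and sample values are made up, and the median is used although the slide allows either the mean or the median.

```python
from statistics import median

def normalize_log_ratios(log_ratios, well_measured):
    """Shift all log ratios so that the well-measured spots have a median of zero."""
    # Estimate the normalization factor from well-measured spots only.
    reference = [r for r, ok in zip(log_ratios, well_measured) if ok]
    factor = median(reference)
    # Apply the same factor to every spot on the array, well measured or not.
    return factor, [r - factor for r in log_ratios]

# Hypothetical log ratios for five spots; the last two are poorly measured.
log_ratios = [0.9, 1.1, 1.0, 3.0, -2.0]
well_measured = [True, True, True, False, False]
factor, normalized = normalize_log_ratios(log_ratios, well_measured)
print(factor, normalized)
```

Subtracting a constant from log ratios is the same as dividing the linear ratios by a constant, which is why this amounts to rescaling one channel against the other.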
12 Normalization by Subset
- Housekeeping genes
- Calculate normalization based on biologically determined stable genes.
- Not always valid: even very stable genes can respond to some conditions.
- Spiking or doping controls
- Calculate based on introduced DNA species.
- Requires careful measurement of total DNA in each channel.
- Our software accepts a global (per-array), user-defined normalization factor for this purpose.
14 Clustering Algorithms
In microarray studies, we often use clustering algorithms to help us identify patterns in complex data. For example, we can randomize the data used to represent this painting and see if clustering will help us visualize the pattern.
15 Clustering algorithms
The painting is sliced into rows, which are then randomized.
16 Clustering algorithms
Rows ordered by hierarchical clustering, with nodes flipped to optimize ordering
17 Clustering algorithms
Rows ordered by Self-Organizing Maps
18 Clustering: Random vs. Biological Data
From Eisen MB, et al., PNAS 1998; 95(25):14863-8
19 How does clustering work?
1. Compare all expression patterns to each other.
2. Join the two patterns that are the most similar out of all patterns.
3. Compare all joined and unjoined patterns.
4. Go to step 2, and repeat until all patterns are joined.
20 How do we compare expression profiles?
- Treat expression data for a gene as a multidimensional vector.
- Decide on a distance metric to compare the vectors.
- Plenty to choose from:
- Pearson correlation, Euclidean distance, Manhattan distance, etc.
21 Expression Vectors
- Crucial concept for understanding clustering
- Each gene is represented by a vector whose coordinates are its values (log(ratio)) in each experiment
- x = log(ratio) in expt1
- y = log(ratio) in expt2
- z = log(ratio) in expt3
- etc.
22 Distance Metrics
- Distances are measured between expression vectors
- Distance metrics define the way we measure distances
- Many different ways to measure distance:
- Euclidean distance
- Pearson correlation coefficient(s)
- Manhattan distance
- Mutual information
- Kendall's Tau
- etc.
- Each has different properties and can reveal different features of the data
23 Euclidean distance
- The Euclidean distance metric detects similar vectors by identifying those that are closest in space.
- In this example, A and C are closest to one another.
24 Pearson correlation
- The Pearson correlation disregards the magnitude of the vectors but instead compares their directions.
- In this example, Gene A and Gene B have the same slope, so would be most similar to each other.
25 Distance Metric: Pearson vs. Euclidean
[Figure: expression profiles of genes A, B and C]
- By Euclidean distance, A and B are most similar.
- By Pearson correlation, A and C are most similar.
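A small numerical sketch can make the contrast concrete. The three vectors below are invented for illustration (they are not the values plotted on the slide), chosen so that A and B are close in space while A and C share the same shape at very different magnitudes.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pearson(u, v):
    """Pearson correlation: compares the shape of two profiles, not their size."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du, dv = [a - mu for a in u], [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den

# Hypothetical log-ratio profiles across three experiments.
A = [1.0, 2.0, 3.0]
B = [1.5, 2.5, 2.0]     # close to A in space, but a different shape
C = [10.0, 20.0, 30.0]  # same shape as A, much larger magnitude

print("Euclidean A-B:", euclidean(A, B), " A-C:", euclidean(A, C))
print("Pearson   A-B:", pearson(A, B), " A-C:", pearson(A, C))
```

With these values the two metrics disagree about which gene is A's nearest neighbor, which is the point of the slide.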
26 Hierarchical Clustering
1. Calculate the distance between all genes. Find the smallest distance. If several pairs share the same similarity, use a predetermined rule to decide between alternatives.
2. Fuse the two selected clusters to produce a new cluster that now contains at least two objects. Calculate the distance between the new cluster and all other clusters.
3. Repeat steps 1 and 2 until only a single cluster remains.
4. Draw a tree representing the results.
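The loop described above can be sketched as follows. Several choices here are assumptions that the slide does not specify: distance is taken as 1 minus the Pearson correlation, cluster-to-cluster distance uses average linkage, ties are broken by taking the first pair found, and the tree is returned as nested tuples rather than drawn.

```python
import math
from itertools import combinations

def pearson_distance(u, v):
    """1 - Pearson correlation; assumes neither vector is constant."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du, dv = [x - mu for x in u], [x - mv for x in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return 1.0 - num / den

def linkage(members_a, members_b, vectors):
    """Average distance over all cross-cluster gene pairs (average linkage)."""
    pairs = [(a, b) for a in members_a for b in members_b]
    return sum(pearson_distance(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

def hierarchical_cluster(vectors):
    # Each cluster is (tree, member names); start with one cluster per gene.
    clusters = [(name, [name]) for name in vectors]
    while len(clusters) > 1:
        # Step 1: find the closest pair; min() keeps the first pair on ties.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]][1], clusters[p[1]][1], vectors))
        # Step 2: fuse the pair into a new cluster and drop the two originals.
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    # Step 3 ends when one cluster remains; its nested tuples are the tree.
    return clusters[0][0]

# Hypothetical profiles: g1/g2 rise across experiments, g3/g4 fall.
vectors = {"g1": [1.0, 2.0, 3.0], "g2": [2.0, 4.0, 6.5],
           "g3": [3.0, 2.0, 1.0], "g4": [6.0, 4.0, 2.5]}
tree = hierarchical_cluster(vectors)
print(tree)
```

Recomputing every linkage on each pass is simple but slow; it is meant only to mirror the steps on the slide.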
27 Clustering: Optimizing node order
- When joining a gene vector to another, it is important to think about the order in which the nodes are joined.
- In this example, ASH1 is allegedly most similar to PIR1, so their patterns are displayed adjacent to one another.
28 And we finally get a cluster
29 Clustering: Two-way clustering
- Just as gene patterns are clustered, array patterns can be clustered.
- All the data points for an array can be used to construct a vector for that array, and the vectors of multiple arrays can be compared.
30 Clustering: Two-way Clustering
Two-way clustering can help show which samples are most similar, as well as which genes.
31 So is clustering the solution?
- Advantages
- Simple
- Easy to implement
- Easy to visualize
- Disadvantages
- Can lead to incorrect/incomplete conclusions
- Discarding of subtleties in 2-way clustering
- May be driven by strong sub-clusters
32 Clustering: Partitioning Methods
- Split data up into smaller, more homogeneous sets
- Should avoid artifacts associated with incorrectly joining dissimilar vectors
- Can cluster each partition independently of others
- Self-Organizing Maps is one partitioning method
33 Clustering: Self-Organizing Maps
- SOMs result in genes being assigned to partitions of most similar genes.
- Neighboring partitions are more similar to each other than they are to distant partitions.
34 The $64,000 question
- How many partitions do I use?
- Ask a statistician
- Tibshirani R, et al. (2000) Estimating the number of clusters in a dataset via the Gap statistic
- http://www-stat.stanford.edu/tibs/ftp/gap.pdf
- Ask us, and we'll say "trial and error" :-)
- The ideal outcome is a single expression pattern in each partition, and each partition distinct from the others.
36 Data Centering
- Centering sets the average value of a vector to zero.
- This results in a loss of information, but may reveal important patterns.
37 Data Centering
- Gene centering is useful when the actual value of the ratio is not important or is not meaningful (e.g., common reference).
- Centering is generally not appropriate when using a biologically meaningful control sample, such as a matched, untreated sample, or a zero timepoint.
38 Data Transformation: Centering
- To illustrate how centering affects data, a small sample of data was duplicated. A constant was added to the second copy of each row.
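The same experiment is easy to reproduce on a single row. The sketch below assumes mean centering (the deck says only that the average is set to zero) and uses invented values; `None` stands in for a missing measurement.

```python
def center(values):
    """Subtract the row mean from every present value; missing values stay missing."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [None if v is None else v - mean for v in values]

# A hypothetical gene row and a copy with a constant added to every value.
row = [1.0, 2.0, 3.0]
shifted = [v + 5.0 for v in row]

print(center(row))      # both rows become identical after centering
print(center(shifted))
```

After centering, the two copies are indistinguishable: the constant offset is the information that centering discards.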
39 Data Centering: Effects of Different Centering Strategies
[Figure panels:]
- Uncentered data, no centering metric during clustering
- Uncentered data, centering metric during clustering
- Centered data, no centering metric during clustering
- Centered data, centering metric during clustering
41 Data Retrieval and Analysis
- Experiment names will be listed, with the feature extraction software indicated.
42 Gene Selection and Annotation
- Specify genes or clones
- Collapse data by SUID or LUID
- Determine UID column
- Choose biological annotation
- Label result set
43 Gene Selection: Specify Genes or Clones
- Use all genes or clones on an array
- Select a genelist from your loader account
- Enter a list of genes to select. The names should be separated by two colons (::)
44 Gene Selection: All genes
- Ten arrays
- All genes
- No control or empty spots, Spot flag 0
- 8690 SUIDs used in cluster
- Using all genes results in a very long cluster!
45 Gene Selection: Genelists
- Ten arrays
- 500-gene genelist
- No control or empty spots, Spot flag 0
- 380 SUIDs used for cluster
- Using a genelist reduces the length of the cluster
46 Gene Selection: Specify Genes or Clones
- Using all genes or clones on an array will give you a very long list of genes. This is the best option when you have no pre-existing expectations about your data and simply want to see what is happening.
- Selecting a genelist from your loader account will give you a more select group of genes. This can be appropriate for testing hypotheses.
47 Gene Selection: Retrieving and Collapsing Data
- Collapsing (averaging) occurs within a single array. Multiple instances of the same entity will be combined as specified.
- Duplicated entities can be defined in three ways:
- Sequence Unique ID (the identifier for a reporter). A SUID refers to the sequence itself.
- Laboratory Unique ID (the identifier for the source of the sample in the lab). An LUID refers to a specific microtiter well. Multiple LUIDs may correspond to one SUID.
- SPOT (the number corresponding to a feature on a print). This option only appears for retrieval from a single print (array design). Multiple spots/features on an array may contain a single LUID or SUID.
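The effect of the collapse key can be sketched with a toy array. The record layout, field names and values below are hypothetical, and the sketch assumes that collapsing means taking the mean of the duplicated measurements.

```python
from collections import defaultdict
from statistics import mean

def collapse(spots, key):
    """Average the log ratios of all spots on one array that share the same ID."""
    groups = defaultdict(list)
    for spot in spots:
        if spot["log_ratio"] is not None:  # skip missing measurements
            groups[spot[key]].append(spot["log_ratio"])
    return {uid: mean(values) for uid, values in groups.items()}

# Hypothetical spots from a single array: S1 was printed from two different wells.
spots = [
    {"spot": 1, "luid": "L1", "suid": "S1", "log_ratio": 1.0},
    {"spot": 2, "luid": "L2", "suid": "S1", "log_ratio": 2.0},
    {"spot": 3, "luid": "L3", "suid": "S2", "log_ratio": -0.5},
]

by_suid = collapse(spots, "suid")  # two rows: the S1 spots are averaged
by_luid = collapse(spots, "luid")  # three rows: each well stays separate
print(by_suid, by_luid)
```

Collapsing by LUID keeps more rows than collapsing by SUID whenever one sequence was printed from several wells.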
48 Gene Selection: Collapse by SUID
- Ten arrays
- 500-gene genelist
- No control or empty spots, Spot flag 0
- 380 SUIDs used for cluster
49 Gene Selection: Collapse by LUID
- Ten arrays
- 500-gene genelist
- No control or empty spots
- Retrieve by LUID
- 397 LUIDs used for cluster
- Retrieving via LUIDs may increase the number of gene vectors generated
50 Gene Selection: Collapse Data
- Retrieving by SUID (the database's identifier for the sequence) yields 380 genes -- samples that came from different microtiter wells will be collapsed if they are called the same sequence.
- Retrieving by LUID (the identifier for the original microtiter well location of the sample) yields 397 genes -- even if samples are the same sequence, they will not be collapsed if they come from different microtiter wells.
51 Gene Annotation: UID column
- Rows of data can be labeled with one of four options:
- Systematic name / clone ID (the default)
- SUID gives the database's unique ID
- LUID gives the lab's unique ID (we don't always have data for this; defaults to SUID)
- SPOT gives the spot number
52 Gene Annotation: Biological Annotation
- The list includes all information stored within the database for any gene from the organism in question. Not all genes will have all annotations.
- Annotations from a genelist (selected earlier) can be used to describe the genes
53 Array Annotation: Name Choices
- Arrays (hybridizations) are identified in the database by slide name (e.g., serial number) and experiment name, both unique.
- Agilent and Affymetrix data sets are further identified by a result set name, possibly more than one per hybridization, and not guaranteed to be unique.
54 Gene Selection and Annotation: Summary
- Specify genes or clones
- Collapse data by SUID or LUID
- Determine UID column
- Choose biological annotation
- Label arrays/hybridizations
56 Data Filtering
- Choose data column to retrieve
- Elect to invert reverse-dye replicates
- Elect to filter by spot flag
- Select spot criteria for filtering
- Define image presentation options
57 Data Filtering: Choose Data to Retrieve
- You can retrieve and cluster any numerical measurement from your data.
- Clustering doesn't necessarily make sense for all fields.
- Default (and most appropriate) fields for clustering are log ratio (two-channel data) and signal/intensity (single-channel data).
58 Data Filtering: Spot Flags, Reverse Replicates
- Unreliable spots (identified by software or visual inspection) can be flagged. Spots that are not flagged are given a flag value of 0.
- Autoflags (GenePix 5.0) are included in this option.
- If your experiments are identified as reverse replicates, clicking on the reverse option will properly invert the ratio and log ratio data.
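What inverting a reverse-dye replicate means numerically can be shown in a short sketch. The function name is hypothetical and the base-2 logarithm is an assumption; the deck does not state which base the database uses.

```python
import math

def invert_reverse_replicate(ratio):
    """Swap numerator and denominator of a dye-swapped ratio, then take its log."""
    inverted = 1.0 / ratio
    # Inverting a ratio flips the sign of its logarithm.
    return inverted, math.log2(inverted)

# A hypothetical dye-swapped spot measured at a 4-fold ratio.
inverted_ratio, inverted_log_ratio = invert_reverse_replicate(4.0)
print(inverted_ratio, inverted_log_ratio)
```

After inversion, the replicate's values point in the same direction as those from the forward-labeled array, so the two can be compared or clustered together.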
59 Data Filtering: Selecting Filtering Criteria
- Each spot will be individually assessed as specified, prior to any averaging or collapse.
- Each filter can be made active and customized as desired.
- Filters can be combined using logical operators (filter string), defaulting to a logical AND.
- Filters available will be appropriate to the feature extraction software used. The exception is ScanAlyze and older versions of GenePix, which get (but can't use) all options for GenePix.
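Per-spot filtering with the default logical AND can be sketched as below. The field names and spot values are invented, and the two thresholds are simply the ones used in this tutorial's later examples; this does not reproduce the database's filter-string syntax.

```python
# Hypothetical spot records; the field names are illustrative only.
spots = [
    {"id": 1, "regression_correlation": 0.85, "ch1_net": 900, "ch2_net": 1200},
    {"id": 2, "regression_correlation": 0.40, "ch1_net": 800, "ch2_net": 950},
    {"id": 3, "regression_correlation": 0.75, "ch1_net": 120, "ch2_net": 200},
    {"id": 4, "regression_correlation": 0.70, "ch1_net": 100, "ch2_net": 500},
]

# Each active filter is a predicate evaluated on one spot at a time.
filters = [
    lambda s: s["regression_correlation"] > 0.6,
    lambda s: s["ch1_net"] > 350 or s["ch2_net"] > 350,  # either channel
]

def passes_all(spot, active_filters):
    """Combine the active filters with a logical AND (the default)."""
    return all(f(spot) for f in active_filters)

# Spots are assessed individually, before any averaging or collapse.
passing_ids = [s["id"] for s in spots if passes_all(s, filters)]
print(passing_ids)
```

Spot 2 fails on regression correlation and spot 3 on intensity, so only spots 1 and 4 would contribute data.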
60 Data Filtering: Default Spot Filters
- Regression correlation measures pixel-by-pixel agreement between the two channels.
- Foreground/Background intensities are a simple measure of signal to noise.
- Absolute intensity cutoffs impose a minimum net signal.
- "Failed" and "Is Contaminated" refer to the quality of the spotted material.
- Equivalent defaults are presented for Agilent data.
- Affymetrix data can be filtered on detection, detection p-value, etc.
- Any data, including biological annotations, can be used for customized filters.
61 Data Filtering: Filter Selection
- Data filters should be customized for the data retrieved.
- Uniform filter values will be applied to each array retrieved.
- The database makes available some basic tools for examining data and choosing appropriate filter values.
62 Data Filtering: Filter Selection
- Any numerical field can be plotted against any other (or none), in a scatter plot or histogram.
- This is useful for quality assessment, and for selecting filters.
63 Data Filtering: Regression Correlation
- Plot filter field (here, regression correlation) against test field (log ratio).
- Log ratios should center around 0.
- Here, the log ratios appear to diverge below a regression correlation of about 0.4 to 0.6.
64 Spots with low regression correlation
65 Data Filtering
- Ten arrays
- 500-gene genelist
- Spot flag 0
- No other filters
- 380 SUIDs used for cluster
66 Data Filtering: Regression Correlation
- Ten arrays
- 500-gene genelist
- Spot flag 0
- Regression correlation > 0.6
- 380 SUIDs used for filtering
- Filtering away spots with low regression correlation removes many spots
67 Data Filtering: Regression Correlation
- Ten arrays
- 500-gene genelist
- Spot flag 0
- Regression correlation > 0.8
- 364 SUIDs used for clustering
- A more stringent filter reduces the data quite a bit and even removes some genes entirely
68 Data Filtering: Foreground to Background Intensity Ratios
- FG/BG (log scale) versus log ratio
- Data center around 0
- Impose cutoff at 2.5 (linear) to eliminate flare at low relative intensity.
69 Data Filtering: Intensity to Background Ratios
- Ten arrays used
- 500-gene genelist
- Spot flag 0
- Normalized Channel 2 (red) mean intensity divided by Normalized Channel 2 median background greater than 2.5
- 371 SUIDs used for clustering
- Some arrays show very high background and some genes show such high background that they did not pass this filter in any array
70 Data Filtering: Intensity to Background Ratios
- Ten arrays used
- 500-gene genelist
- Spot flag 0
- Channel 1 (green) mean intensity divided by Channel 1 median background greater than 2.5
- 377 SUIDs used for clustering
- Often, background can be higher in one channel -- note that fewer data are removed here than when we used the same filter on Channel 2 (red)
71 Data Filtering: Intensity to Background Ratios
72 Data Filtering: Intensity Cutoff
- More than one way to look at a fish.
73 Data Filtering: Combinations of Filters
- Ten arrays
- 500-gene genelist
- Spot flag 0
- Regression correlation > 0.6
- Net intensity in either channel > 350
- 374 SUIDs selected for clustering
- This data set was formed by selecting spots that are good quality (via the regression correlation) and good intensity in at least one channel
74 Data Filtering
- No filters: 380 SUIDs
- Regression correlation > 0.8: 364 SUIDs
- Ratio of intensity to background in both channels > 2.5: 370 SUIDs
- Net intensity in either channel > 350: 377 SUIDs
- 70% of pixels within one standard deviation of background: 345 SUIDs
- Regression correlation > 0.6 AND net intensity in either channel > 350: 374 SUIDs
75 Data Filtering: Image Presentation Options
- "Retrieve spot coordinates" will allow you to see an assembled image of each array after clustering. (However, multiple spots with the same contents interact poorly with use of systematic names as IDs: only one spot image will be shown.)
- "Show all spots" allows you to view the spots you filtered out (in addition to the ones that passed filtering) after clustering. This slows down retrieval.
76 Data Filtering: Summary
- Choose data column to retrieve
- Elect to invert reverse-dye replicates
- Elect to filter by spot flag
- Select spot criteria for filtering (spot filters don't remove genes, but just gray out data that don't pass, unless all spots are removed)
- Define image presentation options
78 Data Retrieval
- General results and progress
- PreClustering (.pcl) file
- Data retrieval summary report
- Option to deposit data in repository
79 Data Retrieval: Summary
80 Data Processing and Clustering
- Experiment Selection
- Gene Selection and Annotation
- Data Filtering
- Data Retrieval
- Gene Filtering
- Clustering and Image Generation
81 Gene Filtering
- Transform single-channel data
- Filter genes based on data distribution
- Data centering
- Filter genes based on data values
- Filter genes and arrays based on spot filter criteria
82 Gene Filtering: Transformation
- Single-channel (e.g., Affymetrix) data only.
- Adjust arrays for simple cross-array normalization.
- Log-transform data for clustering.
- May add a constant for variance stabilization
- May replace non-positive values with very small values
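A minimal sketch of such a transformation is shown below. The parameter names, the base-2 logarithm and the default floor value are assumptions made for illustration; the deck does not give the database's actual settings.

```python
import math

def log_transform(values, offset=0.0, floor=1e-3):
    """Log2-transform single-channel intensities.

    offset: constant added to every value (a simple variance-stabilizing shift).
    floor:  small positive value that replaces anything still non-positive.
    """
    transformed = []
    for v in values:
        shifted = v + offset
        if shifted <= 0:
            shifted = floor  # the logarithm is undefined for non-positive values
        transformed.append(math.log2(shifted))
    return transformed

# Hypothetical intensities, including a zero and a negative value.
signals = [8.0, 0.0, -5.0]
print(log_transform(signals, floor=1.0))
print(log_transform(signals, offset=8.0))
```

Adding an offset shifts every value before the log is taken, so only values that remain non-positive after the shift are replaced by the floor.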
83 Gene Filtering: Data Distribution
- "Rank" will select genes whose retrieved value is in the top Nth percentile for M or more arrays.
- "Deviations" selects those genes whose retrieved value is significantly above or below the mean (N standard deviations), for M or more arrays.
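The "Deviations" filter can be sketched as follows. One reading of the slide is assumed here: the mean and standard deviation are computed per array, across all genes retrieved for that array. The function name and the data are hypothetical.

```python
from statistics import mean, stdev

def deviation_filter(table, n_sd, min_arrays):
    """Keep genes more than n_sd standard deviations from the array mean
    in at least min_arrays arrays. table maps gene -> list of values per array."""
    genes = list(table)
    n_arrays = len(table[genes[0]])
    hits = dict.fromkeys(genes, 0)
    for a in range(n_arrays):
        column = [table[g][a] for g in genes]
        mu, sd = mean(column), stdev(column)  # assumption: per-array statistics
        for g in genes:
            if abs(table[g][a] - mu) > n_sd * sd:
                hits[g] += 1
    return [g for g in genes if hits[g] >= min_arrays]

# Hypothetical log ratios for five genes on two arrays; g5 responds on array 1.
table = {
    "g1": [0.0, 0.1],
    "g2": [0.1, -0.1],
    "g3": [-0.1, 0.1],
    "g4": [0.05, -0.1],
    "g5": [3.0, 0.0],
}
selected = deviation_filter(table, n_sd=1.5, min_arrays=1)
print(selected)
```

Raising `min_arrays` demands a consistent response across arrays, while raising `n_sd` demands a stronger one.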
84 Gene Filtering: Percentile Rank
- Ten arrays
- 500-gene genelist
- Spot flag 0
- Regression correlation > 0.6
- Net intensity in either channel > 350
- Rank > 95 in at least one array
- 66 SUIDs are used for clustering
- Many spots are removed, since only the spots that were very intense in the red channel were included
85 Gene Filtering: Deviation from Mean Value
- Ten arrays, 500-gene genelist
- Spot flag 0
- Regression correlation > 0.6
- Net intensity in either channel > 350
- Genes whose Log(Normalized Red/Green) is more than one standard deviation from mean in at least one array
- 70 SUIDs selected for clustering
- This filter removes spots that do not show significant variance from the mean -- a good way to identify genes with potentially interesting behavior
86 Gene Filtering: Centering Data
- Data can be centered at this stage. This transforms the data so that the mean value is equal to zero. Images and downloaded files will reflect this transformation.
- During clustering, data can be treated as if they were centered, but the values of the data are not affected.
- Data centering and centering during clustering can be combined in all four possible ways.
- Gene centering is useful for common references.
- Array centering amounts to renormalizing each array, using the spots that pass the spot filter criteria.
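The difference between centering the data and using a centering metric during clustering can be illustrated with two correlation functions. This sketch assumes that the "centering metric" corresponds to the ordinary (centered) Pearson correlation and that the alternative is an uncentered correlation that measures deviations from zero; the deck does not define either term, and the vectors are invented.

```python
import math

def centered_correlation(u, v):
    """Ordinary Pearson correlation: each vector's mean is subtracted first."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return uncentered_correlation([a - mu for a in u], [b - mv for b in v])

def uncentered_correlation(u, v):
    """Correlation taken about zero: the stored values are used as they are."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return num / den

# Two hypothetical genes with the same shape but a constant offset between them.
gene_a = [-1.0, 0.0, 1.0]
gene_b = [4.0, 5.0, 6.0]

print(centered_correlation(gene_a, gene_b))    # the offset is ignored
print(uncentered_correlation(gene_a, gene_b))  # the offset lowers the similarity
```

The centered metric leaves the stored values untouched and only ignores the offset when comparing, which matches the distinction drawn on the slide.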
88 Gene Filtering: Center Genes
[Figure panels: Centered; Uncentered]
- Ten arrays, 500-gene genelist, Spot flag 0
- Regression correlation > 0.6
- Net intensity in either channel > 350
- Genes centered -- no effect on number of SUIDs clustered, but distribution of signal is changed (centered data are displayed on left)
89 Gene Filtering: Data Values
- "Cutoff" requires data to exceed a user-defined value in at least M arrays. This is perhaps our least useful filter. Especially when data are centered, you could be losing important information.
- "Distance" requires that the length of the gene's expression vector, across all arrays, be greater than a user-defined value. This is a general measure of response to experimental conditions.
- Only available for log ratio data.
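The vector-length filter can be sketched as below, assuming that "length" means the Euclidean norm of the gene's log ratios and that missing values are skipped. The names, values and threshold are hypothetical.

```python
import math

def vector_length(log_ratios):
    """Euclidean length of a gene's expression vector, ignoring missing values."""
    return math.sqrt(sum(v * v for v in log_ratios if v is not None))

# Hypothetical log ratios across arrays for two genes.
genes = {
    "responsive": [3.0, None, -4.0],
    "flat": [0.1, -0.1, 0.0],
}

threshold = 2.0  # hypothetical user-defined value
kept = [name for name, values in genes.items() if vector_length(values) > threshold]
print(kept)
```

Because log ratios near zero mean no change, a short vector marks a gene that barely responded in any array.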
90 Gene Filtering: Values of Log(Red/Green)
- Ten arrays, 500-gene genelist, Spot flag 0
- Regression correlation > 0.6
- Net intensity in either channel > 350
- Absolute value of Log of Red/Green Normalized Ratio (Mean) > 2 for at least 1 array
- 57 SUIDs selected for clustering
- Since this is a filter based on values, caution should be exercised -- values often change during normalization and centering.
91 Gene Filtering: Spot Filter Criteria
- Genes can be screened out if they do not meet the spot criteria a given percentage of the time, as specified by the user.
- Arrays can be similarly filtered out if they do not meet the spot filter criteria.
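A sketch of this screen is shown below, representing a spot that failed the spot filters as `None`. The data and the 80% threshold are illustrative, the threshold being the one used in this tutorial's examples.

```python
def fraction_passing(values):
    """Fraction of data points that survived the spot filters (None = filtered out)."""
    return sum(v is not None for v in values) / len(values)

# Hypothetical gene-by-array table after spot filtering.
table = {
    "g1": [0.5, None, 0.2, 0.1, 0.3],   # 4 of 5 spots passed
    "g2": [None, None, 0.4, 0.2, 0.1],  # 3 of 5 spots passed
    "g3": [0.9, 0.8, 0.7, 0.6, 0.5],    # all spots passed
}

min_fraction = 0.8
kept_genes = [g for g, row in table.items() if fraction_passing(row) >= min_fraction]

# The same test applied to arrays: read the table column by column.
n_arrays = len(next(iter(table.values())))
columns = [[row[a] for row in table.values()] for a in range(n_arrays)]
kept_arrays = [a for a, col in enumerate(columns) if fraction_passing(col) >= min_fraction]
print(kept_genes, kept_arrays)
```

Here g2 is dropped as a gene, and the first two arrays are dropped because a third of their spots failed.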
92 Gene Filtering: Amount of Data Passing Filters
- Ten arrays, 500-gene genelist, Spot flag 0
- Regression correlation > 0.6
- Net intensity in either channel > 350
- Centered genes and arrays
- Genes must have 80% of spots pass filters
- 285 SUIDs are used for the cluster
- This reduces the number of genes with missing data and permits the clustering to be performed on genes with more data points.
93 Gene Filtering: Amount of Data Passing Filters
- Ten arrays, 500-gene genelist, Spot flag 0
- Regression correlation > 0.6
- Net intensity in either channel > 350
- Centered genes and arrays
- Genes and arrays must have 80% of spots pass filters
- 285 SUIDs are used for the cluster
- Filtering away arrays whose spots fail the filters at a high frequency is a good way to remove pathologically bad arrays
94 Spot Filtering vs. Gene Filtering
Gene filters remove the genes that do not meet the filter criteria often enough. This reduces the number of genes.
Spot filters remove individual data points. That means there will be more missing (gray) data.
95 Gene Filtering: Summary
- Correct selection of filters will retain interesting data and remove those that are unreliable or uninteresting.
- A good understanding of your experiment is REQUIRED before you can decide which filters make biological sense.
- Not all filtering criteria are useful for all experiments.
96 Gene Filtering: Results
- The numbers of genes and arrays are shown
- PreClustering files (.pcl) can be downloaded
- Summary report is available
- May deposit to repository at this stage.
- Proceed to clustering
97 Gene Filtering: Data Retrieval Summary Report
98 Gene Filtering: Summary
- Transform single-channel data
- Filter genes based on data distribution
- Center data
- Filter genes based on data values
- Filter genes and arrays based on spot filter criteria
100 Clustering and Image Generation
- Partitioning options
- Clustering metric selections
- Correlated genes
- Image generation options
101 Clustering: Metric Selections
- Genes and arrays can be clustered.
- Pearson correlation treats vectors as if they were the same (unit) length.
- Euclidean distance measures the absolute distance between two points in space. Therefore Euclidean distance will be affected by both the direction and the amplitude of the vectors.
102 Clustering: Gene Clustering
- Ten arrays, 500-gene genelist, Spot flag 0
- Regression correlation > 0.6
- Net intensity in either channel > 350
- Centered genes
- Genes must have 80% of spots pass filters
- 274 SUIDs are used for the cluster
- No centering during clustering
- Pearson correlation, genes clustered
103 Clustering: Tree Displays
- Clustered gene arrays are displayed adjacent to the most similar arrays.
- The nodes of the trees indicate the members of an array and the degree of similarity to its neighbor.
104 Clustering: Array Clustering
- Ten arrays, 500-gene genelist, Spot flag 0
- Regression correlation > 0.6
- Net intensity in either channel > 350
- Centered genes, 80% must pass filters
- 274 SUIDs are used for the cluster
- No centering during clustering
- Pearson correlation, clustering genes and arrays
- Clustering of arrays will change the order of the arrays in your display
105 Clustering: Tree Displays
- Clustering arrays will give a tree for the arrays that is very similar to that for the genes
106 Clustering: Array Clustering
[Figure panels: No Array Clustering; With Array Clustering]
107 Clustering: Partitioning Data
- Data can be partitioned into a Self-Organizing Map (SOM)
- If partitioned, dimensions of the SOM must be specified
108 Clustering: Self-Organizing Maps
- SOMs result in genes being assigned to partitions of most similar genes
- Neighboring partitions are more similar to each other than they are to distant partitions
109 Clustering: Correlated Genes
- A file listing the best-correlated genes, for each gene retrieved, can be produced.
110 Clustering: Image Generation Options
- Contrast can be modified
- Missing data can be assigned different shades of gray
- Both red/green and blue/yellow schemes can be used
- You can elect to view spot images
111 Clustering: Visualization
- Click on the image to get a dynamic display.
- Click on one of the other options to see static displays with or without the spot images.
- Downloadable files (.cdt, .atr, .gtr, report) for use with other tools (e.g., TreeView).
112 Clustering: Cluster Image
- Scale is indicated on the color bar
- Gene names are at the right
- Tree generated by hierarchical clustering is at the left
113 Clustering Display: Clustered Spot Images
114 Clustering Display: Adjacent Cluster and Clustered Spot Images
115 Clustering Display: Hierarchical Cluster View
116 Clustering and Image Generation: Summary
- Partitioning data
- Clustering metric selections
- Correlated genes
- Image generation options
119 PUMAdb Office Hours
- CIL 135
- Thursday 9-11 am
- Friday 2-4 pm