Title: Goals:
1- Introduction
- Goals
- Offers ERIC users an integrated set of
web-based analysis tools and a data management
system to store and analyze cDNA/oligo/Affymetrix
Gene Expression data using open systems design.
- Currently supports Axon GenePix, Perkin-Elmer
QuantArray, and ArraySuite II / IP Lab image
analysis software for two-color, Pat Brown-type
spotted arrays.
- The ERIC team stands ready to assist you in
working with mAdb. Please contact us with any
questions.
2 Access mAdb from the ERIC portal: http://www.ericbrc.org
E-mail comments to administrator_at_ericbrc.org
3- mAdb Quick Facts
- mAdb systems are in place at the NCI, NIAID, and
FDA Microarray Centers, the Netherlands Cancer
Institute, and the Genome Center of Singapore.
- Data sharing is determined by each investigator;
no one has access to all the data. Permissions
may be set all the way from private to fully
public.
- MIAME capable format.
- Look for the mAdb icon ( ) throughout for
on-line help.
4 mAdb System Features
- Gene Discovery
  - Outlier detection row retrieval tools
  - Scatter plots
  - Ad hoc keyword queries
  - Multiple array viewer
  - Pathway summaries: GO, KEGG, others
  - Boolean comparison of data
- Class Discovery (unsupervised)
  - Clustering: Hierarchical, K-means, SOMs
  - Multidimensional Scaling
  - Principal Components Analysis
- Class Comparison
  - t-test, Wilcoxon, ANOVA, Kruskal-Wallis, SAM
- Class Prediction
  - PAM classifier
5- mAdb Analysis Paradigm
- 1. Create a project, a logical grouping of arrays
for addressing a biological question. Upload the
arrays to that project.
- 2. Perform quality control: Project Summary and
Graphical Reports.
- 3. Create a filtered dataset:
  - Select arrays for analysis.
  - Define quality parameters (minimum signal values,
  S/N, etc.).
  - Select normalization method, so different arrays
  can be compared.
  - Align genes from different array layouts (based
  on well IDs).
- 4. Apply Data/Gene criteria filters, if desired,
to create subset dataset(s).
- 5. Apply appropriate Analysis/Visualization tools
to the dataset(s).
- 6. Repeat steps 3, 4, and 5 as desired.
- 7. Interpret datasets/results.
6 Architecture for µArray Informatics
coming soon
7 Outline of this training session
- 1. Introduction to mAdb.
- 2. Open and manage a user account from the ERIC
portal.
  - The mAdb Gateway Page.
- 3. Create and manage a project.
  - Grant access and privileges to your
  collaborators.
  - Make a project publicly available to the user
  community.
- 4. Data upload and validation.
  - Upload arrays and monitor upload progress.
  - Upload MIAME protocols and associate them with
  arrays.
- 5. Standard extraction of a dataset.
  - Assess array quality via Project Summary and
  Graphical Reports.
  - Initial dataset extraction.
  - The Dataset Display Page.
  - Assessing replicates.
- 6. Further analysis of datasets.
8 II. Open an account from the ERIC portal
Fill out the Account Request form and click Submit
to review the information. Then click Request
Account; your account information will be sent to
your e-mail address.
9 With your Username and Account ID, log in from the
ERIC portal to the Gateway Page
Upon initial login (see slide 2), you will be
prompted to change your password. You will then
be directed to the mAdb Gateway page.
Click Manage User Profile to change personal info
or passwords.
10 The Gateway Page is the starting point for mAdb
activities
1. With these links you can upload arrays or MIAME
protocols, monitor the upload progress, or create
an Identifier List.
2. Use these links to manage your account, and
create or manage projects.
3. In-progress datasets are stored to and retrieved
from three areas, which persist 24 hrs after last
access, 30 days, or indefinitely. Finally, when a
colleague posts a dataset to you, a Pickup area
appears on your Gateway.
4. Public access datasets can be retrieved from the
Training/Public area.
5. The Projects menu lists projects to which you
have access. Select a project and a tool from the
Tool menu to launch data extraction and analysis.
11 III. Creating and Managing a Project
A project is a logical grouping of arrays.
1. Existing projects for an account are listed in
the Projects window on the mAdb Gateway page. A
project is only visible to account holders
authorized to view it.
2. To create a new project
in the account, or manage an existing one, click
the Create/Manage Projects link under Management
Tools.
12- To create a new project, click the Create New
Projects link.
- Any publicly available project may be added to
your personal Projects menu list by selecting
here and clicking Add Project. For further
details, see slide 18.
- Any project in the account can be managed (e.g.
descriptions edited, permissions granted) from
here by clicking Management Options for the
respective project.
13 Creating Projects
- On the Create New Project page, enter a Project
Title, Project Description and any comments you
may have.
- When done, click the Create button to return the
browser to the Managing Projects page. The new
project is listed.
14 Return the browser to the mAdb Gateway page and
the new project is listed in the Projects window.
The prefixes before a project name indicate your
privileges (A = administrator, U = upload, X =
non-administrator, non-upload) and the project
owner's login name.
To add arrays to a new or existing project, the
user must own the project, or have Upload
privileges.
15 Managing Projects - click the Management Options link
to open the Project Management Options page.
View a project summary. The Access List names the
mAdb account holders who may use the project
(bold names have administrative privileges).
- Delete a project (only functional if it does not
yet contain any arrays).
- Edit project information, i.e. title,
description, comments.
- Add other mAdb users to the Access List so they
may view/upload data.
- Grant/revoke other users' Privileges to this
project.
- Make a project Public to the user community.
16 Granting Access to a Project
- To add a mAdb account holder to a project Access
List, check the box next to their name(s) and
click the Add User(s) button.
- Adding a user to the Access List allows that
account holder to view the arrays in a project
and work with the data.
17 Changing User Access Levels
On the Project Management Options page, click the
Privileges link to open the Change User(s)
Privileges page.
Admin or Upload privileges grant a user
additional authority on a project. Admin users
may edit the project, array or dataset
descriptions, or move arrays between projects.
Upload privilege allows a user to add arrays to
the project. Check the appropriate box next to a
user name to grant Admin or Upload privileges (or
both). Uncheck the box(es) to revoke privileges.
18 Tips on making a project public.
When you make a project public (slide 15), its
arrays are available to all ERIC users for
analysis (the specific analyses you have done are
NOT made public). ERIC users can access the
arrays in two ways.
Once a project is made public, any new arrays you
subsequently add to it are immediately available
to the user community. If you prefer to keep the
new arrays confidential and use them with arrays
in the public project: (1) create a new project
with the new arrays, and (2) from the Gateway
page Tools menu, copy any or all arrays from the
old project into the new project (see slide 40).
19 IV. Data upload and upload validation
20 Array Data Upload
Once a project is created in mAdb, arrays may be
added to it. The Upload pages in the following
section describe the required information and
data files coming from GenePix-type and
Affymetrix software. Consult with the ERIC mAdb
team if your data is from other types of
scanners. Once your data is placed in the queue, a
mAdb administrator will map it to a layout and
load it into the database, in the order in which
it is received.
- Open the Gateway page and click the Upload Array
data link.
- The next step is to select the type of array and
the project.
21- Choose Spotted Array or Affymetrix.
- Browse to the project intended to receive the new
arrays. You are limited to projects for which you
have Upload privileges.
22 Form for Uploading Spotted Arrays
23 Spotted Array Data Upload Form Part I
- The top of the form requires information about
the experiment.
- Print Name/Lot ID. Use identifiers issued by your
printing facility.
- Experiment Name (If there is a slide number
scratched on the slide, use it as part of the
Experiment Name, as a unique identifier).
- Short Description
- Long description (optional)
- Channel A name and dye
- Channel B name and dye
24 Spotted Array Data Upload Form Part II
The center of the form lets you (optionally)
associate the array with specific MIAME
protocols, by selecting from the respective
dropdown boxes. Protocols that were previously
input to the system may be viewed only by the
user who entered them or persons sharing the
project. You may upload your arrays without
protocols and then attach them later, through the
Edit link on the project's Project Summary Report
page. See slide 35 for procedures on adding new
MIAME protocols.
25 Spotted Array Data Upload Form Part III
- The bottom of the form allows you to upload the
actual microarray data and composite image files
generated by the scanner. The files that can be
uploaded are:
  - ArraySuite Sample Intensities file or GenePix
  .gpr file (required).
  - Composite Image file (optional, most often
  .jpeg files).
  - Gene Array List (.gal) file (optional).
  - Raw Data file (optional, may contain probe
  sequences and annotation information).
- When the form is complete, click Upload.
26 Confirming Upload
Clicking the Upload button opens a confirmation
page. Check that the image and file type selected
are correct and that the Data Values count is
roughly equal to the number of spots on the
array. Clicking the Confirm button puts the array
in queue for processing by a mAdb administrator.
The administrator will map your array data with a
gene array layout, and trigger the database to
ingest the mapped data. You may check the
progress of your upload via the Status of Uploads
link on the Gateway Page.
27 Upload Status
After uploading data to the server, the status of
array processing may be viewed by clicking on the
Upload Status link on top of any mAdb page.
- Array processing occurs in two steps:
  - Data is parsed and entered into the database.
  - Image is processed and stored.
- Loaded data may be analyzed while the image is
still loading.
28 GenePix Software Notes
- Carefully grid each block.
- Allow program to Find spots and adjust spot
size.
- Set option to Analyze absent spots.
- Adjust JPEG for desired contrast/brightness.
These settings determine the mAdb image display.
- Analyze spots.
29 Common Spotted Array Errors
- Common GenePix Errors
  - Setting incorrect option for Analyze Absent
  Feature (box should be checked) results in
  truncated blocks.
  - Deleting blocks.
  - Improper gridding.
- Common Upload Errors
  - Files are directed to the incorrect file input
  fields on the Upload form.
  - Loading a GAL file, Excel file, or Set Up file
  instead of the GenePix data (.gpr) file in the
  input field intended for data.
  - Loading TIFF file instead of composite JPEG or
  PICT file.
30 Spotted Array Data Upload Summary
- 1. Fill in experimental info for each array.
  - Provide identifying information, comments and
  descriptions of samples.
  - Associate MIAME protocols with array (optional).
  - Select data file for array.
  - Select image file of array (optional).
- 2. Submit and confirm upload.
- 3. Check upload status page to monitor progress.
- 4. Close browser when finished (for security).
31 Form for uploading Affymetrix arrays
If Affymetrix chips are to be uploaded, the
following form is displayed.
- Browse to Metrics (.txt) file for the first box.
(The metrics file is an export option from the
Affymetrix CHP file - see the next slide.)
- Browse to the corresponding .CEL file in second
box.
- Then click on the Continue button to submit the
files for upload. A separate tutorial on Creating
Affymetrix upload files can be opened via the
indicated link.
32 Affymetrix Upload - generating a metrics.txt file
- Run chip through fluidics station to get .CEL
file.
- Analyze .CEL file (usually scale all spots to
500) to produce .CHP file.
- With .CHP file open, set analysis options on
metrics tab as
- Click on Metric tab.
- Save file as Xxxx.txt.
- Note: If uploading comparison data, first upload
absolute baseline data.
33 Confirming Affymetrix Upload
Clicking the Continue button opens a confirmation
page. Check that the uploaded data files are
correct. (Optional) Add a sample description and
a comment, and associate MIAME protocols with this
array on the bottom half of the form. Again,
protocols may be attached later through the Edit
link on the project's Project Summary Report
page. Clicking on the Confirm button puts the
array in queue for processing by a mAdb
administrator.
34 Affymetrix Data Upload Summary
- Browse and select:
  - Data File (Metrics - .txt file)
  - CEL file
- Confirm the correct files for upload. Enter
Experiment data (array name, description, etc.).
- Submit.
- Check upload status page to display progress.
- Close browser when finished (for security).
35 Uploading MIAME Protocols
- mAdb allows MIAME protocols to be stored and
associated with experiments.
- MIAME describes the Minimum Information About a
Microarray Experiment that is needed to enable
the interpretation of the results of the
experiment unambiguously and to reproduce the
experiment.
The Upload MIAME data link on the Gateway page
provides access to existing MIAME protocols and
allows uploading new protocols.
36 MIAME Submission Types
Select the type of submission for input:
Protocol Submission or Array Submission.
37 Protocol Submission
Selecting the Protocol Submission link opens a
menu for adding/editing 5 types of MIAME
protocols: Extraction, Labeling, Hybridization,
Scanning, and Washing and staining.
38 Adding a new Protocol
Each protocol submission page has a link for
adding a new MIAME protocol.
Fill in the form and click Submit. When prompted,
confirm the submission.
39 Viewing Submissions
Each protocol submission page also lists existing
protocols of that type. Newly submitted protocols
are displayed automatically. Existing protocols
may be edited by clicking on the Edit link on the
left.
40 Copy or move arrays between projects
Arrays uploaded to a project may be copied or
moved to another project. This requires
administrative access to all projects whose
arrays are being copied/moved. The Copy/Move
Arrays tool is accessible from the Tool dropdown
menu on the Gateway Page.
41 Re-order arrays within a project
Arrays may be reordered within a single project
into a more meaningful arrangement.
From the mAdb Gateway page, select a project and
the Order Arrays within a Project tool, and
click Continue. Rearrange the arrays by selecting
an array name and using the Up or Down arrows to
move it in the list.
42 V. Standard extraction/filtration of array data
Goal: To analyze arrays using only high
quality/reliable spots.
43 Dataset Structure - Filtering hierarchy/tree
structure
[Figure labels: datasets A, B and B.1; Extended Dataset Extractions; Additional filtering and analysis tools; Extracted subsets]
44 Begin analysis: Select an un-extracted dataset
from the Projects menu, choose Project Summaries
Report from the Tool menu and click Continue.
If a project contains spotted arrays and
Affymetrix Absolute Expression arrays, you are
prompted to view Summary Reports for each
separately. (Previously extracted/analyzed
datasets are stored in Transient, Temporary, or
Permanent areas. These are not visible upon
initial login.)
45 For each array in the project, assess quality via
overall statistics, histograms and images
- Click the Image icon to the left and view its
image file in a new window.
- Click the Histogram icon to view an array
histogram in a new window, and gain access to
comprehensive QA graphics. This will be most
useful after dataset extraction and normalization
(see slide 47).
If you have administrative access to a project,
you can edit project and array descriptions from
the Edit links. Admin privileges also permit you
to attach MIAME protocols to an existing array or
replace a current protocol.
46 Inspecting the images can help identify problem
streaks, bleeds or artifacts, and suggest
appropriate filtering steps, e.g. exclude spots
with background greater than <threshold
value>. Check the Show GRID checkbox and click
the Resize button to view and assess the grid
pattern produced by your spotfinding
software. Close the images and return to the
Gateway page.
47 Initial Dataset Extraction
If a project contains spotted arrays and
Affymetrix Absolute Expression arrays, you are
prompted to extract each group separately.
48 mAdb Definitions
- Signal - refers to background corrected values
(i.e. Target Intensity - Background Intensity).
- Defaults:
  - MEAN Intensity - MEDIAN background (the GenePix
  default)
  - MEAN Intensity - MEAN background (the ArraySuite
  default)
- Normalization factor: initially calculated so
that the median overall ratio (Cy3 Signal/Cy5
Signal) for each array is adjusted to 1.0 (linear
space) or 0.0 (log base 2). Spots with an
extremely low signal are excluded from this
calculation.
- Default settings may be changed on the Extended
Dataset Extraction page.
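The definitions above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not mAdb's own code: the function names, the sample readings, and the `min_signal` cutoff used to drop very low signals are all assumptions.

```python
from statistics import median


def signal(intensity, background):
    """Background-corrected signal: target intensity minus background."""
    return intensity - background


def normalization_factor(cy3_signals, cy5_signals, min_signal=50.0):
    """Return the factor that moves the median Cy3/Cy5 ratio to 1.0.

    Spots whose signal is below min_signal in either channel are left out
    of the median (min_signal is an illustrative value, not a mAdb default).
    """
    ratios = [
        g / r
        for g, r in zip(cy3_signals, cy5_signals)
        if g >= min_signal and r >= min_signal
    ]
    return 1.0 / median(ratios)


# Hypothetical (intensity, background) readings for four spots.
cy3 = [signal(1200, 200), signal(900, 100), signal(2100, 100), signal(130, 120)]
cy5 = [signal(700, 200), signal(500, 100), signal(600, 100), signal(125, 120)]

factor = normalization_factor(cy3, cy5)
normalized = [g / r * factor for g, r in zip(cy3, cy5)]
```

The fourth spot is too dim to take part in the median, but it is still rescaled by the same factor as every other spot on the array.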
49- Why normalizing ratios is necessary when
comparing across multiple arrays:
  - Unequal incorporation of labels (green Cy3
  incorporates better than red Cy5).
  - Unequal amounts of samples.
  - Unequal photomultiplier voltage settings.
  - Different backgrounds.
  - Total brightness may differ between chips.
- Why use ratios converted to log base 2?
  - Makes variation of ratios more independent of
  absolute magnitude.
  - Symmetrical graphing: otherwise up-regulated
  genes range from 1 to ∞, while down-regulated genes
  get compressed between 0 and 1.
  - Clearer interpretation: negative numbers are
  downregulated genes; positive numbers are
  upregulated genes.
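A short Python sketch makes the symmetry argument concrete. The ratios are made-up values chosen as powers of two; the helper name `log2_ratio` is hypothetical.

```python
import math


def log2_ratio(ratio):
    """Convert a linear expression ratio to log base 2."""
    return math.log2(ratio)


# An 8-fold increase and an 8-fold decrease sit at equal distances from zero,
# while in linear space they sit at 8.0 and 0.125.
up = log2_ratio(8.0)
down = log2_ratio(0.125)
unchanged = log2_ratio(1.0)

# Inverting a ratio (for example, reading a dye-swapped array as 1/R)
# simply flips the sign in log space.
swapped = log2_ratio(1 / 8.0)
```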
50 Select the Filtering Options - 1
1. Under Signal, Normalization, Ratio Options,
accept the defaults for Signal Calculation (Mean
Int - Median Bkg), Normalization Method (50th
Percentile (Median)), and Default Ratio
(ChanA/ChanB). Leave other boxes unchecked. This
will normalize output based on spots that pass
the filters.
2. Under Spot Filter Options, check
boxes on the left to activate a filter(s), and
type appropriate values in fields to the right.
For now use:
- Exclude any Spots Indicated as Bad or Not Found
- Signal Above Bkgd > 2 and 2 SDs
- Signal > 200 and 200
- Override if Chan B Signal > 5000
- Override if Chan A Signal > 5000
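As a rough illustration of how such spot filters combine, here is a Python sketch. It is not mAdb's implementation: the `Spot` record and `passes_filters` function are hypothetical, the override rule is an assumed reading (a spot that is dim in one channel is kept when the other channel exceeds 5000), and the "Signal Above Bkgd" SD filter is left out.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    flagged_bad: bool   # marked Bad or Not Found by the image software
    signal_a: float     # background-corrected Channel A signal
    signal_b: float     # background-corrected Channel B signal


def passes_filters(spot, min_signal=200.0, override=5000.0):
    """Apply the flag, minimum-signal and override filters to one spot."""
    if spot.flagged_bad:
        return False
    if spot.signal_a > min_signal and spot.signal_b > min_signal:
        return True
    # Assumed override rule: a very bright channel rescues a spot whose
    # other channel falls below the minimum signal.
    return spot.signal_a > override or spot.signal_b > override


spots = [
    Spot(False, 300.0, 250.0),    # both channels above 200: kept
    Spot(True, 9000.0, 9000.0),   # flagged bad: dropped
    Spot(False, 50.0, 6000.0),    # dim in A, rescued by bright B
    Spot(False, 50.0, 300.0),     # dim in A, no override: dropped
]
kept = [s for s in spots if passes_filters(s)]
```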
51 Select the Filtering Options - 2
3. Under Dataset Properties, choose Rows
Ordered by Average (Log2 Ratio) and Descending,
and Dataset Location Temporary Area. For the
Dataset Label use a name to clearly identify the
set, e.g. <Name> default Quality filters.
4. Under Array Selection, use the radio buttons or
the All button to check the arrays for
extraction. Note: For a dye swap array,
check the 1/R box.
5. Click Submit. When
extraction is complete the filtered set is stored
in your Temporary Area. Click Continue to view
the results.
When testing extraction conditions, you can
repeatedly refilter the project using different
settings and save each set in the Transient Area.
Then pick the best one and promote it to
Temporary or Permanent for further use.
52 Select the Filtering Options: Affymetrix arrays
53 mAdb Dataset Display Part 1
1. The View Array Summaries button returns you
to array summaries for the filtered set, now
normalized on the 4847 spots that passed the
filters.
2. After the Dataset name (which can be
edited with the link to the left) is a history
of what was done in the preceding filtering
step(s).
3. You can Post this or any analyzed
dataset to colleagues with mAdb accounts. It will
be sent to a Pickup area on their Gateway Page
(slide 10).
54 mAdb Dataset Display Part 2
4. Scroll down the Dataset Display page to
select other Filtering/Grouping/Analysis Tools,
e.g. Additional Filtering Options, Group
Statistics, etc. Or view the filtered dataset
with a number of Interactive Graphical Viewers,
e.g. MultiDimensional Scaling, Principal
Components Analysis, Scatter Plots, etc. Or
reformat the dataset for import to other
applications.
5. Check or uncheck other display
options: Background Color, Limiting Display to n,
KEGG Pathways, etc.
6. For now, change nothing
and continue scrolling down the Dataset Display
page.
55 mAdb Dataset Display Part 3
- 7. Each column represents an array; each row a
feature. Grey cells represent missing points, or
data that failed to pass filters. Clicking the
Records arrow advances to the next 25 features.
Rows can be sorted in ascending and descending
order; here they are sorted on Well ID.
- 8. The Well ID links to data and images of the
feature on all arrays. Feature ID links to a
Feature Report, with annotations related to the
feature.
- When available, KEGG annotations can be
displayed. When a feature is involved in > 1
pathway, all are listed in the respective
dropdown menu.
- When you add/subtract display options (e.g. Show
Spot Images, see previous slide), click the
Redisplay button to refresh this view.
56 Generate and work with a Pathway Summary Report
57 On the Dataset Display page scroll up to the list
of arrays and click a histogram icon
59- M vs. A plot
- For each spot, Cy3/Cy5 ratio vs. their average
intensity.
- Reveals biases in the data. Ideally, ratios
should be independent of signal strength, i.e.
the plot should scatter along a horizontal line
centered at zero (in log 2 space).
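The two coordinates of an M vs. A plot can be computed as in the sketch below. It uses the conventional definitions, M as the log2 ratio and A as the mean of the two log2 signals, which is an assumption about how the plot is built; the function name and the signal values are hypothetical.

```python
import math


def ma_values(cy3_signal, cy5_signal):
    """Return (M, A) for one spot.

    M is the log2 ratio of the two channels; A is the average of their
    log2 signals, i.e. the overall brightness of the spot.
    """
    m = math.log2(cy3_signal / cy5_signal)
    a = 0.5 * (math.log2(cy3_signal) + math.log2(cy5_signal))
    return m, a


# Hypothetical background-corrected signals for one spot.
m, a = ma_values(1024.0, 256.0)
```

On a well-normalized array, plotting M against A for every spot should give a cloud centered on M = 0 at all values of A.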
60 Evaluating replicates
- Now that the data is normalized in log2 space,
mAdb offers two tools for evaluating
reproducibility of replicates (including
biological replicates and dye-swaps, when
present):
- 1. The Correlation Summary Report provides a quick
tabular view of all pairings.
- 2. An interactive Scatter Plot compares selected
pairs, with the option of isolating and
identifying outliers.
61 On the Dataset Display page scroll to the Tool
menu, choose Array Group Assignment/Filtering and
click the Proceed button.
- Select the replicate arrays, enter a label and
click Submit. Option: Expand the number of Groups
and assign replicates accordingly.
- Upon return to the Dataset Display Page, choose
Correlation Summary Report from the Tools menu
and click Proceed.
62 mAdb Correlation Report
- Correlation values should approach 1.0 for paired
replicates.
- Click a cell in the table to view its respective
data in a static scatter plot.
- If filters were activated on the Extraction Page,
the correlations will be calculated only on spots
that pass the filters.
Alternately, on the Dataset Display page, choose
Scatter Plot log Ratios from the Viewer menu,
and click View.
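To show what one cell of a correlation table could represent, the sketch below computes a correlation between two replicate arrays using only the spots that have a value on both. It assumes a Pearson correlation on log2 ratios, which the slides do not state; the function name and data are hypothetical, and `None` stands for a spot that failed the filters.

```python
import math


def pearson(xs, ys):
    """Pearson correlation over positions where both arrays have a value."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log2 ratios for the same five features on two replicates.
replicate_1 = [1.0, 2.0, None, -1.0, 0.5]
replicate_2 = [2.0, 4.0, 3.0, -2.0, 1.0]

r = pearson(replicate_1, replicate_2)
```

The third feature is skipped because it is missing on the first replicate, so the correlation is based on four spots.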
63 Interactive Scatter Plot Applet
- Choose arrays to be paired on X and Y axes and
click Submit.
- Select outlying spots with the mouse; features are
shown in the lower display box.
- Click on a feature name in the lower display box to
get a Feature Report.
64 VI. Further Analysis
- Exercises in the next few slides will show how
to:
  - Filter to view spots found in X of arrays.
  - Find spots that fail to reproduce within a
  defined SD.
  - Generate an ad hoc feature query.
  - Filter to focus on features in an external list.
  - Find spots that change X-fold to reference.
  - Compare intersections of sets.
65 Exercise 1 - Filter for Spots Found in x Arrays
After using the default quality filter settings,
only 4847 spots remain of the 5776 in the
original set. Some are perhaps represented on
only a few arrays. We will refilter this dataset
to select features represented on all arrays.
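The idea behind this exercise can be sketched as a simple count of non-missing values per feature. The dataset layout (one list of log2 ratios per feature, `None` for a spot that failed the filters), the feature names, and the function name are all hypothetical.

```python
def found_in_at_least(dataset, min_arrays):
    """Keep features with a value (not None) on at least min_arrays arrays."""
    return {
        feature: values
        for feature, values in dataset.items()
        if sum(v is not None for v in values) >= min_arrays
    }


# Hypothetical filtered dataset with three arrays.
dataset = {
    "feature_1": [0.4, 0.6, 0.5],
    "feature_2": [1.2, None, 1.0],
    "feature_3": [None, None, -0.8],
}

on_all_arrays = found_in_at_least(dataset, min_arrays=3)
on_two_or_more = found_in_at_least(dataset, min_arrays=2)
```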
66 Other Filtering Options
67 Exercise 2 - Filter features that exceed a
defined SD
Return to the descendent set of 4847 features,
i.e. default quality-filtered, in which
replicates were grouped into 4 groups.
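One way to picture an SD filter on grouped replicates is sketched below: a feature is flagged when the sample standard deviation of its log2 ratios within any replicate group exceeds a chosen limit. This is an assumed reading of the exercise, not mAdb's documented rule; the function name, group layout, and values are hypothetical, and two groups are used instead of four to keep the example short.

```python
from statistics import stdev


def exceeds_sd(values, groups, max_sd):
    """True if any replicate group has a sample SD above max_sd.

    values holds one log2 ratio per array (None if missing); groups is a
    list of tuples of array positions that belong together as replicates.
    """
    for columns in groups:
        present = [values[i] for i in columns if values[i] is not None]
        # A standard deviation needs at least two values.
        if len(present) >= 2 and stdev(present) > max_sd:
            return True
    return False


groups = [(0, 1, 2), (3, 4, 5)]
stable = [1.0, 1.0, 1.0, -0.5, -0.5, -0.5]
noisy = [0.0, 2.0, 4.0, -0.5, None, -0.5]

stable_flagged = exceeds_sd(stable, groups, max_sd=0.5)
noisy_flagged = exceeds_sd(noisy, groups, max_sd=0.5)
```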
68 Exercise 3 - Filter for features from a
User-Defined List of Genes
An Identifier List can be created based on
user-specified criteria, e.g. names, pathways,
reproducibility, etc. Use the intersection of
this list and the array results to focus your
analysis.
69 Exercise 4 - Form and view an ad hoc query
70 Exercise 5 - Filter for Outlier Spots
Return again to the descendent set
quality-filtered (Group statistics). Open it in
a Dataset Display page.
These are features that are 2x increased/decreased
relative to the reference in 25% of the arrays.
For example, feature CHIP1718 is decreased in all
three arrays for strain 5D005, but not decreased
in any other arrays.
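The outlier criterion can be sketched as a count of arrays on which a feature changes at least 2-fold. The sketch assumes that the "25" in the text means 25% of the arrays in the dataset and that values are log2 ratios to the reference, so a 2-fold change is an absolute value of at least 1.0. The function name and the twelve-array rows are hypothetical.

```python
import math


def changed_in_fraction(log2_ratios, fold=2.0, fraction=0.25):
    """True if the feature changes at least `fold`-fold, up or down,
    on at least `fraction` of the arrays."""
    threshold = math.log2(fold)
    changed = sum(
        1 for v in log2_ratios if v is not None and abs(v) >= threshold
    )
    return changed >= fraction * len(log2_ratios)


# Hypothetical rows for twelve arrays: decreased on three of them, or two.
three_of_twelve = [-1.5, -1.2, -1.0] + [0.1] * 9
two_of_twelve = [-1.5, -1.2] + [0.1] * 10

keeps_first = changed_in_fraction(three_of_twelve)
keeps_second = changed_in_fraction(two_of_twelve)
```

With twelve arrays, a feature that is decreased on exactly three of them meets the 25% cutoff, while one decreased on only two does not.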
71- Or view outliers via the Multi-array Viewer
72 Exercise 6 - Boolean comparison of sets
1. On the Dataset Display page of the previous
extracted set (2x change in 25% of arrays),
choose Boolean Comparison with Another Set from
the Tools menu.
2. Choose a set B for comparison
from this or another project. You may need to
navigate to another Storage Area, or click Expand
Set, to find the subset you want to compare.
3. View the results.
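Conceptually, a Boolean comparison of two datasets works on their feature identifiers, which Python's built-in sets can illustrate. Apart from CHIP1718, which appears in the earlier example, the identifiers below are invented, and the three operations shown are an assumption about which comparisons the tool reports.

```python
# Feature IDs in two extracted datasets (hypothetical contents).
set_a = {"CHIP0012", "CHIP0345", "CHIP1718"}
set_b = {"CHIP0345", "CHIP1718", "CHIP2001"}

in_both = set_a & set_b      # A AND B: features shared by both sets
in_either = set_a | set_b    # A OR B: features found in at least one set
only_in_a = set_a - set_b    # A NOT B: features unique to set A
```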
73 Future tutorials to cover mAdb tools for