Goals: - PowerPoint PPT Presentation

Slides: 74
Provided by: Zare
1
  • Introduction
  • Goals
  • Offers ERIC users an integrated set of
    web-based analysis tools and a data management
    system to store and analyze cDNA/oligo/Affymetrix
    Gene Expression data using open systems design.
  • Currently supports Axon GenePix, Perkin-Elmer
    QuantArray, and Arraysuite II / IP Lab image
    analysis software for two-color, Pat Brown-type
    spotted arrays.
  • The ERIC team stands ready to assist you in
    working with mAdb. Please contact us with any
    questions.

2
Access mAdb from the ERIC portal: http://www.ericbrc.org
E-mail comments to administrator@ericbrc.org
3
  • mAdb Quick Facts
  • mAdb systems are in place at the NCI, NIAID, and
    FDA Microarray Centers, the Netherlands Cancer
    Institute, and the Genome Center of Singapore.
  • Data sharing is determined by each investigator;
    no one has access to all the data. Permissions
    may be set all the way from private to fully
    public.
  • MIAME-capable format.
  • Look for the mAdb icon ( ) throughout for
    on-line help.

4
mAdb System Features
  • Gene Discovery
      • Outlier detection, row retrieval tools
      • Scatter plots
      • Ad hoc keyword queries
      • Multiple array viewer
      • Pathway summaries: GO, KEGG, others
      • Boolean comparison of data
  • Class Discovery (unsupervised)
      • Clustering: Hierarchical, K-means, SOMs
      • Multidimensional Scaling
      • Principal Components Analysis
  • Class Comparison
      • t-test, Wilcoxon, ANOVA, Kruskal-Wallis, SAM
  • Class Prediction
      • PAM classifier

5
  • mAdb Analysis Paradigm
  • 1. Create a project, a logical grouping of arrays
    for addressing a biological question. Upload the
    arrays to that project.
  • 2. Perform quality control: Project Summary and
    Graphical Reports.
  • 3. Create a filtered dataset:
      • Select arrays for analysis.
      • Define quality parameters (minimum signal values,
        S/N, etc.).
      • Select normalization method, so different arrays
        can be compared.
      • Align genes from different array layouts (based
        on well IDs).
  • 4. Apply Data/Gene criteria filters, if desired,
    to create subset dataset(s).
  • 5. Apply appropriate Analysis/Visualization tools
    to the dataset(s).
  • 6. Repeat steps 3, 4, and 5 as desired.
  • 7. Interpret datasets/results.

6
Architecture for µArray Informatics
(coming soon)
7
Outline of this training session
  • 1. Introduction to mAdb.
  • 2. Open and manage a user account from the ERIC
    portal.
      • The mAdb Gateway Page.
  • 3. Create and manage a project.
      • Grant access and privileges to your
        collaborators.
      • Make a project publicly available to the user
        community.
  • 4. Data upload and validation.
      • Upload arrays and monitor upload progress.
      • Upload MIAME protocols and associate them with
        arrays.
  • 5. Standard extraction of a dataset.
      • Assess array quality via Project Summary and
        Graphical Reports.
      • Initial dataset extraction.
      • The Dataset Display Page.
      • Assessing replicates.
  • 6. Further analysis of datasets.

8
II. Open an account from the ERIC portal
Fill out the Account Request form, click Submit
to review the information. Then click Request
Account and account information will be sent to
your e-mail address.
9
With your Username and Account ID, log in from the
ERIC portal to the Gateway Page
Upon initial login (see slide 2), you will be
prompted to change your password. You will then
be directed to the mAdb Gateway page.
Click Manage User Profile to change personal info
or passwords.
10
The Gateway Page is the starting point for mAdb
activities
  • With these links you can upload arrays or MIAME
    protocols, monitor the upload progress, or create
    an Identifier List.
  • Use these links to manage your account, and
    create or manage projects.
  • In-progress datasets are stored to and retrieved
    from three areas, which persist 24 hrs after last
    access, 30 days, or indefinitely. Finally, when a
    colleague posts a dataset to you, a Pickup area
    appears on your Gateway.
  • Public access datasets can be retrieved from the
    Training/Public area.
  • The Projects menu lists projects to which you
    have access. Select a project and a tool from the
    Tool menu to launch data extraction and analysis.

11
III. Creating and Managing a Project
A project is a logical grouping of arrays.
1. Existing projects for an account are listed in
the Projects window on the mAdb Gateway page. A
project is only visible to account holders
authorized to view it.
2. To create a new project in the account, or
manage an existing one, click the Create/Manage
Projects link under Management Tools.
12
  • To create a new project, click the Create New
    Projects link.
  • Any publicly available project may be added to
    your personal Projects menu list by selecting
    here and clicking Add Project. For further
    details, see slide 18.
  • Any project in the account can be managed (i.e.
    descriptions edited, permissions granted) from
    here by clicking Management Options for the
    respective project.

13
Creating Projects
  • On the Create New Project page, enter a Project
    Title, Project Description and any comments you
    may have.
  • When done, click the Create button to return the
    browser to the Managing Projects page. The new
    project is listed.

14
Return the browser to the mAdb Gateway page and
the new project is listed in the Projects window.
The prefixes before a project name indicate your
privileges (A = administrator, U = upload, X =
non-administrator, non-upload) and the project
owner's login name.
To add arrays to a new or existing project, the
user must own the project, or have Upload
privileges.
15
Managing Projects - click Management Options link
to open the Project Management Options page.
View a project summary. The Access List names the
mAdb account holders who may use the project
(bold names have administrative privileges).
  • Delete a project (only functional if it does not
    yet contain any arrays).
  • Edit project information, i.e. title,
    description, comments.
  • Add other mAdb users to the Access List so they
    may view/upload data.
  • Grant/revoke other users' Privileges to this
    project.
  • Make a project Public to the user community.

16
Granting Access to a Project
  • To add a mAdb account holder to a project Access
    List, check the box next to their name(s) and
    click the Add User(s) button.
  • Adding a user to the Access List allows that
    account holder to view the arrays in a project
    and work with the data.

17
Changing User Access Levels
On the Project Management Options page, click the
Privileges link to open the Change User(s)
Privileges page.
Admin or Upload privileges grant a user
additional authority on a project. Admin users
may edit the project, array or dataset
descriptions, or move arrays between projects.
Upload privilege allows a user to add arrays to
the project. Check the appropriate box next to a
user name to grant Admin or Upload privileges (or
both). Uncheck the box(es) to revoke privileges.
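The access rules on slides 14 to 17 can be summarized in a few lines of Python. This is only an illustrative model: the `Project` class and the three helper functions are hypothetical names, not part of mAdb, and the sketch assumes the project owner implicitly holds every privilege.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical in-memory model of one project's access settings."""
    owner: str
    access_list: set = field(default_factory=set)  # may view arrays and work with the data
    admins: set = field(default_factory=set)       # hold the Admin privilege
    uploaders: set = field(default_factory=set)    # hold the Upload privilege
    public: bool = False


def can_view(project: Project, user: str) -> bool:
    # A public project is open to all users; otherwise the user must be
    # the owner or appear on the Access List.
    return project.public or user == project.owner or user in project.access_list


def can_upload(project: Project, user: str) -> bool:
    # Adding arrays requires owning the project or holding Upload privilege.
    return user == project.owner or user in project.uploaders


def can_edit(project: Project, user: str) -> bool:
    # Editing descriptions or moving arrays requires Admin privilege
    # (assumption: the owner has it implicitly).
    return user == project.owner or user in project.admins
```

Note that making a project public changes only who can view its arrays; upload and edit rights still depend on the privileges granted individually.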
18
Tips on making a project public.
When you make a project public (slide 15), its
arrays are available to all ERIC users for
analysis (the specific analyses you have done are
NOT made public.) ERIC users can access the
arrays in two ways.
Once a project is made public, any new arrays you
subsequently add to it are immediately available
to the user community. If you prefer to keep the
new arrays confidential and use them with arrays
in the public project: (1) create a new project
with the new arrays, and (2) from the Gateway
page Tools menu, copy any or all arrays from the
old project into the new project (see slide 40).
19
IV. Data upload and upload validation
20
Array Data Upload
Once a project is created in mAdb, arrays may be
added to it. The Upload pages in the following
section describe the required information and
data files coming from GenePix-type and
Affymetrix software. Consult with the ERIC mAdb
team if your data is from other types of
scanners. Once an upload is placed in the queue, a mAdb
administrator will map the data to a layout and
load it into the database. Uploads are processed
in the order in which they are received.
  • Open the Gateway page and click the Upload Array
    data link
  • The next step is to select the type of array and
    the project.

21
  • Choose Spotted Array or Affymetrix.
  • Browse to the project intended to receive the new
    arrays. You are limited to projects for which you
    have Upload privileges.

22
Form for Uploading Spotted Arrays
23
Spotted Array Data Upload Form Part I
  • The top of the form requires information about
    the experiment.
  • Print Name/Lot ID. Use identifiers issued by your
    printing facility.
  • Experiment Name (If there is a slide number
    scratched on the slide, use it as part of the
    Experiment Name, as a unique identifier).
  • Short Description
  • Long description (optional)
  • Channel A name and dye
  • Channel B name and dye

24
Spotted Array Data Upload Form Part II
The center of the form lets you (optionally)
associate the array with specific MIAME
protocols by selecting from the respective
dropdown boxes. Protocols that were previously
input to the system may be viewed only by the
user who entered them or persons sharing the
project. You may upload your arrays without
protocols and attach them later through the
Edit link on the project's Project Summary Report
page. See slide 35 for procedures on adding new
MIAME protocols.
25
Spotted Array Data Upload Form Part III
  • The bottom of the form allows you to upload the
    actual microarray data and composite image files
    generated by the scanner. The files that can be
    uploaded are:
      • ArraySuite Sample Intensities file or GenePix
        .gpr file (required).
      • Composite Image file (optional, most often
        .jpeg files).
      • Gene Array List (.gal) file (optional).
      • Raw Data file (optional, may contain probe
        sequences and annotation information).
  • When the form is complete, click Upload.

26
Confirming Upload
Clicking the Upload button opens a confirmation
page. Check that the image and file type selected
are correct and that the Data Values count is
roughly equal to the number of spots on the
array. Clicking the Confirm button puts the array
in queue for processing by a mAdb administrator.
The administrator will map your array data with a
gene array layout, and trigger the database to
ingest the mapped data. You may check the
progress of your upload via the Status of Uploads
link on the Gateway Page.
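Before uploading, you can check the expected Data Values count yourself by counting the data rows in the .gpr file. The sketch below assumes the standard Axon Text File (ATF) layout that GenePix uses for .gpr files: a signature line, a line giving the number of header records and data columns, the header records, one row of column titles, then one row per spot. The function name is illustrative.

```python
def count_atf_data_rows(text: str) -> int:
    """Count the data rows (spots) in the text of an ATF/.gpr file.

    Assumed layout: line 1 is the 'ATF' signature, line 2 holds the number
    of optional header records and the number of data columns, followed by
    the header records, one column-title row, and the data rows.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2 or not lines[0].startswith("ATF"):
        raise ValueError("not an ATF file")
    n_header_records = int(lines[1].split()[0])
    first_data_line = 2 + n_header_records + 1  # +1 skips the column-title row
    return len(lines) - first_data_line
```

If the count differs greatly from the number of spots printed on the array, the file is probably truncated or is not the results file.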
27
Upload Status
After uploading data to the server, the status of
array processing may be viewed by clicking on the
Upload Status link on top of any mAdb page.
  • Array processing occurs in two steps:
      • 1. Data is parsed and entered into the database.
      • 2. Image is processed and stored.
  • Loaded data may be analyzed while the image is
    still loading.

28
GenePix Software Notes
  • Carefully grid each block
  • Allow program to Find spots and adjust spot
    size
  • Set option to Analyze absent spots
  • Adjust JPEG for desired contrast/brightness.
    These settings determine the mAdb image display.
  • Analyze spots

29
Common Spotted Array Errors
  • Common GenePix Errors
  • Setting incorrect option for Analyze Absent
    Feature (box should be checked) results in
    truncated blocks.
  • Deleting blocks.
  • Improper gridding.
  • Common Upload Errors
  • Files are directed to the incorrect file input
    fields on the Upload form.
  • Loading a GAL file, Excel file, or Set Up file
    instead of the GenePix data (.gpr) file in the
    input field intended for data.
  • Loading TIFF file instead of composite JPEG or
    PICT file.
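Most of the upload errors listed above come down to the wrong file type in the wrong field, which a simple extension check before uploading can catch. The following is a minimal sketch, not mAdb's own validation. The function name and the extension sets are assumptions, and ArraySuite Sample Intensities files are left out because the slides do not state their extension.

```python
from pathlib import Path
from typing import List, Optional

DATA_EXTENSIONS = {".gpr"}                             # GenePix results file
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".pict", ".pct"}  # composite image (assumed extensions)


def check_spotted_upload(data_file: str, image_file: Optional[str] = None) -> List[str]:
    """Return a list of problems found; an empty list means the names look fine."""
    problems = []
    data_ext = Path(data_file).suffix.lower()
    if data_ext == ".gal":
        problems.append("data field holds a GAL file; expected the .gpr results file")
    elif data_ext in {".xls", ".xlsx"}:
        problems.append("data field holds an Excel file; expected the .gpr results file")
    elif data_ext not in DATA_EXTENSIONS:
        problems.append(f"unexpected data file type {data_ext!r}")

    if image_file is not None:
        image_ext = Path(image_file).suffix.lower()
        if image_ext in {".tif", ".tiff"}:
            problems.append("image field holds a TIFF; expected a composite JPEG or PICT")
        elif image_ext not in IMAGE_EXTENSIONS:
            problems.append(f"unexpected image file type {image_ext!r}")
    return problems
```

For example, `check_spotted_upload("layout.gal")` reports the GAL-for-GPR mix-up, while a `.gpr` file paired with a `.jpg` image yields an empty list.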

30
Spotted Array Data Upload Summary
  • 1. Fill in experimental info for each array.
      • Provide identifying information, comments and
        descriptions of samples.
      • Associate MIAME protocols with array (optional).
      • Select data file for array.
      • Select image file of array (optional).
  • 2. Submit and confirm upload.
  • 3. Check upload status page to monitor progress.
  • 4. Close browser when finished (for security).

31
Form for uploading Affymetrix arrays
If Affymetrix chips are to be uploaded, the
following form is displayed.
  • Browse to Metrics (.txt) file for the first box.
    (The metrics file is an export option from the
    Affymetrix CHP file - see the next slide.)
  • Browse to the corresponding .CEL file in second
    box
  • Then click on the Continue button to submit the
    files for upload. A separate tutorial on Creating
    Affymetrix upload files can be opened via the
    indicated link.

32
Affymetrix Upload - generating a metrics.txt file
  • Run chip through fluidics station to get .CEL
    file
  • Analyze .CEL file (usually scale all spots to
    500) to produce .CHP file.
  • With .CHP file open, set analysis options on
    metrics tab as
  • Click on Metric tab
  • Save file as Xxxx.txt
  • Note: If uploading comparison data, first upload
    absolute baseline data.

33
Confirming Affymetrix Upload
Clicking the Continue button opens a confirmation
page. Check that the uploaded data files are
correct. (Optional) Add a sample description and
a comment, and associate MIAME protocols to this
array on the bottom half of the form. Again,
protocols may be attached later through the Edit
link on the project's Project Summary Report
page. Clicking on the Confirm button puts the
array in queue for processing by a mAdb
administrator.
34
Affymetrix Data Upload Summary
  • Browse and select:
      • Data File (Metrics - .txt file)
      • CEL file
  • Confirm the correct files for upload. Enter
    Experiment data (array name, description, etc.).
  • Submit.
  • Check upload status page to display progress.
  • Close browser when finished (for security).

35
Uploading MIAME Protocols
  • mAdb allows MIAME protocols to be stored and
    associated with experiments.
  • MIAME describes the Minimum Information About a
    Microarray Experiment that is needed to enable
    the interpretation of the results of the
    experiment unambiguously and to reproduce the
    experiment.

The Upload MIAME data link on the Gateway page
provides access to existing MIAME protocols and
allows uploading new protocols.
36
MIAME Submission Types
Select the type of submission for input:
Protocol Submission or Array Submission.
37
Protocol Submission
Selecting the Protocol Submission link opens a
menu for adding/editing 5 types of MIAME
protocols: Extraction, Labeling, Hybridization,
Scanning, and Washing and staining.
38
Adding a new Protocol
Each protocol submission page has a link for
adding a new MIAME protocol.
Fill in the form and click Submit. When prompted,
confirm the submission.
39
Viewing Submissions
Each protocol submission page also lists existing
protocols of that type. Newly submitted protocols
are displayed automatically. Existing protocols
may be edited by clicking on the Edit link on the
left.
40
Copy or move arrays between projects
Arrays uploaded to a project may be copied or
moved to another project. This requires
administrative access to all projects whose
arrays are being copied/moved. The Copy/Move
Arrays tool is accessible from the Tool dropdown
menu on the Gateway Page.
41
Re-order arrays within a project
Arrays may be reordered within a single project
into a more meaningful arrangement.
From the mAdb Gateway page, select a project and
the Order Arrays within a Project tool, and
click Continue. Rearrange the arrays by selecting
an array name and using the Up or Down arrows to
move it in the list.
42
V. Standard extraction/filtration of array data
Goal: To analyze arrays using only high
quality/reliable spots
43
Dataset Structure - Filtering hierarchy/tree
structure
[Figure: dataset tree with nodes A, B, and B.1; labels: Extended Dataset Extractions, Additional filtering and analysis tools, Extracted subsets]
44
Begin analysis: Select an un-extracted dataset
from the Projects menu, choose Project Summaries
Report from the Tool menu and click Continue.
If a project contains spotted arrays and
Affymetrix Absolute Expression arrays, you are
prompted to view Summary Reports for each
separately. (Previously extracted/analyzed
datasets are stored in Transient, Temporary, or
Permanent areas. These are not visible upon
initial login.)
45
For each array in the project, assess quality via
overall statistics, histograms and images
  • Click the Image icon to the left and view its
    image file in a new window.
  • Click the Histogram icon to view an array
    histogram in a new window, and gain access to
    comprehensive QA graphics. This will be most
    useful after dataset extraction and normalization
    (see slide 47).

If you have administrative access to a project,
you can edit project and array descriptions from
the Edit links. Admin privileges also permit you
to attach MIAME protocols to an existing array or
replace a current protocol.
46
Inspecting the images can help identify problem
streaks, bleeds or artifacts, and suggest
appropriate filtering steps, e.g. exclude spots
with background greater than <threshold
value>. Check the Show GRID checkbox and click
the Resize button to view and assess the grid
pattern produced by your spotfinding
software. Close the images and return to the
Gateway page.
47
Initial Dataset Extraction
If a project contains spotted arrays and
Affymetrix Absolute Expression arrays, you are
prompted to extract each group separately.
48
mAdb Definitions
  • Signal - refers to background-corrected values
    (i.e. Target Intensity - Background Intensity).
  • Defaults:
      • MEAN Intensity - MEDIAN background (the GenePix
        default)
      • MEAN Intensity - MEAN background (the ArraySuite
        default)
  • Normalization factor initially calculated so
    that the median overall ratio (Cy3 Signal/Cy5
    Signal) for each array is adjusted to 1.0 (linear
    space) or 0.0 (log base 2). Spots with an
    extremely low signal are excluded from this
    calculation.
  • Default settings may be changed on the Extended
    Dataset Extraction page.
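The definitions above can be made concrete with a short sketch of median normalization in log2 space. It is an illustration, not mAdb's implementation. The function name is hypothetical, and `min_signal` is a placeholder for the "extremely low signal" cutoff, whose actual value the slides do not give.

```python
import math
from statistics import median


def normalized_log2_ratios(cy3, cy5, min_signal=50.0):
    """Return one normalized log2(Cy3/Cy5) per spot (None if no ratio exists).

    cy3 and cy5 are lists of (intensity, background) pairs, one per spot.
    """
    # Signal = intensity - background, per channel.
    signals = [(i3 - b3, i5 - b5) for (i3, b3), (i5, b5) in zip(cy3, cy5)]
    raw = [math.log2(s3 / s5) if s3 > 0 and s5 > 0 else None
           for s3, s5 in signals]
    # Spots with very low signal do not take part in computing the factor.
    usable = [r for r, (s3, s5) in zip(raw, signals)
              if r is not None and s3 >= min_signal and s5 >= min_signal]
    shift = median(usable)  # log2 of the normalization factor
    # Subtracting the shift moves the median log2 ratio to 0.0,
    # which is a median ratio of 1.0 in linear space.
    return [None if r is None else r - shift for r in raw]
```

Low-signal spots are still normalized with the same factor; they are only left out when the factor is computed. `statistics.median` raises an error if no spot is usable.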

49
  • Why normalizing ratios is necessary when
    comparing across multiple arrays:
      • Unequal incorporation of labels (green Cy3
        incorporates better than red Cy5).
      • Unequal amounts of samples.
      • Unequal photomultiplier voltage settings.
      • Different backgrounds.
      • Total brightness may differ between chips.
  • Why use ratios converted to log base 2?
      • Makes variation of ratios more independent of
        absolute magnitude.
      • Symmetrical graphing: otherwise up-regulated
        genes range from 1 to ∞, while down-regulated
        genes get compressed between 0 and 1.
      • Clearer interpretation: negative numbers are
        down-regulated genes; positive numbers are
        up-regulated genes.
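The symmetry argument is easy to see numerically. The short snippet below uses only the standard library and prints linear ratios next to their log2 values, so that an 8-fold increase and an 8-fold decrease end up equally far from zero.

```python
import math

ratios = [8, 4, 2, 1, 0.5, 0.25, 0.125]
log2_ratios = [math.log2(r) for r in ratios]

# In linear space the down-regulated values crowd into (0, 1);
# in log2 space each fold change sits the same distance from 0 in either direction.
for r, m in zip(ratios, log2_ratios):
    print(f"linear ratio {r:>5} -> log2 ratio {m:+.0f}")
```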

50
Select the Filtering Options - 1
 1. Under Signal, Normalization, Ratio Options
accept the defaults for Signal Calculation: Mean
Int - Median Bkg; Normalization Method: 50th
Percentile (Median); and Default Ratio:
ChanA/ChanB. Leave other boxes unchecked. This
will normalize output based on spots that pass
the filters.
 2. Under Spot Filter Options, check boxes on the
left to activate a filter(s), and type
appropriate values in fields to the right. For
now use:
  • Exclude any Spots Indicated as Bad or Not Found
  • Signal Above Bkgd > 2 and 2 SDs
  • Signal > 200 and 200
  • Override if Chan B Signal > 5000
  • Override if Chan A Signal > 5000
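One plausible reading of these spot filters is sketched below. The slide does not spell out how the filters combine, so the logic is an assumption: each channel must clear its background by 2 SDs and have a signal above 200, and a channel brighter than 5000 rescues a spot whose other channel failed. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    intensity: float      # mean spot intensity
    background: float     # median local background
    background_sd: float  # standard deviation of the local background

    @property
    def signal(self) -> float:
        return self.intensity - self.background


def channel_ok(ch: Channel, n_sd: float = 2.0, min_signal: float = 200.0) -> bool:
    # Signal must exceed the background by n_sd SDs and exceed min_signal.
    return ch.signal > n_sd * ch.background_sd and ch.signal > min_signal


def spot_passes(a: Channel, b: Channel, flagged_bad: bool = False,
                override: float = 5000.0) -> bool:
    if flagged_bad:  # spots marked Bad or Not Found are always excluded
        return False
    if channel_ok(a) and channel_ok(b):
        return True
    # Assumed override semantics: one very bright channel keeps the spot,
    # since a strong signal against a near-absent one can be a real result.
    return a.signal > override or b.signal > override
```

The override matters for strongly regulated genes, where one channel is near background precisely because expression differs so much between samples.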
51
Select the Filtering Options - 2
 3. Under Dataset Properties, choose Rows
Ordered by: Average (Log2 Ratio) and Descending,
and Dataset Location: Temporary Area. For the
Dataset Label use a name that clearly identifies
the set, e.g. <Name> default Quality filters.
 4. Under Array Selection, use the radio buttons
or the All button to check the arrays for
extraction. Note: For a dye swap array, check
the 1/R box.
 5. Click Submit. When extraction is complete the
filtered set is stored in your Temporary Area.
Click Continue to view the results.
When testing extraction conditions, you can
repeatedly refilter the project using different
settings and save each set in the Transient Area.
Then pick the best one and promote it to
Temporary or Permanent for further use.
52
Select the Filtering Options - Affymetrix arrays
53
mAdb Dataset Display Part 1
 1. The View Array Summaries button returns you
to array summaries for the filtered set, now
normalized on the 4847 spots that passed the
filters.
 2. After the Dataset name (which can be edited
with the link to the left) is a history of what
was done in the preceding filtering step(s).
 3. You can Post this or any analyzed dataset to
colleagues with mAdb accounts. It will be sent to
a Pickup area on their Gateway Page (slide 10).
54
mAdb Dataset Display Part 2
 4. Scroll down the Dataset Display page to
select other Filtering/Grouping/Analysis Tools,
e.g. Additional Filtering Options, Group
Statistics, etc. Or view the filtered dataset
with a number of Interactive Graphical Viewers,
e.g. MultiDimensional Scaling, Principal
Components Analysis, Scatter Plots, etc. Or
reformat the dataset for import to other
applications.
 5. Check or uncheck other display options:
Background Color, Limiting Display to n, KEGG
Pathways, etc.
 6. For now, change nothing and continue
scrolling down the Dataset Display page.
55
mAdb Dataset Display Part 3
  • 7. Each column represents an array; each row a
    feature. Grey cells represent missing points, or
    data that failed to pass filters. Clicking the
    Records arrow advances to the next 25 features.
    Rows can be sorted in ascending and descending
    order; here they are sorted on Well ID.
  • 8. The Well ID links to data and images of the
    feature on all arrays. Feature ID links to a
    Feature Report, with annotations related to the
    feature.
  • When available, KEGG annotations can be
    displayed. When a feature is involved in > 1
    pathway, all are listed in the respective
    dropdown menu.
  • When you add/subtract display options (e.g. Show
    Spot Images, see previous slide), click the
    Redisplay button to refresh this view.

56
Generate and work with a Pathway Summary Report
57
On the Dataset Display page scroll up to the list
of arrays and click a histogram icon
58
(No Transcript)
59
  • M vs. A plot
  • For each spot, Cy3/Cy5 ratio vs. their average
    intensity
  • Reveals biases in the data. Ideally, ratios
    should be independent of signal strength, i.e.
    the plot should scatter along a horizontal line
    centered at zero (in log 2 space).
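The two coordinates of an M vs. A plot can be computed per spot as sketched below. M follows the slide's Cy3/Cy5 ratio. Taking A as the mean of the two log2 signals is the conventional definition and is assumed here, since the slide only says "average intensity".

```python
import math
from typing import Tuple


def m_and_a(cy3_signal: float, cy5_signal: float) -> Tuple[float, float]:
    """Return (M, A) for one spot; both signals must be positive."""
    m = math.log2(cy3_signal / cy5_signal)                     # log2 ratio
    a = 0.5 * (math.log2(cy3_signal) + math.log2(cy5_signal))  # mean log2 intensity
    return m, a
```

For well-normalized data, plotting M against A for every spot should give a cloud centered on M = 0 at all values of A. A curve or tilt indicates intensity-dependent bias.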

60
Evaluating replicates
  • Now that the data is normalized in log2 space,
    mAdb offers two tools for evaluating
    reproducibility of replicates (including
    biological replicates and dye-swaps, when
    present):
  • 1. The Correlation Summary Report provides a quick
    tabular view of all pairings.
  • 2. An interactive Scatter Plot compares selected
    pairs, with the option of isolating and
    identifying outliers.

61
On the Dataset Display page scroll to the Tool
menu, choose Array Group Assignment/Filtering and
click the Proceed button.
  • Select the replicate arrays, enter a label and
    click Submit. Option: Expand the number of Groups
    and assign replicates accordingly.
  • Upon return to the Dataset Display Page, choose
    Correlation Summary Report from the Tools menu
    and click Proceed.

62
mAdb Correlation Report
  • Correlation values should approach 1.0 for paired
    replicates.
  • Click a cell in the table to view its respective
    data in a static scatter plot.
  • If filters were activated on the Extraction Page,
    the correlations will be calculated only on spots
    that pass the filters.
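As a rough illustration of what one cell in such a report contains, the sketch below computes a Pearson correlation between two arrays' log2 ratios, skipping any spot that is missing (filtered out) on either array. Whether mAdb uses Pearson correlation is an assumption, and the function name is hypothetical.

```python
import math


def replicate_correlation(x, y):
    """Pearson r between two equal-length lists of log2 ratios.

    None marks a spot that failed the filters on that array; such spots
    are dropped pairwise before the correlation is computed.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two spots present on both arrays")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)
```

A value near 1.0 indicates good replicate agreement. The sketch does not handle constant input, which would cause a division by zero.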

Alternatively, on the Dataset Display page, choose
Scatter Plot log Ratios from the Viewer menu,
and click View.
63
Interactive Scatter Plot Applet
  • Choose arrays to be paired on X and Y axes and
    click Submit.
  • Select outlying spots with the mouse; features are
    shown in lower display box.
  • Click on a feature name in lower display box to
    get a Feature Report.

64
VI. Further Analysis
  • Exercises in the next few slides will show how
    to
  • Filter to view spots found in X% of arrays.
  • Find spots that fail to reproduce within a
    defined SD.
  • Generate an ad hoc feature query.
  • Filter to focus on features in an external list.
  • Find spots that change X-fold to reference.
  • Compare intersections of sets.

65
Exercise 1- Filter for Spots Found in x Arrays
After using the default quality filter settings
only 4847 spots remain of the 5776 in the
original set. Some are perhaps represented on
only a few arrays. We will refilter this dataset
to select features represented on all arrays.
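The idea behind this exercise can be sketched in a few lines: keep only features that have a value on a required share of the arrays. The data layout (a dict of feature ID to per-array values, with `None` for missing points) and the function name are assumptions for illustration.

```python
def present_in_at_least(dataset, min_fraction=1.0):
    """Keep features with values on at least min_fraction of the arrays.

    dataset maps a feature ID to a list of values, one per array;
    None marks a missing or filtered-out point.
    """
    kept = {}
    for feature_id, values in dataset.items():
        present = sum(v is not None for v in values)
        if present >= min_fraction * len(values):
            kept[feature_id] = values
    return kept
```

With the default `min_fraction=1.0` only features represented on every array survive. Lowering it to, say, 0.75 tolerates a missing point on one array in four.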
66
Other Filtering Options
67
Exercise 2 - Filter features that exceed a
defined SD
Return to the descendent set of 4847 features,
i.e. the default quality-filtered set in which
replicates were grouped into 4 groups.
68
Exercise 3 - Filter for features from a
User-Defined List of Genes
An Identifier List can be created based on
user-specified criteria, i.e. names, pathways,
reproducibility, etc. Use the intersection of
this list and the array results to focus your
analysis.
69
Exercise 4 - Form and view an ad hoc query
70
Exercise 5 - Filter for Outlier Spots
Return again to the descendent set
quality-filtered (Group statistics). Open it in
a Dataset Display page.
These are features that are 2x increased/decreased
relative to the reference in 25% of the arrays.
For example, feature CHIP1718 is decreased in all
three arrays for strain 5D005, but not decreased
in any other arrays.
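An outlier filter of this kind can be sketched as follows. In log2 space a 2x change in either direction is an absolute value of at least 1.0. The required share of arrays is a parameter, defaulting to 0.25 on the assumption that the exercise means one quarter of the arrays. The function name and data layout are illustrative.

```python
import math


def changed_in_fraction(dataset, fold=2.0, min_fraction=0.25):
    """Keep features whose |log2 ratio| reaches log2(fold) on enough arrays.

    dataset maps a feature ID to a list of log2 ratios, one per array;
    None marks a missing point. The fraction is taken over all arrays,
    including those where the feature is missing.
    """
    threshold = math.log2(fold)
    kept = {}
    for feature_id, values in dataset.items():
        hits = sum(v is not None and abs(v) >= threshold for v in values)
        if hits >= min_fraction * len(values):
            kept[feature_id] = values
    return kept
```

A feature that is strongly decreased in three replicate arrays out of twelve, and flat elsewhere, passes this filter even though its average change across all arrays is small.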
71
  • Or view outliers via the Multi-array Viewer

72
Exercise 6 - Boolean comparison of sets
 1. On the Dataset Display page of the previous
extracted set (2x change in 25% of arrays),
choose Boolean Comparison with Another Set from
the Tools menu.
 2. Choose a set B for comparison from this or
another project. You may need to navigate to
another Storage Area, or click Expand Set, to
find the subset you want to compare.
 3. View the results.
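Conceptually, a Boolean comparison of two feature sets reduces to ordinary set operations. The sketch below shows the four usual combinations. Which of these mAdb actually reports is not stated on the slide, and the function name and feature IDs are made up for illustration.

```python
def boolean_compare(set_a, set_b):
    """Return the common Boolean combinations of two collections of feature IDs."""
    a, b = set(set_a), set(set_b)
    return {
        "A AND B": a & b,   # features in both sets
        "A OR B": a | b,    # features in either set
        "A NOT B": a - b,   # features only in A
        "B NOT A": b - a,   # features only in B
    }


result = boolean_compare(["f1", "f2", "f3"], ["f2", "f3", "f4"])
```

The intersection is often the most useful output, for example to find features that pass both a fold-change filter and a reproducibility filter.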
73
Future tutorials to cover mAdb tools for