Title: Welcome to SMD
1Welcome to SMD
- Stanford Microarray Database
- April 6, 2009
- Janos Demeter
2User Help Tutorials and Workshops
- SMD Help FAQ
- http://smd.stanford.edu/help/index.html
- SMD Office hours
- Monday 3 - 5 pm
- Wednesday 2 - 4 pm
- SMD Tutorials regularly scheduled
- Welcome to SMD tutorial
- Data analysis, Normalization and Clustering
3Welcome to SMD a tutorial
- What we'll talk about
- User Registration /Accounts
- Submitting Data
- Finding Your Data
- Displaying Your Data
- Organizing Data
- Repository
- What we will not discuss, or only brush the surface of
- Experimental Design
- Data Normalization
- Data Quality Assessment
- Data Retrieval and Analysis (clustering)
- External User Tools (XCluster, TreeView, etc.)
- Please fill out the sign-up sheet and survey form
- Questions? Email us at array@genome.stanford.edu
4Welcome to SMD What is SMD?
- Database
- Stores/gives you access to your data
- Gives you access to data from other members of your lab/collaborators/public data
- You can control who has access to your data
- Services e.g. gene annotations kept up-to-date
- Tools to analyze microarray data
- File management system to keep results of your
analyses easily retrievable and annotated.
7Welcome to SMD
- User Registration/Accounts
- Navigating SMD
- Submitting Data
- Finding Your Data
- Displaying Your Data
- Organizing Data
- Repository
8User Registration
- The registration form is on SMD's login page
- http://smd.stanford.edu/cgi-bin/tools/display/registration.pl
- Accounts are grouped by labs. Access is shared within groups.
- Lab PI must verify every user in the group and the type of account needed
- Two requirements to obtain an account in SMD
- Upon publication all experiments are made public. Once published, data are deposited in GEO and ArrayExpress
- SMD is free again for Stanford users. External user groups have to pay (per group, per year)
- Number of arrays loaded: fee
- >200: 15,000
- 100-199: 9,000
- 50-99: 4,500
- 1-49: 2,250
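The fee table above can be read as a simple tier lookup. The sketch below is only an illustration: the function name is hypothetical, and treating exactly 200 arrays as the top tier is an assumption, since the table lists ">200" and "100-199" and leaves 200 itself unspecified.

```python
# Fee tiers as (minimum arrays, fee), copied from the table above.
# Assumption: exactly 200 arrays is billed at the top tier; the slide
# lists ">200" and "100-199" and leaves 200 itself unspecified.
FEE_TIERS = [(200, 15000), (100, 9000), (50, 4500), (1, 2250)]


def external_group_fee(arrays_loaded: int) -> int:
    """Return the yearly fee for an external group (hypothetical helper)."""
    for minimum, fee in FEE_TIERS:
        if arrays_loaded >= minimum:
            return fee
    raise ValueError("the fee table starts at 1 array")
```

For example, a group that loads 75 arrays in a year falls into the 50-99 tier.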
9Accounts Types of Users
- Accounts are grouped into lab groups
- Unrestricted User (lab members)
- Can load data into SMD (i.e. have loader account)
- Can edit/delete/change access to all his/her experiments
- Can view/clone all experiments of his/her lab group
- Restricted User (collaborators)
- May view only those arrays for which they were given access privileges by the experimenter
- Can NOT edit or delete data
- Can NOT load data into SMD
- Inactive Users (users who have left the lab)
- Can still see, but no longer enter/edit/delete their own data
- Can no longer view group data
10Accounts loader account
- Every unrestricted user gets an sFTP account on loader.stanford.edu
- Login information (user name and password) is the same for web and loader accounts
- You need an sFTP-capable program locally to transfer files to and from the loader account
- http://www.stanford.edu/group/itss/ess/
- (e.g. SecureFX for PC, Fetch for Mac)
- This account is used for transferring files to SMD
11Welcome to SMD
- User Registration/Accounts
- Navigating SMD
- Submitting Data
- Finding Your Data
- Displaying Your Data
- Organizing Data
- Repository
- Submitting a Printlist
12Navigating SMD web-site Unrestricted User menu
- Most frequently used programs are accessible from the menu
- All help docs: help
- First item: context-specific help
- lists -> index page
13Navigating SMD web-site Unrestricted User Index page
Restricted users can only see the Search and List Data options
14Navigating SMD web-site SMD Tools
- Find tools that are not so easy to find
15Welcome to SMD
- User Registration/Accounts
- Navigating SMD
- Submitting Data
- Finding Your Data
- Displaying Your Data
- Organizing Data
- Repository
- Submitting a Printlist
16Submitting Data to SMD Prerequisites
- SMD currently accepts expression data produced by
- Agilent
- GenePix
- ScanAlyze
- SpotReader
- Affymetrix (MAS 5, GCOS, dChip)
- NimbleGen (single channel)
- Illumina (soon)
- The appropriate print design file has to be entered into the database before experimental data can be loaded.
- If you are using arrays printed at SFGF, this is done automatically and you don't need to worry about this
- If slides come from an external source, please send us the design file in MAGE-ML or gal format a few days before you plan to load the data.
- If you are creating your own prints: godlist file
17Submitting Data to SMD (Finding a print)
- You can find prints from the lists menu -> prints
- Type in the name of the print (hoad)
- Click on the gal file icon
- Select the id/annotation column you need
- Download the gal file
18Submitting Data to SMD
- Files required
- For ScanAlyze Data
- data.dat, grid.sag, channel1.scn and channel2.scn
- For GenePix Data
- data.gpr, grid.gps, channel1.tif, channel2.tif
- For SpotReader Data
- data.srr, grid.sra, channel1.tif, channel2.tif
- For Agilent Data
- data.txt, shape.shp, channel1.tif, channel2.tif
- For Affymetrix Data
- expt.exp, image.dat, cell.cel, probeset.txt (currently text versions)
- Compressed files are accepted for loading
19Submitting Data to SMD Data Entry
Select Enter My Data from the menu
Or, go here to load your experiments
Go here to annotate your experiments
20Submitting Data to SMD Single Experiment Entry
- Choose whether you are entering a new
experiment, a new result set for already existing
experiment or a batch of experiments/result sets.
- Select printing method
- Select feature extraction software used for data generation
- Select organism whose genes are arrayed
21Submitting Data to SMD Agilent or Affymetrix
Experiment Entry
- Select a Result Set Name and Description
- Because any single Agilent or Affymetrix experiment may have n result sets, you must create a name for each set so that each result set can be identified and retrieved unambiguously from the database
22Submitting Data to SMD Data File Locations
- Select the Print Name from the pull-down list
- Enter a Slide Name
- unique
- should be informative
- Barcode
- Slide
- Max of 30 characters
- Choose the data, grid, green scan and red scan files to be loaded from your loader account
- You will get a pull-down menu with a list of all files in the incoming directory of your loader account
23Submitting Data to SMD Loader
- incoming
- Stores all files prior to experiment loading (This is where you will upload the data files.)
- ORA-OUT
- Feedback files from the database are written to this directory
- Experiment loading logs
- arraylists
- The database will look in this folder to retrieve arraylists (a.k.a. result set lists)
- genelists
- The database will look in this folder to retrieve any genelists
Clean-up:
- all inappropriate files are deleted daily
- data files in the incoming directory are deleted when more than 3 weeks old.
24Submitting Data to SMD Common Problems
- Can't connect to loader: check protocol (sFTP)
- File transfer not complete. Check the size of uploaded files.
- Filename problems: don't use spaces and unusual characters (e.g. !@/)
- Files stored on loader longer than 3 weeks are deleted
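A quick local check of file names before upload can catch the filename problem listed above. This is a minimal sketch, not SMD's own validation: the allowed character set (letters, digits, dot, underscore, hyphen) is an assumption, since the slide only says to avoid spaces and unusual characters, and the function name is hypothetical.

```python
import re

# Assumed "safe" character set; SMD's actual rules are not stated on the slide.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_filename(name: str) -> bool:
    """True if the name has no spaces or unusual characters (assumed rule)."""
    return SAFE_NAME.fullmatch(name) is not None
```

Under this rule a name such as "slide_01.gpr" passes, while "my slide.gpr" or "slide@1.gpr" would be rejected before you transfer anything to the loader.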
25Submitting Data to SMD
- Although SMD does archive your data, this is your
primary result
Please Archive Your Data!
26Submitting Data to SMD Affy Expt Loading
- The first 3 files have to be tab-delimited text
- Exp file: generated by Affy scanning software and contains protocol information.
- Cell file: data for individual features on the slide
- Gene Intensity file: probe set (gene) level data
- DAT file: image file
27Submitting Data to SMD Affy Expt Loading
- These files should be exported from the analysis software as tab-delimited text files
- The probe set file (MAS5/GCOS) should include the following columns
- Analysis Name, Probe Set Name, Stat Pairs, Stat Pairs Used, Signal, Detection, Detection p-value
- Cell file should have these columns
- X, Y, MEAN, STDV, NPIXELS
- We are working to accept binary files
28Submitting Data to SMD Experiment Description
Details
- Experiment Date
- Date of Hybridization
- Date of Data Entry
- SMD Experiment Name
- Unique
- Should be descriptive
- Category/Subcategory
- Green & Red channel descriptions
- Reverse Replicate for normalization & data quality purposes
- Normalization Type
- (described later)
29Submitting Data to SMD Doping Controls
- Doping control data
- recommended amount: 10 ng
- after amplification and dilution, used: 12 ng
- -> factor = 1.2
- DCV2.1 is given in 2 tubes: DCV2.1_MJ and DCV2.1_A_S.
- You can enter them separately (if factors are different), or as a single mix (if factors are the same)
30Submitting Data to SMD Experiment Access
- SMD Experimenter
- Person who will have edit/delete/access privileges
- Experiment Owner
- SMD Group
- By default, your lab group will be able to see all your experiments
- If you wish for another entire group to view your experiments, you select the group name here
- SMD Individual User
- Give an individual user the ability to view your experiment
31Submitting Data to SMD Experiment Access cont.
- World Access
- Selecting Yes here will make your data viewable by the WORLD!
- This is usually only done for published data
32Submitting Data to SMD Errors
- Loading software checks for common errors
- Experiments will not be loaded if there are
errors. You must go back, correct your error(s)
and resubmit your data
33Submitting Data to SMD Queue
- After passing the checks, your data goes to the loading queue
- The queue holds all experiments being loaded and processes them in an ordered fashion
- Progress of loading can be checked by looking at the log file or from the email you are sent.
- If there is an error, please send the link to the curators.
34Submitting Data to SMD Batch Loading
- Instead of loading experiments one by one, you can load experiments in batch
- All experiments have to be listed in a tab-delimited file (a batch file) in your loader account
- There are sample batch files located on the batch entry help page
- http://smd.stanford.edu/help/batch_load.shtml
35Submitting Data to SMD Assembling a Batch File
- (Result Set Name)
- Print Name
- Experiment Category
- Experiment SubCategory
- Slide Name
- Data File Location
- Grid File Location
- Green Scan File Location
- Red Scan File Location
- Experiment Date
- Experiment Name
- Green Channel (CH1) Description
- Red Channel (CH2) Description
- Normalization Type
- Norm Value
- Experimenter
- Experiment Description
- Collaborative Group
- Individual User
All underlined column headers indicate required
data
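A batch file is a tab-delimited table with one row per experiment, so it can be assembled with a short script. The sketch below uses the column names listed above as header labels; whether SMD expects exactly these spellings is an assumption, so compare against the sample batch files on the help page. The optional Result Set Name column is left out, and the helper name and example values are hypothetical.

```python
import csv
import io

# Column labels taken from the list above (exact spellings are an assumption).
COLUMNS = [
    "Print Name", "Experiment Category", "Experiment SubCategory",
    "Slide Name", "Data File Location", "Grid File Location",
    "Green Scan File Location", "Red Scan File Location",
    "Experiment Date", "Experiment Name",
    "Green Channel (CH1) Description", "Red Channel (CH2) Description",
    "Normalization Type", "Norm Value", "Experimenter",
    "Experiment Description", "Collaborative Group", "Individual User",
]


def build_batch_file(experiments):
    """Return tab-delimited batch-file text; missing fields are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n", restval="")
    writer.writeheader()
    writer.writerows(experiments)
    return buffer.getvalue()


# Hypothetical example row; real values come from your own experiments.
example = build_batch_file([{"Print Name": "P1", "Slide Name": "S1",
                             "Data File Location": "slide1.gpr"}])
```

Save the resulting text to a file and transfer it to the incoming directory of your loader account together with the data files it names.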
36Submitting Data to SMD Batch File
37Submitting Data to SMD Batch Loading
- Your batch file and all experiment files MUST be in your loader account in the incoming directory.
- You have the option to first check your batch file.
- This will check for the usual errors before the data are loaded into the queue.
- After your batch file has passed the check, you can load your batch file.
- Experiment loading proceeds as for single experiment entry.
38Submitting Data to SMD Categories & Subcategories
- Currently, these are the only searchable fields to find experiments. Once you publish your data, you will want outside users to be able to search the data with useful terms
- Before you enter any data into the database you must have a category and subcategory to annotate your experiments
- Make sure that your categories and subcategories are meaningful and not cryptic.
- Once you have chosen your terms and their definitions, please email your request to
- array@genome.stanford.edu
39Submitting Data to SMD Categories & Subcategories
Go here to see existing lists of categories & subcategories
40Normalization Why normalize data?
- Normalization allows you to recognize the biological information in your data and compare data from one array to another
- Goal: remove signal that is unrelated to biological information (dye bias, location bias, intensity dependence, etc.).
- During data loading simple normalizations are done, adding new data columns to existing ones
41Submitting Data to SMD Normalization
- Uploaded result file (after analysis of the scanned images)
- Normalization: normalized data columns are added to the original data (GenePix, ScanAlyze, SpotReader)
- No normalization, but several result sets: Agilent, Affy, NimbleGen
42Normalization Channel biases
Before Normalization
43Normalization Channel biases
After Normalization
44Normalization Default normalization in SMD
- Assume that on average the channel intensity ratio should be 1 (i.e. no difference between samples/channels); this is not always true
- Choose spots that are nice in both channels
- Calculate a factor for nice spots
- Apply this factor to channel two's data for all spots
45Normalization Choosing Spots
- There are two options for selecting nice spots for normalization (only non-flagged spots are used for each)
- Regression correlation: spots with uniform color (pixel regression correlation greater than 0.6)
- Computed: bright spots (a large percentage of pixels are at least one standard deviation above background)
- Nice spots are those with at least 65% of pixels significantly above background.
- If less than 10% of spots on the array meet the threshold, the 65% threshold is reduced stepwise until either 10% of spots pass or the threshold reaches 55% of pixels above background (whichever comes first)
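The stepwise threshold rule for the computed method can be sketched as a small loop. This is an illustration under stated assumptions, not SMD's loader code: the step size of one percentage point is assumed (the slide only says "stepwise"), the function name is hypothetical, and the input is taken to be, for each non-flagged spot, the percentage of its pixels significantly above background.

```python
def select_nice_spots(pct_above_bg, start_pct=65, floor_pct=55,
                      min_pass_pct=10, step_pct=1):
    """Return (indices of nice spots, threshold used), both per the rule above.

    pct_above_bg: percentage of pixels above background, one value per spot.
    step_pct is an assumed step size; the slide does not state it.
    """
    total = len(pct_above_bg)
    threshold = start_pct
    while True:
        passing = [i for i, pct in enumerate(pct_above_bg) if pct >= threshold]
        enough = total > 0 and 100 * len(passing) >= min_pass_pct * total
        # Stop when at least 10% of spots pass, or the floor is reached.
        if enough or threshold <= floor_pct:
            return passing, threshold
        threshold = max(floor_pct, threshold - step_pct)
```

With the defaults, an array on which 10% or more of the spots already pass at 65% is left at that threshold; otherwise the threshold drops one point at a time and never goes below 55%.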
46Normalization Calculating and Applying the Factor
- Normalization factor is the geometric mean of the red/green ratio of the nice spots (equivalently, the exponential of the arithmetic mean of the log-ratios)
- Alternatively, a user can specify a normalization factor
- Both foreground and background intensity of channel 2 (red) for all spots are divided by the normalization factor
- Other normalized values are calculated from these
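The factor calculation can be written out in a few lines. This is a minimal sketch assuming strictly positive intensities for the nice spots; the slide does not say whether the ratios use background-subtracted values, and the function names are hypothetical.

```python
import math


def normalization_factor(red, green):
    """Geometric mean of red/green ratios over the nice spots."""
    log_ratios = [math.log(r / g) for r, g in zip(red, green)]
    return math.exp(sum(log_ratios) / len(log_ratios))


def normalize_channel2(values, factor):
    """Divide channel 2 (red) intensities by the normalization factor."""
    return [value / factor for value in values]
```

The same division is applied to both the foreground and the background intensities of channel 2, for all spots and not only the nice ones.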
47Normalization Available Tools
- Additional normalization/background correction methods are available for experiments already in the database
- In batch
- Individual experiments (see later)
48Re-normalize Experiments
- Default normalization options
- Bioconductor normalizations
- Use a subset of spots to do normalization
- Background correction options
49Welcome to SMD
- User Registration/Accounts
- Navigating SMD
- Submitting Data
- Finding Your Data
- Displaying Your Data
- Organizing Data
- Repository
- Submitting a Printlist
50Finding Your Data in SMD
- Ways to search for data (searches public, lab group, collaborators' and your own data)
- Advanced Search
- Basic Search
- Experiment List
- Name Search
- Direct ways to data you own
- Display My Data
- Select My Data
51Finding Your Data in SMD Basic Search
- There are three ways to find your data via Basic Search
- Publications include all published data in SMD
- Experiment sets allow you to select pre-defined experiment groups.
- Search for data by category
52Finding Your Data in SMD Advanced Search Results
53Advanced vs Basic Search
- Use Basic Search to retrieve
- a single Publication
- a single Experiment set
- your personal sets
- others, if viewable
- a single Experimental category
- Use Advanced Search to perform
- boolean search
- by Experimenter
- by Category
- by Subcategory
- retrieval by Print
- retrieval by result set list
- search experiment
- word search in description field
54Welcome to SMD
- User Registration/Accounts
- Navigating SMD
- Submitting Data
- Finding Your Data
- Displaying Your Data
- Organizing Data
- Repository
- Submitting a Printlist
55Displaying Your Data
56Display Data
Clone experiment
57Display Data Download Original Data Files
- Simple script to download original data files loaded into SMD
- Same files that were uploaded
- Currently only works for single experiments
- Can be run in batch for a result set list
58Display Options Raw Data
- Downloaded file contains all measured and normalized data, biological annotations, and experiment annotation.
- File name
- <exptid><software><result set>.xls
- E.g., 12345GENEPIX0.xls
- The file is actually a tab-delimited text file that can be opened in any program
59Display Options Alignment data
60Display Options View Data
- Select Columns to be displayed
- Array expression data
- Biological Annotation Data
- Select filtering criteria
- Select sorting column
- Select how many rows to be displayed per page
- Include controls/nulls
- Data may be viewed or downloaded
61Display Data Clickable Image
- Gives you the array image (gif image, not original tif files)
- Does not give you the filtering option
- If you click on a spot, you get the spot details
62Display Data Spot Image
63Display Data View images with grids
- Assess data quality
- Select data for grids with the usual filtering options.
- Selected spots may be flagged or un-flagged en masse.
- If you see a spot that you want to flag, you can do so by clicking on the spot.
- When you click on a spot you get
64Display Data Plot Array Data
- You can evaluate data quality by plotting values for any array.
- Histograms and scatter graphs
- These graphs are very useful before/after normalization or when selecting filter values for data retrieval.
- Arrays in an Experiment Set may be plotted simultaneously.
65Display Options View Details
- All experiment and protocol information
- Displays the normalization method and value
- Links to tools to assess the quality of data
- ArrayQuality plots (doping control plots)
66Ratios on Array Tool
- Quick visualization of log-ratio distribution on the slide
- Color assignments are based on log-ratio values and also intensity
- Can visualize normalized or non-normalized log-ratios
- PLUS ANOVA analysis to detect spatial bias (print-tip or plate)
67Ratios on Array Tool
- Not normalized vs. normalized (loess intensity,
print-tip)
68Display Data Clone an Experiment
69Display Data Edit Experiment Details
- Edit all names and descriptions
- Associate clinical information with an array (only for human arrays)
- Experiment Type
- CGH
- Chromatin IP
- Expression Type I
- Expression Type II
- GMS
- Associate procedural information
- View Data Distribution
- Re-normalize data
- Edit doping controls
70Display Data Enter Experiment Annotations
- Single array entry available from edit tool.
- Tag-value data and/or free-text protocols.
- Batch entry available from main page.
71Display Data Experiment Annotation in Batch
- Tab-delimited text file with procedures and parameters (http://smd.stanford.edu/help/batch_procedure.shtml). Put it into the incoming directory on loader.
- List of procedures, protocols, parameters and experiment types available
72Display Data Experiment Annotation
- MGED: Microarray Gene Expression Data Society
- In September 2002, MGED sent out a letter to journals and reviews requesting that microarray publications include the minimal MIAME information
- Several journals have adopted these policies concerning publications
- MIAME checklist: http://www.mged.org/Workgroups/MIAME/miame_checklist.html
- Nature Genetics (2001) 29, 365-371.
- SMD allows you to store all required information (but it won't happen on its own).
73Display Data Experiment Annotation
- Experimental Design
- Array Design
- Biological Samples
- Hybridizations
- Measurements
- Data Normalization and Transformation
74Display Data Changing Access
- Here, you can add or remove experiment access for individual users or for groups, by experiment.
- To add access in batch make a resultset list (arraylist) and use
75Display Data Delete an Experiment
- Only the owner (the experimenter) of an experiment can delete it
- Once an experiment is deleted from the database, it cannot be recovered
- Once an experimenter leaves the lab, the lab head should consider what to do with his/her experiments, i.e. should the user still have the ability to delete all their experiments?
76Welcome to SMD
- User Registration/Accounts
- Navigating SMD
- Submitting Data
- Finding Your Data
- Displaying Your Data
- Organizing Data
- Repository
- Submitting a Printlist
77Organizing Data
- Grouping experiments
- Experiment sets -> database record
- Result set lists (a.k.a. arraylists) -> text file
- Grouping genes
- Genelists -> text file
- Saving files online
- Repository -> database record
78Organizing Data Experiment sets & Arraylists
- From Search pages ->
- data retrieval
- Create Result Set List
- Create Experiment Set
79Organizing Data Creating a Result Set List
- Name your Result Set List
- Select which experiments to include in the list
- Order experiments
- Select filters to use during data retrieval
- First, customize filters per array, then save the list
- or, save the list without customizing it.
- A file is created in the arraylists directory of your loader account
- You can use this result set list to select your experiments within the Advanced Search or give batch access to experiments
80Organizing Data Creating an Experiment Set
- Same first page as for Resultset list to select and order arrays (no filters)
- The following pages allow you to specify experimental factors (see next page)
- and meta-information about the set
- Name
- Experiment set design
- Longevity
- Etc.
- Group of experiments can be made public here. We ask you to make sets for publications.
81Organizing Data Creating an Experiment Set
- Enter experimental factors and values.
- May be drawn from procedural data, if entered.
82Organizing Data Creating an Experiment Set
- Factor data are available when viewing the
experiment set.
83Organizing Data Arraylist vs Experiment Set
- Result Set List
- Tab-delimited text file that exists in your loader account
- Contains filters and their values per array
- Accessed through Advanced Search
- Can be converted to Experiment Set (link under Tools)
- Experiment Set
- Annotated list of experiments
- Exists in the database
- Accessed through Basic Search
- Required for publication of data
84Organizing Data Genelists
- What is a genelist?
- A tab-delimited text file containing a list of gene identifiers that exists in your loader account in the genelists directory
- What is the purpose of a genelist?
- Data retrieval for only a set of genes
- Collapse gene data -> synthetic genes
- Have your own annotation for these genes instead of using SMD's
- Other uses, e.g. normalize arrays based on a list of genes
- There are several shared standard genelists that are available for many organisms.
- You may create your own precompiled list of genes.
85Organizing Data Creating your own genelists file
- Create a tab-delimited text file.
- The first line of the file must have the appropriate label for the data contained within it.
- NAME (e.g. YPR119W, IMAGE:1542757, HPY1808, hSQ000234)
- LUID (SMD identifier, unique for an instance of a sequence, i.e. a plate well)
- SPOT
- GOID
- GOTERM
- Your file may contain additional columns with any type of annotation data you desire for each gene (Annotation).
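A genelist can be generated from a script rather than typed by hand. The sketch below builds the file text with one identifier column and one annotation column; using "Annotation" as the header of the extra column is an assumption, and the helper name and example annotation are hypothetical.

```python
# Identifier labels allowed on the first line, as listed above.
ID_LABELS = {"NAME", "LUID", "SPOT", "GOID", "GOTERM"}


def build_genelist(entries, id_label="NAME"):
    """Return genelist text from (identifier, annotation) pairs.

    The "Annotation" header for the second column is an assumed label.
    """
    if id_label not in ID_LABELS:
        raise ValueError(f"unsupported identifier label: {id_label}")
    lines = [f"{id_label}\tAnnotation"]
    for identifier, annotation in entries:
        lines.append(f"{identifier}\t{annotation}")
    return "\n".join(lines) + "\n"


# Hypothetical example; the annotation text is a placeholder.
genelist_text = build_genelist([("YPR119W", "example note")])
```

Write the text to a file and place it in the genelists directory of your loader account so the database can find it.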
86Welcome to SMD
- User Registration/Accounts
- Navigating SMD
- Submitting Data
- Finding Your Data
- Displaying Your Data
- Organizing Data
- Repository
- Submitting a Printlist
87Organizing Data Repository
Here!
88Organizing Data Repository
89Using Your Repository PCL Deposits
- Re-enter data retrieval pipeline to
- Filter by gene expression pattern
- Cluster
- Avoid repeating data retrieval
- Use SVD and KNN Impute tools
- Average data by synthetic genes
- Use analysis tools in GenePattern
- View retrieval report
- Download files
- Share data with collaborators
- 200 MB storage space
90Using Your Repository CDT Deposits
- View cluster using GeneXplorer or (java)TreeView
- View cluster images
- View retrieval and clustering report
- Download files
- Assign access
91Depositing Data
- Deposit from data retrieval pipeline
- Upload from desktop computer
92SMD Staff
Heng Jin Scientific Programmer
Gavin Sherlock Asst. Professor, Co-PI
Michael Nitzberg Database Administrator
Catherine Ball Director
Farrell Wymore Lead Programmer
Janos Demeter Computational Biologist
Zachariah Zachariah Sr. Systems Manager
Jeremy Hubble Programmer
93Questions to SMD
- Send e-mail: array@genome.stanford.edu
- Office hours: Mondays 3-5 pm, Wednesdays 2-4 pm
- Office: Grant S201
- Phone: 736-0075
- Online help: http://smd.stanford.edu/help/index.html
(Image labels: SMD, Fairchild, Bio-X)
94Welcome to SMD
- User Registration/Accounts
- Navigating SMD
- Submitting Data
- Finding Your Data
- Displaying Your Data
- Organizing Data
- Submitting a Printlist
95Submitting a Printlist to SMD
- The creation of a print within SMD is a complex process, but is absolutely required prior to experiment entry.
- If you receive your arrays from SFGF, this is done automatically and you do not need to stay for the remainder of the tutorial
- A printlist (godlist) is a tab-delimited list of plate samples (well address and contents) in the order the plates were put in the printer.
- There is a program to assist you in printlist submission
- Located under Tools on the SMD homepage
- Printlist must be in your ORA-OUT directory on loader
96Submitting a Printlist Is a new list required?
- Yes, if the plates used have not been previously entered into the database
- Yes, if the plates were entered in the past, but their contents have changed over time (well contamination, well emptied)
- No, if your lab makes 3 different prints using the exact same plates in the same or different order
- Need to supply the database with a list of SMD plateIDs and plateNames from the first print in their new order.
97Submitting a Printlist Column Headers for New Plates
- PLAT: The plate number, e.g. 1, 2, 3, etc. INTEGER
- PROW: The plate row, e.g. A, B, C, etc. CHARACTER
- PCOL: The plate column, e.g. 1, 2, 3, etc. INTEGER
- NAME: The sequence name
- usually a systematic name or clone identifier (e.g. YBL016 or IMAGE:753234)
- This is the only name used for samples of TYPE other than CDNA.
- TYPE: The sequence type
- Usually ORF, CDNA, CONTROL, or EMPTY.
- List of types can be seen from the SMD homepage under List Data: Sequence Type
- FAIL: Whether the PCR failed
- 0: one distinct band (success)
- 1: no signal (fail)
- 2: multiple distinct bands
- 3: signal, but not a distinct band (smear)
- 4: multiple smears
- 5: unknown
- 101: worst cases of peeled away or haloed spots (assigned on a 96 well plate basis)
- 102: less bad cases of peeled away or haloed spots (assigned on a 96 well plate basis)
- Null is assumed to be 0 (success)
98Submitting a Printlist Additional Columns for cDNA data
- CLONEID: Required for samples of TYPE=CDNA, if ACC is absent/null. Real cDNA clones must have a cloneID.
- ACC: Required if CLONEID is absent/null.
- This is the GenBank accession, usually acquired from dbEST.
- IS_CONT: Whether the sample is known to be contaminated. A blank entry will default to unknown (U)
- IS_VER: Whether the DNA in a well has been verified. A blank entry will default to unverified (U).
- SOURCE: A string describing the source of the clone or DNA. This has typically been used to indicate the original plate source, and the 96 and 384 well plate locations that a clone has been in
- GF20096(1A1)384(1A1).
- GF200 refers to a set of ResGen plates
99Submitting a Printlist Optional Columns
- DESC: A description of the molecular entity. This description is associated with the SUID itself (not a clone or plate sample description)
- LUID: Laboratory Unique ID. For those samples that have identical NAME and TYPE, but require distinction within the laboratory for experimental reasons (different sources, new PCR, new plate). If you wish to enter LUIDs for your lab's plate samples, please contact the curators: array@genome.stanford.edu
- GENE_NAME: Sometimes clones will stop being included in UniGene for spurious reasons, but users have a 'Preferred Name' for those clones.
- ORIGIN: For CDNA clones, this can indicate whether this is a public or private clone.
- SAMPLE_DESC: A description, if any, about that particular sample. This description is specific to the plate sample.
- ORGANISM: If submitting a print containing samples from multiple organisms (e.g. human, yeast). For those few rows where the sample is derived from an organism other than the default (user-defined), the organism code must be specified.
100Submitting a Printlist Creating New SUIDs
- New samples in your printlist (i.e. not currently in the database) will need to have a unique identifier assigned to them (SUID)
- A SUID is meant to represent a unique molecular entity within SMD. It is meaningless outside the context of the database.
- The combination of NAME, TYPE and ORGANISM uniquely identifies an SUID
- YBL001C, ORF, SC -> SUID 3429
- IMAGE:486544, CDNA, HS -> SUID 28546
- SUIDs allow comparison of the same samples across different prints.
- It is extremely important that erroneous SUIDs are not created.
- Erroneous SUIDs prevent comparisons between prints/experiments
101Submitting a Printlist Avoiding Common Name Errors
- Erroneous SUIDs are usually created by a bad NAME
- misspelled, non-standard, or non-systematic
- ACT1, ORF, SC or Actin, ORF, SC is not the same as YFL039C, ORF, SC
- 3X SSC, CONTROL, SC is not the same as 3xssc, CONTROL, SC
- Every new sample must be verified by the user before it is assigned a new SUID and before the printlist can be entered.
- Please be a conscientious user and verify that any new SUIDs you approve are valid.
- Empty wells must be specified as such
- All empty wells must be designated NAME = EMPTY and TYPE = EMPTY.
- Do not use "blank" or "control" to describe empty wells.
102Submitting a Printlist Avoiding Common Errors
- Headers misspelled or absent
- Required data missing
- except FAIL, CLONEID, but the column header must still be present
- Correct plate ordering
- No wells may be skipped (with the exception of the last plate in the print run).
- Useful check: number of plate samples vs. number of printed spots
- # samples (printlist rows - 1) <= # tips x # rows per sector x # columns per sector = # spots
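The sample-count check can be done before submitting the printlist. This is a small sketch, assuming the printlist has exactly one header row and that the number of samples may equal but not exceed the number of printed spots; the function name and the example print geometry are hypothetical.

```python
def printlist_fits(printlist_rows, tips, rows_per_sector, cols_per_sector):
    """True if the samples in the printlist fit on the printed spots."""
    samples = printlist_rows - 1  # one header row (assumed)
    spots = tips * rows_per_sector * cols_per_sector
    return samples <= spots


# Hypothetical geometry: 48 tips, 30 x 30 spots per sector = 43200 spots.
fits = printlist_fits(43201, tips=48, rows_per_sector=30, cols_per_sector=30)
```

If the check fails, the printlist lists more plate samples than the printer could have spotted, which usually points to an extra plate or a wrong sector size.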
103Submitting a Printlist Printlist Check Program
- The printlist must be placed in the ORA-OUT directory of your loader account
- This program will assist you in printlist submission
- It follows the rules stipulated above.
- The program will send all feedback to your ORA-OUT directory
- Filename.new
- Filename.errors
104Submitting a Printlist Notify Curators
- Additional information needed
- Number of sector rows/columns
- Distance of rows/columns in sector
- Printing algorithm: http://smd.stanford.edu/help/createPrint.shtml
- Number of slides printed
- Plate location
- Printer used for printing
- When your printlist is correct, send email with the info above to array@genome.stanford.edu
105Submitting Data to SMD The work flow
106Welcome to SMD
- User Registration/Accounts
- Category/Subcategory
- Submitting Data
- Finding Your Data
- Displaying Your Data
- Organizing Data
- Submitting a Printlist
107Submitting Data to SMD Successful Experiment Entry
- Once your experiment has been loaded into the database, there are 2 methods to get the details of the experiment loading process
- From the queue page
- A file will be created on your loader account in the ORA-OUT directory
- process_no.log
108Submitting Data to SMD Example queue logfile
Loading Expt Batch NO 3279
Experiment Name blah blah
Thu Dec 13 15:54:01 2001
Processing Data File /loader/ftphome/youruserid/incoming/slidename.gpr
Inserting experiment info into experiment table...
exptID 28765
The experiment data has been successfully inserted into experiment table!
Updating Experiment Access Control Table ...
Updating expt_access for experimenter YOURUSERID () ... OK
Updating expt_access for Brown/Botstein labs () ... OK
Calculate norm value...
Reading all data from datafile and doing all calculation now...
PassCriteria 16005
Using 36490 spots for normalization
43.8% passed criteria of a good spot with 0.65
Updating exptNorm table...
NormType Computed
NormValue 0.96
Updating Result table...
Total Record 43200
Updating Result table...
Expected 43200, actual is 43200
1000 . . .
109Normalization computed method
- Nice spots are those with at least 65% of pixels significantly above background.
- If less than 10% of spots on the array meet the threshold, the 65% threshold is reduced stepwise until either 10% of spots pass or the threshold reaches 55% of pixels above background (whichever comes first)
110Submitting Data to SMD Data Standards
- MGED: Microarray Gene Expression Data Society
- initially established November, 1999, Cambridge, UK.
- MGED 7, last year in Toronto, had over 300 participants; international.
- Interest in data standards and format specifications.
111Submitting Data to SMD MIAME
- Nature Genetics (2001) 29, 365-371.
112Submitting Data to SMD Upload gif files
- How
- Use your copy of tif files
- Make composite and save as .gif
- Upload on loader into incoming/
- Use the Upload.gif link
- When
- The gif created by the default process is not acceptable
- After renormalization
- If SMD's gif creation fails, notify the curators before uploading your own; they may be able to fix the problem.
113Display Options View Details
- Data Distribution
- Plot Data
- Ratios on Array
- Channel Intensities
- These graphs are covered in the data analysis
tutorial.
114Submitting Data to SMD Experiment Entry Log File
- The log file will give you the following information
- ExptID (experiment ID)
- Information on experiment access
- Information on normalization applied
- Number of spots that pass criteria
- Spots used to calculate normalization
- Percentage of spots that passed criteria
- Normalization Value
- Error message if (sub)process failed