Title: Standards and Data Handling with Microarrays
1Standards and Data Handling with Microarrays
2What is he going to talk about?
- Data handling- some database stuff
- followed by
- Standards- the MGED group of standards
3What is he going to talk about?
- Data handling- some database stuff
- followed by
- Mostly standards- the MGED group of standards
- MIAME
- Ontology
- MAGE
4Some Databases
- ArrayExpress- the EMBL of Microarrays- MAGE based
- MIAMEExpress
- Various species specific databases
5The Microarray Gene Expression Data Society (MGED)
- A standards body for microarrays
- Split into 4 working groups
- MIAME
- Ontology
- MAGE
- Data processing
http://www.mged.org
6Why have standards?
- Standards allow us to
- Be prescriptive (Oi you, do this)
- Aid communication between people
- Produce best practices
- Enable replication of experiments
7The Microarray Gene Expression Data Society (MGED)
- A standards body for microarrays
- Split into 4 working groups
- MIAME
- Ontology
- MAGE
- Data processing
http://www.mged.org
8The MIAME Standard
- Pronounced like the city. Miami.
- An abbreviation for Minimum Information About a
Microarray Experiment
9MIAME
- Scientific papers should contain all information
required to replicate an experiment
- What information do you need for a microarray
experiment?
- People talk of documents being MIAME-compliant.
10A guide to MIAME
11World of Microarrays
- Array Design
- Experimental Design
- Samples used, extract preparation and labelling
- Hybridization procedures and parameters
- Measurement data and specifications of data
processing
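As a rough illustration of what checking for MIAME compliance could look like in software, the sketch below treats the five areas above as required sections of a submission. The section keys and the example submission are hypothetical names chosen for this sketch; they are not part of MIAME itself.

```python
# Hypothetical section names, loosely following the five areas listed above.
MIAME_SECTIONS = (
    "array_design",
    "experimental_design",
    "samples",
    "hybridization",
    "measurements",
)


def missing_sections(submission: dict) -> list:
    """Return the sections that are absent or empty in a submission."""
    return [name for name in MIAME_SECTIONS if not submission.get(name)]


submission = {
    "array_design": {"platform_type": "spotted cDNA"},
    "experimental_design": {"type": "treatment"},
    "samples": {},  # present but empty, so it still counts as missing
}
print(missing_sections(submission))
```

Here the submission would be reported as lacking its samples, hybridization and measurements sections.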
12Array Design
13Array
Array as a whole
- Platform Type
- Dimensions of the array
- Surface and coating specifications
- Number of Features
- Availability of array, or production protocol
14Some terminology
- Any given slide/genechip/whatever is an Array
- The design of the array (which spot goes where,
and where all of the spots are) is an Array
Design
- Any spot etc. on the array is a Feature
- What the feature is made of is a Reporter
15Features
- A feature is a specific spot on an ArrayDesign
- Record
- Feature dimensions
- How the feature is attached to the array
- Where each individual feature is on the
ArrayDesign
16Reporters
- A reporter is a specific probe
- You could spot a reporter multiple times on one
array, or even on multiple arrays
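The relationship between these terms can be sketched as a small data model. This is only an illustration under assumed names: the class fields and the example probe are hypothetical, and the real MAGE object model is much richer.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reporter:
    """What a feature is made of: a specific probe."""
    name: str
    sequence: str = ""  # left empty when the sequence is not available


@dataclass(frozen=True)
class Feature:
    """One spot at a fixed position on an array design."""
    row: int
    column: int
    reporter: Reporter


@dataclass
class ArrayDesign:
    """The layout: which reporter is spotted where."""
    name: str
    features: list = field(default_factory=list)

    def positions_of(self, reporter: Reporter) -> list:
        """List the (row, column) of every feature made of this reporter."""
        return [(f.row, f.column) for f in self.features if f.reporter == reporter]


probe = Reporter(name="example_probe", sequence="ACGT")
design = ArrayDesign(name="example_design")
# The same reporter spotted twice gives two distinct features.
design.features.append(Feature(row=0, column=0, reporter=probe))
design.features.append(Feature(row=5, column=3, reporter=probe))
print(design.positions_of(probe))
```

Keeping the reporter separate from the feature means its description is recorded once, however many times it is spotted.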
17Reporters
- Record
- What type of reporter it is
- Single or double stranded
- The sequence if available
- If not PCR primers, approximate length
- How this reporter was made
- What it reports for (database reference)
18Special features
- Control features
- How is it a control
- Composite features
- How combined?
- What does it report for?
19Samples used, extract preparation and labelling
20Biosource and Biosample
- Biosources and Biosamples are either
- The original, living things that the RNA was
extracted from
- Or an extract from that organism
- e.g. Patient, Mouse, mouse brain, RNA extract are
all Biosamples/Biosources.
- Biosources are the original material you start
with- biosamples are all subsequent steps.
21Biosource record
- Contact details for sample (where did you get it
from?)
- Organism, e.g. Human, Arabidopsis, Mouse (?)
22More BioSource
- Relevant descriptors such as
- Sex
- Age
- Developmental stage
- Organism part
- Cell type
- Plant strain or line
- Genetic variation
- Individual genetic characteristics
- Disease state
- Additional clinical information
- Which individual?
23Progression of a microarray experiment
- Seed (BioSource)
- Protocol
- Plant (BioSample)
- Protocol
- Cold plant (BioSample)
- Protocol
- RNA extract (BioSample)
- Protocol
- Labelled RNA extract (LabelledExtract)
24Protocols
- Between each step is a protocol
- e.g.
- Growth protocol
- Sample treatment protocol
- Separation technique
- Extraction protocol
- Labelling protocol
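One way to picture this chain in code is as a linked list of materials, each remembering what it was derived from and by which protocol. The sketch below is a minimal illustration: the `Material` class is hypothetical, and the pairing of particular protocols with particular steps is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Material:
    name: str
    kind: str  # "BioSource", "BioSample" or "LabelledExtract"
    derived_from: Optional["Material"] = None
    protocol: Optional[str] = None  # protocol that produced this material


def history(material: Material) -> list:
    """Walk back to the BioSource and return the steps in experimental order."""
    steps = []
    current = material
    while current is not None:
        steps.append(current)
        current = current.derived_from
    return list(reversed(steps))


# Assumed pairing of protocols with steps, for illustration only.
seed = Material("Seed", "BioSource")
plant = Material("Plant", "BioSample", seed, "Growth protocol")
cold = Material("Cold plant", "BioSample", plant, "Sample treatment protocol")
rna = Material("RNA extract", "BioSample", cold, "Extraction protocol")
labelled = Material("Labelled RNA extract", "LabelledExtract", rna, "Labelling protocol")

for step in history(labelled):
    print(step.kind, step.name, "via", step.protocol)
```

Given only the labelled extract, the full treatment history back to the seed can be recovered, which is the point of recording a protocol between every step.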
25Extraction protocol
- Extraction method
- RNA, mRNA, genomic DNA extracted
- Amplification steps
26Labelling Protocol
- Amount labelled
- Label used
- Label incorporation method
- Spikes: type, qualifier (e.g. concentration,
expected ratio), the elements on arrays that
match the spike
27Hybridisation Protocol
- Which samples were hybridised to which arrays
- Solution used
- Blocking agent
- Wash procedure
- Quantity of labelled extract used
- Time, concentration, volume, temperature
- Description of hybridisation instruments
28Measurement Data and Specifications of Data
Processing
- Three steps
- Scan
- Raw analysis
- Summary analysis
29Scanning protocol
- Scanning Hardware, Software used
- Scan Parameters used e.g. PMT voltage
- Scanned images (?)
30Image analysis protocol
- Image analysis software and version (e.g. GenePix
version 4)
- Description of the algorithm used
- Parameters used
- All output from the analysis
31Normalised and Summary data
- Data processing protocol
- Gene expression data tables derived from the
experiment as a whole.
- Derived measurement value
- Reliability indicators
32Experiment
- About the complete experiment
33Experiment description
- Which Arrays are part of the experiment?
- Which Samples are part of the experiment?
34Experiment Design
- Contact details for the experiment
- Type of the experiment (gene knock-out,
treatment)
- Arrays and experimental factors
- Quality control steps taken
- Description for the experiment
35No more MIAME!
36The Microarray Gene Expression Data Society (MGED)
- A standards body for microarrays
- Split into 4 working groups
- MIAME
- Ontology
- MAGE
- Data processing
http://www.mged.org
37Computer Standards
- Computers are a lot less forgiving than people
- Computer standards allow lots of different
software to work together
38Computer standards
- People are beginning to realise that it is useful
for computers to understand databases, as well as
people
39Ensembl example
40Advantages of data standards
- Use many databases together
- Better automated searching
- One piece of software can use many databases
interchangeably
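The last point can be sketched in a few lines: if every database offers the same interface, one function can query all of them. Everything named here (`ExpressionDatabase`, `InMemoryDatabase`, `find_experiments`, the experiment ids) is a hypothetical stand-in invented for the sketch, not the API of any real database.

```python
from typing import Protocol


class ExpressionDatabase(Protocol):
    """The shared interface that every conforming database offers."""

    name: str

    def find_experiments(self, organism: str) -> list: ...


class InMemoryDatabase:
    """Hypothetical stand-in for a real database client."""

    def __init__(self, name: str, records: dict):
        self.name = name
        self._records = records  # organism -> list of experiment ids

    def find_experiments(self, organism: str) -> list:
        return list(self._records.get(organism, []))


def search_all(databases: list, organism: str) -> dict:
    """Run the same query against every database, keyed by database name."""
    return {db.name: db.find_experiments(organism) for db in databases}


db_a = InMemoryDatabase("db_a", {"Arabidopsis": ["exp1", "exp2"]})
db_b = InMemoryDatabase("db_b", {"Arabidopsis": ["exp9"], "Mouse": ["exp7"]})
print(search_all([db_a, db_b], "Arabidopsis"))
```

`search_all` never needs to know which database it is talking to, only that the agreed interface is there.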
41What is an ontology?
- An ontology is like a dictionary- it defines many
terms
- An ontology is a restricted dictionary- it
attempts to avoid duplication
- An ontology is a structured dictionary- it states
how the terms relate to each other
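The three properties above can be shown with a toy example: each term is defined once, and each term names the broader terms it belongs to. The terms and definitions below are made up for illustration and are not taken from the Gene Ontology or the MGED ontology.

```python
# Hypothetical mini-ontology: one entry per term, with links to broader terms.
ONTOLOGY = {
    "organism part": {"definition": "Any part of an organism.", "is_a": []},
    "organ": {"definition": "A distinct structural unit.", "is_a": ["organism part"]},
    "brain": {"definition": "An organ of the nervous system.", "is_a": ["organ"]},
}


def ancestors(term: str, ontology: dict = ONTOLOGY) -> list:
    """Return every broader term reachable by following is_a links."""
    found = []
    pending = list(ontology[term]["is_a"])
    while pending:
        parent = pending.pop(0)
        if parent not in found:
            found.append(parent)
            pending.extend(ontology[parent]["is_a"])
    return found


print(ancestors("brain"))
```

Because the structure is explicit, a search for "organism part" could also return samples annotated as "brain", something a flat list of keywords cannot do.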
42Demonstration of Gene Ontology, MGED ontology
43The Microarray Gene Expression Data Society (MGED)
- A standards body for microarrays
- Split into 4 working groups
- MIAME
- Ontology
- MAGE
- Data processing
http://www.mged.org
44MAGE
- Computer standard
- A standard object model for Microarrays
- Describes the various parts of a microarray
experiment in a standard way
45MAGE-ML
- A collection of these objects in an XML format
- One file can represent an entire experiment, part
of an experiment, many experiments, or part of
many experiments
- Example- Affymetrix supply ArrayDesign objects in
a file to describe all of their chips
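To give a feel for objects written out as XML, the sketch below builds a tiny document and reads it back with Python's standard library. The element and attribute names are simplified placeholders chosen for this example; they do not follow the real MAGE-ML schema.

```python
import xml.etree.ElementTree as ET

# Placeholder element and attribute names, NOT the real MAGE-ML schema.
root = ET.Element("Experiment", {"name": "example"})
design = ET.SubElement(root, "ArrayDesign", {"name": "example_design"})
for row, column in [(0, 0), (0, 1)]:
    ET.SubElement(design, "Feature", {"row": str(row), "column": str(column)})

text = ET.tostring(root, encoding="unicode")
print(text)

# Any program that knows the format can read the same objects back.
parsed = ET.fromstring(text)
features = parsed.findall("./ArrayDesign/Feature")
print(len(features))
```

The file is just text, so it can be passed between databases and analysis tools that agree on the element names.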
46The end!
- I've shown you
- Some microarray databases you can get data from
- MIAME
- Ontologies
- MAGE