Title: Hunting strategy of the bigcat
1. Hunting strategy of the bigcat
BiGCaT Bioinformatics
2. BiGCaT, bridge between two universities
TU/e: Ideas, Experience in Data Handling
Universiteit Maastricht: Patients, Experiments, Arrays and Loads of Data
BiGCaT
3. Major Research Fields
Nutritional and Environmental Research
Cardiovascular Research
BiGCaT
4. What are we looking for?
5. What are we looking for?
Different conditions show different levels of gene expression for specific genes.
6. Differences in gene expression?
- Between, e.g.,
  - healthy and sick
  - different stages of disease progression
  - different stages of healing
  - failed and successful treatment
  - more and less vulnerable individuals
- Shows
  - important pathways and receptors
  - which can then be influenced
7. The transfer of information from DNA to protein.
From Alberts et al., Molecular Biology of the Cell, 3rd edn.
8. Eukaryotic genes in somewhat more detail
9. Gene expression measurement
DNA -> mRNA -> protein
- Functional genomics/transcriptomics
  - Changes in mRNA
  - Gene expression microarrays
  - Suppression subtraction libraries
- Proteomics
  - Changes in protein levels
  - 2D gel electrophoresis
  - Antibody arrays
10. Gene expression arrays
- Microarrays: relative fluorescence signals. Used for identification.
- Macroarrays: absolute radioactive signal. Used for validation.
11. Layout of a microarray experiment
- Get the cells
- Isolate RNA
- Make fluorescent cDNA
- Hybridize
- Laser read out
- Analyze image
12. The cat and its prey: the data
- Comprises
  - Known cDNA sequences (not known genes!) on the array: the reporters
  - Data sets typically contain 20,000 image spot intensity values in 2 colors
  - One experiment often contains multiple data points for every reporter (e.g. times or treatments)
  - Each data point can (should) consist of multiple arrays
- Bioinformatics should translate this into useful biological information
13. Hunting
- Comprises
  - Analyze reporters
  - Data pretreatment
  - Finding patterns in expression
  - Evaluate biological significance of those patterns
14. Reporter analysis
- Reporter sequence must be known (can be sequenced using digest electrophoresis).
- Look up sequence in genome databases (e.g. Genbank/EMBL or Swissprot).
- Will often find other RNA experiments (ESTs) or just chromosome location.
15. Blast reporters against what?
- Nucleotide databases (EMBL/Genbank). Disadvantages: many hits, best hit on clone, we actually want function (i.e. protein)
- Nucleotide clusters (Unigene). Disadvantage: still no function
- Protein databases (Swissprot/trEMBL). Disadvantages: non-coding sequence not found, frameshifts in clones
16. Two implemented solutions
- Start with Unigene (from Blastn or platform provider), mine using SRS (direct, through PDB, through PIR) -> Swissprot/trEMBL
- Use dedicated EMBL-Swissprot X-linked DB (Blast against EMBL subset, get Swissprot/trEMBL)
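The first solution amounts to a chain of lookups that tries several cross-reference routes in order. The sketch below only illustrates that idea with plain dictionaries; it does not query SRS, and the function name, table layout and all identifiers are hypothetical placeholders.

```python
def annotate(reporter, reporter_to_unigene, routes):
    """Return (protein accession, route name), or (None, None) if nothing is found.

    routes is an ordered list of (route name, table) pairs, where each table
    maps a Unigene cluster to a Swissprot/trEMBL accession. Earlier routes win.
    """
    cluster = reporter_to_unigene.get(reporter)
    if cluster is None:
        return None, None
    for route_name, table in routes:
        accession = table.get(cluster)
        if accession is not None:
            return accession, route_name
    return None, None


# Placeholder identifiers, for illustration only.
reporter_to_unigene = {"reporter_1": "cluster_1", "reporter_2": "cluster_2"}
routes = [
    ("direct", {"cluster_1": "protein_A"}),
    ("through PDB", {}),
    ("through PIR", {"cluster_2": "protein_B"}),
]
```

Keeping the route name next to the accession makes it possible to see afterwards how each annotation was obtained.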
18. Scotland - Holland 1-0?
- Check Affymetrix reporter sequences.
- Each reporter: 16 25-mer probes.
- Blast against ENSEMBL genes (takes 1 month on UK grid).
- Use for cross-species analysis
- Adapt RMA statistical analysis in Bioconductor
19. Next slide shows data of one single actual microarray
- Normalized expression shown for both channels.
- Each reporter is shown with a single dot.
- Red dots are controls
- Note the GEM barcode (QC)
- Note the slight error in linear normalization
(low expressed genes are higher in Cy5 channel)
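The kind of normalization error mentioned in the last point can be checked numerically. The sketch below assumes that "linear normalization" means scaling one channel by a single global factor, here chosen so that both channels have the same median; the function names and the toy intensities are illustrative and are not taken from the array shown.

```python
import math
from statistics import median


def linear_normalize(cy3, cy5):
    """Scale Cy5 by one global factor so that both channels share a median."""
    factor = median(cy3) / median(cy5)
    return [value * factor for value in cy5]


def mean_log_ratio_by_intensity(cy3, cy5):
    """Mean log2(Cy5/Cy3) for the dimmer half and the brighter half of the spots."""
    spots = sorted(zip(cy3, cy5), key=lambda spot: spot[0] * spot[1])
    half = len(spots) // 2

    def mean_log_ratio(part):
        return sum(math.log2(red / green) for green, red in part) / len(part)

    return mean_log_ratio(spots[:half]), mean_log_ratio(spots[half:])


# Toy intensities in which the dim spots are relatively brighter in Cy5.
cy3 = [100.0, 200.0, 1000.0, 2000.0]
cy5 = [150.0, 300.0, 1000.0, 2000.0]
low, high = mean_log_ratio_by_intensity(cy3, linear_normalize(cy3, cy5))
```

A single scale factor cannot remove a bias that depends on intensity: after scaling, the dim half still has a positive mean log ratio and the bright half a negative one.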
20. (No transcript)
21. Next slide shows same data after processing
- Controls removed
- Bad spots (<40 average area) removed
- Low signals (<2.5 Signal/Background) removed
- All reporters with <1.7 fold change removed (only changing spots shown)
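The processing steps listed above can be sketched as a single filter function. The thresholds come from the slide; the record layout, the field names, the unit of the area score and the choice to accept a spot when either channel passes the Signal/Background test are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    reporter: str
    is_control: bool
    area: float            # spot area score; the unit is an assumption
    signal_cy3: float
    signal_cy5: float
    background_cy3: float
    background_cy5: float


def fold_change(spot):
    """Symmetric fold change: always >= 1, whichever channel is higher."""
    ratio = spot.signal_cy5 / spot.signal_cy3
    return max(ratio, 1.0 / ratio)


def passes_filters(spot, min_area=40.0, min_sb=2.5, min_fold=1.7):
    """Apply the four removal steps in the order given on the slide."""
    if spot.is_control:
        return False
    if spot.area < min_area:
        return False
    # Assumption: one channel above the Signal/Background cutoff is enough.
    best_sb = max(spot.signal_cy3 / spot.background_cy3,
                  spot.signal_cy5 / spot.background_cy5)
    if best_sb < min_sb:
        return False
    return fold_change(spot) >= min_fold
```

Usage: `changing = [s for s in spots if passes_filters(s)]`, where `spots` is a list of `Spot` records read from the array output.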
22. (No transcript)
23. Final slide shows information for one single reporter
- This signifies one single spot
- It is a known gene: a UDP glucuronyltransferase
- Raw data and fold change are shown
24. (No transcript)
25. Secondary Analyses
- Gene clustering (find genes that behave equally)
- Cluster evaluation (what do we see in the clusters?)
- Physiological evaluation (for arrays, proteomics, clusters)
- Understand the regulation
26. Clustering: find genes with same pattern
Left hand picture shows expression patterns for 2
genes (these should probably end up in the same
cluster). Right hand picture shows the expression
vector for one gene for the first 2 dimensions.
Can be normalized by amplitude (circle) or
relatively (square).
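The two normalizations can be sketched as follows, under one possible reading of the picture: "by amplitude" scales the expression vector to unit Euclidean length, so that in two dimensions it lands on a circle, and "relatively" scales it by its largest absolute component, so that it lands on the edge of a square. Both function names are illustrative.

```python
import math


def normalize_amplitude(vector):
    """Scale to unit Euclidean length (a point on the unit circle in 2D)."""
    length = math.sqrt(sum(x * x for x in vector))
    if length == 0:
        raise ValueError("cannot normalize an all-zero expression vector")
    return [x / length for x in vector]


def normalize_relative(vector):
    """Scale by the largest absolute component (a point on the unit square in 2D)."""
    peak = max(abs(x) for x in vector)
    if peak == 0:
        raise ValueError("cannot normalize an all-zero expression vector")
    return [x / peak for x in vector]
```

With either choice, two genes whose expression differs only in overall magnitude, such as `[1.0, 2.0]` and `[2.0, 4.0]`, map to the same point and can therefore end up in the same cluster.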
27. Cluster evaluation
- Group genes (function, pathway, regulations etc.)
- Find groups in patterns using visualization tools and automatic detection.
- Should lead to results like: "This experiment shows that a large number of apoptosis genes are up-regulated during the early stage after treatment, probably meaning that cells are dying."
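One common way to automate the detection step is an over-representation test: given a cluster, ask how likely it is to contain at least the observed number of genes from a functional group by chance. The slides do not say which method was used; the hypergeometric tail probability below is a standard choice shown only as a sketch, and the function name is illustrative.

```python
from math import comb


def over_representation_p(in_cluster_and_group, cluster_size, group_size, total_genes):
    """P(X >= k) for a hypergeometric draw.

    k = in_cluster_and_group: genes in the cluster that belong to the group
    n = cluster_size:         genes in the cluster
    K = group_size:           genes on the array that belong to the group
    N = total_genes:          genes on the array
    """
    k, n, K, N = in_cluster_and_group, cluster_size, group_size, total_genes
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, min(n, K) + 1))
    return favourable / comb(N, n)
```

A small value suggests the group (for example, apoptosis genes) is over-represented in the cluster. When many groups are tested, the resulting values need a multiple-testing correction before they are interpreted.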
28. Example of GenMAPP results
Manual lookup on a MAPP
29. Understanding regulation
- The main idea: co-regulated genes could have common regulatory pathways.
- The basic approach: annotate transcription factor binding sites using Transfac and use for supervised clustering.
- The problem: each gene has hundreds of TFBSs.
- Solution? Use syntenic regions using rVista (work in progress with Rick Dixon)
30. Understanding QTLs
- Get blood pressure QTLs from ENSEMBL/CFG Wellcome group.
- Look up functional pathways and GO annotations using GenMAPP. Virtual experiment: assume all genes in the QTL are changing.
- Create a new blood pressure MAPP and confront this with real blood pressure/heart failure microarray data.
- Work in progress: TU/e MDP3 group.
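The virtual experiment can be sketched as a set intersection: every gene in the QTL is treated as if it changed, and each pathway is reported with the QTL genes it contains. The function name, the input layout and the gene and pathway names are hypothetical placeholders; this does not use GenMAPP itself.

```python
def virtual_experiment(qtl_genes, pathways):
    """Treat every gene in the QTL as changed and list the pathways they hit.

    pathways maps a pathway name to the genes annotated to it.
    Returns {pathway name: sorted QTL genes in that pathway}, skipping
    pathways that contain no QTL gene.
    """
    changed = set(qtl_genes)
    hits = {}
    for name, members in pathways.items():
        overlap = sorted(changed & set(members))
        if overlap:
            hits[name] = overlap
    return hits


# Placeholder names, for illustration only.
qtl_genes = ["gene_a", "gene_b", "gene_c"]
pathways = {
    "pathway_1": ["gene_a", "gene_c", "gene_x"],
    "pathway_2": ["gene_y", "gene_z"],
}
```

The pathways returned this way are candidates for the new MAPP, which can then be compared with the genes that actually change in the microarray data.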
31. People involved
BiGCaT Maastricht: Rachel van Haaften (IOP), Edwin ter Voert (BMT), Joris Korbeeck (BMT/UM), Willem Ligtenberg (IOP), Stan Gaj (tUL), Chris Evelo
TU/e: Peter Hilbers, Huub ten Eijkelder, Patrick van Brakel, lots of students
CARIM: Yigal Pinto, Umesh Sharma, Blanche Schroen, Matthijs Blankesteijn, Jos Smits, Jo de Mey, Danielle Curfs, Kitty Cleutjens, Natasja Kisters, Esther Lutgens, Birgit Faber, Petra Eurlings, Ann-Pascalle Bijnens, Mat Daemen, Frank Stassen, Marc van Bilssen, Marten Hoffker
NUTRIM: Wim Saris, Freddy Troost, Johan Renes, Simone van Breda
GROW: Daisy vd Schaft, Chamindie Puyandeera
IOP Nutrigenomics: Milka Sokolovic, Theo Hackvoort, Meike Bunger, Guido Hooiveld, Michael Müller, Lisa Gilhuis-Pedersen, Antoine van Kampen, Edwin Mariman, Wout Lamers, Nicole Franssen, Jaap Keijer
CFG Wellcome group: Neil Hanlon (Glasgow), Gontran Zepeda (Edinburgh), Rick Dixon (Leicester), Sheetal Patel (London)
Paris leptin group: Soraya Taleb, Rafaelle Cancello, Nathalie Courtin, Carine Clement
Organon: Jan Klomp, Rene van Schaik
BioAsp: Marc Laarhoven