Title: RECOMBINOMICS: Myth or Reality?
1RECOMBINOMICS: Myth or Reality?
Laxmi Parida, IBM Watson Research, New York, USA
2RoadMap
- Motivation
- Reconstructability (Random Graphs Framework)
- Reconstruction Algorithm (DSR Algorithm)
- Conclusion
4www.nationalgeographic.com/genographic
5www.ibm.com/genographic
6- Five-year study, launched in April 2005, to address anthropological questions on a global scale using genetics as a tool
- Although fossil records fix human origins in Africa, little is known about the great journey that took Homo sapiens to the far reaches of the earth. How did we, each of us, end up where we are?
- Samples from all around the world are being collected, and the mtDNA and Y-chromosome are being sequenced and analyzed
7DNA material in use under unilinear transmission
16000 bp
58 mill bp 0.38
8Missing information in unilinear transmissions
past
present
9Paradigm Shift in Locus Analysis
- Using recombining DNA sequences
- Why?
- Nonrecombining gives a partial story
- represents only a small part of the genome
- behaves as a single locus
- unilinear (exclusively male or female) transmission
- Recombining: towards more complete information
- Challenges
- Computationally very complex
- How to comprehend complex reticulations?
10RoadMap
- Motivation
- Reconstructability (Random Graphs Framework)
- Reconstruction Algorithm (DSR Algorithm)
- Conclusion
L Parida, Pedigree History: A Reconstructability Perspective using Random-Graphs Framework, under preparation.
11RoadMap
- Motivation
- Reconstructability (Random Graphs Framework)
- Reconstruction Algorithm (DSR Algorithm)
- Conclusion
L Parida, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Estimating the Ancestral Recombinations Graph (ARG) as Compatible Networks of SNP Patterns, Journal of Computational Biology, vol 15(9), pp 122, 2008
L Parida, A Javed, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Minimizing Recombinations in Consensus Networks for Phylogeographic Studies, BMC Bioinformatics, 2009
12INPUT: Chromosomes (haplotypes)
OUTPUT: Recombinational Landscape (Recotypes)
13Our Approach
[Flowchart: Granularity g, then "Acceptable p-value?"; NO returns to Granularity g, YES leads to IRiS, then Analyze Results]
M Mele, A Javed, F Calafell, L Parida, J Bertranpetit and Genographic Consortium, Recombination-based genomics: a genetic variation analysis in human populations, under submission.
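The flow on this slide (pick a granularity g, check the p-value, then run IRiS) can be sketched as a small loop. This is only an illustration under stated assumptions: the function name `choose_granularity`, the `p_value_for` callable, the threshold `alpha`, and the toy p-values are all hypothetical and not part of IRiS. It also assumes that the NO branch means trying another granularity and that "acceptable" means at or below a threshold.

```python
from typing import Callable, Iterable, Optional


def choose_granularity(
    candidates: Iterable[int],
    p_value_for: Callable[[int], float],
    alpha: float = 0.05,
) -> Optional[int]:
    """Return the first candidate g whose p-value is acceptable, else None.

    `p_value_for` is a hypothetical stand-in for the p-value estimation
    step; the slides do not say how that estimate is computed.
    """
    for g in candidates:
        if p_value_for(g) <= alpha:
            return g  # YES branch: this g would be handed to IRiS
        # NO branch: fall through and try the next granularity
    return None


# Toy, made-up p-values for illustration only (not results from the study).
toy_p_values = {1: 0.40, 2: 0.03, 3: 0.01}
chosen = choose_granularity([1, 2, 3], toy_p_values.__getitem__)
```

With these toy numbers the loop rejects g = 1 and stops at g = 2, the first candidate at or below the threshold.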
14Preprocess: Dimension reduction via Clustering
[Figure: clustering of 25 items, labeled 0 to 24]
15Analysis Flow
[Flowchart: Granularity g, then "Acceptable p-value?"; NO returns to Granularity g, YES leads to IRiS, then Analyze Results]
16p-value Estimation
17Comparison of the Randomization Schemes
18SNP Blocks (granularity g = 3)
19Analysis Flow
[Flowchart: Granularity g, then "Acceptable p-value?"; NO returns to Granularity g, YES leads to IRiS, then Analyze Results]
20IRiS (Identifying Recombinations in Sequences)
Stage haplotypes: use SNP block patterns
Segment along the length: infer trees
computational insights
Infer network (ARG)
L Parida, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Estimating the Ancestral Recombinations Graph (ARG) as Compatible Networks of SNP Patterns, Journal of Computational Biology, vol 15(9), pp 122, 2008
21Segmentation
[Figure: position ruler with each column assigned a segment label from 1 to 5]
22Segmentation
23Consensus of Trees
24Algorithm Design
- Ensure compatibility of component trees
- Parsimony model: minimize the number of recombinations
25Algorithm Design
- Ensure compatibility of component trees
- Parsimony model: minimize the number of recombinations
Theorem: The problem is NP-hard.
No efficient (polynomial-time) algorithm that guarantees optimality is known, and none exists unless P = NP.
26DSR Scheme (Dominant, Subdominant, Recombinant)
27DSR Scheme Level 1
28DSR Assignment Rules
- At most one D per row and column; if no D, at most one S per row and column
- At most one non-R in the row and column, but not both
29DSR Assignment Rules
- Each row and each column has at most one D, or else at most one S
- A non-R can have other non-Rs either in its row or in its column, but NOT both
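The two assignment rules above can be checked mechanically on a grid of D/S/R labels. The sketch below is one literal reading of the rules as worded on the slide, not the published DSR algorithm: the function name `dsr_assignment_ok` and the list-of-lists representation are assumptions, and the slides do not say what the rows and columns of the grid stand for.

```python
from typing import List


def dsr_assignment_ok(grid: List[List[str]]) -> bool:
    """Check a grid of 'D', 'S', 'R' labels against the two slide rules.

    Rule 1: every row and every column has at most one D; a row or
            column with no D has at most one S.
    Rule 2: a non-R cell may share its row OR its column with other
            non-R cells, but not both.
    """
    rows = grid
    cols = [list(c) for c in zip(*grid)]

    # Rule 1, applied to every row and every column.
    for line in rows + cols:
        d_count = line.count("D")
        s_count = line.count("S")
        if d_count > 1 or (d_count == 0 and s_count > 1):
            return False

    # Rule 2, applied to every non-R cell.
    for i, row in enumerate(rows):
        for j, cell in enumerate(row):
            if cell == "R":
                continue
            others_in_row = sum(x != "R" for x in row) - 1
            others_in_col = sum(x != "R" for x in cols[j]) - 1
            if others_in_row > 0 and others_in_col > 0:
                return False
    return True


# The D shares only its row with another non-R: allowed.
ok_grid = [["D", "S"],
           ["R", "R"]]
# The D shares both its row and its column with non-Rs: violates rule 2.
bad_grid = [["D", "S"],
            ["S", "R"]]
```

Here `dsr_assignment_ok(ok_grid)` is True and `dsr_assignment_ok(bad_grid)` is False.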
30DSR Scheme Level 1
31DSR Scheme Level 2
32DSR Scheme Level 2
33DSR Scheme Level 3
34DSR Scheme Level 3
35DSR Scheme Level 4
36DSR Scheme Level 5
37Mathematical Analysis Approximation Factor
- Greedy DSR Scheme
- Z and Y are computable functions of the input
L Parida, A Javed, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Minimizing Recombinations in Consensus Networks for Phylogeographic Studies, BMC Bioinformatics, 2009
38Analysis Flow
[Flowchart: Granularity g, then "Acceptable p-value?"; NO returns to Granularity g, YES leads to IRiS, then Analyze Results]
39IRiS Output: RECOTYPE
- Recombination vectors
    R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 R14 ...
s1   1  0  0  0  1  1  1  1  0   0   0   0   1   0 ...
s2   0  1  0  1  1  1  0  1  0   0   1   0   0   0 ...
...
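To make the recombination vectors concrete, the sketch below stores the two rows shown on this slide (only their first 14 entries, R1 to R14, are visible) and compares them. Two things here are assumptions rather than anything stated in the talk: reading a 1 as "this sample carries that recombination event", and using the Hamming distance as the comparison. The helper names are hypothetical.

```python
from typing import List

# First 14 entries of the two recotype rows shown on the slide.
s1 = [1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
s2 = [0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def shared_events(a: List[int], b: List[int]) -> List[str]:
    """Labels (R1, R2, ...) of the events marked 1 in both vectors."""
    return [f"R{k + 1}" for k, (x, y) in enumerate(zip(a, b)) if x == 1 and y == 1]


distance = hamming(s1, s2)    # 6
common = shared_events(s1, s2)  # ['R5', 'R6', 'R8']
```

On the visible entries, s1 and s2 differ at six positions and share the events R5, R6 and R8.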
40Quick Sanity Check: Ultrametric Network on RECOTYPES
41IRiS (Identifying Recombinations in Sequences)
Stage haplotypes: use SNP block patterns
Segment along the length: infer trees
computational insights
Infer network (ARG)
IRiS software will be released by the end of summer '09 (Asif Javed)
L Parida, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Estimating the Ancestral Recombinations Graph (ARG) as Compatible Networks of SNP Patterns, Journal of Computational Biology, vol 15(9), pp 122, 2008
42What's in a name?
RECOMBIN-OMICS (Jaume Bertranpetit)
RECOMBIN-OMETRICS (Robert Elston)
- Allele-frequency variation between populations is also reflected in the purely recombination-based variation
- Detects subcontinental divide from short segments
- based on population-level analysis
- Detects populations from short segments
- based on recombination-event analysis
43- Allele-frequency variation between populations is also reflected in the purely recombination-based variation
- Detects subcontinental divide from short segments
- based on population-level analysis
- Detects populations from short segments
- based on recombination-event analysis
Are we ready for the OMICS / OMETRICS?
- population-specific signals?
- other critical signals?
- anything we didn't already know?
44Thank you!!