Title: People are different
1??????????? ?????? ????????????-???,
15.04.06
??????? ?????????, ???????? ???????????? ????????
??. ???????????? ??? , ??????
2People are different
3and so are their genomes
4???????????
SNP (single nucleotide polymorphism)
????????????? ? ????????? ?? ????? ? ??? ??
??????? ???????? ??? ???? ???????????? ?????????
? ???????? ????? ??????? ???????? (??????) 1
Na + Ng = N, Na/N > 0.01, Ng/N > 0.01
5??????????? ? ???????????
- ???? ???? ? ????????? ??????????????????? ??????
????. ???? - ????? ??????????? ?? ????? ? ??????? ?????
?????????????? ????? (?.????????, ??????
?????????) - ? ????????? ???? ??? ????????????? ???? ?????
????????????? ?????? ????????? (?.?. ??????????
??? ??? ??????? ????? ???????) - ??????????? ????????????? ??????????? ?????????
?????? ? ?????????(-??), ??? ? ??????? ????????
???? ????????
6???? ???????????? ? ??????
???????????????? (SNP) ????????
???????/??????? ???????????????? ??????
????????? ????? (VNTR, variable number tandem
repeat) ??????? ??????? ?????????????
???????????? (MNP)
7????????? ???????? SNPs
- Comprise about 90% of human genetic variation
- Occur with an average density of 1 per 600 bp
- The transition C↔T (G↔A) accounts for 2/3 of all cases; the three transversions, C↔A (G↔T), C↔G (G↔C), and T↔A (A↔T), share the remaining 1/3
- Most of them (85%) are common to all populations (with differing allele frequencies)
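The transition/transversion split above can be illustrated with a minimal sketch. It assumes plain A/C/G/T alleles; the function name `substitution_class` is hypothetical and is not taken from the slides.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref, alt):
    """Classify a single-nucleotide change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    pair = {ref, alt}
    if ref == alt or not pair <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    # Transition: purine<->purine or pyrimidine<->pyrimidine.
    # Transversion: purine<->pyrimidine.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


# The four SNP types listed on the slide (each shown on one strand).
snp_types = {pair: substitution_class(*pair) for pair in ["CT", "CA", "CG", "TA"]}
```

Only C↔T (equivalently G↔A on the opposite strand) comes out as a transition; the other three types are transversions.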
8Why are SNPs important?
- Convenient genetic markers
- Responsible for the existence of various phenotypes, with primary interest in disease ones
- Pharmacogenomics: individual response to drugs
- Clues to understanding human evolution
9SNP ? ?????? ????????
10Classification of SNPs by location in the genome
1. Genes
1.1 UTR
1.2 Coding regions (cSNP)
1.2.1 Synonymous (sSNP)
1.2.2 Non-synonymous (nsSNP)
1.3 Introns
1.4 Splice sites
2. Regulatory regions of genes (rSNP)
3. Intergenic regions
11Synonymous vs. non-synonymous SNPs
Example: Lysosomal alpha-glucosidase precursor (SwissProt P10253)
Reference: CAC CAG CTC CTG TGG GGG GAG GCC CTG CT
Variant:   CAC CAG CTC CTG TGC GGG GAG GCT CTG CT
HGVBase ID SNP000003023, G → C (TGG → TGC): nsSNP Trp746→Cys
Hypothetical SNP, C → T (GCC → GCT): sSNP Ala749→Ala
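The synonymous/non-synonymous distinction in this example can be checked with a short sketch that translates both codon strings using the standard genetic code. The function name and the starting residue number 742 are assumptions; the latter is inferred from the slide's numbering (Trp746 is the fifth codon shown).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_changes(reference, variant, first_residue):
    """Compare two space-separated codon strings and label each differing codon."""
    changes = []
    pairs = zip(reference.split(), variant.split())
    for offset, (ref, var) in enumerate(pairs):
        # Skip identical codons and the incomplete trailing codon.
        if ref == var or len(ref) != 3 or len(var) != 3:
            continue
        ref_aa, var_aa = CODON_TABLE[ref], CODON_TABLE[var]
        kind = "sSNP" if ref_aa == var_aa else "nsSNP"
        changes.append((first_residue + offset, ref, var, ref_aa, var_aa, kind))
    return changes


reference = "CAC CAG CTC CTG TGG GGG GAG GCC CTG CT"
variant = "CAC CAG CTC CTG TGC GGG GAG GCT CTG CT"
changes = classify_codon_changes(reference, variant, first_residue=742)
for change in changes:
    print(change)
```

The two differing codons are reported as a Trp→Cys change at residue 746 (nsSNP) and an Ala→Ala change at residue 749 (sSNP), matching the slide.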
12Summary of Annotation on human Genome Build 33, dbSNP Build 124
FUNCTION CLASS CODE   SNP COUNT   GENE COUNT   FUNCTIONAL CLASSIFICATION
1                     338787      26210        Locus region
3                     39214       14342        Allele synonymous to contig nucleotide
4                     50772       15710        Allele nonsynonymous to contig nucleotide
5                     546965      17898        Untranslated region
6                     2925773     19332        Intron
7                     832         769          Splice site
8                     89554       18655        Allele is same as contig nucleotide
9                     7111        1006         Coding synonymy unknown
13????????? ???? SNP (?? MillerKwok, 2001)
- ????????? ?????? ?????????? ???????? ?????
??????? (100 ??????? ?? ??????????) - ????????? ?? ??????? ????????? ????????? ??
????? ?????? - ????????? ?????????? ??????? ? ?????????
- ???????? ?????? ?????? (0 vs. 100), ???????????
? between-species difference
14?????????
????????? ???? ????????? ???? SNP ???????? 0.3
??? ???. ???????????, ??? ?????????? ???????? ?
???????? ????????? 5 ??? ??? ?????, ? ?????
H.sapiens ?? ?????? ? ?????????? ?????????
????????? 0.1-0.2 ??? ??? ?????, ???????
?????????? (?) ?????????? SNPs ? ???????? ?
?????? ?????, (?) private SNP, ?.?.
?????????????? ? ???????? ????? ????????????
?????????
15Why are polymorphisms maintained in the population?
- Selectionists: because heterozygotes have higher fitness
- Neutralists: because all observed polymorphisms are selectively neutral
- Reality is always somewhat more complicated
16Why are SNPs important?
- Convenient genetic markers
- Responsible for the existence of various phenotypes, with primary interest in disease ones
- Pharmacogenomics: individual response to drugs
- Clues to understanding human evolution
17nsSNPs vs. disease mutations
- Disease mutations are rare (<<1%) and usually cause monogenic diseases (e.g., cystic fibrosis)
- nsSNPs are frequent (>1%) and can modify the risk of major common (multigenic, complex) diseases (e.g., cancer, cardiovascular disease, mental illness, autoimmune states, diabetes)
- In some cases, however, it is difficult to make a distinction
18Some common nsSNPs are known to affect critical structural features
The frequency of the haemochromatosis allelic variant of the HLA-H protein, Cys260Tyr (which destroys a disulphide bond), is up to 6% in Northern Europe
19Application areas for prediction methods
- Genetics of complex diseases
- Analysis of human birth defects
- Genetics of rare developmental phenotypes (analysis of de novo mutations that cannot be mapped by genetic techniques)
- Genetics of model organisms (identification of genes involved in diverse processes by mutagenesis screens)
- Genomics and evolutionary genetics (e.g., quantifying selective pressure)
20Identifying SNPs responsible for complex diseases: general strategies
- Whole-genome scan: a hypothesis-free approach, but with an extraordinary number of candidate SNPs
- Candidate gene studies: require a priori models; nevertheless, large numbers of candidate SNPs must be tested
21Identifying SNPs responsible for complex diseases: application
1. A SNP with an established association need not be functional; therefore, in silico expertise is required to select potentially functional SNPs
2. Detection of enrichment of rare, potentially functional alleles in the disease population (plasma levels of HDL-cholesterol, hypertension, colorectal cancer)
22Methods for predicting the effect of nsSNPs
- Sequence-based methods: analysis of multiple alignment with homologs (Ng, Henikoff 2002)
- Structure-based methods: analysis of various structural parameters (Wang, Moult 2001; Chasman, Adams 2001)
- Combined methods: sequence and structure analysis (Sunyaev, Ramensky, Bork 2000, 2001, 2002)
23PolyPhen: prediction of amino acid substitution effect on protein function
Prediction: benign (neutral) or damaging (deleterious)
24PolyPhen: prediction of amino acid substitution effect on protein function
Data sources:
- Sequence annotation of the query protein
- PSIC profile matrix values derived from multiple alignment with homologous proteins
- Structural parameters and contacts of the query protein structure or of its homolog with >50% sequence identity
Prediction: benign (neutral) or damaging (deleterious)
25I. Sequence annotation
Hereditary hemochromatosis protein precursor (HLA-H, Q30201)
Features checked:
- bond: DISULFID, THIOLEST, THIOETH
- site: BINDING, ACT_SITE, LIPID, METAL, SITE, MOD_RES, SE_CYS
- region: TRANSMEM, SIGNAL, PROPEP
26II. PSIC profile analysis of homologous sequences
1. Align with homologous proteins with sequence identity of 30..94%
27II. PSIC profile analysis of homologous sequences
2. Calculate the profile matrix with the PSIC algorithm
Profile matrix: $S_{a,j} = \ln(p_{a,j} / q_a)$, $a = 1, \ldots, 20$, $j = 1, \ldots, N$, where $N$ is the alignment length
28II. PSIC profile analysis of homologous sequences
3. Analyse the difference between the profile scores of the two amino acid variants
Asn→Cys: $S_{\mathrm{Asn},4} - S_{\mathrm{Cys},4} = 1.591$
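The profile-score comparison in steps 2 and 3 can be sketched as follows. This is a simplified illustration, not the PSIC algorithm: PSIC weights sequences by position-specific independent counts, whereas here p is estimated from raw residue counts plus a pseudocount. The uniform background q, the pseudocount, the example column, and all function names are assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile_scores(column, background, pseudocount=1.0):
    """Log-odds scores S_a = ln(p_a / q_a) for one alignment column.

    p_a comes from raw counts plus a pseudocount (NOT PSIC weighting);
    gap characters and unknown symbols are ignored.
    """
    counts = Counter(res for res in column if res in background)
    total = sum(counts.values()) + pseudocount * len(background)
    return {
        aa: math.log(((counts[aa] + pseudocount) / total) / q)
        for aa, q in background.items()
    }


def score_difference(scores, wild_type, variant):
    """Difference between the profile scores of the two variants."""
    return scores[wild_type] - scores[variant]


# Uniform background frequencies (an assumption made for the sketch).
background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
column = "NNNNNNDNNS"  # hypothetical alignment column dominated by Asn
scores = profile_scores(column, background)
delta = score_difference(scores, "N", "C")
```

A large positive difference means the wild-type residue is strongly preferred over the variant at that position among the homologs, which is the signal this step looks for.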
29III. 3D structure analysis
1. Residues that are in spatial contact with a
ligand or other critical residues
Figure: Bos taurus trypsin (PDB ID 1ql7), showing the ligand Zen 999 and the residues within 5 Å of it
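A minimal sketch of the contact criterion: a residue counts as being in contact if any of its atoms lies within 5 Å of any ligand atom. The coordinates and residue numbers below are made up for illustration and are not taken from 1ql7; the function name is hypothetical.

```python
import math


def residues_in_contact(protein_atoms, ligand_atoms, cutoff=5.0):
    """Return residue numbers with at least one atom within `cutoff` of the ligand.

    protein_atoms: iterable of (residue_number, (x, y, z)), one entry per atom
    ligand_atoms:  iterable of (x, y, z)
    """
    ligand_atoms = list(ligand_atoms)
    contacts = set()
    for residue_number, coord in protein_atoms:
        if any(math.dist(coord, lig) <= cutoff for lig in ligand_atoms):
            contacts.add(residue_number)
    return sorted(contacts)


# Hypothetical coordinates in angstroms.
protein_atoms = [
    (1, (0.0, 0.0, 0.0)),
    (2, (3.0, 0.0, 0.0)),
    (3, (4.0, 3.0, 0.0)),
    (4, (12.0, 0.0, 0.0)),
]
ligand_atoms = [(1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
contact_residues = residues_in_contact(protein_atoms, ligand_atoms)
```

With real data the atom lists would come from a parsed PDB file; the all-pairs loop is fine for a single ligand but would need a spatial index for large complexes.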
30III. 3D structure analysis
2. Residues that form the hydrophobic core of the
protein (buried residues)
Figure: surface residues and buried residues of Bos taurus trypsin (PDB ID 1ql7)
31Structural parameters and contacts
- Secondary structure
- Phi-psi dihedral angles
- Solvent accessible surface area, normed s.a.s.a
- Change in accessible surface propensity
- Change in residue side chain volume
- Contacts with heteroatoms
- Interchain contacts
- Contacts with functional sites (BINDING, ACT_SITE, LIPID, and METAL)
- Region of the phi-psi map (Ramachandran map)
- Normalised B-factor (temperature factor)
33Validation: control sets
                                 all      dam      unknown   dam/(dam+ben), %
Disease mutations
  Strict set                     444      366      3         82.9
  Total                          2,782    2,047    70        75.4
Between species substitutions
  Total                          671      58       5         8.7
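The last column can be recomputed from the other three, as sketched below. The sketch assumes that the benign count is what remains after removing the damaging and unknown predictions from the total; the function name is hypothetical.

```python
def damaging_rate(total, damaging, unknown):
    """Percentage of damaging predictions among classified substitutions."""
    classified = total - unknown  # dam + ben, assuming all = dam + ben + unknown
    return 100.0 * damaging / classified


# (all, dam, unknown) as given on the slide.
control_sets = {
    "disease mutations, strict set": (444, 366, 3),
    "disease mutations, total": (2782, 2047, 70),
    "between species substitutions, total": (671, 58, 5),
}
rates = {name: damaging_rate(*row) for name, row in control_sets.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}%")
```

The recomputed values agree with the slide to within 0.1 percentage point; the slide's figures look truncated rather than rounded.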
34Validation: case studies
- APEX1 protein: 24 out of 26 substitutions predicted correctly (Xi et al.)
- Plasminogen activator inhibitor-2: 18 out of 20 (Di Guisto et al.)
- 3 HapMap populations and 10 primate species: analysis of 27,000 nsSNPs with frequencies (Victoria Carlton, AFFYMETRIX, private communication)
35Validation: allele frequency
36Validation: nsSNPs vs. human-mouse interspecies variation
37PolyPhen predictions for dbSNP b.121
Ivan Adzhubei, 2004
- All
  - 9,502 unknown
  - 27,991 benign: 67.6%
  - 7,905 possibly damaging: 19.1%
  - 5,521 probably damaging: 13.3%
  - 50,919 total (44,005 unique rs IDs)
- With structure
  - 42 unknown
  - 2,142 benign: 57.1%
  - 531 possibly damaging: 14.2%
  - 1,076 probably damaging: 28.7%
  - 3,791 total (,167 unique rs IDs)
38PolyPhen predictions for dbSNP b.121
Ivan Adzhubei, 2004
- All, filtered: ≥5 sequences in the multiple alignment
  - 16,813 benign: 64.2%
  - 5,195 possibly damaging: 19.8%
  - 4,168 probably damaging: 15.9%
  - 26,176 total (21,677 unique rs IDs)
- With structure, filtered: ≥5 sequences in the multiple alignment
  - 2,021 benign: 56.6%
  - 499 possibly damaging: 14.0%
  - 1,050 probably damaging: 29.4%
  - 3,570 total (2,983 unique rs IDs)
39Hydrophobic core stability parameters are the best predictors
Ramensky et al., Nucleic Acids Res. (2002) 30:3894-3900
40PolyPhen http://www.bork.embl.de/PolyPhen
PolyPhen input:
- Protein identifier OR sequence
- Substitution position
- Substitution type
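The three inputs listed above can be represented as a small record, as in the sketch below. It is only an illustration of the data involved, not the server's actual interface: the class `SubstitutionQuery`, the helper names, and the three-letter notation "Cys260Tyr" as an input format are all assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


@dataclass(frozen=True)
class SubstitutionQuery:
    protein_id: Optional[str]  # protein identifier, OR ...
    sequence: Optional[str]    # ... the raw amino acid sequence
    position: int              # substitution position (1-based)
    wild_type: str             # substitution type: original residue
    variant: str               # substitution type: new residue


def parse_substitution(text):
    """Parse e.g. 'Cys260Tyr' into ('C', 260, 'Y')."""
    match = re.fullmatch(r"([A-Za-z]{3})(\d+)([A-Za-z]{3})", text.strip())
    if match is None:
        raise ValueError(f"cannot parse substitution: {text!r}")
    wild, position, variant = match.groups()
    try:
        return (THREE_TO_ONE[wild.capitalize()], int(position),
                THREE_TO_ONE[variant.capitalize()])
    except KeyError as err:
        raise ValueError(f"unknown residue name: {err}") from None


def make_query(substitution, protein_id=None, sequence=None):
    """Build a query from exactly one of protein_id / sequence."""
    if (protein_id is None) == (sequence is None):
        raise ValueError("give a protein identifier OR a sequence")
    wild, position, variant = parse_substitution(substitution)
    if sequence is not None:
        # With a raw sequence, the wild-type residue can be verified.
        if not 1 <= position <= len(sequence) or sequence[position - 1] != wild:
            raise ValueError("wild-type residue does not match the sequence")
    return SubstitutionQuery(protein_id, sequence, position, wild, variant)


query = make_query("Cys260Tyr", protein_id="Q30201")
```

Checking the wild-type residue against the sequence catches numbering mismatches early, for example between precursor and mature protein numbering.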
41PolyPhen http://www.bork.embl.de/PolyPhen
42PolyPhen nsSNPs data collection
43Transthyretin (PDB 1tyr, SNP000012365): Thr118→Asn occurs at the ligand (REA) binding site
Figure labels: Thr 118, REA 130
DAMAGING nsSNPs
44Trypsin (PDB 1trn, SNP000012965): Ser142→Phe results in a strong side chain volume change at a buried position
Figure label: Ser 142
DAMAGING nsSNPs
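The size of such a volume change can be estimated from tabulated residue volumes, as in the sketch below. The volumes are approximate values in cubic angstroms as commonly tabulated after Zamyatnin (1972); they are used here as illustrative assumptions and are not necessarily the values PolyPhen uses.

```python
# Approximate residue volumes in cubic angstroms (illustrative values,
# as commonly tabulated after Zamyatnin, 1972).
RESIDUE_VOLUME = {
    "Gly": 60.1,
    "Ala": 88.6,
    "Ser": 89.0,
    "Asn": 114.1,
    "Thr": 116.1,
    "Phe": 189.9,
    "Trp": 227.8,
}


def volume_change(wild_type, variant):
    """Change in residue volume caused by the substitution (variant minus wild type)."""
    return RESIDUE_VOLUME[variant] - RESIDUE_VOLUME[wild_type]


ser_to_phe = volume_change("Ser", "Phe")  # the trypsin Ser142→Phe example
thr_to_asn = volume_change("Thr", "Asn")  # the transthyretin Thr118→Asn example
```

Under these values Ser→Phe adds roughly 100 cubic angstroms, which is hard to accommodate at a buried position, while Thr→Asn barely changes the volume. That fits the slides, where the Thr118→Asn effect is attributed to the ligand binding site rather than to packing.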
45Damaging nsSNPs
- We estimate that 20% of non-synonymous cSNPs from databases are damaging
- The average allele frequency of non-synonymous cSNPs predicted to be damaging is half that of benign non-synonymous cSNPs
- We propose using these predictions to prioritise candidates for association studies
46Development directions
- Better multiple alignment pipeline
- Compensated nsSNPs
- Non-globular structural regions
- Non-coding SNPs
47An example of compensated pathogenic deviation
48Polyphenism: the ability of a single genome to produce two or more alternative morphologies within a single population in response to an environmental cue (such as temperature, photoperiod, or nutrition). Dr. Ehab Abouheif, McGill University, Montréal, Québec
The seasonal morphs of the buckeye butterfly, Precis coenia (Nymphalidae). The ventral surfaces are shown. The Summer morph ("linea") is on the left; the Fall morph ("rosa") is on the right. Scott F. Gilbert, A Companion to Developmental Biology. Chapter 22, Seasonal Polyphenism in Butterfly Wings
49People
Shamil Sunyaev (1), Vasily Ramensky (2), Steffen Schmidt (1), Ivan Adzhubei (1)
(1) Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, USA
(2) Engelhardt Institute of Molecular Biology, Moscow, Russia
Peer Bork, Yan P. Yuan (European Molecular Biology Laboratory, Heidelberg, Germany)