Software for population genetics

Provided by: IesNi3

Transcript and Presenter's Notes

Title: Software for population genetics

1
Software for population genetics
Structure: J. K. Pritchard et al.
Geneclass 2: S. Piry et al.
2
Structure

Identification of genetic clusters
Identification of subclustering within breeds or relationships between breeds
Breed assignment of unknown samples to a reference set

3
Structure: Identification of genetic clusters

Bayesian likelihood method of identifying K clusters
K: the number of clusters/populations, provided by the user or inferred by Structure

4
Structure: Ancestry models

No-admixture model: each individual originates from one of the K populations.
Admixture model: each individual has genomic fractions from more than one of the K populations.
Linkage model: an admixture model in which linked loci are more likely to originate from the same population.
Prior information model: the user pre-defines (some of) the clusters.
NB: the choice of model also depends on the type of data you have!
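
The no-admixture model can be illustrated with a small sketch: given allele frequencies for each of the K populations, compute how probable each population is as the single origin of one individual. This is not Structure's own code. The frequency table, the names `FREQS`, `genotype_log_likelihood` and `membership_posterior`, and the assumptions of Hardy-Weinberg proportions, independent loci and a uniform prior are all chosen here for illustration.

```python
import math

# Hypothetical allele frequencies (made-up values):
# FREQS[population] is a list with one {allele: frequency} dict per locus.
FREQS = {
    "pop1": [{"A": 0.9, "B": 0.1}, {"C": 0.8, "D": 0.2}],
    "pop2": [{"A": 0.2, "B": 0.8}, {"C": 0.3, "D": 0.7}],
}

def genotype_log_likelihood(genotype, pop_freqs):
    """Log-probability of a diploid multilocus genotype in one population,
    assuming Hardy-Weinberg proportions and independent loci."""
    total = 0.0
    for (a1, a2), freqs in zip(genotype, pop_freqs):
        p = freqs[a1] * freqs[a2]
        if a1 != a2:
            p *= 2.0  # a heterozygote can arise in two ways
        total += math.log(p)
    return total

def membership_posterior(genotype, freqs_by_pop):
    """Posterior probability of each population of origin under the
    no-admixture model, with a uniform prior over the populations."""
    logs = {pop: genotype_log_likelihood(genotype, f)
            for pop, f in freqs_by_pop.items()}
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {pop: math.exp(v - top) for pop, v in logs.items()}
    norm = sum(weights.values())
    return {pop: w / norm for pop, w in weights.items()}

individual = [("A", "A"), ("C", "D")]  # one (allele1, allele2) pair per locus
posterior = membership_posterior(individual, FREQS)
```

Under the admixture model the same individual would instead be described by a vector of genomic fractions, one per population, rather than by a single population of origin.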

5
Structure: Ancestry models and input data

Dominant markers (AFLP, RFLP, etc.): no-admixture model. AA and Aa cannot be distinguished, so only a present or absent genotype is available.
Sequence data, Y-chromosome or mtDNA haplotypes: linkage model. Consider each of these as a single locus with many alleles.

6
Structure: Allele frequency models

Correlated allele frequencies: frequencies in different populations are likely to be similar (due to migration or shared ancestry).
Independent allele frequencies: allele frequencies are independent draws from a distribution specified by a parameter λ.
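
As a rough illustration of the independent model, the sketch below draws the allele frequencies of one locus separately for each population from a symmetric Dirichlet distribution. The function name `draw_allele_frequencies` and the parameter name `lam` are assumptions made for this example (Structure's documentation calls the Dirichlet parameter λ), and the Dirichlet draw is built from gamma variates in the standard way.

```python
import random

def draw_allele_frequencies(n_alleles, lam=1.0, rng=random):
    """Draw the allele frequencies of one locus from a symmetric
    Dirichlet(lam, ..., lam) distribution using normalized gamma variates."""
    gammas = [rng.gammavariate(lam, 1.0) for _ in range(n_alleles)]
    total = sum(gammas)
    return [g / total for g in gammas]

# Three populations, one locus with four alleles: each population gets
# its own independent draw, so frequencies need not resemble each other.
rng = random.Random(42)
freqs_by_pop = [draw_allele_frequencies(4, lam=1.0, rng=rng) for _ in range(3)]
```

With `lam=1.0` every frequency vector is equally likely; under the correlated model the draws for different populations would instead be centred on shared ancestral frequencies.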

7
Structure: Determining K

How to estimate the number of populations/clusters in your dataset?
Fully resolving all the groups in your data (high K): test successive K values until the highest likelihood values are reached.
Determining the rough relationships (low K)
Trial and error
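
The "test successive K values" approach can be sketched as follows: run the analysis several times per K, average the reported log-likelihoods, and pick the K with the highest mean. The numbers in `RUNS` are invented purely for illustration and do not come from any real dataset; `best_k` is a hypothetical helper name.

```python
from statistics import mean

# Made-up log-likelihood values (illustration only): three replicate runs per K.
RUNS = {
    1: [-5210.4, -5211.0, -5209.8],
    2: [-4980.2, -4979.5, -4981.1],
    3: [-4902.7, -4905.3, -4903.9],
    4: [-4901.9, -4930.6, -4915.2],
}

def best_k(runs):
    """Return the K with the highest mean log-likelihood and all the means."""
    means = {k: mean(values) for k, values in runs.items()}
    return max(means, key=means.get), means

chosen_k, mean_by_k = best_k(RUNS)
```

In this invented example K = 4 has the single best run but varies a lot between replicates, so the mean favours K = 3. Checking the spread between runs in this way is part of the trial and error.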

8
Structure: Running parameters

Likelihood method: the program optimizes its own internal parameters.
The startup configuration can have a very low probability, so Structure needs a learning run: the burn-in (10,000-100,000 replicates).
Actual run: enough replicates to obtain statistically sound results (depending on your dataset), e.g. 50,000 (?)
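
Why a burn-in is needed can be shown with a toy sampler. The sketch below is a plain Metropolis chain for a single allele frequency, not Structure's algorithm: it starts from a deliberately improbable value, discards the first `burnin` replicates, and keeps the following `reps`. All names and settings (`run_chain`, `step`, the counts 30 out of 100) are assumptions for this example.

```python
import math
import random
from statistics import mean

def log_posterior(p, k, n):
    """Log of the unnormalized posterior of an allele frequency p after
    seeing the allele k times among n gene copies (uniform prior)."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def run_chain(k, n, start, burnin, reps, step=0.05, seed=1):
    """Metropolis sampler: returns only the replicates after the burn-in."""
    rng = random.Random(seed)
    p, current = start, log_posterior(start, k, n)
    kept = []
    for i in range(burnin + reps):
        proposal = p + rng.uniform(-step, step)
        candidate = log_posterior(proposal, k, n)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        if i >= burnin:  # replicates during the burn-in are thrown away
            kept.append(p)
    return kept

# Start far from the data (0.99) although the allele was seen 30 times in 100.
samples = run_chain(k=30, n=100, start=0.99, burnin=2000, reps=20000)
estimate = mean(samples)
```

Without the burn-in, the early replicates near the starting value of 0.99 would bias the estimate. After it, the kept replicates should centre on the exact posterior mean of 31/102.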

9
(No Transcript)
10
Geneclass 2: Breed assignment

Software for genetic assignment and first-generation migrant detection
S. Piry, A. Alapetite, J.-M. Cornuet, D. Paetkau, L. Baudouin, A. Estoup
INRA, France
Journal of Heredity 2004; 95(6): 536-9

11
(No Transcript)
12
Geneclass 2: Breed assignment

Infers the probability that each reference population is the origin of a sampled individual, on the basis of multilocus genotypic data.
Haploid or diploid data, or a mix.
Likelihood criteria
Genetic distances
Allele frequencies
Bayesian algorithm
Monte Carlo resampling
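
The idea behind Monte Carlo resampling can be sketched as follows: simulate many genotypes from a reference population's allele frequencies, then report the share of simulated genotypes that are no more likely than the observed one. This is a simplified stand-in and not one of the specific resampling algorithms implemented in Geneclass 2. The frequency values and the names `REFERENCE`, `simulate_genotype` and `resampling_probability` are hypothetical.

```python
import math
import random

# Hypothetical allele frequencies of one reference population, one dict per locus.
REFERENCE = [{"A": 0.7, "B": 0.3}, {"C": 0.6, "D": 0.4}, {"E": 0.5, "F": 0.5}]

def log_likelihood(genotype, freqs):
    """Log-probability of a diploid genotype (Hardy-Weinberg, independent loci).
    An allele absent from the reference gives minus infinity."""
    total = 0.0
    for (a1, a2), locus in zip(genotype, freqs):
        p = locus.get(a1, 0.0) * locus.get(a2, 0.0)
        if a1 != a2:
            p *= 2.0
        total += math.log(p) if p > 0.0 else -math.inf
    return total

def simulate_genotype(freqs, rng):
    """Draw two alleles per locus according to the reference frequencies."""
    genotype = []
    for locus in freqs:
        alleles = list(locus)
        weights = [locus[a] for a in alleles]
        genotype.append(tuple(rng.choices(alleles, weights=weights, k=2)))
    return genotype

def resampling_probability(genotype, freqs, n_sim=10000, seed=0):
    """Share of simulated genotypes whose likelihood is at most the observed one."""
    rng = random.Random(seed)
    observed = log_likelihood(genotype, freqs)
    hits = sum(
        log_likelihood(simulate_genotype(freqs, rng), freqs) <= observed
        for _ in range(n_sim)
    )
    return hits / n_sim

typical = resampling_probability([("A", "A"), ("C", "D"), ("E", "F")], REFERENCE)
unusual = resampling_probability([("B", "B"), ("D", "D"), ("E", "E")], REFERENCE)
```

A low value, as for the second genotype, suggests the individual is unlikely to originate from this reference population. A threshold for exclusion would have to be chosen by the user.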

13
Two examples

Products of protected geographical origin (PGI)
Vitellone dell'Appennino Centrale
Allowed breeds: Chianina, Romagnola, Marchigiana
Not allowed: Piedmontese, Maremmana, Pezzata Rossa Italiana, Italian Brown, Italian Friesian, Charolais, Limousin, Belgian Blue
Veau du Limousin
Allowed breeds: Limousin, Blonde d'Aquitaine, Bazadaise
Not allowed: Holstein, Friesian, Fries-Hollands, Belgian Blue, Maine-Anjou, Normande, Bretonne Pie Noir, Charolais, Hereford, Aberdeen Angus, Gasconne, Aubrac, Salers, Montbéliarde, Simmental, Piedmontese, Swiss Brown, Pirenaica

14
Objective?

Identify a representative sample from a batch
Traceability
Fraud?
Protection of the (cultural, economic) integrity of the product

15
How?

Typing with microsatellites.
Compare patterns / allele frequencies with a reference set.
Reference library: product of the EU diversity project Resgen
45 breeds (still adding)
20 animals per breed
30 microsatellite markers

16
(No Transcript)
17
Title
Marker order
Genotypes (allele1allele2)
Populations
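
The "allele1allele2" notation means that both alleles of a diploid genotype are written as one string of digits. A minimal sketch of splitting such a code, assuming both alleles use the same number of digits (the helper name `split_genotype` and the example codes are hypothetical):

```python
def split_genotype(code):
    """Split a concatenated diploid genotype code into its two alleles,
    assuming both alleles are written with the same number of digits."""
    if len(code) % 2 != 0:
        raise ValueError(f"genotype code {code!r} has an odd number of digits")
    half = len(code) // 2
    return code[:half], code[half:]

three_digit = split_genotype("120124")  # alleles coded with three digits each
two_digit = split_genotype("0102")      # alleles coded with two digits each
```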
18
(No Transcript)
19
(No Transcript)
20
Optimization

No need to type all 30 microsatellites
Product-specific level of marker information
Geneclass 2 option: self-identification
Isolate the breeds involved in the product (allowed or not allowed)
Infer the level of successful self-identification per marker
Rank the markers in order of level of information
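
The ranking step can be sketched as a leave-one-out test: for each marker on its own, remove one reference animal, assign it to the breed under which its genotype is most likely, and record how often the true breed wins. This is a simplified illustration and not the Geneclass 2 implementation. The toy data, the pseudo-count of 0.5 for unseen alleles, and all names (`REFERENCE`, `self_identification_rate`, `rank_markers`) are assumptions.

```python
from collections import Counter

MARKERS = ["M1", "M2"]
# Hypothetical reference set: breed -> animals, each a list of
# (allele1, allele2) pairs in the order given by MARKERS.
REFERENCE = {
    "breedA": [[("1", "1"), ("5", "6")], [("1", "1"), ("5", "5")], [("1", "2"), ("6", "6")]],
    "breedB": [[("2", "2"), ("5", "6")], [("2", "2"), ("6", "6")], [("2", "2"), ("5", "5")]],
}

def allele_counts(animals, marker_index):
    """Count the alleles at one marker over a list of animals."""
    counts = Counter()
    for animal in animals:
        counts.update(animal[marker_index])
    return counts

def likelihood(genotype, counts, n_alleles, pseudo=0.5):
    """Genotype probability at one marker, with a pseudo-count so that
    alleles unseen in a breed do not give a zero probability."""
    total = sum(counts.values()) + pseudo * n_alleles
    a1, a2 = genotype
    p = ((counts[a1] + pseudo) / total) * ((counts[a2] + pseudo) / total)
    return 2.0 * p if a1 != a2 else p

def self_identification_rate(reference, marker_index):
    """Fraction of animals assigned to their own breed using one marker,
    leaving each animal out of its own breed's frequency estimate."""
    alleles = {a for animals in reference.values()
               for animal in animals for a in animal[marker_index]}
    correct = tested = 0
    for breed, animals in reference.items():
        for i, animal in enumerate(animals):
            scores = {}
            for other, other_animals in reference.items():
                pool = animals[:i] + animals[i + 1:] if other == breed else other_animals
                counts = allele_counts(pool, marker_index)
                scores[other] = likelihood(animal[marker_index], counts, len(alleles))
            correct += max(scores, key=scores.get) == breed
            tested += 1
    return correct / tested

def rank_markers(reference, markers):
    """Markers sorted from most to least informative."""
    rates = {m: self_identification_rate(reference, i) for i, m in enumerate(markers)}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

ranking = rank_markers(REFERENCE, MARKERS)
```

In the toy data, marker M1 separates the two breeds and M2 does not, so M1 ranks first. With real data one would type only the top-ranked markers for a given product.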

21
(No Transcript)
22
(No Transcript)
23
(No Transcript)
24
Conclusions

Breed assignment of unknown samples to a (large) reference set is quite successful.
Optimizing the marker order for each question greatly decreases the amount of typing necessary.
For a more detailed picture of relationships, the data can be analyzed in Structure.

25
Exercise