Software for population genetics - PowerPoint PPT Presentation


Transcript and Presenter's Notes



1
Software for population genetics
  • Structure: J. K. Pritchard et al.
  • Geneclass 2: S. Piry et al.
2
Structure
  • Identification of genetic clusters
  • Identification of subclustering within breeds or
    relationships between breeds
  • Breed assignment of unknown samples to a reference
    set

3
Structure: Identification of genetic clusters
  • Bayesian likelihood method for identifying K
    clusters
  • K, the number of clusters/populations, is either
    provided by the user or inferred by Structure

4
Structure: Ancestry models
  • No-admixture model: each individual originates
    from one of the K populations
  • Admixture model: each individual has genomic
    fractions from more than one of the K populations
  • Linkage model: an admixture model in which linked
    loci are more likely to originate from the same
    population
  • Prior information model: the user pre-defines
    (some of) the clusters
  • NB: the choice of model also depends on the type
    of data available!

5
Structure: Ancestry models and input data
  • Dominant markers: no-admixture model
  • AA and Aa cannot be distinguished, so only a
    presence/absence phenotype is available
  • e.g. AFLP, RFLP
  • Sequence data, Y-chromosome or mtDNA haplotypes:
    linkage model. Treat the haplotype as a single
    locus with many alleles.
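For dominant markers the phenotype hides the genotype. A minimal sketch, assuming a single band allele "A", a null allele "a" and Hardy-Weinberg proportions (assumptions not stated on the slide), shows why AA and Aa collapse to one phenotype and how a band-allele frequency can still be estimated from the fraction of band-absent individuals.

```python
import math


def phenotype(genotype):
    """Observable phenotype at a dominant marker: 1 = band present, 0 = absent.

    'A' is the band allele and 'a' the null allele, so AA and Aa both give 1.
    """
    return 1 if "A" in genotype else 0


def band_presence_prob(p):
    """P(band present) under Hardy-Weinberg, where p is the frequency of A.

    Only aa lacks the band, and P(aa) = (1 - p)^2.
    """
    return 1.0 - (1.0 - p) ** 2


def estimate_band_allele_freq(phenotypes):
    """Square-root estimator: P(absent) = (1 - p)^2, so p = 1 - sqrt(P(absent))."""
    absent = phenotypes.count(0) / len(phenotypes)
    return 1.0 - math.sqrt(absent)
```

The estimator relies entirely on the Hardy-Weinberg assumption, since the heterozygotes are never observed directly.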

6
Structure: Allele frequency models
  • Correlated allele frequencies: frequencies in
    different populations are likely to be similar
    (due to migration or shared ancestry)
  • Independent allele frequencies: the allele
    frequencies of each population are independent
    draws from a Dirichlet distribution specified by
    the parameter λ
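The two allele frequency models can be illustrated by how population frequencies are generated. This sketch draws Dirichlet vectors from normalised Gamma variates; the correlated version uses the F-model parameterisation Dirichlet(ancestral × (1 − F) / F), which is an assumption added here rather than something given on the slide, and the values of λ and F are examples only.

```python
import random


def dirichlet(alphas, rng=random):
    """Draw one frequency vector from Dirichlet(alphas) via normalised Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def independent_freqs(n_alleles, lam=1.0, rng=random):
    """Independent-frequencies model: each population ~ Dirichlet(lam, ..., lam)."""
    return dirichlet([lam] * n_alleles, rng)


def correlated_freqs(ancestral, F, rng=random):
    """Correlated-frequencies (F-model) sketch: population ~ Dirichlet(ancestral * (1 - F) / F).

    A small F keeps the population's frequencies close to the ancestral ones,
    i.e. populations that drifted little remain similar to each other.
    """
    return dirichlet([p * (1.0 - F) / F for p in ancestral], rng)
```

Under the independent model, a small λ favours frequency vectors dominated by a few alleles, while λ = 1 makes every frequency vector equally likely a priori.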

7
Structure: Determining K
  • How to estimate the number of populations/
    clusters in your dataset?
  • Fully resolving all groups in your data (high
    K): test increasing K values until the highest
    likelihood values are reached
  • Determining the rough relationships (low K)
  • Trial and error
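In practice, choosing K means running Structure several times for each K and comparing the estimated ln P(D) values. A minimal sketch with invented ln P(D) values: it averages the replicate runs per K and returns either the best K or, with a tolerance, the smallest K at the start of the likelihood plateau. The tolerance rule is a hypothetical heuristic, not a Structure option.

```python
from statistics import mean


def summarise_runs(lnp_by_k):
    """Average the ln P(D) values of replicate runs for each K."""
    return {k: mean(values) for k, values in sorted(lnp_by_k.items())}


def choose_k(lnp_by_k, tol=0.0):
    """Smallest K whose mean ln P(D) lies within `tol` log-units of the best mean.

    tol=0 returns the K with the highest mean likelihood; a positive tol
    prefers the start of the plateau over a marginally better, larger K.
    """
    means = summarise_runs(lnp_by_k)
    best = max(means.values())
    return min(k for k, m in means.items() if m >= best - tol)


# Invented example: two replicate runs per K.
runs = {
    1: [-5200.1, -5201.3],
    2: [-4800.4, -4799.8],
    3: [-4650.2, -4652.0],
    4: [-4649.0, -4650.0],
}
```

With these numbers the highest mean is at K = 4, but K = 3 is only about 1.6 log-units lower, so a tolerance of 5 selects K = 3.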

8
Structure: Running parameters
  • Likelihood method: the program estimates its own
    internal parameters
  • The starting configuration can have a very low
    probability, so Structure needs a learning phase,
    the burn-in (10,000-100,000 replicates)
  • Actual run: enough replicates to obtain
    statistically sound results (depending on your
    dataset), e.g. 50,000 (?)
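A toy Markov chain Monte Carlo run shows why a burn-in is needed. This sketch samples a single allele frequency (uniform prior, binomial data) with a random-walk Metropolis sampler. The chain is deliberately started at an improbable value, and its early steps are discarded. The function and parameter names (metropolis, burnin, numreps) are hypothetical and only mirror the burn-in and run-length settings on the slide. Structure's own sampler is far more elaborate.

```python
import math
import random


def log_posterior(p, successes, trials):
    """ln posterior of an allele frequency p (uniform prior, binomial likelihood)."""
    if not 0.0 < p < 1.0:
        return -math.inf  # proposals outside (0, 1) are always rejected
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)


def metropolis(successes, trials, burnin, numreps, start=0.001, step=0.05, seed=1):
    """Random-walk Metropolis sampler; the first `burnin` states are discarded."""
    rng = random.Random(seed)
    p = start  # a deliberately poor starting configuration
    current = log_posterior(p, successes, trials)
    kept = []
    for i in range(burnin + numreps):
        proposal = p + rng.uniform(-step, step)
        candidate = log_posterior(proposal, successes, trials)
        # accept with probability min(1, posterior ratio)
        if candidate >= current or rng.random() < math.exp(candidate - current):
            p, current = proposal, candidate
        if i >= burnin:
            kept.append(p)
    return kept
```

For 30 copies of an allele among 40 gene copies, the exact posterior mean is 31/42 ≈ 0.738. The average of the kept samples from metropolis(30, 40, burnin=10000, numreps=50000) should land close to it, while including the first few hundred states would drag the estimate towards the poor starting value.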

9
(No Transcript)
10
Geneclass 2: Breed assignment
  • Software for genetic assignment and
    first-generation migrant detection
  • S. Piry, A. Alapetite, J.-M. Cornuet, D. Paetkau,
    L. Baudouin, A. Estoup
  • INRA, France
  • Journal of Heredity 2004, 95(6): 536-9

11
(No Transcript)
12
Geneclass 2: Breed assignment
  • Infers the probability that each reference
    population is the origin of a sampled individual,
    on the basis of multilocus genotype data
  • Haploid, diploid or mixed data
  • Likelihood criteria
  • Genetic distances
  • Allele frequencies
  • Bayesian algorithm
  • Monte Carlo resampling
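The likelihood and resampling ideas listed above can be sketched as follows. This is a simplified frequency-based assignment, not Geneclass 2's exact Bayesian computation. It assumes Hardy-Weinberg and linkage equilibrium and substitutes a hypothetical minimum frequency (min_freq) for alleles unseen in a reference population. It also uses a plain Monte Carlo exclusion test that simulates genotypes from one population's frequencies. The breed names and frequencies in the usage example are made up.

```python
import math
import random


def genotype_log_lik(genotype, freqs, min_freq=0.01):
    """ln P(multilocus genotype | population) under Hardy-Weinberg and linkage equilibrium."""
    total = 0.0
    for g, locus in zip(genotype, freqs):
        if g is None:  # missing genotype at this locus: skip it
            continue
        a1, a2 = g
        # alleles unseen in the reference sample get min_freq instead of zero
        p1, p2 = locus.get(a1, min_freq), locus.get(a2, min_freq)
        # p^2 for homozygotes, 2pq for heterozygotes
        total += math.log(p1 * p1) if a1 == a2 else math.log(2 * p1 * p2)
    return total


def assignment_scores(genotype, reference):
    """Relative scores with equal priors: each population's likelihood over the sum."""
    logs = {pop: genotype_log_lik(genotype, f) for pop, f in reference.items()}
    top = max(logs.values())
    weights = {pop: math.exp(v - top) for pop, v in logs.items()}  # avoids underflow
    total = sum(weights.values())
    return {pop: w / total for pop, w in weights.items()}


def exclusion_p_value(genotype, freqs, n_sim=10000, seed=1):
    """Fraction of genotypes simulated from `freqs` that are at most as likely as the observed one."""
    rng = random.Random(seed)
    observed = genotype_log_lik(genotype, freqs)
    as_low = 0
    for _ in range(n_sim):
        simulated = []
        for g, locus in zip(genotype, freqs):
            if g is None:
                simulated.append(None)  # keep the same missing-data pattern
            else:
                alleles, weights = list(locus), list(locus.values())
                simulated.append(tuple(rng.choices(alleles, weights=weights, k=2)))
        if genotype_log_lik(simulated, freqs) <= observed:
            as_low += 1
    return as_low / n_sim


# Made-up reference frequencies for two loci in two hypothetical breeds.
reference = {
    "breed_A": [{"120": 0.7, "124": 0.3}, {"88": 0.5, "90": 0.5}],
    "breed_B": [{"120": 0.1, "124": 0.9}, {"88": 0.9, "90": 0.1}],
}
```

Here an individual typed 120/120 and 88/90 has likelihood 0.245 in breed_A and 0.0018 in breed_B, so it scores about 0.99 for breed_A. Its exclusion p-value is high for breed_A and very small for breed_B. Assignment scores always sum to 1, which is why the exclusion test is useful: it can reject every reference population when the true origin is missing from the reference set.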

13
Two examples
  • Products with Protected Geographical Indication
    (PGI)
  • Vitellone dell'Appennino Centrale
  • Allowed breeds: Chianina, Romagnola, Marchigiana
  • Not allowed: Piedmontese, Maremmana, Pezzata
    Rossa Italiana, Italian Brown, Italian Friesian,
    Charolais, Limousin, Belgian Blue
  • Veau du Limousin
  • Allowed breeds: Limousin, Blonde d'Aquitaine,
    Bazadaise
  • Not allowed: Holstein, Friesian, Fries-Hollands,
    Belgian Blue, Maine-Anjou, Normande,
    Bretonne Pie Noire, Charolais, Hereford,
    Aberdeen Angus, Gasconne, Aubrac, Salers,
    Montbéliarde, Simmental, Piedmontese, Swiss
    Brown, Pirenaica

14
Objective?
  • Identify a representative sample from a batch
  • Traceability
  • Fraud?
  • Protection of the (cultural, economic) integrity
    of the product

15
How?
  • Typing with microsatellites.
  • Compare patterns/allele frequencies with a
    reference set
  • Reference library: product of the EU diversity
    project Resgen
  • 45 breeds (more being added)
  • 20 animals per breed
  • 30 microsatellite markers
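Comparing samples with the reference set starts from allele frequencies per breed. With 20 diploid animals per breed, each locus contributes at most 40 gene copies, so rare alleles may be missing from a breed's sample by chance. The sketch below counts gene copies per breed and locus and skips failed genotypes. The data layout and breed names are hypothetical.

```python
from collections import Counter


def allele_frequencies(reference):
    """Allele frequencies per breed and locus, estimated by counting gene copies.

    reference: dict breed -> list of animals; each animal is a list with one
    (allele1, allele2) tuple per locus, or None where typing failed.
    Returns dict breed -> list (one per locus) of dicts allele -> frequency.
    """
    freqs = {}
    for breed, animals in reference.items():
        n_loci = len(animals[0])
        per_locus = []
        for locus in range(n_loci):
            counts = Counter()
            for animal in animals:
                if animal[locus] is not None:
                    counts.update(animal[locus])  # adds both gene copies
            total = sum(counts.values())
            per_locus.append({allele: n / total for allele, n in counts.items()})
        freqs[breed] = per_locus
    return freqs
```

An allele that never occurs among a breed's 40 gene copies simply gets no entry here, so assignment methods need some rule (a minimum frequency, or a prior) for alleles unseen in a reference population.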

16
(No Transcript)
17
Title
Marker order
Genotypes (allele1allele2)
Populations
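The file annotated on this slide (title, marker order, genotypes coded as allele1allele2, populations) resembles the Genepop format, a common input format for assignment software. The following minimal Python parser is a sketch, not a complete Genepop reader. It assumes that layout: one title line, locus names, then populations separated by "Pop" lines, with each individual written as "name , genotype genotype ..." and 0 marking a missing allele.

```python
def parse_genepop(text):
    """Split a Genepop-style file into (title, loci, populations).

    populations is a list of populations; each population is a list of
    (individual_name, genotypes) pairs, where genotypes holds one
    (allele1, allele2) tuple per locus, or None when the locus is missing.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    title = lines[0]

    # Locus names: one per line or comma-separated, up to the first "Pop" line
    loci, i = [], 1
    while i < len(lines) and lines[i].lower() != "pop":
        loci.extend(name.strip() for name in lines[i].split(",") if name.strip())
        i += 1

    populations = []
    for line in lines[i:]:
        if line.lower() == "pop":  # each "Pop" line starts a new population
            populations.append([])
            continue
        name, _, calls = line.partition(",")
        genotypes = []
        for code in calls.split():
            half = len(code) // 2  # "120124" -> "120", "124"; "0102" -> "01", "02"
            a1, a2 = code[:half], code[half:]
            genotypes.append(None if int(a1) == 0 or int(a2) == 0 else (a1, a2))
        populations[-1].append((name.strip(), genotypes))
    return title, loci, populations


# Hypothetical example file with two loci and two populations.
example = """Example reference set
locus1
locus2
Pop
A1 , 120124 000000
A2 , 120120 208210
Pop
B1 , 124124 208208
"""
```

Keeping alleles as strings preserves the original codes, so "120" and "0120" style codings are not silently merged.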
18
(No Transcript)
19
(No Transcript)
20
Optimization
  • No need to type all 30 microsatellites
  • Product-specific level of marker information
  • Geneclass 2 option: self-identification
  • Isolate the breeds involved in the product
    (allowed or not allowed)
  • Infer the level of successful self-identification
    per marker
  • Rank the markers by information content
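The marker-ranking steps above can be sketched as a leave-one-out self-assignment test run one locus at a time. This is a hypothetical re-implementation of the idea, not Geneclass 2's own self-identification option. Each reference animal is removed from its breed's frequency estimate and assigned to the breed with the highest single-locus likelihood. Each locus is then scored by the fraction of animals assigned back to their own breed. The min_freq correction and the tie rule (the first breed in the dict wins) are assumptions.

```python
import math
from collections import Counter


def locus_freqs(animals, locus, skip=None):
    """Allele frequencies at one locus, optionally leaving out animal number `skip`."""
    counts = Counter()
    for idx, animal in enumerate(animals):
        if idx != skip and animal[locus] is not None:
            counts.update(animal[locus])
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def single_locus_log_lik(g, freqs, min_freq=0.01):
    """Hardy-Weinberg log-likelihood of one genotype; unseen alleles get min_freq."""
    a1, a2 = g
    p1, p2 = freqs.get(a1, min_freq), freqs.get(a2, min_freq)
    return math.log(p1 * p1) if a1 == a2 else math.log(2 * p1 * p2)


def self_assignment_rate(reference, locus):
    """Fraction of reference animals assigned back to their own breed using one locus."""
    correct = tested = 0
    for breed, animals in reference.items():
        for idx, animal in enumerate(animals):
            g = animal[locus]
            if g is None:
                continue
            scores = {}
            for other, other_animals in reference.items():
                skip = idx if other == breed else None  # leave the animal out of its own breed
                scores[other] = single_locus_log_lik(g, locus_freqs(other_animals, locus, skip))
            tested += 1
            correct += max(scores, key=scores.get) == breed  # ties: first breed wins
    return correct / tested if tested else 0.0


def rank_markers(reference, n_loci):
    """Loci sorted from most to least informative, plus the rate for each locus."""
    rates = {locus: self_assignment_rate(reference, locus) for locus in range(n_loci)}
    return sorted(rates, key=rates.get, reverse=True), rates
```

Restricting `reference` to the breeds involved in one product (allowed or not allowed) gives a product-specific ranking, and typing can then stop once the top-ranked loci reach an acceptable assignment rate.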

21
(No Transcript)
22
(No Transcript)
23
(No Transcript)
24
Conclusions
  • Breed assignment of unknown samples to a (large)
    reference set is quite successful
  • Optimizing the marker order for each question
    greatly reduces the amount of typing needed
  • For a more detailed picture of relationships,
    data can be analyzed in Structure

25
Exercise
  • 37 unknown samples (file exercise.txt)
  • Use the reference set (file reference.txt) to
    assign breed names to the samples
  • Play with the loci to see the effect of different
    markers on the solution

26
(No Transcript)
27
Solution