1
Learning the structure of a Markov network for
modeling the correlation structure of genetic data
  • Ellis Lau, Su-In Lee, Daphne Koller

2
Outline
  • Background/Motivations
  • Learning the structure of a Markov network
  • Preliminary results
  • Ongoing/Future Work

3
Background/Motivations (I)
  • Human DNA has about 3 billion base pairs in total,
    and variation is found at 10-13 million of these
    positions (SNPs).
  • Because of the nature of genetic recombination,
    genes in close proximity tend to be inherited
    together and are therefore correlated.
  • Additionally, there are situations, called linkage
    disequilibrium, where certain combinations of
    alleles occur more or less frequently than would be
    expected by chance given the frequencies of the
    individual alleles. These correlations arise from
    ancestral recombination.

4
Background/Motivations (II)
  • Linkage disequilibrium is not limited to genes in
    close proximity, and long-range correlations between
    distant genes have yet to be studied.
  • Identifying the correlations among genes will give
    insight into complex traits that involve multiple
    genes, help uncover genes related to disease, and
    increase the efficiency with which further data can
    be gathered and further studies conducted.

5
Learning the Model (I)
  • Basic idea of the learning algorithm (a sketch
    follows the figure below)
  • For N nodes, there are N-choose-2 = N(N-1)/2
    possible edges.
  • Learn the parameters of a fully connected graph
    with L1 regularization (Laplacian prior).
  • Feature induction doubles as feature selection: the
    L1 penalty drives most edge weights to zero.

[Figure: Markov networks over the SNPs rs915674, rs917864, rs9605977, rs11160, and rs1654]
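A minimal sketch of these two ingredients, assuming a generic log-likelihood value and edge-weight vector (the names candidate_edges, penalized_objective, and the regularization strength beta are illustrative, not the authors' code): the fully connected candidate edge set, and the L1 (Laplacian-prior) penalty that pushes most edge weights to exactly zero, so learning the parameters also selects the edges.

  from itertools import combinations
  import numpy as np

  def candidate_edges(n_nodes):
      # All N-choose-2 = N*(N-1)/2 possible edges of a fully connected graph.
      return list(combinations(range(n_nodes), 2))

  def penalized_objective(log_likelihood, edge_weights, beta=1.0):
      # Log-likelihood minus the L1 (Laplacian-prior) penalty on edge weights;
      # beta is an assumed regularization strength.
      return log_likelihood - beta * np.sum(np.abs(edge_weights))

  print(len(candidate_edges(5)))  # 10 candidate edges among the 5 SNPs in the figure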
6
Learning the Model (II)
  • Markov network with nodes representing the SNPs and
    edges defined by the correlation between pairs of
    SNPs.
  • Each SNP takes on one of two possible nucleotide
    values, which we label 0 and 1.
  • 6 parameter types: 2 for nodes, 4 for edges (see the
    table and sketch below).

  Node parameters (per SNP):     Edge parameters (per SNP pair):
    value 0 : w0                           0      1
    value 1 : w1                   0     w00    w01
                                   1     w10    w11
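The table above is a standard log-linear parameterization; the sketch below (with assumed names log_score, node_w, and edge_w) shows how the two node weights and four edge weights combine into an unnormalized log-probability, and normalizes a tiny two-SNP example by brute force.

  import itertools
  import numpy as np

  def log_score(x, node_w, edge_w):
      # Unnormalized log-probability of a 0/1 assignment x to the SNPs.
      # node_w: dict  i -> [w0, w1]
      # edge_w: dict (i, j) -> 2x2 array [[w00, w01], [w10, w11]]
      s = sum(node_w[i][x[i]] for i in node_w)
      s += sum(edge_w[(i, j)][x[i], x[j]] for (i, j) in edge_w)
      return s

  # Tiny example: 2 SNPs, 1 edge; the partition function Z sums over all
  # 2**N assignments, which is only feasible for very small N.
  node_w = {0: np.array([0.1, -0.1]), 1: np.array([0.0, 0.2])}
  edge_w = {(0, 1): np.array([[0.5, -0.5], [-0.5, 0.5]])}
  Z = sum(np.exp(log_score(x, node_w, edge_w))
          for x in itertools.product([0, 1], repeat=2))
  print(np.exp(log_score((0, 0), node_w, edge_w)) / Z)  # P(SNP0 = 0, SNP1 = 0)

For tens of thousands of SNPs the partition function cannot be computed this way; the brute-force normalization is shown only to make the parameterization concrete.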
7
Learning the Model (III)
  • The model is built using gradient ascent, selectively
    activating nodes and edges in the graph at each
    iteration.
  • Gradient(w) = Empirical probability(w) - Expected
    probability(w)
  • The parameters are updated at each iteration using
    line search:
  • w_{k+1} = w_k + η · Gradient(w_k)
  • η is chosen as the smallest step that gives an
    acceptable increase in the objective function (the
    log-likelihood of the model); a sketch follows this
    list.
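A sketch of one such update, assuming the empirical-minus-expected gradient above; the names ascent_step, empirical_probs, expected_probs, log_likelihood, the candidate step sizes, and the sufficient-increase constant c are placeholders rather than the authors' implementation.

  import numpy as np

  def ascent_step(w, log_likelihood, empirical_probs, expected_probs,
                  steps=(1e-3, 1e-2, 1e-1, 1.0), c=1e-4):
      # One update w_{k+1} = w_k + eta * Gradient(w_k), where the gradient is
      # the empirical feature probabilities minus the model-expected feature
      # probabilities, and eta is the smallest candidate step that gives an
      # acceptable increase in the log-likelihood.
      grad = empirical_probs - expected_probs(w)
      ll0 = log_likelihood(w)
      for eta in steps:                        # smallest steps are tried first
          w_new = w + eta * grad
          if log_likelihood(w_new) >= ll0 + c * eta * grad.dot(grad):
              return w_new
      return w                                 # no acceptable step was found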

8
Preliminary Results
  • Ran tests on chromosome 22 for the CEU and YRI
    populations
  • 53,000-54,000 SNPs
  • Statistics examined:
  • Weights
  • Size of subgraphs
  • Number of neighbors
  • Position on chromosome

9
[Figures: histograms of the number of graphs vs. size of subgraphs, for the CEU and YRI populations]
10
[Figures: edge weights vs. chromosomal distance (bp) for CEU and YRI; number of SNP pairs vs. chromosomal distance (bp) for CEU and YRI]
11
(No Transcript)
12
(No Transcript)
13
Ongoing/Future work
  • Sample statistics
  • Comparison among different populations
  • Prediction tasks
  • Improving the model by increasing the number of
    parameters