Genome analysis. - PowerPoint PPT Presentation

About This Presentation

Title:

Genome analysis.

Slides: 33

Provided by: Pan92

Learn more at: https://www2.seas.gwu.edu

Transcript and Presenter's Notes

1
Genome analysis.

Genome: the sum of genes and intergenic
sequences of a haploid cell.

2
The value of genome sequences lies in their
annotation

Annotation: characterizing genomic features
using computational and experimental methods.
Genes: four levels of annotation
- Gene prediction: where are the genes?
- What do they look like?
- What do they encode?
- What proteins/pathways are they involved in?

3
Koonin & Galperin
4
Accuracy of genome annotation.

In most genomes, functional predictions have been
made for the majority of genes (54-79%).
Sources of annotation errors:
- overprediction (statistically significant hits in
the database search are not checked)
- multidomain proteins (similarity is found to only
one domain, but the annotation is extended to the
whole protein).
The error rate of genome annotation can be as high
as 25%.

5
Sample genomes
6
So much DNA so few genes
7
Human Genome project.
8
Comparative genomics: comparison of gene number,
gene content and gene location in genomes.
Campbell & Heyer, Genomics
9
Analysis of gene order (synteny).

Genes with a related function are frequently
clustered on the chromosome.
Example: the E. coli genes responsible for Trp synthesis
are clustered, and their order is conserved between
different bacterial species.
Operon: a set of genes transcribed simultaneously
with the same direction of transcription.

10
Analysis of gene order (synteny).
Koonin & Galperin, Sequence, Evolution, Function
11
Analysis of gene order (synteny).

The order of genes is not well conserved if the
identity between prokaryotic genomes is less
than 50%.
The gene neighborhood can still be conserved, so that
all neighboring genes belong to the same
functional class.
Functional prediction can therefore be based on the
gene neighborhood.
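
The idea of predicting function from the gene neighborhood can be sketched as a simple majority vote. This is only an illustration, not a method given in the slides: the function name, the `window` parameter, and the gene names and classes are all hypothetical.

```python
from collections import Counter

def predict_by_neighborhood(genes, window=2):
    """Guess the functional class of unannotated genes from their neighbors.

    `genes` is a list of (gene_name, functional_class_or_None) tuples in
    chromosomal order. Each unannotated gene receives the most common class
    among annotated genes within `window` positions (hypothetical heuristic).
    """
    predictions = {}
    for i, (name, cls) in enumerate(genes):
        if cls is not None:
            continue  # already annotated, nothing to predict
        lo, hi = max(0, i - window), min(len(genes), i + window + 1)
        neighbor_classes = [
            c for j, (_, c) in enumerate(genes[lo:hi], start=lo)
            if j != i and c is not None
        ]
        if neighbor_classes:
            predictions[name] = Counter(neighbor_classes).most_common(1)[0][0]
    return predictions

# Hypothetical gene cluster: geneX sits among Trp biosynthesis genes.
example = [
    ("geneA", "Trp biosynthesis"),
    ("geneB", "Trp biosynthesis"),
    ("geneX", None),
    ("geneC", "Trp biosynthesis"),
    ("geneD", "transport"),
]
print(predict_by_neighborhood(example))  # {'geneX': 'Trp biosynthesis'}
```

A real pipeline would require the neighborhood to be conserved across several genomes before trusting such a vote.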

12
Role of 'junk' DNA in a cell.

There is almost no correlation between the number
of genes and an organism's complexity.
There is a correlation between the amount of
non-protein-coding DNA and complexity.

13
New interpretation of introns.

Modern introns invaded eukaryotes late in
evolution; they are derived from self-splicing
mobile genetic elements similar to group II
introns.
The nucleus, which separates transcription from
translation, appears only in eukaryotes. In
prokaryotes there would not be time for introns
to splice themselves out.
Hypothesis: introns have an important regulatory role.

14
Regulatory role of non-coding regions.

Micro-RNAs control the timing of processes in
development and apoptosis.
Intronic RNAs carry information about the transcription
of a particular gene.
Alternative splicing can be regulated by
non-coding regions.
Non-coding regions can be very well conserved
between species, and many genetic diseases
have been linked to variations/mutations in
non-coding regions.

15
COGs: Clusters of Orthologous Groups.

Orthologs: genes in different species that
evolved from a common ancestral gene by
speciation.
Paralogs: genes related by
duplication within a genome.

16
Classwork I: Comparing microbial genomes.

Go to http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi
Select Thermus thermophilus genome
View TaxTable
What gene clusters do you see which are common
with Archaea?

17
Systems biology.

An integrative approach to studying the relationships
and interactions between the various parts of a
complex system.
Goal: to develop a model of interacting
components for the whole system.

18
Basic notions of networks.

Network (graph): a set of vertices connected via
edges.
Degree of a vertex: the total number of
connections of that vertex.
Random networks: networks with a disordered
arrangement of edges.
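
The definitions of a network and of vertex degree can be made concrete with a small sketch. The adjacency-set representation and the helper names below are assumptions chosen for illustration, not something specified in the slides.

```python
def make_graph(n_vertices):
    """Undirected graph stored as: vertex -> set of neighboring vertices."""
    return {v: set() for v in range(n_vertices)}

def add_edge(graph, u, v):
    """Connect u and v with an undirected edge."""
    graph[u].add(v)
    graph[v].add(u)

def degree(graph, v):
    """Degree of v: the number of edges attached to it."""
    return len(graph[v])

# Hypothetical example: triangle 0-1-2 plus vertex 3 attached to vertex 0.
g = make_graph(4)
for u, v in [(0, 1), (1, 2), (0, 2), (0, 3)]:
    add_edge(g, u, v)

print([degree(g, v) for v in g])  # [3, 2, 2, 1]
```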

19
Properties of networks.

Vertex degree distribution/connectivity.
Clustering coefficient.
Network diameter.

20
Characteristics of networks: vertex degree
distribution.
[Figure: example graph with vertices labeled by degree: k=2, k=2, k=3, k=1]
P(k,N): the degree distribution, where k is the degree of a
vertex and N is the number of vertices. If vertices are
statistically independent and connections are
random, the degree distribution completely
determines the statistical properties of the
network.
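
As a rough illustration, the degree distribution of a finite graph can be estimated as the fraction of vertices having each degree. The graph below is a hypothetical four-vertex example with degrees 3, 2, 2 and 1, and the function name is an assumption.

```python
from collections import Counter

def degree_distribution(adjacency):
    """Return {k: P(k)}, the fraction of vertices that have degree k."""
    n = len(adjacency)
    counts = Counter(len(neighbors) for neighbors in adjacency.values())
    return {k: c / n for k, c in sorted(counts.items())}

# Hypothetical graph: triangle 0-1-2 plus vertex 3 attached to vertex 0.
adjacency = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(degree_distribution(adjacency))  # {1: 0.25, 2: 0.5, 3: 0.25}
```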
21
Characteristics of networks: vertex degree
distribution.
22
Characteristics of networks: clustering
coefficient.

The clustering coefficient characterizes the
density of connections in the environment close
to a given vertex.

C = 2d / (n(n - 1)), where d is the total number of
edges connecting the nearest neighbors and n is the
number of nearest neighbors of the given vertex.
Example: C = 2/6
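
A minimal sketch of the local clustering coefficient, assuming the standard definition C = 2d / (n(n - 1)) with d and n as defined above. The example graph is hypothetical, and returning 0 for vertices with fewer than two neighbors is a convention chosen here.

```python
from itertools import combinations

def clustering_coefficient(adjacency, v):
    """Local clustering coefficient C = 2d / (n(n - 1)) of vertex v."""
    neighbors = adjacency[v]
    n = len(neighbors)
    if n < 2:
        return 0.0  # assumed convention: undefined case reported as 0
    # d: number of edges that connect two neighbors of v to each other
    d = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return 2 * d / (n * (n - 1))

# Hypothetical graph: triangle 0-1-2 plus vertex 3 attached to vertex 0.
adjacency = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}

# Vertex 0 has n = 3 neighbors and d = 1 edge among them, so C = 2/6.
print(clustering_coefficient(adjacency, 0))
```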
23
Characteristics of networks: diameter,
small-world.

Diameter of a network: the shortest path along the
existing links, averaged over all pairs of
vertices. Distance between two vertices: the
smallest number of steps one can take to reach one
vertex from another.
Small-world character of networks: any two
vertices can be connected by a relatively short
path.
For random networks, the diameter increases
logarithmically with the addition of new vertices.
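
The diameter as defined above (shortest path averaged over all pairs) can be computed with breadth-first search. This is a sketch under the assumption of an unweighted, undirected graph; unreachable pairs are simply skipped, and the function names are illustrative.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Shortest-path distance (in steps) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adjacency[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def average_path_length(adjacency):
    """Mean shortest-path distance over all connected pairs of distinct vertices."""
    total, pairs = 0, 0
    for u in adjacency:
        for v, d in bfs_distances(adjacency, u).items():
            if v != u:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical graph: triangle 0-1-2 plus vertex 3 attached to vertex 0.
adjacency = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(average_path_length(adjacency))  # 8/6, about 1.33
```

Each pair is counted in both directions, which leaves the average unchanged.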

24
Different network models: Erdős-Rényi model.

Start with a fixed set of vertices.
Iterate the following process:
- Choose two vertices at random and connect them
by an edge.
- Stop at a certain number of edges.

Degree distribution: Poisson distribution, λ =
average degree.
[Plot: ln P(k) versus ln k]
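
The procedure above can be sketched as follows. Two details are assumptions not stated in the slide: the two chosen vertices must be distinct, and a pair that is already connected is skipped rather than given a second edge. The function name and parameters are illustrative.

```python
import random

def erdos_renyi(n_vertices, n_edges, seed=None):
    """Random graph: connect random vertex pairs until n_edges edges exist."""
    max_edges = n_vertices * (n_vertices - 1) // 2
    if n_edges > max_edges:
        raise ValueError("too many edges for a simple graph")
    rng = random.Random(seed)
    adjacency = {v: set() for v in range(n_vertices)}
    edges = 0
    while edges < n_edges:
        u, v = rng.sample(range(n_vertices), 2)  # two distinct vertices
        if v not in adjacency[u]:  # assumption: no duplicate edges
            adjacency[u].add(v)
            adjacency[v].add(u)
            edges += 1
    return adjacency

g = erdos_renyi(1000, 2000, seed=0)
mean_degree = sum(len(nb) for nb in g.values()) / len(g)
print(mean_degree)  # 4.0, because mean degree = 2 * edges / vertices
```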
25
Different network models: model 2.

At each step, a new vertex is added to the graph.
Simultaneously, a pair of randomly chosen
vertices is connected by an edge.
This is a non-equilibrium model: the total
number of vertices is not fixed.

Degree distribution: exponential distribution.
[Plot: ln P(k) versus ln k]
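
A minimal sketch of this growing model. The slide leaves several details open, so the following are assumptions: the graph starts from two isolated vertices, the random pair may include the newly added vertex, and a pair that is already connected does not receive a second edge.

```python
import random

def growing_random_network(n_steps, seed=None):
    """Grow a graph: each step adds one vertex and one random edge."""
    rng = random.Random(seed)
    adjacency = {0: set(), 1: set()}  # assumed starting graph
    for _ in range(n_steps):
        new_vertex = len(adjacency)
        adjacency[new_vertex] = set()
        # pick two distinct vertices uniformly at random and connect them
        u, v = rng.sample(range(len(adjacency)), 2)
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency

g = growing_random_network(500, seed=1)
print(len(g))  # 502 vertices: 2 initial + 1 per step
```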
26
Different network models: Barabási-Albert.

Model of preferential attachment.
At each step, a new vertex is added to the graph.
The new vertex is attached to one of the old vertices
with probability proportional to the degree of
that old vertex.

Degree distribution: power-law distribution.
[Plot: ln P(k) versus ln k]
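
Preferential attachment as described above can be sketched with a list of edge endpoints: picking a uniform element of that list selects a vertex with probability proportional to its degree. Starting from a single edge between vertices 0 and 1 is an assumption, as are the names used.

```python
import random

def preferential_attachment(n_vertices, seed=None):
    """Grow a graph in which each new vertex links to one old vertex,
    chosen with probability proportional to that vertex's degree."""
    if n_vertices < 2:
        raise ValueError("need at least two vertices")
    rng = random.Random(seed)
    adjacency = {0: {1}, 1: {0}}  # assumed starting graph: one edge
    endpoints = [0, 1]  # every vertex appears once per incident edge
    for new_vertex in range(2, n_vertices):
        target = rng.choice(endpoints)  # degree-proportional choice
        adjacency[new_vertex] = {target}
        adjacency[target].add(new_vertex)
        endpoints.extend([new_vertex, target])
    return adjacency

g = preferential_attachment(2000, seed=0)
degrees = sorted((len(nb) for nb in g.values()), reverse=True)
print(degrees[:5])  # a few hubs typically have much higher degree than the rest
```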
27
Power Law distribution
Multiplying k by a constant does not change the
shape of the distribution: it is a scale-free
distribution.
From T. Przytycka
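
The scale-free property can be checked in one line. The symbols A (normalization constant), γ (exponent) and a (rescaling constant) are notation assumed here, not taken from the slide.

```latex
P(k) = A\,k^{-\gamma}
\quad\Longrightarrow\quad
P(ak) = A\,(ak)^{-\gamma} = a^{-\gamma}\,P(k),
\qquad
\ln P(k) = \ln A - \gamma \ln k
```

Rescaling k only multiplies P by the constant factor a^{-γ}, so on a ln P(k) versus ln k plot the distribution remains a straight line with slope -γ.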
28
Difference between scale-free and random networks.
Random networks are homogeneous: most nodes have
approximately the same number of links. Scale-free
networks have a few highly connected vertices.
29
Example 1: the large-scale organization of
metabolic networks.
[Figure: glycolysis metabolic network showing enzymes and substrates]
Slide credit: Hagai Ginsburg
30
Example 1: the large-scale organization of
metabolic networks.

Jeong et al., Nature, 2000
Compared the metabolic networks of 43 organisms.
Vertices: substrates connected with each other
through links (metabolic reactions).

Results:
- Metabolic networks are scale-free for all
organisms, γ ≈ 2.2.
- The diameters of the metabolic networks are the
same for all organisms.
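
One crude way to estimate a power-law exponent from a degree distribution is a least-squares line fit on the ln P(k) versus ln k plot. This is only a sketch on synthetic data, not the procedure used in the cited study, and the function name is an assumption.

```python
import math

def fit_power_law_exponent(distribution):
    """Estimate gamma in P(k) ~ k^(-gamma) by a log-log least-squares fit.

    `distribution` maps degree k to P(k); entries with k <= 0 or P(k) <= 0
    are ignored because their logarithm is undefined.
    """
    points = [(math.log(k), math.log(p))
              for k, p in distribution.items() if k > 0 and p > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return -sxy / sxx  # the slope of the fitted line is -gamma

# Synthetic, unnormalized power law with exponent 2.2 (not real data).
synthetic = {k: k ** -2.2 for k in range(1, 51)}
print(round(fit_power_law_exponent(synthetic), 3))  # 2.2
```

On real, noisy degree data a simple line fit like this can be biased, so it should be read as a first approximation.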
31
Biological interpretations of power-law
connectivity.

A few vertices dominate the overall connectivity of
the network.
Self-similarity of networks.
Small diameter: the network can respond quickly to a
mutation that destroys an enzyme by activating
different paths.

32
Protein-protein interaction networks.