Introduction to Genomic Sequencing Assembly Annotation Diego Martinez - PowerPoint PPT Presentation

Slides: 27
Provided by: Mag8175

Transcript and Presenter's Notes

1
Introduction to Genomic Sequencing, Assembly, Annotation
Diego Martinez
                                                                        
2
Why do we want to know the sequence of an entire
genome?
  • To know all the genes, then the proteins, then
    the pathways
  • We can understand the biochemistry of the
    organism
  • We can understand diseases
  • Evolution
  • Regulation
  • Know all the upstream/downstream regions where
    proteins bind and control transcription

3
Convinced? Good. Let's sequence!
  • 2 ways
    • Map (Public)
      • Several long steps, including tiling
      • Expensive because they generated a complete map
    • Whole Genome Shotgun (Private)
      • Direct/cuts out some steps, missed 103 genes
      • Repeats!
  • Synthesis approach

4
Generate Data
5
Core principles
  • BAC-end sequences show which sequences are
    physically linked
  • More coverage leads to better sequence
  • Better algorithms make it easier
  • Longer sequence reads are critical
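
The point that more coverage leads to better sequence can be made concrete with the Lander-Waterman model. The slide does not name this model, so it is brought in here as an assumption. Under idealized random sampling, the expected fraction of bases hit by no read at fold coverage c is e^(-c). The function names below are illustrative.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average fold coverage c = N * L / G."""
    return num_reads * read_length / genome_size


def expected_uncovered_fraction(c: float) -> float:
    """Lander-Waterman estimate of the fraction of bases covered by no read.

    Assumes reads land uniformly at random, which real libraries only
    approximate (cloning bias and repeats break the assumption).
    """
    return math.exp(-c)


for c in (1, 3, 6, 10):
    frac = expected_uncovered_fraction(c)
    print(f"{c:>2}-fold coverage: about {frac:.4%} of bases unsequenced")
```

The estimate ignores sequencing errors and repeats, so it is best read as a lower bound on the gaps a real project would see.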

6
Figure 2.5 Relationships of chromosomes to
genome sequencing markers
Sixteen overlapping clones represent the 1,408 BACs
needed to span the 163 Mb X chromosome (average
insert 146 kb).
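
As a rough check on the figure's numbers, the arithmetic below compares the 1,408 BACs with the minimum that would be needed if clones did not overlap at all. Only the three input values come from the slide. The variable names and the idea of summarizing the difference as a redundancy factor are illustrative assumptions.

```python
import math

chromosome_bp = 163_000_000  # 163 Mb X chromosome (from the slide)
avg_insert_bp = 146_000      # average BAC insert of 146 kb (from the slide)
bacs_used = 1_408            # BACs in the tiling path (from the slide)

# Fewest BACs that could span the chromosome with zero overlap.
min_bacs = math.ceil(chromosome_bp / avg_insert_bp)

# Total cloned sequence, and how many times over it covers the chromosome.
total_insert_bp = bacs_used * avg_insert_bp
redundancy = total_insert_bp / chromosome_bp

print(f"Minimum BACs with no overlap: {min_bacs}")
print(f"Total insert sequence: {total_insert_bp / 1e6:.1f} Mb")
print(f"Redundancy from overlaps: {redundancy:.2f}x")
```

With these inputs the minimum is 1,117 BACs, so the tiling path carries about 1.26 times the chromosome length. The extra sequence is the overlap that lets neighbouring clones be joined.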
7
Assembly: put it all back together
  • Assemble
  • BAC assemblies: Phrap (Phil Green, UW)
  • Celera, WGS: Celera Assembler
  • Both find overlaps of the same sequence and
    build contiguous regions (contigs)
  • Put contigs together using paired-end
    information: order and orient them into large
    scaffolds (also called supercontigs)
  • Whole Genome Shotgun: automated,
    without tiling
  • Finishing
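
The overlap-then-contig idea in the list above can be sketched with a toy greedy merger. This is not how Phrap or the Celera Assembler work internally. It assumes error-free reads from one strand, ignores base quality, repeats and paired-end information, and the function names and example reads are made up for illustration.

```python
def overlap_length(a: str, b: str, min_overlap: int) -> int:
    """Longest suffix of a that equals a prefix of b, or 0 if shorter than min_overlap."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                k = overlap_length(a, b, min_overlap)
                if k > best_len:
                    best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


# Made-up reads that tile a 13-base sequence.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contigs = greedy_assemble(reads)
print(contigs)
```

Repeats are where this approach breaks down: two reads from different copies of a repeat overlap just as well as true neighbours, which is why paired-end links and finishing are still needed.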

8
(No Transcript)
9
Problems even map-based sequencing couldn't fix
10
Which method worked best?
  • WGS failed with highly repetitive regions
  • WGS, however, reduced overall workload for
    sequencing
  • Use hybrid approach
  • WGS used for 6-fold coverage
  • Reduced number of BACs needed to sequence by 93

11
Annotation
  • Need to make it usable and fun!
  • What is annotation?
    • The act or process of furnishing critical
      commentary or explanatory notes
    • Find sequence features in the genome
      • genes (focus here)
      • pseudogenes
      • repeats
      • regulatory elements (very difficult, still in
        its infancy)
    • Attempt to describe gene function
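
To give a feel for what "finding genes" means at its simplest, here is a naive open reading frame (ORF) scan. It is a sketch under strong assumptions: forward strand only, ATG start, no introns. Eukaryotic gene finders have to model splicing as well, so this is not the method used for any genome discussed here. The function name and the example sequence are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    start is inclusive, end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return orfs


example = "CCATGAAATTTTAGGG"  # made-up sequence with one short ORF
orfs = find_orfs(example)
for start, end, frame in orfs:
    print(frame, start, end, example[start:end])
```

A minimum-length cutoff such as `min_codons` matters in practice because short ORFs occur by chance in any sequence.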

12
Figure 2.6 Alternative splicing
NADPH oxidase
H+ channel
13
Table 2.1 How annotation can be used to
infer/understand biological niche
14
Example of annotation - What is a gene?
15
Functional Annotation: What does the protein
do?
  • Found genes
  • Basic approach: by similarity to a known protein
  • Old style: best BLAST hit
  • Can lead to funny, incorrect annotations
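
The "best BLAST hit" style of annotation can be sketched as follows. The parser assumes BLAST's standard 12-column tabular output (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). The sequence IDs, descriptions and E-value cutoff are all made up for illustration.

```python
def best_hits(tabular_lines):
    """Return {query_id: (subject_id, evalue, bitscore)} for each query's top hit."""
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


def transfer_annotations(best, descriptions, max_evalue=1e-5):
    """Copy the best hit's description to the query if the hit passes the cutoff."""
    annotations = {}
    for query, (subject, evalue, _bitscore) in best.items():
        if evalue <= max_evalue:
            annotations[query] = descriptions.get(subject, "unknown function")
        else:
            annotations[query] = "no confident hit"
    return annotations


# Made-up hits and database descriptions.
lines = [
    "gene1\tprotA\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-100\t380",
    "gene1\tprotB\t40.0\t180\t100\t8\t5\t185\t3\t182\t1e-20\t95",
    "gene2\tprotC\t28.0\t90\t60\t5\t10\t100\t20\t110\t0.5\t30",
]
descriptions = {"protA": "example kinase", "protB": "example transporter"}

annotations = transfer_annotations(best_hits(lines), descriptions)
print(annotations)
```

Whatever label the top hit carries is copied without question, so a mislabelled or only distantly related database entry propagates straight into the new genome. That is how the funny annotations arise.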

16
Funny examples
17
Critical residues/multiple sequence
alignment (lysozyme)
18
Gene Family Expansion
19
Signaling Pathways
20
(No Transcript)
21
(No Transcript)
22
Phanerochaete chrysosporium
  • Degrades lignin
  • 30 million base pairs (30 Mb)
  • This was the first basidiomycete sequenced, so
    gene finding was a big challenge
  • Estimated 11,777 genes

23
Genome Facts!
24
http://genome.jgi-psf.org/Phchr1/Phchr1.info.html
25
Phylogenies of Genes
26
Genome Evolution and RIP