Title: Introduction to Genomic Sequencing Assembly Annotation Diego Martinez
1 Introduction to Genomic Sequencing, Assembly, Annotation
Diego Martinez
2 Why do we want to know the sequence of an entire genome?
- To know all the genes, then proteins, then pathways
- We can understand the biochemistry of the organism
- We can understand diseases
- Evolution
- Regulation
- Know all the upstream/downstream regions where proteins bind and control transcription
3 Convinced? Good. Let's sequence!
- 2 ways
- Map (Public)
- Several long steps, including tiling
- Expensive because they generated a complete map
- Whole Genome Shotgun (Private)
- Direct/cuts out some steps, missed 103 genes
- Repeats!
- Synthesis approach
4 Generate Data
5 Core principles
- BAC-end sequences allow one to know the physical association of sequences
- More coverage leads to better sequence
- Better algorithms make it easier
- Longer sequence reads are critical
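The claim that more coverage leads to better sequence can be made concrete with the Lander-Waterman model. The slides do not name this model; it is brought in here as an assumption. Under idealized random sampling, the chance that a given base is hit by no read at fold coverage c is roughly e^(-c). A minimal sketch, with illustrative function names:

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average fold coverage c = N * L / G."""
    return num_reads * read_length / genome_size


def fraction_uncovered(c: float) -> float:
    """Poisson approximation: probability that a base is covered by zero reads."""
    return math.exp(-c)


# Expected share of the genome left unsequenced at a few coverage levels.
for c in (1, 3, 6, 10):
    print(f"{c:>2}x coverage: {fraction_uncovered(c):.4%} of bases expected uncovered")
```

Real libraries are not perfectly random (cloning bias, repeats), so these values are best read as an optimistic lower bound on the gaps.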
6 Figure 2.5 Relationships of chromosomes to genome sequencing markers
Sixteen overlapping clones represent the 1,408 BACs needed to span the 163 Mb X chromosome (average insert 146 kb).
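The numbers on this slide allow a quick sanity check: 1,408 BACs at an average insert of 146 kb add up to more sequence than the 163 Mb chromosome, and the excess reflects overlap between neighbouring clones. A rough sketch of that arithmetic, using only the figures quoted above. The average-overlap estimate assumes a single linear tiling path, which is a simplification:

```python
NUM_BACS = 1_408
AVG_INSERT_KB = 146
CHROMOSOME_MB = 163

# Total cloned sequence across all BACs, in Mb.
total_insert_mb = NUM_BACS * AVG_INSERT_KB / 1_000

# Fold redundancy of the clone set relative to the chromosome.
redundancy = total_insert_mb / CHROMOSOME_MB

# Assumption: one end-to-end tiling path, so the excess sequence is shared
# among the NUM_BACS - 1 junctions between neighbouring clones.
avg_overlap_kb = (total_insert_mb - CHROMOSOME_MB) * 1_000 / (NUM_BACS - 1)

print(f"total insert sequence: {total_insert_mb:.1f} Mb")
print(f"redundancy: {redundancy:.2f}x")
print(f"average overlap per junction: {avg_overlap_kb:.1f} kb")
```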
7 Assembly: put it all back together
- Assemble
- BAC assemblies, Phrap (Phil Green, UW)
- Celera, WGS Celera Assembler
- Both find overlaps of the same sequence and build regions (contigs)
- Put contigs together using paired-end information: order and orient into large scaffolds (also called supercontigs)
- Whole Genome Shotgun is automated, without tiling
- Finishing
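The overlap-then-contig idea above can be sketched with a toy greedy merger. This is not how Phrap or the Celera Assembler work internally; it is a simplified illustration that assumes error-free reads from one strand and ignores paired-end scaffolding. The function names and the `min_len` parameter are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair with the largest overlap until none remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # no overlaps left: these are the final contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGGCGT", "GCGTACC", "TACCTTA"]
print(greedy_assemble(reads))
```

Greedy merging is exactly the kind of approach that breaks on repeats: two reads from different copies of a repeat overlap just as well as true neighbours, which is why paired-end information is needed to order and orient contigs.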
9 Problems even map-based sequencing couldn't fix
10 Which method worked best?
- WGS failed with highly repetitive regions
- WGS, however, reduced overall workload for sequencing
- Use hybrid approach
- WGS used for 6-fold coverage
- Reduced number of BACs needed to sequence by 93%
11 Annotation
- Need to make it usable and fun!
- What is annotation?
- The act or process of furnishing critical commentary or explanatory notes.
- Find sequence features in the genome
- find genes (focus here)
- pseudogenes
- repeats
- regulatory elements (very difficult, still in its infancy)
- attempt to describe gene function
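As a very rough illustration of what "find genes" means computationally, the sketch below scans the forward strand for open reading frames (ATG to the next in-frame stop codon). This is a deliberately naive assumption-laden example: real eukaryotic gene finders must handle introns, both strands, and splice signals, none of which appear here. The function name and `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) pairs for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based with an exclusive end; min_codons counts the
    start and stop codons as part of the ORF length.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


example = "CCATGAAATTTTAGGG"
for start, end in find_orfs(example):
    print(start, end, example[start:end])
```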
12 Figure 2.6 Alternative splicing
NADPH oxidase
H+ channel
13 Table 2.1 How annotation can be used to infer/understand biological niche
14 Example of annotation: What is a gene?
15 Functional Annotation: What does the protein do?
- Found genes
- Basic approach: by similarity to a known protein
- Old style: best BLAST hit
- Can lead to funny incorrect annotations
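The "best BLAST hit" style of annotation can be sketched as follows: for each predicted gene, keep the database hit with the highest bit score and copy its name. The sketch assumes BLAST+ tabular output (`-outfmt 6`) with its default twelve columns; the gene and protein identifiers in the example rows are made up for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text: str) -> dict[str, dict[str, str]]:
    """Map each query ID to its single highest-bitscore hit."""
    best: dict[str, dict[str, str]] = {}
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=BLAST6_COLUMNS, delimiter="\t")
    for row in reader:
        query = row["qseqid"]
        if query not in best or float(row["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = row
    return best


# Hypothetical hits, not real BLAST results.
EXAMPLE = "\n".join("\t".join(fields) for fields in [
    ["gene1", "protA", "62.0", "200", "76", "0", "1", "200", "5", "204", "1e-60", "250"],
    ["gene1", "protB", "88.5", "200", "23", "0", "1", "200", "1", "200", "1e-90", "310"],
    ["gene2", "protC", "35.0", "120", "78", "0", "1", "120", "10", "129", "2e-10", "95"],
])

for query, hit in best_hits(EXAMPLE).items():
    print(query, "->", hit["sseqid"], hit["bitscore"])
```

The weakness is visible in the code: whatever name the top hit carries gets transferred, even when the match is weak (as with the hypothetical `gene2`) or the hit itself was misannotated, which is how the funny incorrect annotations arise.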
16 Funny examples
17 Critical residues/multiple sequence alignment (lysozyme)
18 Gene Family Expansion
19 Signaling Pathways
22 Phanerochaete chrysosporium
- Degrades lignin
- 30 million base pairs (30 Mb)
- This was the first basidiomycete genome, so gene finding was a big challenge
- Estimate 11,777 genes
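The two figures above give a quick feel for gene density. A small sketch of the arithmetic, using only the genome size and gene estimate quoted on this slide (variable names are illustrative):

```python
GENOME_SIZE_BP = 30_000_000   # 30 Mb, as quoted above
ESTIMATED_GENES = 11_777

# Average genomic span available per gene, and genes per megabase.
bp_per_gene = GENOME_SIZE_BP / ESTIMATED_GENES
genes_per_mb = ESTIMATED_GENES / (GENOME_SIZE_BP / 1_000_000)

print(f"about {bp_per_gene / 1_000:.1f} kb of genome per gene")
print(f"about {genes_per_mb:.0f} genes per Mb")
```

This is an average over the whole genome, including intergenic sequence and repeats, so it says nothing about the length of any individual gene.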
23 Genome Facts!
24 http://genome.jgi-psf.org/Phchr1/Phchr1.info.html
25 Phylogenies of Genes
26 Genome Evolution and RIP