Title: Genome Sequence Acquisition and Analysis
1Genome Sequence Acquisition and Analysis
2Outline
- Defining Genomes
- What Have We Learned from the Human Genome Draft
Sequences?
3Goals for this Chapter 1
- 1.1 Defining Genomes
- Define the field of genomics
- Learn how genomes are sequenced
- Understand the utility of short DNA segments
- Utilize online tools to analyze genome sequences
4What Is Genomics?
- Genome
- The total DNA content of a haploid cell or half
the DNA content of a diploid cell.
- Genomics
- Genomics involves large data sets (about 3
billion base pairs for the human genome) and
high-throughput methods (fast methods for
collecting the data).
- Sequencing DNA and collecting genome variations
within a population as well as transcription
control of genes.
- From DNA sequence analysis to an organism's
response to environmental perturbations.
5Proteomics
- Proteome
- The complete protein content of a cell/organism
at a given moment.
- Proteomics (-omic)
- Other terms
- Transcriptome
- Metabolome
6How Are Whole Genomes Sequenced?
- Sanger method (dideoxy method)
- Dr. Fred Sanger
- Polymerase chain reaction (PCR)
7Figure 1.1
8Figure 1.2
9What Is An E-value?
- A BLASTn search returns hits, sequences that
produce significant alignments to the query
sequence.
- The significance of a hit is measured by its
E-value, or expect value.
- Biologically significant hits will tend to have
E-values much less than 1.0.
- The larger the E-value, the greater the chance
that the similarity between the hit and the query
is due to mere coincidence.
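As a rough illustration of why high-scoring hits get small E-values, the sketch below uses the standard relation E = m * n * 2^(-S'), where S' is the bit score, m the query length, and n the total database length. The function name and the example numbers are illustrative assumptions, not output from a real BLASTn search, and actual BLAST programs use corrected "effective" lengths rather than the raw ones used here.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least bit_score.

    Simplification: uses raw lengths, not BLAST's effective lengths.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Hypothetical search: a 500-base query against a 1-billion-base database.
    for score in (30, 50, 100):
        print(score, expect_value(score, 500, 10**9))
```

Each additional bit of score halves the E-value, which is why a modest rise in score moves a hit from "likely coincidence" to "likely meaningful".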
12Why Do the Databases Contain So Many Partial
Sequences?
- Human Genome Project (HGP)
- By comparing different genomes, we should be able
to better understand genomes in general, and the
human genome in particular.
- Sequence-tagged Sites (STSs)
- Short segments of unique DNA sequence along
every chromosome.
- Bacterial Artificial Chromosomes (BACs)
- 150 Kb
- Yeast Artificial Chromosomes (YACs)
- 150 Kb to 1.5 Mb
13Figure 1.3
14Expressed Sequence Tags (ESTs)
- The short segments of cDNA
- Used to identify genes
- ESTs hint at the size of the genome and
alternative ways to splice mRNA.
- ESTs are helpful for labs interested in cloning
particular genes.
- Related databases
- dbEST
- UniGene
- HomoloGene
15Institutes
- National Institutes of Health (NIH)
- The Institute for Genomic Research (TIGR)
16How Do We Make Sense of All These Bases?
- Online Mendelian Inheritance in Man (OMIM)
- A comprehensive catalog of what is known about
human genes and genetic diseases.
- GeneCards: a database of annotated genes in the human
genome.
- ORFs and Translation
- Open reading frames (ORFs, pronounced "orfs")
- Recorded under an accession number.
- Coding Sequence (CDS)
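To make the idea of an open reading frame concrete, here is a minimal sketch that scans the three forward reading frames of a DNA string for stretches running from ATG to the first in-frame stop codon. The function name, the `min_codons` parameter, and the example sequence are assumptions made for illustration; real ORF finders also scan the reverse strand and handle alternative genetic codes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is the 0-based index of the A in ATG; end is exclusive and
    includes the stop codon. min_codons counts codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close it and look for the next ATG
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGGG"))
```

An ORF found this way is only a candidate coding sequence; whether it is a real CDS still has to be checked against annotation or experimental evidence.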
17Can We Predict Protein Functions?
- Kyte-Doolittle plot (hydropathy plot)
- To predict whether a protein is an integral
membrane protein or not.
- The 3D shape of a protein is probably its most
important characteristic.
- Conserved Domain (CD)
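The hydropathy plot mentioned above can be sketched as a sliding-window average over the Kyte-Doolittle hydropathy scale. The function name and the default window of 19 residues are assumptions chosen for illustration; a window near that size with an average above roughly 1.6 is a commonly cited rule of thumb for a possible membrane-spanning segment, not a guarantee.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19):
    """Mean hydropathy of every full window along the protein.

    Raises KeyError for characters outside the 20 standard residues.
    """
    scores = [KD[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical peptide: a leucine-rich stretch flanked by charged residues.
    peptide = "KKDE" + "L" * 19 + "RRDE"
    profile = hydropathy_profile(peptide)
    print(max(profile))
```

Plotting the returned values against window position gives the familiar hydropathy plot, with hydrophobic stretches appearing as peaks.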
183D Structures
- Protein Data Bank (PDB)
- Entrez Structure
19Structure-Function Relationships
- Gene Ontology (GO)
- Biological process is the "why": the overall
objective toward which this protein contributes.
- Molecular function is the "what": the biochemical
activity the protein accomplishes.
- Cellular component is the "where": the location of
protein activity.
- Gene Ontology's unification of protein roles will
help us communicate more effectively as we
determine how genomes produce multifunctional
cells.
20How Well Are Genes Conserved in Diverse Species?
- Clusters of Orthologous Groups (COGs)
- Enzyme Commission (EC) numbers
- Swiss-Prot
- Phylogenetic trees
- Paralogs
- Genes that arose from a common ancestral gene
within one species.
- Orthologs
- The same gene in two organisms, evolved from a
common ancestral gene in another species.
- Synteny
- Genetic loci located on the same chromosome
within a species, even if they are separated by
a great distance.
- Homology
- Two sequences are described as homologous if
their sequences are similar because of a common
evolutionary origin.
21How Do You Know Which Bases Form a Gene?
- Intergenic sequence
- Eukaryotic genes may contain introns as well as
the coding exons.
- Some ORFs are called pseudogenes because mutation
has rendered them nonfunctional.
22Gene Expression Process
23The Gene Structure
[Figure: gene structure, showing genes with their coding regions, upstream and downstream regions, DNA-binding transcription factors, and their binding sites. Example binding site: 5'-AGCAATAGG-3' / 3'-TCGTTATCC-5'.]
24Transcription Factors (TFs)
- DNA-binding proteins
- Recognize specific sites (sequences) called binding
sites.
- Transcription factors activate transcription
initiation by RNA Polymerase II or III.
- Regulate the gene transcription process.
[Figure: a transcription factor bound to its binding site.]
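Because a binding site can sit on either DNA strand, a search has to check both the site and its reverse complement. The sketch below does this by exact matching, using the example site AGCAATAGG from the gene-structure slide. The function names and the test sequence are illustrative assumptions; real binding sites are degenerate and are usually modeled with position weight matrices rather than exact strings.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_sites(dna: str, site: str):
    """Return (position, strand) for exact matches of site on either strand.

    position is the 0-based index on the forward strand where the match starts.
    """
    dna, site = dna.upper(), site.upper()
    rc = reverse_complement(site)
    hits = []
    for i in range(len(dna) - len(site) + 1):
        window = dna[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        if window == rc and rc != site:  # skip double-counting palindromes
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    # Hypothetical upstream region containing the site on each strand once.
    upstream = "TTAGCAATAGGTTTTCCTATTGCTAA"
    print(find_sites(upstream, "AGCAATAGG"))
```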
25Some TF Binding Sites
26Intron and Exon
27RNA Splicing (Pre-mRNA -> Mature mRNA)
[Figure: a gene (5' to 3') made of exons and introns is transcribed from DNA into pre-mRNA, spliced into mature mRNA, and translated into protein.]
28Splice Sites
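[Figure: pre-mRNA (5' to 3') with two exons flanking an intron; the intron is cut out at the two splice sites.]
- Donor site (5' splice site): AG|GTAGGT
- Acceptor site (3' splice site): PyPyPyPyPyNCAG|
- Py = pyrimidine, N = any base, | = cut position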
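The two consensus patterns on this slide can be turned into a toy splice-site scanner. The sketch below searches a sequence for the donor pattern (AG followed by GTAGGT) and the acceptor pattern (five pyrimidines, any base, then CAG), and removes the intron between a chosen pair of cut positions. It works on the DNA alphabet (T rather than U), treats the slide's consensus as an exact pattern, and uses made-up names and a made-up sequence; real splice-site prediction is probabilistic and far less strict.

```python
import re

# Lookaheads so that overlapping candidates are all reported.
DONOR = re.compile(r"(?=AGGTAGGT)")             # cut falls after the leading AG
ACCEPTOR = re.compile(r"(?=[CT]{5}[ACGT]CAG)")  # cut falls after the final AG


def candidate_cuts(seq: str):
    """Return (donor_cuts, acceptor_cuts) as 0-based cut positions."""
    seq = seq.upper()
    donors = [m.start() + 2 for m in DONOR.finditer(seq)]
    acceptors = [m.start() + 9 for m in ACCEPTOR.finditer(seq)]
    return donors, acceptors


def splice(seq: str, donor_cut: int, acceptor_cut: int) -> str:
    """Join the exon before donor_cut to the exon starting at acceptor_cut."""
    return seq[:donor_cut] + seq[acceptor_cut:]


if __name__ == "__main__":
    # Hypothetical pre-mRNA: exon 1, one intron, exon 2.
    pre = "ATGCCAAG" + "GTAGGTAAAATTTTTCCAG" + "GGCTAA"
    donors, acceptors = candidate_cuts(pre)
    print(donors, acceptors)
    print(splice(pre, donors[0], acceptors[0]))
```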
29How Many Proteins Can One Gene Make?
30Goals for this Chapter 2
- 1.2 What Have We Learned from the Human Genome
Draft Sequences?
- Survey the human genome
- Verify genome annotations with online tools
- Recognize alternative forms of genes
- Explore epigenetic regulation of genome function
31Overview of Human Genome First Draft
- Published on 15 Feb. 2001.
- Humans have approximately 35,000 genes.
- Draft sequence means the DNA was sequenced on
average four times, with finished sequence having
eightfold coverage and errors estimated to be one
in 10,000 bp.
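To see why moving from fourfold to eightfold coverage matters, the sketch below uses the Lander-Waterman approximation: if reads land uniformly at random, the chance that a given base is never sequenced at average coverage c is about e^(-c). The function name is an illustrative assumption, and the uniform-random model is a simplification (real genomes have regions that are hard to clone or sequence).

```python
import math


def fraction_unsequenced(coverage: float) -> float:
    """Expected fraction of bases hit by no read (Poisson approximation)."""
    return math.exp(-coverage)


if __name__ == "__main__":
    for c in (4, 8):
        missed = fraction_unsequenced(c)
        print(f"{c}x coverage: {missed:.4%} of bases expected to be missed")
```

Under this model a draft at fourfold coverage still leaves a small but noticeable share of the genome unread, while eightfold coverage shrinks that share by a further factor of e^4.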
32Figure 1.5
CpG dinucleotides form CpG islands, and the
cytosine base is often methylated.
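One common way to flag a candidate CpG island is to compare the observed number of CpG dinucleotides in a window with the number expected from its C and G content. The sketch below computes the GC fraction and the observed/expected CpG ratio; the function name is an assumption, and the thresholds mentioned afterwards are widely cited rules of thumb rather than anything stated on the slide.

```python
def cpg_stats(seq: str):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA window.

    Observed/expected = (number of CG * length) / (number of C * number of G).
    """
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


if __name__ == "__main__":
    # Hypothetical short windows, far shorter than a real island.
    print(cpg_stats("CGCGCGCG"))
    print(cpg_stats("ATATCCTGGATAT"))
```

A window of at least 200 bp with GC fraction above 0.5 and an observed/expected ratio above 0.6 is a frequently used working definition of a CpG island.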
33UTR
34Figure 1.6
353-Dimensional
Gene Regulation Mechanism
[Figure: TF 1 and TF 2, bound at Binding Site 1 and Binding Site 2, cooperate through protein-protein interaction, so the occurrences of the two sites are correlated; an arrow marks the transcription start direction.]
The graphic is from Dr. Thomas Werner's tutorial
at ISMB 2000.
36Figure 1.7
- Humans have more cells with specialized functions
than yeast or worms.
- Humans need to regulate gene expression very
carefully.
- Proteome complexity is regulated by gene
expression.
37Figure 1.8a
38Figure 1.8b
39When Are the Data Sufficient?
- A Gene Is a Gene Is a Gene Short of
- Every Gene Has a Promoter
- Other Than rRNA and tRNA, All Genes Produce
Proteins
40Can the Genome Alter Gene Expression Without
Changing the DNA Sequence?
- Imprinting is a process mammals use to mark a small
set of genes during gametogenesis so that only
the paternal or maternal copy will be
transcribed and the other allele at the same
locus will remain silent.
- As of 2002, more than 20 mammalian genes were
known to be imprinted.
41What Is the Fifth Base in DNA?
- Methyl-Cytosine
- Dnmt1p is an enzyme that adds a methyl (CH3)
group onto hundreds of thousands, but not all,
cytosines throughout the genome.
- The methylome is a significant component of gene
regulation and thus the proteome.
- Direct methylation may block the transcription of
some genes.
42Figure 1.11