The Human Genome - PowerPoint PPT Presentation

Slides: 63
Provided by: alfre88
Transcript and Presenter's Notes

Title: The Human Genome


1
The Human Genome: What's in it? How do we know?
Gary Benson
Department of Computer Science
Department of Biology
Program in Bioinformatics
Boston University
2
Outline of Talk
  • Protein Genes
  • SNPs
  • Haplotypes
  • Finding a Disease Locus

3
Size of the Genomes
bacteria
yeast
round worm
fruit fly
flowering plant
4
The Human Genome
5
What the letters stand for
  • DNA has four chemical subunits, called nucleotide
    bases abbreviated A, C, G, T.
  • GATTACA

http://en.wikipedia.org/wiki/Nucleotide
6
What's in the Genome?
  • Chromosomes: 23 pairs
  • Genes
      • Protein genes
      • RNA genes
      • MicroRNA genes
  • Repeats
      • Tandem repeats
      • Inverted repeats
      • Transposons
      • Segmental duplications
  • Regulatory regions
      • Promoters
      • Transcription factor binding sites

7
Protein Genes
  • A protein gene contains the genetic code for a
    protein. The production of protein involves
    transcription (copying DNA to RNA) and
    translation (using RNA code to produce a
    protein).

http://www.slic2.wsu.edu:82/hurlbert/micro101/images/TransTranscrip.gif
8
Transcription
Translation
http://nobelprize.org/medicine/educational/dna/a/translation/polysome_em.html
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/M/Miller_Beatty3.jpg
9
Finding Protein Genes
  • Before the sequencing of genomes, protein genes
    were found experimentally. Now, new genes are
    predicted computationally using a gene model.

14
Building a Gene Model
  • Gene models for prediction are based on the
    structure of genes in DNA and their messenger
    RNAs (mRNAs). This includes exons, introns,
    promoters, and the polyadenylation signal.

http://xray.bmc.uu.se/Courses/Bke2/Exercises/Exercise_answers/pre_mRNA_processing.gif
15
Exons
  • In this example, EXONS are uppercase and introns
    are lowercase. Exons contain the code for a
    protein; introns interrupt the exons. Before
    translation, introns are removed from the
    messenger RNA.
  • DNA
    ACTGCTACAGtctattgaGAACAACATAGtcacgaacttaacgtgcaGTTTAACAGCACGtctcgaagggca
  • RNA (before removal of introns)
    ACUGCUACAGucuauugaGAACAACAUAGucacgaacuuaacgugcaGUUUAACAGCACGucucgaagggca
  • RNA (after removal of introns)
    ACUGCUACAGGAACAACAUAGGUUUAACAGCACG
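
The exon example can be reproduced with a short Python sketch. It follows the slide's simplification that transcription just swaps T for U in the sequence shown, and it relies on the slide's uppercase/lowercase convention to mark exons and introns. The function names are illustrative and do not come from the talk.

```python
# Sketch: transcribe the slide's DNA example and splice out the introns.
# Convention from the slide: exons are uppercase, introns are lowercase.

DNA = ("ACTGCTACAGtctattgaGAACAACATAGtcacgaacttaacgtgcaGT"
       "TTAACAGCACGtctcgaagggca")


def transcribe(dna: str) -> str:
    """Copy DNA to RNA by replacing thymine (T/t) with uracil (U/u)."""
    return dna.replace("T", "U").replace("t", "u")


def remove_introns(pre_mrna: str) -> str:
    """Keep only the uppercase (exon) letters."""
    return "".join(base for base in pre_mrna if base.isupper())


pre_mrna = transcribe(DNA)
mrna = remove_introns(pre_mrna)
print(pre_mrna)  # RNA before removal of introns
print(mrna)      # RNA after removal of introns
```

In a real genome the exon boundaries are not marked by case; locating them is the gene prediction problem the following slides describe.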

16
Finding Exons
  • The sequence of an exon contains codons. Each
    codon is a triplet of nucleotides which codes for
    a single amino acid. Amino acids are the building
    blocks of a protein.

http://en.wikipedia.org/wiki/Genetic_code
17
Genetic Code
  • Each codon specifies one of twenty amino
    acids, except for three stop codons, which
    specify the end of translation.
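
A minimal sketch of the standard genetic code as a lookup table follows. The table is built from the usual compact 64-letter encoding in U, C, A, G order, with `*` marking a stop codon. The RNA string in the usage lines is an illustrative input, not a sequence from the slides, and `translate` is a hypothetical helper name.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order.
# '*' marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Translate an RNA string codon by codon, ignoring a trailing partial codon."""
    rna = rna.upper()
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(stop_codons)
print(translate("GCUACGGAGCUUCGGAGCUAG"))
```

The one-letter output reads A (alanine), T (threonine), E (glutamic acid), L (leucine), R (arginine), S (serine), followed by a stop.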

http://www.emc.maricopa.edu/faculty/farabee/BIOBK/code.gif
18
Open Reading Frame (ORF)
  • An open reading frame (ORF) is a sequence of
    codons that does not contain a stop codon.

alanine, threonine, glutamic acid, leucine, arginine, serine, STOP!
http://en.wikipedia.org/wiki/Genetic_code
19
Finding Exons
  • Sequence
  • acggacucuagccuaaugugacgacugacauagguaaauucgcuc
  • Even though this sequence contains stop codons,
    each stop codon interrupts only the reading
    frame it falls in.
  • frame 1
  • acg gac ucu agc cua aug uga cga cug aca uag gua
    aau ucg cuc
  • frame 2
  • a cgg acu cua gcc uaa ugu gac gac uga cau agg uaa
    auu cgc uc
  • frame 3
  • ac gga cuc uag ccu aau gug acg acu gac aua ggu
    aaa uuc gcu c
  • Very short ORFs are unlikely.
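
The three reading frames on this slide can be checked with a small sketch. Frames 1 to 3 on the slide correspond to offsets 0 to 2 here. The helper names are illustrative, and "open run" simply means a stretch of consecutive codons without a stop, which matches the slide's definition of an ORF.

```python
SEQ = "acggacucuagccuaaugugacgacugacauagguaaauucgcuc"
STOPS = {"uaa", "uag", "uga"}


def codons(rna, offset):
    """Split rna into complete codons starting at the given offset (0, 1 or 2)."""
    return [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]


def stop_positions(codon_list):
    """Indices (in codons) of the stop codons in one frame."""
    return [i for i, c in enumerate(codon_list) if c in STOPS]


def longest_open_run(codon_list):
    """Length, in codons, of the longest stretch with no stop codon."""
    best = current = 0
    for c in codon_list:
        current = 0 if c in STOPS else current + 1
        best = max(best, current)
    return best


for offset in range(3):
    frame = codons(SEQ, offset)
    print(f"frame {offset + 1}: stops at codons {stop_positions(frame)}, "
          f"longest open run {longest_open_run(frame)} codons")
```

For this sequence the longest stop-free stretch is in frame 3 (11 codons, after the early `uag`), which is why a gene finder has to examine every frame.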

20
Finding Introns
  • Introns usually start at a GT boundary and end
    at an AG boundary.
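
As a rough sketch of how this rule can be used, the function below lists every span that starts with GT (GU in RNA) and ends with AG. It is deliberately naive: it reports all candidate pairs, and the `min_len` cutoff is an arbitrary assumption for illustration, since real introns are far longer and gene finders use much richer signals.

```python
def candidate_introns(seq, min_len=4):
    """Return (start, end) pairs, end exclusive, where seq[start:end]
    begins with GU and ends with AG. DNA input is accepted (T is read as U)."""
    rna = seq.upper().replace("T", "U")
    starts = [i for i in range(len(rna) - 1) if rna[i:i + 2] == "GU"]
    ends = [j + 2 for j in range(len(rna) - 1) if rna[j:j + 2] == "AG"]
    return [(s, e) for s in starts for e in ends if e - s >= min_len]


example = "AAGUCCAGAA"
for start, end in candidate_introns(example):
    print(start, end, example[start:end])
```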

21
Finding Exons
  • Sequence
  • acggacucuagccuaaugugacgacugacauagguaaauucgcuc
  • A gene can contain open reading frames connected
    across stop codons by an intron
  • frame 1
  • acg gac ucu agc cua aug uga cga cug aca uag gua
    aau ucg cuc
  • frame 3
  • ac gga cuc uag ccu aau gug acg acu gac aua ggu
    aaa uuc gcu c

22
How many genes are there?
  • Estimates
      • Pre-2000: 100,000, based on estimates of the
        number of genes required to account for
        human complexity
      • 2001: 30,000 to 40,000, based on the first
        draft of the human genome
      • 2003: 23,000 to 24,500, based on gene
        prediction computer programs
  • Why so low?
      • alternative splicing of exons
      • complex regulatory mechanisms
      • inability to predict genes which are unlike
        those seen before

http://www.ornl.gov/sci/techresources/Human_Genome/faq/genenumber.shtml
23
RNA Genes
  • RNA genes do not code for proteins. Instead, the
    RNA molecule itself is functional in the cell.
  • Examples include
      • Ribosomal RNA: these molecules form the major
        component of the protein-building machinery
      • Transfer RNA: works with ribosomal RNA to
        insert the correct amino acids into growing
        proteins
      • MicroRNA: a newly discovered class of RNA
        which helps regulate gene expression

24
Ribosome
http://www.ncbi.nlm.nih.gov/Class/NAWBIS/Modules/RNA/images/fig_rna12.jpg
25
Transcription
Translation
http://nobelprize.org/medicine/educational/dna/a/translation/polysome_em.html
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/M/Miller_Beatty3.jpg
26
RNA Genes
  • MicroRNAs are short and show little or no
    conservation of sequence.
  • Unlike protein genes, RNA genes do not contain
    codons or open reading frames, but they do
    contain inverted repeats.

27
Inverted Repeats (IRs)
  • RNA
  • G A C U U G A
    U C A A G U C

Two patterns, one the reverse complement (reversed, then complemented) of the other
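
The relationship between the two patterns can be checked directly. This is a small sketch using the RNA alphabet; `reverse_complement` and `is_inverted_repeat` are illustrative names.

```python
# Watson-Crick pairing for RNA: A pairs with U, C pairs with G.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Complement every base, then reverse the string."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def is_inverted_repeat(left: str, right: str) -> bool:
    """True when right is exactly the reverse complement of left."""
    return reverse_complement(left) == right


print(reverse_complement("GACUUGA"))
print(is_inverted_repeat("GACUUGA", "UCAAGUC"))
```

Because each pattern is the reverse complement of the other, the two can base-pair when the RNA strand folds back on itself.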
28
IR Nomenclature
RNA: G A C U U G A   U C A A G U C
Figure labels: Left arm, Spacer, Right arm
29
Stem-Loop Structure
Structure forms by pairing of complementary bases
30
MicroRNA
  • MicroRNAs come from a precursor that contains a
    stem-loop.

http://www.ma.uni-heidelberg.de/apps/zmf/argonaute/interface/mirna.jpeg
31
Detection of Approximate Inverted Repeats
  • Human Chr. 3 173,291,101
  • AAGACTTGAA CAACTTTTAA ACATAAGATC AATTATTTCA
    AGTAGATTCC CTTTTTCATT CACAATCACA TTCTCACAGA
    CACAGTCCCA GTTTCTACCT GACTGAGATG CAGTAAGGAA
    TCTGATTATA ACACTCATTG ATTATAACAC TCATTGAATT
    TATGGATTCC TTACTGCATC TCATTCAGGT AGAAAAAGGG
    ACTGTGTCTG TGAGAATGTG ATTGTGAATG AAAAAGATGG
    AATATGTGTA TTTTTGAGTG TCTATGGAAG AGCTTCTGAC
    AAGAGAGAGG AAGATTAGGT AAAATGAAAT ATCGCCGTCG
    GCATTTCCCC CTACGT

32
Detection of Approximate Inverted Repeats
  • Human Chr. 3 173,291,101
  • AAGACTTGAA CAACTTTTAA ACATAAGATC AATTATTTCA
    AGTAGATTCC CTTTTTCATT CACAATCACA TTCTCACAGA
    CACAGTCCCA GTTTCTACCT GACTGAGATG CAGTAAGGAA
    TCTGATTATA ACACTCATTG ATTATAACAC TCATTGAATT
    TATGGATTCC TTACTGCATC TCATTCAGGT AGAAAAAGGG
    ACTGTGTCTG TGAGAATGTG ATTGTGAATG AAAAAGATGG
    AATATGTGTA TTTTTGAGTG TCTATGGAAG AGCTTCTGAC
    AAGAGAGAGG AAGATTAGGT AAAATGAAAT ATCGCCGTCG
    GCATTTCCCC CTACGT

Arms are 72 nt long; spacer is 42 bp long
33
The Problem: Find the Inverted Repeat
  • Human Chr. 3 173,291,101
  • AAGACTTGAA CAACTTTTAA ACATAAGATC AATTATTTCA
    AGTAGATTCC CTTTTTCATT CACAATCACA TTCTCACAGA
    CACAGTCCCA GTTTCTACCT GACTGAGATG CAGTAAGGAA
    TCTGATTATA ACACTCATTG ATTATAACAC TCATTGAATT
    TATGGATTCC TTACTGCATC TCATTCAGGT AGAAAAAGGG
    ACTGTGTCTG TGAGAATGTG ATTGTGAATG AAAAAGATGG
    AATATGTGTA TTTTTGAGTG TCTATGGAAG AGCTTCTGAC
    AAGAGAGAGG AAGATTAGGT AAAATGAAAT ATCGCCGTCG
    GCATTTCCCC CTACGT
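
One naive way to start on this problem is to look for short exact "seeds": a k-mer whose reverse complement occurs further downstream. The sketch below does only that. It is not the approximate detection method the talk refers to, which tolerates mismatches and gaps in the arms, and the function name and default `k` are assumptions made for illustration.

```python
from collections import defaultdict

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(DNA_COMPLEMENT)[::-1]


def inverted_repeat_seeds(dna: str, k: int = 8):
    """Return (i, j) pairs where dna[j:j+k] is the reverse complement of
    dna[i:i+k] and the second copy starts at or after the end of the first."""
    dna = dna.upper().replace(" ", "")
    index = defaultdict(list)          # k-mer -> list of start positions
    for pos in range(len(dna) - k + 1):
        index[dna[pos:pos + k]].append(pos)
    hits = []
    for i in range(len(dna) - k + 1):
        target = reverse_complement(dna[i:i + k])
        for j in index.get(target, []):
            if j >= i + k:             # non-overlapping, downstream copy
                hits.append((i, j))
    return hits


# Toy input: left arm, 4-base spacer, right arm.
toy = "GACTTGA" + "CCCC" + "TCAAGTC"
print(inverted_repeat_seeds(toy, k=7))
```

A fuller detector would extend each seed outward while allowing a limited number of mismatches and indels, and then report the arms and the spacer between them.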

34
Single Nucleotide Polymorphisms (SNPs)
  • A SNP is a single position in the genome (a
    locus) that is not the same in all people. Some
    people have one type of nucleotide and other
    people have a different nucleotide. Differences
    in the population at a single locus are called
    polymorphisms and the individual types are called
    alleles.
  • SNPs are found experimentally

a c g t t a t t
a c a t t c c t
SNPs
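
Given two aligned sequences like the pair shown on this slide, the differing positions can be listed with a few lines of Python. This is only an illustration of what "not the same in all people" means at the sequence level; as the slide notes, real SNPs are found experimentally across many individuals. Positions are 0-based.

```python
def differing_positions(seq_a: str, seq_b: str):
    """Return (position, allele_a, allele_b) for every mismatch
    between two aligned sequences of equal length."""
    a = seq_a.replace(" ", "")
    b = seq_b.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(differing_positions("a c g t t a t t", "a c a t t c c t"))
```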
35
Haplotypes
  • A haplotype is a collection of SNP alleles on a
    single chromosome in an individual.
  • Shown are SNPs on two chromosomes in each
    individual.

a c g t t c a t
a c a t t c a t
t c g t t c a t
a c a g a t a t
a c a t t c c t
a t a g t c c a
a c a g t c c a
a c a t t c c t
t c a t t c a t
a c a t t c a a
36
Haplotypes
  • A haplotype is a collection of SNP alleles on a
    single chromosome in an individual.
  • Homozygous (same alleles)

a c g t t c a t
a c a t t c a t
t c g t t c a t
a c a g a t a t
a c a t t c c t
a t a g t c c a
a c a g t c c a
a c a t t c c t
t c a t t c a t
a c a t t c a a
37
Haplotypes
  • A haplotype is a collection of SNP alleles on a
    single chromosome in an individual.
  • Heterozygous (different alleles)

a c g t t c a t
a c a t t c a t
t c g t t c a t
a c a g a t a t
a c a t t c c t
a t a g t c c a
a c a g t c c a
a c a t t c c t
t c a t t c a t
a c a t t c a a
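
The homozygous/heterozygous distinction can be sketched as a comparison of an individual's two haplotypes. The example assumes that consecutive rows on the slide belong to the same individual; the transcript does not preserve that grouping, so treat the pairing as an assumption. Positions are 0-based.

```python
def zygosity(hap1: str, hap2: str):
    """Label each SNP position 'hom' (same allele on both chromosomes)
    or 'het' (different alleles)."""
    return ["hom" if a == b else "het" for a, b in zip(hap1, hap2)]


def heterozygous_positions(hap1: str, hap2: str):
    return [i for i, call in enumerate(zygosity(hap1, hap2)) if call == "het"]


# First two pairs of rows from the slide, assumed to be two individuals.
individual_1 = ("acgttcat", "acattcat")
individual_2 = ("tcgttcat", "acagatat")

print(heterozygous_positions(*individual_1))
print(heterozygous_positions(*individual_2))
```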
38
Haplotypes
  • A haplotype is a collection of SNP alleles on a
    single chromosome in an individual.
  • Rare alleles

a c g t t c a t
a c a t t c a t
t c g t t c a t
a c a g a t a t
a c a t t c c t
a t a g t c c a
a c a g a c c a
a c a t t c c t
t c a t t c a t
a c a t t c a a
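
To make "rare allele" concrete, the sketch below computes the minor allele frequency at each SNP position across the ten chromosomes listed on this slide. The 0.1 cutoff for calling an allele rare is an assumption chosen for this tiny sample, not a value from the talk.

```python
from collections import Counter

# The ten chromosomes as listed on this slide, one string per row.
HAPLOTYPES = [
    "acgttcat", "acattcat", "tcgttcat", "acagatat", "acattcct",
    "atagtcca", "acagacca", "acattcct", "tcattcat", "acattcaa",
]
RARE_THRESHOLD = 0.1  # assumed cutoff, for illustration only


def minor_allele_frequencies(haps):
    """Per position: fraction of chromosomes not carrying the most common allele."""
    n = len(haps)
    freqs = []
    for column in zip(*haps):
        most_common_count = Counter(column).most_common(1)[0][1]
        freqs.append((n - most_common_count) / n)
    return freqs


freqs = minor_allele_frequencies(HAPLOTYPES)
rare_positions = [i for i, f in enumerate(freqs) if f <= RARE_THRESHOLD]
print(freqs)
print(rare_positions)
```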
39
Haplotypes
  • A haplotype is a collection of SNP alleles on a
    single chromosome in an individual.
  • Strong linkage (usually occur together)

a c g t t c a t
a c a t t c a t
t c g t t c a t
a c a g a t a t
a c a t t c c t
a t a g t c c a
a c a g t c c a
a c a t t c c t
t c a t t c a t
a c a t t c a a
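
How often two alleles "occur together" can be quantified. The slide does not name a statistic; the sketch below uses r², one common measure of linkage disequilibrium, as an assumption. It is computed from the ten chromosomes listed on this slide for a chosen pair of 0-based positions, treating each position as having two alleles.

```python
HAPLOTYPES = [
    "acgttcat", "acattcat", "tcgttcat", "acagatat", "acattcct",
    "atagtcca", "acagtcca", "acattcct", "tcattcat", "acattcaa",
]


def r_squared(haps, i, j):
    """r^2 between positions i and j, using the first row's alleles as reference."""
    n = len(haps)
    ref_i, ref_j = haps[0][i], haps[0][j]
    p_i = sum(h[i] == ref_i for h in haps) / n
    p_j = sum(h[j] == ref_j for h in haps) / n
    p_ij = sum(h[i] == ref_i and h[j] == ref_j for h in haps) / n
    d = p_ij - p_i * p_j                     # deviation from independence
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    return d * d / denom if denom else 0.0   # 0.0 when a position does not vary


print(round(r_squared(HAPLOTYPES, 3, 7), 3))
```

Values near 1 mean the alleles at the two positions almost always travel together; values near 0 mean they are close to independent.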
40
Linkage Analysis
  • SNPs and haplotypes are used to identify regions
    of the genome that cause disease. The technique
    is called linkage analysis and evidence of a
    connection is called linkage disequilibrium (LD).

recombination and inheritance
a c a t t c a t
a t a g t c c a
a c a g a t a t
t c a t t c a t
mom
dad
a c a g t c c a
a c a g a c a t
child
41
Linkage Analysis
  • SNPs and haplotypes are used to identify regions
    of the genome that cause disease. The technique
    is called linkage analysis and evidence of a
    connection is called linkage disequilibrium (LD).

a c a t t c a t
a t a g t c c a
a c a g a t a t
t c a t t c a t
mom
dad
a c a g t c c a
a c a g a c a t
recombination in the mother's chromosomes
child
42
Linkage Analysis
  • SNPs and haplotypes are used to identify regions
    of the genome that cause disease. The technique
    is called linkage analysis and evidence of a
    connection is called linkage disequilibrium (LD).

a c a t t c a t
a t a g t c c a
a c a g a t a t
t c a t t c a t
mom
dad
a c a g t c c a
a c a g a c a t
recombination in the father's chromosomes
child
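
The child's two chromosomes in this example can each be reproduced by a single crossover between one parent's two chromosomes. The sketch assumes the first two rows are the mother's chromosomes and the next two are the father's, which is how the figure labels appear to line up; the crossover points were found by inspection.

```python
def recombine(first: str, second: str, crossover: int) -> str:
    """Take positions [0, crossover) from first and the rest from second."""
    return first[:crossover] + second[crossover:]


MOM = ("acattcat", "atagtcca")   # assumed assignment of rows to parents
DAD = ("acagatat", "tcattcat")

from_mom = recombine(MOM[0], MOM[1], 2)   # crossover after the 2nd SNP
from_dad = recombine(DAD[0], DAD[1], 5)   # crossover after the 5th SNP
print(from_mom)
print(from_dad)
```

The two results match the child's rows on the slide, `a c a g t c c a` and `a c a g a c a t`.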
43
Linkage Analysis
  • SNPs and haplotypes are used to identify regions
    of the genome that cause disease. The technique
    is called linkage analysis and evidence of a
    connection is called linkage disequilibrium (LD).

a c a t t c a t
a t a g t c c a
a c a g a t a t
t c a t t c a t
mom
dad
a c a g t c c a
a c a g a c a t
two to three crossovers per chromosome per
generation
child
44
Linkage Analysis
  • Key point: Alleles that are physically close
    together tend to be inherited together because
    the chance of a crossover between them is small.
    They exhibit strong linkage.

a c a t t c a t
a t a g t c c a
a c a g a t a t
t c a t t c a t
mom
dad
a c a g t c c a
a c a g a c a t
child
45
Finding an Unknown Disease Locus
  • The location on the genome of the loci behind
    many diseases is unknown. SNPs and haplotypes
    are being used to search for disease loci using
    linkage analysis.

a c a t t c a t
a t a g t c c a
a c a g a t a t
t c a t t c a t
mom
dad has disease
a c a g t c c a
a c a g a c a t
child has disease
46
Linkage Analysis: Dominant Model
  • Assume the disease is caused by a dominant
    allele, meaning one copy is enough to cause the
    disease.

a c a t t c a t
a t a g t c c a
a c a g a t a t
t c a t t c a t
mom
dad has disease
a c a g t c c a
a c a g a c a t
SNP alleles in father that are not in mother
child has disease
47
Linkage Analysis: Dominant Model
  • Assume the disease is caused by a dominant
    allele, meaning one copy is enough to cause the
    disease.

a c a t t c a t
a t a g t c c a
a c a g a t a t
t c a t t c a t
mom
dad has disease
a c a g t c c a
a c a g a c a t
SNP allele in child, inherited from father with
disease
child has disease
48
Linkage Analysis: Dominant Model
  • Assume the disease is caused by a dominant
    allele, meaning one copy is enough to cause the
    disease.

a c a t t c a t
a t a g t c c a
a c a g a t a t
t c a t t c a t
mom
dad has disease
a c a g t c c a
a c a g a c a t
SNP allele and disease are linked, indicating a
possible disease locus.
child has disease
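
The three steps of the dominant-model argument can be sketched as a filter over SNP positions: keep alleles the affected father carries, the unaffected mother lacks, and the affected child inherited. The assignment of rows to mom, dad and child follows the figure labels and is an assumption, as are the function names. Positions are 0-based.

```python
MOM = ("acattcat", "atagtcca")     # unaffected
DAD = ("acagatat", "tcattcat")     # has disease
CHILD = ("acagtcca", "acagacat")   # has disease


def alleles(person, i):
    """Set of alleles a person carries at position i."""
    return {person[0][i], person[1][i]}


def dominant_candidates(mom, dad, child):
    """(position, allele) pairs present in dad, absent in mom, present in child."""
    hits = []
    for i in range(len(mom[0])):
        for allele in sorted(alleles(dad, i) - alleles(mom, i)):
            if allele in alleles(child, i):
                hits.append((i, allele))
    return hits


print(dominant_candidates(MOM, DAD, CHILD))
```

A single family only suggests a candidate; real linkage studies combine evidence across many families.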
49
Linkage Analysis: Recessive Model
  • Assume the disease is caused by a recessive
    allele, meaning two copies are required to cause
    the disease.

a c a t t c a t
a t a g t c c a
a c a g a t a t
t c a t t c a t
mom
dad has disease
a c a g t c c a
a c a g a c a t
homozygous SNP alleles in father that are
heterozygous in mother
child has disease
50
Linkage Analysis: Recessive Model
  • Assume the disease is caused by a recessive
    allele, meaning two copies are required to cause
    the disease.

a c a t t c a t
a t a g t c c a
a c a g a t a t
t c a t t c a t
mom
dad has disease
a c a g t c c a
a c a g a c a t
homozygous SNP allele in child, identical to
father's
child has disease
51
Linkage Analysis: Recessive Model
  • Assume the disease is caused by a recessive
    allele, meaning two copies are required to cause
    the disease.

a c a t t c a t
a t a g t c c a
a c a g a t a t
t c a t t c a t
mom
dad has disease
a c a g t c c a
a c a g a c a t
SNP allele and disease are linked, indicating a
possible disease locus.
child has disease
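
The recessive-model steps can be sketched the same way: keep positions where the affected father is homozygous, the unaffected mother is heterozygous, and the affected child is homozygous for the father's allele. The assignment of rows to family members is again an assumption based on the figure labels. Positions are 0-based.

```python
MOM = ("acattcat", "atagtcca")     # unaffected
DAD = ("acagatat", "tcattcat")     # has disease
CHILD = ("acagtcca", "acagacat")   # has disease


def alleles(person, i):
    """Set of alleles a person carries at position i."""
    return {person[0][i], person[1][i]}


def recessive_candidates(mom, dad, child):
    """(position, allele) pairs where dad is homozygous, mom is heterozygous,
    and the child is homozygous for dad's allele."""
    hits = []
    for i in range(len(mom[0])):
        dad_alleles = alleles(dad, i)
        if (len(dad_alleles) == 1
                and len(alleles(mom, i)) == 2
                and alleles(child, i) == dad_alleles):
            hits.append((i, next(iter(dad_alleles))))
    return hits


print(recessive_candidates(MOM, DAD, CHILD))
```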
53
BMI = weight/height² in kg/m²; BMI > 25: overweight; BMI > 30: obese
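
The formula and thresholds translate directly into code. The label for values of 25 or below is not given on the slide, so "not overweight" is a placeholder.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight divided by height squared, in kg/m^2."""
    return weight_kg / height_m ** 2


def classify(value: float) -> str:
    if value > 30:
        return "obese"
    if value > 25:
        return "overweight"
    return "not overweight"  # placeholder label, not from the slide


for weight in (70, 80, 95):
    value = bmi(weight, 1.75)
    print(weight, round(value, 1), classify(value))
```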
54
Other Differences: Microdeletions
  • A microdeletion is the loss of a small piece of
    DNA, perhaps as small as 1000 bases. These
    pieces can contain genes, parts of genes or
    regulatory regions.

a t g t t t
a c a c t c c t
a c a t t c c t
g c g c a t
microdeletions
55
Other Differences: Microdeletions
  • A microdeletion is the loss of a small piece of
    DNA, perhaps as small as 1000 bases. These
    pieces can contain genes, parts of genes or
    regulatory regions.

heterozygous
a t g t t t
a c a c t c c t
a c a t t c c t
g c g c a t
56
Other Differences: Microdeletions
  • A microdeletion is the loss of a small piece of
    DNA, perhaps as small as 1000 bases. These
    pieces can contain genes, parts of genes or
    regulatory regions.

homozygous
a t g t t t
a c a c t c c t
a c a t t c c t
g c g c a t
57
Other Differences: Microdeletions
  • A microdeletion is the loss of a small piece of
    DNA, perhaps as small as 1000 bases. These
    pieces can contain genes, parts of genes or
    regulatory regions.

miscalled homozygous
a t g t t t
a c a c t c c t
a c a t t c c t
g c g c a t
58
Apparent Inheritance Inconsistency
  • SNPs and haplotypes are used to identify regions
    of the genome that cause disease. The technique
    is called linkage analysis and evidence of a
    connection is called linkage disequilibrium (LD).

a t a a g a a c
c c c a c
a c a t c c a c
c c c t c c a c
mom
dad
c c c a c
a c a t c c a c
child
59
Apparent Inheritance Inconsistency
  • SNPs and haplotypes are used to identify regions
    of the genome that cause disease. The technique
    is called linkage analysis and evidence of a
    connection is called linkage disequilibrium (LD).

a t a a g a a c
c c c a c
a c a t c c a c
c c c t c c a c
mom
dad
c c c a c
a c a t c c a c
a a t t ? a t by Mendelian inheritance
child
60
Apparent Inheritance Inconsistency
  • SNPs and haplotypes are used to identify regions
    of the genome that cause disease. The technique
    is called linkage analysis and evidence of a
    connection is called linkage disequilibrium (LD).

a t a a g a a c
c c c a c
a c a t c c a c
c c c t c c a c
mom
dad
c c c a c
a c a t c c a c
cluster of inconsistencies suggests a
microdeletion.
child
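
The reasoning on these slides can be sketched as a Mendelian consistency check on called genotypes. When one chromosome carries a deletion, only the other allele is observed, so the genotype is miscalled as homozygous. The transcript does not preserve where the deleted bases sit, so the placement of `-` at positions 2 to 4 (0-based) is an assumption, as are the helper names.

```python
from itertools import product

# '-' marks a deleted base; its placement is assumed for illustration.
MOM = ("ataagaac", "cc---cac")
DAD = ("acatccac", "ccctccac")
CHILD = ("cc---cac", "acatccac")


def called_genotype(person, i):
    """Genotype as a genotyping assay would call it at position i."""
    present = [hap[i] for hap in person if hap[i] != "-"]
    if len(present) == 1:
        present = present * 2   # one copy deleted: miscalled homozygous
    return tuple(sorted(present))


def mendelian_consistent(child, mom, dad, i):
    """True if the child's called genotype can be formed from one of
    mom's called alleles plus one of dad's."""
    child_call = called_genotype(child, i)
    return any(tuple(sorted(pair)) == child_call
               for pair in product(called_genotype(mom, i),
                                   called_genotype(dad, i)))


inconsistent = [i for i in range(len(MOM[0]))
                if not mendelian_consistent(CHILD, MOM, DAD, i)]
print(inconsistent)
```

Under this placement, adjacent positions inside the deleted region show up as inconsistent, which is the kind of cluster the slide points to. A position inside the deletion can still look consistent when the parents happen to share an allele there.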
61
Microdeletions
  • Hundreds of microdeletion haplotypes have been
    discovered recently. They may be a major
    contributor to human differences and disease.

62
Resources
  • UCSC Human Genome Browser
  • http://genome.ucsc.edu/cgi-bin/hgGateway
  • National Center for Biotechnology Information
    (NCBI)
  • http://www.ncbi.nlm.nih.gov/
  • PubMed
  • http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed