Title: Genes
1Genes
- Introduction to Bioinformatics
- BM131/BM511
- Gary J. Schoenhals
- BMB
- University of Southern Denmark
2Definition of a gene
A segment of DNA found on a chromosome that codes for a particular protein; a unit of heredity. Genes were formerly called factors.
Source: HyperDictionary (http://www.hyperdictionary.com)
3Structure of DNA
4Sugar backbone - Deoxyribose
5Purines
Base pairing
Base pairing
6Pyrimidines
Base pairing
Base pairing
in RNA
in DNA
7Base-pairing overview
8Base-pairing of nucleotides
G-C pairs: 3 hydrogen bonds
A-T pairs: 2 hydrogen bonds
Color key: oxygen (red), nitrogen (blue), carbon (gray)
9DNA vs. RNA structure
10Sugar backbone - Ribose
DNA
RNA
11Uracil (RNA) vs. Thymine (DNA)
In RNA: uracil
In DNA: thymine
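12Genetic code
Figure: codon table with rows for the first position (U, C, A, G), columns for the second position (U, C, A, G), and the third position (U, C, A, G) within each cell. Three codons are marked STOP (UAA, UAG, UGA).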
13Genetic code - observations
- Redundancy: most amino acids are encoded by more than one codon
- Mutations in the third position are often silent (result in the same amino acid)
- Mutations in the first or second position, though not usually silent, often result in substitution of an amino acid with similar characteristics
- The genetic code is robust and resistant to lethal changes
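The observation about silent third-position mutations can be checked with a short sketch. It assumes the standard genetic code, written as the usual 64-letter string in UCAG order; the function name `silent_fraction` is hypothetical and chosen only for this illustration.

```python
from itertools import product

# Standard genetic code in UCAG order (first, second, third position).
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def silent_fraction(position):
    """Fraction of single-base changes at one codon position (0, 1 or 2)
    that leave the amino acid unchanged, taken over all 61 sense codons."""
    silent = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # only mutate codons that encode an amino acid
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            silent += CODON_TABLE[mutant] == aa
    return silent / total


for pos, label in enumerate(("first", "second", "third")):
    print(f"{label} position: {silent_fraction(pos):.2f} of changes are silent")
```

Under these assumptions the third position should come out far ahead of the other two, which is the pattern the slide describes.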
14Mutations
15tRNA
Amino acid binding
mRNA/ribosome interaction
16Prokaryotic
Eukaryotic
17Gene expression overview
18Prokaryote vs. Eukaryote
Note: Localization of specific molecules can be extremely important, especially in predictive bioinformatics (e.g. the Gene Ontology (GO) database)
19Eukaryotes
20Eukaryotic Gene Structure
21Gene Regulation in Eukaryotes
- Altering rate of transcription
- Altering RNA processing while still in nucleus
- Altering stability of mRNA molecules (degradation rate): RNA interference
- Altering translation of mRNA by ribosomes
- Riboswitches: some metabolites bind mRNA, affecting translation of the transcript, or cause enhanced transcription termination
22RNA Processing
- Transcription forms pre-mRNA
- Capping with modified guanine at 5' end
- protects against degradation
- Intron removal
- RNA editing in some organisms
- Synthesis of poly-A tail (stretch of adenine residues)
- Export to cytoplasm
23Introns/Exons Overview
24RNA Processing
25Alternative splicing of RNA
Different proteins from the same gene!
26Characteristics
- Chicken collagen: 52 exons
- Human dystrophin: 79 exons
- Average exon is 140 nucleotides long, but introns can be quite large (e.g. 480 kbp!)
- Splicing done with spliceosome
- snRNA (small nuclear RNA) molecules plus approx. 145 proteins
- approx. 12 different snRNA
- Disorders: retinitis pigmentosa, spinal muscular atrophy
- http://www.neuro.wustl.edu/neuromuscular/pathol/spliceosome.htm
27Spliceosome
U1 binds to 5' splice site
U2 binds to branchpoint recognition sequence
U4-U5-U6 complex binds to 5' splice site
U5 binds exon sequences; intron is removed
U1 is displaced; U6 binds to U2; 5' splice site near branchpoint
http://www.neuro.wustl.edu/neuromuscular/pathol/spliceosome.htm
28Spliceosome contd.
- Exonic splicing enhancers (ESEs) and exonic splicing silencers (ESSs) are contained in exons
- these help regulate how splicing is done (e.g. specificity)
- Intronic splicing enhancers (ISEs) and intronic splicing silencers (ISSs) are contained in introns and function in a similar fashion to their exonic counterparts
29Enhancers
30Transcription factors modulate gene expression
Silencers are control regions of DNA that may be far away from the gene; when transcription factors bind to them, gene expression is repressed.
31Insulators
T-cell receptor for antigen gamma/delta encoding
region
Insulator: a stretch of DNA that separates genes from one another, shielding them from the effects of activation or repression of neighboring genes
32Prokaryotes
33Prokaryotic Genes are Arranged in Operons
- Genes arranged in operons
- Polycistronic mRNA
- One promoter but separate ribosome binding sites
- Used in predictive bioinformatics
34Lac Operon Control
cAMP low
cAMP high
cAMP high
cAMP low
CAP: catabolite activator protein (binds cAMP)
cAMP: cyclic adenosine monophosphate
lac repressor: inactivated when bound to lactose (in the cell, its isomer allolactose)
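The control logic of the lac operon can be written as a small truth table. This is a minimal sketch of the usual textbook simplification, not a quantitative model. It assumes that cAMP is high only when glucose is absent, that CAP activates transcription only when bound to cAMP, and that the repressor blocks transcription unless lactose is present. The function name `lac_operon` and the labels "off", "low" and "high" are illustrative.

```python
def lac_operon(glucose_present, lactose_present):
    """Qualitative state of the lac operon under the textbook model."""
    camp_high = not glucose_present        # assumption: cAMP rises when glucose is scarce
    cap_bound = camp_high                  # CAP binds its DNA site only together with cAMP
    repressor_bound = not lactose_present  # lactose inactivates the lac repressor

    if repressor_bound:
        transcription = "off"   # repressor sits on the operator
    elif cap_bound:
        transcription = "high"  # no repressor, CAP-cAMP activates
    else:
        transcription = "low"   # no repressor, but no CAP activation
    return {
        "cAMP": "high" if camp_high else "low",
        "CAP bound": cap_bound,
        "repressor bound": repressor_bound,
        "transcription": transcription,
    }


for glucose in (True, False):
    for lactose in (True, False):
        state = lac_operon(glucose, lactose)
        print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> {state}")
```

Strong expression appears only in the "lactose present, glucose absent" row, which matches the cAMP high/low labels of the figure.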
35CAP is a dimer that binds DNA and is the size of
one turn of the helix
Structural bioinformatics
36Inverted repeats important for CAP binding
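Inverted repeats of the kind a dimer like CAP can bind are easy to search for: one arm is followed, after a short spacer, by its own reverse complement. The sketch below is a naive exact-match scan. The arm length, maximum spacer and example sequence are illustrative assumptions, and `find_inverted_repeats` is a hypothetical helper name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm=5, max_gap=8):
    """Return (start, gap, left_arm, right_arm) for every left arm of length
    `arm` that is followed, after 0..max_gap spacer bases, by its exact
    reverse complement. The two arms are not allowed to overlap."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm + 1):
        left = seq[i:i + arm]
        target = reverse_complement(left)
        for gap in range(max_gap + 1):
            j = i + arm + gap
            if j + arm > len(seq):
                break
            if seq[j:j + arm] == target:
                hits.append((i, gap, left, target))
    return hits


# Toy sequence built for this example: TGTGA, six spacer bases, then TCACA.
example = "AATGTGATCTAGATCACATT"
for hit in find_inverted_repeats(example):
    print(hit)
```

Because the toy sequence is palindromic over most of its length, the scan reports several overlapping hits; a real search would usually merge them or keep only the longest arms.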
37Transcription Factor Domains
- Only a few major types
- Bind DNA
- 3D conformation of binding domain recognizes DNA structure
- Interact with RNA polymerase
- Modulate transcription of genes: understanding how to control a cell's activities is an important part of drug discovery and development
38DNA Recognition Domains
- Helix-turn-helix motif
- Zinc finger motif
- Leucine zipper motif
39Restriction Enzymes
- Restriction enzymes recognize specific DNA sequences
- Bind to DNA
- Introduce a cut: can be used for cloning
- E.g. BamHI (GGATCC) and HincII (GTCGAC)
BamHI:  5'-G^GATCC-3'   overhang generated: sticky end
        3'-CCTAG^G-5'
HincII: 5'-GTC^GAC-3'   no overhang: blunt end
        3'-CAG^CTG-5'
(^ marks the cut position)
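The two cutting patterns can be sketched in a few lines. This is a simplified illustration. It assumes linear DNA, looks only at the top strand, and takes the cut offsets from the slide (BamHI after the first G, HincII in the middle of GTCGAC). `ENZYMES`, `digest` and `end_type` are hypothetical names.

```python
# name: (recognition site, cut offset within the site on the top strand)
ENZYMES = {
    "BamHI": ("GGATCC", 1),   # G^GATCC
    "HincII": ("GTCGAC", 3),  # GTC^GAC
}


def digest(seq, enzyme):
    """Cut a linear top-strand sequence at every recognition site."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def end_type(enzyme):
    """For a palindromic site the bottom-strand cut mirrors the top-strand
    cut, so the overhang length is the distance between the two cuts."""
    site, offset = ENZYMES[enzyme]
    overhang = abs(len(site) - 2 * offset)
    return "blunt" if overhang == 0 else f"sticky ({overhang}-nt overhang)"


print(digest("AAGGATCCTT", "BamHI"), end_type("BamHI"))
print(digest("AAGTCGACTT", "HincII"), end_type("HincII"))
```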
40DNA Methylases
- Enzymes that modify DNA via methylation of bases
- Protects DNA from nucleases
- If methylation occurs at a restriction enzyme site, cutting could be inhibited
- E.g. TaqI methylase methylates TCGA
Methylated site (CH3 on each strand):
5'-GTCGAC-3'
3'-CAGCTG-5'
Enzyme HincII inhibited (GTCGAC)
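A rough way to flag restriction sites that a methylase might protect is to look for overlap between the two recognition sequences. The sketch below makes that simplifying assumption: any overlap counts as "possibly blocked", whereas in reality the effect depends on which base is methylated and on the enzyme. `occurrences` and `possibly_blocked` are hypothetical helper names.

```python
def occurrences(seq, site):
    """Start positions of every (possibly overlapping) occurrence of site."""
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


def possibly_blocked(seq, restriction_site="GTCGAC", methylation_site="TCGA"):
    """Start positions of restriction sites that overlap a methylation site."""
    seq = seq.upper()
    methylated = occurrences(seq, methylation_site)
    blocked = []
    for r in occurrences(seq, restriction_site):
        r_end = r + len(restriction_site)
        # Two intervals overlap when each one starts before the other ends.
        if any(m < r_end and m + len(methylation_site) > r for m in methylated):
            blocked.append(r)
    return blocked


print(possibly_blocked("AAGTCGACAA"))  # GTCGAC contains TCGA
```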
41Bioinformatics Strategies
- Sequence alignment of DNA or proteins
- Used to find homologs
- Orthologs vs. paralogs
- Similarity/identity can imply conserved function
- Depending on the context, it may be better to use protein sequence rather than DNA
- Codon usage
- Gene prediction
- Motif searches
- Consensus sequences
- Secondary structure, e.g. hairpin loops
- Presence of protein domains imparting functionality
- Phylogenetic analysis
42Alignments - Protein vs. DNA
Consider the two following DNA sequences:
ATG CTT CCC TTG CAT TTT AAA   Seq 1
ATG CTG CCG CTC CAC TTC AAG   Seq 2
Translation yields the following protein sequences:
Met Leu Pro Leu His Phe Lys   Seq 1 translated
Met Leu Pro Leu His Phe Lys   Seq 2 translated
Both DNAs encode identical protein sequences, but Seq 1 shares only 14/21 bases with Seq 2 (66.7% identity)
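The numbers on this slide can be reproduced directly. The sketch below uses only the codons that occur in the two sequences, so `CODONS` is a partial table for this example rather than the full genetic code. `percent_identity` and `translate` are hypothetical helper names.

```python
# Partial codon table: only the codons that occur in Seq 1 and Seq 2.
CODONS = {
    "ATG": "Met",
    "CTT": "Leu", "CTG": "Leu", "CTC": "Leu", "TTG": "Leu",
    "CCC": "Pro", "CCG": "Pro",
    "CAT": "His", "CAC": "His",
    "TTT": "Phe", "TTC": "Phe",
    "AAA": "Lys", "AAG": "Lys",
}


def translate(dna):
    """Translate a DNA string codon by codon using the partial table."""
    return [CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3)]


def percent_identity(a, b):
    """Number of matching positions and percent identity of two
    equal-length, already aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, 100 * matches / len(a)


seq1 = "ATGCTTCCCTTGCATTTTAAA"
seq2 = "ATGCTGCCGCTCCACTTCAAG"

dna_matches, dna_identity = percent_identity(seq1, seq2)
protein_matches, protein_identity = percent_identity(translate(seq1), translate(seq2))
print(f"DNA:     {dna_matches}/{len(seq1)} = {dna_identity:.1f}%")
print(f"Protein: {protein_matches}/7 = {protein_identity:.1f}%")
```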
43Codon Usage
- Use of certain codons to encode amino acids is non-random
- Highly expressed genes use a restricted set of codons for optimal translational efficiency
- Can be used to predict highly expressed genes
- Atypical codon usage can indicate horizontal gene transfer
- CodonW software can calculate Codon Adaptation Index (CAI), Codon Bias Index (CBI), etc.
- Some tools here:
- http://bioweb.pasteur.fr/seqanal/dna/intro-uk.html
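Non-random codon usage can be made visible by counting codons and asking what share each one has among the codons for the same amino acid. The sketch below does only that. It is not the CAI or CBI that CodonW computes, and it assumes an in-frame coding sequence containing only A, C, G and T. `codon_counts` and `relative_usage` are hypothetical names.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_counts(cds):
    """Count in-frame codons; trailing bases that do not fill a codon are ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def relative_usage(cds):
    """For each codon seen, its share among all codons seen for the same amino acid."""
    counts = codon_counts(cds)
    per_amino_acid = Counter()
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {codon: n / per_amino_acid[CODON_TABLE[codon]] for codon, n in counts.items()}


# Toy coding sequence: four leucine codons, three of them CTG.
print(relative_usage("CTGCTGCTGCTT"))
```

A gene whose shares differ sharply from the genome-wide pattern would be a candidate for the "atypical codon usage" mentioned above.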
44Gene Prediction
- Prediction of open reading frames (ORFs), which represent the possibly expressed genes
- Can then obtain a list of theoretical proteins encoded by the genome via translation
- Some examples of tools for gene prediction include GlimmerHMM (eukaryotic genes) and Glimmer (prokaryotic genes)
- See The Institute for Genomic Research (TIGR) on the web at http://www.tigr.org/
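The idea of an open reading frame can be illustrated with a naive scan: start at ATG and read in-frame until the first stop codon. This is only a sketch of the concept. Tools such as Glimmer rely on statistical models rather than this rule, and the sketch looks at the forward strand only. `find_orfs` and its `min_codons` threshold are illustrative assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=30):
    """Return (start, end, frame) for each forward-strand ORF that begins
    with ATG, ends at the first in-frame stop codon (included in `end`),
    and has at least `min_codons` codons before the stop."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Toy sequence: ATG AAA TTT GGG TAA in reading frame 2.
toy = "CCATGAAATTTGGGTAACC"
for start, end, frame in find_orfs(toy, min_codons=3):
    print(frame, toy[start:end])
```

Translating each reported ORF would give the list of theoretical proteins mentioned above.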
45Motif Searches
- Searching for patterns with biological significance
- Examples include promoter sequences, enhancers, terminators
- Hidden Markov models (HMMs) are quite often employed in these types of searches
- Software examples: ELPH (motifs), RBSfinder (ribosome binding sites)
- Prosite: search engine for protein families and domains
46E. coli Promoter Consensus Sequences
σ factor   Promoter consensus sequence
           -35 region       -10 region
σ70        TTGACA           TATAAT
σ32        TCTCNCCCTTGAA    CCCCATNTA
σ28        CTAAA            CCGATAT
           -24 region       -12 region
σ54        CTGGNA           TTGCA
The -10 region is also called the Pribnow box, after its discoverer.
N = any base (A, T, C, or G)
E. coli has 7 different sigma factors, including σ38 (not listed above)
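A consensus pair like TTGACA / TATAAT can be searched for with a simple mismatch count. The sketch below is a naive scan, not an HMM. The spacer of 16 to 18 bases between the two boxes and the mismatch threshold are assumptions not given on the slide. `mismatches` and `scan_promoter` are hypothetical names.

```python
def mismatches(window, consensus):
    """Count positions that differ from the consensus; 'N' matches any base."""
    return sum(c != "N" and w != c for w, c in zip(window, consensus))


def scan_promoter(seq, box35="TTGACA", box10="TATAAT",
                  spacers=range(16, 19), max_mismatches=2):
    """Return (start of -35 box, start of -10 box, total mismatches) for every
    placement within the mismatch limit, best matches first."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(box35) + 1):
        m35 = mismatches(seq[i:i + len(box35)], box35)
        for spacer in spacers:
            j = i + len(box35) + spacer
            if j + len(box10) > len(seq):
                continue
            m10 = mismatches(seq[j:j + len(box10)], box10)
            if m35 + m10 <= max_mismatches:
                hits.append((i, j, m35 + m10))
    return sorted(hits, key=lambda hit: hit[2])


# Toy sequence: perfect -35 box, 17-base spacer, perfect -10 box.
toy = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GG"
print(scan_promoter(toy))
```

The same `mismatches` helper works for the consensus sequences that contain N, such as CTGGNA.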
47Transcription Factor Consensus Sequences
48Phylogenetic Analysis
- Use of conserved sequences to aid in classification of organisms
- Must choose sequences encoding molecules that have conserved function across species
- Evolutionary chronometer concept
- The difference between two sequences can be proportional to the evolutionary distance between those organisms
- Prokaryotes: 16S rRNA; eukaryotes: 18S rRNA
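The chronometer idea can be made concrete with the simplest distance measure, the proportion of differing sites between two aligned sequences (the p-distance). The sketch also includes the Jukes-Cantor correction, which is not mentioned on the slide and is added here only as one common way to account for multiple substitutions at the same site. The example sequences are made up.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.
    Columns with a gap ('-') in either sequence are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3); defined for p < 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Made-up aligned fragments standing in for rRNA sequences.
p = p_distance("ACGTACGT", "ACGTACGA")
print(f"p-distance: {p:.3f}, Jukes-Cantor: {jukes_cantor(p):.3f}")
```

A matrix of such pairwise distances is the usual input to distance-based tree building.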
49Nucleotide Databases
- DNA DataBank of Japan (DDBJ)
- Ensembl: joint between EBI and Wellcome Trust Sanger Institute
- European Bioinformatics Institute (EBI)
- European Molecular Biology Laboratory (EMBL)
- Japan Biological Information Research Center (JBIRC)
- National Center for Biotechnology Information (NCBI)
Additional proteomics sites or databases:
- ExPASy
- Swiss-Prot
- Many others!
52Page 78 in text
Page 73 in text (2nd Ed.)
54Page 79 in text
Page 75 in text (2nd Ed.)
55Cross-reference
Protein sequence
Page 81 in text
Page 77 in text (2nd Ed.)
56DNA sequence
57Database problems
- Incomplete annotation
- Missing information such as function, keywords, etc.
- Consequence: a given search will likely not return all relevant database entries
- Redundancy
- Smaller DNA segments often included in larger ones (such as chromosome)
- ESTs (expressed sequence tags)
- A database is only as good as the person(s) maintaining it: garbage in, garbage out!
64Scroll to bottom to see what is on next slide.
66Bit score (BLAST)
Colored bars represent the matching portion of the gene; color indicates functional group
67BLAST results