Title: Molecular Exercise Physiology Bioinformatics Presentation 5 Henning Wackerhage
1Molecular Exercise Physiology: Bioinformatics
Presentation 5, Henning Wackerhage
2Learning outcomes
At the end of this presentation, you should be able to:
- Find information on any DNA sequence, gene, RNA or protein in various species online. This information includes the position of genes in the genome, the function of the proteins and their role in disease.
- Carry out BLAST searches to identify homologous sequences.
- Explain the cause of genetic variability.
- Explain how a microarray experiment is carried out.
This presentation will be supported by a computer
practical in bioinformatics. Please revise this
presentation carefully before the practical or
otherwise you will struggle.
3Bioinformatics Part 1: Why study bioinformatics?
4Introduction
The human and many other genomes have now been
sequenced and this data has been deposited
online. In addition, there is a wealth of
information on genes and their products on
networked computers. Numerous programmes that
allow you to analyse this data also exist.
Most of this data is freely accessible online via
user-friendly computer programmes. It is easy to
download the DNA sequence for any gene that might
respond to exercise or to find reliable
information on a protein that is involved in the
response to exercise. In this presentation, you
will learn how to find, use and analyse this
information. You will mainly learn by doing, and
you will sometimes need to be stubborn and click
numerous buttons by trial and error
to finally get the information you want. It is
not rocket science but it will require stamina
and patience at times!
5What is bioinformatics?
NIH bioinformatics definition: "Research,
development, or application of computational
tools and approaches for expanding the use of
biological, medical, behavioral or health data,
including those to acquire, store, organize,
archive, analyse, or visualise such data."
6Why study bioinformatics?
Why should a sports biomedicist study
bioinformatics? DNA encodes all the
information necessary for cells to develop
into a functioning organism. DNA thus also
encodes all the organs involved in exercise and
their adaptive response to exercise. In addition,
the differences in the DNA between two
individuals encode the differences in the
structure and function of the two organisms. This
includes differences in
muscle size, adaptation to training or the
motor regions of the nervous system.
Therefore, bioinformatics will help us, among other things, to:
- Identify differences in the DNA sequence (e.g. single nucleotide polymorphisms) between individuals that correlate with athletic talent or the extent of adaptation to exercise.
- Discover the regulatory mechanisms that mediate the adaptation to exercise.
- Interpret the results of microarray experiments where the expression of thousands of genes is measured in response to exercise.
7Bioinformatics Part 2: Genome viewing
8Genomes online
The genomes for many prokaryote, eukaryote,
plant, invertebrate and vertebrate model species
have now been sequenced. The DNA sequences of
these genomes have been posted online. However,
these websites contain much more than just the
naked DNA sequence, which has limited use. With
the help of special computer algorithms, genes
(exons, introns) have been identified
using available research information and by de
novo prediction. Identified genes have been
linked to various other sites including those
that list information on the same gene in other
species, the gene product (protein databases),
PubMed, disease databases etc. Genome browsers
are therefore powerful tools not only for the
specialist but also for the essay-writing
student. The following website shows an
incomplete tree of sequenced genomes, and the
slide thereafter shows the information available on
genome browsers.
9Genomes online (incomplete)
http://www.ncbi.nlm.nih.gov/mapview
10Genomes online
Online Mendelian Inheritance in Man (OMIM)
PubMed Reference search
Full-text electronic journals
Nucleotide sequences
3D Structures
Protein sequences MQKLQLCVY
Taxonomy
Maps Genomes
11Genomes online
By far the largest project was the Human Genome
Project, the sequencing of our own DNA.
Some findings are surprising:
- Human genome size: about 3,200 Mb (megabases).
- Gene numbers: human 31,000, yeast 6,000, fly 13,000, worm 18,000, plant 26,000.
- Only 1.1 to 1.4% of the human sequence encodes protein. The rest is non-coding.
- 28% of the sequence is transcribed into RNA (5% of this is translated into proteins).
- Only 94 of 1,278 protein families are specific to vertebrates.
- Why do we differ? Humans differ from one another by about one base pair per thousand; these differences are single nucleotide polymorphisms (SNPs).
12Human genome project
Landmarks:
- 1953: Watson-Crick structure of DNA published
- 1975: F. Sanger, and independently A. Maxam and W. Gilbert, develop methods for sequencing DNA
- 1981: Human mitochondrial DNA sequenced (16,569 base pairs)
- 1990: International Human Genome Project launched; target horizon 15 years
- 1991: J.C. Venter and colleagues identify active genes via expressed sequence tags (ESTs)
- 2000: Joint announcement of complete draft sequence of human genome
- 2003: Completion of human genome
13Major genome browsers
You can browse genome data using one of the
following browsers. We will mainly use Ensembl,
the European and user-friendly version:
www.ensembl.org
www.ncbi.nlm.nih.gov
http://genome.cse.ucsc.edu/
Task: Visit each of these websites, click as many
buttons as you like and see what information you can
obtain.
14Searching for gene information
Browsing the genome browsers and clicking on
chromosomes is pretty simple. However, most of the
time you will search for a specific gene whose
genomic location you do not know. In these
cases, you will have to use a search engine and
type in the name of the gene or protein. To do
so, open the Ensembl website (www.ensembl.org)
and click the species, normally human. At the top
of the page it states "Search for anything with",
followed by a box where you type in your
search term. Click "Lookup" and you will obtain
results.
Worked example: Type in "malate
dehydrogenase" and click "Lookup". Many items
will be listed, starting with 9 matches in the
Homo sapiens disease index. However, you are
interested in the gene. Therefore scroll down
until you see 170 matches in the Homo sapiens
gene index. The first entry under this heading
is "Malate dehydrogenase, cytoplasmic (EC
1.1.1.37)". Other isoforms of this enzyme are
listed as well, and you may need more
information to decide which isoform you are
interested in.
15Searching for gene information
On the Ensembl human genome website, enter
"troponin" into the search box. Find the
following gene among the search results: "Troponin
C, skeletal muscle". Click this and a page with
numerous clickable links will appear.
Task 1: Click "Export gene data in EMBL, GenBank or
FASTA". Scroll down, select the output format "text"
and export. The DNA sequence of the gene will
appear. You can now analyse the sequence or
design primers for the polymerase chain reaction
(PCR).
Task 2: Return to the "Troponin C, skeletal
muscle" page. Now click "MIM" (or "OMIM"). It
stands for (Online) Mendelian Inheritance in
Man. Read the paragraph. It will inform you about
research on the gene. The text on troponin is
very short compared to other texts, e.g. those on major
disease genes.
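Once a sequence has been exported in FASTA format, simple analyses need only a few lines of code. The following is a minimal sketch in Python and is not part of the Ensembl website. The record shown is a made-up placeholder, not the real troponin C sequence, and GC content is chosen only as an example of an analysis.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def gc_content(sequence):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Hypothetical example record; NOT the real troponin C sequence.
example = """>example_gene hypothetical record
ATGGCGTTA
GGCCTA
"""

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq), round(gc_content(seq), 2))
```

To use it on a real export, read the downloaded text file into a string and pass it to `parse_fasta`.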
16Searching for gene information
Task 3: Click "LocusLink". A bar with several windows will appear. Click on each window and find the following information:
- Who has carried out a structural analysis of the human troponin C gene?
- There is an ion binding motif on the molecule. For what ion?
- Name a gene that is a neighbour on the chromosome.
- What is the percent homology (similarity of the DNA sequence) between the human and rat troponin C genes?
17Bioinformatics Part 3: Genetic variability
18Genetic variation
By now, you may have asked yourself the following
question: How can they list one human genome
sequence if we are all different? Surely, our
genomes will be different? Good question, and
yes, we are different. We differ because of
nature and nurture, and the nature part is due to
differences in the DNA between human beings.
Most of these differences in the DNA sequence do
not occur at random but at fixed positions,
approximately every 1,300 base pairs (bp). They are
called single nucleotide polymorphisms (SNPs,
pronounced "snips"). There are roughly 2,500,000
SNPs in the human genome.
Variation in the human species is mainly the result
of SNPs.
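The figure of roughly 2,500,000 SNPs can be checked against the spacing of about one SNP per 1,300 bp. The short Python calculation below is only a consistency check; it assumes the genome size of about 3,200 Mb quoted earlier in this presentation.

```python
# Assumed inputs, both taken from this presentation.
GENOME_SIZE_BP = 3_200_000_000  # about 3,200 Mb
SNP_SPACING_BP = 1_300          # roughly one SNP every 1,300 bp

expected_snps = GENOME_SIZE_BP / SNP_SPACING_BP
print(f"Expected number of SNPs: {expected_snps:,.0f}")
```

The result is a little under 2.5 million, which agrees with the rounded number given above.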
19Genetic variation
Worked example: I have used Ensembl and have
picked the following SNP. During sequencing (each
sequence is sequenced several times), the
investigators noted that one base pair
was sometimes read as an adenine (A) and
sometimes as a thymine (T). The ambiguity
code W was used to indicate this in the final
sequence.
Alleles: A/T (ambiguity code W)
Sequence region: CACAACTGCTTGGAWAAAACAGGATAG
SNPs are not the only source of genetic
variation. Here is an example of a deletion
mutation, in which some bases are missing:
Deletion:  TCAAGGTATTCTTCA------AAAAGGTCCCAACCC
Insertion: TCAAGGTATTCTTCAGATTCTAAAAGGTCCCAACCC
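Ambiguity codes such as W follow the standard IUPAC nucleotide nomenclature. As an illustration, the Python sketch below expands a sequence containing ambiguity codes into the individual alleles it represents. The function name `expand_ambiguity` is made up for this example.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes (standard nomenclature).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_ambiguity(sequence):
    """Return every unambiguous sequence encoded by an IUPAC string."""
    choices = [IUPAC[base] for base in sequence.upper()]
    return ["".join(combo) for combo in product(*choices)]


region = "CACAACTGCTTGGAWAAAACAGGATAG"  # sequence region from the worked example
for allele in expand_ambiguity(region):
    print(allele)
```

For a sequence with a single W this prints two alleles, one with A and one with T at the SNP position.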
20Genetic variation
Do all SNPs lead to a change in phenotype? No!
Remember that less than 2% of human DNA encodes
proteins and that a lot of DNA is non-coding or
intergenic DNA. A SNP or deletion in a
DNA-sequence with no function will probably not
have a noticeable effect. Which of the following
SNPs (1-5) are likely to cause a change in the
expression or structure of the protein encoded by
the gene?
[Figure: schematic of a gene on the DNA showing enhancer, promoter, start, two exons separated by an intron, and termination, with the positions of SNPs 1 to 5 marked.]
21Genetic variation
[Figure repeated: gene schematic with SNPs 1 to 5.]
Answer:
SNP1. This SNP could affect the binding of transcription factors to the enhancer and thus the expression of the gene.
SNP2. This SNP lies in a non-functional region and will probably have no effect. It could affect histone binding, though!
SNP3. This SNP could affect the binding of the transcriptional machinery (esp. RNA polymerase II) to the promoter.
SNP4. This SNP is in an exon and therefore lies within a codon. However, it will only have an effect if the changed triplet encodes a different amino acid (e.g. AGA and AGG both encode arginine, so this change has no effect).
SNP5. This SNP lies in the intron, which is spliced out, and therefore it will probably not have an effect.
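Whether an exonic SNP changes the amino acid can be checked with the standard genetic code. The Python sketch below builds the codon table and tests the AGA/AGG example from the answer. The function name `is_synonymous` is made up for this illustration, and codons are written as DNA (T instead of U).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; amino acids listed for codons in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(codon_a, codon_b):
    """True if both codons encode the same amino acid (or both are stops)."""
    return CODON_TABLE[codon_a.upper()] == CODON_TABLE[codon_b.upper()]


print(is_synonymous("AGA", "AGG"))  # both encode arginine (R)
print(is_synonymous("AGA", "AGC"))  # arginine (R) versus serine (S)
```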
22Find a SNP!
Worked example: Find SNPs that lie in the exons
of the myostatin gene, whose protein product is a
potent muscle growth inhibitor. First search for
myostatin. There is another name for
myostatin, which is GDF-8. Click "view gene in
genomic location".
Lower on the page you will find a "features" menu.
Open it, tick the "SNPs" box and close it again. The
following window opens and you see the coding,
untranslated region (UTR) and intronic SNPs. You can
additionally open "human proteins" or "EMBL
mRNAs" to see where the myostatin gene lies.
There are two SNPs in the myostatin exons.
23Find a SNP!
Worked example: If you click on a SNP, a new
window appears. You will find the SNP in the
genomic sequence GTAARGGCC, where R stands for an
A/G polymorphism. You will also find the following
figure:
[Figure: Myostatin (GDF8) gene (3 exons shown in dark red), with coding SNPs with R (A/G) ambiguity marked.]
24Find a SNP!
Task: How many SNPs do you find in the exons and
introns of the human histidine decarboxylase (EC
4.1.1.22) gene? By the way, what does the EC
number stand for?
25How to detect genetic variation?
So far, studies investigating the relation
between genetic variation and, for example, disease have
focussed on dramatic mutations such as frameshift
mutations and deletion/insertion mutations rather
than the more subtle SNPs. Larger mutations are
easier to detect and their effects are usually more
dramatic.
26How to detect genetic variation?
Method: DNA can be obtained from nucleated blood
cells. The region of interest is
amplified using the polymerase chain reaction (PCR)
with so-called primers that only amplify a
specific DNA sequence. Here, a DNA fragment of the
angiotensin-converting enzyme (ACE) gene,
carrying either a deletion (D) or an insertion (I)
mutation, has been amplified and
electrophoresed. ACE produces angiotensin II, a known
inducer of cardiac hypertrophy. Because we have
two copies of each gene, the combinations DD, ID
or II are possible. In this study, DD patients
had a larger left ventricle (heart) than ID and
II patients (figure from Lechin et al. 1995).
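Reading the gel can be expressed as a simple rule: a band at the insertion size means an I allele is present, and a band at the deletion size means a D allele is present. The Python sketch below illustrates this rule. The fragment sizes (490 bp and 190 bp) and the tolerance are assumptions for illustration only; they are not given in this presentation and depend on the primers used.

```python
def call_genotype(band_sizes_bp, insertion_bp=490, deletion_bp=190, tolerance_bp=20):
    """Call the I/D genotype from the band sizes seen in one gel lane.

    insertion_bp, deletion_bp and tolerance_bp are assumed, illustrative
    values; adjust them to the primer set actually used.
    """
    has_i = any(abs(size - insertion_bp) <= tolerance_bp for size in band_sizes_bp)
    has_d = any(abs(size - deletion_bp) <= tolerance_bp for size in band_sizes_bp)
    if has_i and has_d:
        return "ID"
    if has_i:
        return "II"
    if has_d:
        return "DD"
    return "no call"


# Hypothetical lanes: one band (homozygous) or two bands (heterozygous).
for lane in ([190], [488, 192], [490]):
    print(lane, call_genotype(lane))
```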
27Genetic variation and performance
Figure. Montgomery et al. (1998) determined the
genotype of the angiotensin-converting enzyme
gene, where an insertion/deletion mutation
exists. The left figure shows the PCR results for the
three genotypes DD, ID and II (taken from Lechin
et al. 1995). The right figure shows the relation
between the genotype and the increase in
repetitive elbow flexion in response to a
specific 10-week training programme among British
army recruits. The data suggest that a DD
genotype is associated with low, an ID genotype with
medium and an II genotype with high trainability
for this specific task.
28Actinin genotype and performance
Actinin (ACTN) is an actin-binding protein, and
the two isoforms ACTN2 and ACTN3 are found in
skeletal muscle. Yang et al. (2003) reported an
association of the ACTN3-RR and ACTN3-RX genotypes
with power athlete status (these athletes have more
ACTN3).
29Bioinformatics Part 4: Homology searches
30Homologies
Worked example: You have sequenced the following
human DNA fragment and you want to know more
about it:
AAAACATCTATCTTGCTGTGTTTGGACAGGCCAGCCCCTG
AAACATCTTGGGCAATGGAGGGTTAACTTCTCAAAGTTTAATAGGCAAGA
CCAGCAACCATGCAACAAGGTAAATTGTCCTCACGAGAACTCCAAAGACT
ATTTTTCTCTCTCTTTTTTTGAGGCAGGGTCTCGCTATGTTACCCAGGCT
GCTCTCGAACTCTTGGGCTCAAGCAATCCCCCCATCTTAACCTCCCCAGC
AGCTGGGACTACAGCCACGCGCCACTGCACCCAGCTGACTTTTCCTTCTA
AGCATCTTTGGCTGGGCGTGGTGGCTCATGCCTGTAATCCCTGCACTTTG
GGAGGCCAAGGTGGGTAGATCACTGGAGGTCAGGAGTTCTAGACCAGCCT
GGCCAACATGGTGAAACCTCATCTCTACTAAAAATACAAAAAAATTAGCT
GGGCATGGTGGCAGGTGCCTGTAATCCTAGCTACTCGGGAGGCTGAAGCA
GGAGAATTGCTTGAACCCAGGAGGTAGAGGTTGCAGTGACCCAAGATTGT
GCCACTGCACTCCAGCCTGGGTACACAGCGAGTCTGTCTAAAAAAGAAAA
AAAAAAAAGGAAGAGAGAGCATCTTTATCTTCATTTTCTAACCTTTAAGT
GTTACTTTCTCCCAGTAACATTTTGCCCAGAAAGAGGTGATGAATATAGA
TTTAAGAATAAGATTTTCCCCATGTTGCTGCCTTTCCAGAACAAGTGAGT
TCATTCTCATTTGTCTTTCTTCAGAAATCTTTTATCTGTCTTTCTCCCAT
TAGCTGGAATGGGTGCTCCATGAGAATAAAGACTTGGGTTCCATTCTTCC
TATTGTCCCCAGAGCCTACATACTGGCTGGCATTGAGTAGCAATTGAACA
GTTTTCTGAATGAATGAATGAATGAATGCTCAAATAAGCACATGAATTAA
TTATCACTTTCCTTTGAATCTCTCCATTCTTCTTCCTCACCCAATGGGGC
TCGATCCTTATACACAGAAGATACTCTATAAATGATGATTCAATGAATGC
CAAGCCCTGTTCTATGCACTGAAGACCAAAAGAAATAAAAGACATCATTC
CTGCTCTGTAAGAA
31Homologies
Worked example: To do so, you have to carry out a
BLAST search. Go to http://www.ensembl.org/Homo_sapiens/blastview
and paste the sequence into the
large box. Select "Homo sapiens" as the database
to search against and "blastn" for a nucleotide
search. Blastx searches a DNA query against
protein (amino acid) sequences; blastp searches
protein against protein.
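The choice of BLAST programme depends only on the type of the query and the type of the database. The Python sketch below captures this as a lookup table. The helper `choose_blast_program` is made up for this illustration, and tblastn (protein query against a translated nucleotide database) is an addition that is not mentioned above.

```python
# (query type, database type) -> BLAST programme
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated before searching
    ("protein", "nucleotide"): "tblastn",  # database is translated (not on the slide)
}


def choose_blast_program(query_type, database_type):
    """Return the BLAST programme name for a query/database combination."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"unknown combination: {query_type!r} against {database_type!r}"
        ) from None


# A sequenced DNA fragment searched against the human genome:
print(choose_blast_program("nucleotide", "nucleotide"))
```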
32Homologies
Worked example: After you have started the
search, click "retrieve" and the programme will
display a "view" button. Click the "view" button
and the programme will display a list of matches
with a score and an identity. There is one match
with 100% identity on chromosome 10 (red arrow on
the chromosome). Clicking "A" yields a
graphical display of the homology:
AAACATCTATCTTGCTGTGTTTGGACAGGCCAGCCCCTGAAACATCTTGGGCAAT
AAACATCTATCTTGCTGTGTTTGGACAGGCCAGCCCCTGAAACATCTTGGGCAAT
If there is not 100% homology, then the alignment looks as
follows:
CTCATGCCTGTAATCCCTGCACTTTGGGAGGCCAAGGTGGGTAGATCACTGGAGG
                xx x                   x x       xx
CTCATGCCTGTAATCCTAGTACTTTGGGAGGCCAAGGTGAGCAGATCACCTGAGG
The x marks indicate differences between the two sequences.
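The identity value reported for an alignment is simply the share of positions at which both sequences carry the same base. The Python sketch below marks the differences and calculates the percent identity for the two aligned sequences shown above. It assumes an ungapped alignment of equal length; the function names are made up for this illustration.

```python
def mismatch_line(seq_a, seq_b, mark="x"):
    """Return a line with `mark` under every position where the sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(" " if a == b else mark for a, b in zip(seq_a, seq_b))


def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions that are identical."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    line = mismatch_line(seq_a, seq_b)
    return 100.0 * line.count(" ") / len(line)


# The two aligned sequences from the worked example.
query = "CTCATGCCTGTAATCCCTGCACTTTGGGAGGCCAAGGTGGGTAGATCACTGGAGG"
hit = "CTCATGCCTGTAATCCTAGTACTTTGGGAGGCCAAGGTGAGCAGATCACCTGAGG"

print(query)
print(mismatch_line(query, hit))
print(hit)
print(f"identity: {percent_identity(query, hit):.1f}%")
```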
33Homologies
Task: I have selected a mouse DNA sequence and
your task is to see whether there is a homologous
human sequence.
GTGTCTTGCACAGTAATAGACCGCAGAGTCCTCA
GATGTCAGGCTGCTGAGCTGCATGTA GGCTGTGCTGGAGGATGTGTCTA
CAGTCAATGTGGCCTTGCCCTTGAACTTTTGATTGTA
GTTAGTATAGCTATCAGAAGGATCAATCTCTCCGATCCACTCAAGGCCCT
GTCCAGGCCT CTGTTTTACCCACTGCATCCAGTAGCTGGTGAAGGTGTA
GCCAGAAGCCTTGCAGGACAG CTTCACTGAAGCCCCAGGCTTCACAAGC
TCAGCCCCAGGCTGCTGCAGTTGGACCTGAGA
GTGGACACCTGTGGAGAGAAAGGCAGAGTGGATGTCATTGTCACTCAAGT
GTATGGCCAG ACATCGAGCCTGCTACTGTGAGCCCCTTACCTGTAGCTG
TTGCTACCAAGAAGAGGATGA TACAGCTCCATCCCATGGCGAGGTCCTG
TGTGCTCAGTAACTGTAAAGAGAACAGTGATC
TCATGTTTTTCTGTGTGTGGTATAGACAACCCTATATTTACCATGTAGAC
TCACAGGATT TGCATATTCATGAGCAGGATACATATTAGATGAGCACCT
ACTCCTGCAGGAGAAGAAGAG ACACCTGGGTCAGGAATCAGGATGCTGA
AACCCAAGTCATAGTCTTGTCTGAGGTAATTC
ATCCCATACCTCATCCCTGAACCTTGTGTTGAGGCTATGGATGTAACATT
ATAGCCTGTG CACTAAAAAGATTTGCATCCTGAGACAGTGGCCCCACTT
GTGACACAGTTGACAGATGGA
34Bioinformatics Part 5: Microarrays
35Microarrays
Microarrays or biochips are a technique
increasingly used by leading research groups in
exercise physiology. Microarrays are used to
compare the mRNA levels in two samples, e.g.
control (no exercise) versus exercise.
Importantly, this comparison is done for nearly
all mRNAs that can be found in a tissue (e.g. all
genes expressed in skeletal muscle).
36Microarrays
The method works by printing thousands of DNA dots
that code for the genes of the organism onto a
slide. The experimenter then converts the mRNA
into DNA that is labelled with a fluorescent
marker, usually green for the control sample and
red for the experimental sample. The labelled
control and exercise samples are allowed to
hybridise with (stick to) the complementary DNA that
is printed onto the slide. If a dot appears
green, then there was more mRNA in the control
sample (the mRNA goes down during exercise). If a dot
is red, then the mRNA went up in response to
exercise. Yellow dots mean that the amount of
mRNA was roughly equal in the control and
exercise samples, i.e. the gene's expression is
not affected by exercise. No fluorescence
indicates that this gene is not expressed in
muscle (e.g. a brain gene). The following slide
schematically shows what has just been said.
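The colour rules above can be written as a comparison of the red (exercise) and green (control) intensities of each dot. The Python sketch below classifies a dot from its log2 ratio. The two-fold threshold and the background cut-off are assumptions chosen for illustration, not values from this presentation, and the function name is made up.

```python
import math


def classify_spot(red, green, fold_change=2.0, background=100.0):
    """Classify one microarray dot from its red and green intensities.

    fold_change and background are assumed, illustrative thresholds.
    """
    if red < background and green < background:
        return "not expressed"
    # Guard against division by zero and log of zero.
    ratio = max(red, 1e-9) / max(green, 1e-9)
    log_ratio = math.log2(ratio)
    threshold = math.log2(fold_change)
    if log_ratio >= threshold:
        return "up (red)"
    if log_ratio <= -threshold:
        return "down (green)"
    return "unchanged (yellow)"


# Hypothetical intensities (red = exercise, green = control).
for red, green in [(4000, 1000), (1000, 4000), (1500, 1400), (10, 20)]:
    print(red, green, classify_spot(red, green))
```

Using the log2 ratio treats a doubling and a halving symmetrically, which is why it is a common choice for two-colour data.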
37Microarrays
[Figure: microarray workflow. Steps in order: normal mRNA and disease mRNA; RT/PCR, label with fluorescent dye; labelled DNA from mRNA; combine equal amounts; hybridise probe to microarray; scan; informatics (image processing, DBMS, WWW, bioinformatics, data mining and visualization).]
38Microarray example
- No mRNA
- mRNA only expressed in control
- mRNA only expressed in response to disease/exercise
- Expression in control and disease/exercise
39Microarray analysis
Microarray experiments usually show the
differential expression of hundreds or thousands
of genes.
Task: Assume the following two genes
are expressed at higher levels in response to 1 h
of cycling exercise:
HSPD13982_i_at: Cathepsin D (lysosomal aspartyl protease)
NM_006457_r_at: LIM protein (similar to rat protein kinase C-binding enigma)
a) What is the function of these genes?
b) Is there any link to exercise (e.g. changes in similar
proteins in response to exercise)?
40The End