Molecular Exercise Physiology Bioinformatics Presentation 5 Henning Wackerhage - PowerPoint PPT Presentation

Slides: 41

Provided by: Computing118

Transcript and Presenter's Notes

1
Molecular Exercise Physiology: Bioinformatics
Presentation 5
Henning Wackerhage
2
Learning outcomes

At the end of this presentation, you should be able to:
- Find information online on any DNA sequence, gene, RNA or protein in various species. This information includes the position of genes in the genome, the function of the proteins and their role in disease.
- Carry out BLAST searches to identify homologous sequences.
- Explain the cause of genetic variability.
- Explain how a microarray experiment is carried out.

This presentation will be supported by a computer
practical in bioinformatics. Please revise this
presentation carefully before the practical;
otherwise you will struggle.
3
Bioinformatics Part 1: Why study bioinformatics?
4
Introduction
The human genome and many other genomes have now been
sequenced and the data have been deposited
online. In addition, there is a wealth of
information on genes and their products on
networked computers. Numerous programmes that
allow you to analyse this data also exist.
Most of this data is freely accessible online via
user-friendly computer programmes. It is easy to
download the DNA sequence for any gene that might
respond to exercise or to find reliable
information on a protein that is involved in the
response to exercise. In this presentation, you
will learn how to find, use and analyse this
information. You will mainly learn by doing, and
you will sometimes need to be stubborn and click
numerous buttons by trial and error
to finally get the information you want. It is
not rocket science but it will require stamina
and patience at times!
5
What is bioinformatics?
NIH bioinformatics definition: Research,
development, or application of computational
tools and approaches for expanding the use of
biological, medical, behavioral or health data,
including those to acquire, store, organize,
archive, analyse, or visualise such data.
6
Why study bioinformatics?
Why should a sports biomedicist study
bioinformatics? The DNA encodes all the
information necessary for letting cells develop
into a functioning organism. The DNA thus also
encodes all the organs involved in exercise and
their adaptive response to exercise. In addition,
the differences in the DNA between two
individuals encode the differences in the
structure and function of the two organisms. This
includes differences in muscle size, in the
adaptation to training and in the
motor regions of the nervous system.

Therefore, bioinformatics will help us, among
other things, to:
- Identify differences in the DNA sequence (e.g. single nucleotide polymorphisms) between individuals that correlate with athletic talent or the extent of adaptation to exercise.
- Discover the regulatory mechanisms that mediate the adaptation to exercise.
- Interpret the results of microarray experiments where the expression of thousands of genes is measured in response to exercise.

7
Bioinformatics Part 2: Genome viewing
8
Genomes online
The genomes of many prokaryote, eukaryote,
plant, invertebrate and vertebrate model species
have now been sequenced. The DNA sequences of
these genomes have been posted online. However,
these websites contain much more than just the
naked DNA sequence, which is of limited use. With
the help of special computer algorithms, genes
(exons, introns) have been identified
using available research information and by de
novo prediction. Identified genes have been
linked to various other sites, including those
that list information on the same gene in other
species, the gene product (protein databases),
PubMed, disease databases etc. Genome browsers
are therefore powerful tools not only for the
specialist but also for the essay-writing
student. The following slide shows an
incomplete tree of sequenced genomes and the
slide thereafter shows the information available on
genome browsers.
9
Genomes online (incomplete)
http://www.ncbi.nlm.nih.gov/mapview
10
Genomes online
Online Mendelian Inheritance in Man (OMIM)
PubMed Reference search
Full-text electronic journals
Nucleotide sequences
3D Structures
Protein sequences MQKLQLCVY
Taxonomy
Maps Genomes
11
Genomes online
By far the largest project was the Human Genome
Project, the sequencing of our own DNA.
Some findings are surprising:

- Human genome size: about 3,200 Mb (megabases).
- Gene numbers: human 31,000, yeast 6,000, fly 13,000, worm 18,000, plant 26,000.
- Only 1.1 to 1.4% of the human sequence encodes protein. The rest is non-coding.
- 28% of the sequence is transcribed into RNA (5% of this is translated into proteins).
- Only 94 of 1,278 protein families are specific to vertebrates.
- Why do we differ? Humans differ from one another by about one base pair per thousand (single nucleotide polymorphisms, SNPs).

12
Human genome project

Landmarks
- 1953: Watson-Crick structure of DNA published
- 1975: F. Sanger, and independently A. Maxam and W. Gilbert, develop methods for sequencing DNA
- 1981: Human mitochondrial DNA sequenced (16,569 base pairs)
- 1990: International Human Genome Project launched; target horizon 15 years
- 1991: J.C. Venter and colleagues identify active genes via expressed sequence tags (ESTs)
- 2000: Joint announcement of complete draft sequence of human genome
- 2003: Completion of human genome

13
Major genome browsers
You can browse genome data using one of the
following browsers. We will mainly use Ensembl,
the European and user-friendly version.
www.ensembl.org
www.ncbi.nlm.nih.gov
http://genome.cse.ucsc.edu/
Task: Visit each of these websites, click
many buttons and see what information you can
obtain. We will mainly use the Ensembl website.
14
Searching for gene information
OK, browsing the genome browsers and clicking on
chromosomes is pretty simple. Most of the time, however,
you will search for a specific gene whose genomic
location you do not know. In these
cases, you will have to use a search engine and
type in the name of the gene or protein. To do
so, open the Ensembl website (www.ensembl.org)
and click the species, normally human. At the top
of the page it states "Search for anything with",
followed by a box where you type in your
search term. Click "Lookup" and you will obtain
results.
Worked example: Type in "malate dehydrogenase"
and click "Lookup". Many items
will be listed, starting with 9 matches in the
Homo sapiens disease index. However, you are
interested in the gene. Therefore scroll down
until you see 170 matches in the Homo sapiens
gene index. The first entry under this heading
is "Malate dehydrogenase, cytoplasmic (EC
1.1.1.37)". Other isoforms of this enzyme are
listed as well, and you may need further
information to decide which isoform you are
interested in.
15
Searching for gene information
On the Ensembl human genome website, enter
"troponin" into the search box. Find the
following gene among the search results: "Troponin
C, skeletal muscle". Click this and a page with
numerous clickable links will appear.
Task 1: Click "Export gene data in EMBL, GenBank or
FASTA". Scroll down, select the output format "text"
and export. The DNA sequence of the gene will
appear. You can now analyse the sequence or
design primers for the polymerase chain reaction
(PCR).
Task 2: Return to the "Troponin C, skeletal
muscle" page. Now click "MIM" (or "OMIM"), which
stands for (Online) Mendelian Inheritance in
Man. Read the paragraph. It will inform you about
research on the gene. The text on troponin is
very short compared to other texts, e.g. those on major
disease genes.
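Once a sequence has been exported in FASTA format, it can be read into a script for further analysis. The following is a minimal sketch in Python, assuming a plain FASTA text export; the record header and bases in the example are made up for illustration and are not real Ensembl output.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; the header is everything after ">".
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def gc_content(sequence):
    """Fraction of G and C bases, a value often checked when designing primers."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence) / len(sequence)


# Hypothetical record: header and bases are invented for this example.
example = """>example_gene hypothetical export
ACGTGGCC
aattGGCC
"""
records = parse_fasta(example)
sequence = records["example_gene hypothetical export"]
print(len(sequence), gc_content(sequence))
```

For real work, an established library such as Biopython would normally be preferred over a hand-written parser.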
16
Searching for gene information

Task 3: Click "LocusLink". The following bar will
appear.
Click on each window and find the following
information:
- Who has carried out a structural analysis of the human troponin C gene?
- There is an ion binding motif on the molecule. For what ion?
- Name a gene that is a neighbour on the chromosome.
- What is the percent homology (similarity of the DNA sequence) between the human and rat troponin C genes?

17
Bioinformatics Part 3: Genetic variability
18
Genetic variation
By now, you may have asked yourself the following
question: How can they list one human genome
sequence if we are all different? Surely, our
genomes will be different? Good question and
yes, we are different. We differ because of
nature and nurture, and the nature bit is due to
differences in the DNA between human beings.
Most of these differences in the DNA sequence do
not occur at random but at fixed positions,
approximately every 1,300 base pairs (bp). They are
called single nucleotide polymorphisms (SNPs,
pronounced "snips"). There are roughly 2,500,000
SNPs in the human genome.
Variation in the human species is mainly the result
of SNPs.
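The two figures quoted here (one SNP roughly every 1,300 bp and roughly 2,500,000 SNPs) can be checked against each other with a short calculation. This is only a back-of-the-envelope sketch; it assumes the genome size of about 3,200 Mb given earlier in the presentation and a perfectly even spacing of SNPs, which is a simplification.

```python
# Approximate values taken from the presentation; both are rough estimates.
GENOME_SIZE_BP = 3_200_000_000   # about 3,200 Mb
SNP_SPACING_BP = 1_300           # about one SNP every 1,300 bp

# Assumption: SNPs are spread evenly over the whole genome.
expected_snps = GENOME_SIZE_BP / SNP_SPACING_BP
print(f"Expected number of SNPs: {expected_snps:,.0f}")
```

The result lands just under 2.5 million, so the two numbers on the slide are consistent with each other.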
19
Genetic variation
Worked example: I have used Ensembl and have
picked the following SNP. During sequencing (each
sequence is sequenced several times), the
investigators note that there is a base pair
which is sometimes sequenced as an adenine (A) and
sometimes as a thymine (T). The ambiguity
code W is used to indicate this in the final
sequence.
Alleles: A/T (ambiguity code W)
Sequence region: CACAACTGCTTGGAWAAAACAGGATAG
SNPs are not the only source of genetic
variation. Here is an example of a deletion
mutation with some bases missing:
Deletion:  TCAAGGTATTCTTCA------AAAAGGTCCCAACCC
Insertion: TCAAGGTATTCTTCAGATTCTAAAAGGTCCCAACCC
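The ambiguity code and the insertion/deletion example can be explored with a few lines of Python. This is an illustrative sketch only: the table covers just the two-base IUPAC codes, and the helper `inserted_bases` is a hypothetical function that assumes the two sequences differ by a single contiguous stretch of bases.

```python
from itertools import product

# IUPAC codes for two-base ambiguities (a subset of the full IUPAC table).
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC",
}


def expand_alleles(sequence):
    """Return every unambiguous sequence encoded by an IUPAC-coded sequence."""
    choices = [IUPAC.get(base, base) for base in sequence.upper()]
    return ["".join(combo) for combo in product(*choices)]


def inserted_bases(short, long):
    """Return the bases present in `long` but missing from `short`.

    Assumes the sequences differ by one contiguous insertion/deletion.
    """
    prefix = 0
    while prefix < len(short) and short[prefix] == long[prefix]:
        prefix += 1
    extra = len(long) - len(short)
    return long[prefix:prefix + extra]


# SNP region from the slide: W stands for A or T.
region = "CACAACTGCTTGGAWAAAACAGGATAG"
for allele in expand_alleles(region):
    print(allele)

# Deletion and insertion alleles from the slide.
deletion = "TCAAGGTATTCTTCAAAAAGGTCCCAACCC"
insertion = "TCAAGGTATTCTTCAGATTCTAAAAGGTCCCAACCC"
print(inserted_bases(deletion, insertion))
```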
20
Genetic variation
Do all SNPs lead to a change in phenotype? No!
Remember that only <2% of human DNA encodes
proteins and that a lot of DNA is non-coding or
intergenic DNA. A SNP or deletion in a
DNA sequence with no function will probably not
have a noticeable effect. Which of the following
SNPs (1-5) are likely to cause a change in the
expression or structure of the protein encoded by
the gene?
[Figure: schematic of a gene on the DNA, labelled with enhancer, promoter, start, exon, intron, exon and termination, with the positions of SNPs 1 to 5 marked.]
21
Genetic variation
[Figure: schematic of a gene on the DNA, labelled with enhancer, promoter, start, exon, intron, exon and termination, with the positions of SNPs 1 to 5 marked.]
Answer:
- SNP1: This SNP could affect the binding of transcription factors to the enhancer and thus the expression of the gene.
- SNP2: This SNP lies in a non-functional region and will probably have no effect. It could affect histone binding, though!
- SNP3: This SNP could affect the binding of the transcriptional machinery (esp. RNA polymerase II) to the promoter.
- SNP4: This SNP is in an exon and lies in a triplet that codes for an amino acid. However, it will only have an effect if the changed triplet encodes a different amino acid (e.g. AGA and AGG both encode arginine).
- SNP5: This SNP lies in an intron, will be spliced out and is therefore unlikely to have an effect.
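Whether an exonic SNP changes the encoded amino acid can be checked against the standard genetic code. The sketch below builds the codon table from the conventional TCAG ordering and tests the AGA/AGG example; the function name `is_synonymous` is an illustrative choice, and the code assumes DNA codons on the coding strand.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code in the order TTT, TTC, TTA, TTG, TCT, ...
# ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(codon_a, codon_b):
    """True if both codons encode the same amino acid (a silent change)."""
    return CODON_TABLE[codon_a.upper()] == CODON_TABLE[codon_b.upper()]


print(CODON_TABLE["AGA"], CODON_TABLE["AGG"], is_synonymous("AGA", "AGG"))
print(CODON_TABLE["AGA"], CODON_TABLE["AGC"], is_synonymous("AGA", "AGC"))
```

Here R is the one-letter code for arginine and S for serine, so the first change is silent and the second is not.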
22
Find a SNP!
Worked example: Find SNPs that lie in the exons
of the myostatin gene, whose protein product is a
potent muscle growth inhibitor. First search for
"myostatin". There is another abbreviation for
myostatin, which is GDF-8. Click "view gene in
genomic location".
Lower on the page you will find a "features" menu.
Open it, tick the "SNPs" box and close it again. The
following window opens and you see the coding,
untranslated region (UTR) and intronic SNPs. You can
additionally open "human proteins" or "EMBL
mRNAs" to see where the myostatin gene lies.
There are two SNPs in the myostatin exons.
[Figure: genome browser view with the SNPs track.]
23
Find a SNP!
Worked example: If you click on a SNP, a new
window appears. You will find the SNP in the
genomic sequence GTAARGGCC, where R stands for an
A/G polymorphism. You will also find the following
figure.
[Figure: Myostatin (GDF8) gene (3 exons shown in dark red) with coding SNPs marked by the R (A/G) ambiguity code.]
24
Find a SNP!
Task: How many SNPs do you find in the exons and
introns of the human histidine decarboxylase (EC
4.1.1.22) gene? By the way, what does the EC
number stand for?
25
How to detect genetic variation?
So far, studies investigating the relation
between genetic variation and, for example, disease have
focussed on dramatic mutations such as frameshift
and deletion/insertion mutations rather
than the more subtle SNPs. Larger mutations are
easier to detect and their effects are usually more
dramatic.
26
How to detect genetic variation?
Method: DNA can be obtained from nucleated blood
cells. The DNA region of interest is then
amplified using the polymerase chain reaction (PCR)
with so-called primers that will only amplify a
specific DNA sequence. Here, a DNA fragment of the
angiotensin-converting enzyme (ACE) gene carrying
either a deletion (D) or an insertion (I)
has been amplified and
electrophoresed. Angiotensin II is a known
inducer of cardiac hypertrophy. Because we have
two copies of each gene, the combinations DD, ID
or II are possible. In this study, DD patients
had a larger left ventricle (heart) than ID and
II patients. (Figure from Lechin et al. 1995.)
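Reading a genotype from such a gel amounts to checking which bands are present: only the longer insertion fragment (II), only the shorter deletion fragment (DD), or both (ID). The sketch below illustrates that logic. The function `call_genotype`, the tolerance and the fragment sizes in the example are all hypothetical; the real fragment sizes depend on the primers used and are not given in the presentation.

```python
def call_genotype(band_sizes_bp, insertion_bp, deletion_bp, tolerance_bp=20):
    """Call II, ID or DD from the band sizes seen in one gel lane.

    insertion_bp and deletion_bp are the expected PCR fragment sizes for the
    two alleles; tolerance_bp allows for imprecise size estimates on the gel.
    Returns None if neither expected band is found.
    """
    has_insertion = any(abs(b - insertion_bp) <= tolerance_bp for b in band_sizes_bp)
    has_deletion = any(abs(b - deletion_bp) <= tolerance_bp for b in band_sizes_bp)
    if has_insertion and has_deletion:
        return "ID"
    if has_insertion:
        return "II"
    if has_deletion:
        return "DD"
    return None


# Hypothetical fragment sizes, chosen only to illustrate the logic.
INSERTION_BP = 490
DELETION_BP = 190

print(call_genotype([488], INSERTION_BP, DELETION_BP))
print(call_genotype([492, 188], INSERTION_BP, DELETION_BP))
print(call_genotype([195], INSERTION_BP, DELETION_BP))
```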
27
Genetic variation and performance
Figure. Montgomery et al. (1998) determined the
genotype of the angiotensin-converting enzyme
gene, where an insertion/deletion mutation
exists. The left figure shows the PCR results for the
three genotypes DD, ID and II (taken from Lechin
et al. 1995). The right figure shows the relation
between the genotype and the increase in
repetitive elbow flexion in response to a
specific 10-week training programme among British
army recruits. The data suggest that a DD
genotype is associated with low, an ID genotype with
medium and an II genotype with high trainability
for this specific task.
28
Actinin genotype and performance
Actinin (ACTN) is an actin-binding protein, and
the two isoforms ACTN2 and ACTN3 are found in
skeletal muscle. Yang et al. (2003) reported that
the ACTN3 RR and RX genotypes are associated
with power athletes (these athletes have more
ACTN3).
29
Bioinformatics Part 4: Homology searches
30
Homologies
Worked example: You have sequenced the following
human DNA fragment and you want to know more
about it:
AAAACATCTATCTTGCTGTGTTTGGACAGGCCAGCCCCTG
AAACATCTTGGGCAATGGAGGGTTAACTTCTCAAAGTTTAATAGGCAAGA
CCAGCAACCATGCAACAAGGTAAATTGTCCTCACGAGAACTCCAAAGACT
ATTTTTCTCTCTCTTTTTTTGAGGCAGGGTCTCGCTATGTTACCCAGGCT
GCTCTCGAACTCTTGGGCTCAAGCAATCCCCCCATCTTAACCTCCCCAGC
AGCTGGGACTACAGCCACGCGCCACTGCACCCAGCTGACTTTTCCTTCTA
AGCATCTTTGGCTGGGCGTGGTGGCTCATGCCTGTAATCCCTGCACTTTG
GGAGGCCAAGGTGGGTAGATCACTGGAGGTCAGGAGTTCTAGACCAGCCT
GGCCAACATGGTGAAACCTCATCTCTACTAAAAATACAAAAAAATTAGCT
GGGCATGGTGGCAGGTGCCTGTAATCCTAGCTACTCGGGAGGCTGAAGCA
GGAGAATTGCTTGAACCCAGGAGGTAGAGGTTGCAGTGACCCAAGATTGT
GCCACTGCACTCCAGCCTGGGTACACAGCGAGTCTGTCTAAAAAAGAAAA
AAAAAAAAGGAAGAGAGAGCATCTTTATCTTCATTTTCTAACCTTTAAGT
GTTACTTTCTCCCAGTAACATTTTGCCCAGAAAGAGGTGATGAATATAGA
TTTAAGAATAAGATTTTCCCCATGTTGCTGCCTTTCCAGAACAAGTGAGT
TCATTCTCATTTGTCTTTCTTCAGAAATCTTTTATCTGTCTTTCTCCCAT
TAGCTGGAATGGGTGCTCCATGAGAATAAAGACTTGGGTTCCATTCTTCC
TATTGTCCCCAGAGCCTACATACTGGCTGGCATTGAGTAGCAATTGAACA
GTTTTCTGAATGAATGAATGAATGAATGCTCAAATAAGCACATGAATTAA
TTATCACTTTCCTTTGAATCTCTCCATTCTTCTTCCTCACCCAATGGGGC
TCGATCCTTATACACAGAAGATACTCTATAAATGATGATTCAATGAATGC
CAAGCCCTGTTCTATGCACTGAAGACCAAAAGAAATAAAAGACATCATTC
CTGCTCTGTAAGAA
31
Homologies
Worked example: To do so, you have to carry out a
BLAST search. Go to http://www.ensembl.org/Homo_sapiens/blastview.
Paste the sequence into the
large box, select Homo sapiens as the database
to search against and blastn for a nucleotide
search. Blastx searches DNA against
protein (amino acid sequence), blastp searches
protein against protein.
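The choice of programme depends only on the type of the query and the type of the database. As a small illustration, the three programmes named above can be written as a lookup table; the function `choose_blast_program` is a hypothetical helper, and other BLAST programmes that exist are deliberately left out.

```python
# (query type, database type) -> BLAST programme, as described in the text.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",   # query is translated before searching
    ("protein", "protein"): "blastp",
}


def choose_blast_program(query_type, database_type):
    """Return the BLAST programme for a query/database combination."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"No programme listed for {query_type} query against {database_type} database"
        )


print(choose_blast_program("nucleotide", "nucleotide"))
```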
32
Homologies
Worked example: After you have started the
search, click "retrieve" and the programme will
display a "view" button. Click the "view" button
and the programme will display a list of matches
with a score and an identity. There is one match
with 100% identity on chromosome 10 (red arrow on
the chromosome). Clicking A yields a
graphical display of the homology:
AAACATCTATCTTGCTGTGTTTGGACAGGCCAGCCCCTGAAACATCTTGGGCAAT
AAACATCTATCTTGCTGTGTTTGGACAGGCCAGCCCCTGAAACATCTTGGGCAAT
If there is not 100% homology, then the alignment
looks as follows:
CTCATGCCTGTAATCCCTGCACTTTGGGAGGCCAAGGTGGGTAGATCACTGGAGG
                xx x                   x x       xx
CTCATGCCTGTAATCCTAGTACTTTGGGAGGCCAAGGTGAGCAGATCACCTGAGG
Each x indicates a difference between the two sequences.
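The identity value and the line of x marks can be reproduced for any two aligned sequences of equal length. The following sketch does this for the imperfect match shown on this slide; it handles only gap-free alignments, which is a simplification compared with what BLAST reports.

```python
def compare_aligned(seq_a, seq_b):
    """Return (marker line, percent identity) for two gap-free aligned sequences.

    The marker line has an "x" under every position where the sequences differ.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("Sequences must be aligned to the same length")
    markers = "".join(" " if a == b else "x" for a, b in zip(seq_a, seq_b))
    matches = markers.count(" ")
    identity = 100.0 * matches / len(seq_a)
    return markers, identity


# The two sequences of the imperfect match from the slide.
query = "CTCATGCCTGTAATCCCTGCACTTTGGGAGGCCAAGGTGGGTAGATCACTGGAGG"
match = "CTCATGCCTGTAATCCTAGTACTTTGGGAGGCCAAGGTGAGCAGATCACCTGAGG"

markers, identity = compare_aligned(query, match)
print(query)
print(markers)
print(match)
print(f"{identity:.1f}% identity, {markers.count('x')} differences")
```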
33
Homologies
Task: I have selected a mouse DNA sequence and
your task is to see whether there is a homologous
human sequence.
GTGTCTTGCACAGTAATAGACCGCAGAGTCCTCA
GATGTCAGGCTGCTGAGCTGCATGTA GGCTGTGCTGGAGGATGTGTCTA
CAGTCAATGTGGCCTTGCCCTTGAACTTTTGATTGTA
GTTAGTATAGCTATCAGAAGGATCAATCTCTCCGATCCACTCAAGGCCCT
GTCCAGGCCT CTGTTTTACCCACTGCATCCAGTAGCTGGTGAAGGTGTA
GCCAGAAGCCTTGCAGGACAG CTTCACTGAAGCCCCAGGCTTCACAAGC
TCAGCCCCAGGCTGCTGCAGTTGGACCTGAGA
GTGGACACCTGTGGAGAGAAAGGCAGAGTGGATGTCATTGTCACTCAAGT
GTATGGCCAG ACATCGAGCCTGCTACTGTGAGCCCCTTACCTGTAGCTG
TTGCTACCAAGAAGAGGATGA TACAGCTCCATCCCATGGCGAGGTCCTG
TGTGCTCAGTAACTGTAAAGAGAACAGTGATC
TCATGTTTTTCTGTGTGTGGTATAGACAACCCTATATTTACCATGTAGAC
TCACAGGATT TGCATATTCATGAGCAGGATACATATTAGATGAGCACCT
ACTCCTGCAGGAGAAGAAGAG ACACCTGGGTCAGGAATCAGGATGCTGA
AACCCAAGTCATAGTCTTGTCTGAGGTAATTC
ATCCCATACCTCATCCCTGAACCTTGTGTTGAGGCTATGGATGTAACATT
ATAGCCTGTG CACTAAAAAGATTTGCATCCTGAGACAGTGGCCCCACTT
GTGACACAGTTGACAGATGGA
34
Bioinformatics Part 5: Microarrays
35
Microarrays
Microarrays or biochips are a technique
increasingly used by leading research groups in
exercise physiology. Microarrays are used to
compare the mRNA levels in two samples, e.g.
control (no exercise) versus exercise.
Importantly, this comparison is done for nearly
all mRNAs that can be found in a tissue (e.g. all
genes expressed in skeletal muscle).
36
Microarrays
The method works by printing thousands of DNA dots
that represent the genes of the organism onto a
slide. The experimenter then converts the mRNA
into DNA that is labelled with a fluorescent
marker, usually green for the control sample and
red for the experimental sample. The labelled
control and exercise samples are allowed to
hybridise with (stick to) the complementary DNA that
is printed onto the slide. If a dot appears
green, then there was more control mRNA in the
sample (mRNA goes down during exercise). If a dot
is red, then the mRNA went up in response to
exercise. Yellow dots mean that the amount of
mRNA was roughly equal in the control and
exercise samples, i.e. the gene's expression is
not affected by exercise. No fluorescence
indicates that this gene is not expressed in
muscle (e.g. a brain gene). The following slide
schematically shows what has just been said.
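The colour rules described above can be expressed as a small classification function based on the ratio of the red (exercise) to the green (control) signal. This is only a sketch: the background level and the two-fold threshold are arbitrary assumptions chosen for illustration, and real microarray analysis involves normalisation and statistics that are not shown here.

```python
import math


def classify_spot(green, red, background=100.0, fold_change=2.0):
    """Classify one microarray spot from its two fluorescence intensities.

    green: signal from the control sample, red: signal from the exercise sample.
    background and fold_change are illustrative thresholds, not standard values.
    """
    if green <= background and red <= background:
        return "none"      # no fluorescence: gene not expressed in this tissue
    # Adding 1 avoids division by zero when one channel is empty.
    log_ratio = math.log2((red + 1.0) / (green + 1.0))
    threshold = math.log2(fold_change)
    if log_ratio >= threshold:
        return "red"       # mRNA went up in response to exercise
    if log_ratio <= -threshold:
        return "green"     # mRNA went down in response to exercise
    return "yellow"        # roughly equal in both samples


for green, red in [(50, 60), (500, 4000), (4000, 500), (1000, 1100)]:
    print(green, red, classify_spot(green, red))
```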
37
Microarrays
[Figure: microarray workflow]
- Normal mRNA and disease mRNA
- RT/PCR: label with fluorescent dye
- Labelled DNA from mRNA
- Combine equal amounts
- Hybridise probe to microarray
- Scan
- Informatics: image processing, DBMS, WWW, bioinformatics, data mining and visualization
38
Microarray example
[Figure legend]
- No mRNA
- mRNA only expressed in control
- mRNA only expressed in response to disease/exercise
- Expression in control and disease/exercise
39
Microarray analysis
Microarray experiments usually show the
differential expression of hundreds or thousands
of genes.
Task: Assume the following two genes
are expressed at higher levels in response to 1 h
of cycling exercise:
- HSPD13982_i_at: Cathepsin D (lysosomal aspartyl protease)
- NM_006457_r_at: LIM protein (similar to rat protein kinase C-binding enigma)
a) What is the function of these genes?
b) Is there any link to exercise (e.g. changes in similar proteins in response to exercise)?
40
The End
