STBC2023 - PowerPoint PPT Presentation

About This Presentation
Title:

STBC2023

Description:

M. Firdaus Raih Room 1166, Bangunan Sains Biologi Office Hours: Wednesdays Phone: 0389215961 Email: firdaus_at_mfrlab.org Ver. 23-01-09-1 STBC2023 Introduction to ... – PowerPoint PPT presentation

Slides: 77
Provided by: M12166
Learn more at: http://www.mfrlab.org

Transcript and Presenter's Notes

Title: STBC2023


1
STBC2023 Introduction to
Bioinformatics: Analyses and Predictive Methods
Using Nucleotide Sequences
  • M. Firdaus Raih
  • Room 1166, Bangunan Sains Biologi
  • Office Hours: Wednesdays
  • Phone: 0389215961 Email: firdaus_at_mfrlab.org

Ver. 23-01-09-1
2
Guide
  • This is an electronic self-study and
    self-assessment module based on the lectures
    covering Topic 4, Analysis at Nucleotide Level,
    of the STBC2023 Introduction to Bioinformatics
    course.
  • To navigate this module, use the buttons provided,
    mostly in the bottom right-hand corner of the
    page or, on some slides, the bottom left-hand
    corner. The Home icon button returns you to the
    key questions that this course material is trying
    to answer. Several pages have hyperlinks that
    navigate either to specific slides or away from
    this module via the default web browser. To
    return, simply switch back to this file. Not
    clicking the buttons properly will advance the
    slides in normal PowerPoint slideshow order
    instead of navigating to the intended pages.
  • Practicals and self assessment questions to gauge
    your comprehension of a given practical session
    are also provided throughout. Please attempt the
    practicals and the questions on your own before
    resorting to the solutions or answers provided.

3
Pre-session Questions
  • What are nucleic acids?
  • What types of nucleic acids are there?
  • What functions do nucleic acids have?
  • What sort of information do nucleotide sequences
    carry?
  • What can be done with DNA sequences?
  • What can be done with RNA sequences?
  • Is molecular structure important for RNA
    sequences?
  • What is a sequence alignment?
  • What is the relationship of an alignment with
    regard to biological function?
  • Is extracting the encoded information for protein
    synthesis the only sequence analysis which can be
    done?

4
Learning objectives
  • Know the basic chemistry of nucleic acids and be
    able to understand their diverse functions.
  • Be able to list, in general terms, potential
    analyses for nucleic acid sequence data and the
    applications of those analyses, based on an
    understanding of the functions of nucleic acids.
  • Be able to formulate a strategy for, and present
    the processes involved in, the analysis of
    nucleic acid sequence data.
  • Be able to comprehend the basic concepts of
    sequence alignment in general and of aligning
    nucleic acids specifically, as well as the
    relationship between an alignment and a
    sequence's biological function.

5
Nucleic Acids Chemistry and Molecular Structure
  • What are nucleic acids?

6
Nucleic Acids Chemistry and Molecular Structure
  • What are nucleic acids?
  • Nucleic acids: polymers of nucleotides → 2 types
  • DNA: deoxyribonucleic acid
  • RNA: ribonucleic acid

7
Nucleic Acids Chemistry and Molecular Structure
  • What are nucleic acids?
  • Nucleic acids: polymers of nucleotides → 2 types
  • DNA: deoxyribonucleic acid
  • RNA: ribonucleic acid
  • What is a nucleotide?

8
Nucleic Acids Chemistry and Molecular Structure
  • What are nucleic acids?
  • Nucleic acids: polymers of nucleotides → 2 types
  • DNA: deoxyribonucleic acid
  • RNA: ribonucleic acid
  • What is a nucleotide?
  • Nucleotide = nucleoside + 1 phosphate group
  • Nucleoside = nitrogenous base + pentose sugar
    (ribose in RNA, deoxyribose in DNA)

9
Nucleic Acids Chemistry and Molecular Structure
What is the basic difference between RNA and DNA
(in terms of chemistry)?
RNA
DNA
Click here for animation
10
Nucleic Acids Chemistry and Molecular Structure
How can the nucleotide polymer be represented?
11
Nucleic Acids Chemistry and Molecular Structure
How can the nucleotide polymer be represented?
Hydrogen bonded base interactions and base
stacking interactions result in stable structures
of DNA / RNA.
Seq 1. ACTG Seq 2. TGAC
What can be done with such sequence data? How is
the analysis related to biological function?
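Seq 2 is the base-by-base complement of Seq 1 (A pairs with T, C with G). The short Python sketch below derives it; the function names are illustrative only. Assuming Seq 1 is written 5'→3', the partner strand written in the same 5'→3' convention is the reverse complement, because the two strands of a duplex are antiparallel.

```python
# Watson-Crick pairing partners for the four DNA bases
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq: str) -> str:
    """Base-by-base complement, read in the same direction as the input."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """Complementary strand written 5'->3' (the strands are antiparallel)."""
    return complement(seq)[::-1]


seq1 = "ACTG"
print(complement(seq1))          # TGAC, i.e. Seq 2
print(reverse_complement(seq1))  # CAGT
```
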
12
Nucleic Acids Biological Functions
  • What is/are the function(s) of DNA?
  • What is/are the function(s) of RNA?

13
Nucleic Acids Biological Functions
  • What is the function of DNA?
  • Storage of genetic information
  • Proteins such as transcription factors also
    interact directly with DNA as part of regulatory
    pathways
  • Total genetic content of an organism: genome
  • Genes are part of genomes
  • So what is a gene?

14
Nucleic Acids Biological Functions
  • What is the function of DNA?
  • Storage of hereditary information in genes.
  • What is a gene?

While sequencing of the human genome surprised us
with how many protein-coding genes there are, it
did not fundamentally change our perspective on
what a gene is. In contrast, the complex patterns
of dispersed regulation and pervasive
transcription uncovered by the ENCODE project,
together with non-genic conservation and the
abundance of noncoding RNA genes, have challenged
the notion of the gene. To illustrate this, we
review the evolution of operational definitions
of a gene over the past century--from the
abstract elements of heredity of Mendel and
Morgan to the present-day ORFs enumerated in the
sequence databanks. We then summarize the current
ENCODE findings and provide a computational
metaphor for the complexity. Finally, we propose
a tentative update to the definition of a gene: A
gene is a union of genomic sequences encoding a
coherent set of potentially overlapping
functional products. Our definition side-steps
the complexities of regulation and transcription
by removing the former altogether from the
definition and arguing that final, functional
gene products (rather than intermediate
transcripts) should be used to group together
entities associated with a single gene. It also
manifests how integral the concept of biological
function is in defining genes.
15
Nucleic Acids Biological Functions
  • What are the functions of RNA?

16
Nucleic Acids Biological Functions
  • What are the functions of RNA?
  • Information storage and transfer
      • Genomes of RNA viruses
      • mRNA
  • Protein synthesis
      • tRNA
      • Peptidyl transferase
  • Catalysis
      • Ribozymes
  • Regulatory
      • Small ncRNAs / microRNAs
      • Riboswitches
  • Also see the RNA World hypothesis, a term first
    coined by Walter Gilbert (1986, Nature).

17
DNA (Genes) From Sequence to Function
  • How does a gene sequence correlate to biological
    function?

18
DNA (Genes) From Sequence to Function
  • How does a gene sequence correlate to biological
    function?
  • Let's first look at
  • Information about the amino acid sequence is
    contained within the nucleic acid sequence.
  • Is that the only analysis that can be done for
    DNA sequences?
  • What other analyses, if any, can be done for DNA
    sequences?
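
A minimal Python sketch of how the amino acid sequence is read out of a DNA coding sequence, three bases (one codon) at a time, using the standard genetic code (NCBI translation table 1). The function name and example sequence are hypothetical; the sketch assumes an uninterrupted coding sequence that starts in frame at its first base.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with the bases in T, C, A, G order at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, to_stop: bool = False) -> str:
    """Translate a coding sequence read in frame from base 0.

    '*' marks a stop codon, 'X' an unrecognised codon (e.g. one containing N);
    an incomplete final codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if to_stop and aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCTTGGAAATAA"))                # MAWK*
print(translate("ATGGCTTGGAAATAA", to_stop=True))  # MAWK
```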

19
Potential Analyses for DNA Sequences
  • What can be done with DNA sequences?
  • Genome projects: DNA sequencing data need to be
    assembled into complete genomes.
  • Genes need to be identified / predicted.
  • Comparisons of specific nucleotide level
    variations.
  • Identification and analysis of specific
    nucleotide sequence level motifs and patterns.
  • Identification and analysis of polymorphisms.

20
Potential Analyses for DNA Sequences
  • What can be done with DNA sequences?
  • Genome projects: DNA sequencing data need to be
    assembled into complete genomes.
  • Genome sequencing generates fragments of
    sequence.
  • These fragments need to be assembled into genes,
    chromosomes and finally the complete genome.
  • Assembly is done by identifying contiguous
    sequences (contigs).
  • Contigs are basically found by aligning the short
    DNA sequences to one another and finding where
    there are overlaps.
  • More on this topic will be covered in the
    Genomics course in Year 3.
  • After the genome is assembled, the genes need to
    be identified.
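
The overlap idea can be sketched as a toy greedy assembler in Python. This is an illustration under simplifying assumptions (error-free reads from one strand, no repeats, a hypothetical minimum overlap of 3 bases), not how production assemblers work; the function names and reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b
    (0 if there is no such overlap of at least min_len bases)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: repeatedly merge the two fragments with the
    largest suffix/prefix overlap until no overlap >= min_len remains."""
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        # join the pair, writing the overlapping bases only once
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three overlapping reads from a hypothetical stretch of DNA
reads = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```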

21
Potential Analyses for DNA Sequences
  • What can be done with DNA sequences?
  • From sequence data, genes need to be predicted.
  • Several approaches to gene prediction:
  • Searching by signal: analysis of sequence
    signals that specify a gene.
  • Searching by content: analysis of regions
    showing compositional bias that has been
    correlated with coding regions.
  • Homology-based prediction: comparison against
    known gene sequences (involves sequence
    alignments).
  • Comparative gene prediction: comparing sequences
    of interest against anonymous genomic sequences
    (involves sequence alignments).
  • The prediction of eukaryotic genes from genomic
    DNA data is appreciably more difficult than that
    of prokaryotic genes. Why?
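
As a small illustration of searching by signal, the Python sketch below scans the three forward reading frames for an ATG start codon followed in-frame by a stop codon (TAA, TAG or TGA). It is a hypothetical helper, not a real gene finder: it ignores the reverse strand, splicing and all content-based evidence.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int, int]]:
    """Scan the three forward reading frames for open reading frames.

    Returns (frame, start, end) tuples with 0-based coordinates; `end` is
    exclusive and includes the stop codon. An ORF is reported only if it has
    at least `min_codons` codons before the stop (the ATG counts as one).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i                      # signal: start codon
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None                   # signal: in-frame stop codon
    return orfs


dna = "CCATGAAATTTTAGGG"
for frame, start, end in find_orfs(dna):
    print(frame, start, end, dna[start:end])   # 2 2 14 ATGAAATTTTAG
```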

22
Potential Analyses for DNA Sequences
  • What can be done with DNA sequences?
  • From sequence data, genes need to be predicted.
  • Several approaches to gene prediction:
  • Searching by signal: analysis of sequence
    signals that specify a gene.
  • Searching by content: analysis of regions
    showing compositional bias that has been
    correlated with coding regions.
  • Homology-based prediction: comparison against
    known gene sequences (involves sequence
    alignments).
  • Comparative gene prediction: comparing sequences
    of interest against anonymous genomic sequences
    (involves sequence alignments).
  • For this session, we will focus on methods which
    involve sequence alignments.

23
Potential Analyses for DNA Sequences
  • What can be done with DNA sequences?
  • Comparisons of specific nucleotide level
    variations.
  • Enables differentiation at the individual level
    or between close relatives, i.e. between strains
    of the same species.
  • Phylogenetic analysis (discussed by Dr.
    Khairina).

24
Potential Analyses for DNA Sequences
  • What can be done with DNA sequences?
  • Identification and analysis of specific
    nucleotide sequence level motifs, patterns.
  • This will be discussed further in the following
    lecture.
  • Examples:
  • PCR primer design
  • Searching / mapping restriction sites
  • Go to the corresponding BLAST exercise NOW
  • or proceed to the next slide.
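
Mapping restriction sites amounts to locating a recognition sequence in a DNA string. The Python sketch below (hypothetical helper names, made-up sequence) uses EcoRI (GAATTC) and BamHI (GGATCC) as examples. For a non-palindromic site such as BsaI (GGTCTC), the reverse complement must also be searched to find sites on the other strand. Real mapping tools do more, for example handling degenerate recognition sequences and reporting the actual cut positions.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(dna: str, site: str) -> list[int]:
    """0-based start positions of every exact (possibly overlapping) match
    of `site` in `dna`, read on the given strand only."""
    pattern = "(?=" + re.escape(site.upper()) + ")"  # lookahead allows overlaps
    return [m.start() for m in re.finditer(pattern, dna.upper())]


def find_sites_both_strands(dna: str, site: str) -> list[int]:
    """Also report sites on the complementary strand. Palindromic sites such
    as GAATTC read the same on both strands, so they are counted once."""
    hits = set(find_sites(dna, site))
    hits.update(find_sites(dna, reverse_complement(site)))
    return sorted(hits)


dna = "AAGAATTCTTGGATCCGAATTCA"
print(find_sites(dna, "GAATTC"))  # EcoRI sites: [2, 16]
print(find_sites(dna, "GGATCC"))  # BamHI site: [10]
```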

25
Potential Analyses for DNA Sequences
  • What can be done with DNA sequences?
  • Identification and analysis of polymorphisms.
  • This will be discussed further in the following
    lecture.
  • Examples:
  • SNPs: single nucleotide polymorphisms (more on
    SNPs)
  • Go to the corresponding BLAST exercise NOW
  • or proceed to the next slide.

26
Sequence Alignments
  • What is a sequence alignment?
  • A way of arranging sequences so that their
    similar regions line up.
  • Examples:
  • Gaps (-) are inserted to optimize alignments.
  • They represent insertion/deletion (indel)
    mutations.
  • It is easy to align short sequences manually,
    but what about longer sequences? How can those
    be aligned? To understand this further, let's
    look at a method with which we can visualize and
    track the alignment. This method is called a dot
    plot.

27
Sequence Alignments
  • What is a dot plot?
  • A plot where two sequences are written along the
    top row and leftmost column of a two-dimensional
    matrix and a dot is placed at any point where the
    characters in the appropriate columns match.
  • Parts of the two sequences where the match is
    continuous can be traced as a diagonal line →
    a region where the sequences are aligned.
  • A sequence can be plotted against itself:
    regions that share significant similarity will
    appear as lines off the main diagonal. This can
    occur, for example, when a protein consists of
    multiple similar structural domains.
  • A dot plot is not able to detect divergence or
    substitutions/mutations which we know can occur.

28
Sequence Alignments
  • Dot plot for two DNA sequences
  • Complete the dot plot for the two DNA sequences
    provided below. An example can be seen below.
  • Seq1 CGATCGCGTAATCGGTGATCGGC
  • Seq2 CGGTATCGGTGATCGATCGCA
  • Questions
  • Which stretch of these sequences can best be
    aligned to each other? (Answer)
  • Can this alignment be extended? (Answer)
  • Can you identify a repetitive sequence of 4 bases
    that keeps occurring in both sequences? (Answer)

29
Sequence Alignments
  • Dot plot for two DNA sequences
  • Questions
  • 1. Which stretch of these sequences can best be
    aligned to each other?
  • Answer: The longest continuous diagonal line in
    your dot plot: ATCGGTGATCG
  • 2. Can this alignment be extended?
  • Answer: Yes, it can be extended as shown below.
    2 nucleotides are not aligned and may possibly
    be substitutions.
  • 3. Can you identify a repetitive sequence of 4
    bases that keeps occurring in both sequences?
  • Answer: ATCG; this can be deduced from the
    repeating short diagonal lines.
  • 4. What can you attribute all the other plotted
    dots to?
  • Answer: They are the result of random sequence
    similarities.
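
The dot plot exercise can be reproduced with a short Python sketch: it fills the matrix for Seq1 (along the top) and Seq2 (down the side), prints it with * for a dot and . for a blank, and reports the longest unbroken diagonal, which should match the answer to question 1 above. The function names are illustrative.

```python
SEQ1 = "CGATCGCGTAATCGGTGATCGGC"
SEQ2 = "CGGTATCGGTGATCGATCGCA"


def dot_plot(top: str, side: str) -> list[list[bool]]:
    """matrix[i][j] is True where side[i] == top[j]
    (top sequence along the first row, side sequence down the first column)."""
    return [[a == b for a in top] for b in side]


def print_plot(top: str, side: str, matrix: list[list[bool]]) -> None:
    print("  " + " ".join(top))
    for base, row in zip(side, matrix):
        print(base + " " + " ".join("*" if dot else "." for dot in row))


def longest_diagonal(top: str, side: str, matrix: list[list[bool]]):
    """Longest unbroken diagonal run of dots.
    Returns (matched bases, start in top, start in side)."""
    best = (0, 0, 0)
    for i in range(len(side)):
        for j in range(len(top)):
            # only measure a run from its first (upper-left) dot
            if matrix[i][j] and (i == 0 or j == 0 or not matrix[i - 1][j - 1]):
                n = 0
                while (i + n < len(side) and j + n < len(top)
                       and matrix[i + n][j + n]):
                    n += 1
                if n > best[0]:
                    best = (n, j, i)
    n, j, i = best
    return top[j:j + n], j, i


m = dot_plot(SEQ1, SEQ2)
print_plot(SEQ1, SEQ2, m)
print(longest_diagonal(SEQ1, SEQ2, m))  # ('ATCGGTGATCG', 10, 4)
```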

30
Computational Sequence Alignments
  • We've looked at manual alignment of short
    sequences and at the dot plot. However, manual
    alignment cannot be done for lengthy and highly
    variable sequences; for these, computer-aided
    alignment is needed.
  • How can computer-aided alignments be done?
  • To enable computer-aided alignment, algorithms
    called dynamic programming algorithms are used.
  • Two common dynamic programming algorithms
    approach alignment differently:
  • 1. Local alignment: Smith-Waterman algorithm
  • 2. Global alignment: Needleman-Wunsch algorithm
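
Both algorithms fill a score matrix by dynamic programming. The local (Smith-Waterman) version never lets a cell fall below zero and takes the best cell anywhere in the matrix, while the global (Needleman-Wunsch) version penalizes leading gaps and reads its score from the bottom-right cell. A minimal Python sketch, assuming a simple scheme of +1 per match, -1 per mismatch and -2 per gap position (illustrative values); it returns scores only, without the traceback that recovers the alignment itself.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Optimal pairwise alignment score by dynamic programming.

    local=False -> Needleman-Wunsch (global: whole of a against whole of b)
    local=True  -> Smith-Waterman (local: best-scoring pair of substrings)
    Uses a linear gap penalty of `gap` per gap position.
    """
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised
        for i in range(1, rows):
            F[i][0] = i * gap
        for j in range(1, cols):
            F[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may start afresh anywhere
                best = max(best, score)
            F[i][j] = score
    return best if local else F[-1][-1]


print(align_score("ACGT", "AGT"))              # global: 1
print(align_score("ACGT", "AGT", local=True))  # local: 2 (just "GT")
print(align_score("TTTACGTAAA", "GGACGTGG", local=True))  # 4 ("ACGT")
```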

31
Computational Sequence Alignments
  • What is the difference between global and local
    alignments?
  • Local alignment: Smith-Waterman algorithm
  • Global alignment: Needleman-Wunsch algorithm
  • The Smith-Waterman algorithm is currently the
    more widely used of the two because real
    biological sequences are usually similar in
    localized portions rather than over their entire
    lengths.
  • Examples:
  • genes from different organisms with similar
    exons but different intron structures
  • proteins that share only certain domains
  • Alignments can have gaps, which represent
    mutations. The ability to add gaps is required
    as sequences diverge.
  • So how do we know that an alignment is
    meaningful?

32
Computational Sequence Alignments
  • How do we know that an alignment is meaningful?
  • Insertions and deletions are slow evolutionary
    processes; therefore the addition of gaps MUST be
    controlled, to avoid obtaining a large proportion
    of matches simply by inserting large numbers of
    gaps.
  • Gap penalties are given to control the addition
    of gaps. The penalty system can be constant (the
    same penalty for a gap of any length) or
    proportional (a penalty that grows with gap
    length).
  • Scores are given for matches, while penalties are
    given for the addition of gaps.
  • The alignment algorithm then searches for the
    alignment with the best score.
  • Like the dot plot, a simple system such as the
    above does not seem to fully consider divergence
    (i.e. point mutations); only deletions and
    insertions seem to be considered.
  • How can we get around this problem?
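
The difference between constant and proportional gap penalties shows up when one fixed alignment is scored both ways. In this Python sketch (a hypothetical helper; +1 per match, -1 per mismatch and a penalty of 2 are illustrative values), a constant penalty is charged once per gap whatever its length, while a proportional penalty is charged for every gap position.

```python
def score_alignment(top: str, bottom: str, match: int = 1, mismatch: int = -1,
                    gap_penalty: int = 2, proportional: bool = True) -> int:
    """Score two already-aligned rows of equal length ('-' marks a gap).

    proportional=True : every gap position costs gap_penalty, so longer gaps
                        cost more.
    proportional=False: each gap (a run of '-' in one row) costs gap_penalty
                        once, whatever its length.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0
    current_gap = None  # which row the current run of gap columns is in
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            row = "top" if x == "-" else "bottom"
            if proportional or row != current_gap:
                score -= gap_penalty
            current_gap = row
        else:
            score += match if x == y else mismatch
            current_gap = None
    return score


top, bottom = "ACGTTTAC", "ACG---AC"
print(score_alignment(top, bottom))                      # 5 - 3*2 = -1
print(score_alignment(top, bottom, proportional=False))  # 5 - 2   =  3
```

Many alignment programs combine the two ideas in an affine penalty: a cost for opening a gap plus a smaller cost for each additional gap position.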

33
Computational Sequence Alignments
  • How do we know that an alignment is meaningful?
    (cont.)
  • Point mutations can result in substitutions (one
    base or residue changing into another), as
    opposed to deletions or insertions.
  • A matrix called a substitution matrix can be used
    to model the possible changes and provide
    quantitative values for changes arising from
    point mutations.
  • The values for substitution can take similarity
    into consideration, such as physico-chemical
    properties for amino acids or transition
    mutations for nucleic acids.
  • But there is still a probability that a search
    result is random, especially for large
    databases. How can we be certain that the
    alignment achieved is the expected result?

Amino acid substitution matrix example
Nucleic acid substitution matrix example
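A nucleotide substitution matrix can be as simple as a 4 x 4 table. The Python sketch below builds one in which transitions (A↔G, C↔T) are penalized less than transversions; the values (+2 match, -1 transition, -2 transversion) are illustrative assumptions, not a published matrix.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = "ACGT"


def substitution_score(x: str, y: str, match: int = 2,
                       transition: int = -1, transversion: int = -2) -> int:
    """Score an aligned pair of nucleotides. Transitions (purine<->purine or
    pyrimidine<->pyrimidine) get a milder penalty than transversions."""
    if x == y:
        return match
    pair = {x, y}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return transition
    return transversion


MATRIX = {(x, y): substitution_score(x, y) for x in BASES for y in BASES}

# Print the 4 x 4 matrix
print("   " + "  ".join(BASES))
for x in BASES:
    print(x + " " + " ".join(f"{MATRIX[(x, y)]:2d}" for y in BASES))


def score_ungapped(a: str, b: str) -> int:
    """Sum of matrix scores over an ungapped alignment of a and b."""
    return sum(MATRIX[(x, y)] for x, y in zip(a, b))


print(score_ungapped("ACGT", "GCGC"))  # two transitions: 2 + 2 - 1 - 1 = 2
print(score_ungapped("ACGT", "CCGA"))  # two transversions: 2 + 2 - 2 - 2 = 0
```
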
34
Computational Sequence Alignments
  • How do we know that an alignment is meaningful?
    (cont.)
  • How can we be certain the alignment achieved is
    the expected result?
  • The alignments produced are statistically
    evaluated.
  • As an example, the BLAST program reports a value
    called the Expectation (E) value:
  • the number of different alignments with scores
    equivalent to or better than S (the score of the
    alignment in question) that are expected to
    occur by chance in a database search.
  • The lower the E value, the more significant the
    score.
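
BLAST's E value is based on Karlin-Altschul statistics, E = K·m·n·e^(-λS), where S is the alignment score, m the query length, n the total length of the database, and K and λ constants that depend on the scoring system. The minimal Python sketch below uses hypothetical parameter values to show E shrinking as S grows, and growing in proportion to database size. BLAST itself additionally corrects the lengths for edge effects and reports normalized (bit) scores, which this sketch omits.

```python
import math


def expect_value(score: float, m: int, n: int, K: float, lam: float) -> float:
    """Karlin-Altschul estimate of the number of alignments scoring >= score
    expected by chance: E = K * m * n * exp(-lambda * score)."""
    return K * m * n * math.exp(-lam * score)


# Hypothetical values: K and lambda depend on the scoring system used.
K, LAM = 0.1, 0.3
m, n = 300, 10_000_000  # query length and total database length (hypothetical)

for s in (40, 60, 80):
    print(s, f"{expect_value(s, m, n, K, LAM):.2e}")  # E falls as S rises
```

Doubling n doubles E, so the same score is less significant when a larger database is searched.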

35
Sequence Alignments
  • What is the rationale in doing an alignment?
  • Proteins perform most cellular functions.
  • The structure of a protein is an important
    determinant of its function.
  • If proteins share a similar structure, then they
    may also share a similar function.
  • We know that sequences with 30% similarity share
    a similar fold (Chothia & Lesk 1986).

36
Sequence Alignments
  • What is the rationale of doing an alignment?
  • If proteins share a similar function, then they
    may also share a similar structure.

37
Sequence Alignments
  • What is the rationale in doing an alignment?
  • If proteins share a similar structure, then they
    may also share a similar sequence.
  • But our interest here is in NUCLEIC ACID sequences.
  • So what is the relevance?

39
Sequence Database Searching
  • What is a sequence database?
  • What are we searching for, and how do we search
    for something in sequence databases?

40
Sequence Database Searching
  • What is a sequence database?
  • A collection of biological macromolecular
    sequences.
  • Sequences can be organized by organism, protein
    family, source, etc.
  • Covered in Topic 2. Example: NCBI GenBank.
  • What are we searching for, and how do we search
    for something in sequence databases?

41
Sequence Database Searching
  • What is a sequence database?
  • A collection of biological macromolecular
    sequences
  • Sequences can be organized by organism, protein
    family, source, etc.
  • Covered in Topic 2. Example: NCBI GenBank.
  • What are we searching for, and how do we search
    for something in sequence databases?
  • We are searching for sequence similarity.
  • We can search for sequence similarity by
    comparing an input (query) sequence against
    sequences in the database.
  • This comparison is done by aligning the query
    sequence to the database sequences; one tool we
    can use is BLAST.
  • How is this alignment relevant biologically?

42
Sequence Database Searching
  • What is BLAST?
  • Basic Local Alignment Search Tool
  • Implements heuristics to approximate the
    Smith-Waterman algorithm and search for
    high-scoring alignments.
  • The alignment scores are then statistically
    evaluated; one example is the E value discussed
    previously.
  • BLAST is actually a family of programs.
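
The heuristic starting point of BLAST-style searching can be sketched as finding short words shared exactly by the query and a database sequence. The Python sketch below is a simplified illustration, not BLAST's implementation: it uses exact DNA words of a hypothetical length w = 4 and groups the hits by diagonal, the same diagonals seen in a dot plot. BLAST then extends promising hits into longer alignments and evaluates them statistically, steps omitted here.

```python
from collections import defaultdict


def word_hits(query: str, subject: str, w: int = 4) -> list[tuple[int, int]]:
    """All (query_pos, subject_pos) pairs at which the two sequences share an
    identical word of length w."""
    index = defaultdict(list)  # word -> positions in the query
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


def hits_by_diagonal(hits: list[tuple[int, int]]) -> dict[int, list[tuple[int, int]]]:
    """Group hits by diagonal (subject_pos - query_pos); several hits on one
    diagonal suggest a region worth extending into an alignment."""
    groups = defaultdict(list)
    for i, j in hits:
        groups[j - i].append((i, j))
    return dict(groups)


hits = word_hits("ACGTTGCA", "TTACGTTGAA")
print(hits)                    # [(0, 2), (1, 3), (2, 4)]
print(hits_by_diagonal(hits))  # {2: [(0, 2), (1, 3), (2, 4)]}
```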

44
BLAST
  • How do we use BLAST?

45
BLAST
  • How do we use BLAST?

(1) Select the BLAST program; (2) input the
sequence (query); (3) choose the database to
search; (4) choose optional parameters. Then
click BLAST.
46
BLAST
  • Is that it?

47
BLAST
  • Is that it?... YES and NO. Let's look at some
    considerations and strategies for BLAST
    searching.

48
BLAST
  • Some considerations and strategies
  • Input sequence and search database: what is it
    that you're really interested in? Finding
    similarity alone, or identifying homologs?
    Finding homologs only, or perhaps trying to find
    out whether genes with similar sequences encode
    proteins with available structures? The answers
    to these types of questions influence which
    search program you should use and which database
    to search.

Protein vs. Nucleotide?
49
BLAST
  • Some considerations and strategies
  • Are you interested in something quite specific?

50
BLAST
  • Some considerations and strategies
  • Did you forget to turn something on/off?
  • Sequence filters: Low-complexity regions have
    fewer sequence characters in them because of
    repeats of the same sequence character or
    pattern. These sequences produce artificially
    high-scoring alignments that do not accurately
    convey sequence relationships in sequence
    similarity searches. Regions of low complexity or
    repetitive sequences may be readily visualized in
    a dot matrix analysis of a sequence against
    itself. Low-complexity regions with a repeat
    occurrence of the same residue can appear on the
    matrix as horizontal and vertical rows of dots
    representing repeated matches of one residue
    position in one copy of the sequence against a
    series of the same residue in the second copy.
    Repeats of a sequence pattern appear in the same
    matrix as short diagonals of identity that are
    offset from the main diagonal. Such sequences
    should be excluded from sequence similarity
    searches.
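
To see how low-complexity regions can be detected automatically, the Python sketch below masks any window whose base composition has low Shannon entropy. This is a simplified stand-in (the window size of 12 and threshold of 1.5 bits are arbitrary assumptions), not the DUST or SEG filters that BLAST applies to nucleotide and protein queries respectively. Note that it catches runs of a single base but not a repeated balanced pattern such as ACGTACGT..., which needs a different test.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Compositional complexity of a window in bits
    (0 = one letter only, 2 = all four bases equally frequent)."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 1.5) -> str:
    """Replace every base lying in a low-entropy window with 'N'."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            masked[start:start + window] = "N" * window
    return "".join(masked)


seq = "ACGTTGCAAGT" + "A" * 12 + "CGATGCTAGC"  # hypothetical query with a poly-A run
print(seq)
print(mask_low_complexity(seq))
```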

51
BLAST
  • Some considerations and strategies
  • Did you forget to turn something on/off?
  • Options and parameter settings

52
Output of BLAST Searches
  • What are the components of a BLAST search output?
  • Example: blastn vs. blastx (GenBank AF390557)

blastn
This section: overview of the output alignments
blastx
53
Output of BLAST Searches
  • What are the components of a BLAST search output?
  • Example: blastn vs. blastx (GenBank AF390557)

blastx
blastn
This section: list of hits (alignments). Read
more about interpreting the output.
54
Output of BLAST Searches
  • What are the components of a BLAST search output?
  • Example: blastn vs. blastx (GenBank AF390557)

blastx
blastn
This section: the alignments
55
Output of BLAST Searches
  • To be a significant match, a database sequence
    that is listed in the program output should have
    a small E (expect value) and a reasonable
    alignment with the query sequence (or
    translations of protein-encoding DNA sequences
    should have these same features).
  • The E of the alignment score between the
    sequences gives the statistical chance that an
    unrelated sequence in the database or a random
    sequence could have achieved such a score with
    the query sequence, given as many sequences as
    there are in the database. The smaller the E, the
    more significant the alignment. A cutoff value in
    the range of 0.01-0.05 may be used (Pearson
    1996). In genome comparisons, a more stringent
    cutoff score (10^-100 to 10^-20) may be used to find
    sequences that align very well with the query
    sequence. However, the alignment should also be
    examined for absence of repeats of the same
    residue or residue pattern because these patterns
    tend to give false high alignment scores.
  • Filtering of low-complexity regions from the
    query sequence in a database search helps to
    reduce the number of false positives. The
    alignment should also be examined for reasonable
    amino acid substitutions and for the appearance
    of a believable alignment.
  • To gain further confidence that the alignment
    between the query and database sequences is
    significant, either the query sequence or the
    matched database sequence may be shuffled many
    times, and each random sequence may be realigned
    with the other unshuffled sequence to obtain a
    score distribution for a set of unrelated
    sequences. This distribution may then be used to
    evaluate the significance of the true alignment
    score.
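
The shuffling procedure described above can be sketched in Python: compute the real Smith-Waterman score, then the scores of many shuffled copies of the query (same composition, order destroyed), and compare. The scoring values, the number of shuffles and the example sequences are illustrative assumptions.

```python
import random


def sw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def shuffle_test(query: str, subject: str, n_shuffles: int = 200, seed: int = 1):
    """Compare the real alignment score with scores obtained after shuffling
    the query. Returns (real score, z-score, empirical p-value)."""
    rng = random.Random(seed)
    real = sw_score(query, subject)
    letters = list(query)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        null_scores.append(sw_score("".join(letters), subject))
    mean = sum(null_scores) / n_shuffles
    sd = (sum((s - mean) ** 2 for s in null_scores) / (n_shuffles - 1)) ** 0.5
    z = (real - mean) / sd if sd > 0 else float("inf")
    # counting the real score as one extra sample avoids reporting p = 0
    p = (1 + sum(s >= real for s in null_scores)) / (n_shuffles + 1)
    return real, z, p


query = "ACGTTGCATGCAAGTC"
subject = "TTT" + query + "GGG"  # hypothetical database sequence
print(shuffle_test(query, subject))
```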

56
BLAST
  • Carrying out a BLAST search
  • Select and copy the sequence from the GenBank
    database here.
  • Go to the BLAST page and carry out database
    searches using the above sequence.
  • First carry out a search against a nucleotide
    database.
  • Which BLAST programs can you use? Name two
    possibilities. (Answer)
  • Next carry out a search against a protein
    database
  • Which BLAST program should you use? (Answer)
  • (i) Can you further narrow down the search? (ii)
    Also, if you were to search for genes that code
    for proteins with representative 3D structures,
    how would you conduct such a search? (Answer)

57
BLAST
  • Answers to questions on carrying out a BLAST
    search
  • First carry out a search against a nucleotide
    database.
  • Which BLAST programs can you use? Name two
    possibilities.
  • Answer: blastn and tblastx. tblastn is not a
    correct answer: although the database it searches
    is a nucleotide database, it takes a protein
    query, whereas the input sequence AF390557 is a
    DNA sequence.
  • Next carry out a search against a protein
    database
  • Which BLAST program should you use?
  • Answer: blastx
  • (i) Can you further narrow down the search? (ii)
    Also, if you were to search for genes that code
    for proteins with representative 3D structures,
    how would you conduct such a search?
  • Answer: (i) Yes, searches returning a very large
    number of hits can still be narrowed down. A
    carefully annotated protein sequence database
    (e.g., PIR, SwissProt) will provide a more
    manageable output list of matched sequences, and
    these proteins have probably been observed in the
    laboratory i.e., the genes do produce a protein
    product in cells. However, investigators may also
    wish to expand the search to include predicted
    genes from gene annotations of genomic sequences
    that are frequently entered into the DNA sequence
    translation databases (e.g., DNA sequences in the
    GenBank DNA sequence databases automatically
    translated into protein sequences and placed in
    the GenPept protein sequence database). To
    compare a protein or predicted protein sequence
    to EST sequences, the ESTs should be translated
    into all six possible reading frames. (ii) Such a
    search can be carried out by choosing PDB as the
    database option. This will limit the blastx
    search to only protein sequences which have known
    3D structures in the PDB.
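
blastx (and tblastx) translate a nucleotide query in all six reading frames, three on each strand, which is also what the answer above recommends for ESTs. A minimal Python sketch of six-frame translation with the standard genetic code (NCBI translation table 1); the names are hypothetical, and the frame labels follow one common convention (+1 to +3 on the given strand, -1 to -3 on its reverse complement).

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons from base 0; 'X' marks an unknown codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Translations of frames +1..+3 (given strand) and -1..-3 (reverse
    complement), each starting at offset 0, 1 or 2."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


for frame, protein in six_frames("ATGGCCTAA").items():
    print(frame, protein)
```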

58
BLAST
  • Carrying out a BLAST search
  • Retrieve the sequence provided and use it for
    your BLAST search.
  • See the GenBank page here for the sequence.
    Change the format of the view to FASTA by
    selecting FASTA from the dropdown menu marked
    Display (see here). Use this sequence for a
    BLAST search.
  • Questions
  • - Identify the sequence which is used. What is
    this DNA usually used for? (Answer)
  • - Search for suitable primers to use for PCR.
    Which program can you use? (Answer)
  • - Identify restriction sites which can be found
    on this DNA. How many fragments will a digestion
    with the restriction enzyme BsaI generate? In
    order to answer this question, you will need to
    draw on any general web skills you already have
    to find the appropriate resources. BLAST is not
    the tool to use in such a case. (Answer)

59
BLAST
  • Carrying out a BLAST search
  • Questions
  • - Identify the sequence which is used. What is
    this DNA usually used for?
  • Answer: pBR322 plasmid. It is used as a cloning
    vector for protein (IG-lambda) expression.
  • - Search for suitable primers to use for PCR.
    Which program can you use? What is the largest
    product size from a possible primer pair found
    using a default search?
  • Answer: The Primer-BLAST program can be used.
    The largest possible product is 986 bp.
  • - Identify restriction sites which can be found
    on this DNA. How many fragments will a digestion
    with the restriction enzyme BsaI generate?
  • Answer: One such tool that can be used is
    NEBcutter. Cutting the pBR322 sequence with BsaI
    will generate 3 fragments of DNA due to cleavage
    at 2 sites in the sequence.
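
Fragment counts depend on topology: a linear molecule cut at k sites yields k + 1 fragments, whereas a circular molecule such as a plasmid yields k fragments (a single cut only linearizes it). The count of 3 fragments from 2 sites above therefore corresponds to the sequence being treated as linear. A small Python sketch with a hypothetical 1,000 bp molecule and hypothetical cut positions (not the real BsaI sites in pBR322):

```python
def digest_fragments(length: int, cuts: list[int], circular: bool) -> list[int]:
    """Fragment lengths after cutting a molecule of `length` bp.

    `cuts` are 0-based positions in (0, length); a cut at p falls between
    bases p-1 and p. A linear molecule cut at k sites gives k + 1 fragments;
    a circular one gives k fragments.
    """
    cuts = sorted(set(cuts))
    if not cuts:
        return [length]
    if circular:
        # each fragment runs from one cut to the next, wrapping past the origin
        return [(cuts[(k + 1) % len(cuts)] - cuts[k]) % length or length
                for k in range(len(cuts))]
    bounds = [0] + cuts + [length]
    return [bounds[k + 1] - bounds[k] for k in range(len(bounds) - 1)]


# Hypothetical 1,000 bp molecule with two recognition sites
print(digest_fragments(1000, [100, 400], circular=False))  # [100, 300, 600]
print(digest_fragments(1000, [100, 400], circular=True))   # [300, 700]
```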

60
SNPs
  • A SNP (single nucleotide polymorphism,
    pronounced "snip") is a DNA sequence variation
    that occurs when a single nucleotide (A, T, C, or
    G) in the genome (or other shared sequence)
    differs between members of a species (or between
    paired chromosomes in an individual). SNPs
    comprise the largest known class of human
    genetic variation.
  • SNPs may occur:
  • within coding sequences of genes,
  • in non-coding regions of genes, or
  • in the intergenic regions between genes.
  • SNPs within a coding sequence will not
    necessarily change the amino acid sequence of the
    protein that is produced, due to the degeneracy
    of the genetic code (refer to the codon table
    discussed earlier); such changes result in silent
    (synonymous) mutations.
  • Non-synonymous changes can result in:
  • Missense change → a different amino acid is
    encoded
  • Nonsense change → a premature STOP codon
  • Why are SNPs important? If the changes result in
    non-functional gene products or no gene products,
    a disease state may possibly be the end result.
  • How can we find SNPs? The easiest and most used
    method of discovering SNPs in sequence data is
    to align two sequences from the DNA of two
    individuals and look for high-quality sequence
    differences.
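
The synonymous / missense / nonsense distinction can be checked mechanically by translating the affected codon before and after the change. A minimal Python sketch, assuming two aligned, gap-free, uppercase coding sequences that start in frame at their first base; the sequences are hypothetical, and each difference is evaluated on its own against the reference codon.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snps(ref: str, alt: str) -> list[tuple[int, str, str, str]]:
    """Report each difference as (position, codon change, amino acid change,
    type), where type is synonymous, missense, nonsense or stop-loss."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for pos, (r, a) in enumerate(zip(ref, alt)):
        if r == a:
            continue
        codon_start = pos - pos % 3
        ref_codon = ref[codon_start:codon_start + 3]
        if len(ref_codon) < 3:
            continue  # trailing partial codon: cannot be classified
        offset = pos - codon_start
        alt_codon = ref_codon[:offset] + a + ref_codon[offset + 1:]
        ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
        if ref_aa == alt_aa:
            kind = "synonymous"
        elif alt_aa == "*":
            kind = "nonsense"      # premature stop codon
        elif ref_aa == "*":
            kind = "stop-loss"     # stop codon changed to an amino acid
        else:
            kind = "missense"
        results.append((pos, f"{ref_codon}>{alt_codon}", f"{ref_aa}>{alt_aa}", kind))
    return results


ref = "ATGGCTTGGAAATAA"  # hypothetical reference: M A W K stop
alt = "ATGGCCTGAGAATAA"  # hypothetical second individual
for snp in classify_snps(ref, alt):
    print(snp)
```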

61
BLAST
  • Carrying out a BLAST search
  • Select and copy the sequence from this link.
  • Go to the BLAST page and carry out a search for
    SNPs on the above sequence.
  • Observe the output. How is it different from
    previous BLAST searches you have carried out?
    Correlate the output with what you know about
    SNPs.

62
Ribonucleic Acids
  • RNA molecules play crucial roles in molecular
    biology.
  • Known functions include:
  • Information storage
  • Catalysis
  • Regulatory roles
  • Protein synthesis
  • Diversity of functions associated with the RNA
    World hypothesis
  • Potential applications:
  • Molecular scaffolding (nanotechnology)
  • Drug targets (riboswitches/ribosomes)
  • RNA interference (RNAi)

The Economist, June 16th-22nd 2007
63
RNA From Sequence to Function
  • What is a crucial determinant of functionality
    for functional RNAs?

64
RNA From Sequence to Function
  • What is a crucial determinant of functionality
    for functional RNAs?
  • For functional RNAs, like for proteins, the 3D
    structure is crucial for biological function.

65
RNA Structure
  • What are the major factors involved in
    stabilizing the structure of RNA?
  • Base stacking and hydrogen bonding contribute to
    the stabilization of nucleic acid (including RNA)
    structure.
  • RNA bases can form hydrogen bonds with each
    other, resulting in:
  • canonical Watson-Crick interactions
    (complementary pairings)
  • non-canonical interactions
  • Hydrogen-bonded base interactions are therefore
    crucial elements of a nucleic acid's 3D
    structure.

66
RNA Base Interactions

32 pairs, e.g. purine-pyrimidine base pairs (10)
after I. Tinoco, Jr. In Appendix 1 of The RNA
World (R. F. Gesteland, J. F. Atkins, Eds.),
Cold Spring Harbor Laboratory Press, 1993, pp.
603-607.
67
RNA Structure
  • Base stacking and hydrogen bonding contribute to
    the stabilization of nucleic acid (including RNA)
    structure.
  • RNA bases can form hydrogen bonds with each
    other, resulting in:
  • canonical Watson-Crick interactions
    (complementary pairings)
  • non-canonical interactions
  • Hydrogen-bonded base interactions are therefore
    crucial elements of a nucleic acid's 3D
    structure.
  • 3 levels of RNA structure:
  • Primary sequence, secondary structure, tertiary
    structure.

From the Arabic word Qanun, which in this context is
better rendered as "rule" than by its literal
meaning, "law".
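One way to see the difference between the three levels is to write the first two down for a small, hypothetical hairpin. In the Python sketch below, the primary structure is just the sequence string, and the secondary structure is a list of paired positions rendered in dot-bracket notation (a common text format in which matching brackets mark paired bases and dots mark unpaired ones). Neither representation holds the 3D information that defines the tertiary structure. The toy sequence and the function name `pairs_to_dot_bracket` are illustrative assumptions.

```python
def pairs_to_dot_bracket(length: int, pairs: list) -> str:
    """Write nested 1-based base pairs (i, j) as a dot-bracket string.

    Hypothetical helper for illustration only.
    """
    symbols = ["."] * length
    for i, j in pairs:
        if not 1 <= i < j <= length:
            raise ValueError(f"invalid pair {(i, j)}")
        if symbols[i - 1] != "." or symbols[j - 1] != ".":
            raise ValueError(f"a base in {(i, j)} is already paired")
        symbols[i - 1], symbols[j - 1] = "(", ")"
    # One bracket type can only encode nested pairs, so reject crossing
    # pairs (pseudoknots), which this simple notation cannot represent.
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                raise ValueError(f"pairs {(i, j)} and {(k, l)} cross")
    return "".join(symbols)


sequence = "GGGAAAUCCC"            # primary structure (toy hairpin)
pairs = [(1, 10), (2, 9), (3, 8)]  # secondary structure: which bases pair
print(sequence)
print(pairs_to_dot_bracket(len(sequence), pairs))  # (((....)))
# Tertiary structure (3D atomic coordinates) is not captured by either line.
```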
68
RNA Structure
  • How do we get from sequence to structure?
  • How can we predict the structure of RNA?

69
RNA Structure
  • How do we get from sequence to structure?
  • Complex (non-helical) RNA structures are not easy
    to predict. Reliable structural information comes
    from X-ray crystal structures.
  • Commonly, only secondary-structure-level
    interactions are predicted, to give some insight
    into what the functional structure may look like.
  • However, such methods lack the detail that an
    actual structural model can give, such as the
    exact orientation of bases and the specific
    atomic interactions that occur.
  • Such interaction data are important because RNA
    bases can be involved in non-canonical
    interactions that differ from the canonical
    Watson-Crick interactions.
  • How can we predict the secondary structure of
    RNA?
  • Several programs which calculate the
    thermodynamics of folding (energies of the base
    interactions) can be used.
  • One such program is mfold by Michael Zuker.
  • The reliability of a prediction can be assessed
    using multiple alignments and by comparison with
    other predictions and known structures.
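mfold predicts secondary structure by minimising the folding free energy, using dynamic programming and experimentally derived energy parameters. As a simpler stand-in that shows how such programs search over possible foldings, the Python sketch below implements the Nussinov base-pair maximisation algorithm: it uses the same dynamic-programming idea but scores a structure only by its number of pairs, not by its energy. It is not mfold's method. The allowed pairs (Watson-Crick plus G-U), the minimum hairpin loop of three bases and the function name `nussinov` are assumptions for illustration.

```python
# Pairs allowed in this toy model (assumption): Watson-Crick plus G-U wobble.
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple:
    """Nussinov-style dynamic programming: maximise nested base pairs.

    Returns (maximum number of pairs, one optimal list of 1-based pairs).
    min_loop is the minimum number of unpaired bases closed by a hairpin.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0, []
    # best[i][j] = maximum number of pairs in seq[i..j] (0-based, inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j is left unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in CAN_PAIR:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + best[k + 1][j - 1] + 1)
            best[i][j] = score

    # Traceback: recover one set of pairs that achieves the optimum.
    pairs = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain any allowed pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in CAN_PAIR:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + best[k + 1][j - 1] + 1:
                    pairs.append((k + 1, j + 1))
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return best[0][n - 1], sorted(pairs)


print(nussinov("GGGAAAUCCC"))  # (3, [(1, 9), (2, 8), (3, 7)])
```

For this toy sequence the sketch pairs G3 with U7 even though the all-G-C set (1-10, 2-9, 3-8) also has three pairs, because counting pairs ignores how much each pair and its stacked neighbours contribute. Energy-based programs such as mfold weigh exactly these contributions.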

70
RNA Secondary Structure Prediction: the mfold program
  • Predicting the secondary structure of non-coding
    RNA
  • Copy the sequence here as input for the mfold
    program. All other parameters can be left at
    default settings.
  • Questions
  • How many paired bases are you able to observe in
    the predicted structure?
  • How many bases are unpaired?
  • Name the two types of structures in which these
    unpaired bases can be found. What type of
    secondary structure do you think can be observed
    for regions with canonical Watson-Crick base
    pairing?
  • Are you able to observe any base pairings that
    are non-canonical (non-Watson-Crick)? If yes, how
    many?
  • Having answered the previous two questions, are
    you really able to distinguish a canonical from a
    non-canonical pairing using the secondary
    structure diagram alone?

71
RNA Secondary Structure Prediction: the mfold program
  • Predicting the secondary structure of non-coding
    RNA
  • How many paired bases are you able to observe in
    the predicted structure?
  • Answer: 29 pairs, i.e. 58 paired bases.
  • How many bases are unpaired?
  • Answer: 27.
  • Name the two types of structures in which these
    unpaired bases can be found. What type of
    secondary structure do you think can be observed
    for regions with canonical Watson-Crick base
    pairing?
  • Answer: Unpaired bases are found in bulges and
    loops. Regions with canonical Watson-Crick
    pairings are most likely helical.
  • Are you able to observe any base pairings that
    are non-canonical (non-Watson-Crick)? If yes, how
    many?
  • Answer: 4.
  • Having answered the previous two questions, are
    you really able to distinguish a canonical from a
    non-canonical pairing using the secondary
    structure diagram alone?
  • Answer: Not really. Although a GU base pair is
    obviously non-canonical, GC and AU base pairs
    that may in fact be non-canonical cannot be
    identified from the secondary structure alone.
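The counting in these answers can be reproduced with a short Python sketch, including the arithmetic that turns 29 pairs into 58 paired bases. It assumes the predicted structure has been written out in dot-bracket (Vienna) notation, a common text format for secondary structures; whether a particular mfold output is available in that form should be checked. The function names and the toy hairpin are hypothetical. Like the secondary structure diagram itself, the sketch can only flag G-U pairs and outright mismatches from base identity; a G-C or A-U pair with non-canonical geometry looks identical to a Watson-Crick one here.

```python
WC_OR_GU = {"AU", "UA", "GC", "CG", "GU", "UG"}


def parse_dot_bracket(structure: str) -> list:
    """Return sorted 1-based (i, j) pairs from a nested dot-bracket string."""
    stack, pairs = [], []
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def summarise_structure(sequence: str, structure: str) -> dict:
    """Count paired/unpaired bases and flag pairs that are not Watson-Crick."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = sequence.upper()
    pairs = parse_dot_bracket(structure)
    pair_bases = [(i, j, seq[i - 1] + seq[j - 1]) for i, j in pairs]
    # A hairpin loop is closed by a pair (i, j) with only dots between i and j;
    # other unpaired bases sit in bulges, interior/multibranch loops or ends.
    hairpins = [(i, j) for i, j in pairs
                if j - i > 1 and set(structure[i:j - 1]) == {"."}]
    return {
        "pairs": len(pairs),
        "paired_bases": 2 * len(pairs),  # every pair involves two bases
        "unpaired_bases": structure.count("."),
        "gu_pairs": [(i, j) for i, j, b in pair_bases if b in {"GU", "UG"}],
        "other_pairs": [(i, j) for i, j, b in pair_bases if b not in WC_OR_GU],
        "hairpin_loop_bases": sum(j - i - 1 for i, j in hairpins),
    }


# Hypothetical toy hairpin with one G-U pair at positions 2 and 9.
print(summarise_structure("GGGAAAUCUC", "(((....)))"))
```

For the toy input this reports 3 pairs (6 paired bases), 4 unpaired bases all in the hairpin loop, and the G-U pair at (2, 9); the two G-C pairs are reported as ordinary pairs whatever their real 3D geometry.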

72
Analyses for RNA sequence data
  • Is predicting the secondary structure the only
    analysis we can do for RNA sequence data?

73
Analyses for RNA sequence data
  • Is predicting the secondary structure the only
    analysis we can do for RNA sequence data?
  • No.
  • Genomic data can be analysed for the presence of
    the many known types of non-coding or functional
    RNA, as well as of novel, yet-to-be-discovered
    functional RNA sequences.
  • This is appreciably more difficult than
    predicting protein-coding genes. Why?
  • Currently, there are no widely used,
    general-purpose methods.
  • Such investigations are still highly exploratory
    and remain the domain of experts in the field.

74
Post-session Questions
  • What are nucleic acids?
  • What types of nucleic acids are there?
  • What functions do nucleic acids have?
  • What sort of information do nucleotide sequences
    carry?
  • What can be done with DNA sequences?
  • What can be done with RNA sequences?
  • Is molecular structure important for RNA
    sequences?
  • What is a sequence alignment?
  • What is the relationship between an alignment
    and biological function?
  • Is extracting the encoded information for protein
    synthesis the only sequence analysis which can be
    done?

75
Self Study and Self Assessment
  • The self-study module for this series of lectures
    on the analysis of nucleotide sequences is
    available for download from SPIN. The file (this
    file) is in PowerPoint Show (.pps) format.
  • The self-assessment quiz is accessible from
    within the SPIN interface.
  • Both of these materials are for self-study and
    self-assessment only and DO NOT contribute to
    your final grade for this course.
  • Also explore the references and texts listed in
    the course information file and reading list.
  • Explore resources made available via the
    self-study material.

76
Further Reading
  • Recommended textbook (Lesk, 2nd Ed.)
  • Basics: Chapter 1, pages 1-59
  • Sequence alignments: Chapter 5 (pages 242-270)
    and Chapter 1 (pages 21-59)
  • Other textbooks
  • Baxevanis & Ouellette, 3rd edition: Chapters 5-7
  • Pevsner