Genome Databases and Analysis - PowerPoint PPT Presentation

Slides: 30
Provided by: IITB8
Transcript and Presenter's Notes



1
Genome Databases and Analysis
With the advent of genome sequencing technology,
biological research now has easy and fast access
to the complete DNA sequences of many organisms.
This DNA sequence information, when stored in
databases, can be used for comparative genomics
research.
  • Surabhi Agarwal

2
Master Layout Part 1
This animation consists of 3 parts:
Part 1: Genome sequencing protocol
Part 2: Genome databases
Part 3: Genome alignment and its analysis
Extract the DNA of the organism whose genome is
to be sequenced.
Fragment the genomic DNA and integrate it with
Bacterial Artificial Chromosome (BAC) vectors.
Sequence the BAC fragments using DNA sequencing
techniques.
GTTCTGGGACCTTTTCAAACTGAAGAGAGGAGGCTGGCTGCATCATGGGA
GAAGAGACTATTGGGAAGAAGTTACCTGCAACTACAGCAACTCCAGACTC
ATCAAAAACAGAAATGGACAGCAGGACAAAGAGCAAGGATTACTGCAAAG
TAATATTTCCATATGAGGCACAGAATGATGATGAATTGACAATCAAAGAA
GGAGATATAGTCACTCTCATCAATAAGGACTGCATCGACGTAGGCTGGTG
GGAAGGAGAGCTGAACGGCAGACGAGGCGTGTTCCCCGATAACTTCGTGA
AGTTACTTCCACCGGACTTTGAAAAGGAAGGGAATAGACCCAAGAAGCCA
CCGCC
DNA sequences are determined and stored in
databases for future use.
Generate a detailed physical map of the genome,
with clones derived from each chromosome
organized in a series of contigs.
[Figure labels: Genomic DNA; DNA fragments; Completed DNA sequence]
3
Definitions of the components, Part 1: Genome
sequencing protocol
1. Genome: The complete hereditary information of
an organism is referred to as its genome.
2. Restriction enzyme: An enzyme that cleaves
double-stranded or single-stranded DNA into
smaller fragments at specific recognizable DNA
sequences called restriction sites.
3. Bacterial Artificial Chromosomes (BACs): DNA
constructs that are useful for cloning purposes.
These cloning vectors can carry DNA inserts of
around 150-350 kbp and have been extremely useful
in genome sequencing projects.
4. Sticky ends: A cohesive or sticky end of DNA
refers to a DNA molecule that has a 3′ or 5′
overhang after it has been cleaved by a
restriction enzyme. When the insert and the
cloning vector are cut with the same enzyme,
their overhangs are complementary and can
therefore easily anneal with each other.
5. DNA ligase: An enzyme that repairs or joins
single-stranded breaks (nicks) or discontinuities
in double-stranded DNA.
6. Recombinant BAC: A BAC vector that carries
recombinant DNA, i.e., plasmid DNA integrated
with the foreign DNA to be cloned.
4
Definitions of the components, Part 1: Genome
sequencing protocol
7. Contigs: A set of overlapping DNA fragments
derived from a single genetic source. Contigs are
mapped to deduce the complete chromosome
sequence.
8. Shotgun sequencing: A DNA sequencing method in
which a long DNA fragment is first broken down
into smaller fragments. Each small fragment is
then sequenced using established DNA sequencing
protocols such as Sanger's chain termination
method. This fragmentation-sequencing protocol is
repeated several times, with enzymes of different
specificities, to obtain multiple overlapping
reads. The overlapping ends of different reads
are then assembled by computer programs.
9. Pyrosequencing: A DNA sequencing strategy based
on the real-time detection of the pyrophosphate
released when a nucleotide complementary to the
template is added to a growing DNA strand.
10. Physical map: A map that gives the distances,
in base pairs, between landmarks along the DNA is
known as a physical map.
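The shotgun sequencing and contig definitions above can be illustrated with a toy assembler. This is a simplified sketch, not the method used in real genome projects: it assumes error-free reads taken from one strand and uses a greedy strategy (repeatedly merge the two reads with the longest suffix-prefix overlap). The function names, the `min_overlap` parameter and the three example reads (cut from the start of the example sequence in the Master Layout) are illustrative assumptions.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if below min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Greedily merge reads into contigs using their longest overlaps."""
    # Drop reads contained entirely within another read, then exact duplicates.
    contigs = [r for r in reads if not any(r != o and r in o for o in reads)]
    contigs = list(dict.fromkeys(contigs))
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlaps left: the remaining strings are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["GTTCTGGGACC", "GGACCTTTTCAAAC", "TTCAAACTGAAGAG"]
print(greedy_assemble(reads))  # ['GTTCTGGGACCTTTTCAAACTGAAGAG']
```

Greedy merging can be misled by repeats longer than the reads; production assemblers use more careful graph-based methods.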
5
Step 1 Genome Sequencing
[Figure labels: Genomic DNA; restriction enzyme; DNA digested into fragments; DNA fragments with appropriate sticky ends; BAC vector treated with the same restriction endonuclease; DNA ligase; recombinant BAC; electroporation; E. coli cells; transformed E. coli cells]
Action
Audio Narration
Description of the action
The genomic DNA is cleaved into smaller fragments
by restriction enzymes, which cut the DNA at
specific sequences known as restriction sites.
The BAC vector is cleaved at its restriction site
using the same restriction endonuclease. A DNA
fragment with suitable sticky ends is then
annealed to the BAC vector and joined to it by
DNA ligase. This recombinant DNA is then
introduced into bacterial cells such as E. coli,
for example by electroporation.
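Cutting the genomic DNA and the vector with the same enzyme is what makes their sticky ends compatible. The sketch below illustrates the cutting step under stated assumptions: it uses EcoRI (recognition site GAATTC, cut between G and A, leaving 5′-AATT overhangs) as an example enzyme, models only the top strand, and digests a short example sequence. The function and constant names are hypothetical.

```python
import re

# Example enzyme (assumption): EcoRI recognizes GAATTC and cuts between
# G and A on each strand (G^AATTC), leaving 5'-AATT sticky ends.
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # cut position within the site on the top strand


def find_sites(seq: str, site: str = ECORI_SITE) -> list[int]:
    """Return the 0-based start of every occurrence of the recognition site."""
    # The lookahead also reports overlapping occurrences.
    return [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]


def digest(seq: str, site: str = ECORI_SITE, cut_offset: int = ECORI_CUT_OFFSET) -> list[str]:
    """Cut the top strand at every site and return the fragments in order."""
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[start:end] for start, end in zip(bounds, bounds[1:])]


insert_dna = "ACGTAAGAATTCTAGTTTTCAATTGTTCAACTTTTCTTTCCTAG"
for fragment in digest(insert_dna):
    print(fragment)
# Each fragment downstream of a cut starts with AATT, the single-stranded
# overhang that can pair with an EcoRI-cut vector.
```

Because GAATTC reads the same on both strands (5′ to 3′), the bottom strand is cut at the mirror position, which is what leaves four unpaired bases on each end.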
Sequential steps of an Experimental Process
Follow the steps in the animation. The animator
needs to redraw all figures in the final
animation. The pink curve, "Restriction Enzyme",
is shown attaching to the genomic DNA. Show the
genomic DNA breaking into fragments. Choose one
fragment and attach it to the ring-shaped "BAC
Vector" figure. The combined figure is then taken
up by the cell. Follow it with the last figure.
Biochemistry by A.L.Lehninger et al., 3rd edition
6
Step 2 Genome Sequencing
CONTIGS ARE IDENTIFIED AND MAPPED
GTTCTGGGACCTTTTCAAACTGAAGAGAGGAGGCTGGCTGCATCATGGGA
GAAGAGACTATTGGGAAGAAGTTACCTGCAACTACAGCAACTCCAGACTC
ATCAAAAACAG
BAC TO BE SEQUENCED IS FRAGMENTED
SEQUENCE OVERLAPS REVEAL FINAL SEQUENCE
FRAGMENTS ARE SEQUENCED AT RANDOM
Action
Audio Narration
Description of the action
The genomic DNA fragments of the library are then
organized into a physical map and aligned as
contigs, after which a particular contig is
identified for further sequencing. The BAC
selected for sequencing is fragmented, and the
fragments are sequenced using methods such as
Sanger's method and pyrosequencing. The sequence
of the clone is then deduced by aligning the
fragment sequences on their overlapping regions.
The entire genomic sequence is obtained once each
BAC has been sequenced in this manner. For a
detailed study of the various methods of
sequencing, refer to the OSCAR animation titled
"Genomics".
Sequential steps of an Experimental Process
Show figure 1, which has a cluster of aligned
fragments. Select one fragment and break it
further into smaller units. Show the unit being
sequenced by one of the 3 technologies. This is
followed by the next figure, which shows the
overlap in sequences. This is followed by the
fully sequenced DNA.
Biochemistry by A.L.Lehninger et al., 3rd edition
7
Master Layout Part 2
This animation consists of 3 parts:
Part 1: Genome sequencing protocol
Part 2: Genome databases
Part 3: Genome alignment and its analysis
GTTCTGGGACCTTTTCAAACTGAAGAGAGGAGGCTGGCTGCATCATGGGA
GAAGAGACTATTGGGAAGAAGTTACCTGCAACTACAGCAACTCCAGACTC
ATCAAAAACAGAAATGGACAGCAGGACAAAGAGCAAGGATTACTGCAAAG
TAATATTTCCATATGAGGCACAGAATGATGATGAATTGACAATCAAAGAA
GGAGATATAGTCACTCTCATCAATAAGGACTGCATCGACGTAGGCTGGTG
GGAAGGAGAGCTGAACGGCAGACGAGGCGTGTTCCCCGATAACTTCGTGA
AGTTACTTCCACCGGACTTTGAAAAGGAAGGGAATAGACCCAAGAAGCCA
CCGCC
Establish, maintain and disseminate the genomic
data of various organisms
Organization, search and retrieval of genomic data
8
Definitions of the components, Part 2: Genome
databases
  • Nucleotide database: A collection of records of
    the nucleotide sequences related to the DNA of
    an organism. This includes gene sequences,
    genome sequences, Expressed Sequence Tags (ESTs),
    etc.
  • Accession number: A unique identifier given to
    each sequence entry in a biological database,
    providing easy, direct access to the sequence of
    interest. The accession number itself stays the
    same when a record is revised; in the INSDC
    databases, an updated sequence receives a new
    version number, written as accession.version.
    The identifiers also vary with each database.
  • Entrez: An integrated search portal that enables
    the user to search many distinct biological
    databases simultaneously.
  • International Nucleotide Sequence Database
    Collaboration (INSDC): An international
    collaboration established for exchanging and
    sharing nucleotide sequence data. This includes
    the collection and dissemination of all DNA and
    RNA sequences generated by the members of the
    INSDC.
  • Word length (word size): The length of the
    initial run of nucleotides that must match
    exactly before the alignment of the two
    sequences can be extended. A larger word size
    makes the search faster but less sensitive; a
    smaller word size makes it more sensitive but
    slower.
  • Threshold (expect threshold): The maximum E-value
    for a hit to be reported, i.e., the expected
    number of matches of that quality that could
    occur by chance. The statistical significance of
    the results can be judged based on this
    parameter. The default value in most cases is
    10, which implies that in a random model, 10 such
    matches are expected to be found merely by
    chance.
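The word-size parameter controls the seeding step of a BLAST-style search. The sketch below shows only that step, under simplifying assumptions: it indexes every word of the query and reports exact word matches in the subject; the extension and scoring of seeds that BLAST then performs are not shown. The function name and example sequences are hypothetical.

```python
from collections import defaultdict


def find_word_hits(query: str, subject: str, word_size: int) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a word of length word_size matches exactly."""
    # Index every word (k-mer) of the query by its starting positions.
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)
    # Scan the subject and look each of its words up in the index.
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits


query, subject = "ACGTACGT", "TTACGTAA"
print(find_word_hits(query, subject, 4))  # [(3, 1), (0, 2), (4, 2), (1, 3)]
print(find_word_hits(query, subject, 6))  # []
```

This mirrors the trade-off in the definition: the smaller word finds more seeds (more sensitive, more work to extend), while the larger word skips this short shared region entirely.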

9
Definitions of the components, Part 2: Genome
databases
  • Gap penalty and gap extension: During the
    alignment of two or more nucleotide sequences, a
    gap is introduced where one sequence has an
    insertion or deletion relative to the other. The
    gap penalty is the deduction from the overall
    alignment score when a gap is introduced, while
    the gap extension penalty is deducted for
    extending an already existing gap.
  • Alignment score: Usually reported as the bit
    score, a normalized form of the raw score, it
    provides a comparative quantification of the
    quality of an alignment. The score increases
    with more residue matches and fewer mismatches.
    An alignment with a higher bit score is a better
    match.
  • Match-mismatch scores: During alignment of
    nucleotide sequences, the scoring system adds a
    reward score for matching bases and subtracts a
    penalty score for mismatching bases. In BLAST,
    these scores are given as a pair of values, such
    as 1, -2.
  • Percentage identity: The percentage of
    nucleotide bases that are identical between two
    sequences being compared.
  • E-value: The number of alignments with at least
    the same score that would be expected to occur by
    chance, as opposed to reflecting a biologically
    significant match. For a similarity search
    against a database, this value depends on the
    size of the database against which the sequence
    is compared. The closer the E-value is to zero,
    the higher the biological significance of the
    match.
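The scoring terms above can be combined into a single calculation. The sketch below scores an already-computed pairwise alignment (two equal-length strings, with '-' marking gaps) using the parameter values shown on the analysis slides: reward 1, penalty -2, gap existence 5 and gap extension 2. It assumes the NCBI BLAST convention that a gap of length k costs existence + k × extension, and it reports percentage identity over all alignment columns. It returns a raw score, not a bit score; the function name and example alignment are hypothetical.

```python
def score_alignment(top: str, bottom: str, reward: int = 1, penalty: int = -2,
                    gap_open: int = 5, gap_extend: int = 2) -> tuple[int, float]:
    """Return (raw score, percentage identity) for a gapped pairwise alignment."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = identities = 0
    gap_in_top = gap_in_bottom = False  # was the previous column a gap in that sequence?
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if a == "-" or b == "-":
            top_gap = a == "-"
            # A new gap (not continuing one in the same sequence) pays the existence cost.
            if not (gap_in_top if top_gap else gap_in_bottom):
                score -= gap_open
            score -= gap_extend  # every gap position pays the extension cost
            gap_in_top, gap_in_bottom = top_gap, not top_gap
        else:
            gap_in_top = gap_in_bottom = False
            if a == b:
                score += reward
                identities += 1
            else:
                score += penalty
    return score, 100.0 * identities / len(top)


score, identity = score_alignment("ACGTACGT--ACGTACGT", "ACGTACGTAAACCTACGT")
print(score, round(identity, 1))  # 4 83.3
```

Here 16 matches (+16), one mismatch (-2) and one gap of length 2 (-(5 + 2 × 2) = -9) give a raw score of 4, with 15 identical columns out of 18.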

10
Step 1.a Submit a Sequence
NUCLEOTIDE DATABASE
SUBMIT A SEQUENCE
SEARCH THE DATABASE
ANALYSIS TOOLS
Submit your sequence here
VERIFYING
Albumin_S TATCTTTTCTATCAACCCCACAAAACTTTGGCACAATGAA
GTGGGTGACTTTTATTTCTCTTCTCCTTCTCTTCAGCTCTGCTTATTCCA
GGGGTGTGTTTCGTCGAGATACACGTAAGAATTCTAGTTTTCAATTGTTC
AACTTTTCTTTCCTAGACAAGAATGTTTCAGTAAACCTTGAATCATTAAT
GA
SUBMIT
Action
Audio Narration
Description of the action
To submit a sequence to a nucleotide database, it
must be entered at the submission site of any one
of the members of the International Nucleotide
Sequence Database Collaboration: NCBI, EMBL-EBI
and DDBJ. Upon submission, these entries are
verified for their source and publication
details.
Entries in Web-server
The animator needs to redraw all the images. Show
the layout of the database first. Then show a
clicking effect on "SUBMIT A SEQUENCE". Show the
sequence being entered in the white box, followed
by a clicking effect on "SUBMIT". Show the
"VERIFYING" sign in waiting mode. While the
verification is in progress, show the diagram on
the next slide.
11
Step 1.b Submit a Sequence
VERIFYING
Information exchange between databases takes
place for the verification process.
NCBI: National Center for Biotechnology Information
EBI: European Bioinformatics Institute
DDBJ: DNA Data Bank of Japan
Action
Audio Narration
Description of the action
Entries in Web-server
The newly entered sequences are exchanged
between the three member servers on a daily basis
and verified by them. This helps in keeping track
of the updates in sequencing information and
sharing data that is useful for research.
Re-create all the images and screen-shots. This
is the image that flashes in front of the screen
after the previous slide while the VERIFYING
button is in the wait mode.
12
Step 1.c Submit a Sequence
NUCLEOTIDE DATABASE
SUBMIT A SEQUENCE
SEARCH THE DATABASE
ANALYSIS TOOLS
Nucleotide Sequence Database
Submit your sequence here
Albumin TATCTTTTCTATCAACCCCACAAAACTTTGGCACAATGAAGT
GGGTGACTTTTATTTCTCTTCTCCTTCTCTTCAGCTCTGCTTATTCCAGG
GGTGTGTTTCGTCGAGATACACGTAAGAATTCTAGTTTTCAATTGTTCAA
CTTTTCTTTCCTAGACAAGAATGTTTCAGTAAACCTTGAATCATTAATGA
Insulin A  ID SG1  ACGTAAGAATTCTAGTTTTCAATTGTTCAACTTTTCTTTCCTAG
Insulin B  ID SG2  CTTTGGCACAATGAAGTGGGTGACTTTTATTTCTCTTCTCCTTCTCTTCAGCTCTGCTTATTCC
Albumin  ID SG3  TATCTTTTCTATCAACCCCACAAAACTTTGGCACAATGAAGTGGGTGACTTTTATTT
VERIFIED
SUBMIT
Action
Audio Narration
Description of the action
The verified sequence is then given an accession
number or a gene ID, which acts as the primary
key for identifying this entry in the database in
future.
Entries in Web-server
Re-create all the images. The rightmost part of
this screen is appended to the screen on the 10th
slide after the flash image of the 11th slide
disappears.
13
Step 2. Search Database
NUCLEOTIDE DATABASE
SUBMIT A SEQUENCE
SEARCH THE DATABASE
ANALYSIS TOOLS
Submit your query term
Select the Database
Nucleotide Sequence Database
NUCLEOTIDE
Serum Albumin
Albumin
LOCUS: 9291  LENGTH: 24158  ORGANISM: Homo sapiens
GENE NAME: ALB  LOCATION: Chromosome 4
JOURNAL: Journal of Science
SEQUENCE: TATCTTTTCTATCAACCCCACAAAACTTTGGCACAATGAAGTGGGTGACTTTTATTT
GENE GENOME EST SNP NUCLEOTIDE GEO DATASETS
SUBMIT
GEO DATASETS: The Gene Expression Omnibus
repository stores curated gene expression
DataSets as well as original Series and Platform
records.
GENOME: Searches the term in whole-genome
profiles. These genomes are divided into 6
organism groups.
SNP: Searches the database of single nucleotide
polymorphisms.
GENE: Searches the term in the set of genes
stored in the database.
EST: Contains sequences of expressed sequence
tags, or single-pass cDNA sequences.
NUCLEOTIDE: A collection of all nucleotide
sequences from a variety of sources.
Action
Audio Narration
Description of the action
Retrieval from Web-server
  • To search the database for a given gene, genome
    or nucleotide sequence, the user can enter the
    query term in the search box. The query term can
    be the gene name or an identifier for the gene.
    The user then selects the database from which
    the sequence is to be retrieved. These databases
    include:
  • Gene <Narrate content in the yellow box>
  • Genome <Narrate content in the yellow box>
  • EST <Narrate content in the yellow box>
  • SNP <Narrate content in the yellow box>
  • NUCLEOTIDE <Narrate content in the yellow box>
  • GEO DATASETS <Narrate content in the yellow box>
  • Once the user clicks on SUBMIT, the nucleotide
    sequence is shown along with a summary of the
    result.
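The same kind of database search can also be scripted through NCBI's Entrez E-utilities. The sketch below only builds request URLs for the `esearch` endpoint (find record IDs matching a query term) and the `efetch` endpoint (retrieve records, here as FASTA text); it sends nothing over the network. The query term and the record IDs in the example are placeholders, and NCBI's E-utilities usage guidelines (rate limits, API keys) should be checked before sending real requests.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL asking Entrez for the IDs of records in `db` that match `term`."""
    return f"{EUTILS_BASE}/esearch.fcgi?" + urlencode({"db": db, "term": term, "retmax": retmax})


def efetch_fasta_url(db: str, ids: list[str]) -> str:
    """URL that retrieves the given records from `db` as plain-text FASTA."""
    params = {"db": db, "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?" + urlencode(params)


# "nucleotide" is the Entrez database name for nucleotide sequence records.
print(esearch_url("nucleotide", "serum albumin AND Homo sapiens[Organism]"))
print(efetch_fasta_url("nucleotide", ["ID_1", "ID_2"]))  # placeholder IDs
# To send a request: urllib.request.urlopen(url).read().decode()
```

Splitting the search into an ID lookup followed by a fetch mirrors the two screens on this slide: the query returns matching entries, and the chosen entries are then retrieved with their sequences.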

Re-create all the images and screen-shots. The
yellow boxes are the audio narration for each
section. Do not display the yellow boxes, as they
are not part of the database animation; their
content needs to be narrated as mentioned in the
audio narration. Follow the steps as shown in the
animation.
14
Step 3.a Analysis Tools - Nucleotide Sequence
Identification
NUCLEOTIDE DATABASE
SUBMIT A SEQUENCE
SEARCH THE DATABASE
ANALYSIS TOOLS
Enter sequence 1:
TTTATTGTTTTCAATATCTATATAATGAAAAACTAATACTGAACAATTCA
ATGCTTATATACCCAAAAAT ATTTTACAATTA
SELECT A DATABASE: NUCLEOTIDE (options: NUCLEOTIDE, GENE, GEO, EST, SNP)
Word Size: 28
Threshold: 10
Gap penalty: Existence 5, Extension 2
Match-Mismatch Score: 1, -2
ALIGNMENT ALGORITHM (BLAST)
Action
Audio Narration
Description of the action
Re-create all the images and screen-shots. Follow
the steps as shown in the animation. Show the
click on "ANALYSIS TOOLS". Follow it with the
input of the sequence and the selection of
"NUCLEOTIDE" from the "SELECT A DATABASE"
drop-down list. Follow it with the input of the
rest of the parameters. Show a clicking effect on
the BLAST tool.
An unknown nucleotide sequence can be identified
by searching it against a suitable nucleotide
database. Input the sequence, and then select the
database against which the match search is to be
performed. Fill in the parameter values and then
click on the BLAST tool.
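The same search can be run from the command line with NCBI's BLAST+ `blastn` program. The sketch below only assembles the argument list using the parameter values shown on the slide (word size 28, expect threshold 10, gap existence 5 and extension 2, reward 1 and penalty -2). It assumes BLAST+ is installed, that the query is saved in a FASTA file (the name `query.fa` is hypothetical) and that NCBI's remote `nt` database is searched. BLAST accepts only certain gap-cost combinations for each reward/penalty pair, so other values may be rejected.

```python
import shlex


def blastn_command(query_fasta: str, db: str, word_size: int = 28, evalue: float = 10.0,
                   gapopen: int = 5, gapextend: int = 2, reward: int = 1, penalty: int = -2,
                   remote: bool = True) -> list[str]:
    """Build a BLAST+ blastn argument list with the slide's parameter values."""
    cmd = [
        "blastn",
        "-query", query_fasta,
        "-db", db,
        "-word_size", str(word_size),
        "-evalue", str(evalue),
        "-gapopen", str(gapopen),
        "-gapextend", str(gapextend),
        "-reward", str(reward),
        "-penalty", str(penalty),
        "-outfmt", "6",  # tabular output: one hit per line
    ]
    if remote:
        cmd.append("-remote")  # search on NCBI's servers instead of a local database
    return cmd


cmd = blastn_command("query.fa", "nt")
print(shlex.join(cmd))
# To actually run it (requires BLAST+): subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell-quoting problems when file names contain spaces.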
Analysis from database servers
15
Step 3.b Analysis Tools - Nucleotide Sequence
Identification
NUCLEOTIDE
IDENTIFICATION OF GENE: Homo sapiens afamin (AFM), mRNA
Identifies the name of the gene and the type of
nucleotide.
SEQUENCE ALIGNMENT (Query Start Position, Query
End Position, Subject Start Position, Subject End
Position)
Shows the alignment of the query nucleotide
sequence with the sequence of the identified
nucleotide.
ALIGNMENT SCORE: 437 bits
Bit score for the alignment, a normalized measure
used to compare scores with other hits.
E-Value: 2e-118
Measures how likely the two sequences are to align
by chance. The nearer this value is to 0, the
greater the biological significance of the match.
Percentage Identity: 100
Shows the proportion of bases that matched between
the query sequence and the hit.
Gaps: 0
Percentage of alignment positions occupied by
gaps.
Action
Audio Narration
Description of the action
Re-create all the images and screen-shots. Follow
the steps as shown in the animation. Each output
must be displayed separately.
Sequence identification through BLAST provides
various results after alignment, such as the gene
identification, alignment view, alignment score,
E-value, percentage identity and gaps.
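The bit score and E-value in this output are linked by Karlin-Altschul statistics: a raw score S is converted to a bit score S' = (λS - ln K) / ln 2, and for a search space of m × n bases the E-value is approximately E = m · n · 2^(-S'). The sketch below applies these two formulas. λ and K depend on the scoring system, and BLAST itself uses effective (edge-corrected) lengths, so the lengths and parameter values in the example are illustrative assumptions only.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw alignment score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least `bits`: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# Illustrative search space: a 1,000-base query against 1,000,000 database bases.
for bits in (10, 20, 40):
    print(bits, evalue_from_bits(bits, 1_000, 1_000_000))
```

Each extra bit halves the E-value, while doubling the database doubles it, which is why the same alignment can look less significant when searched against a larger database.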
Analysis from database servers
16
Step 3.c Analysis Tools - Nucleotide Sequence
Alignment
NUCLEOTIDE DATABASE
SUBMIT A SEQUENCE
SEARCH THE DATABASE
ANALYSIS TOOLS
Enter sequence 1:
TTTATTGTTTTCAATATCTATATAATGAAAAACTAATACTGAACAATTCA
ATGCTTATATACCCAAAAAT ATTTTACAATTA
Enter sequence 2:
AGTATATTAGTGCTAATTTCCCTCCGTTTGTCCTAGCTTTTCTCTTCTGT
CAACCCCACACGCCTTTGGCACAATGAAGTGGGTAACCTTTATTTCCCTT
C
Word Size: 3
Threshold: 10
Gap penalty: Existence 11, Extension 1
Match-Mismatch Score: 1, -2
ALIGNMENT ALGORITHM (BLAST)
Action
Audio Narration
Description of the action
Analysis from database servers
Re-create all the images and screen-shots. Follow
the steps as shown in the animation. Show the
clicking effect on the 3rd tab, "ANALYSIS TOOLS".
Input the 2 sequences one by one. Follow this by
inputting the parameters one at a time. Show the
clicking effect on the BLAST tool.
Alignment can also be performed between two given
nucleotide sequences. To align two sequences,
enter them in the input boxes. Enter the
necessary parameters, whose values will vary
according to the query. Then click on the
alignment tool.
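Pairwise local alignment of two sequences can be illustrated with the Smith-Waterman dynamic-programming algorithm. This is a swapped-in technique for illustration: BLAST uses a faster seed-and-extend heuristic with affine gap costs, whereas this sketch computes the exact best local alignment with a simpler linear gap penalty. The scores (match 1, mismatch -2, gap -2 per position), the function name and the example sequences are assumptions.

```python
def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -2, gap: int = -2):
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, aligned a, aligned b), using '-' for gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("ACGTTACGT", "ACGTACGT"))  # (6, 'ACGTTACGT', 'ACG-TACGT')
```

In the example, eight matches and one gap (8 - 2 = 6) beat the best ungapped match of five bases, so the algorithm chooses the gapped alignment.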
17
Step 3.d Analysis Tools - Nucleotide Sequence
Alignment
Alignment: the figure for the alignment of the 2
sequences, with gaps labeled "Gap"
E-value: 1e-64
Measure of the likelihood that the alignment
occurred by chance
Gaps: 5
Percentage of alignment positions occupied by
gaps
Score: 241 bits
Comparative measure of the quality of the
alignment
Action
Audio Narration
Description of the action
Pairwise alignment gives various kinds of results,
such as alignment views, alignment score,
dot-plot, E-value and percentage identity, among
many others.
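A dot plot compares two sequences by marking every pair of positions at which they share the same word; marks along a diagonal indicate similar regions, and parallel diagonals indicate repeats. The sketch below prints a minimal text dot plot, assuming exact word matches of a chosen window length; the function name and example sequences are hypothetical.

```python
def dot_plot(seq1: str, seq2: str, window: int = 1) -> list[str]:
    """Rows follow seq1, columns follow seq2; '*' marks identical words of length `window`."""
    rows = []
    for i in range(len(seq1) - window + 1):
        word = seq1[i:i + window]
        row = "".join(
            "*" if seq2[j:j + window] == word else "."
            for j in range(len(seq2) - window + 1)
        )
        rows.append(row)
    return rows


for line in dot_plot("ACGTTACGT", "ACGTACGT", window=3):
    print(line)
```

A larger window removes many of the chance single-base matches that would otherwise clutter the plot.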
Analysis from database servers
Re-create all the images and screen-shots. Follow
the steps as shown in the animation. Each output
must be displayed separately along with its
definition box.
18
Master Layout Part 3
This animation consists of 3 parts:
Part 1: Genome sequencing protocol
Part 2: Genome databases
Part 3: Genome alignment and its analysis
Tools available for comparing two genomes
Applications of comparing genomes
[Figure labels: Seq 1; Seq 2; Seq 3]
19
Definitions of the components, Part 3: Genome
alignment and its analysis
  • Orthologs: Genes in two different species that
    evolved from a single ancestral gene through
    speciation are known as orthologs. They often
    retain the same function.
  • Paralogs: Genes within a single organism that
    are related by gene duplication. After
    duplication, one copy may accumulate mutations
    and come to perform a separate function.
  • Homologs: Genes that share a common evolutionary
    origin are called homologs. Orthologs and
    paralogs are both kinds of homologs.
  • Gene order: The sequential arrangement of genes
    within an organism's genome.
  • Gene cluster: Genes that are involved in a
    common function of the organism tend to cluster
    together and are referred to as gene clusters,
    for example, genes related to certain metabolic
    pathways.
  • GC content: The percentage of guanine (G) and
    cytosine (C) bases in the genome of an organism.
    It provides a useful way to compare two given
    genomes.
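GC content is simple to compute. The sketch below returns the percentage of G and C among the unambiguous bases of a sequence; ignoring ambiguity codes such as N is an assumption made here, not a fixed standard, and the function name is hypothetical. Computing this value for two genomes gives one of the simple comparisons described above.

```python
def gc_content(seq: str) -> float:
    """Percentage of G + C among the A, C, G and T bases of a DNA sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / acgt


print(round(gc_content("ATGCGCNN"), 1))  # 66.7 (the two N bases are ignored)
```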

20
Definitions of the components, Part 3: Genome
alignment and its analysis
  • 7. Basic Local Alignment Search Tool (BLAST):
    An algorithm used to compare two given
    sequences, or one sequence against a database.
    The BLAST program used for nucleotide comparison
    is nucleotide BLAST (BLASTN).
  • 8. FAST-All (FASTA): Another algorithm developed
    to compare two given sequences based on gapped
    local alignments.
  • 9. Phylogeny: The study of the evolutionary
    relatedness of different groups of organisms,
    inferred from molecular sequence data such as
    nucleotide and protein sequences.
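Phylogenetic analyses often start from pairwise distances between aligned sequences. The sketch below computes the simple p-distance (fraction of differing sites, skipping gap columns) and the Jukes-Cantor correction d = -3/4 · ln(1 - 4p/3), which accounts for multiple substitutions at the same site under that model's equal-rate assumptions. Building a tree from such distances (for example with neighbor joining) is not shown; the function names and sequences are hypothetical.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if len(a) != len(b) or not pairs:
        raise ValueError("need two aligned sequences of equal length with ungapped columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(p, round(jukes_cantor(p), 4))  # 0.2 0.2326
```

The corrected distance is always larger than p, and the gap widens as sequences diverge, because more of the observed sites are likely to have changed more than once.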

21
Step 1 Comparative genomics - Tools
PipMaker: Server for aligning two genomes and
producing Percent Identity Plots. It uses BLASTZ,
a version of BLAST with parameters modified for
aligning entire genomes.
MUMmer: Tool for rapidly aligning whole genomes.
GeneOrder: Web-based interactive computational
tool to compare the order of genes in two genomes.
GenScan: Developed at MIT, GenScan is an online
program to identify complete gene structures in
genomic DNA.
AVID-VISTA: A collection of programs and databases
for comparative analysis of genomic sequences.
VISTA also has pre-computed whole-genome
alignments of different species.
Twinscan: A web-based tool from Washington
University for gene-structure prediction.
22
Step 1 Comparative genomics - Tools
Action
Audio Narration
Description of the action
Schematic for Comparative Genomics Tools
Re-create all the images and screen-shots. Follow
the steps as shown in the animation. Replace the
"Comparative Genomics Tools" in the previous
slides with these tabs one by one, and narrate
the explanation given on this slide.
  • Here, we present a brief summary of comparative
    genomics tools:
  • PipMaker: Server for aligning two genomes and
    producing Percent Identity Plots. It uses
    BLASTZ, a version of BLAST with parameters
    modified for aligning entire genomes.
  • GenScan: Developed at MIT, GenScan is an online
    program to identify complete gene structures in
    genomic DNA.
  • Twinscan: A web-based tool from Washington
    University for gene-structure prediction.
  • AVID: AVID-VISTA is a collection of programs and
    databases for comparative analysis of genomic
    sequences. VISTA also has pre-computed
    whole-genome alignments of different species.
  • GeneOrder: A web-based interactive computational
    tool to compare the order of genes in two
    genomes.
  • MUMmer: A tool for rapidly aligning whole
    genomes.
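Tools such as GeneOrder compare the order of genes in two genomes. The sketch below is not GeneOrder's method; it illustrates one simple measure, the number of breakpoints, i.e., pairs of genes that are adjacent in the first genome but no longer adjacent in the second. It assumes linear genomes, ignores gene orientation, and uses hypothetical gene names.

```python
def adjacencies(order: list[str]) -> set[frozenset[str]]:
    """Unordered pairs of genes that are neighbours in a gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def breakpoints(order1: list[str], order2: list[str]) -> int:
    """Count adjacencies of order1 that are not conserved in order2."""
    return len(adjacencies(order1) - adjacencies(order2))


genome1 = ["geneA", "geneB", "geneC", "geneD", "geneE"]
genome2 = ["geneA", "geneB", "geneE", "geneD", "geneC"]  # geneC..geneE inverted
print(breakpoints(genome1, genome2))  # 1
```

Because adjacencies are stored as unordered pairs, the inverted block keeps its internal neighbours (geneC-geneD, geneD-geneE), and only the junction geneB-geneC is broken.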

  • http://genes.mit.edu/GENSCAN.html
  • http://mblab.wustl.edu/software/twinscan/
  • http://mummer.sourceforge.net/
  • http://genome.lbl.gov/vista/index.shtml
  • http://pipmaker.bx.psu.edu/pipmaker/
  • http://binf.gmu.edu:8080/GeneOrder3.0/

23
Step 2 Comparative genomics
NUCLEOTIDE DATABASE
SUBMIT A SEQUENCE
SEARCH THE DATABASE
ANALYSIS TOOLS
Enter genome 1
Enter genome 2
AGTATATTAGTGCTAATTTCCCTCCGTTTGTCCTAGCTTTTCTCTTCTGT
CAACCCCACACGCCTTTGGCACAATGAAGTGGGTAACCTTTATTTCCCTT
C
TTTATTGTTTTCAATATCTATATAATGAAAAACTAATACTGAACAATTCA
ATGCTTATATACCCAAAAAT ATTTTACAATTA
COMPARATIVE GENOMICS TOOLS
Action
Audio Narration
Description of the action
Analysis from database servers
Re-create all the images and screen-shots. Follow
the steps as shown in the animation. Show the
input of the two sequences, and then follow it
with the clicking effect on "Comparative Genomics
Tools".
For comparative genome analysis, extract the full
genome sequences of interest. The servers of the
comparative genomics tools have text boxes to
upload these sequences. The user then needs to
click on the submit button for the tool.
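Genome sequences downloaded for such comparisons are usually in FASTA format: a header line starting with '>' followed by one or more sequence lines. Before uploading, it can help to check that each file parses into the expected records. The sketch below is a minimal parser, assuming that the first word of each header is a unique identifier (a repeated identifier would overwrite the earlier record); the function name and example records are hypothetical.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record's identifier (first word after '>') to its sequence."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # close the previous record
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


example = """>genome1 hypothetical record
ACGTACGT
acgt
>genome2
TTTTGGGG
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```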
24
Step 3 Comparative genomics Analysis
PipMaker dot plot: A Percent Identity Plot is the
visualization of the alignments retrieved from
PipMaker for similar regions in two DNA sequences.
WHOLE GENOME ALIGNMENT: The entire genomes of two
organisms can be aligned using tools such as
MUMmer and AVID-VISTA.
PREDICTED EXONS AND INTRONS: Comparative genomics
tools can also predict exons and introns on the
aligned genomes.
Action
Audio Narration
Description of the action
Schematic for Tool Output
Re-create all the images and screen-shots. Follow
the steps as shown in the animation. Each output
must be displayed separately along with its
definition box.
The output of the various comparative genomics
tools varies with the type of tool used. It may
be a dot plot from PipMaker, a whole-genome
alignment, or predicted exons and introns in the
alignment. For detailed analysis of these results,
users should visit the respective sites listed in
the references.
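A percent identity plot shows, along the first sequence, how similar each aligned region is to the second sequence. The sketch below is not PipMaker's algorithm (PipMaker computes its plots from its own BLASTZ alignments); it only illustrates the idea by computing percent identity in fixed windows of an existing gapped alignment and reporting each window's start as a position in the ungapped first sequence. The function name, window sizes and example alignments are hypothetical.

```python
def windowed_identity(top: str, bottom: str, window: int, step: int) -> list[tuple[int, float]]:
    """Percent identity per alignment window, keyed by start position in the ungapped top sequence."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(top) - window + 1, step):
        cols = zip(top[start:start + window], bottom[start:start + window])
        same = sum(a == b and a != "-" for a, b in cols)
        top_pos = start - top[:start].count("-")  # convert column index to sequence position
        results.append((top_pos, 100.0 * same / window))
    return results


print(windowed_identity("ACGTACGTAC", "ACGTACGTTT", window=5, step=5))
print(windowed_identity("AC--GTAC", "ACTTGTAC", window=4, step=4))
```

Plotting the second value of each pair against the first gives a coarse identity profile; conserved regions such as exons tend to appear as high-identity stretches.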
  • http://genes.mit.edu/GENSCAN.html
  • http://mblab.wustl.edu/software/twinscan/
  • http://mummer.sourceforge.net/
  • http://genome.lbl.gov/vista/index.shtml
  • http://pipmaker.bx.psu.edu/pipmaker/
  • http://binf.gmu.edu:8080/GeneOrder3.0/

25
Interactivity option 1: Align the genome sequences
of Potato Spindle Tuber Viroid and Hop Latent
Viroid
In the options to select the databases, opt for
Genome databases (2/3)
Click on the GenBank ID for the two organisms
under study (6)
Input the 2 genomes in any genome alignment
server of your choice (8)
Open the NCBI homepage in a web browser (1)
In the summary section, check the source
organism of the sequence (5)
Obtain the alignment of the two genomes (9)
Enter the term "Viroids" in the search box (3/2)
Click on the search button. Obtain a list of
completely sequenced viroids (4)
Click on the FASTA tab for the respective
entries. Obtain the complete genome sequence in
FASTA format (7)
Results
Boundary/limits
Interactivity Type Options
Remove the step number mentioned in red from the
bottom of each tab. Show all the steps in mixed
order. The user must click on the tabs in the
correct order. If the user clicks a tab that is
not in the right order, flash a message saying
"Try again".
All the tabs must end up arranged in the right
order.
Arrange the steps in the order in which they are
to be performed.
Steps 2 and 3 may be performed in either order.
26
Questionnaire
1. Which of these is NOT a nucleotide database?
Answers: a) NCBI b) PDB c) EMBL d) DDBJ
2. PipMaker compares two genomes by finding:
Answers: a) Gene order b) Cluster of Orthologous
Genes c) Percent Identity Plots d) All of the
above
3. Which is the tool for whole-genome alignment?
Answers: a) MUMmer b) PipMaker c) Both d) Neither
4. Exons can be predicted using which tool?
Answers: a) GenScan b) FASTA c) BLAST d) None of
the above
5. Which is the last step in a PCR reaction?
Answers: a) Annealing b) Elongation c)
Denaturation d) None of the above
27
Links for further reading
  • Reference websites
  • http://www.ebi.ac.uk/embl/
  • http://www.ddbj.nig.ac.jp/
  • http://www.ncbi.nlm.nih.gov
  • http://blast.ncbi.nlm.nih.gov/
  • www.icgeb.res.in/whotdr/presentation/comp-genomics.ppt
  • http://genome.crg.es/software/sgp2/
  • http://genes.mit.edu/GENSCAN.html
  • http://mblab.wustl.edu/software/twinscan/
  • http://mummer.sourceforge.net/
  • http://genome.lbl.gov/vista/index.shtml
  • http://pipmaker.bx.psu.edu/pipmaker/
  • http://binf.gmu.edu:8080/GeneOrder3.0/

28
Links for further reading
  • The following URLs were used for the animations:
  • http://genes.mit.edu/GENSCAN.html
  • http://mblab.wustl.edu/software/twinscan/
  • http://mummer.sourceforge.net/
  • http://genome.lbl.gov/vista/index.shtml
  • http://pipmaker.bx.psu.edu/pipmaker/
  • http://binf.gmu.edu:8080/GeneOrder3.0/
  • http://www.ebi.ac.uk/embl/
  • http://www.ddbj.nig.ac.jp/
  • http://www.ncbi.nlm.nih.gov
  • http://blast.ncbi.nlm.nih.gov/

29
Links for further reading
  • Books
  • Bioinformatics: Sequence and Genome Analysis by
    David W. Mount
  • Biochemistry by A.L.Lehninger et al., 3rd edition