Title: What is bioinformatics?
What is bioinformatics?
- Long definition: The study of the application of computer and statistical techniques to the management of biological information, including development of methods to search databases quickly, to analyze DNA sequence information, and to predict protein sequence and structure from DNA sequence data.
- Short definition: The management, analysis, and visualization of molecular, cellular, and genomic information.
Computational Biology
(Slide diagram relating Molecular Biology, Bioinformatics, Genomics, and Computer Science)
Genomics: what is it?
- Development and application of genetic mapping, sequencing, and computation (bioinformatics) to analyze the genomes of organisms.
- Sub-fields of genomics:
  - Structural genomics: genetic and physical mapping of genomes.
  - Functional genomics: analysis of gene function (and of the function of non-gene sequences).
  - Comparative genomics: comparison of genomes across species; includes structural and functional genomics.
  - Evolutionary genomics.
COMPARATIVE GENOMICS
Definition
- A comparison of gene numbers, gene locations, and the biological functions of genes in the genomes of different organisms, one objective being to identify groups of genes that play a unique biological role in a particular organism.
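The objective in this definition, finding groups of genes unique to one organism, can be sketched as set arithmetic over gene-family content. This is a minimal sketch that assumes each genome has already been reduced to a set of gene-family identifiers (for example, by homology search); the organism names and family IDs are hypothetical placeholders.

```python
from typing import Dict, Set


def unique_families(genomes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each organism, return the gene families found in no other genome."""
    result: Dict[str, Set[str]] = {}
    for name, families in genomes.items():
        # Union of the families present in every *other* genome.
        others = set().union(*(f for n, f in genomes.items() if n != name))
        result[name] = families - others
    return result


if __name__ == "__main__":
    # Hypothetical presence data: organism -> gene-family IDs.
    genomes = {
        "org1": {"F1", "F2", "F3"},
        "org2": {"F1", "F4"},
        "org3": {"F1", "F2"},
    }
    for org, fams in unique_families(genomes).items():
        print(org, sorted(fams))
```

Here F1 is shared by all three organisms, so only F3 (org1) and F4 (org2) come out as organism-specific.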
Some Terminology
- Homology: Homology is the relationship between any two characters (such as two proteins that have similar sequences) that have descended, usually through divergence, from a common ancestral character. Homologues are thus components or characters (such as genes or proteins with similar sequences) that can be traced through evolution to a common ancestor of the two organisms.
Homologues can be orthologues, paralogues, or xenologues.
- Orthologues are homologues that have evolved from a common ancestral gene by speciation. They usually have similar functions.
- Paralogues are homologues that are related by duplication within a genome followed by divergence. They often have different functions.
- Xenologues are homologues that are related by interspecies (horizontal) transfer of the genetic material for one of the homologues. Their functions are quite often similar.
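The orthologue/paralogue distinction above depends only on the kind of event at which two genes last shared an ancestor. Below is a minimal sketch, assuming a gene tree whose internal nodes are already labelled as speciation or duplication events; all node and gene names are hypothetical. Xenologues are not handled: detecting them requires locating a horizontal-transfer event on a branch, which this simple model does not represent.

```python
from typing import Dict, List


def path_to_root(node: str, parent: Dict[str, str]) -> List[str]:
    """Return the node followed by all of its ancestors, ending at the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def last_common_ancestor(a: str, b: str, parent: Dict[str, str]) -> str:
    """First ancestor of b (walking upward) that is also an ancestor of a."""
    ancestors_of_a = set(path_to_root(a, parent))
    for node in path_to_root(b, parent):
        if node in ancestors_of_a:
            return node
    raise ValueError(f"{a} and {b} are not in the same tree")


def classify_homologues(a: str, b: str, parent: Dict[str, str],
                        event: Dict[str, str]) -> str:
    """'orthologues' if the genes diverged at a speciation node,
    'paralogues' if they diverged at a duplication node."""
    kind = event[last_common_ancestor(a, b, parent)]
    return {"speciation": "orthologues", "duplication": "paralogues"}[kind]


if __name__ == "__main__":
    # Hypothetical gene tree: an ancestral duplication (D) produced copies A and B;
    # later speciation nodes (S1, S2) split each copy into human and mouse genes.
    parent = {
        "A_human": "S1", "A_mouse": "S1",
        "B_human": "S2", "B_mouse": "S2",
        "S1": "D", "S2": "D",
    }
    event = {"S1": "speciation", "S2": "speciation", "D": "duplication"}
    print(classify_homologues("A_human", "A_mouse", parent, event))  # orthologues
    print(classify_homologues("A_human", "B_human", parent, event))  # paralogues
```

Note that A_human and B_mouse also come out as paralogues: they come from different species, but their last common ancestor is the duplication.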
Analogues
- Analogues are non-homologous genes or proteins that have descended convergently from unrelated ancestors. They have similar functions, although they are unrelated in both sequence and structure.
Comparative Genomics
Two very large problems are immediately apparent in undertaking the sequencing of entire genomes. First, the vast number of species and the much larger size of some genomes make sequencing every genome in its entirety a non-optimal approach to understanding genome structure. Second, within a given species most individuals are genetically distinct in a number of ways. What does it actually mean, for example, to "sequence a human genome"? The genomes of two genetically distinct individuals differ in DNA sequence by definition. These two problems, and the potential for other novel applications, have given rise to new approaches which, taken together, constitute the field of comparative genomics.
Because all modern genomes have arisen from common ancestral genomes, the relationships between genomes can be studied with this fact in mind. This commonality means that information gained in one organism can be applied to other, even distantly related, organisms. Comparative genomics enables the application of information gained from facile model systems to agricultural and medical problems. The nature and significance of differences between genomes also provide a powerful tool for determining the relationship between genotype and phenotype, through comparative genomics combined with morphological and physiological studies.
The Role of Bioinformatics in Identification of Drug Targets from Bacterial and Fungal Genomes
Dr. Andrew E. DePristo, Director of Bioinformatics, Genome Therapeutics Corporation
Bacterial genomes are appearing at an ever-increasing rate, with a September 1999 listing by NCBI indicating 16 completed, 10 being annotated, and 55 being sequenced. Fungal genomes and proteomes are less prevalent, with one complete, a few nearly complete, and large collections of cDNA sequences available for about five organisms. This presentation will discuss the use of this bacterial and fungal genomic diversity, along with high-throughput bioinformatics tools, to attach confidence to certain functional predictions and to allow identification and targeting of essential genes that are unique to specific organisms.
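The idea of using genomic diversity to single out essential genes can be sketched as a filter over gene-family content. This is an illustrative filter under stated assumptions, not the method presented in the talk: it assumes essentiality calls and gene-family assignments are already available, and every identifier below is hypothetical.

```python
from typing import Dict, Set


def candidate_targets(essential: Set[str],
                      pathogens: Dict[str, Set[str]],
                      avoid: Set[str]) -> Set[str]:
    """Essential gene families present in every listed pathogen genome but
    absent from `avoid` (e.g. the host, or any genome the drug should spare)."""
    if not pathogens:
        raise ValueError("need at least one pathogen genome")
    shared = set.intersection(*pathogens.values())  # conserved across the panel
    return (essential & shared) - avoid


if __name__ == "__main__":
    essential = {"fam1", "fam2", "fam3"}          # hypothetical essentiality calls
    pathogens = {"p1": {"fam1", "fam2", "fam3", "fam4"},
                 "p2": {"fam1", "fam2", "fam4"}}
    host = {"fam2"}
    print(sorted(candidate_targets(essential, pathogens, host)))
```

Passing a single pathogen narrows the filter to targets specific to that organism; adding more pathogens to the panel favors broad-spectrum candidates.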
Methods (WET)
Introduction
A DNA walk of a genome represents how the frequency of each nucleotide of a complementary nucleotide pair (G/C or T/A) changes locally. This analysis implies measuring the local proportion of Gs within the GC content and of Ts within the TA content. Lobry was the first to propose this analysis (1996, 1999). Two complementary representations can be derived from the DNA walk: the cumulative TA-skew and GC-skew analyses.
Aim
By reading this description of the algorithm, a reader not trained in genomics should be able to redraw our graphs using the basic genometric data file posted on our web resource for each organism as a zip file (.zip).
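A common way to summarize "the local proportion of Gs within the GC content" is the skew (G - C)/(G + C), which equals 2·G/(G + C) - 1; the TA skew (T - A)/(T + A) is the analogous measure for Ts. The sketch below computes both per non-overlapping window. The window size is a free parameter, and dropping an incomplete final window is a choice made here, not something the text specifies.

```python
from typing import List, Tuple


def window_skews(seq: str, window: int) -> List[Tuple[int, float, float]]:
    """Return (start, GC skew, TA skew) for each complete, non-overlapping window.

    GC skew = (G - C) / (G + C); TA skew = (T - A) / (T + A).
    A window with no G/C (or no T/A) gets a skew of 0.0.
    """
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c, t, a = (chunk.count(base) for base in "GCTA")
        gc_skew = (g - c) / (g + c) if g + c else 0.0
        ta_skew = (t - a) / (t + a) if t + a else 0.0
        rows.append((start, gc_skew, ta_skew))
    return rows


if __name__ == "__main__":
    for row in window_skews("GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGAC", 10):
        print(row)
```

Positive GC skew marks windows where G outnumbers C; the cumulative curves described next are running sums of the same per-nucleotide contributions.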
1) DNA walk
1.1) Drawing a DNA walk by reading a sequence file nucleotide by nucleotide
A simple algorithm draws a DNA walk by assigning a direction to each nucleotide. We propose the following assignment, slightly different from Lobry's (Lobry, 1999): T, C, A, and G correspond to the E(ast), S(outh), W(est), and N(orth) directions, respectively. Reading the sequence nucleotide by nucleotide and following this rule, a path clearly emerges on the graph (Figure 1).
Figure 1. DNA walk of the sequence GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC. Starting from the bottom left (bold blue line), the curve ends at the bottom left (pink line).
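The walk rule translates directly into code. This is a minimal sketch; skipping symbols other than A, C, G, and T (such as N) is an assumption, since the text does not say how ambiguous bases are handled.

```python
from typing import List, Tuple

# Direction assignment from the text: T = East, C = South, A = West, G = North.
STEPS = {"T": (1, 0), "C": (0, -1), "A": (-1, 0), "G": (0, 1)}


def dna_walk(seq: str) -> List[Tuple[int, int]]:
    """Return the walk's coordinates, starting at the origin (0, 0)."""
    x, y = 0, 0
    path = [(x, y)]
    for base in seq.upper():
        if base not in STEPS:  # skip ambiguous symbols such as N
            continue
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


if __name__ == "__main__":
    fig1 = "GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC"
    walk = dna_walk(fig1)
    print(len(walk) - 1, "steps, ends at", walk[-1])  # 60 steps, ends at (0, 0)
```

The Figure 1 sequence contains 11 Ts, 11 As, 19 Gs, and 19 Cs, so the east/west and north/south steps cancel and the walk returns to its starting point, consistent with the caption.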
1.2) Drawing a DNA walk by slicing a sequence file into small windows
A quick way to draw this kind of graph, suggested by Lobry (1996), is to cut the genome into windows of equal length.
Figure 2. DNA walk of the same sequence as in Figure 1 (GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC). The sequence was sliced into 5-nucleotide windows, and only the fifth nucleotide of each window is plotted. The mean values of each window can also be used.
Comment: this method is less precise than the first one, but it can be used with spreadsheet software without affecting the final resolution of the curve at the genome level.
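The windowed version can be computed the way a spreadsheet would: one row per window holding base counts, with a running total. This sketch assumes the same T = E, C = S, A = W, G = N rule and plots the position reached at the end of each complete window; an incomplete final window is dropped, which is an assumption.

```python
from typing import List, Tuple


def windowed_walk(seq: str, window: int) -> List[Tuple[int, int]]:
    """Walk position at the end of each complete window (T=E, C=S, A=W, G=N).

    Each window is reduced to base counts, as in one spreadsheet row:
    the window moves the walk (T - A) steps east and (G - C) steps north.
    """
    seq = seq.upper()
    x, y = 0, 0
    points = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        x += chunk.count("T") - chunk.count("A")
        y += chunk.count("G") - chunk.count("C")
        points.append((x, y))
    return points


if __name__ == "__main__":
    fig1 = "GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC"
    for point in windowed_walk(fig1, 5):
        print(point)
```

For the 60-nucleotide Figure 1 sequence this yields 12 points, the first at (2, 1) and the last back at (0, 0). These are the same positions the nucleotide-by-nucleotide walk passes through at every fifth step, which is why the coarse curve keeps the genome-level shape.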
2) The cumulative TA- and GC-skew analyses
2.1) Drawing a cumulative TA- or GC-skew analysis by reading a sequence file nucleotide by nucleotide
Cumulative TA-skew analysis: assign to A, T, C, and G the S, N, nd (no direction), and nd directions, respectively. On the graph, after each nucleotide is read, the pointer moves one step eastward. If an A or a T is read, a further step is added southward or northward, respectively.
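The cumulative TA-skew rule can be sketched as follows. As in the description, every nucleotide (including C, G, and any other symbol) advances the pointer one step east, so x is simply the number of nucleotides read.

```python
from typing import List, Tuple


def cumulative_ta_skew(seq: str) -> List[Tuple[int, int]]:
    """Cumulative TA-skew curve as (x, y) points, starting at (0, 0).

    Every nucleotide moves the pointer one step east (x + 1);
    T adds a step north (y + 1), A a step south (y - 1); C and G add nothing.
    """
    x, y = 0, 0
    path = [(x, y)]
    for base in seq.upper():
        x += 1
        if base == "T":
            y += 1
        elif base == "A":
            y -= 1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(cumulative_ta_skew("TTAC"))  # [(0, 0), (1, 1), (2, 2), (3, 1), (4, 1)]
```

At any point, y equals the number of Ts minus the number of As read so far, so a rising stretch of the curve marks a T-rich region.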
Cumulative GC-skew analysis: assign to A, T, C, and G the nd, nd, S, and N directions, respectively. On the graph, after each nucleotide is read, the pointer moves one step eastward. If a C or a G is read, a further step is added southward or northward, respectively.
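The GC version follows the same pattern with G and C. In bacterial chromosomes, the minimum and maximum of the cumulative GC-skew curve are commonly read as approximate positions of the replication origin and terminus, so this sketch also reports where the extremes fall. Treat that reading as an interpretation layered on top of the drawing method described above.

```python
from typing import List, Tuple


def cumulative_gc_skew(seq: str) -> List[Tuple[int, int]]:
    """Cumulative GC-skew curve: one step east per nucleotide,
    plus one step north for G and one step south for C."""
    x, y = 0, 0
    path = [(x, y)]
    for base in seq.upper():
        x += 1
        if base == "G":
            y += 1
        elif base == "C":
            y -= 1
        path.append((x, y))
    return path


def skew_extremes(path: List[Tuple[int, int]]) -> Tuple[int, int]:
    """x positions of the first minimum and the first maximum of the curve."""
    x_min = min(path, key=lambda p: p[1])[0]
    x_max = max(path, key=lambda p: p[1])[0]
    return x_min, x_max


if __name__ == "__main__":
    curve = cumulative_gc_skew("CCCGGGGG")
    print(skew_extremes(curve))  # (3, 8)
```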
Methods (dry)
- Bioinformatics.
- Its tools (software)
Computational analysis in drug target discovery
- Shannon entropy is used as a measure of variation or change in a gene's expression over a time series. Genes that exhibit significant changes are regarded as good target candidates.
- Clustering is a method for grouping expression patterns by the similarity of their shapes.
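The text does not say how a time series is turned into the probability distribution that Shannon entropy needs. This sketch assumes each gene's values are sorted into equal-width bins (three by default, an arbitrary choice) and computes H = -Σ p·log2(p) over the bin frequencies. A flat profile scores 0, and higher scores indicate more variation.

```python
import math
from collections import Counter
from typing import Sequence


def expression_entropy(values: Sequence[float], n_bins: int = 3) -> float:
    """Shannon entropy (bits) of a time series after equal-width binning.

    A flat profile scores 0; a profile spread evenly over all bins
    scores log2(n_bins).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # no variation at all
    # Map each value to a bin index 0..n_bins-1 (the maximum goes in the top bin).
    bins = [min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1) for v in values]
    total = len(bins)
    return -sum((k / total) * math.log2(k / total) for k in Counter(bins).values())


if __name__ == "__main__":
    # Hypothetical expression profiles over six time points.
    profiles = {"geneA": [1, 1, 1, 1, 1, 1], "geneB": [1, 1, 1, 1, 1, 9],
                "geneC": [1, 3, 5, 7, 9, 5]}
    for gene, series in sorted(profiles.items(),
                               key=lambda kv: -expression_entropy(kv[1])):
        print(gene, round(expression_entropy(series), 3))
```

Ranking genes by this score puts the most variable profiles first, matching the slide's idea that genes with significant changes are good candidates.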
GCG History (tools)
Founded in 1982 as a service of the Department of Genetics at the University of Wisconsin, GCG became a private company in 1990 and was acquired by Oxford Molecular Group in 1997. The company was one of the pioneers of bioinformatics, and its Wisconsin Package sequence analysis tools are widely used and well regarded throughout the pharmaceutical and biotechnology industries and in academia. To support enterprise bioinformatics efforts, GCG developed SeqStore, its Oracle-based data management system. Desktop solutions are delivered to bench scientists through products such as MacVector and OMIGA.
GCG Wisconsin Package
(Screenshot: SeqLab Editor) SeqLab, free with the Wisconsin Package, provides a graphical interface to the Package's analysis tools plus project management capabilities. SeqLab's Editor enables you to enter sequences, view multiple sequence alignments, and select the sequence ranges to analyze.
Molecular biologists worldwide use the GCG Wisconsin Package as their software of choice for comprehensive sequence analysis. The Wisconsin Package meets research needs across disciplines, project teams, and labs to provide an enterprise-wide solution. Based on published algorithms from the fields of mathematical and computational biology, the Package includes tools for:
- Comparison
- Database Searching and Retrieval
- DNA/RNA Secondary Structure
- Editing and Publication
- Evolution
- Fragment Assembly
- Gene Finding and Pattern Recognition
- Importing and Exporting
- Mapping
- Primer Selection
- Protein Analysis
- Translation
PAUP version 4.0 is a major upgrade and new release of the software package for inference of evolutionary trees, available for Macintosh, Windows, UNIX/VMS, and DOS-based systems. The use of high-speed computer analysis of molecular, morphological, and/or behavioral data to infer phylogenetic relationships has expanded well beyond its central role in evolutionary biology and now encompasses areas as diverse as conservation biology, ecology, and forensic studies. The success of previous versions of PAUP (Phylogenetic Analysis Using Parsimony) has made it the most widely used software package for the inference of evolutionary trees.
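PAUP's name refers to parsimony: preferring the tree that requires the fewest character-state changes. The sketch below scores one fixed tree with Fitch's small-parsimony algorithm. It illustrates the criterion only; it is not PAUP's implementation and omits the tree search PAUP performs. Encoding the tree as nested pairs of aligned sequences is an assumption made for brevity.

```python
from typing import Set, Tuple, Union

# A leaf is an aligned sequence; an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, site: int) -> Tuple[Set[str], int]:
    """Fitch's algorithm for one alignment column: (candidate states, changes)."""
    if isinstance(tree, str):
        return {tree[site]}, 0
    left, right = tree
    left_states, left_cost = fitch_site(left, site)
    right_states, right_cost = fitch_site(right, site)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # No state in common: take the union and count one change.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree: Tree) -> int:
    """Minimum number of changes the tree needs, summed over all columns."""
    leaf = tree
    while not isinstance(leaf, str):
        leaf = leaf[0]
    return sum(fitch_site(tree, i)[1] for i in range(len(leaf)))


if __name__ == "__main__":
    # Two candidate topologies for the same four single-site "sequences".
    print(parsimony_score((("A", "A"), ("C", "C"))))  # 1
    print(parsimony_score((("A", "C"), ("A", "C"))))  # 2
```

Under parsimony the first grouping is preferred, because it explains the data with one change instead of two.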
Target Validation
- Target validation involves taking steps to prove that a DNA, RNA, or protein molecule is directly involved in a disease process and is therefore a suitable target for the development of a new therapeutic compound.
- Genes that do not belong to an established gene family can also be critical to disease processes, and they too need to be validated as potential drug targets.
Target validation and identification
- Computer-based drug design: beginning with protein engineering and analysis tools, we can identify and evaluate the target. With that information, we can then attack the target with a variety of tools to identify novel drug candidates. The complete suite of software products provides a seamless environment for working more efficiently.
Target validation and identification
- The computational component analyzes genomic sequences to produce 3D (structural) and functional annotations. Once annotated, sequences can be identified as potential drug targets for development.
- X-ray crystallography has become a central tool in modern drug and target discovery.
- These annotations, made from knowledge of predicted protein structure, are an important component in identifying potential targets, thereby facilitating successful and competitive drug discovery.
Outcomes/Benefits
- Provides first-pass information on the function of a putative protein, based on the presence of conserved protein sequence motifs.
- Advances in computer software technologies (bioinformatics) have also made comparative analysis of genomes an extremely powerful approach for functional genomics.
- These studies can also provide insights into how enzymes were recruited into a pathway.
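Conserved protein sequence motifs are often written as PROSITE-style patterns. The sketch below converts a simplified subset of that syntax (x, x(n), x(n,m), [..], {..}, and the < and > terminal anchors) into a Python regular expression and scans a protein. The example pattern [AG]-x(4)-G-K-[ST] is the commonly cited ATP/GTP-binding P-loop motif; the protein sequence is made up for illustration.

```python
import re
from typing import List, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE pattern into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Split an element such as "x(2,4)" into core "x" and repeat counts.
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        # Single letters and [..] classes are already valid regex syntax.
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_motif(protein: str, pattern: str) -> List[Tuple[int, str]]:
    """0-based start positions and matched text of non-overlapping hits."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(protein)]


if __name__ == "__main__":
    p_loop = "[AG]-x(4)-G-K-[ST]"
    print(prosite_to_regex(p_loop))               # [AG].{4}GK[ST]
    print(find_motif("MKGAGGVGKSTLL", p_loop))    # [(2, 'GAGGVGKS')]
```

A hit like this is only first-pass evidence of function. Short patterns can also match by chance, so a match supports an annotation rather than proving it.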
Outcomes/Benefits
- It will help us understand the genetic basis of diversity among organisms, both speciation and variation, events that are important aspects of evolutionary biology.
- Comparative genomics provides a powerful way to analyze sequence data.
- Indeed, there is already a long list of 'model' organisms, which allow comparative analyses in a variety of ways.
Outcomes/Benefits
- The very small vertebrate genome of the pufferfish provides a simple and economical way of comparing sequence data from mammals and fish, which represent a large evolutionary divergence, and so permits the identification of essential elements that are still present in both.
- These elements include genes and the associated machinery that controls their expression: elements that, in many cases, have survived the test of time.
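One simple way to look for elements "still present in both" is to scan a pairwise alignment of fish and mammalian sequence for windows of unusually high identity. This sketch assumes the two sequences are already aligned, with gaps written as '-'. The window size and identity threshold are arbitrary illustrative values. Real analyses would use a proper aligner and a statistically motivated cutoff.

```python
from typing import List


def conserved_windows(aln_a: str, aln_b: str, window: int,
                      min_identity: float) -> List[int]:
    """Start positions of alignment windows whose identity >= min_identity.

    A column counts as a match only if both rows carry the same base;
    gap columns never match.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [a == b and a != "-" for a, b in zip(aln_a.upper(), aln_b.upper())]
    hits = []
    for start in range(len(matches) - window + 1):
        identity = sum(matches[start:start + window]) / window
        if identity >= min_identity:
            hits.append(start)
    return hits


if __name__ == "__main__":
    fish = "ACGTACGTAA"      # hypothetical aligned fragments
    mammal = "ACGTACGTTT"
    print(conserved_windows(fish, mammal, window=4, min_identity=1.0))  # [0, 1, 2, 3, 4]
```

Overlapping hits such as these would normally be merged into a single conserved block before being examined as a candidate gene or regulatory element.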