Bioinformatics Methods and Applications

About This Presentation
Title:

Bioinformatics Methods and Applications

Description:

Bioinformatics Methods and Applications, Dr. Hongyu Zhang, Ceres Inc.

Slides: 38
Provided by: Lai120
Learn more at: https://hongyu.org
Transcript and Presenter's Notes

Title: Bioinformatics Methods and Applications


1
Bioinformatics Methods and Applications
  • Dr. Hongyu Zhang
  • Ceres Inc.

2
Goals of the talk
  • The major battlefields in bioinformatics
    research
  • The most popular weapons used in the battle

3
History
  • Human genome project
  • Overlapping with other branches
  • Computational Biology
  • Biocomputing
  • Biostatistics
  • Cheminformatics

4
The Central Dogma of Molecular Biology
DNA → (transcription) → RNA → (translation) → Protein
5
Major battlefields in bioinformatics
  • DNA
  • Genome sequencing
  • Gene discovery
  • mRNA
  • Micro-array analysis
  • Sequencing
  • Protein
  • Structure modeling and prediction
  • Proteomics

6
Major weapons
  • Computational algorithm
  • Hash method
  • Dynamic programming
  • String and Tree (binary, suffix)
  • Clustering
  • Probability and Statistical theory and methods
  • Bayes' theorem, Markov chains (HMM), principal
    component analysis
  • Monte Carlo simulation
  • Neural Network
  • Physical chemistry
  • Functions to describe the physical chemistry
    interactions in bio-molecules
  • Molecular mechanics, Molecular dynamics algorithm
  • Data storage and access
  • Databases: Oracle, MySQL, etc.
  • Web interface

7
Genome sequencing: Celera shotgun assembly
(Venter et al., 2001)
8
Gene discovery based on sequence comparison
  • Finding new genes based on their sequence
    similarity and evolutionary relationship to known
    genes
  • Methods
  • Hash-based database search methods, such as BLAST
    (PSI-BLAST), FASTA, BLAT, etc.
  • Sequence alignment using a dynamic programming
    algorithm
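
The hash-based search idea can be sketched as follows. This is only a minimal illustration of k-mer indexing and exact-match seeding, not how BLAST, FASTA, or BLAT are actually implemented; the function names, the toy sequences, and the word size k=4 are assumptions made for the example.

```python
from collections import defaultdict

def build_kmer_index(database, k):
    """Map every k-mer to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((seq_id, i))
    return index

def find_seeds(query, index, k):
    """Return (query offset, sequence id, database offset) for each exact k-mer hit."""
    seeds = []
    for q in range(len(query) - k + 1):
        # .get avoids inserting empty entries into the defaultdict
        for seq_id, d in index.get(query[q:q + k], []):
            seeds.append((q, seq_id, d))
    return seeds

# Toy database and query (hypothetical data)
database = {"seq1": "ACGTACGTGA", "seq2": "TTGACCA"}
index = build_kmer_index(database, k=4)
seeds = find_seeds("CGTGAC", index, k=4)
print(seeds)
```

Real tools extend such seeds into longer local alignments and score them statistically; the sketch stops at the lookup step.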

9
BLAST database search (http://www.ncbi.nih.gov/BLAST/)
[Figure: query sequence matched against database sequences]
10
Sequence alignment
  • Example

    BLAST
    BLA-T
  • Programs
  • CLUSTALW
  • DIALIGN

11
Dynamic programming algorithm
Sequence A (A1, A2, ..., Ai, ..., Am)
Sequence B (B1, B2, ..., Bj, ..., Bn)
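
A minimal sketch of global alignment by dynamic programming (the Needleman-Wunsch scheme) is given here. The slide does not state a scoring scheme, so match = +1, mismatch = -1, and a linear gap penalty of -1 are assumptions. Cell F[i][j] holds the best score for aligning A1..Ai with B1..Bj and is the maximum of three cases: align Ai with Bj, align Ai with a gap, or align Bj with a gap.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    m, n = len(a), len(b)
    # F[i][j] = best score aligning a[:i] with b[:j]
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap
    for j in range(1, n + 1):
        F[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # Ai aligned with Bj
                          F[i - 1][j] + gap,    # Ai aligned with a gap
                          F[i][j - 1] + gap)    # Bj aligned with a gap
    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[m][n], "".join(reversed(out_a)), "".join(reversed(out_b))

score, top, bottom = global_align("BLAST", "BLAT")
print(top)
print(bottom)
```

With these assumed scores, the strings "BLAST" and "BLAT" reproduce the alignment shown on the sequence alignment slide.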
12
Ab initio gene prediction methods
  • Statistics-based gene prediction
  • Nucleotide frequency distributions in the
    coding regions
  • Exon/intron boundary signals
  • Examples
  • GENSCAN, Burge and Karlin 1997
  • Fgenesh, Solovyev and Salamov 1994
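
The statistical idea can be illustrated with a log-odds score that compares nucleotide frequencies in coding regions against a background model. This is a deliberately simplified sketch: the frequency tables below are made-up values for illustration, and programs such as GENSCAN and Fgenesh use much richer models than single-nucleotide frequencies.

```python
import math

# Hypothetical frequencies, chosen only to illustrate the calculation
CODING = {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20}
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def log_odds(seq, coding=CODING, background=BACKGROUND):
    """Sum of log2(P(base | coding) / P(base | background)) over the sequence.

    A positive score means the sequence looks more like the coding model.
    """
    return sum(math.log2(coding[base] / background[base]) for base in seq)

gc_rich = log_odds("GCGCGC")
at_rich = log_odds("ATATAT")
print(gc_rich, at_rich)
```

Under these assumed tables a GC-rich window scores above zero and an AT-rich window scores below zero; a real predictor would combine such content scores with exon/intron boundary signals.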

13
Hybrid gene prediction method
  • Example: Celera Otto program
  • BLAST against the RefSeq database
  • BLAST against the EST database, other genomic
    sequences, etc.
  • GENSCAN, Fgenesh

14
Problems in Gene discovery
  • Example: given a cDNA sequence, find its true
    location in the genome map among many
    alternatives

[Figure: query transcript/protein aligned to genomic components]
15
Two-step solution
  1. BLAST search of the cDNA sequence against the
    whole genome map
  2. Using an LIS algorithm to find the correct
    genomic component hit
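
The second step can be sketched under the assumption that LIS stands for longest increasing subsequence; the slide does not expand the acronym. Each BLAST hit is reduced to a hypothetical (query start, genome start) pair. After sorting by query position, the longest run of strictly increasing genome positions gives a set of hits that is collinear between the cDNA and one genomic location.

```python
from bisect import bisect_left

def longest_increasing_subsequence(values):
    """Return the indices of one longest strictly increasing subsequence."""
    tails = []      # tails[k]: index of the smallest tail of a run of length k + 1
    tail_vals = []  # values[tails[k]], kept in parallel for binary search
    prev = [-1] * len(values)
    for i, v in enumerate(values):
        k = bisect_left(tail_vals, v)
        if k > 0:
            prev[i] = tails[k - 1]
        if k == len(tails):
            tails.append(i)
            tail_vals.append(v)
        else:
            tails[k] = i
            tail_vals[k] = v
    # Follow the back-pointers from the end of the longest run
    chain = []
    i = tails[-1] if tails else -1
    while i != -1:
        chain.append(i)
        i = prev[i]
    return chain[::-1]

def chain_hits(hits):
    """Keep the largest collinear subset of (query_start, genome_start) hits."""
    ordered = sorted(hits)
    keep = longest_increasing_subsequence([g for _, g in ordered])
    return [ordered[i] for i in keep]

# Hypothetical hit coordinates; two of them fall outside the main locus
hits = [(0, 5000), (100, 5400), (200, 900), (300, 6100), (400, 6500), (150, 12000)]
best = chain_hits(hits)
print(best)
```

This sketch ignores hit scores, strand, and overlaps between hits, all of which a production pipeline would need to handle.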

16
Phylogenetic analysis
  • Goal: study the functional and evolutionary
    relationships among a group of genes
  • Divide homologous genes into functional families
  • Find the evolutionary relationships between the
    orthologous genes belonging to different species
    (e.g., the Out of Africa theory)
  • Methods
  • Hierarchical clustering
  • Neighbor-joining, etc.
  • PHYLIP program, Univ. of Washington
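
Hierarchical clustering can be sketched as repeated merging of the two closest clusters. The version below uses average linkage; the species names and pairwise distances are invented for illustration, and this is not the algorithm used by any particular PHYLIP program.

```python
from itertools import combinations

def average_linkage(names, dist):
    """Cluster leaves bottom-up and return the tree as nested tuples.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    """
    # Each key is a subtree; each value lists the leaves under that subtree
    clusters = {name: [name] for name in names}

    def mean_distance(pair):
        x, y = pair
        total = sum(dist[frozenset((a, b))]
                    for a in clusters[x] for b in clusters[y])
        return total / (len(clusters[x]) * len(clusters[y]))

    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest average distance
        x, y = min(combinations(clusters, 2), key=mean_distance)
        clusters[(x, y)] = clusters.pop(x) + clusters.pop(y)
    return next(iter(clusters))

# Hypothetical pairwise distances between four sequences
names = ["human", "chimp", "mouse", "rat"]
raw = {("human", "chimp"): 1, ("mouse", "rat"): 2,
       ("human", "mouse"): 8, ("human", "rat"): 8,
       ("chimp", "mouse"): 8, ("chimp", "rat"): 8}
dist = {frozenset(pair): value for pair, value in raw.items()}
tree = average_linkage(names, dist)
print(tree)
```

Neighbor-joining differs in that it corrects for unequal rates of change along branches, which this simple average-linkage scheme does not do.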

18
Micro-array analysis
  • Expression-genomics
  • Primary goals
  • Look for genes with different expression
    levels between experiments, which are candidate
    functional genes
  • Look for the group of genes that have correlated
    gene expression levels, which could suggest that
    they are in the same biological pathway
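
Both goals reduce to simple calculations on expression values. The sketch below uses a log2 fold change for the first goal and a Pearson correlation coefficient for the second; the gene names and expression values are hypothetical, and a real analysis would add normalization and significance testing.

```python
import math

def log2_fold_change(treated, control):
    """Log2 ratio of expression between two conditions."""
    return math.log2(treated / control)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical expression levels over four time points
expression = {
    "geneA": [1.0, 2.0, 4.0, 8.0],
    "geneB": [2.0, 4.0, 8.0, 16.0],
    "geneC": [8.0, 4.0, 2.0, 1.0],
}

change = log2_fold_change(expression["geneA"][-1], expression["geneA"][0])
r_ab = pearson(expression["geneA"], expression["geneB"])
r_ac = pearson(expression["geneA"], expression["geneC"])
print(change, r_ab, r_ac)
```

In this toy data geneA and geneB rise together while geneC falls, so the first pair would be grouped as co-expressed and geneC would not.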

19
  • Methods
  • General probability and statistics methods
  • Dimension reduction
  • Principal components
  • Lowess
  • Clustering
  • Tools
  • S-plus, R

20
Example
  • Herbicide
  • Plants were treated with herbicide to observe the
    gene expression profiles over a series of time
    steps.
  • The genes that appear right before the plant dies
    (12 hours) are the possible death genes
  • If we knock down the death genes in normal
    plants, they could last longer than the
    herbs.

21
Protein structure prediction
  • Why is protein structure important?
  • The functions of a gene depend on its translated
    protein structure
  • Protein binding with its ligands
  • Protein-protein interactions
  • A protein molecule usually keeps one stable
    structure under normal physiological conditions
    (Anfinsen, 1960s)
  • Drug design
  • Docking and high throughput drug screening.

22
Sequence
Bioinformatics
Protein structure
Function
23
Protein structure prediction methods
24
Homology modeling procedure
Protein sequence
Database search
Select template structure
Sequence alignment
Build conserved regions first
Loop modeling
Build side-chains
Optimizing
25
Homology modeling programs
  • Academic software
  • MODELLER, Sali A.
  • COMPOSER, Blundell T.
  • SWISS-MODEL
  • Rasmol (graphics)
  • Commercial software
  • QUANTA, MSI Inc.
  • SYBYL, Tripos Inc.

26
Threading
  • Find the best fold candidates among a limited
    number of choices
  • Add 3D information to the score function of
    dynamic programming

27
Ab initio protein structure principle
28
  • Threading programs
  • Topits, Eisenberg D.
  • Threader, Jones D.
  • ProSup, Sippl M.
  • 123D, Alexandrov N.
  • Ab initio programs
  • Rosetta, David Baker

29
Current status in the protein structure
prediction field
  • Moult J., CASP (Critical Assessment of Techniques
    for Protein Structure Prediction).
  • Homology modeling is already very mature
  • Threading and ab initio methods have been used in
    industry
  • Structural genomics

30
Large scale computing platform
  • Hardware
  • Super-computers
  • Cray/SGI
  • DEC/Compaq
  • Intel
  • Linux clusters
  • Blade
  • Software
  • Parallel computing (MPP, PVM etc.)
  • Linux
  • Grid computing: the Globus Project

31
Linux clusters
32
Data storage and access
  • Bioinformatics is producing a huge amount of data
    each day
  • How to organize and store data
  • How to access data
  • Database software
  • Commercial
  • Oracle, DB2, Sybase
  • Freeware
  • MySQL, PostgreSQL

33
Data storage and access (continued)
  • Current popular databases
  • DNA and protein sequences, such as GenBank, Swiss-Prot,
    PIR, etc.
  • Protein structure, such as PDB, SCOP
  • DNA, mRNA, and protein function, such as GO, Pfam

34
Database example: Gene Ontology (GO)
Molecular function
Biological process
Cellular component
35
Data access
  • Web interface
  • Protocol
  • CGI, JSP, ASP
  • Computer languages
  • Perl, Java, C/C++, Visual Basic, Visual C++

36
Looking forward
  • Where are the markets
  • Develop new programs
  • Assemble current programs to build more efficient
    data mining pipelines
  • Data storage and access
  • Integrate the current databases to use them more
    effectively
  • Computing platform, including hardware, software
    support, consulting etc.
  • What we can offer
  • Multi-talents
  • Team work
  • Networking

37
http://www.hongyu.org/paper/bioinformatics.ppt