Bioinformatics - PowerPoint PPT Presentation

About This Presentation
Title:

Bioinformatics

Slides: 38
Provided by: Emr62

Transcript and Presenter's Notes



1
Bioinformatics
  • A short introduction

2
What is Bioinformatics?
  • Precise definition is a matter of debate.
  • Computational Biology
  • Computational Molecular Biology
  • Development of databases to store and manipulate
    genetic information
  • Not usually used to describe biology above the
    cellular level

3
Molecular Biology Background
  • The central concepts are not tricky
  • Terminology can be a problem
  • Look it up!

4
DNA
Deoxyribonucleic acid: 4-letter alphabet (A, T, C, G)
5
Transcription
6
Amino Acids
20 Letter Alphabet
7
Proteins
  • Sequences of amino acids
  • Primary, secondary, and tertiary structure
  • No easy way to predict 3D structure from
    sequence.
  • Compare with known sequences
  • Modular nature
  • Substitutions

8
Protein Functions
  • Enzymatic catalysis (helping control molecular
    reaction pathways)
  • Structural support
  • Generation of motion
  • Reception of signals between cells
  • Transduction of forces (light, pressure, etc.)
    into chemical signals
  • Many more; their functions combine within a cell
    to create a living apparatus

9
What is a Gene?
  • For every 2 biologists, you get 3 definitions
  • A DNA sequence that encodes a heritable
    trait.
  • The unit of heredity
  • Is it an abstract concept, or something you can
    isolate in a tube or print on your screen?
  • Classic vs. modern understanding of
    molecular biology

10
Clustering (Phylogenetics)
11
Molecular Biology Summary
  • DNA: 4-letter alphabet
  • Each sequence of 3 letters (a codon) codes for an
    amino acid.
  • 4 cubed is 64 and there are only 20 amino acids,
    so most amino acids are encoded by more than one codon.
  • Amino acid sequences are peptides (short) or
    proteins (long, > 50 amino acids).
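The codon arithmetic above can be checked directly. The short Python sketch below builds the standard genetic code (NCBI translation table 1, written as the usual 64-letter string over the base order T, C, A, G, with `*` for stop) and counts how many codons map to each amino acid. `CODON_TABLE` and `codon_counts` are illustrative names, not part of any library.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are listed with the
# first base varying slowest over T, C, A, G; '*' marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() yields codons in exactly that order (last base varies fastest).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_counts() -> Counter:
    """Count how many codons map to each amino acid (and to stop, '*')."""
    return Counter(CODON_TABLE.values())


if __name__ == "__main__":
    counts = codon_counts()
    print(len(CODON_TABLE))                       # 64 possible codons
    print(len(counts) - 1)                        # 20 amino acids, excluding stop
    print(counts["L"], counts["M"], counts["*"])  # 6 1 3
```

The redundancy is uneven: leucine, serine, and arginine each have six codons, while methionine and tryptophan have only one.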

12
Complications
  • Start and stop sequences
  • Noncoding regions
  • Introns and exons
  • Promoter and regulatory regions
  • Not always present
  • No straightforward method to predict them
    accurately
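To see why start and stop sequences alone do not settle where a gene lies, here is a deliberately naive open-reading-frame (ORF) scan in Python. `find_orfs` is a hypothetical helper, not a gene-prediction method: it looks only at the three forward reading frames, treats the first ATG after a stop as a start, and knows nothing about introns, promoters, or the reverse strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) pairs of ATG...stop ORFs in the three forward frames.

    'end' is exclusive and includes the stop codon; 'min_codons' counts the
    codons from ATG up to, but not including, the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at this start codon
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close it and look for the next ATG
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)], i.e. ATGAAATTTTAG
```

On real genomic DNA a scan like this reports many short, spurious ORFs and misses or truncates genes interrupted by introns, which is why practical gene finders combine many more signals.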

13
What is Bioinformatics?
  • The study of two information flows in molecular
    biology
  • The central dogma of molecular biology.
  • DNA sequences are transcribed into mRNA sequences
    that are translated into proteins.
  • Protein sequences fold into 3D structures that
    have functions.
  • Bioinformatic applications in this first domain
    address any stage of the central dogma.
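The first information flow, DNA to mRNA to protein, can be sketched in a few lines of Python. This assumes the coding (sense) strand is given, so transcription reduces to replacing T with U (in the cell, RNA polymerase actually reads the complementary template strand), and translation reads codons from the first base with the standard genetic code until it reaches a stop codon. The function names are illustrative.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1) over the RNA alphabet,
# first base varying slowest over U, C, A, G; '*' marks a stop codon.
RNA_BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA (same sequence, T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codons from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCATTGTAATGGGCCGCTGA")
    print(mrna)             # AUGGCCAUUGUAAUGGGCCGCUGA
    print(translate(mrna))  # MAIVMGR
```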

14
What is Bioinformatics?
  • Bioinformatic applications in this first domain
    address any stage of the central dogma.
  • Sequence databases (GenBank)
  • Structure databases (PDB)
  • Sequence alignment
  • Database searching
  • Predictive Methods

15
What is Bioinformatics?
  • Assisting the scientific method.
  • Hypotheses about biological activity are created.
  • Experiments are designed to test hypotheses.
  • Results are evaluated and may extend or modify
    the hypotheses.
  • Bioinformatic applications in this second domain
    address any stage of the scientific method.

16
What is Bioinformatics?
  • Bioinformatic applications in this second domain
    address any stage of the scientific method.
  • Systems which generate hypotheses
  • Design experiments
  • Store and organize data from experiments in
    databases
  • Test compatibility of data with models
  • Modify hypotheses

17
Why Bioinformatics?
  • Explosion of data in Biology
  • DNA sequencing
  • PCR amplification
  • Mass spectrometry
  • X-Ray crystallography
  • Microarrays

18
Large Databases
  • Once upon a time, GenBank sent out sequence
    updates on CD-ROM disks a few times per year.
  • Now GenBank is over 70 Gigabytes
  • (20 billion bases)
  • Most biocomputing sites update their copy of
    GenBank every day over the internet.
  • Scientists access GenBank directly over the Web

19
The Human Genome Project
  • Started in the early 1990s
  • Goal: sequence all 3 billion base pairs
  • Was not expected to be complete before 2005
  • By April 2003, 99% complete

20
Raw Genome Data
21
Mass spectrometry
  • Sample molecules are bombarded with electrons,
    which ionizes them and gives them a positive charge.
  • The ions are accelerated and passed through a
    magnetic field, which deflects them according to
    their mass-to-charge ratio before they reach the
    detector.
  • The measured masses are then interpreted to infer
    the amino acid sequence.
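Inferring sequence from masses rests on the fact that each amino acid residue has a characteristic mass. The Python sketch below assumes an idealized, noise-free "ladder" of fragment masses in which consecutive peaks differ by exactly one residue, and matches each difference to the nearest monoisotopic residue mass. Real spectra are noisy and incomplete, and leucine and isoleucine cannot be told apart by mass alone, so this is only a toy illustration; `ladder` and `read_ladder` are hypothetical helpers.

```python
# Monoisotopic residue masses in daltons (free amino acid minus one water).
# Isoleucine is left out: it has exactly the same mass as leucine (L).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once for the peptide's free N- and C-termini


def peptide_mass(seq: str) -> float:
    """Monoisotopic mass of an uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def ladder(seq: str) -> list[float]:
    """Idealized fragment ladder: cumulative residue masses, starting at 0."""
    masses = [0.0]
    for aa in seq:
        masses.append(masses[-1] + RESIDUE_MASS[aa])
    return masses


def read_ladder(masses: list[float], tol: float = 0.02) -> str:
    """Infer one residue per gap between consecutive peaks ('?' if none fits)."""
    residues = []
    for lower, upper in zip(masses, masses[1:]):
        diff = upper - lower
        aa, mass = min(RESIDUE_MASS.items(), key=lambda item: abs(item[1] - diff))
        residues.append(aa if abs(mass - diff) <= tol else "?")
    return "".join(residues)


if __name__ == "__main__":
    print(round(peptide_mass("G"), 3))    # 75.032 (glycine)
    print(read_ladder(ladder("SAMPLE")))  # SAMPLE
```

The tolerance matters: lysine (K) and glutamine (Q) differ by only about 0.036 Da, so a coarser instrument would confuse them as well.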

22
Protein Identification
23
Microarrays
24
Microarrays
25
Bioinformatics Tools
  • GenBank - genes, proteins, genomes
  • Similarity search tools - BLAST, VAST
  • Alignment - CLUSTAL
  • Protein families - Pfam, ProDom
  • Protein structures - PDB, RasMol
  • Whole genomes - UCSC, Entrez Genomes
  • Human mutations - OMIM
  • Biochemical pathways - KEGG
  • Integrated tools - Biology Workbench, BCM
    Search Launcher

26
Sequence Alignments
  • Global vs. Local
  • Substitutions/Deletions/Insertions
  • Dynamic Programming
  • The optimal path is built by incrementally
    extending optimal subpaths
  • Substitution scoring
  • PAM250, BLOSUM62
  • Performed tens of thousands of times per day.
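The dynamic-programming idea can be sketched in Python. This is a minimal, score-only version that assumes a simple scheme (match +1, mismatch -1, gap -2) in place of PAM250 or BLOSUM62. The same recurrence gives a Needleman-Wunsch global alignment or, with scores clamped at zero and the best cell taken anywhere in the table, a Smith-Waterman local alignment.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Best alignment score of a and b by dynamic programming.

    local=False gives a Needleman-Wunsch (global) score; local=True gives a
    Smith-Waterman (local) score. A simple match/mismatch scheme stands in
    for a substitution matrix such as PAM250 or BLOSUM62.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalized along the first row/column.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # a[i-1] against a gap
                score[i][j - 1] + gap,      # b[j-1] against a gap
            )
            if local:
                cell = max(cell, 0)         # a local alignment may start afresh here
                best = max(best, cell)
            score[i][j] = cell
    # Global: score for aligning both full sequences; local: best cell anywhere.
    return best if local else score[-1][-1]


if __name__ == "__main__":
    print(align_score("ACGT", "AGT"))                        # 1
    print(align_score("TTACGTT", "GGACGGG", local=True))     # 3
```

`align_score("ACGT", "AGT")` is 1 (three matches and one gap), and the local score of 3 comes from the shared `ACG`. Recovering the alignment itself would need a traceback through the table, which is omitted here.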

27
Global vs. Local Alignments
28
PAM250 (Point accepted mutation)
29
BLAST Algorithm
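BLAST speeds up database searching with a seed-and-extend heuristic: it first finds short words shared by the query and a database sequence, then extends those hits into longer alignments. The Python sketch below shows only the seeding step for nucleotides, using exact matches of length k (classic blastn defaults to a word size of 11). Protein searches use scored "neighborhood" words instead, and the extension and statistics steps are omitted. The subject sequence is made up for illustration.

```python
from collections import defaultdict


def index_words(subject: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word in the subject sequence to its start positions."""
    index = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index


def find_seeds(query: str, subject: str, k: int = 11) -> list[tuple[int, int]]:
    """Exact word hits (query_pos, subject_pos) shared by query and subject."""
    index = index_words(subject, k)
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits


if __name__ == "__main__":
    query = "GATGCCATAGAGCTGTAGTCGTACCCT"
    # Hypothetical subject that shares 15 bases with the query.
    subject = "TTTT" + query[5:20] + "AAAA"
    print(find_seeds(query, subject))
```

All hits lie on one diagonal (subject position = query position - 1), which is the signal the extension step would follow.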
30
Databases for use with BLAST
  • nr - Non-redundant GenBank
  • est - Expressed Sequence Tags
  • sts - Sequence Tagged Sites
  • gss - Genome Survey Sequences
  • and more

31
GenBank Sections
  • In addition to DNA sequences of genes, GenBank
    has a number of other sections, including:
  • Protein sequences (translated from DNA)
  • Expressed sequence tags (ESTs), short sequences
    read from cDNA copies of mRNA
  • Cancer Genome Anatomy Project (CGAP) gene
    expression profiles of normal, pre-cancer, and
    cancer cells from a wide variety of tissue types
  • Single Nucleotide Polymorphisms (SNPs), which
    represent genetic variations in the human
    population
  • Online Mendelian Inheritance in Man (OMIM), a
    database of human genetic disorders

32
Entrez databases contain more than just DNA and
protein sequences
33
VAST
  • Vector Alignment Search Tool
  • Protein structure neighbors are determined by
    direct comparison of 3D protein structures.
  • Each of the 87,804 domains in MMDB is
    compared with every other one.
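The all-against-all comparison grows quadratically with the number of domains. Taking the slide's figure of 87,804 domains and counting each unordered pair once, the number of pairwise structure comparisons is:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}
             = \frac{87{,}804 \times 87{,}803}{2}
             = 3{,}854{,}727{,}306 \approx 3.9 \times 10^{9}
```

That is roughly 3.9 billion structure comparisons for a single full pass.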

34
Entrez/GenBank
  • National Center for Biotechnology Information (NCBI)
  • Visit
  • BLAST for
  • GATGCCATAGAGCTGTAGTCGTACCCT

35
(No Transcript)
36
Research Ideas
  • Sequence search algorithms
  • Structure search algorithms
  • Predictive algorithms
  • Database methodologies
  • Structure visualization
  • Second-domain applications (supporting the
    scientific method)
