Bioinformatics Databases Overview - PowerPoint PPT Presentation

About This Presentation
Slides: 21
Provided by: chen126

Transcript and Presenter's Notes


1
Bioinformatics Databases Overview
  • Microarray Database Systems

2
Biological Database
  • NCBI definition: a large, organized body of persistent
    data, usually associated with computerized
    software designed to update, query, and retrieve
    components of the data stored within the system.
  • http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html
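The three operations in this definition (update, query, retrieve) can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The table layout, accessions, and sequences below are hypothetical, and this is not how NCBI actually stores its data.

```python
import sqlite3

# ":memory:" keeps this sketch self-contained; passing a file path instead
# would make the data persist on disk between sessions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence (accession TEXT PRIMARY KEY, organism TEXT, seq TEXT)"
)

# Store: hypothetical records (not real database entries).
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?, ?)",
    [("HYPO0001", "Homo sapiens", "ATGGCC"),
     ("HYPO0002", "Mus musculus", "ATGAAA")],
)

# Update: revise the sequence of an existing record.
conn.execute(
    "UPDATE sequence SET seq = ? WHERE accession = ?", ("ATGGCCTAA", "HYPO0001")
)

# Query and retrieve: select the records that match a condition.
rows = conn.execute(
    "SELECT accession, seq FROM sequence WHERE organism = ?", ("Homo sapiens",)
).fetchall()
print(rows)  # [('HYPO0001', 'ATGGCCTAA')]
```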

3
Background
  • Complete genome sequence (structural)
    • Human Genome Project
  • Functional genomics
  • Bioinformatics: data mining, data integration
  • Applications
    • new discoveries: gene function, disease genes
    • drugs
    • new diagnosis methods
    • new treatments

4
Background (2)
  • Major databases
    • NCBI GenBank (US)
    • EBI EMBL (UK)
    • CIB DDBJ (Japan)
  • Purposes
    • public research data submission and sharing
    • data retrieval and analysis resources

5
Types of Databases
  • Nucleotide
    • GenBank, EMBL NSD
  • Protein
    • SWISS-PROT
  • Molecular structure
    • protein: NCBI, EBI MSD
  • Gene expression
    • EBI ArrayExpress, NCBI GEO
  • Regulatory networks
    • KEGG
  • Species-specific
    • mouse, fly, etc.

6
Human Genome Project
  • identify all the approximately 30,000 genes in
    human DNA
  • determine the sequences of the 3 billion chemical
    base pairs that make up human DNA
  • store this information in databases
  • improve tools for data analysis
  • transfer related technologies to the private
    sector
  • address the ethical, legal, and social issues
    (ELSI) that may arise from the project

7
Human Genome Project (2)
  • History
    • BERAC recommendation, 1987
    • 1990, US DOE and NIH
    • working draft 2001, with Celera Genomics
    • original timeline: 2005
    • new timeline: 2003, complete sequence
  • Phases
    • draft: 90%, 1% error
    • complete: > 99%, 0.01% error
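Reading the error figures as per-base percentages (an assumption, since the slide gives only the numbers) and using the 3-billion-base-pair genome size from the previous slide, a quick calculation shows what the two accuracy levels mean in absolute terms. The function name is hypothetical.

```python
# Assumptions: error rates are percentages of bases, and the genome is
# 3 billion base pairs; the draft's partial coverage is ignored here.
GENOME_BP = 3_000_000_000

def expected_errors(genome_bp: int, error_percent: float) -> int:
    """Expected number of miscalled bases at a given per-base error rate."""
    return round(genome_bp * error_percent / 100)

for phase, error_percent in [("draft", 1.0), ("complete", 0.01)]:
    print(f"{phase}: about {expected_errors(GENOME_BP, error_percent):,} errors")
# draft: about 30,000,000 errors
# complete: about 300,000 errors
```

The hundredfold drop in error rate translates directly into a hundredfold drop in the number of wrong bases across the genome.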

8
Celera Genomics
  • Craig Venter
  • Shotgun method
  • Celera/Science agreement
    • not to reproduce, redistribute, re-package, adapt
      or prepare derivative works of Celera data for
      third party, in any form whatsoever, for any
      purpose
    • prevent large-scale downloads and incorporation
      of this data into GenBank/EMBL/DDBJ
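The slide names the shotgun method without describing it. In shotgun sequencing, DNA is broken into many short random fragments that are sequenced and then reassembled by finding where the reads overlap. The sketch below is a toy greedy overlap assembler under strong simplifying assumptions (error-free reads, no repeats, a single strand); it is not Celera's assembler, and real assemblers handle far more. Function names and reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return reads  # no overlaps left: remaining reads are the contigs
        i, j = best_pair
        merged = reads[i] + reads[j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]

# Error-free, overlapping reads of one short sequence, in shuffled order.
reads = ["CCTGCCGGAA", "ATTAGACCTG", "GCCGGAATAC", "AGACCTGCCG"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is easy to follow but can mis-assemble repetitive sequence, which is one reason production assemblers use more elaborate strategies.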

9
NCBI
  • National Center for Biotechnology Information
  • 1988, US Congress
  • DOE, NIH
  • System for data storage and analysis
  • Advancing methods for data retrieval and analysis
  • Facilitation of use

10
NCBI (2)
  • Database
    • GenBank
  • Database retrieval
    • Entrez, PubMed, LocusLink, Taxonomy Browser
  • Data analysis
    • BLAST, Electronic PCR, ORF Finder, UniGene, Human
      MapViewer
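ORF Finder, listed above, searches a sequence for open reading frames. The sketch below shows the core idea in simplified form: it scans the three forward reading frames for an ATG start codon followed in frame by a stop codon (TAA, TAG, or TGA). It is a hypothetical illustration, not NCBI's implementation; real tools typically also search the reverse strand and can use other genetic codes and start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG-initiated ORFs on the forward strand.

    Positions are 0-based; end is exclusive and includes the stop codon.
    min_codons counts codons from the ATG up to (not including) the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs

print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 2, 14)]
```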

11
GenBank

12
GenBank (2)
  • History
    • NIGMS/NIH, since 1982
    • Los Alamos National Lab: annotation and organization
    • BBN Lab Inc.: maintenance, distribution
    • IntelliGenetics, 1987
    • Operation: NCBI, 1988
  • Main purpose: data submission, data repository

13
GenBank (3)
  • Flat file format
  • Data divisions: 17 divisions
  • Organismal divisions
    1. PRI - primate sequences
    2. ROD - rodent sequences
    3. MAM - other mammalian sequences
    4. VRT - other vertebrate sequences
    5. INV - invertebrate sequences
    6. PLN - plant, fungal, and algal sequences
    7. BCT - bacterial sequences
    8. VRL - viral sequences
    9. PHG - bacteriophage sequences
    10. SYN - synthetic sequences
    11. UNA - unannotated sequences
  • Functional divisions
    12. EST - EST sequences (expressed sequence tags)
    13. PAT - patent sequences
    14. STS - STS sequences (sequence tagged sites)
    15. GSS - GSS sequences (genome survey sequences)
    16. HTG - HTGS sequences (high-throughput genomic sequences)
    17. HTC - unfinished high-throughput cDNA sequencing
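In the GenBank flat file, each record's division appears as a three-letter code on its first (LOCUS) line. A minimal lookup sketch, assuming the division is the second-to-last whitespace-separated token on the LOCUS line (just before the date) and using only the 17 codes listed above; current GenBank releases may define a different set. The function name is hypothetical and the LOCUS line is illustrative.

```python
# The 17 division codes listed on the slide (current GenBank releases
# may use a different set of divisions).
DIVISIONS = {
    "PRI": "primate sequences",
    "ROD": "rodent sequences",
    "MAM": "other mammalian sequences",
    "VRT": "other vertebrate sequences",
    "INV": "invertebrate sequences",
    "PLN": "plant, fungal, and algal sequences",
    "BCT": "bacterial sequences",
    "VRL": "viral sequences",
    "PHG": "bacteriophage sequences",
    "SYN": "synthetic sequences",
    "UNA": "unannotated sequences",
    "EST": "expressed sequence tags",
    "PAT": "patent sequences",
    "STS": "sequence tagged sites",
    "GSS": "genome survey sequences",
    "HTG": "high-throughput genomic sequences",
    "HTC": "unfinished high-throughput cDNA sequences",
}

def locus_division(locus_line: str) -> tuple[str, str]:
    """Return (code, description) for the division on a LOCUS line.

    Assumes the division code is the second-to-last whitespace-separated
    token, immediately before the date.
    """
    tokens = locus_line.split()
    if len(tokens) < 3 or tokens[0] != "LOCUS":
        raise ValueError("not a GenBank LOCUS line")
    code = tokens[-2]
    return code, DIVISIONS.get(code, "unknown division")

# Illustrative LOCUS line.
line = "LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999"
print(locus_division(line))  # ('PLN', 'plant, fungal, and algal sequences')
```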

14
EST (Expressed Sequence Tag)
  • Partial cDNA sequences of genes expressed in
    different tissues
  • partial sequencing from the 5' and 3' ends
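A 5' EST is read from the start of a cDNA clone. A 3' EST is read from the opposite end, so it is typically the reverse complement of that end. The hypothetical helper below simulates both reads from a known cDNA sequence. Real ESTs also contain sequencing errors and vector or poly(A) sequence, which this sketch ignores.

```python
# Maps each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def simulate_ests(cdna: str, read_len: int) -> tuple[str, str]:
    """Return hypothetical (5' EST, 3' EST) reads of length read_len."""
    if not 0 < read_len <= len(cdna):
        raise ValueError("read_len must be between 1 and len(cdna)")
    five_prime = cdna[:read_len]  # sense orientation, from the 5' end
    three_prime = reverse_complement(cdna[-read_len:])  # read back from the 3' end
    return five_prime, three_prime

print(simulate_ests("ATGCCCGGGTTTAAA", 5))  # ('ATGCC', 'TTTAA')
```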

15
EMBL

16
EMBL (2)
  • EBI (European Bioinformatics Institute)
  • building, maintaining and providing biological
    databases and information services to support
    data deposition and exploitation.
  • Format
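Unlike GenBank's format, the EMBL flat-file format begins every line with a two-letter line-type code (for example ID, AC, DE, and SQ), and the data starts in column 6. XX lines are spacers, sequence lines follow SQ and begin with blanks, and // ends an entry. The sketch below groups an entry's lines by code under those assumptions. The function name is hypothetical, the sample entry is made up and abridged, and established parsers handle many more details.

```python
from collections import defaultdict

def parse_embl_lines(text: str) -> dict[str, list[str]]:
    """Group the lines of one EMBL-style entry by their two-letter code.

    Data starts in column 6; "XX" spacer lines are skipped, sequence lines
    (which start with blanks) are collected under "seq" with spaces and
    position numbers removed, and "//" ends the entry.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):
            break
        if not line.strip():
            continue
        code, content = line[:2], line[5:].strip()
        if code == "XX":
            continue
        if code == "  ":
            fields["seq"].append("".join(ch for ch in content if ch.isalpha()))
        else:
            fields[code].append(content)
    return dict(fields)

# Made-up, abridged entry for illustration only.
sample = """\
ID   HYPO0001; SV 1; linear; mRNA; STD; PLN; 20 BP.
XX
AC   HYPO0001;
XX
DE   Hypothetical example entry
XX
SQ   Sequence 20 BP;
     acgtacgtac gtacgtacgt                                             20
//
"""
entry = parse_embl_lines(sample)
print(entry["AC"], "".join(entry["seq"]))  # ['HYPO0001;'] acgtacgtacgtacgtacgt
```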

17
ArrayExpress

18
ArrayExpress (2)
  • a public repository of microarray-based gene
    expression data
  • MIAME recommendations
    • Minimum Information About a Microarray Experiment
  • Data submission in MAGE-ML
    • XML format
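MIAME describes what information a submission should contain rather than a file format. MAGE-ML is one XML format for carrying that information. As a hedged illustration, the checklist below paraphrases the six MIAME elements and flags any that a submission lacks. The dictionary keys, the function name, and the sample submission are hypothetical and do not correspond to ArrayExpress's actual submission fields.

```python
# The six MIAME elements, paraphrased. The keys are hypothetical field
# names chosen for this sketch, not ArrayExpress submission fields.
MIAME_ELEMENTS = {
    "raw_data": "raw data for each hybridisation",
    "processed_data": "final processed (normalised) data",
    "sample_annotation": "essential sample annotation, including experimental factors",
    "experimental_design": "experimental design, including sample-data relationships",
    "array_design": "sufficient annotation of the array design",
    "protocols": "essential laboratory and data-processing protocols",
}

def missing_miame(submission: dict) -> list[str]:
    """Return descriptions of MIAME elements absent or empty in a submission."""
    return [desc for key, desc in MIAME_ELEMENTS.items() if not submission.get(key)]

# Hypothetical submission that lacks three of the six elements.
submission = {
    "raw_data": ["hyb1.CEL", "hyb2.CEL"],
    "processed_data": ["normalised.txt"],
    "array_design": "hypothetical-array-id",
}
for item in missing_miame(submission):
    print("missing:", item)
```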

19
PDB

20
PDB (2)
  • Protein Data Bank
  • http://www.rcsb.org/pdb/
  • processing and distribution of 3-D biological
    macromolecular structure data
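Structure files in the classic PDB format store atomic coordinates in fixed-width ATOM and HETATM records. The atom name is in columns 13-16, and the x, y, z coordinates (in ångströms) are in columns 31-38, 39-46, and 47-54. The minimal sketch below reads those columns and measures an interatomic distance. The function names, atoms, and coordinates are hypothetical. The `atom_line` helper exists only to produce correctly aligned sample records, and real files carry more fields.

```python
import math

def parse_atoms(pdb_text: str) -> list[tuple[str, float, float, float]]:
    """Return (atom name, x, y, z) for each ATOM/HETATM record."""
    atoms = []
    for line in pdb_text.splitlines():
        if line[:6] in ("ATOM  ", "HETATM"):
            name = line[12:16].strip()  # columns 13-16
            x, y, z = (float(line[c:c + 8]) for c in (30, 38, 46))  # columns 31-54
            atoms.append((name, x, y, z))
    return atoms

def distance(a: tuple, b: tuple) -> float:
    """Euclidean distance between two parsed atoms, in angstroms."""
    return math.dist(a[1:], b[1:])

def atom_line(serial: int, name: str, res: str, chain: str, seq: int,
              x: float, y: float, z: float) -> str:
    """Hypothetical helper: format a minimal, column-aligned ATOM record."""
    return (f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

pdb_text = "\n".join([
    atom_line(1, " CA", "GLY", "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, " CA", "ALA", "A", 2, 3.8, 0.0, 0.0),
])
atoms = parse_atoms(pdb_text)
print(atoms[0][0], round(distance(atoms[0], atoms[1]), 2))  # CA 3.8
```

Fixed columns, not whitespace splitting, are used because adjacent fields in PDB records can run together when numbers are wide.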