GSBS Core Module 7 Bioinformatics
1
GSBS Core Module 7: Bioinformatics
  • February 8, 2008
  • Bruce Byrne, PhD
  • Informatics Institute of UMDNJ
  • http://informatics.umdnj.edu

2
What to Expect?
  • Class Notes
  • http://informatics.umdnj.edu/bioinformatics/courses/CORE/
  • Interactive style of teaching
  • Don't be shy
  • Ask questions
  • Self-testing exercises prepare you for the exam
  • Not graded, but answers supplied
  • Exam, February 20
  • Part I: 25%
  • Solve an individualized problem, on your own,
    before coming into the examination. I will
    distribute those problems on February 18th.
  • Part II: 75%
  • A written exam, executed on February 20
  • Definitions, explanation of facts, and
    interpretations of results.
  • Bring any notes you have taken or other documents
    for reference.
  • You may not use computers or wireless devices
    during the exam.

3
What Constitutes a Good Answer?
  • For an exam
  • A definition?
  • An explanation?
  • An interpretation?
  • For a scientific inquiry?
  • A good question
  • Good data
  • Good interpretation
  • Knowledge and use of the literature
  • Novel or solid use of analytical skills

4
Criteria for Answers: Class Discussion 2/8/08
  • Good Answers
  • To the point
  • Factual: based on evidence
  • Best answer
  • Apply concept pointers.
  • See the flow of reasoning in an explanation or
    interpretation
  • Well said
  • Bad Answers
  • Rambling
  • Wrong
  • Unclear or too clear! No copy/paste!

5
Goals for the Bioinformatics Unit
  • Appreciate the scope of a dynamic field
  • Know next year when things you learn in the next
    few sessions will be available
  • Understand that the resources are continuously
    changing and be prepared to adapt to those
    changes
  • Associate available data sets and analytical
    tools to research problems
  • Find sequence, gene, and genomic data
  • Evaluate data quality
  • Compare sequences
  • Important areas not covered in sufficient detail
  • Molecular modeling and visualization or
    Structural Bioinformatics
  • Measures of gene activity: Functional Genomics
    and/or Proteomics

6
Goals for the First Lecture
  • What is Bioinformatics?
  • Review fundamental bits of biology necessary to
    the unit as a whole
  • Consider the types of data we could be looking at
    and how a biologist adds value to the
    bioinformatics team
  • Put these issues in historical perspective to
    see how we came to this point and how far we
    have to go

7
The Scope of Bioinformatics
  • References
  • National Center for Biotechnology Information.
  • http://www.ncbi.nlm.nih.gov/
  • UCSC Genome Browser
  • http://genome.ucsc.edu/
  • Molecular Cell Biology
  • Lodish, et al., 2008. Freeman and Company, NY.
  • Genomes. T.A. Brown, Second Ed.
  • http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes
  • Nature Human Genome Collection (May, 2006).
    Nature Supplements/Collections Human Genome
  • http://www.nature.com/nature/supplements/collections/humangenome/index.html
  • Human Genome Project
  • http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml

8
Biology Review
  • Genetics of Disease
  • Lodish Section 5.4
  • Genomic markers
  • RFLP
  • SNP
  • Association studies
  • Simple patterns of inheritance
  • Complex patterns of susceptibility

9
Biology Review
  • Eukaryotic Gene Structure
  • Lodish Section 6.3, 8.1, 8.2
  • Introns, exons, and splicing
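
As a reminder of what splicing does to a transcript, here is a minimal sketch. The sequence and exon coordinates are invented for illustration (exons in upper case, introns in lower case), and real splicing depends on splice-site signals that this toy ignores.

```python
def splice(pre_mrna, exons):
    """Join exon segments (0-based, end-exclusive) in order, dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))

# Hypothetical pre-mRNA: three exons separated by two introns.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "UUUGAA" + "guacgucuag" + "CCCUAA"
exons = [(0, 6), (18, 24), (34, 40)]  # assumed coordinates for this toy

mature = splice(pre_mrna, exons)
print(mature)
```

Only the upper-case exon segments survive in the mature message.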

10
Overview of Bioinformatics
  • Large number of disciplines and techniques
  • Large and complex problems and data sets
  • Practitioners
  • biologists, chemists, librarians, computer
    scientists, statisticians, engineers

11
Why is Bioinformatics Required?
  • Massive quantities of information
  • are not easily comprehended by inspection
  • must be created, stored, managed, retrieved, and
    disseminated efficiently
  • stored in databases
  • analyzed by computer programs (applications) that
    distinguish patterns.
  • The patterns can be used to compare two
    sequences, structures, or images to approximate
    similarity with each other or similarity to a
    recognized sequence, structure or image.
  • What about the human?
  • Endpoints are statistical analyses and
    visualizations
  • What does statistics do for you?
  • What is visualization?
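
The simplest way a program can "approximate similarity" between two sequences is to count matching positions. The sketch below assumes the two sequences are already aligned and of equal length; the function name and example strings are illustrative only.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two pre-aligned sequences agree."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Five of the seven positions match in this made-up pair.
print(round(percent_identity("GATTACA", "GACTATA"), 1))
```

Real comparison tools also have to find the alignment and allow for gaps, which this sketch leaves out.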

12
What Is Your Kind of Problem?
  • Detail
  • interested in exact and detailed understanding
    of problems?
  • Big pictures?
  • Systems Biology
  • unites the sub-disciplines directed at genes,
    proteins, protein expression, metabolism and
    physiology to model larger entities in the
    analysis of networked information from organs to
    organisms and populations.

13
Data Types
  • Sequences
  • Nucleic acids
  • Protein
  • Molecular structures
  • Proteins
  • Nucleic acids
  • Small molecules (ligands/drugs)
  • Complexes of large and small molecules
  • Indicators of cellular function
  • Microarray profiles
  • Proteomic profiles

14
A Brief History of Sequences
  • Sequences
  • 1951
  • Pauling and others
  • Margaret Dayhoff and colleagues began to assemble
    family trees as reflected by sequence similarity
  • Not all differences are equal.
  • Since the substitution of biochemically similar
    amino acids is likely functionally less
    disruptive than the substitution of dissimilar
    amino acids, comparisons needed to be weighted
  • PAM (Point Accepted Mutation; also expanded as
    Percent Accepted Mutation) accounts for
    observed changes among closely related sequences
    of proteins.
  • PAM1 matrix corresponds to 1% sequence
    dissimilarity (one accepted change per 100
    residues)
  • DNA sequencing
  • Chain Termination Method (Frederick Sanger)
  • PhiX-174 genome sequenced, 1977: about 5400 bp
  • DNA sequence databases
  • Los Alamos / GenBank
  • Conceived in 1979 (Walter Goad)
  • In production 1982-1992
  • Transferred to NCBI in 1992
  • Coordinated with international databases
  • About 84 billion bases in 80 million sequence
    records (Dec., 2007)
  • The complete, structured output of all sequences
    from one organism constitutes the sequence of
    the genome
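
The "not all differences are equal" point from the PAM bullets on this slide can be illustrated with a toy weighted score. The biochemical grouping and the score values (+2, +1, -1) below are assumptions made up for this sketch; they are not PAM values, which are derived from observed substitutions.

```python
# Coarse, assumed grouping of amino acids by side-chain chemistry.
CLASSES = {
    "hydrophobic": "AVLIMFW",
    "polar": "STNQYCGP",
    "positive": "KRH",
    "negative": "DE",
}
CLASS_OF = {aa: name for name, members in CLASSES.items() for aa in members}

def pair_score(a, b):
    """Toy weights: identity +2, similar chemistry +1, dissimilar -1."""
    if a == b:
        return 2
    return 1 if CLASS_OF[a] == CLASS_OF[b] else -1

def weighted_score(seq_a, seq_b):
    """Sum of pair scores over two pre-aligned, equal-length sequences."""
    return sum(pair_score(a, b) for a, b in zip(seq_a, seq_b))

conservative = weighted_score("LKDS", "IRES")  # similar residues swapped in
disruptive = weighted_score("LKDS", "DLKS")    # dissimilar residues swapped in
print(conservative, disruptive)
```

Both comparisons share exactly one identical position out of four, yet the weighted scores differ, which is the reason for weighting in the first place.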

15
Sequence Data
  • How is sequence data obtained?
  • What is the nature of the data that is produced
    using current technology?
  • What are the weaknesses and strengths of these
    data?
  • Reference
  • http//en.wikipedia.org/wiki/DNA_sequencing
  • Commentary: Wikipedia

16
Genomes
  • Human
  • 1985: Proposal gains strength at Department of
    Energy (DOE)
  • 1987: Congress approves 15-year Human Genome
    Project; NIH begins funding projects
  • 1988: HUGO (European Effort) takes hold
  • 1989: Sequence Tagged Sites (STS) strategy
    developed
  • 1990: 15-year plan begins
  • 1994: 5-year goals beat target
  • 1999: First complete chromosome (22)
  • 2000: Working Draft of complete genome by
    International Human Genome Sequencing Consortium
  • 2003: Chromosomes 6, 7, Y and 14
  • 2004: Chromosomes 5, 9, 10, 13 and 19
  • Gene count estimate set at 25,000
  • 2005: Chromosomes 2, 4, and X
  • 2006: Chromosomes 1, 3, 8, 11, 12, 15, and 17
  • 2007 and on: no one has gotten around to listing
    later milestones!
  • Non-human
  • 1995: First non-viral genome sequenced
    (Haemophilus influenzae)
  • 1996: First eukaryote, Saccharomyces
  • 1997: E. coli

17
Molecular Structures
  • 100-year history of X-ray crystallography
  • 1971: Seven protein structures
  • 1998: Rutgers University, University of Wisconsin,
    National Institute of Standards and Technology
    and the San Diego Supercomputer Center
  • Approaching 49,000 structures
  • X-ray, NMR and other advanced methods employed
  • Highly curated and structured database
  • Each structure is described as a pattern in an
    XYZ matrix giving the position of each heavy
    atom in the protein
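
A minimal sketch of what an XYZ matrix of heavy-atom positions looks like in code. The atom records and coordinates are invented for illustration and are not taken from any database entry; "heavy atom" is taken here to mean any non-hydrogen atom.

```python
import math

# Hypothetical records: (atom name, element, (x, y, z) in angstroms).
atoms = [
    ("N", "N", (11.10, 8.20, 5.00)),
    ("CA", "C", (12.40, 8.90, 5.30)),
    ("H", "H", (10.40, 8.70, 4.50)),
    ("C", "C", (13.50, 7.90, 5.60)),
]

# Keep heavy atoms only, one row of x, y, z per atom.
xyz_matrix = [list(xyz) for _, element, xyz in atoms if element != "H"]

def centroid(matrix):
    """Mean x, y, z over all rows of the coordinate matrix."""
    n = len(matrix)
    return tuple(sum(row[i] for row in matrix) / n for i in range(3))

# Distance between the first two heavy atoms (N and CA in this toy).
n_ca = math.dist(xyz_matrix[0], xyz_matrix[1])
print(len(xyz_matrix), round(n_ca, 2), centroid(xyz_matrix))
```

Everything a viewer or modeling program does with a structure starts from a matrix like this one.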

18
Gene Activity
  • Northern Blot (PubMed 1980)
  • mRNA separated by gel electrophoresis,
    transferred to membrane, probed with DNA
  • highly labor intensive
  • DNA Microarray (PubMed 1991-1999)
  • Short oligonucleotides or cDNAs bound to
    substrate in a matrix
  • Population of mRNA extracted from experimental
    material and controls
  • reverse transcribed to fluorescent-tagged cDNA
  • hybridized to matrix
  • fluorescent pattern representative of up- or
    down-regulated genes
  • output said to be a profile of the transcriptome
  • Proteomics (PubMed 1997)
  • Population of proteins extracted from
    experimental material and controls
  • Proteins partially digested to peptides and
    separated by gel electrophoresis into a pattern
    of peptides
  • Peptides generate characteristic patterns during
    mass spectrometry analyses which, in mature
    systems, can be associated with specific protein
    sequences and genes
  • Output said to be a profile of the proteome.
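
The up- or down-regulated calls from the microarray bullets on this slide are commonly derived from the ratio of experimental to control fluorescence. The sketch below assumes a log2 ratio with a twofold-change cutoff; the gene names, intensities, and threshold are illustrative assumptions, and real analyses add normalization and statistics.

```python
import math

def classify_expression(experimental, control, threshold=1.0):
    """Return (log2 ratio, call) for one spot; threshold 1.0 means twofold."""
    log_ratio = math.log2(experimental / control)
    if log_ratio >= threshold:
        call = "up"
    elif log_ratio <= -threshold:
        call = "down"
    else:
        call = "unchanged"
    return log_ratio, call

# Hypothetical (experimental, control) intensities for three spots.
spots = {
    "geneA": (800.0, 200.0),
    "geneB": (150.0, 600.0),
    "geneC": (310.0, 300.0),
}
calls = {gene: classify_expression(e, c)[1] for gene, (e, c) in spots.items()}
print(calls)
```

The resulting table of calls across all spots is a small-scale version of the "profile of the transcriptome".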

19
Review of goals for this lecture
  • What is Bioinformatics?
  • What biology must we remember?
  • What kinds of data will bioinformaticists look
    at?
  • What good is the biologist to bioinformatics?
  • What's left to do?

20
BREAK
  • Next: Genes and Genomes