Teaching Bioinformatics to Undergraduates http:www.med.nyu.edurcrASM - PowerPoint PPT Presentation

1 / 61
About This Presentation
Title:

Teaching Bioinformatics to Undergraduates http:www.med.nyu.edurcrASM

Description:

... 1|BE588357 194087 BARC 5BOV Bos taurus cDNA 5'. Length = 369 ... to change it - to modify the characteristics of organisms and people in a wide variety of ways ... – PowerPoint PPT presentation

Number of Views:145
Avg rating:3.0/5.0
Slides: 62
Provided by: NBIF
Category:

less

Transcript and Presenter's Notes

Title: Teaching Bioinformatics to Undergraduates http:www.med.nyu.edurcrASM


1
Teaching Bioinformatics to Undergraduateshttp//
www.med.nyu.edu/rcr/ASM
Stuart M. Brown Research Computing, NYU School of
Medicine
2
  • I. What is Bioinformatics?
  • II. Challenges of teaching bioinformatics to
    undergraduates
  • III. Common bioinformatics tools that you can
    use for teaching
  • IV. The limits of knowledge
  • V. Resources for the teacher

3
I. What is Bioinformatics?
  • The use of information technology to collect,
    analyze, and interpret biological data.
  • The use of software tools that deal with
    biological sequences, genome analysis, molecular
    structures, gene expression, regulatory and
    metabolic modeling
  • Computational biology - the design of new
    algorithms and software to support biology
    research
  • The routine use of computers in all phases of
    biology and medicine

4
The Human Genome Project
5
A Genome Revolution in Biology and Medicine
  • We are in the midst of a "Golden Era" of biology
  • The Human Genome Project has produced a huge
    storehouse of data that will be used to change
    every aspect of biological research and medicine
  • The revolution is about treating biology as an
    information science, not about specific
    biochemical technologies.

6
The job of the biologist is changing
  • As more biological information becomes available
    and laboratory equipment becomes more automated
    ...
  • The biologist will spend more time using
    computers
  • The biologist will spend more time on
    experimental design and data analysis (and less
    time doing tedious lab biochemistry)
  • Biology will become a more quantitative science
    (think how the periodic table affected
    chemistry)

7
II. Why teach bioinformatics in undergraduate
education?
  • Demand for trained graduates from the biomedical
    industry
  • Bioinformatics is essential to understand current
    developments in all fields of biology
  • We need to educate an entire new generation of
    scientists, health care workers, etc.
  • Use bioinformatics to enhance the teaching of
    other subjects genetics, evolution, biochemistry

8
Biochemistry Protein Structures
  • "Hands-on graphics is a powerful enhancement to
    learning, particularly individualized learning.
    There is powerful synergy in learning about
    proteins and learning simultaneously about how to
    represent and manipulate them with computer
    graphics. When students learn to use graphics
    they see proteins and other complex biomolecules
    in a new and vivid way, and discover personal
    solutions to the problem of "seeing" new
    structural concepts."

Molecular Graphics Manifesto Gale Rhodes Chemist
ry Department, Univ. of Southern Maine
9
Challenges of presenting bioinformatics to
undergraduates
  • Requires a deep understanding of molecular
    biology - lots of prerequisites
  • Training users or makers of these tools?
  • A good bioinformatics program will require
    substantially more math and statistics than most
    existing molecular biology and computer science
    curricula.
  • Who will teach?

10
Different Programs, Different Goals
  • Integrate into existing biology courses
  • genetics, molecular biology, microbiology
  • Make one or a few cross-disciplinary courses
  • jointly taught by biology and computing faculty
  • open to both biology and computing students
  • Create a curriculum for a true bioinformatics
    major (is this a double major?)
  • Are you training for employment or providing the
    fundamentals for advanced training?

11
Shallow End
  • This workshop will focus on faculty skills needed
    at the shallow end of the continuum (a few
    lectures or a short course).
  • Use bioinformatics to teach biological concepts
  • Evolution
  • Genetics
  • Protein structure and function

12
How much Computing skills?
  • Bioinformatics can be seen as a tool that the
    biologist needs to use - like PCR
  • Or should biologists be able to write their own
    programs and build databases?
  • it is a big advantage to be able to design
    exactly the tool that you want
  • this may be the wave of the future
  • Is your school going to train "bioinformatics
    professionals" or biologists with informatics
    skills?"

13
Designing a Curriculum
  • To really master bioinformatics, students need to
    learn a lot of molecular biology and genetics as
    well as become competent programmers.
  • Then they need to learn specific bioinformatics
    skills - dealing with sequence databases,
    similarity algorithms, etc.
  • How can students learn this much material and
    still manage a well rounded education?
  • Graduates of these programs will become
    scientists and managers. Writing and presentation
    skills are essential components of their
    education.

14
Different Schools have Different Biases
  • There are still only a handful of bioinformatics
    undergraduate programs
  • Many more schools offer a single course or a
    "specialized track" similar to a biotechnology
    major
  • You can generally predict the bias according to
    what school/department hosts the program
  • Computer Science vs. biology
  • Biomedical engineering
  • Medical informatics (library science)

15
Teaching the Teachers
  • There are more graduate level bioinformatics
    programs, but they are all very new.
  • Graduates of these programs will have many
    opportunities as more schools gear up to offer
    bioinformatics training
  • The reality is that most schools will draft
    existing faculty - often jointly from Bio and
    CompSci departments
  • We need to train an entire generation of
    existing faculty in a new discipline

16
Teaching Tips
  • Strike a balance between theory and practical
    experience
  • early bioinformatics training should be about
    what you can do with the tools
  • deeper training can focus on how they work
  • Balance the "click here" tutorials against
    letting them figure it out for themselves
  • it will be different when they look at it next
    time
  • real bioinformatics work involves finding ways to
    overcome frustrations with balky computer systems

17
Training "computer savvy" scientists
  • Know the right tool for the job
  • Get the job done with tools available
  • Network connection is the lifeline of the
    scientist
  • Jobs change, computers change, projects change,
    scientists need to be adaptable

18
III. Bioinformatics Tools You Can Use
  • GenBank - genes, proteins, genomes
  • Similarity Search tools BLAST
  • Alignment CLUSTAL
  • Protein families Pfam, ProDom
  • Protein Structures PDB, RasMol
  • Whole Genomes UCSC, Entrez Genomes
  • Human Mutations OMIM
  • Biochemical Pathways KEGG
  • Integrated tools Biology Workbench, BCM
    SearchLauncher

19
Large Databases
  • Once upon a time, GenBank sent out sequence
    updates on CD-ROM disks a few times per year.
  • Now GenBank is over 40 Gigabytes
  • (11 billion bases)
  • Most biocomputing sites update their copy of
    GenBank every day over the internet.
  • Scientists access GenBank directly over the Web

20
Finding Genes in GenBank
  • These billions of G, A, T, and C letters would be
    almost useless without descriptions of what genes
    they contain, the organisms they come from, etc.
  • All of this information is contained in the
    "annotation" part of each sequence record.

21
(No Transcript)
22
Entrez is a Tool for Finding Sequences
  • GenBank is managed by the NCBI (National Center
    for Biotechnology Information) which is a part of
    the US National Library of Medicine.
  • NCBI has created a Web-based tool called Entrez
    for finding sequences in GenBank.
  • http//www.ncbi.nlm.nih.gov
  • Each sequence in GenBank has a unique accession
    number.
  • Entrez can also search for keywords such as gene
    names, protein names, and the names of orgainisms
    or biological functions

23
(No Transcript)
24
Entrez Databases contain more than just DNA
protein sequences
25
Type in a Query term
  • Enter your search words in the
  • query box and hit the Go button

26

27
Refine the Query
  • Often a search finds too many (or too few)
    sequences, so you can go back and try again with
    more (or fewer) keywords in your query
  • The History feature allows you to combine any
    of your past queries.
  • The Limits feature allows you to limit a query
    to specific organisms, sequences submitted during
    a specific period of time, etc.
  • Many other features are designed to search for
    literature in MEDLINE

28
Related Items
  • You can search for a text term in sequence
    annotations or in MEDLINE abstracts, and find all
    articles, DNA, and protein sequences that mention
    that term.
  • Then from any article or sequence, you can move
    to "related articles" or "related sequences".
  • Relationships between sequences are computed with
    BLAST
  • Relationships between articles are computed with
    "MESH" terms (shared keywords
  • Relationships between DNA and protein sequences
    rely on accession numbers
  • Relationships between sequences and MEDLINE
    articles rely on both shared keywords and the
    mention of accession numbers in the articles.

29
(No Transcript)
30
Database Search Strategies
  • General search principles - not limited to
    sequence (or to biology)
  • Use accession numbers whenever possible
  • Start with broad keywords and narrow the search
    using more specific terms
  • Try variants of spelling, numbers, etc.
  • Search all relevant databases
  • Be persistent!!

31
(No Transcript)
32
  • gbBE588357.1BE588357 194087 BARC 5BOV Bos
    taurus cDNA 5'.
  • Length 369
  • Score 272 bits (137), Expect 4e-71
  • Identities 258/297 (86), Gaps 1/297 (0)
  • Strand Plus / Plus


  • Query 17 aggatccaacgtcgctccagctgctcttgacgactccac
    agataccccgaagccatggca 76


  • Sbjct 1 aggatccaacgtcgctgcggctacccttaaccact-cgc
    agaccccccgcagccatggcc 59


  • Query 77 agcaagggcttgcaggacctgaagcaacaggtggagggg
    accgcccaggaagccgtgtca 136


  • Sbjct 60 agcaagggcttgcaggacctgaagaagcaagtggagggg
    gcggcccaggaagcggtgaca 119


  • Query 137 gcggccggagcggcagctcagcaagtggtggaccaggcc
    acagaggcggggcagaaagcc 196

33
Sample Multiple Alignment
34
Protein domains (from ProDom database)
35
Limits on best Matched Annotation Inheritance
result from many things including multi domain
proteins transitivity.
New sequence
Closest database annotated entry
Original studied protein from which
annotation was inherited.
36
Protein Structure
  • It is not really possible to predict protein
    structure from just amino acid sequence
  • PDB is a database of know protein structures
    (determined by X-ray crystallography and NMR)
  • There are also very handy structure viewers such
    as RasMol that are free for any computer

37
(No Transcript)
38
(No Transcript)
39
Genome Browsers
  • Scientists need to work with a lot of layers of
    information about the genome
  • coding sequence of known genes and cDNAs
  • computer-predicted genes
  • genetic maps (known mutations and markers)
  • gene expression
  • cross species homology

40
(No Transcript)
41
UCSC
42
Ensembl at EBI/EMBL
43
Human Alleles
  • The OMIM (Online Mendelian Inheritance in Man)
    database at the NCBI tracks all human mutations
    with known phenotypes.
  • It contains a total of about 2,000 genetic
    diseases and another 11,000 genetic loci with
    known phenotypes - but not necessarily known gene
    sequences
  • It is designed for use by physicians
  • can search by disease name
  • contains summaries from clinical studies

44
(No Transcript)
45
KEGG Kyoto Encylopedia of Genes and Genomes
  • Enzymatic and regulatory pathways
  • Mapped out by EC number and cross-referenced to
    genes in all known organisms
  • (wherever sequence information exits)
  • Parallel maps of regulatory pathways

46
(No Transcript)
47
(No Transcript)
48
Integrated Online Tools
  • National Database/Sequence Analsysis Servers
  • NCBI, EMBL/EBI, DDBJ
  • Tools for specific types of data or problems
  • Expasy (Protein, Mass Spec, 2-D PAGE)
  • 3-D Protein Structures PDB, Predict Protein
    Server
  • Education oriented tools
  • Biology Workbench
  • Collections of links to other servers
  • BCM SearchLauncher

49
(No Transcript)
50
(No Transcript)
51
(No Transcript)
52
(No Transcript)
53
(No Transcript)
54
The Limits of our Knowledge
  • Bioinformatics is a very dynamic discipline
  • Teachers can't know everything in the field
  • The databases are clumsily built
  • Biology is vastly more complex than our software
  • Lots of our current bioinformatics programs don't
    work well
  • We don't have even theoretical solutions for Gene
    prediction, alternative splicing, protein
    structure function prediction, regulatory
    networks

55
What is a Gene?
  • For every 2 biologists, you get 3 definitions
  • A DNA sequence that encodes a heritable
    trait.
  • The unit of heredity
  • Is it an abstract concept, or something you can
    isolate in a tube or print on your screen?
  • Classic vs.. modern understanding of
    molecular biology

56
Genome Confusion
  • The sequence of a gene in the genome includes
  • protein coding sequence
  • introns and exons
  • 5' and 3' untranslated regions on the mRNA
  • promoter and 5' transcription factor binding
    sites
  • enhancers??
  • What about alternative splicing?
  • Multiple cDNAs with different sequences (that
    produce different proteins) can be transcribed
    from the same genomic locus

57
V. Teaching Resources
  • The Biology Student WorkBench
  • http//peptide.ncsa.uiuc.edu/bioswb.html
  • RasMol/Chime/Protein Explorer
  • http//www.umass.edu/microbio/rasmol/
  • http//www.umass.edu/microbio/chime/
  • http//www.umass.edu/microbio/chime/explorer/
  • Bioinformatics.org
  • Other courses - It ain't cheating to learn from
    your peers

58
  • Terri Attwood's Web Biocomputing tutorials
  • http//www.biochem.ucl.ac.uk/bsm/dbbrowser/c32/ind
    ex.html
  • http//www.biochem.ucl.ac.uk/bsm/dbbrowser/jj/pref
    acefrm.html
  • Sequence Analysis on the Web
  • Christian Büschking and Chris Schleiermacher
  • http//bibiserv.techfak.uni-bielefeld.de/sadr/
  • Online Lectures on Bioinformatics
  • Hannes Luz, Max Planck Institute for Molecular
    Genetics
  • http//lectures.molgen.mpg.de/
  • Using Computers in Molecular Biology
  • Stuart Brown, NYU School of Medicine
  • http//www.med.nyu.edu/rcr/rcr/course/index.html

  • Teach Yourself Bioinformatics on the Web
  • http//www.med.nyu.edu/rcr/rcr/btr/index.html

59
Long Term Implications
  • A "periodic table for biology" will lead to an
    explosion of research and discoveries - we will
    finally have the tools to start making systematic
    analyses of biological processes (quantitative
    biology).
  • Understanding the genome will lead to the
    ability to change it - to modify the
    characteristics of organisms and people in a wide
    variety of ways

60
Genomics in Medical Education
  • The explosion of information about the new
    genetics will create a huge problem in health
    education. Most physicians in practice have had
    not a single hour of education in genetics and
    are going to be severely challenged to pick up
    this new technology and run with it."
  • Francis Collins

61
Stuart M. Brown, Ph.D.stuart.brown_at_med.nyu.eduww
w.med.nyu/rcr
Bioinformatics A Biologist's Guide to
Biocomputing and the Internet
Write a Comment
User Comments (0)
About PowerShow.com