Title: Teaching Bioinformatics to Undergraduates http:www.med.nyu.edurcrASM
1Teaching Bioinformatics to Undergraduateshttp//
www.med.nyu.edu/rcr/ASM
Stuart M. Brown Research Computing, NYU School of
Medicine
2- I. What is Bioinformatics?
- II. Challenges of teaching bioinformatics to
undergraduates
- III. Common bioinformatics tools that you can
use for teaching
- IV. The limits of knowledge
- V. Resources for the teacher
3I. What is Bioinformatics?
- The use of information technology to collect,
analyze, and interpret biological data.
- The use of software tools that deal with
biological sequences, genome analysis, molecular
structures, gene expression, regulatory and
metabolic modeling - Computational biology - the design of new
algorithms and software to support biology
research
- The routine use of computers in all phases of
biology and medicine
4The Human Genome Project
5A Genome Revolution in Biology and Medicine
- We are in the midst of a "Golden Era" of biology
- The Human Genome Project has produced a huge
storehouse of data that will be used to change
every aspect of biological research and medicine
- The revolution is about treating biology as an
information science, not about specific
biochemical technologies.
6The job of the biologist is changing
- As more biological information becomes available
and laboratory equipment becomes more automated
...
- The biologist will spend more time using
computers
- The biologist will spend more time on
experimental design and data analysis (and less
time doing tedious lab biochemistry)
- Biology will become a more quantitative science
(think how the periodic table affected
chemistry)
7II. Why teach bioinformatics in undergraduate
education?
- Demand for trained graduates from the biomedical
industry
- Bioinformatics is essential to understand current
developments in all fields of biology
- We need to educate an entire new generation of
scientists, health care workers, etc.
- Use bioinformatics to enhance the teaching of
other subjects genetics, evolution, biochemistry
8Biochemistry Protein Structures
- "Hands-on graphics is a powerful enhancement to
learning, particularly individualized learning.
There is powerful synergy in learning about
proteins and learning simultaneously about how to
represent and manipulate them with computer
graphics. When students learn to use graphics
they see proteins and other complex biomolecules
in a new and vivid way, and discover personal
solutions to the problem of "seeing" new
structural concepts."
Molecular Graphics Manifesto Gale Rhodes Chemist
ry Department, Univ. of Southern Maine
9Challenges of presenting bioinformatics to
undergraduates
- Requires a deep understanding of molecular
biology - lots of prerequisites
- Training users or makers of these tools?
- A good bioinformatics program will require
substantially more math and statistics than most
existing molecular biology and computer science
curricula. - Who will teach?
10Different Programs, Different Goals
- Integrate into existing biology courses
- genetics, molecular biology, microbiology
- Make one or a few cross-disciplinary courses
- jointly taught by biology and computing faculty
- open to both biology and computing students
- Create a curriculum for a true bioinformatics
major (is this a double major?)
- Are you training for employment or providing the
fundamentals for advanced training?
11Shallow End
- This workshop will focus on faculty skills needed
at the shallow end of the continuum (a few
lectures or a short course).
- Use bioinformatics to teach biological concepts
- Evolution
- Genetics
- Protein structure and function
12How much Computing skills?
- Bioinformatics can be seen as a tool that the
biologist needs to use - like PCR
- Or should biologists be able to write their own
programs and build databases?
- it is a big advantage to be able to design
exactly the tool that you want
- this may be the wave of the future
- Is your school going to train "bioinformatics
professionals" or biologists with informatics
skills?"
13Designing a Curriculum
- To really master bioinformatics, students need to
learn a lot of molecular biology and genetics as
well as become competent programmers.
- Then they need to learn specific bioinformatics
skills - dealing with sequence databases,
similarity algorithms, etc.
- How can students learn this much material and
still manage a well rounded education?
- Graduates of these programs will become
scientists and managers. Writing and presentation
skills are essential components of their
education.
14Different Schools have Different Biases
- There are still only a handful of bioinformatics
undergraduate programs
- Many more schools offer a single course or a
"specialized track" similar to a biotechnology
major
- You can generally predict the bias according to
what school/department hosts the program
- Computer Science vs. biology
- Biomedical engineering
- Medical informatics (library science)
15Teaching the Teachers
- There are more graduate level bioinformatics
programs, but they are all very new.
- Graduates of these programs will have many
opportunities as more schools gear up to offer
bioinformatics training
- The reality is that most schools will draft
existing faculty - often jointly from Bio and
CompSci departments
- We need to train an entire generation of
existing faculty in a new discipline
16Teaching Tips
- Strike a balance between theory and practical
experience
- early bioinformatics training should be about
what you can do with the tools
- deeper training can focus on how they work
- Balance the "click here" tutorials against
letting them figure it out for themselves
- it will be different when they look at it next
time
- real bioinformatics work involves finding ways to
overcome frustrations with balky computer systems
17Training "computer savvy" scientists
- Know the right tool for the job
- Get the job done with tools available
- Network connection is the lifeline of the
scientist
- Jobs change, computers change, projects change,
scientists need to be adaptable
18III. Bioinformatics Tools You Can Use
- GenBank - genes, proteins, genomes
- Similarity Search tools BLAST
- Alignment CLUSTAL
- Protein families Pfam, ProDom
- Protein Structures PDB, RasMol
- Whole Genomes UCSC, Entrez Genomes
- Human Mutations OMIM
- Biochemical Pathways KEGG
- Integrated tools Biology Workbench, BCM
SearchLauncher
19Large Databases
- Once upon a time, GenBank sent out sequence
updates on CD-ROM disks a few times per year.
- Now GenBank is over 40 Gigabytes
- (11 billion bases)
- Most biocomputing sites update their copy of
GenBank every day over the internet.
- Scientists access GenBank directly over the Web
20Finding Genes in GenBank
- These billions of G, A, T, and C letters would be
almost useless without descriptions of what genes
they contain, the organisms they come from, etc.
- All of this information is contained in the
"annotation" part of each sequence record.
21(No Transcript)
22Entrez is a Tool for Finding Sequences
- GenBank is managed by the NCBI (National Center
for Biotechnology Information) which is a part of
the US National Library of Medicine.
- NCBI has created a Web-based tool called Entrez
for finding sequences in GenBank.
- http//www.ncbi.nlm.nih.gov
- Each sequence in GenBank has a unique accession
number.
- Entrez can also search for keywords such as gene
names, protein names, and the names of orgainisms
or biological functions
23(No Transcript)
24Entrez Databases contain more than just DNA
protein sequences
25Type in a Query term
- Enter your search words in the
- query box and hit the Go button
26 27Refine the Query
- Often a search finds too many (or too few)
sequences, so you can go back and try again with
more (or fewer) keywords in your query
- The History feature allows you to combine any
of your past queries.
- The Limits feature allows you to limit a query
to specific organisms, sequences submitted during
a specific period of time, etc.
- Many other features are designed to search for
literature in MEDLINE
28Related Items
- You can search for a text term in sequence
annotations or in MEDLINE abstracts, and find all
articles, DNA, and protein sequences that mention
that term. - Then from any article or sequence, you can move
to "related articles" or "related sequences".
- Relationships between sequences are computed with
BLAST
- Relationships between articles are computed with
"MESH" terms (shared keywords
- Relationships between DNA and protein sequences
rely on accession numbers
- Relationships between sequences and MEDLINE
articles rely on both shared keywords and the
mention of accession numbers in the articles.
29(No Transcript)
30Database Search Strategies
- General search principles - not limited to
sequence (or to biology)
- Use accession numbers whenever possible
- Start with broad keywords and narrow the search
using more specific terms
- Try variants of spelling, numbers, etc.
- Search all relevant databases
- Be persistent!!
31(No Transcript)
32- gbBE588357.1BE588357 194087 BARC 5BOV Bos
taurus cDNA 5'.
- Length 369
- Score 272 bits (137), Expect 4e-71
- Identities 258/297 (86), Gaps 1/297 (0)
- Strand Plus / Plus
-
- Query 17 aggatccaacgtcgctccagctgctcttgacgactccac
agataccccgaagccatggca 76
-
- Sbjct 1 aggatccaacgtcgctgcggctacccttaaccact-cgc
agaccccccgcagccatggcc 59
-
- Query 77 agcaagggcttgcaggacctgaagcaacaggtggagggg
accgcccaggaagccgtgtca 136
-
- Sbjct 60 agcaagggcttgcaggacctgaagaagcaagtggagggg
gcggcccaggaagcggtgaca 119
-
- Query 137 gcggccggagcggcagctcagcaagtggtggaccaggcc
acagaggcggggcagaaagcc 196
33Sample Multiple Alignment
34Protein domains (from ProDom database)
35Limits on best Matched Annotation Inheritance
result from many things including multi domain
proteins transitivity.
New sequence
Closest database annotated entry
Original studied protein from which
annotation was inherited.
36Protein Structure
- It is not really possible to predict protein
structure from just amino acid sequence
- PDB is a database of know protein structures
(determined by X-ray crystallography and NMR)
- There are also very handy structure viewers such
as RasMol that are free for any computer
37(No Transcript)
38(No Transcript)
39Genome Browsers
- Scientists need to work with a lot of layers of
information about the genome
- coding sequence of known genes and cDNAs
- computer-predicted genes
- genetic maps (known mutations and markers)
- gene expression
- cross species homology
40(No Transcript)
41UCSC
42Ensembl at EBI/EMBL
43Human Alleles
- The OMIM (Online Mendelian Inheritance in Man)
database at the NCBI tracks all human mutations
with known phenotypes.
- It contains a total of about 2,000 genetic
diseases and another 11,000 genetic loci with
known phenotypes - but not necessarily known gene
sequences - It is designed for use by physicians
- can search by disease name
- contains summaries from clinical studies
44(No Transcript)
45KEGG Kyoto Encylopedia of Genes and Genomes
- Enzymatic and regulatory pathways
- Mapped out by EC number and cross-referenced to
genes in all known organisms
- (wherever sequence information exits)
- Parallel maps of regulatory pathways
46(No Transcript)
47(No Transcript)
48Integrated Online Tools
- National Database/Sequence Analsysis Servers
- NCBI, EMBL/EBI, DDBJ
- Tools for specific types of data or problems
- Expasy (Protein, Mass Spec, 2-D PAGE)
- 3-D Protein Structures PDB, Predict Protein
Server
- Education oriented tools
- Biology Workbench
- Collections of links to other servers
- BCM SearchLauncher
49(No Transcript)
50(No Transcript)
51(No Transcript)
52(No Transcript)
53(No Transcript)
54The Limits of our Knowledge
- Bioinformatics is a very dynamic discipline
- Teachers can't know everything in the field
- The databases are clumsily built
- Biology is vastly more complex than our software
- Lots of our current bioinformatics programs don't
work well
- We don't have even theoretical solutions for Gene
prediction, alternative splicing, protein
structure function prediction, regulatory
networks
55What is a Gene?
- For every 2 biologists, you get 3 definitions
- A DNA sequence that encodes a heritable
trait.
- The unit of heredity
- Is it an abstract concept, or something you can
isolate in a tube or print on your screen?
- Classic vs.. modern understanding of
molecular biology
56Genome Confusion
- The sequence of a gene in the genome includes
- protein coding sequence
- introns and exons
- 5' and 3' untranslated regions on the mRNA
- promoter and 5' transcription factor binding
sites
- enhancers??
- What about alternative splicing?
- Multiple cDNAs with different sequences (that
produce different proteins) can be transcribed
from the same genomic locus
57V. Teaching Resources
- The Biology Student WorkBench
- http//peptide.ncsa.uiuc.edu/bioswb.html
- RasMol/Chime/Protein Explorer
- http//www.umass.edu/microbio/rasmol/
- http//www.umass.edu/microbio/chime/
- http//www.umass.edu/microbio/chime/explorer/
- Bioinformatics.org
- Other courses - It ain't cheating to learn from
your peers
58- Terri Attwood's Web Biocomputing tutorials
- http//www.biochem.ucl.ac.uk/bsm/dbbrowser/c32/ind
ex.html
- http//www.biochem.ucl.ac.uk/bsm/dbbrowser/jj/pref
acefrm.html
- Sequence Analysis on the Web
- Christian Büschking and Chris Schleiermacher
- http//bibiserv.techfak.uni-bielefeld.de/sadr/
- Online Lectures on Bioinformatics
- Hannes Luz, Max Planck Institute for Molecular
Genetics
- http//lectures.molgen.mpg.de/
- Using Computers in Molecular Biology
- Stuart Brown, NYU School of Medicine
- http//www.med.nyu.edu/rcr/rcr/course/index.html
- Teach Yourself Bioinformatics on the Web
- http//www.med.nyu.edu/rcr/rcr/btr/index.html
59Long Term Implications
- A "periodic table for biology" will lead to an
explosion of research and discoveries - we will
finally have the tools to start making systematic
analyses of biological processes (quantitative
biology). - Understanding the genome will lead to the
ability to change it - to modify the
characteristics of organisms and people in a wide
variety of ways
60Genomics in Medical Education
- The explosion of information about the new
genetics will create a huge problem in health
education. Most physicians in practice have had
not a single hour of education in genetics and
are going to be severely challenged to pick up
this new technology and run with it." - Francis Collins
61Stuart M. Brown, Ph.D.stuart.brown_at_med.nyu.eduww
w.med.nyu/rcr
Bioinformatics A Biologist's Guide to
Biocomputing and the Internet