Computers and Programming for Biologists - PowerPoint PPT Presentation

Provided by: ResearchC59
1
Computers and Programming for Biologists
2
What is Bioinformatics?
  • The use of information technology to collect,
    analyze, and interpret biological data.
  • An ad hoc collection of computing tools that are
    used by molecular biologists to manage research
    data.
  • Computational algorithms
  • Database schema
  • Statistical methods
  • Data visualization tools

3
The Human Genome Project
4
A Genome Revolution in Biology and Medicine
  • We are in the midst of a "Golden Era" of biology
  • The Human Genome Project has produced a huge
    storehouse of data that will be used to change
    every aspect of biological research and medicine
  • The revolution is about treating biology as an
    information science, not about specific
    biochemical technologies.

5
The job of the biologist is changing
As more biological information becomes available
and laboratory equipment becomes more automated
...
  • The biologist will spend more time using
    computers on experimental design and data
    analysis (and less time doing tedious lab
    biochemistry)
  • Biology will become a more quantitative science
    (think how the periodic table affected chemistry)

6
What are the Tools?
  • Alignment
  • Similarity string matching
  • Pattern search
  • Hash tables and substitution matrices
  • Clustering
  • Genome assembly and annotation
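
As a rough illustration of how a hash table supports pattern search, the sketch below indexes every k-letter word of a sequence and uses the index to find exact matches. It is written in Python for brevity (the slides themselves recommend Perl). The function names and the choice of k = 4 are assumptions made for this example and do not come from any particular tool.

```python
from collections import defaultdict


def build_kmer_index(sequence, k):
    """Map each k-letter word in `sequence` to the positions where it starts."""
    index = defaultdict(list)
    for start in range(len(sequence) - k + 1):
        index[sequence[start:start + k]].append(start)
    return index


def find_pattern(index, pattern, k, sequence):
    """Return start positions of exact matches of `pattern` (length >= k).

    The first k letters of the pattern are looked up in the hash table,
    then each candidate position is checked against the full pattern.
    """
    hits = []
    for start in index.get(pattern[:k], []):
        if sequence.startswith(pattern, start):
            hits.append(start)
    return hits


# Hypothetical demo using a short DNA sequence.
genome = "GATGCCATAGAGCTGTAGTCGTACCCT"
index = build_kmer_index(genome, k=4)
print(find_pattern(index, "GTAGTC", 4, genome))  # prints [14]
```

The lookup cost depends on how many times the first word occurs, not on the length of the sequence, which is why word indexes of this kind are a common first step in fast similarity searches.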

7
Align by hand
  • GATGCCATAGAGCTGTAGTCGTACCCT <
  • > CTAGAGAGC-GTAGTCAGAGTGTCTTTGAGTTCC

Somebody should make a computer program for this
kind of thing
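
One classic way to make a computer do this is dynamic programming. The sketch below is a minimal global alignment in the Needleman-Wunsch style, written in Python (the slides recommend Perl). The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration, not values taken from the slides.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Align all of `a` against all of `b`; return (score, row_a, row_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Leading gaps: aligning a prefix against nothing costs one gap per letter.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: each cell is the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one best alignment.
    row_a, row_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(row_a)), "".join(reversed(row_b))


print(global_align("ACGT", "AGT"))  # prints (1, 'ACGT', 'A-GT')

# The two sequences from the slide (with the hand-placed gap removed):
total, top, bottom = global_align("GATGCCATAGAGCTGTAGTCGTACCCT",
                                  "CTAGAGAGCGTAGTCAGAGTGTCTTTGAGTTCC")
print(total)
print(top)
print(bottom)
```

Because a global alignment forces both sequences to be matched end to end, two sequences that only overlap in the middle, like the pair on the slide, pick up many gap penalties at the ends.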
8
Global vs. Local Alignments
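
The difference can be sketched in a few lines: a local alignment (in the Smith-Waterman style) never lets a cell score drop below zero and reports the best-scoring region anywhere in the table, so unrelated flanking sequence is simply ignored. The Python sketch below computes only the score. The scoring values are assumptions chosen for illustration.

```python
def local_align_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best local alignment score between `a` and `b`."""
    best = 0
    prev = [0] * (len(b) + 1)          # previous row of the scoring table
    for i in range(1, len(a) + 1):
        curr = [0]                     # first column is always zero
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # The zero option restarts the alignment instead of
            # carrying a negative score forward.
            cell = max(0, prev[j - 1] + pair, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# A shared core (GATTACA) surrounded by unrelated flanks.
print(local_align_score("TTTTGATTACATTTT", "CCCGATTACACCC"))  # prints 7
```

A global alignment of the same two sequences would be penalized for every mismatched flanking letter, while the local score reflects only the shared core.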
9
BLAST Algorithm
10
>ZFISH9GNL-TI fi72b02.y1
          Length = 724

 Score = 307 bits (786), Expect = 8e-82
 Identities = 145/200 (72%), Positives = 166/200 (82%), Gaps = 1/200 (0%)
 Frame = +3

Query:  45 VLLKEYRVILPVSVDEYQVGQLYSVAEASKNXXXXXXXXXXXXXXPYEK-DGEKGQYTHK 103
           +L+KE+R++LPVSV+EYQVGQLYSVAEASKN              PYEK DGEKGQYTHK
Sbjct: 123 MLIKEFRIVLPVSVEEYQVGQLYSVAEASKNETGGGDGVEVLKNEPYEKEDGEKGQYTHK 302

Query: 104 IYHLQSKVPTFVRMLAPEGALNIHEKAWNAYPYCRTVITNEYMKEDFLIKIETWHKPDLG 163
           IY LQSKVP+FVR+LAP  AL IHEKAWNAYPYCRTV+TNEYMK++FLI IETWHKPDLG
Sbjct: 303 IYRLQSKVPSFVRLLAPSSALIIHEKAWNAYPYCRTVLTNEYMKDNFLIMIETWHKPDLG 482

Query: 164 TQENVHKLEPEAWKHVEAVYIDIADRSQVLSKDYKAEEDPAKFKSIKTGRGPLGPNWKQE 223
            QENVH L+ E WK VE ++IDIADRSQV +KDYK +EDPA FKS KTGRGPLGP+WK+E
Sbjct: 483 EQENVHNLDSERWKQVEVIHIDIADRSQVDTKDYKPDEDPATFKSQKTGRGPLGPDWKKE 662

Query: 224 LVNQKDCPYMCAYKLVTVKF 243
           L  ++DCP+MCAYK VTV F
Sbjct: 663 LPQKRDCPHMCAYKXVTVNF 722
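
The "Identities" figure in a report like this is a column-by-column count of positions where the query and subject letters agree. As a small worked example, the Python sketch below counts identities for the last block of the report above (Query 224 to 243). The function name is an assumption made for this example.

```python
def count_identities(query_row, subject_row):
    """Count aligned columns where both rows carry the same residue."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    return sum(1 for q, s in zip(query_row, subject_row)
               if q == s and q != "-")   # a gap never counts as a match


query = "LVNQKDCPYMCAYKLVTVKF"    # Query 224..243
subject = "LPQKRDCPHMCAYKXVTVNF"  # Sbjct 663..722

identical = count_identities(query, subject)
percent = 100 * identical / len(query)
print(identical, len(query), percent)  # prints 13 20 65.0
```

Summing the same count over all four blocks gives the 145 identical positions out of 200 reported in the header. "Positives" additionally counts substitutions that score above zero in the substitution matrix.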
12
Clustering (Phylogenetics)
13
Genome Assembly
14
Raw Genome Data
15
UCSC
16
The Challenge of New Data Types
  • Gene expression microarrays: thousands of genes,
    imprecise measurements; huge images, private
    file formats
  • Proteomics: high-throughput Mass Spec; protein
    chips, protein-protein interactions
  • Genotyping: thousands of alleles, thousands of
    individuals

17
cDNA spotted microarrays
19
High-Throughput Genotyping
20
Bioinformatics: Beyond Using Websites
  • You can do a lot of sophisticated bioinformatics
    using public websites
  • But at some point you may be faced with a LOT of
    data - thousands of searches, annotations, etc.
  • The only solution is to have your own
    bioinformatics computer, database, and custom
    programs.
  • Needs more processor power and more hard drive
    space than a typical desktop personal computer

23
Bioinformatics Requires Powerful Computers
  • One definition of bioinformatics is "the use of
    computers to analyze biological problems."
  • As biological data sets have grown larger and
    biological problems have become more complex, the
    requirements for computing power have also grown.
  • Computers that can provide this power generally
    use the Unix operating system - so you must learn
    Unix to be a computational biologist

24
Stable and Efficient
  • Unix is very stable - computers running Unix
    almost never crash
  • Unix is very efficient:
  • it gets maximum number-crunching power out of
    your processor (and multiple processors)
  • it can smoothly manage very large amounts of
    data
  • it can give a new life to otherwise obsolete Macs
    and PCs
  • Most new bioinformatics software is created for
    Unix - it's easy for the programmers

25
Open Source Bioinformatics
  • Almost all of the bioinformatics software that
    you need to do complex analyses is free for UNIX
    computers
  • The Open Source software ethic is very strong
    among biologists
  • Bioinformatics.org
  • Bioperl.org
  • Open-bio.org
  • New algorithms generally appear first as free
    software (a publication requirement)

26
Free Software
  • Linux operating system, MySQL database
  • Perl - programming language
  • BLAST and FASTA - similarity search
  • Clustal - multiple alignment
  • Phylip - phylogenetics
  • Phred/Phrap/Consed - sequence assembly and SNP
    detection
  • EMBOSS - a complete sequence analysis package
    created by the EMBL (like GCG)

27
Computer Hardware is not Free
  • However, you can build a powerful Linux cluster
    for $20-50K (depending on how much power you
    need)
  • The real cost is for a person to manage the
    machines, install the software, and train
    scientists to use it.
  • Small schools can join together or affiliate with
    a larger neighbor.

28
Do Biologists have to become Programmers?
  • No, but it can give you a big advantage.
  • More and more of biology is becoming
    computer-aided design of experiments, automated
    equipment, and computational analysis of the
    results.
  • I just want to say one word to you ...
    Databases

29
Why teach bioinformatics in undergraduate
education?
  • Demand for trained graduates from the biomedical
    industry
  • Bioinformatics is essential to understand current
    developments in all fields of biology
  • We need to educate an entire new generation of
    scientists, health care workers, etc.
  • Use bioinformatics to enhance the teaching of
    other subjects: genetics, evolution, biochemistry

30
Genomics in Medical Education
  • "The explosion of information about the new
    genetics will create a huge problem in health
    education. Most physicians in practice have had
    not a single hour of education in genetics and
    are going to be severely challenged to pick up
    this new technology and run with it."
  • Francis Collins

31
Becoming a Unix Power User
  • Learn more Unix commands
  • Use the shell to execute simple programs
  • Write scripts - automate repetitive tasks
  • Download and install the latest bioinformatics
    software
  • Drive your system manager crazy or get your own
    Unix machine (Linux on an Intel machine or Mac
    OS X)
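
As one possible illustration of automating a repetitive task, the sketch below counts the sequences in every FASTA file in a directory, something that would be tedious to do by hand for hundreds of files. It is written in Python (the slides recommend Perl and shell scripts). The `*.fasta` file pattern and the function names are assumptions made for this example.

```python
from pathlib import Path


def count_fasta_records(path):
    """Count sequences in a FASTA file: each record starts with a '>' line."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_directory(directory, pattern="*.fasta"):
    """Return {file name: number of sequences} for every matching file."""
    return {path.name: count_fasta_records(path)
            for path in sorted(Path(directory).glob(pattern))}


# Report on the current directory; prints nothing if no files match.
for name, n_records in summarize_directory(".").items():
    print(f"{name}\t{n_records}")
```

The same loop structure works for any per-file job, such as launching a similarity search on each file and collecting the results.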

32
BioPerl
  • Why re-invent the wheel?
  • Lots of common bioinformatics tasks have already
    been programmed as modules in Perl.
  • Grab sequences from GenBank, extract e-values and
    annotation from Blast results, etc.
  • Download from www.bioperl.org
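
To give a feel for what such a module does behind the scenes, here is a minimal Python sketch that pulls the hit identifier, bit score, and e-value out of plain-text BLAST output. It is not BioPerl and does not use its API. It assumes the traditional text layout, in which a hit starts with a ">" line and is followed by a "Score = ... bits (...), Expect = ..." line, and the function name is hypothetical.

```python
import re

HIT_HEADER = re.compile(r"^>(\S+)")
SCORE_LINE = re.compile(
    r"Score\s*=\s*([\d.]+)\s*bits.*?Expect\s*=\s*([\d.eE+-]+)")


def parse_hits(report_text):
    """Return a list of (hit_id, bit_score, e_value) tuples."""
    hits = []
    current_id = None
    for line in report_text.splitlines():
        header = HIT_HEADER.match(line)
        if header:
            current_id = header.group(1)
            continue
        found = SCORE_LINE.search(line)
        if found and current_id is not None:
            evalue_text = found.group(2).rstrip(".,")
            # Assumption: some older reports print e-values such as
            # "e-100" with no leading digit, so prepend a 1.
            if evalue_text.startswith("e"):
                evalue_text = "1" + evalue_text
            hits.append((current_id, float(found.group(1)), float(evalue_text)))
    return hits


# Sample text in the assumed layout, using the values from the BLAST slide.
sample = (
    ">ZFISH9GNL-TI fi72b02.y1\n"
    "          Length = 724\n"
    "\n"
    " Score = 307 bits (786), Expect = 8e-82\n"
)
print(parse_hits(sample))
```

A hand-written parser like this breaks easily when the report format changes, which is a practical argument for using a maintained module instead.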

33
Resources
  • Notes for Lincoln Stein's course on Genome
    Informatics:
    http://stein.cshl.org/genome_informatics/index.html
  • BioPerl.org: http://bio.perl.org/
  • PERL for biologists (Kurt Stüber):
    http://caliban.mpiz-koeln.mpg.de/stueber/perl/
  • Why Biologists Want to Program Computers, by
    James Tisdall:
    http://www.oreilly.com/news/perlbio_1001.html

34
Resources for Bio-Computing
35
Stuart M. Brown, Ph.D.
stuart.brown_at_med.nyu.edu
www.med.nyu/rcr
Bioinformatics: A Biologist's Guide to
Biocomputing and the Internet
Essentials of Medical Genomics