Title: Introduction to Computational Biosciences and Bioinformatics
1Introduction to Computational Biosciences and
Bioinformatics
- Alex Ropelewski
- ropelews_at_psc.edu
- Pittsburgh Supercomputing Center
- National Resource for Biomedical Supercomputing
- http://staff.psc.edu/ropelews/jsu/Begin_CS_Jackson_State_Intro_Computational_BioScience.ppt
- http://compbio.jsums.edu/awareness/week1.html
2Computational Biosciences
- The application of
- computer science, engineering,
- physical science and mathematics
- to the study of how plants, animals and humans
function
3Computational Bioscience Fields
- Bioinformatics
- Structural biology
- Genetic databases
- Quantitative ecology
- Physiological modeling
- Medical informatics
- Image processing and visualization
- Medical imaging
- Biomedical instrumentation
- Biomathematics
- Neuroscience
- Telemedicine
- Biomedical engineering
- Other related areas
4Bioinformatics
- The interdisciplinary science of using
computational approaches to analyze, classify,
collect, represent and store biological data with
the goal of accelerating and enhancing the
understanding of DNA, RNA and Protein sequences.
5Structural Biology
The branch of the sciences concerned with the
molecular structure of biological macromolecules
such as proteins and nucleic acids, how they
acquire the structures they have, and how
alterations in their structures affect their
function.
6Physiological Modeling
The study of the mechanical, physical, and
biochemical functions of living organisms through
the use and creation of mathematical models of
physiological systems. Examples include models of
components of organisms, such as particular
organs or cell systems.
7Image Processing and Visualization
The science of organizing, displaying, and
analyzing image data taken from any living
organism in a realistic life-like manner.
8Computational Neuroscience and Signal Processing
Applying mathematical and computational methods
to understand the signaling, control and other
networks in living organisms
9Who Employs Computational Bioscientists?
- Pharmaceuticals and Biotechnology (Bayer,
Schering-Plough, Amgen, Merck, Eli Lilly, etc.)
- Hospitals (particularly research hospitals)
- Agriculture (Monsanto, Pioneer, etc.)
- Academia (particularly research
universities/institutes)
- Government
- NIH (many institutes and centers, including NLM,
NCBI and NCI)
- CDC
- DOE (National labs)
- Department of Defense (including Army Corps of
Engineers)
- Agriculture, Veterans Affairs, NSF
- Government Contractors (such as Computercraft,
SRA)
10Computational Biosciences Job Growth
Engineers, Life and Physical Scientists and
Related Occupations. Occupational Outlook
Handbook, 2008-09 Edition. Department of Labor,
Bureau of Labor Statistics
11Computational Biosciences Salaries
National Occupational Employment and Wage
Estimates. Department of Labor, Bureau of Labor
Statistics, May 2007
12Computational Biosciences
- Interdisciplinary skills are required
- Require knowledge in the following areas
- Biology
- Chemistry
- Computer Science
- Mathematics
- Statistics
- Physics
- Engineering
13Computational Biosciences Required Skill Sets
- Agricultural and food scientists need the
ability to apply statistical techniques, and the
ability to use computers to analyze data and to
control biological and chemical processing.
- Biological scientists usually study allied
disciplines such as mathematics, physics,
engineering and computer science. Computer
courses are beneficial for modeling and
simulating biological processes, operating some
laboratory equipment and performing research in
the emerging field of bioinformatics.
- Computer skills are essential for prospective
environmental scientists and hydrologists.
Students who have some experience with computer
modeling, data analysis and integration, digital
mapping, remote sensing and Geographic
Information Systems will be the most prepared to
enter the job market.
- Medical scientists: in addition to required
courses in chemistry and biology, undergraduates
should study allied disciplines such as
mathematics, engineering, physics, and computer
science.
Engineers, Life and Physical Scientists and
Related Occupations. Occupational Outlook
Handbook, 2008-09 Edition. Department of Labor,
Bureau of Labor Statistics
14Computational Biosciences Required Skill Sets
- Developments in the field of Chemistry that
involve life sciences will expand, resulting in
more interaction among biologists, engineers,
computer specialists and chemists. Chemistry
majors usually study biological sciences,
mathematics, physics and, increasingly, computer
science. Computer courses are essential because
employers prefer job applicants who are able to
apply computer skills to modeling and simulation
tasks and operate computerized laboratory
equipment. This is increasingly important as
combinatorial chemistry and advanced screening
techniques are more widely applied. Courses in
statistics are useful because chemists need the
ability to apply basic statistical techniques.
Chemists should experience employment growth in
pharmaceutical and biotechnology research as
recent advances in genetics open new avenues of
treatment for diseases. Job growth for chemists
is expected to be strongest in pharmaceutical and
biotechnology firms.
Engineers, Life and Physical Scientists and
Related Occupations. Occupational Outlook
Handbook, 2008-09 Edition. Department of Labor,
Bureau of Labor Statistics
15Bioinformatics
- The interdisciplinary science of using
computational approaches to analyze, classify,
collect, represent and store biological data with
the goal of accelerating and enhancing the
understanding of DNA, RNA and Protein sequences.
16What is a Sequence?
- A sequence is a way to represent a protein, DNA,
or RNA molecule as a character string.
Phospholipase A2 - Bos taurus (Bovine).
MRLLVLAALLTVGAGQAGLNSRALWQFNGMIKCKIPSSEPLLDFNNYGCY
CGLGGSGTPVDDLDRCCQTHDNCYKQAKKLDSCKVLVDNPYTNNYSYSCS
NNEITCSSENNACEAFICNCDRNAAICFSKVPYNKEHKNLDKKNC
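As a minimal sketch of what "a molecule as a character string" means in practice, the Python snippet below stores the phospholipase A2 sequence from this slide as an ordinary string, strips the spaces and line breaks that come from display formatting, and counts the residues. The variable names are illustrative and not part of the original slides.

```python
from collections import Counter

# Sequence text as it may arrive from a slide or web page; the spaces and
# line breaks are display formatting only and carry no biological meaning.
raw = """
MRLLVLAALLTVGAGQAGLNSRALWQFNGMIKCKIPSSEPLLDFNNYGCY
CGLGGSGTPV DDLDRCCQTHDNCYKQAKKLDSCKVLVDNPYTNNYSYSC
SNNEITCSSENNACEAFICNC DRNAAICFSKVPYNKEHKNLDKKNC
"""

# Remove all whitespace to obtain one continuous character string.
sequence = "".join(raw.split())

# Tally how often each one-letter residue code occurs.
composition = Counter(sequence)

print(len(sequence))               # number of residues
print(composition.most_common(3))  # three most frequent residues
```

Once the molecule is a plain string, ordinary string operations (length, slicing, counting, searching) become the basic tools of sequence analysis.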
17Molecular Alphabet
- DNA/RNA Sequences: Letters represent side chains
or bases
- A - Adenine
- C - Cytosine
- G - Guanine
- T - Thymine (DNA)
- U - Uracil (RNA)
- X or N (Unknown)
Image from Wikimedia Commons:
http://en.wikipedia.org/wiki/File:DNA_chemical_structure.svg
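The nucleotide alphabet above can be checked and converted with a few lines of Python. This is only a sketch: the function names are hypothetical, and `transcribe` performs nothing more than the letter substitution T to U implied by the alphabet, not a model of the biological process.

```python
# Letters allowed in a DNA string: the four bases plus X or N for "unknown".
DNA_LETTERS = set("ACGTXN")

def is_valid_dna(seq):
    """Return True if every character belongs to the DNA alphabet."""
    return all(base in DNA_LETTERS for base in seq.upper())

def transcribe(dna):
    """Rewrite a DNA string in the RNA alphabet by replacing T with U."""
    return dna.upper().replace("T", "U")

print(is_valid_dna("ATGCCN"))  # uses only allowed letters
print(is_valid_dna("ATGCU"))   # U belongs to the RNA alphabet, not DNA
print(transcribe("ATGC"))
```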
18Molecular Alphabet
- Protein Sequences: Letters represent amino
acids
- A - Alanine
- R - Arginine
- N - Asparagine
- D - Aspartic acid
- C - Cysteine
- E - Glutamic acid
- Q - Glutamine
- G - Glycine
- H - Histidine
- I - Isoleucine
- L - Leucine
- K - Lysine
- M - Methionine
- F - Phenylalanine
- P - Proline
- S - Serine
- T - Threonine
- W - Tryptophan
- Y - Tyrosine
- V - Valine
Figure: oxytocin, with residues labeled by their
one-letter codes (N, Q, P, G, I, C, L, C, Y).
Image from Wikimedia Commons:
http://en.wikipedia.org/wiki/File:Oxytocin.jpg
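The one-letter protein alphabet is easy to express as a lookup table. The sketch below uses the letters listed on this slide plus valine (V), the twentieth standard amino acid. The residue order CYIQNCPLG for oxytocin is the commonly cited sequence and is supplied here as an assumption, since the slide only shows the labeled image; `spell_out` is a hypothetical helper name.

```python
# One-letter code -> amino acid name for the 20 standard amino acids.
AMINO_ACIDS = {
    "A": "Alanine", "R": "Arginine", "N": "Asparagine", "D": "Aspartic acid",
    "C": "Cysteine", "E": "Glutamic acid", "Q": "Glutamine", "G": "Glycine",
    "H": "Histidine", "I": "Isoleucine", "L": "Leucine", "K": "Lysine",
    "M": "Methionine", "F": "Phenylalanine", "P": "Proline", "S": "Serine",
    "T": "Threonine", "W": "Tryptophan", "Y": "Tyrosine", "V": "Valine",
}

def spell_out(protein):
    """Translate a one-letter protein string into a list of residue names."""
    return [AMINO_ACIDS[code] for code in protein.upper()]

# Assumed residue order for oxytocin (the slide shows only the image).
oxytocin = "CYIQNCPLG"
print("-".join(spell_out(oxytocin)))
```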
19What is an Information Library?
- A compilation of prior experimental knowledge
about biologically relevant molecules into a
computer system.
- The power of bioinformatics is in the ability to
leverage and apply this prior experimental
knowledge to additional biological problems.
- To be searched effectively, prior experimental
knowledge must be organized in a way that makes
sense from both a computer science perspective
and a biological point of view.
20How is Information Organized?
- From a computer-science perspective, there are
several ways that data can be organized and
stored
- In a relational database
- In a flat file
- In a networked (hyperlinked) model
- From a biologist's perspective, there are also
several different ways that data can be organized
- Sequence
- Structure
- Family/Domain
- Species
- Taxonomy
- Function/Pathway
- Disease/Variation
- Publication/Journal
- And many other ways
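To make the contrast between a flat file and a relational database concrete, here is a small Python sketch using the standard `sqlite3` module. It assumes a FASTA-like flat-file layout (a ">" header line followed by sequence lines), which the slides do not name. The identifiers are made up, `seq1` reuses the first 20 residues of the phospholipase A2 example, and `seq2` is invented for illustration.

```python
import sqlite3

# A tiny flat file: a ">" header line, then one or more sequence lines.
FLAT_FILE = """\
>seq1|Bos taurus
MRLLVLAALL
TVGAGQAGLN
>seq2|Homo sapiens
MKLLVLAVLL
"""

def parse_flat_file(text):
    """Yield (identifier, species, sequence) records from the flat-file text."""
    header, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                yield (*header, "".join(chunks))
            header = tuple(line[1:].split("|", 1))
            chunks = []
        elif line.strip():
            chunks.append(line.strip())
    if header is not None:
        yield (*header, "".join(chunks))

# Load the same records into a relational table (in memory only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences (id TEXT PRIMARY KEY, species TEXT, residues TEXT)"
)
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?)", parse_flat_file(FLAT_FILE)
)

# The biologist's view (organize by species) becomes a simple query.
rows = conn.execute(
    "SELECT id, length(residues) FROM sequences WHERE species = ?",
    ("Bos taurus",),
).fetchall()
print(rows)
```

The flat file is easy to read and exchange, while the table makes it cheap to ask questions along any of the biological axes listed above, provided each axis is stored as its own column.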
21Representing Biological Data
- Sequence Libraries
- Character based
- Classification Libraries (Aligned sets of
sequences)
- Ambiguous consensus patterns
- Weight Matrix
- Position Specific Scoring Matrix (Profile)
- Hidden Markov Models
- Structural Libraries
- X,Y,Z coordinates for each alpha carbon atom
- Taxonomy
- Tree structure represents the taxonomic lineage
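As a rough illustration of the "weight matrix" idea listed above, the Python sketch below turns a small aligned set of sequences into per-column letter frequencies, derives a consensus, and scores a candidate against the matrix. The aligned sequences are hypothetical, and the scoring is a plain frequency product; profiles used in practice typically add pseudocounts and log-odds scaling, which are omitted here.

```python
from collections import Counter

# Hypothetical aligned family (rows of equal length); not from a real database.
ALIGNED = ["ACGT", "ACGA", "ACCT", "TCGT"]

def build_profile(aligned):
    """Return one {letter: frequency} dict per alignment column."""
    n = len(aligned)
    return [
        {letter: count / n for letter, count in Counter(column).items()}
        for column in zip(*aligned)
    ]

def score(profile, candidate):
    """Multiply the per-column frequencies of the candidate's letters."""
    result = 1.0
    for column, letter in zip(profile, candidate):
        result *= column.get(letter, 0.0)
    return result

profile = build_profile(ALIGNED)

# The consensus takes the most frequent letter in each column.
consensus = "".join(max(col, key=col.get) for col in profile)

print(consensus)
print(score(profile, "ACGT"))  # every letter was seen in its column
print(score(profile, "GCGT"))  # G never occurs in column 1, so the product is 0
```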
22What does a biologist do with this data?
- Search for similar sequences (sequences that
share a biological relationship)
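A toy version of a similarity search can be sketched in Python by comparing the short substrings (k-mers) that two sequences share. This is not how production tools such as BLAST work internally. It is a simplified stand-in, and the library entries and function names are hypothetical.

```python
def kmers(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard index of the two k-mer sets (0.0 = nothing shared)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)

def search(query, library, k=3):
    """Rank library entries by similarity to the query, best first."""
    scored = [(similarity(query, seq, k), name) for name, seq in library.items()]
    return sorted(scored, reverse=True)

# Hypothetical sequence library.
LIBRARY = {
    "entry_a": "MRLLVLAALLTVGAG",
    "entry_b": "MRLLVLAVLLTVGAG",
    "entry_c": "GSGTPVDDLDRCCQT",
}

hits = search("MRLLVLAALLTV", LIBRARY)
for value, name in hits:
    print(name, round(value, 3))
```

A high score only suggests a relationship. Deciding whether two sequences truly share a biological relationship needs statistics and biological judgment beyond this sketch.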
24What does a biologist do with this data?
- Align groups of sequences that share a biological
relationship (family)
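Aligning a whole family is a multiple-alignment problem, which is beyond a short example. As a building block, the sketch below shows pairwise global alignment with the Needleman-Wunsch dynamic-programming method in Python. The scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions chosen for readability, not values from the slides.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns (score, row_a, row_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))

best, row_a, row_b = global_align("ACGT", "AGT")
print(best)
print(row_a)
print(row_b)
```

The two returned rows have equal length, with "-" marking positions where one sequence has no counterpart in the other.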
25What does a biologist do with this data?
- Understand phylogenetic relationships of the
family.
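One common starting point for studying phylogenetic relationships is a table of pairwise distances between aligned sequences, which distance-based tree-building methods take as input. The Python sketch below computes the simplest such measure, the fraction of differing positions. The species labels and sequences are hypothetical, and no tree is built here.

```python
from itertools import combinations

# Hypothetical aligned sequences (equal length, "-" marks a gap).
FAMILY = {
    "species_a": "ACGTACGT",
    "species_b": "ACGTACGA",
    "species_c": "ACCTTCGA",
}

def p_distance(x, y):
    """Fraction of compared columns where two aligned sequences differ.

    Columns with a gap in either sequence are skipped.
    """
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not pairs:
        return 0.0
    return sum(p != q for p, q in pairs) / len(pairs)

# Distance for every unordered pair of family members.
distances = {
    (m, n): p_distance(FAMILY[m], FAMILY[n])
    for m, n in combinations(FAMILY, 2)
}

# The pair with the smallest distance is the most similar pair.
closest = min(distances, key=distances.get)
print(distances)
print(closest)
```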
26What does a biologist do with this data?
- Understand key positions (residues) of the family.
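A first pass at finding key positions is to look for alignment columns where one residue dominates. The Python sketch below reports such columns for a hypothetical aligned family. The threshold parameter and the choice to count gaps against conservation are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical aligned family; "-" marks a gap.
FAMILY = ["MKCLH", "MRCLH", "MKCIH", "M-CLH"]

def conserved_positions(aligned, threshold=1.0):
    """Return (position, residue) for columns dominated by one residue.

    Positions are 1-based. A column qualifies when its most common residue
    appears in at least `threshold` of all sequences (gaps count against it).
    """
    hits = []
    for index, column in enumerate(zip(*aligned), start=1):
        counts = Counter(r for r in column if r != "-")
        if not counts:
            continue  # column contains only gaps
        residue, count = counts.most_common(1)[0]
        if count / len(aligned) >= threshold:
            hits.append((index, residue))
    return hits

print(conserved_positions(FAMILY))        # fully conserved columns
print(conserved_positions(FAMILY, 0.75))  # allow one differing sequence
```

Conservation alone does not prove functional importance, but strongly conserved residues are natural candidates for closer study.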
27What does a biologist do with this data?
- Understand how key positions affect the structure
and function of the molecule being studied
28What does a biologist do with this data?
- Use structural data for a molecule from one
species to model a related molecule from another
species.
29Job Opportunities in Bioinformatics
- This course will teach you many essential skills
that are asked for in these job postings.
- Let's look at actual job postings asking for
bioinformatics expertise.
- Not all jobs will be labeled bioinformatics or
sequence analysis; many are in a related
computational bioscience field.
- Specific skills required
30Summer Internship-Computational Biology
- Qualifications: To be eligible for a
Computational Biology Summer Scientific
Internship, students will have completed their
undergraduate sophomore year (by June 2009).
- Be majoring in a biological, chemistry or
computer science program.
- Candidates would have completed at least one
programming course before the start of the
internship.
- All interns must have current authorization to
work for any employer within the United States.
- Experience with MATLAB, SQL, C and/or Perl is
desired.
http://jobview.monster.com/getjob.aspx?JobID78206043JobTitleSummerInternship-ComputationalBiologyqcomputationalbiologycyuslid316re0pg1dv1AVSDM2008-12-18143a203a00seq2fseo1isjs1re1000
31Bioinformatics Assembly Analyst
- Responsibilities
- assembling genome sequence data using a variety
of tools and parameters and performing the
experiments needed to evaluate sequencing
strategies
- using existing software and databases to analyze
genomic data and correlating assemblies and
sequences with a variety of genetic and physical
maps and other biological information
- identifying problems and serving as point of
contact for various groups to propose and
implement solutions
- proposing and implementing upgrades to existing
tools and processes to enhance analysis
techniques and quality of results
- developing and implementing scripts to
manipulate, format, parse, analyze, and display
genome sequence data and developing new
strategies for analysis and presentation of
results.
- Requirements
- a bachelor's degree in biology or related field
- at least three years of experience in DNA
sequencing and sequence analysis.
- Must possess solid knowledge of sequencing
software and public sequencing databases.
- Knowledge of bioinformatics tools helpful.
- http://sh.webhire.com/servlet/av/jd?ai631ji2285147snI
32Bioinformatics Analyst
- Responsibilities
- The Bioinformatics Analyst will process sequence
data and apply quality control measures for
generating high quality raw sequence and
assembled data from next generation sequencing
technologies.
- Will perform whole genome alignments using
existing alignment tools, including BLAST, MUMmer
and PatternHunter. Perform mapping and
post-mapping analysis with short reads using
third-party and internally developed tools.
- Responsible for receiving, processing and
managing sequence data.
- Evaluate new methodologies and tools and improve
data processing and quality control protocols.
- Develop suitable metrics for reporting the
completeness and quality of the sequence
delivered to the customers.
- Requirements
- B.S. in biology, computer science,
bioinformatics or related field, or equivalent
combination of education and experience
- A minimum of 2 years' experience in genomics and
bioinformatics-related work.
- Proficiency in Unix and experience in one or more
of these programming languages is required: Perl,
SQL, Jython and Java.
- Familiar with the use of commonly-used sequence
analysis tools and genomic databases
- Willing to multi-task and respond to new
challenges as required.
- Excellent communication skills.
- Hands-on experience in a research or production
environment
http://jobview.monster.com/getjob.aspx?JobID78527133JobTitleBioinformaticsAnalystbrd1qbioinformaticscyuslid316re130AVSDM2009-01-09123a563a00pg1seq11fseo1isjs1re1000
33Business Systems Analyst
- Responsibilities
- The ideal candidate should be a highly motivated
team player with a strong understanding of
informatics solutions to biology and chemistry,
especially in the area of data
visualization/statistical analysis, and with a
proven record of building/integrating effective
tools for scientists to help them in their daily
work.
- Actively work with scientists/computational
biologists in a disease area to understand their
needs
- Define proper data analysis solution(s) to meet
their scientific needs
- Perform rapid prototyping to refine the
requirements with proper documentation
- Work with internal and external software teams,
where appropriate, to design/implement proper
solutions to meet scientists' needs
- Work either as a team member or lead a team to
deliver data analysis platforms to
scientists/computational biologists
- Work effectively with different NITAS groups to
ensure a globally consistent implementation
scheme.
- Requirements
- Bachelor's degree in Computer Science, Biology,
Bioinformatics or comparable qualification
- At least 3-5 years of hands-on experience in data
analysis in a drug discovery, scientific or
biotech environment
- Strong communications and interpersonal skills
- Proven capabilities interacting with scientists
and being customer service oriented
- Ability to work independently and/or as part of
a team
- Familiarity with scientific LIMS such as
ActivityBase, and data visualization/analysis
tools such as Spotfire
- Solid understanding of relational databases and
familiarity with Oracle and/or SQL Server
- Good understanding of the fundamentals of
software engineering.
34Summary
- Wide variety of jobs
- Biology, especially molecular biology and
genetics
- Some statistics
- Computer skills
- UNIX
- Bioinformatics Tools
- Database (SQL)
- Some Programming
- Web
- Bioinformatics can be a rewarding career path
35National Resource for Biomedical Supercomputing