Title: Course Objectives
1 Course Objectives
- To learn about research studies driving the field and the computing techniques that have been developed.
- To learn about computational and informatics projects related to biology, medicine, and other life science disciplines at Emory.
- To learn about opportunities for summer research and dissertation topics.
- To stimulate ideas for further collaboration between Math/CS and X.
However, it is impossible to give a complete treatment of the field.
2 Why Computational and Life Science?
- So you need to go see a doctor?
3 Why CLS?
- A look at your personal medical history
- Do you eat right? Do you exercise?
- Do you smoke? Do you drink alcohol?
- What is your current / past profession?
- Have you had any of the following?
  - Difficulty breathing
  - Circulatory problems
  - Eating disorder
  - Behavioral issues
  - Heart problems
  - Visual impairment
  - Allergies
  - Asthma
4
- A shallow picture of your medical profile.
[Diagram labels: Ancestral Knowledge, Information today, Future symptoms, Disease]
5
- Follow a specific problem (nausea)
- Additional lab tests (bacterial, viral, hormonal, ulcers, celiac disease, diabetes, cancer)
- More specific questions to determine the extent of the problem and other symptoms
6
- Look at your family medical history
- Has anyone in your immediate family had any of the following?
  - Heart disease
  - Diabetes
  - Cancer
  - Alzheimer's
- If so, who?
  - Mother, Father, Siblings, Aunts/Uncles, Grandparents
7
[Diagram labels: Ancestral Knowledge, Information today, Future symptoms, Disease]
8
- But how can this picture give any facts about the specifics or causes of the diseases in your ancestral medical history?
- Was your grandmother a heavy smoker? Was your grandfather overweight?
- Even with similar symptoms, the causes may be due more to personal choices or environment.
- How could we decipher the facts / causes?
- (Venn diagram of symptoms and causes)
9
- A deeper picture of your medical profile.
- More depth.
- Accumulation of information points to a specific diagnosis.
- But this requires symptoms.
10
- Current diagnostics tend to follow a single path at a time.
- Do a test, examine the results, prove or disprove.
- If disproved, evaluate a second route.
- Efficient in terms of clinical cost, inefficient in terms of time.
11
- Even with symptoms the diagnosis may be wrong.
- Car link
12
- Well, the flow chart diagnosis has been completed and the final result is a defective PCM. I just had a strange feeling and I just cannot seem to accept that.
- The other PCM made no difference.
- What went wrong? The diagnostic trouble chart was carefully followed and yet the end result was incorrect? Was the flow chart misleading? Absolutely NOT. One thing to KEEP IN MIND when following the flow charts is that the "MOST LIKELY" cause will be shown. There is no way to know exactly what fails from one case to another. I don't fault the information at all; as a matter of fact, even though the problem was not yet known, I do know, by following the flow chart, which areas are correct.
- So, there you have it: not every cause will be listed as the end result when using a diagnostic flow chart.
13
- The current information is not sufficient.
- What if we could add to this information?
- What would you want to add?
- Can we start the diagnosis earlier, before symptoms?
- Is one's personal predisposition to a specific disease measurable?
- How would one determine this?
14
- What can be measured?
- Recorded?
- Compared between normal and diseased?
- Can a variance be measured?
- Is this variance predictive?
15
- Back to data points.
- Clinical lab studies (images, chemical monitoring, physical exam, etc.)
- Scientists are currently accumulating data in multiple areas (DNA, RNA, protein, etc.)
- Recording data for normals, diseased, with treatment, without treatment.
- Many, many replicates!
- Billions of data points
- Comparison
- What features correlate with normal or disease, etc.
- Can this feature be predictive?
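One way to picture the comparison step on this slide: for each measured feature, compare replicates from the normal group against replicates from the diseased group and rank the features by how strongly they separate the two. The sketch below uses Welch's t-statistic as the comparison; that choice, the function names, and the toy measurements are all assumptions made for illustration, not something taken from the course material.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(normal, diseased):
    """Welch's t-statistic for one feature measured in two groups."""
    n1, n2 = len(normal), len(diseased)
    # Standard error from the two sample variances (unequal variances allowed)
    se = sqrt(variance(normal) / n1 + variance(diseased) / n2)
    return (mean(diseased) - mean(normal)) / se


def rank_features(normal_samples, diseased_samples):
    """Rank features by |t|. Each sample is a dict: feature name -> value."""
    scores = {}
    for feature in normal_samples[0]:
        a = [s[feature] for s in normal_samples]
        b = [s[feature] for s in diseased_samples]
        scores[feature] = welch_t(a, b)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Made-up replicate measurements (hypothetical marker names and values)
normal = [
    {"marker_a": 1.0, "marker_b": 5.0},
    {"marker_a": 1.2, "marker_b": 5.5},
    {"marker_a": 0.9, "marker_b": 4.6},
]
diseased = [
    {"marker_a": 2.1, "marker_b": 5.1},
    {"marker_a": 2.4, "marker_b": 4.9},
    {"marker_a": 1.9, "marker_b": 5.3},
]

ranking = rank_features(normal, diseased)
for name, t in ranking:
    print(f"{name}: t = {t:.2f}")
```

A feature that ranks high here only correlates with disease status in this sample. Whether it is predictive still has to be checked on data that was not used for the ranking.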
16 Technology and CS Requirements
- Given 1000s of instances
- queryable database
- feature definition, feature extraction
- feature selection
- comparison, classification, correlation
- prediction
- modeling predictive risk models
We will discuss this protocol in many different instances.
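The protocol listed on this slide can be sketched end to end on toy data: extract features from raw records, select the features that differ most between classes, then classify a new record. Everything in the sketch is an assumption made for illustration: the record layout, the three summary features, the mean-gap selection rule, and the nearest-centroid classifier. A real study would also scale the features and validate on held-out data.

```python
from statistics import mean


def extract_features(record):
    """Feature extraction: summarize raw readings as named numeric features."""
    r = record["readings"]
    return {"mean": mean(r), "max": max(r), "range": max(r) - min(r)}


def select_features(rows, labels, k):
    """Feature selection: keep the k features whose class means differ most."""
    def gap(name):
        a = [row[name] for row, y in zip(rows, labels) if y == "normal"]
        b = [row[name] for row, y in zip(rows, labels) if y == "disease"]
        return abs(mean(a) - mean(b))
    return sorted(rows[0], key=gap, reverse=True)[:k]


def train_centroids(rows, labels, selected):
    """Classification model: one centroid per class over the selected features."""
    centroids = {}
    for label in set(labels):
        members = [row for row, y in zip(rows, labels) if y == label]
        centroids[label] = [mean(m[f] for m in members) for f in selected]
    return centroids


def predict(record, centroids, selected):
    """Prediction: assign the class whose centroid is nearest."""
    feats = extract_features(record)
    point = [feats[f] for f in selected]
    def dist(label):
        return sum((p - q) ** 2 for p, q in zip(point, centroids[label]))
    return min(centroids, key=dist)


# Made-up training records (hypothetical values)
records = [
    {"readings": [1.0, 1.1, 0.9], "label": "normal"},
    {"readings": [1.2, 1.0, 1.1], "label": "normal"},
    {"readings": [2.0, 2.6, 1.5], "label": "disease"},
    {"readings": [2.2, 2.9, 1.4], "label": "disease"},
]
rows = [extract_features(r) for r in records]
labels = [r["label"] for r in records]

selected = select_features(rows, labels, k=2)
centroids = train_centroids(rows, labels, selected)
prediction = predict({"readings": [2.1, 2.7, 1.6]}, centroids, selected)
print(selected, prediction)
```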
17 DILS 2005 keynotes
- Shankar Subramaniam, Professor of Bioengineering and Chemistry at UCSD
- The standard paradigm in biology, hypothesis to experimentation (low-throughput data) to models, is being replaced by data to hypothesis to models and experimentation to more data and models.
- There is a need for robust data repositories that allow interoperable navigation, query, and analysis across diverse data; a plug-and-play environment that will facilitate seamless interplay of tools and data; and versatile, biologist-friendly user interfaces.
18 Databases
- Data, Data, Data
- Organization of database (studies, experiments, sample sets, patients, treatments)
- Meta-data, including experimental conditions and clinical data
- repeated data points
- Secondary experimental procedures (more variate data)
- Incomplete data sets
- Multiple analysis runs (multiple data sets) (scaling, normalization, archive, comparisons, requerying)
- From experimental results, re-query data on other meta-data and reprocess
- Annotations of experimental data points (genes, proteins, etc.)
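The organization described on this slide can be sketched as a small relational schema. The table and column names below are hypothetical, and SQLite is used only because it ships with Python. The sketch shows studies containing experiments, samples tied to patients and treatments, and repeated measurements in which a missing value is stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE study (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE patient (id INTEGER PRIMARY KEY, status TEXT NOT NULL);  -- clinical meta-data
CREATE TABLE experiment (
    id INTEGER PRIMARY KEY,
    study_id INTEGER NOT NULL REFERENCES study(id),
    conditions TEXT                      -- experimental conditions
);
CREATE TABLE sample (
    id INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    patient_id INTEGER NOT NULL REFERENCES patient(id),
    treatment TEXT
);
CREATE TABLE measurement (
    sample_id INTEGER NOT NULL REFERENCES sample(id),
    feature TEXT NOT NULL,               -- annotated gene, protein, etc.
    replicate INTEGER NOT NULL,          -- repeated data points
    value REAL                           -- NULL marks an incomplete data point
);
""")

# Made-up rows for illustration only
conn.execute("INSERT INTO study VALUES (1, 'demo study')")
conn.executemany("INSERT INTO patient VALUES (?, ?)",
                 [(1, "normal"), (2, "diseased")])
conn.execute("INSERT INTO experiment VALUES (1, 1, 'baseline')")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, "none"), (2, 1, 2, "none")])
conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?)", [
    (1, "gene_x", 1, 1.0),
    (1, "gene_x", 2, 1.2),
    (2, "gene_x", 1, 2.0),
    (2, "gene_x", 2, None),   # missing replicate
])

# Re-query on clinical meta-data: mean value per patient status for one feature.
# AVG skips NULLs, so the incomplete replicate does not distort the result.
rows = conn.execute("""
    SELECT p.status, AVG(m.value)
    FROM measurement AS m
    JOIN sample AS s ON s.id = m.sample_id
    JOIN patient AS p ON p.id = s.patient_id
    WHERE m.feature = ?
    GROUP BY p.status
    ORDER BY p.status
""", ("gene_x",)).fetchall()
print(rows)
```

Keeping meta-data in separate tables is what makes re-querying cheap: the same measurements can be regrouped by treatment, study, or experimental condition without reloading the data.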
19 Technology and CS Requirements
- Definition of data structure
- Download of data into database
- Storage and retrieval
- Security
- Integrated database, data archive, analytical results archive
- ...
- Feature selection and modeling
- generation of sophisticated, integrated predictive risk models
20 Predictive Health
- Health: general condition of the body or mind with reference to soundness and vigor; freedom from disease or ailment.
- Diagnose: to recognize (a disease) by signs and symptoms; to analyze the cause or nature of.
- Predict: to declare in advance (of symptoms) on the basis of observation, experience, or scientific reasoning.
21 Predictive Health
- Predictive health is an emerging paradigm that emphasizes maintaining health by detecting the genetic risk factors for illness and taking steps to prevent disease or illness before it starts.
- In the future, providers will combine an individual's genetic information with cutting-edge biotechnology to keep that person healthy. Eventually, the occurrence of disease will be seen as a failure of the health care system, rather than its main focus.
- Momentum Summer 2006, Seeking Ponce's Dream
22 Momentum Winter 2006-2007, DNA Rubric
- SNPs account for some of the variation among humans. These naturally occurring differences, polymorphisms, help explain differences in human appearance and why some people are susceptible to diseases like lung cancer and others aren't. They also provide an explanation for why there can be individualized responses to environmental factors and medications.
- These patterns (of specific variation) will help us predict the future health of an individual and develop personalized health treatments, including specific drugs tailored to each individual, given their specific genetic code.
- Scott Devine, PhD, Biochemistry
23 Predictive Health 2007
- Center for Health Discovery and Well-Being
- participants: 100-200 generally healthy people
- collect physical, medical, and lifestyle histories, and environmental factors
- perform 50 blood and plasma tests (including genotypes) that target known critical predictors of health and illness
- the research program will develop and validate novel biologic markers to predict health, disease risk, and prognosis.
- based on these profiles and a predictive risk model, each participant will be prescribed a personalized health program designed to address individual risks.
24 Technology and CS Requirements
- Database and Security
- Integrated database and data archive
- Feature definition, feature extraction
- Feature selection
- Comparison, classification
- Prediction
- Modeling sophisticated, integrated predictive risk models
- Annotations, data-mining
- ...
25 Systems Biology
- Systems Biology is the science of discovering, modeling, understanding and ultimately engineering at the molecular level the dynamic relationships between the biological molecules that define living organisms.
- Leroy Hood, ISB
- http://www.systemsbiology.org/Systems_Biology_in_Depth
26 Momentum Winter 2006-2007, Fresh Air
- Molecular signaling pathways within normal cells follow a cascade of molecular reactions that emit proteins, which turn on
- The premise acknowledges that a single genetic mutation doesn't cause lung cancer. Instead there are many causes on the cellular level, with many genetic mutations from many different sources.
- Fadlo Khuri, PhD.
- Clinical and Translational Research
29
- List of Model Repositories
- CellML: biochemical and cellular processes
- DOQCS: Database of Quantitative Cellular Signaling
- ModelDB: SenseLab, nerves and neurons
- SigPath and SigMoid: Signalling pathways
- PathArt: Metabolic pathways
30
- Systems biology markup language
31 Technology and CS Requirements
- Database and Security
- Integrated database and data archive
- Feature definition, feature extraction
- Feature selection
- Comparison, classification, correlation
- Prediction
- Modeling sophisticated, integrated predictive risk models
- Annotations, data-mining
- ...
32 CS in CLS?
- 5% of biological researchers have hired CS or DB staff.
- 95% have not, because they
  - do not see the need,
  - have no experience in CS or managing CS,
  - cannot raise the funds.
- Communication, Communication, Communication
33 Meta-Objectives
- How does a CS-knowledgeable person become an X-informatics or computational-X researcher?
- How useful is it to work with just symbolic abstractions?
- How much X does one need to learn for the research to be meaningful?
- How can it be a more mutual collaboration?
- Most of the time, it is just CS servicing X.
- X researchers really don't care how the CS is done. Just Do It!
34 Meta-Objectives: CS in CLS?
- The CS scientist should know enough biology to probe beyond the obvious question that the biologist is asking.
- Be able and willing to offer direction: "You can use this CS technology or algorithm to answer X about your data."
35 NCBI Derivative Sequence Data (Maureen J. Donlin, St. Louis University)
[Figure: diagram of nucleotide sequence fragments, with the labels Labs, Genome Assembly, Algorithms, Curators, GenBank, RefSeq, and UniGene]