Title: Bioinformatics: Definitions, Challenges and Impact on Health Care Systems
1 Bioinformatics: Definitions, Challenges and Impact on Health Care Systems
- Joyce Mitchell, PhD
- Professor and Chair
- Department of Biomedical Informatics
- University of Utah School of Medicine
- http://uuhsc.utah.edu/medinfo
2 Topics
- What is Bioinformatics?
- Scope of Bioinformatics
- Genomics
- Proteomics
- Functional genomics
- Genomics data and patient care
- Impact of Bioinformatics on Health Information Systems
3 Central Dogma of Molecular Biology
[Figure: DNA (replication) -> transcription -> RNA -> translation -> Protein -> post-translational modification -> Phenotype]
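A minimal sketch of the transcription and translation steps named on this slide, assuming the input is the coding strand of a DNA sequence. The function names and the tiny codon table (a small subset of the standard genetic code) are illustrative choices, not part of the original slides.

```python
# Illustrative subset of the standard genetic code; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M",  # methionine (start)
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "GCU": "A",  # alanine
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def transcribe(dna_coding_strand: str) -> str:
    """Transcription: the mRNA matches the coding strand with T replaced by U."""
    return dna_coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translation: read codons in frame from position 0 until a stop codon.

    Codons missing from the small table above are reported as "X".
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCTAA")
print(mrna, translate(mrna))
```

Post-translational modification and the step from protein to phenotype are not modeled here; they are where most of the biological complexity lies.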
4 What is Bioinformatics?
5 NIH Working Definition
- Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
- http://www.bisti.nih.gov/CompuBioDef.pdf
6 Another: NCBI (National Center for Biotechnology Information)
- Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned.
- http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html
7 Bioinformatics and Health Informatics
- Bioinformatics is the study of the flow of information in biological sciences.
- Health Informatics is the study of the flow of information in patient care.
- These two fields are on a collision course as genomics data comes into use in patient care.
- Russ Altman, MD, PhD, Stanford Univ.
8 Scope of Bioinformatics
9 Omes and Omics
- Genomics
  - Primarily sequences (DNA and RNA)
  - Databanks and search algorithms
  - Supports studies of molecular evolution (Tree wars)
- Proteomics
  - Sequences (protein) and structures
  - Mass spectrometry, X-ray crystallography
  - Databanks, knowledge bases, visualization
- Functional Genomics (transcriptomics)
  - Microarray data
  - Databanks, analysis tools, controlled terminologies
- Systems Biology (metabolomics)
  - Metabolites and interacting systems (interactomics)
  - Graphs, visualization, modeling, networks of entities
10 Central Dogma of Molecular Biology
[Figure: DNA -> RNA -> Protein -> Phenotype, labeled with Structural Genomics, Functional Genomics (Transcriptomics), Proteomics, and Phenomics]
11 Human Genome Project
- Human Genome Project: international research effort
  - Determine the sequence of the human genome and of model organisms
  - Began 1990, completed 2003
- Next steps for 20,000 genes
  - Function and regulation of all genes
  - Significance of variations between people
  - Cures, therapies, genomic healthcare
12 Genome and Genomics
- Genome: entire complement of DNA in a species
  - Both nuclear and mitochondrial/chloroplast
  - Variants among individuals
- Genomics: study of the sequence, structure and function of the genome. Studies relationships among sets of genes rather than single genes.
- Comparative genomics: study of the differences among species. Usually covers evolutionary studies of differences and conservation over time.
13 Genome Databases (e.g., GenBank)
- Consist of
  - Long strings of DNA bases: ATCG...
  - Annotations of the sequences to attach meaning to the sequence data.
- Example entry from GenBank
  - http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NM_000410&dopt=gb (hemochromatosis gene HFE)
15 The Genome Sequence is at hand... so?
The good news is that we have the human genome.
The bad news is it's just a parts list.
16 The Human Genome Project has catalyzed striking paradigm changes in biology - biology is an information science.
- Leroy Hood, MD, PhD
- Institute for Systems Biology
- Seattle, Washington
17 Genomes in Public Databases
- Published complete genomes
- Ongoing prokaryotic genome projects
- Ongoing eukaryotic genomes
2700
http://www.genomesonline.org/
18 Genomics activities
- Sequence the genes and chromosomes, done by breaking the DNA into parts
- Map the location of various gene entities to establish their order
- Compare the sequences with other known sequences to determine similarity
  - Across species, conserved sequence motifs
  - Predict secondary structure of proteins
- Create large databases: GenBank, EMBL, DDBJ
- Develop algorithms and similarity measures
  - BLAST and its many forms
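To make "similarity measure" concrete, here is a hedged sketch of a Smith-Waterman local alignment score. It is not BLAST, which is a faster heuristic for the same kind of local similarity search. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration.

```python
def local_alignment_score(a: str, b: str,
                          match: int = 2,
                          mismatch: int = -1,
                          gap: int = -2) -> int:
    """Return the best Smith-Waterman local alignment score of a and b.

    Only the score is computed (no traceback), so two rows of the
    dynamic-programming table are enough.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# The shared "ACGT" core dominates the score despite different flanks.
print(local_alignment_score("TTACGTTT", "GGACGTGG"))
```

Running time grows with the product of the two sequence lengths, which is why database-scale searches rely on heuristics rather than this exact method.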
19 Central Dogma of Molecular Biology
[Figure: DNA -> RNA -> Protein -> Phenotype, labeled with Genomics, Transcriptomics / Functional Genetics, and Proteomics]
20 Proteome vs Transcriptome
- Functional genomics (transcriptomics) looks at the timing and regulation of gene products (mRNA, primarily).
- Proteome is the final end-product (set of many or all proteins).
- Relationship between transcriptome and proteome is complex, due to longevity of mRNA signal, subsequent control of translation to protein, and post-translational modifications.
21 Functional Genomics Technologies: Gene Chips, Microarrays, etc.
22 Functional Genomics: Microarrays
- Transcriptome and transcriptomics
- High-throughput technique designed to measure the relative abundance of mRNA in a cell or tissue in response to an experiment.
- Also called gene expression analysis
23 GeneChip synthesis
24
- Structure of a Gene Chip
- Animation of Gene Chip experiment
- http://www.affymetrix.com/corporate/outreach/lesson_plan/educator_resources.affx
25 Characteristics of Array Data
- Voluminous: tens of thousands of variables with relatively few observations of each (upside down vs. classical biostatistics)
- Noisy: error rates up to 8%
- Methods designed to detect patterns and associations always find patterns and associations
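The last point can be illustrated with a small simulation, offered as a sketch under stated assumptions rather than a model of real array data: every "gene" is pure Gaussian noise, measured in two groups of three samples. The function name, group size, and gene count are hypothetical choices.

```python
import random


def count_chance_separations(n_genes: int = 10_000,
                             n_per_group: int = 3,
                             seed: int = 0) -> int:
    """Count noise-only genes whose two sample groups separate perfectly.

    A gene "separates" when every value in one group lies above every
    value in the other. No real signal exists in this data.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_genes):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if min(group_a) > max(group_b) or max(group_a) < min(group_b):
            hits += 1
    return hits


print(count_chance_separations())
```

With three samples per group, 2 of the 20 equally likely rank orderings separate the groups perfectly, so roughly 10% of noise-only genes look like clean markers. This is why few observations across many variables call for multiple-testing correction and independent validation.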
26 Experimental Design
- A fundamental challenge of microarray experiments: underdetermined systems
Kohane IS, Kho AT, Butte AJ. Microarrays for an Integrative Genomics. (The MIT Press, Cambridge, MA, 2003), p. 11.
28 Uses of Expression Profiling
- Pharmaceutical research
  - ID drug targets by comparing expression profiles of drug-treated cells with those of cells containing mutations in genes encoding known drug targets
- Disease Dx and Tx
  - Distinguish morphologically similar cancers
  - DLBCL (Poulsen et al. (2005) Microarray-based classification of diffuse large B-cell lymphomas. European Journal of Haematology 74(6):453-65.)
  - Therapy potential
  - Rabson AB, Weissmann D. From microarray to bedside: targeting NF-kappaB for therapy of lymphomas. Clin Cancer Res. 2005 Jan 1;11(1):2-6.
29 Recent Applications
- Diagnostic tool to screen for infective agents
  - Chip imprinted with set of pathogenic genomes used to identify bacterial, viral, or parasite genomic material in a patient's body fluids
- Diagnostic chip to check for mutations involved in drug-gene interactions.
  - Roche AmpliChip
30 Public Microarray Data Repositories
- Major public repositories
  - GEO (NCBI)
  - http://www.ncbi.nlm.nih.gov/geo/
  - ArrayExpress (EBI)
  - http://www.ebi.ac.uk/arrayexpress/
31 Standards and Repositories
- Brazma, A, et al. Minimum information about a microarray experiment (MIAME): toward standards for microarray data. Nature Genetics. 2001 Dec;29(4):373.
  - http://www.nature.com/cgi-taf/DynaPage.taf?file=/ng/journal/v29/n4/full/ng1201-365.html
- Ball, CA, et al. Submission of Microarray Data to Public Repositories. PLoS Biology. 2004 September;2(9):e317.
  - http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=15340489
32 Central Dogma of Molecular Biology
[Figure: DNA -> RNA -> Protein -> Phenotype (tissues, organs, organisms), labeled with Genomics, Transcriptomics / Functional Genetics, and Proteomics]
33 Proteome and Proteomics
- Proteome: the entire set of proteins (and other gene products) made by the genome.
- Proteomics: study of the interactions among proteins in the proteome, including networks of interacting proteins and metabolic considerations. Also includes differences in developmental stages, tissues and organs.
34 Protein Functions
- Catalysis
- Transport
- Nutrition and storage
- Contraction and mobility
- Structural elements
  - Cytoskeleton
  - Basement membranes
- Defense mechanisms
- Regulation
  - Genetic
  - Hormonal
- Buffering capacity
35 Protein Databases
- SwissProt
- PIR http://www.pir.uniprot.org/
- GENE http://www.ncbi.nlm.nih.gov/gene
- InterPro http://www.ebi.ac.uk/interpro/
- Correspond to (and derived from) genome databases
- All connected by Reference Sequences (NCBI) and UniProt
36 Gene/Protein Database entries
- HFE record in Entrez GENE (NCBI)
- http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=retrieve&dopt=Graphics&list_uids=3077
37 Structure and Function Determination
- X-ray crystallography
- Nuclear magnetic resonance spectroscopy and tandem MS/MS
- Computational modeling
  - Sequence alignment from others
  - Homology modeling
38 Structure Databases
- Contain experimentally determined and predicted structures of biological molecules
- Most structures determined by X-ray crystallography, NMR
- Example: MMDB (molecular modeling db) http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml
- HFE Entry
  - http://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdbsrv.cgi?form=6&db=t&Dopt=s&uid=9816
39 Protein Interaction Databases
- Record observations of protein-protein interactions in cells
- Attempt to detail interactions observed in thousands of small-scale experiments described in published articles
- Examples
  - BIND: Biomolecular Interaction Network Database
  - DIP: Database of Interacting Proteins
  - MIPS: Munich Information Center for Protein Sequences
  - PRONET: Protein interaction on the Web
  - Many others, both academic and commercial
40 Controlled Vocabularies in Bioinformatics
- The Gene Ontology http://www.geneontology.org/
  - Knowledge about gene function (the ontology itself)
  - Annotation of gene products (for comparisons)
- The MGED Ontology (arising from MIAME)
  - http://mged.sourceforge.net/
  - Annotation of microarray experiments for public repositories
- Clinical Bioinformatics Ontology
  - Annotation of gene tests in electronic medical records
  - http://www.cerner.com/cbo
- MIAPE from Proteomics Standards Initiative (PSI)
  - Annotation of proteomics experiments for public repositories
  - http://psidev.sourceforge.net/
41 Genomics Data and Patient Care
- From genotype to phenotype
42 Human Disease Gene Specifics
- Genes linked to human diseases (9-2004)
  - 425 in 2 yrs
  - 1700/20,000, about 9% of loci
43 Informatics Issues related to Genomics Data and Patient Care
- Linking known data for genes causing human diseases to clinical decision support and EMR documentation
- Representation of genetic data in electronic medical records
44 Common Questions
- What genes cause the condition?
- What is the normal function of the gene?
- What mutations have been linked to diseases?
- How does the mutation alter gene function?
- What laboratories are performing DNA tests?
- Are there gene therapies or clinical trials?
- What names are used to refer to the genes and the diseases?
- What other conditions are linked to these same genes?
45 Answers exist online
- ...but it is not easy: answers are in many places
- Can't navigate by gene names; must use hot links and numeric identifiers
- The number and function of alternate forms of the protein are inconsistently reported
- Synonymy (many names, same meaning) and polysemy (same name, different meanings) cause confusion
- Upper and lower case are used for species distinctions
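The synonymy and polysemy problems can be sketched with a small alias index. The alias table below is an illustrative assumption (a real system would load an authoritative nomenclature resource), and the function names are hypothetical.

```python
# Illustrative alias table for one species: official symbol -> other names.
HUMAN_ALIASES = {
    "HFE": ["HLA-H", "HH"],
    "TFR2": ["HFE3"],
}


def build_index(aliases: dict) -> dict:
    """Map every name (upper-cased) to the set of official symbols it may mean."""
    index = {}
    for symbol, names in aliases.items():
        for name in [symbol, *names]:
            index.setdefault(name.upper(), set()).add(symbol)
    return index


def resolve(name: str, index: dict) -> list:
    """Return candidate official symbols for a name.

    Zero results: unknown name. One result: a synonym was resolved.
    Several results: polysemy, so a human or more context must decide.
    Matching ignores case, so the index must be built per species: case is
    used to tell species apart (e.g. human HFE vs. mouse Hfe).
    """
    return sorted(index.get(name.strip().upper(), set()))


index = build_index(HUMAN_ALIASES)
print(resolve("hla-h", index))
```

Returning a list rather than a single symbol keeps ambiguous names visible instead of silently picking one meaning.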
46 Major Challenges of Navigation
- Complexity of data
- Dynamic nature of the data
- Diverse foci and number of data/knowledge base systems
- Data and knowledge representation lack standards
- Can navigate if you know what you are looking for.
47 Genetics Home Reference
- Consumer health resource to help the public navigate from phenotype to genotype.
- Focus on health implications of the Human Genome Project.
- http://ghr.nlm.nih.gov
- Mitchell, Fun, McCray, JAMIA, 2004 Nov;11(6):439-447
48 Genetics is Impacting Medicine Today
- 2000 genes linked to health conditions
- > 1500 gene tests for diagnosis
- Relate to diagnosis, therapy, drug dosage, occupational hazards, reproductive plans, health risks, ...
- And direct-to-consumer genetic test marketing (23andMe, Navigenics, ...)
49 Well-known Examples (germ line)
- Pharmacogenetics
  - CYP450 alleles: exaggerated, diminished or ultra-rapid drug responses. E.g., warfarin: 93% of patients are OK on standard doses; 7% of patients have severe hemorrhage. CYP2C9*2 and CYP2C9*3 are the most severe of 6 known mutations.
- Environmental susceptibility
  - Sickle cell trait carrier and malaria parasite
- Nutrition
  - PKU and avoidance of phenylalanine
50 Example (somatic mutation): Iressa (gefitinib) and erlotinib
- Non-small cell lung CA: 140,000 pt/yr
- Iressa (AstraZeneca) causes remission in 1 of 10 patients. Newer drug is erlotinib.
- Iressa and erlotinib efficacy correlates with EGFR mutation in the tumor. Gene testing for EGFR is now available, so therapy can be targeted to the appropriate patients.
- http://www.sciencemag.org/cgi/content/full/305/5688/1222a
51 Implications for Health Care System
- More gene tests will be ordered. Reports of a 300% increase in gene tests in 2003.
  - Arch Pathol Lab Med 2004, 128(12):1330-1333
- The FDA will regulate panels of tests.
  - http://www.fda.gov/bbs/topics/news/2004/new01149.html
- Non-discrimination laws for insurance and employment will open a floodgate (GINA).
- Preventive healthcare will play a larger part.
- Environmental risk factors dictate OSHA-type approach to worker empowerment and education about safe behavior
52 Unsolved Informatics Issues: What Should Be Stored in the EMR?
- Complete DNA sequence for specific genes into the EMR? Where?
- Microarray and gene chip data?
- Meta-data about the DNA sequence arrays?
- If not the sequence (i.e., only the diff from the reference sequence), what to do when the reference sequence changes? Or the gene chip changes?
- How to trigger alerts and reminders? And for what?
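One way to picture the "diff from reference" option is a record that pins the reference it was computed against. This is a sketch under stated assumptions: the class, field names, version numbers, and toy sequences are hypothetical, and real variant formats carry far more detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredVariant:
    gene: str
    reference_id: str        # accession of the reference sequence
    reference_version: int   # version the diff was computed against
    position: int            # 1-based position in the reference
    ref_base: str            # base(s) expected in the reference
    alt_base: str            # base(s) observed in the patient


def apply_variant(reference_seq: str, variant: StoredVariant) -> str:
    """Rebuild the patient's sequence; fail loudly if the reference moved."""
    start = variant.position - 1
    end = start + len(variant.ref_base)
    if reference_seq[start:end] != variant.ref_base:
        raise ValueError("stored diff does not match this reference sequence")
    return reference_seq[:start] + variant.alt_base + reference_seq[end:]


def needs_review(variant: StoredVariant, current_version: int) -> bool:
    """Flag records whose reference version is no longer the current one."""
    return variant.reference_version != current_version


# Toy data: a four-base "reference" and a single substitution.
variant = StoredVariant("HFE", "NM_000410", 1, 3, "G", "A")
print(apply_variant("ACGT", variant), needs_review(variant, current_version=2))
```

Storing the reference identifier and version with every diff does not answer the policy question on the slide, but it lets a system detect when the question has become urgent for a given record.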
53 Genetic data in electronic medical records
- Implications for component systems
- Laboratory
- Pharmacy
- Computerized order entry
- Documentation and notes
- Knowledge management
- Alerts and reminders
- Finding patients matching profiles
- Practice guidelines and clinical trials
- Appropriate therapies and medications
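A hedged sketch of how order entry, stored genotypes, and alerts might connect, using the warfarin and CYP2C9 example from the pharmacogenetics slide. The rule table, function name, and message wording are hypothetical and are not clinical guidance.

```python
# Hypothetical rule table: (drug, gene) -> alleles that should raise an alert.
ALERT_RULES = {
    ("warfarin", "CYP2C9"): {"*2", "*3"},
}


def check_order(drug: str, genotypes: dict) -> list:
    """Return alert messages for a drug order given genotypes on file.

    genotypes maps a gene symbol to the patient's two alleles,
    e.g. {"CYP2C9": ("*1", "*3")}.
    """
    alerts = []
    for (rule_drug, gene), risk_alleles in ALERT_RULES.items():
        if rule_drug != drug.lower():
            continue
        found = sorted(set(genotypes.get(gene, ())) & risk_alleles)
        if found:
            alerts.append(
                f"{drug}: {gene} {'/'.join(found)} on file; review dosing"
            )
    return alerts


print(check_order("Warfarin", {"CYP2C9": ("*1", "*3")}))
```

Keeping the rules in data rather than code is one possible design choice: the knowledge management component can then update them without changes to order entry.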
54 Genome Data and Other Information Systems
- Genomic information will be pervasive in all healthcare information systems.
- Also in public health systems
  - Newborn screening
  - Tissue and organ banks
  - DOD requires DNA samples
  - Bioterrorism and homeland security
  - Identification of World Trade Center victims
- Privacy and security issues are important but not inherently different from those for other EMR data.
55 Summary
- Informatics will be the key enabling technology for personalized, genomic medicine.
- Current separation between bioinformatics and clinical informatics will diminish as the two subdisciplines merge.
56 Optional Exercise: Hands-on with GHR
- Scavenger hunt with hemochromatosis and the genes that influence it.
- Explore the Genetics Home Reference by answering the following questions. Start at http://ghr.nlm.nih.gov.
57 GHR Scavenger Hunt
- How common is hemochromatosis?
- How many genes have been proven to be involved in hemochromatosis when the genes are mutated?
- What are the symbols for these genes?
- Can you find the link to MedlinePlus with health information on hemochromatosis?
58 GHR Scavenger Hunt
- What are the names of the patient support associations for hemochromatosis?
- One synonym for this condition is bronze diabetes. Can you find a reason for this?
- What kind of damage is done to the liver of people with hemochromatosis?
59 GHR Scavenger Hunt
- For the genes involved in hemochromatosis, how many of them are available as a DNA test?
- Give one place where you would choose to send a tissue sample for DNA testing.
- What sites are listed under Research Resources for the TFR2 gene?
- How many alternately spliced proteins are there for TFR2?
- In what tissues is this gene expressed?
60 GHR Scavenger Hunt
- How do people inherit hemochromatosis?
- Do the genes involved in hemochromatosis cause other health conditions when they are mutated?
- Can you find a protein sequence for one of the genes?
- What clinical trials are available for hemochromatosis patients close to where you live?
61 Questions to
- Joyce Mitchell
- Joyce.mitchell_at_hsc.utah.edu
- http://uuhsc.utah.edu/medinfo