Title: Medical Genomics
Medical Genomics
- The Impact of Genome Data and New Technologies on Health Care
Stuart M. Brown, Research Computing, NYU School of Medicine
A Genome Revolution in Biology and Medicine
- We are in the midst of a "Golden Era" of biology.
- The Human Genome Project has produced a huge storehouse of data that will be used to change every aspect of biological research and medicine.
- The revolution is mostly about treating biology as an information science, not about specific biochemical technologies.
- I. The Human Genome Project
- II. Genomics
  - microarrays
  - SNP genotyping
- III. The Medical and Business Applications
I. The Human Genome Project
Bold Words from Francis Collins
- "The history of biology was forever altered a decade ago by the bold decision to launch a research program that would characterize in ultimate detail the complete set of genetic instructions of the human being."
Francis S. Collins, Director of the National Human Genome Research Institute, N Engl J Med 1999
A Review of Some Basic Genetics
DNA
- 4 bases (G, C, T, A)
- base pairs
  - G-C
  - T-A
- genes
- non-coding regions
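Because G always pairs with C and T with A, either strand of the double helix fully determines the other. A minimal Python sketch of that rule (the function names are illustrative, not from the slides) computes the opposite strand, reversed so it reads in its own 5'->3' direction:

```python
# Base-pairing rules from the slide: G pairs with C, T pairs with A.
PAIRS = {"G": "C", "C": "G", "T": "A", "A": "T"}


def complement(strand: str) -> str:
    """Return the paired base for each position of the strand."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the opposite strand read in its own 5'->3' direction.

    The two strands are antiparallel, so the complement is reversed
    to follow the usual 5'->3' writing convention.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    seq = "ATGCGT"
    print(complement(seq))          # TACGCA
    print(reverse_complement(seq))  # ACGCAT
```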
Decoding Genes
- The human genome is the complete DNA content of the human chromosomes: our cells carry 23 pairs (44 autosomes plus two sex chromosomes), and one copy of the chromosome set contains approximately 3.2 billion base pairs.
Genome Projects
- Complete genomic sequences
  - Dozens of microorganisms
  - Yeast, C. elegans, Drosophila
  - Mouse
  - Human
- Comparative genomics
- All this data is enabling new kinds of research, for those with the computational skills to take advantage of it.
How Does Genome Sequencing Technology Work?
- Molecular biology of the Sanger method
- Manual gels vs. ABI machines
- Sub-cloning of fragments: BAC, PAC, cosmid, plasmid, phage
- The need for computers to assemble the "reads" and manage the workflow
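Assembling reads means finding where short fragments overlap and merging them into longer contigs. The Python sketch below uses a simple greedy strategy (repeatedly merge the pair with the longest exact suffix-prefix overlap). This is an assumed textbook simplification, not the method used by the genome centers: it ignores sequencing errors, repeats, reads contained inside other reads, and reads from the opposite strand. The reads in the example are made up.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Overlaps shorter than `min_len` are ignored (returns 0).
    """
    for length in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two contigs with the longest overlap.

    Returns the remaining contigs; more than one means gaps remain.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(contigs) > 1:
        best_len, best_pair = 0, None
        for a, b in permutations(contigs, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_pair = olen, (a, b)
        if best_pair is None:
            break  # no remaining overlaps: leave the gaps unfilled
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_len:])
    return contigs


if __name__ == "__main__":
    reads = ["ATGCGTAC", "GTACGTTA", "CGTTAGC"]  # made-up reads
    print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

With millions of reads, the all-pairs comparison in this sketch becomes far too slow, which is part of why assembly needed dedicated software and serious computing.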
- Automated sequencing machines, particularly those made by PE Applied Biosystems, use 4 colors (one fluorescent dye per base), so they can read all 4 bases at once.
Subcloning
- DNA sequencers can only read small fragments of DNA, 500-1000 bases long.
- It is necessary to break the genome into small pieces.
- Individual chromosomes are cut into large chunks that are cloned into large-insert vectors called BACs, PACs, and YACs (roughly 100,000-300,000 bases for BACs and PACs, up to about 1 million bases for YACs).
- These pieces can then be further cut into sequenceable pieces (about 1000 bases) and cloned into plasmid or phage vectors.
Raw Genome Data
Lots of Sequence Data
- How to extract useful knowledge from all of this data?
- Need sophisticated computer tools
- Find the genes
- Figure out what they do (function)
- Diagnostic tests
- Medical treatments
What Is a Gene?
- For every 2 biologists, you get 3 definitions
  - A DNA sequence that encodes a heritable trait
  - The unit of heredity
- Is it an abstract concept, or something you can isolate in a tube or print on your screen?
- Classic vs. modern understanding of molecular biology
Classic Molecular Biology
- A gene is a DNA sequence at a particular locus on a chromosome that encodes a protein.
- The Central Dogma of Molecular Biology
  - DNA → RNA → Protein
- A mutation changes the DNA sequence, which can lead to a change in protein sequence, or to no protein at all.
- Alleles are slightly different DNA sequences of the same gene.
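The Central Dogma can be sketched in a few lines of Python. This is a toy model under stated assumptions: the input is the coding (sense) strand of an intron-free gene that starts at ATG, and translation uses the standard genetic code; splicing and regulation are ignored. The example gene is invented. Changing one base shows how a mutation can change the protein sequence.

```python
# Standard genetic code, codons enumerated in T, C, A, G order
# (first base varies slowest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """DNA -> mRNA: same sequence as the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein: read codons from the first base until a stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3].replace("U", "T")  # table uses DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    normal = "ATGTTTGGCTAA"  # made-up gene: Met-Phe-Gly-stop
    mutant = "ATGTTTGACTAA"  # one base changed (G -> A) in the third codon
    print(translate(transcribe(normal)))  # MFG
    print(translate(transcribe(mutant)))  # MFD
```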
Genome Confusion
- The sequence of a gene in the genome includes:
  - protein coding sequence
  - introns and exons
  - 5' and 3' untranslated regions on the mRNA
  - promoter and 5' transcription factor binding sites
  - enhancers?
- What about alternative splicing?
  - Multiple mRNAs (seen in the lab as cDNAs) with different sequences, which produce different proteins, can be transcribed and spliced from the same genomic locus.
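To see how one locus can yield several transcripts, here is a toy Python model of exon skipping, one form of alternative splicing. The exon sequences are invented, and the rule that the first and last exons are always kept is a simplifying assumption; real splicing patterns are far more varied.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list[str]) -> list[str]:
    """All transcripts that keep the first and last exon and any subset
    of the internal exons, joined in genomic order (introns removed).

    Assumes at least two exons.
    """
    first, *internal, last = exons
    isoforms = []
    for n_kept in range(len(internal), -1, -1):      # longest first
        for kept in combinations(internal, n_kept):  # preserves exon order
            isoforms.append(first + "".join(kept) + last)
    return isoforms


if __name__ == "__main__":
    # Hypothetical four-exon gene; each exon is a whole number of codons,
    # so skipping an exon does not shift the reading frame.
    exons = ["ATGGCC", "AAAGGG", "TTTCCC", "GAATAA"]
    for mrna in exon_skipping_isoforms(exons):
        print(mrna)
```

Two internal exons already give four possible mRNAs from one gene, which is one reason a single "gene sequence" is hard to pin down.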
Finding genes in genome sequence is not easy
- Only about 1% of human DNA encodes proteins.
- Genes are interspersed among long stretches of non-coding DNA.
- Repeats, pseudogenes, and introns confound matters.
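A naive first pass at gene finding scans for open reading frames (ORFs): stretches that begin with ATG and run to an in-frame stop codon. The Python sketch below does this on one strand only; the `min_codons` cutoff is an assumed parameter, not a standard value. It also hints at why human gene finding is hard: introns break real coding sequence into short pieces that fall below any sensible cutoff, while non-coding DNA and pseudogenes can contain ORFs by chance.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """Return (start, end) of ATG...stop ORFs on the forward strand.

    `end` is exclusive and includes the stop codon. `min_codons` counts
    codons before the stop. ATGs nested inside an ORF are not reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    dna = "CCATGAAATTTGGGTAACC"  # made-up sequence
    for start, end in find_orfs(dna, min_codons=3):
        print(start, end, dna[start:end])  # 2 17 ATGAAATTTGGGTAA
```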
- The next step is obviously to locate all of the genes and describe their functions. This will probably take another 15-20 years!
How Many Genes?
- The current estimate is 34,000 human genes.
- That is about the same number as the mouse, and only about 5 times more than yeast.
- Yet two different versions of the human genome (Celera vs. Ensembl/UCSC) show only about 50% overlap between the genes that they have described.
All the Genes?
- Any human cDNA can now be found in the genome by similarity searching with 99% certainty.
- However, the sequence still has many gaps:
  - it is unlikely that every gene lies in a completely uninterrupted genomic segment
  - we still can't identify pseudogenes with certainty
- This will improve as more sequence data accumulates.
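Similarity search tools such as BLAST begin with short exact word matches ("seeds") between a query and the database. The Python sketch below shows only that seeding idea, under simplifying assumptions: a tiny invented genome string, an exact k-mer index, and a vote over alignment offsets, with no extension or scoring step like BLAST's. A cDNA with one base different still lands on the right locus because most of its k-mers match.

```python
from collections import defaultdict


def build_kmer_index(genome: str, k: int) -> dict[str, list[int]]:
    """Map every k-letter word in the genome to the positions where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return dict(index)


def offset_votes(query: str, index: dict[str, list[int]], k: int) -> dict[int, int]:
    """Count shared k-mers per alignment offset (genome position - query position)."""
    votes: dict[int, int] = defaultdict(int)
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            votes[gpos - qpos] += 1
    return dict(votes)


if __name__ == "__main__":
    genome = "AAAACCGTTAGCATGCAAAA"  # toy "genome"
    cdna = "CCGTTTGCATGC"            # matches genome[4:16] except one base
    votes = offset_votes(cdna, build_kmer_index(genome, 4), 4)
    print(votes)                      # {4: 5, 8: 1}
    print(max(votes, key=votes.get))  # 4
```

In a real genome, a cDNA that spans an intron would split its votes across one offset per exon, which is why cDNA-to-genome alignment tools must allow for large gaps.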
Data Mining Tools
- Scientists need to work with many layers of information about the genome:
  - coding sequence of known genes and cDNAs
  - genetic maps (known mutations and markers)
  - gene expression
  - cross-species homology
UCSC
Ensembl at EBI/EMBL
II. Genomics
- What is Genomics?
  - An operational definition: the application of high-throughput automated technologies to biology.
  - A philosophical definition: a holistic or systems approach to the study of information flow within a cell.
Genome Sequencing Created Genomics
- All genomics technologies depend on the data produced by genome sequencing.
- Do molecular biology in a massively parallel fashion using robotics and automated data collection.
- Build databases rather than ask questions about single genes or a single process.
Genomics Technologies
- Automated DNA sequencing
- Automated annotation of sequences
- DNA microarrays
  - gene expression (measure RNA levels)
- SNP genotyping
- Genome diagnostics (genetic testing)
- Proteomics
  - Protein identification
  - Protein-protein interactions
DNA Chip Microarrays
- Put a large number (up to about 100,000) of cDNA sequences or synthetic DNA oligomers onto a glass slide (or other substrate) in known locations on a grid.
- Label an RNA sample and hybridize it to the array.
- Measure the amount of RNA bound to each square in the grid.
- Make comparisons:
  - Cancerous vs. normal tissue
  - Treated vs. untreated
  - Time course
- Many applications in both basic and clinical research
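Comparisons such as cancerous vs. normal tissue are commonly summarized per spot as a log2 ratio of the two signal intensities, so a two-fold increase is +1 and a two-fold decrease is -1. A minimal Python sketch, assuming background-subtracted intensities are already available; the gene names, intensity values, pseudocount, and two-fold (|log2| >= 1) cutoff are illustrative assumptions. Real analyses also normalize between arrays and use replicates to test significance.

```python
import math


def log2_ratios(sample: dict[str, float], reference: dict[str, float],
                pseudocount: float = 1.0) -> dict[str, float]:
    """log2(sample / reference) for each gene measured in both samples.

    The pseudocount avoids division by zero for spots with no signal.
    """
    return {
        gene: math.log2((sample[gene] + pseudocount) / (reference[gene] + pseudocount))
        for gene in sample
        if gene in reference
    }


def changed_genes(ratios: dict[str, float], threshold: float = 1.0) -> dict[str, str]:
    """Genes changed at least 2**threshold-fold, labelled 'up' or 'down'."""
    return {gene: ("up" if r > 0 else "down")
            for gene, r in ratios.items() if abs(r) >= threshold}


if __name__ == "__main__":
    tumor = {"geneA": 799.0, "geneB": 99.0, "geneC": 499.0}  # invented intensities
    normal = {"geneA": 199.0, "geneB": 399.0, "geneC": 499.0}
    ratios = log2_ratios(tumor, normal)
    print(ratios)                 # {'geneA': 2.0, 'geneB': -2.0, 'geneC': 0.0}
    print(changed_genes(ratios))  # {'geneA': 'up', 'geneB': 'down'}
```

The log scale is the usual choice because it makes up- and down-regulation symmetric around zero.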
Goal of Microarray Experiments
- Microarrays are a very good way of identifying a set of genes involved in a disease process
  - Differences between cancer and normal tissue
  - Tuberculosis-infected vs. resistant lung cells
- Mapping out a pathway
  - Co-regulated genes
- Finding function for unknown genes
  - Involvement in these processes
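Co-regulated genes are typically found by comparing expression profiles across many arrays (for example, the points of a time course) and grouping genes whose profiles rise and fall together. A Python sketch using Pearson correlation; the gene names, profile values, and the 0.9 cutoff are invented for illustration, and correlation is only one of several similarity measures used for clustering. A high correlation between an unknown gene and a known pathway member suggests, but does not prove, a related function.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-regulation signal
    return cov / (sd_x * sd_y)


def coregulated_with(target: str, profiles: dict[str, list[float]],
                     min_r: float = 0.9) -> list[str]:
    """Genes whose profile correlates with `target` at r >= min_r."""
    ref = profiles[target]
    return sorted(gene for gene, prof in profiles.items()
                  if gene != target and pearson(ref, prof) >= min_r)


if __name__ == "__main__":
    # Invented log ratios over a four-point time course.
    profiles = {
        "pathwayGene": [0.0, 1.0, 2.0, 1.0],
        "unknownX": [0.1, 1.1, 2.1, 0.9],
        "unknownY": [2.0, 1.0, 0.0, 1.0],
    }
    print(coregulated_with("pathwayGene", profiles))  # ['unknownX']
```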
Competing Microarray Technologies
- Affymetrix GeneChip system
  - Uses 25-base oligos synthesized in place on a chip
  - Can have as many as 20,000 genes on a chip
  - Features get smaller every year (more genes per chip)
  - Chips are very expensive
  - Proprietary system: "black box" software, can only use their chips
- cDNA spotting technology
  - Multiple vendors, or make your own
  - Can buy chips, complete systems, or contract services (Incyte)
  - Hundreds to a few thousand genes per chip
  - More sensitive, but less specific than the Affymetrix system
- Oligonucleotides
- Nylon filters
cDNA Spotted Microarrays
Direct Medical Applications
- Diagnosis
  - Type of cancer
  - Aggressive or benign?
- Monitor treatment outcome
  - Is a treatment having the desired effect on the target tissue?
Human Genetic Variation
- Every human has essentially the same set of genes.
- But there are different forms of each gene, known as alleles:
  - blue vs. brown eyes
  - genetic diseases such as cystic fibrosis or Huntington's disease are caused by dysfunctional alleles
- Alleles are created by mutations in the DNA sequence of one person, which are passed on to their descendants.
Effects of Mutations
- Mutations occur randomly throughout the DNA.
- Most have no phenotypic effect (non-coding regions, equivalent codons, similar amino acids).
- Some damage the function of a protein or regulatory element.
- A very few provide an evolutionary advantage.
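The "equivalent codons" point can be made concrete: because the genetic code is redundant, many single-base changes in a coding region leave the protein unchanged. A Python sketch that classifies a substitution, assuming an in-frame, intron-free coding sequence and the standard genetic code; the example sequence is invented. It does not judge whether a missense change swaps in a chemically "similar" amino acid, which would need an additional substitution scale.

```python
# Standard genetic code in T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_point_mutation(coding_seq: str, position: int,
                            new_base: str) -> tuple[str, str, str]:
    """Classify a single-base substitution in an in-frame coding sequence.

    Returns (kind, old_amino_acid, new_amino_acid), with '*' for stop.
    """
    seq = coding_seq.upper()
    offset = position % 3
    start = position - offset
    old_codon = seq[start:start + 3]
    new_codon = old_codon[:offset] + new_base.upper() + old_codon[offset + 1:]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    if old_aa == new_aa:
        kind = "synonymous"  # equivalent codon: protein unchanged
    elif new_aa == "*":
        kind = "nonsense"    # premature stop: truncated protein
    elif old_aa == "*":
        kind = "stop-loss"   # stop codon lost: protein runs on
    else:
        kind = "missense"    # a different amino acid
    return kind, old_aa, new_aa


if __name__ == "__main__":
    gene = "ATGGGCTGGTAA"  # made-up coding sequence: Met-Gly-Trp-stop
    print(classify_point_mutation(gene, 5, "T"))  # ('synonymous', 'G', 'G')
    print(classify_point_mutation(gene, 4, "A"))  # ('missense', 'G', 'D')
    print(classify_point_mutation(gene, 8, "A"))  # ('nonsense', 'W', '*')
```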
Human Alleles
- The OMIM (Online Mendelian Inheritance in Man) database at the NCBI tracks human genes and mutations with known phenotypes.
- It contains a total of about 2,000 genetic diseases and another 11,000 genetic loci with known phenotypes, but not necessarily known gene sequences.
- It is designed for use by physicians:
  - can search by disease name
  - contains summaries from clinical studies
Clinical Manifestations of Genetic Variation
- (All disease has a genetic component.)
- Susceptibility vs. resistance
- Variations in disease severity or symptoms
- Reaction to drugs (pharmacogenetics)
- All of these traits can be traced back to particular genes (or sets of genes), but we don't know most of these associations yet.
So What's a SNP?
- A single base change in the DNA sequence that varies between people is known as a Single Nucleotide Polymorphism (SNP).
- SNPs are very common in the human population (about one SNP every 1,250 bases):
  - there are SNPs located near all genes
  - they can be used as markers
- Most of these have no visible effect (many lie in regions between genes).
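At the sequence level, finding candidate SNPs means lining up the same region from different people and listing single-base differences. The Python sketch assumes the two sequences are already aligned to the same length and are error-free (positions with a '-' gap character are skipped); real SNP discovery must also separate true variants from sequencing errors. The second part applies the slide's rough figure of one SNP every 1,250 bases to a 3.2 billion base genome.

```python
def snp_positions(ref: str, other: str) -> list[tuple[int, str, str]]:
    """List (position, ref_base, other_base) wherever two aligned sequences differ.

    Positions with an alignment gap ('-') in either sequence are skipped.
    """
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(ref.upper(), other.upper()))
            if a != b and "-" not in (a, b)]


# Back-of-the-envelope count using the figures on the slides.
GENOME_SIZE = 3_200_000_000  # base pairs
SNP_SPACING = 1_250          # about one SNP every 1,250 bases
EXPECTED_SNPS = GENOME_SIZE // SNP_SPACING

if __name__ == "__main__":
    print(snp_positions("ACGTACGTAC", "ACGTTCGTAA"))  # [(4, 'A', 'T'), (9, 'C', 'A')]
    print(f"{EXPECTED_SNPS:,}")                        # 2,560,000
```

Roughly 2.5 million SNPs spread across the genome is why they work as a dense set of markers.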
Genome Sequencing Finds SNPs
SNP Genotyping
- SNPs are a form of mutation that can be used to measure genetic differences in a high-throughput fashion.
- A genomics approach to genetic testing
- Lots of room for biotechnology innovation:
  - Allele-specific PCR
  - Site-specific sequencing
  - Genotyping microarray chips
SNP Genotyping
- It is possible to measure many thousands of SNPs simultaneously in a small blood sample from a patient.
- Can compare genotypes for SNP markers linked to virtually any trait.
- A human genome can be characterized with a few thousand common SNP markers:
  - on a single chip
  - a personal genetic profile
Some Diseases Involve Many Genes
- There are a number of classic genetic diseases caused by mutations of a single gene:
  - Huntington's, cystic fibrosis, Tay-Sachs, PKU, etc.
- There are also many diseases that result from the interactions of many genes:
  - asthma, heart disease, cancer
- Each of these genes may be considered a risk factor for the disease.
- Groups of genetic markers (SNPs) may be associated with disease risk without determining a mechanism.
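One simple way to combine several markers, each a modest risk factor, is to count how many risk alleles a person carries across them. This unweighted count is an assumed illustration, not a validated clinical score: the marker names and risk alleles below are hypothetical, and real models weight each marker by its estimated effect. Like the markers themselves, such a count can track risk without saying anything about mechanism.

```python
def risk_allele_count(genotype: dict[str, str], risk_alleles: dict[str, str]) -> int:
    """Total copies of risk alleles carried across a panel of markers.

    genotype maps marker -> two-letter genotype (e.g. "AG");
    risk_alleles maps marker -> the allele treated as the risk factor.
    Markers missing from the genotype contribute 0.
    """
    return sum(genotype.get(marker, "").upper().count(allele.upper())
               for marker, allele in risk_alleles.items())


if __name__ == "__main__":
    panel = {"marker1": "A", "marker2": "T", "marker3": "G"}  # hypothetical markers
    patient = {"marker1": "AA", "marker2": "CT", "marker3": "CC"}
    score = risk_allele_count(patient, panel)
    print(f"{score} of {2 * len(panel)} possible risk alleles")  # 3 of 6 possible risk alleles
```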
DNA Diagnostic Testing
- Hereditary diseases: potential parents, prenatal, late-onset diseases
- Genes that predispose to disease (risk factors)
- Genotyping of infectious agents (bacterial, viral)
- Measure the type and stage of cancer tumors
- Forensics: using DNA testing to establish identity
III. The Medical and Business Applications of Genomics
Implications for Biomedicine
- Physicians will use genetic information to diagnose and treat disease.
- Virtually all medical conditions have a genetic component.
- Faster drug development research
- Individualized drugs
- Gene therapy
- All biologists will use gene sequence information in their daily work.
Pharmacogenomics
- The use of DNA sequence information to measure and predict the reaction of individuals to drugs
  - Personalized drugs
  - Faster clinical trials
  - Fewer drug side effects
People React Differently to Drugs
- Side effects
- Effectiveness
- There are genes that control these reactions.
- SNP markers can be used to identify these genes.
Make Genetic Profiles
- Identify populations of people who show specific responses to a drug.
- Scan these populations with a large number of SNP markers.
- Find markers linked to drug response phenotypes.
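Finding markers linked to a drug response can be sketched as a per-marker association test. The Python example assumes allele counts have already been tallied for responders and non-responders, and applies a standard 2x2 chi-square test (1 degree of freedom, no continuity correction); the counts are made up. When thousands of markers are scanned, the significance threshold must be corrected for multiple testing (for example, Bonferroni: divide 0.05 by the number of markers tested).

```python
import math


def allelic_chi_square(responders: tuple[int, int],
                       non_responders: tuple[int, int]) -> tuple[float, float]:
    """2x2 chi-square test on allele counts; returns (statistic, p-value).

    Each tuple holds counts of (allele 1, allele 2) in one group.
    Uses 1 degree of freedom and no continuity correction.
    """
    a, b = responders
    c, d = non_responders
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0, 1.0  # an empty row or column: no test possible
    chi2 = n * (a * d - b * c) ** 2 / margins
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Invented allele counts (A, G) for 50 responders and 50 non-responders.
    chi2, p = allelic_chi_square((60, 40), (30, 70))
    print(f"chi2 = {chi2:.2f}, p = {p:.1e}")
```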
Use the Profiles
- Genetic profiles of new patients can then be used to prescribe drugs more effectively and avoid adverse reactions.
- Sell a drug with a gene test.
- Can also speed clinical trials by testing on those who are likely to respond well.
Toxicogenomics
- There are a number of common pathways for drug toxicity (or environmental toxicity).
- It is possible to compile genomic signatures (gene expression data) for these pathways.
- Candidate drug molecules can be screened in cell culture or in animals for induction of these toxicity pathways.
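Screening a candidate compound against a toxicity signature can be as simple as asking what fraction of the signature's genes move in the expected direction. The Python sketch assumes expression changes are already expressed as log2 ratios (treated vs. untreated); the signature genes, their directions, and the two-fold threshold are hypothetical, and published approaches typically use more robust statistics than this hit fraction.

```python
def signature_score(profile: dict[str, float], signature: dict[str, int],
                    threshold: float = 1.0) -> float:
    """Fraction of signature genes changed in the expected direction.

    profile:   gene -> log2 ratio (treated vs. untreated)
    signature: gene -> +1 if the toxicity pathway raises it, -1 if it lowers it
    A gene counts as a hit if it moves at least `threshold` (log2 units)
    in the expected direction; genes absent from the profile count as 0.
    """
    hits = sum(1 for gene, direction in signature.items()
               if profile.get(gene, 0.0) * direction >= threshold)
    return hits / len(signature)


if __name__ == "__main__":
    tox_signature = {"toxGene1": 1, "toxGene2": 1, "toxGene3": -1, "toxGene4": 1}
    candidate = {"toxGene1": 2.1, "toxGene2": 1.4, "toxGene3": -1.5,
                 "toxGene4": 0.2, "otherGene": 3.0}
    print(signature_score(candidate, tox_signature))  # 0.75
```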
Genomics Supports Biotechnology
- Biotechnology is based on developing new drugs.
- Some biotech companies produce and sell these drugs (Amgen, Genentech), while others partner with big pharmaceutical companies (selling intellectual property).
- Genomics is a way of using information to find new drugs faster and more cheaply.
Tools vs. Targets
- Genomics/bioinformatics companies can sell information and software (tools) or the results of genome analysis (targets).
- Tools are bought by big biotech or pharma companies to aid their own research.
- Targets are proteins that have already been identified as playing a role in disease, ready for drug development.
Tools Can Be Software or Technology
- Database, data analysis, data mining, and interface software is essential.
- Machines and reagents for genomics experiments (GeneChips, gene testing machines)
- Some tools will have a mainstream application in medicine (diagnostic tests), a much wider market.
Impact on Bioinformatics
- Genomics produces high-throughput, high-quality data, and bioinformatics provides the analysis and interpretation of these massive data sets.
- It is impossible to separate genomics laboratory technologies from the computational tools required for data analysis.
- Investment in genomics lab technology must include funding for bioinformatics support.
Planning for a Genomics Revolution
- Bioinformatics support must be integral to the planning process for new genomics research facilities.
- Genome Project sequencing centers devote more staff and more money to data analysis than to the sequencing itself.
- Microarray facilities will be even more skewed toward data analysis.
- It is an information-intensive business!
Long Term Implications
- A "periodic table for biology" will lead to an explosion of research and discoveries: we will finally have the tools to start making systematic analyses of biological processes (quantitative biology).
- Understanding the genome will lead to the ability to change it, to modify the characteristics of organisms and people in a wide variety of ways.
Genomics Education
- Genomics scientists need basic training in both molecular biology and computing.
- Specific training in the use of automated laboratory equipment, the analysis of large datasets, and bioinformatics algorithms
- Particularly important for the training of medical doctors: at least a familiarity with the technology
Genomics in Medical Education
- "The explosion of information about the new genetics will create a huge problem in health education. Most physicians in practice have had not a single hour of education in genetics and are going to be severely challenged to pick up this new technology and run with it." (Francis Collins)
Stuart M. Brown, Ph.D.
stuart.brown_at_med.nyu.edu
www.med.nyu/rcr
Bioinformatics: A Biologist's Guide to Biocomputing and the Internet