Title: Pr
2What next?
- High quality genome sequencing and annotation (2003)
- Complete sequencing of the genomes of other model organisms (e.g. mouse)
- The next step: functional genomics
  - Determine what our genes do through systematic studies of function on a large scale
  - Transcriptomics: comparative analysis of mRNA expression and splicing
  - Proteomics: comparative analysis of protein expression and post-translational modifications
  - Structural genomics: determine 3-D structures of key family members
  - Intervention studies: effects of inhibiting gene expression
  - Comparative genomics: analysis of DNA sequence patterns of humans and well-studied model organisms
3Beyond Genomics: Systems Biology
- Human Genome: 30,000 to 60,000 genes
- Human Proteome: 300,000 to 1,200,000 protein variants
- Human Metabolome: metabolic products of the organism (lipids, carbohydrates, amino acids, peptides, prostaglandins, etc.)
4Functional Genomics
- Whole genome
  - Once the whole genome sequence of an organism is truly known and available, the challenge turns from identifying the parts to understanding their function
- Functional genomics
  - The post-genomic era is defined by functional genomics
  - Assignment of function to identified genes
  - Organisation and control of the genetic pathways that come together to make up the physiology of an organism
5Functional Genomics
- 42% of the genes found in the human genome are of unknown function
- Assigning function to these genes requires systematic high-throughput methods
6The Periodic Table
Functional grouping of Chemical Elements
7Biologists' Periodic Table
Genomics
- Will not be two-dimensional
- Will reflect similarities at diverse levels
  - Primary DNA sequence in coding and regulatory regions
  - Polymorphic variation within a species or subgroup
  - Time and place of expression of RNAs during development, physiological response and disease
  - Subcellular localisation and intermolecular interaction of protein products
8Gene Expression analysis
- Array of hope?
  - Arrays offer hope for global views of biological processes
  - Systematic way to study DNA and RNA variation
  - Standard tool for molecular biology research and clinical diagnostics
  - Labelled nucleic acid molecules can be used to interrogate nucleic acid molecules attached to a solid support (remember Southern blotting?)
  - (Refer to Nature Genetics Supplement, Volume 21, January 1999)
9Gene Expression analysis
- DNA chips: also known as gene chips, biochips or microarrays. Basically DNA-covered pieces of glass (or plastic) capable of analysing thousands of genes simultaneously; they can be high-density arrays of oligonucleotides or cDNA
- Chips allow the monitoring of mRNA expression on a large scale (i.e. many genes at the same time)
Pre-1995, Northern blots were used to look at gene expression
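As a rough illustration of what "monitoring mRNA expression for many genes at once" can look like once intensities are read off a chip, the sketch below compares two samples gene by gene using a log2 ratio. The gene names, intensity values, the `floor` parameter and the choice of a log2 ratio are all assumptions made for the example; the slides do not specify an analysis method.

```python
import math

def log2_fold_changes(control, treated, floor=1.0):
    """Return {gene: log2(treated / control)} for genes measured in both samples.

    Intensities below `floor` are raised to `floor` so that the ratio is
    always defined (a simplification, not a recommended normalisation).
    """
    changes = {}
    for gene in control.keys() & treated.keys():
        c = max(control[gene], floor)
        t = max(treated[gene], floor)
        changes[gene] = math.log2(t / c)
    return changes

# Hypothetical signal intensities, invented purely for illustration.
control = {"geneA": 200.0, "geneB": 1500.0, "geneC": 80.0}
treated = {"geneA": 800.0, "geneB": 1500.0, "geneC": 20.0}

changes = log2_fold_changes(control, treated)
for gene in sorted(changes):
    print(f"{gene}: {changes[gene]:+.2f}")
```

A positive value means higher signal in the treated sample, a negative value lower signal, and zero no change. Real array data would need background correction and normalisation before such a comparison is meaningful.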
10Gene Expression analysis
Incyte
11Gene Expression analysis
Affymetrix
12Nanogen_Movie_1
Nanogen_Movie_2
Nanogen_Movie_3
Affymetrix_Movie_3
http://www.learner.org/channel/courses/biology/units/genom/images.html
14Determining gene function
sequence homology
sequence motif
tissue distribution
chromosome localisation
function
expression in disease
biochemical assays
proteomics
expression in models
15Protein synthesis
16RNA synthesis and processing
17Alternatively spliced mRNA
18The transcriptome
- DEFINITION
  - The mRNA collection content, present at any given moment in a cell or a tissue, and its behaviour over time and cell states (Adam Sartel, COMPUGEN)
  - The complete collection of mRNAs and their alternative splice forms is sometimes referred to as the transcriptome. The transcriptome is the set of instructions for creating all of the different proteins found in an organism. (From Genome to Transcriptome, Incyte)
19Genome, proteome and transcriptome
The Transcriptome
Golden path
Proteome information in
DNA technology
The Proteome
- Index to a range of possible proteins
- Useful as a map and for inter-organism analysis
- Describes what actually happens in the cell
- Complex tools, partial results
20Use of transcriptome analysis
- Discovery of new proteins
  - that are present in specific tissues
  - that have specific cell locations
  - that respond to specific cell states
- Discovery of new variants
  - of important genes
  - that work to increase/decrease the activity of the native protein
- The transcriptome reflects tissue source (cell type, organ) and also tissue activity and state, such as the stage of development, growth and death, cell cycle, diseased or healthy, response to therapy or stress
21Beyond genomics: proteomics
- Proteomics: where the genome hits the road
- Proteomics refers to the simultaneous, large-scale analysis of all (or many) of the proteins made in a cell at one time, to get a global picture of which proteins are made in cells and when
- Hopefully we can then determine why, and what we can do about it; this is very important for drug development
- The proteome is the protein complement encoded by a genome; the term was first proposed by an Australian researcher, Marc Wilkins (then a PhD student), in 1994
22Beyond the genome: Proteomics
- Genomics involves the study of mRNA expression: the full set of genetic information in an organism contains the recipes for making proteins
- Proteins constitute the bricks and mortar of cells and do most of the work
- Proteins distinguish the various types of cells: since all cells have essentially the same genome, their differences are dictated by which genes are active and the corresponding proteins that are made
- Similarly, diseased cells may produce different proteins from healthy cells
- However, the task of studying proteins is often more difficult than that of studying genes (e.g. post-translational modifications can dramatically alter protein function)
23Beyond the genome: Proteomics
- Identification of all the proteins made in a given cell, tissue or organism
- Identification of the intracellular networks associated with these proteins
- Identification of the precise 3D structure of relevant proteins, to enable researchers to identify potential drug targets that turn a protein on or off
- Proteomics very much requires a coordinated focus involving physicists, chemists, biologists and computer scientists
24Beyond the genome: Proteomics
- Major challenge: how do we go from the treasure chest of information yielded by genomics to an understanding of cellular function?
- Genomics-based approaches initially use computer-based similarity searches against proteins of known function
- Results may allow some broad inferences to be made about possible function
- However, a significant percentage (>30%) of the sequences ascertained thus far seem to code for proteins that are unrelated at this level to proteins of known function
25Beyond the genome: Proteomics
- Beyond the genetic make-up of an individual or organism, many other factors determine gene and ultimately protein expression, and therefore affect proteins directly
- These include environmental factors such as pH, hypoxia and drug treatment, to name a few
- Examination of the genome alone cannot take into account complex multigenic processes such as ageing, stress and disease, or the fact that the cellular phenotype is influenced by the networks created by interaction between pathways that are regulated in a coordinated way or that overlap
26Beyond the genome: Proteomics
- Genomic analysis has certainly provided us with much insight into the possible role of particular genes in disease
- However, proteins are the functional output of the cell, and their dynamic nature in specific biological contexts is critical
- The expression or function of proteins is modulated at many diverse points from transcription to post-translation, and very little of this can be predicted from a simple analysis of nucleic acids alone
- There is generally poor correlation between the abundance of mRNA transcribed from the DNA and the respective proteins translated from that mRNA
- Furthermore, transcript splicing can yield different protein forms
- Proteins can undergo extensive modifications such as glycosylation, acetylation and phosphorylation, which can lead to multiple protein products from the same gene
27Proteomics Tools
- The core methodologies for displaying the proteome combine advanced separation techniques, principally two-dimensional gel electrophoresis (2D-GE), with mass spectrometry
http://www.learner.org/channel/courses/biology/units/proteo/images.html
http://www.childrenshospital.org/cfapps/research/data_admin/Site602/mainpageS602P0.html
282D-GE basic methodology
- The sample (tissue, serum, cell extract) is solubilized and the proteins are denatured into their polypeptide components
- This mixture is separated by isoelectric focusing (IEF): on application of a current, the charged polypeptide subunits migrate in a polyacrylamide gel strip that contains an immobilized pH gradient until they reach the pH at which their overall charge is neutral (isoelectric point or pI), producing a gel strip with distinct protein bands along its length
- This strip is applied to the edge of a rectangular slab of polyacrylamide gel containing SDS. The focused polypeptides migrate in an electric current into the second gel and are separated on the basis of their molecular size
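The idea behind the first dimension, that each polypeptide stops at the pH where its net charge is zero, can be sketched numerically. The example below estimates a pI by bisection over pH using the Henderson-Hasselbalch relation. The pKa values are an assumed, approximate set (published scales differ), the two peptides are invented, and residue count is used only as a crude stand-in for the size separation of the second dimension.

```python
# Approximate pKa values (an assumption: published scales differ, so the
# resulting pI is a rough estimate only).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6

def net_charge(sequence, ph):
    """Net charge of a polypeptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))    # N-terminus, positive
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))   # C-terminus, negative
    for residue in sequence:
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge

def estimate_pi(sequence, tolerance=0.001):
    """Find the pH of zero net charge by bisection between pH 0 and 14."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid    # still positive: the pI lies at higher pH
        else:
            high = mid   # negative or zero: the pI lies at lower pH
    return (low + high) / 2.0

# Hypothetical peptides, invented for illustration.
peptides = {"acidic": "DDEEGA", "basic": "KKRRGA"}
for name, seq in peptides.items():
    # (pI, residue count) as a crude stand-in for a 2D gel position
    print(name, round(estimate_pi(seq), 2), len(seq))
```

Bisection works here because net charge only decreases as pH rises, so there is a single crossing point between pH 0 and pH 14.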
292D-GE basic methodology
- The resultant gel is stained (Coomassie, silver, fluorescent stains) and spots are visualized by eye or an imager. Typically 1000-3000 spots can be visualized with silver. Complementary techniques, e.g. immunoblotting, allow greater sensitivity for specific molecules.
- Multiple forms of individual proteins can be visualized, and the particular subset of proteins examined from the proteome is determined by factors such as initial solubilization conditions, pH range of the IPG and gel gradient
30General schematic of 2D-PAGE for protein
identification in Toxicology
31General strategy for proteomic analysis
Sample solubilization
Sample growth
Isoelectric focusing (IPG)
2D-PAGE
Immunoblot (Western)
Image analysis
Isolation of spots of interest
Trypsin digestion of proteins
MS analysis of tryptic fragments
Identification of proteins
32Nature of IPG determines spot location on 2D-PAGE
33Limitations of 2D-GE
- In large-scale proteomic analysis, 2D-GE has been the major workhorse over the last 20 years: it is uniquely able to distinguish post-translational modifications and is analytically quantitative
- However, despite significant improvements to the technique (e.g. immobilized pH gradients) and its coupling with MS analysis, it is still difficult to automate
- Although at first glance the resolution of 2D-GE seems very impressive, it still lags behind the enormous diversity of proteins, and thus comigrating protein spots are not uncommon
- This is especially of concern when trying to distinguish between highly abundant proteins, e.g. actin (10^8 molecules/cell), and low-abundance proteins such as transcription factors (100-1000 molecules/cell): this is beyond the dynamic range of 2D-GE
- Enrichment or prefractionation can often overcome such discrepancies
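To make the abundance gap concrete, the short calculation below expresses the copy numbers quoted above as orders of magnitude. The figures are the ones given on the slide; the function name is only illustrative.

```python
import math

def orders_of_magnitude(high_copies, low_copies):
    """Number of powers of ten separating two copy numbers."""
    return math.log10(high_copies / low_copies)

ACTIN_COPIES = 1e8                        # molecules/cell, from the slide
TF_COPIES_LOW, TF_COPIES_HIGH = 100, 1000  # transcription factors, from the slide

span_min = orders_of_magnitude(ACTIN_COPIES, TF_COPIES_HIGH)
span_max = orders_of_magnitude(ACTIN_COPIES, TF_COPIES_LOW)
print(f"Abundance gap: {span_min:.0f} to {span_max:.0f} orders of magnitude")
```

On these numbers a single gel would have to cover a gap of about 5 to 6 orders of magnitude to show both kinds of protein at once.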
34Limitations of 2D-GE
- Chemical heterogeneity of proteins also presents a major limitation
- The full range of pIs and MWs of proteins exceeds what can routinely be analyzed on 2D-GE. However, improvements to IPGs are expected to overcome some of these constraints and greatly improve coverage of the entire proteome of the cell
- Problems linked with extraction and solubilization of proteins prior to 2D-GE present an even greater challenge, especially for extremely hydrophobic proteins such as membrane and nuclear proteins. Again, recent advances in buffer composition have diminished the scale of this problem
35Differential Gel Electrophoresis (DiGE)
36Protein identification and characterization
- Specialized imaging software allows a more detailed analysis of spot identification and comparison between gels and treatments
- By a process of subtraction, differences (e.g. presence, absence, or intensity of proteins or different forms) between healthy and diseased samples can be revealed
- Cross-references to protein databases allow assignment by known pIs and apparent molecular size. Ultimate protein identification requires enzymatic spot digestion and analysis of mass and charge by mass spectrometry (MS)
- Spot cutter tools can be coupled to image analysis tools, and in-gel tryptic digestion techniques in 96- or 384-well format can greatly reduce the bottleneck in sample identification by MS
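The "process of subtraction" between gels can be pictured as a spot-by-spot comparison of intensities. The sketch below assumes the spots have already been matched between the two gels and normalised; the spot identifiers, intensities and the two-fold threshold are hypothetical choices for illustration, not part of any particular imaging package.

```python
def compare_spots(healthy, diseased, min_fold=2.0):
    """Report spots that differ between two gels.

    `healthy` and `diseased` map a matched spot ID to a normalised
    intensity; a missing ID or 0.0 means the spot was not detected.
    """
    report = {}
    for spot in sorted(healthy.keys() | diseased.keys()):
        h = healthy.get(spot, 0.0)
        d = diseased.get(spot, 0.0)
        if h == 0.0 and d == 0.0:
            continue  # not detected on either gel
        if h == 0.0:
            report[spot] = "present only in diseased"
        elif d == 0.0:
            report[spot] = "present only in healthy"
        elif d / h >= min_fold:
            report[spot] = "increased in diseased"
        elif h / d >= min_fold:
            report[spot] = "decreased in diseased"
    return report

# Hypothetical spot intensities, invented for illustration.
healthy = {"spot_01": 1200.0, "spot_02": 300.0, "spot_03": 450.0, "spot_04": 90.0}
diseased = {"spot_01": 1150.0, "spot_02": 900.0, "spot_04": 30.0, "spot_05": 220.0}

for spot, change in compare_spots(healthy, diseased).items():
    print(spot, change)
```

Spots whose intensity changes by less than the threshold are left out of the report, so only candidate differences are passed on for excision and MS identification.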
37Protein analysis by MS
- Compared to sequencing, MS is more sensitive (femtomole to attomole levels) and is higher throughput
- Digestion of an excised spot with trypsin results in a mixture of peptides. These are ionized by electrospray ionization from the liquid state or by matrix-assisted laser desorption/ionization (MALDI) from the solid state, and the mass of the ions is measured by various coupled analyzers (e.g. time of flight (TOF), which measures the time for ions to travel from the source to the detector), resulting in a peptide fingerprint
- The resultant signature is compared with the peptide masses predicted from theoretical digestion of protein sequences found in databases: identification of the protein!
- Tandem MS allows one to obtain actual protein sequence information: discrete peptide ions can be selected and further fragmented, and complex algorithms are employed to correlate the experimental data with database-derived peptide sequences
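The fingerprint comparison described above can be sketched in a few lines: digest each database sequence in silico, compute the peptide masses, and count how many observed masses fall within a tolerance. The sketch makes several simplifying assumptions: trypsin is modelled as cutting after K or R except before P with no missed cleavages, neutral monoisotopic masses are compared (ion charge and modifications are ignored), and the two "database" sequences, the simulated observed masses and the 0.2 Da tolerance are invented for illustration.

```python
# Monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def tryptic_digest(protein):
    """Cut after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(MONO[residue] for residue in peptide) + WATER

def match_fingerprint(observed, database, tolerance=0.2):
    """Count, per protein, how many observed masses match a predicted peptide."""
    scores = {}
    for name, sequence in database.items():
        predicted = [peptide_mass(p) for p in tryptic_digest(sequence)]
        scores[name] = sum(
            any(abs(mass - theo) <= tolerance for theo in predicted)
            for mass in observed
        )
    return scores

# Hypothetical database and simulated measurement (masses of protein_alpha's
# peptides with a small offset standing in for instrument error).
database = {
    "protein_alpha": "MKTAYIAKQRGSPLVK",
    "protein_beta": "GASPVTCLRNDQKEMHFYW",
}
observed = [peptide_mass(p) + 0.05 for p in tryptic_digest(database["protein_alpha"])]

scores = match_fingerprint(observed, database)
best = max(scores, key=scores.get)
print(scores, "best match:", best)
```

Real search engines score matches statistically and allow for missed cleavages and modifications; the simple count here only illustrates the principle of matching a measured fingerprint against theoretical digests.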
38MS analysis
39MS analysis
40Antibody arrays
Good for low-abundance proteins. Problem is antibody specificity.
41Protein microarrays
42Caveats
- The technology of proteomics is not as mature as that of genomics, owing to the lack of amplification schemes akin to PCR. Only proteins from a natural source can be analyzed
- The complexities of the proteome arise because most proteins seem to be processed and modified in complex ways and can be the products of differential splicing
- In addition, protein abundance spans a range estimated at 5 to 6 orders of magnitude in yeast and 10 orders of magnitude in humans
43Challenges
- Complexity: some proteins have >1000 variants
- Need for a general technology for targeted manipulation of gene expression
- Limited throughput of today's proteomic platforms
- Lack of a general technique for absolute quantitation of proteins