Title: ENCODE Pseudogene Summary for GT call
1 ENCODE Pseudogene Summary for GT call
- Mark Gerstein
- 2005.10.28, 11:00 EDT
- Summary of 6 calls: Sept. 15, 22; Oct. 6, 13, 20, 27
2 Developed Consensus Set of 198 Pseudogenes
- A: Derived from a qualified union of GIS, Havana, UCSC, Yale with uniform criteria on boundaries
  - Identify a good set of human proteins (HAVANA set?)
  - Remove pseudogenes (from all 4 groups) overlapping with current GENCODE exons (does GENCODE have an updated version?).
  - Create a union of the remaining pseudogenes.
  - Find the best matching protein for each pseudogene; remove entries without a BLAST hit (e-value cutoff issue?).
  - Realign each pseudogene to its parent protein to produce a uniform alignment and to define the start and end coordinates.
  - Apply a threshold to sequence identity and coverage? (No.)
  - Classify pseudogenes into processed and non-processed (how?)
- B: Overall 222 pseudogenes; application of the above recipe gives 198 consensus
  - (Intersection set of above is 81 (proc), 49 (non-proc))
- C: Currently on test browser, ENCODE wiki: http://pseudogene.org/ENCODE
From Deyou Z. Robert B.
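The exon-filtering and union steps of the recipe above can be sketched as follows. This is a minimal illustration, not the groups' actual pipeline: the `Interval` type, the function names, the toy coordinates, and the choice to merge overlapping calls by their outer boundaries are all assumptions (the recipe itself defines final boundaries by realignment to the parent protein), and strand is ignored.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int        # 0-based, inclusive (assumed convention)
    end: int          # exclusive
    source: str = ""  # annotating group(s), comma-separated


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def remove_exon_overlaps(pseudogenes, exons):
    """Drop every pseudogene call that overlaps any annotated exon."""
    return [p for p in pseudogenes if not any(overlaps(p, e) for e in exons)]


def union_merge(pseudogenes):
    """Merge overlapping calls from different groups into single loci."""
    merged = []
    for p in sorted(pseudogenes, key=lambda x: (x.chrom, x.start, x.end)):
        last = merged[-1] if merged else None
        if last and last.chrom == p.chrom and p.start < last.end:
            sources = ",".join(sorted(set(last.source.split(",")) | {p.source}))
            merged[-1] = Interval(last.chrom, last.start, max(last.end, p.end), sources)
        else:
            merged.append(p)
    return merged


# Hypothetical toy calls, not real ENCODE coordinates.
calls = [
    Interval("chr1", 100, 200, "Yale"),
    Interval("chr1", 150, 260, "Havana"),
    Interval("chr1", 500, 600, "UCSC"),
    Interval("chr2", 10, 50, "GIS"),
]
exons = [Interval("chr1", 550, 700, "GENCODE")]

kept = remove_exon_overlaps(calls, exons)
consensus = union_merge(kept)
for locus in consensus:
    print(locus)
```

In this toy input the UCSC call is dropped because it overlaps an exon, and the Yale and Havana calls collapse into one locus that records both sources.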
3 Interesting Complexities of Pseudogene Annotation: Insertion of One Pseudogene into Another One
First insertion event: heterogeneous nuclear ribonucleoprotein A1 (HNRPA1) pseudogene (parent on Chr12)
Remnant of a second, mitochondrial insertion event (has post-insertion deletions):
- NADH dehydrogenase 2 (MTND2) pseudogene (parent mitochondrial)
- NADH dehydrogenase 4 (MTND4) pseudogene (parent mitochondrial)
- cytochrome b (CYTB) pseudogene (parent mitochondrial)
Protein evidence
From Adam F.
4 EST Evidence of Expression from a Pseudogene at 5' UTR of Known Gene
LILR pseudogene
Frameshift
Upstream pseudogene corresponds to exons 1-3 of LILR family genes; the 3' exons have been lost. EST evidence supports expression from the pseudogene locus extending to known gene LILRA3.
LILRA3
From Adam F.
5 TAR/Transfrag Evidence for Transcription in 198 Consensus Pseudogenes
- Number of 198 overlapped by interrogated regions (Affy arrays): 180 (90.9%)
- Number of 198 overlapped by Yale TARs or Affy transfrags (union): 106 (53.5% of all; 58.9% of interrogated)
  => There is evidence of transcription (from TARs or transfrags) of the pseudogene or the parent gene (if cross-hybridization) for 53.5% of the consensus pseudogenes (upper bound on transcription)
- Number overlapping CAGE tags: 11 (5.5%)
- Number overlapping ditag tags: 1 (0.5%) (83 (41.9%) are overlapped by full-length ditags)
From France D.
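The two denominators used above ("of all" versus "of interrogated") can be checked with a few lines of arithmetic. This is only a sketch: the counts are copied from the slide, while the helper name `pct` and the one-decimal rounding are assumptions. Note that 11/198 is about 5.56%, so the 5.5 quoted for CAGE tags looks truncated rather than rounded.

```python
TOTAL = 198         # consensus pseudogenes (from the slide)
INTERROGATED = 180  # overlapped by interrogated regions (from the slide)


def pct(count: int, denom: int) -> float:
    """Percentage of denom, rounded to one decimal place."""
    return round(100.0 * count / denom, 1)


# Counts as reported on the slide.
counts = {
    "interrogated regions": 180,
    "TARs or transfrags (union)": 106,
    "CAGE tags": 11,
    "ditag tags": 1,
    "full-length ditags": 83,
}

for label, n in counts.items():
    print(f"{label}: {n} ({pct(n, TOTAL)}% of all)")

# Only probed pseudogenes can show array signal, hence the second denominator.
print(f"TARs or transfrags: {pct(106, INTERROGATED)}% of interrogated")
```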
6 Example: Pseudogene overlapped by TARs/transfrags and tags (ENCODE_consensus_187)
but pseudogene is 93% similar to parent
From France D.
7 Consensus Pseudogenes with 2 ChIP-chip Hits
Pgene-ID  Pgene-type     E2F  H3K4me3 (0h, 30h)  Sp3  STAT1
13        Processed      0    1                  0    0
45        Processed      0    1                  0    0
47        Processed      0    1                  0    0
77        Processed      1    1                  0    0
126       Processed      0    1                  0    0
149       Processed      1    1                  0    0
174       Non-Processed  0    1                  0    0
177       Non-Processed  1    1                  0    0
187       Processed      0    1                  0    0
193       Processed      0    0                  1    1
Legend: Has transcriptional evidence (intersects Gencode transcript)
Look for ChIP-chip hits upstream of the pseudogenes
From Deyou Z.
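Looking for ChIP-chip hits upstream of a pseudogene amounts to a strand-aware window overlap, which can be sketched as below. The `Feature` type, the function names, the 1000 bp window, and the toy coordinates are all hypothetical; the slides do not state how far upstream was searched.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    chrom: str
    start: int         # 0-based, inclusive (assumed convention)
    end: int           # exclusive
    strand: str = "+"


def upstream_window(gene: Feature, size: int) -> Feature:
    """Window of `size` bp immediately 5' of the gene on its own strand."""
    if gene.strand == "+":
        return Feature(gene.chrom, max(0, gene.start - size), gene.start, "+")
    return Feature(gene.chrom, gene.end, gene.end + size, "-")


def hits_upstream(gene: Feature, hits, size: int = 1000):
    """ChIP-chip hits that overlap the upstream window (window size is an assumption)."""
    w = upstream_window(gene, size)
    return [h for h in hits
            if h.chrom == w.chrom and h.start < w.end and w.start < h.end]


# Hypothetical toy data, not real ENCODE coordinates.
pgene_plus = Feature("chr1", 5000, 6000, "+")
pgene_minus = Feature("chr1", 5000, 6000, "-")
hits = [
    Feature("chr1", 4500, 4700),
    Feature("chr1", 6200, 6400),
    Feature("chr2", 4500, 4700),
]

print(hits_upstream(pgene_plus, hits))
print(hits_upstream(pgene_minus, hits))
```

For a minus-strand pseudogene the window sits past the `end` coordinate, so the same locus picks up a different hit depending on its strand.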
8 Pot. Transcribed Pseudogene (177) with Upstream ChIP-chip Hits
From Deyou Z.
9 Experiments to Validate Expression of ENCODE Pseudogenes
- Select ENCODE pseudogenes from the intersection part of consensus set
  - 49 non-processed, 125 processed
- Designed oligos (25mer, Tm 70°C)
  - Either specific to pseudogene or shared between parental gene and pseudogene
- Doing 5' RACE in 12 human tissues
  - Brain, heart, kidney, spleen, liver, colon, sm. intestine, muscle, lung, stomach, testis, placenta
- First 96 pseudogenes: 5' RACEs done in 12 tissues
- Last 78 will be done next week
- To do: pool multiple RACEs, send to Santa Clara and hybridize to Affymetrix ENCODE 20-nucleotide resolution arrays
Stylianos Antonarakis, Robert Baertsch, Jorg Drenkow, Tom Gingeras, Charlotte Henrichsen, Philipp Kapranov, Catherine Ucla, Alexandre Reymond
Affymetrix, UCSC, University of Geneva, University of Lausanne
From Alex R.
10 Extra Slides
11 Pseudogene group
- Core people: Jennifer Harrow <jla1_at_sanger.ac.uk>, WEI Chia-Lin <weicl_at_gis.a-star.edu.sg>, Adam Frankish <af2_at_sanger.ac.uk>, "Dike, Sujit" <Sujit_Dike_at_affymetrix.com>, Robert Baertsch <baertsch_at_SOE.UCSC.EDU>, fdenoeud_at_imim.es, Deyou Zheng <zhengdy_at_csb.yale.edu>, Yontao Lu <ytlu_at_SOE.UCSC.EDU>, Alexandre.Reymond_at_medecine.unige.ch, ytlu_at_SOE.UCSC.EDU
- Others: "Hoyem, Tara L" <Tara.Hoyem_at_pnl.gov>, Roderic Guigo Serra <rguigo_at_imim.es>, "Gingeras, Tom" <Tom_Gingeras_at_affymetrix.com>, thomas.royce_at_yale.edu, Suganthi Balasubramanian <suganthi_at_csb.yale.edu>
- 6 calls: Sept. 15, 22; Oct. 6, 13, 20, 27
12 Refresher: many repetitions of the below Venn analysis
54 (2)
Havana-Gencode: 165 pseudogenes (167 - 2)
17 (2)
16 (0)
Yale: 167 pseudogenes (164 + 3)
81 (34)
15 (1)
16 (7)
7 Havana agrees to be added (8, 11, 40, 59, 139,
152, 169). 4 at coding loci. Yale agrees to
delete 1 with weak sequence identity. 5 with
non-real proteins.
Numbers according to Adam's note
33 (1)
UCSC retrogenes: 146 not expressed
9 Havana agrees to be added. 2 at coding loci.
Yale agrees to delete 1 with weak sequence
identity. 2 with non-real proteins.
Solved by consistent protein set threshold
13 Rearranged exon order in unprocessed pseudogene
From Adam F.
Dot plot: protein evidence vs genome
adaptor-related protein complex 1, beta 1 subunit
(AP1B1) pseudogenes
Protein evidence
Exon 6
Exon 3
Splice sites same as parent gene
Following duplication of the AP1B1 locus
rearrangements/duplications have produced two
unprocessed pseudogenes corresponding to exons 6
and 3 of the parent gene
14 Rearrangement of processed pseudogene
From Adam F.
mRNA dot plot
pseudogene similar to part of ribosomal protein
L3 (RPL3)
Following insertion, one end of the RPL3
pseudogene has been flipped onto the opposite
strand (with some loss of internal sequence)
Protein dot plot
15 Overlaps by TAR/transfrag subset
- Nb overlapped by interrogated regions (Affy arrays): 180 (90.9%)
- Nb overlapped by Yale TARs or Affy transfrags (union): 106 (53.5% of all; 58.9% of interrogated)
- Nb overlapped by Yale TARs (union): 84 (42.4% of all; 46.7% of interrogated)
- Nb overlapped by Affy transfrags (union): 102 (51.5% of all; 56.7% of interrogated)
- Nb overlapped by polyA TARs/transfrags (union): 105 (53% of all; 58.3% of interrogated)
- Nb overlapped by total RNA TARs (union): 61 (30.8% of all; 33.9% of interrogated)
From France D.
16 Expression from pseudogene locus (1): putative novel transcript
Aligned proteins (column collapsed)
HAVANA sialyltransferase pseudogene (RP3-477O4.5)
supported by protein evidence
Supporting EST (100% ID)
Putative novel transcript supported by a single EST which has a polyA site and signal
polyA site and signal
There appears to be some transcription from this locus, which is supported at the 3' end by a single EST
From Adam F.
17 Intersect Consensus Pseudogenes with ChIP-chip Hits
Factors            E2F      H3K4me3 (0h)  H3K4me3 (30h)  Sp3       STAT1
Group              UCDavis  UCSD          UCSD           Stanford  Yale
Total Hits         400      1000          1000           400       400
Known Genes (405)  145      149           154            86        15
Pseudogenes (198)  4        25            24             3         7
From Deyou Z.