ENCODE Pseudogene Summary for GT call

File title: How to generate a consensus?
Author: Mark Gerstein
Last modified by: Mark Gerstein
Created date: 9/21/2005 11:43:10 PM
Document presentation format: PowerPoint


Transcript and Presenter's Notes

1
ENCODE Pseudogene Summary for GT call
  • Mark Gerstein
  • 2005.10.28, 11:00 EDT
  • Summary of 6 calls: Sept. 15, 22; Oct. 6, 13,
    20, 27

2
Developed Consensus Set of 198 Pseudogenes
  • A: Derived from a qualified union of GIS, Havana,
    UCSC, and Yale calls, with uniform criteria on
    boundaries
  • Identify a good set of human proteins (HAVANA
    set?)
  • Remove pseudogenes (from all 4 groups)
    overlapping with current GENCODE exons
    (does GENCODE have an updated version?).
  • Create a union of the remaining pseudogenes.
  • Find the best-matching protein for each
    pseudogene; remove entries without a BLAST hit
    (e-value cutoff issue?).
  • Realign each pseudogene to its parent protein to
    produce a uniform alignment and to define the
    start and end coordinates.
  • Apply a threshold to sequence identity and
    coverage? (No.)
  • Classify pseudogenes into processed and
    non-processed (how?)
  • B: Overall 222 pseudogenes; application of the above
    recipe gives the 198-member consensus
  • (Intersection set of above is 81 (proc) and 49
    (non-proc))
  • C: Currently on test browser and ENCODE wiki:
    http://pseudogene.org/ENCODE

From Deyou Z. and Robert B.
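The exon-filtering and union steps of the recipe above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Interval` type, the function names, the 0-based half-open coordinates, and the rule that any overlap merges two calls into one locus are all hypothetical and are not taken from the slides, which do not specify how the groups' calls were actually combined.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    # Hypothetical record for one pseudogene call or exon.
    chrom: str
    start: int   # 0-based, inclusive (assumed convention)
    end: int     # exclusive
    source: str  # e.g. "Yale", "Havana", "UCSC", "GIS"

def overlaps(a, b):
    """True if two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def remove_exon_overlaps(pgenes, exons):
    """Drop pseudogene calls that overlap any annotated exon."""
    return [p for p in pgenes if not any(overlaps(p, e) for e in exons)]

def merge_union(pgenes):
    """Merge overlapping calls into loci, remembering which groups called each."""
    merged = []
    for p in sorted(pgenes, key=lambda x: (x.chrom, x.start, x.end)):
        if merged and merged[-1][0] == p.chrom and p.start < merged[-1][2]:
            chrom, start, end, sources = merged[-1]
            merged[-1] = (chrom, start, max(end, p.end), sources | {p.source})
        else:
            merged.append((p.chrom, p.start, p.end, {p.source}))
    return merged

# Toy data, invented for illustration only.
calls = [
    Interval("chr1", 100, 200, "Yale"),
    Interval("chr1", 150, 260, "Havana"),
    Interval("chr1", 500, 600, "UCSC"),
    Interval("chr2", 10, 50, "GIS"),
]
exons = [Interval("chr1", 550, 580, "GENCODE")]

kept = remove_exon_overlaps(calls, exons)
loci = merge_union(kept)
for locus in loci:
    print(locus)
```

Loci called by every group would correspond to the intersection set mentioned on the slide; the BLAST and realignment steps are not covered by this sketch.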
3
Interesting Complexities of Pseudogene
Annotation: Insertion of One Pseudogene into
Another One
First insertion event
heterogeneous nuclear ribonucleoprotein A1
(HNRPA1) pseudogene (parent on Chr12)
Remnant of a second, mitochondrial insertion
event (has post-insertion deletions)
NADH dehydrogenase 2 (MTND2) pseudogene (parent
mitochondrial)
NADH dehydrogenase 4 (MTND4) pseudogene (parent
mitochondrial)
cytochrome b (CYTB) pseudogene (parent
mitochondrial)
Protein evidence
From Adam F.
4
EST Evidence of Expression from a Pseudogene at
5' UTR of Known Gene
LILR pseudogene
Frameshift
Upstream pseudogene corresponds to exons 1-3 of
LILR family genes; 3 exons have been lost. EST
evidence supports expression from the pseudogene
locus extending to known gene LILRA3.
LILRA3
From Adam F.
5
TAR/Transfrag Evidence for Transcription in 198
Consensus Pseudogenes
- Number (of 198) overlapped by interrogated regions
  (Affy arrays): 180 (90.9%)
- Number (of 198) overlapped by Yale TARs or Affy
  transfrags (union): 106 (53.5% of all; 58.9% of
  interrogated)
  -> There is evidence of transcription (from TARs
  or transfrags) of the pseudogene or the parent
  gene (if cross-hybridization) for 53.5% of the
  consensus pseudogenes (upper bound on
  transcription)
- Number overlapping CAGE tags: 11 (5.5%)
- Number overlapping ditag tags: 1 (0.5%) (83 (41.9%)
  are overlapped by full-length ditags)
From France D.
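The percentages on this slide can be re-derived from the raw counts with a few lines of Python. The dictionary labels and the helper name `pct` are illustrative choices; the counts themselves are the ones quoted on the slide.

```python
TOTAL = 198          # consensus pseudogenes
INTERROGATED = 180   # overlapped by interrogated regions (Affy arrays)

def pct(count, total):
    """Percentage of `count` in `total`, rounded to one decimal place."""
    return round(100.0 * count / total, 1)

# Counts as quoted on the slide.
counts = {
    "interrogated": 180,
    "TARs or transfrags (union)": 106,
    "CAGE tags": 11,
    "ditag tags": 1,
    "full-length ditags": 83,
}

for label, n in counts.items():
    print(f"{label}: {n} ({pct(n, TOTAL)}% of all)")

# Fraction of the interrogated pseudogenes with TAR/transfrag overlap.
print(pct(106, INTERROGATED))  # 58.9
```

With standard rounding, 11 of 198 comes out as 5.6% rather than the 5.5% shown on the slide, which suggests that figure was truncated rather than rounded; the other values match.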
6
Example: Pseudogene overlapped by TARs/transfrags
and tags (ENCODE_consensus_187),
but pseudogene is 93% similar to parent
From France D.
7
Consensus Pseudogenes with 2 ChIP-chip Hits
Pgene-ID  Pgene-type     E2F  H3K4me3 (0h & 30h)  Sp3  STAT1
13        Processed      0    1                   0    0
45        Processed      0    1                   0    0
47        Processed      0    1                   0    0
77        Processed      1    1                   0    0
126       Processed      0    1                   0    0
149       Processed      1    1                   0    0
174       Non-Processed  0    1                   0    0
177       Non-Processed  1    1                   0    0
187       Processed      0    1                   0    0
193       Processed      0    0                   1    1
Legend: has transcriptional evidence (intersects GENCODE
transcript)
Look for ChIP-chip hits upstream of the
pseudogenes
From Deyou Z.
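Looking for ChIP-chip hits upstream of a pseudogene depends on its strand. The sketch below shows one way to do that lookup; the 1,000 bp window, the tuple layouts, the function names, and the toy coordinates are all assumptions made for illustration, since the slides do not state which window or data format was used.

```python
def upstream_window(start, end, strand, size=1000):
    """Return (win_start, win_end) for the region upstream of a feature.

    Coordinates are assumed to be 0-based, half-open; `size` is an
    arbitrary illustrative window length.
    """
    if strand == "+":
        return max(0, start - size), start
    if strand == "-":
        return end, end + size
    raise ValueError(f"unknown strand: {strand!r}")

def upstream_hits(pgene, hits, size=1000):
    """Return hits overlapping the upstream window of a pseudogene.

    pgene: (chrom, start, end, strand)
    hits:  iterable of (chrom, start, end, factor)
    """
    chrom, start, end, strand = pgene
    w_start, w_end = upstream_window(start, end, strand, size)
    return [h for h in hits
            if h[0] == chrom and h[1] < w_end and w_start < h[2]]

# Toy example with invented coordinates.
pgene = ("chr7", 5000, 6000, "-")
hits = [
    ("chr7", 6200, 6400, "H3K4me3"),  # upstream of a minus-strand feature
    ("chr7", 4500, 4700, "E2F"),      # downstream for the minus strand
    ("chr1", 6200, 6400, "STAT1"),    # different chromosome
]
found = upstream_hits(pgene, hits)
print(found)
```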
8
Potentially Transcribed Pseudogene (177) with Upstream
ChIP-chip Hits
From Deyou Z.
9
Experiments to Validate Expression of ENCODE
Pseudogenes
  • Select ENCODE pseudogenes from the intersection
    part of consensus set
  • 49 non-processed, 125 processed
  • Designed oligos (25-mer, Tm 70 °C)
  • Either specific to pseudogene or shared between
    parental gene and pseudogene
  • Doing 5' RACE in 12 human tissues
  • Brain, heart, kidney, spleen, liver, colon, small
    intestine, muscle, lung, stomach, testis,
    placenta
  • First 96 pseudogenes: 5' RACEs done in 12 tissues
  • Last 78 will be done next week
  • To do: pool multiple RACEs, send to Santa Clara,
    and hybridize to Affymetrix ENCODE 20-nucleotide
    resolution arrays

Stylianos Antonarakis, Robert Baertsch, Jorg
Drenkow, Tom Gingeras, Charlotte Henrichsen,
Philipp Kapranov, Catherine Ucla, Alexandre
Reymond
Affymetrix, UCSC, University of Geneva,
University of Lausanne
From Alex R.
10
Extra Slides
11
Pseudogene group
  • Core people: Jennifer Harrow <jla1_at_sanger.ac.uk>,
    WEI Chia-Lin <weicl_at_gis.a-star.edu.sg>, Adam
    Frankish <af2_at_sanger.ac.uk>, "Dike, Sujit"
    <Sujit_Dike_at_affymetrix.com>, Robert Baertsch
    <baertsch_at_SOE.UCSC.EDU>, fdenoeud_at_imim.es, Deyou
    Zheng <zhengdy_at_csb.yale.edu>, Yontao Lu
    <ytlu_at_SOE.UCSC.EDU>,
    Alexandre.Reymond_at_medecine.unige.ch
  • Others: "Hoyem, Tara L" <Tara.Hoyem_at_pnl.gov>,
    Roderic Guigo Serra <rguigo_at_imim.es>, "Gingeras,
    Tom" <Tom_Gingeras_at_affymetrix.com>,
    thomas.royce_at_yale.edu, Suganthi Balasubramanian
    <suganthi_at_csb.yale.edu>
  • 6 calls: Sept. 15, 22; Oct. 6, 13, 20, 27

12
Refresher: many repetitions of the below Venn
analysis
Venn diagram sets:
  • Havana-Gencode: 165 pseudogenes (167 - 2)
  • Yale: 167 pseudogenes (164 + 3)
  • UCSC retrogenes: 146 not expressed
Venn region counts: 54 (2), 17 (2), 16 (0), 81 (34),
15 (1), 16 (7), 33 (1)
Annotations:
  • 7 Havana agrees to be added (8, 11, 40, 59, 139,
    152, 169). 4 at coding loci. Yale agrees to
    delete 1 with weak sequence identity. 5 with
    non-real proteins.
  • 9 Havana agrees to be added. 2 at coding loci.
    Yale agrees to delete 1 with weak sequence
    identity. 2 with non-real proteins.
Numbers according to Adam's note
Solved by consistent protein set threshold
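A Venn analysis of this kind reduces to assigning each pseudogene ID to the exact combination of groups that called it. The sketch below shows that bookkeeping on invented IDs; the function name `venn_regions` and the toy sets are hypothetical and do not reproduce the counts on the slide.

```python
def venn_regions(sets):
    """Map each combination of group names to the IDs found in exactly
    those groups (one entry per non-empty Venn region)."""
    all_ids = set().union(*sets.values())
    regions = {}
    for item in all_ids:
        key = frozenset(name for name, members in sets.items()
                        if item in members)
        regions.setdefault(key, set()).add(item)
    return regions

# Toy sets of pseudogene IDs, invented for illustration.
groups = {
    "Havana": {1, 2, 3, 4},
    "Yale":   {3, 4, 5},
    "UCSC":   {4, 5, 6},
}

regions = venn_regions(groups)
for key, members in sorted(regions.items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(key), sorted(members))
```

Each region's size gives one number in the diagram, and the region keyed by all group names is the intersection set.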
13
Rearranged exon order in unprocessed pseudogene
From Adam F.
Dot plot protein evidence vs genome
adaptor-related protein complex 1, beta 1 subunit
(AP1B1) pseudogenes
Protein evidence
Exon 6
Exon 3
Splice sites same as parent gene
Following duplication of the AP1B1 locus,
rearrangements/duplications have produced two
unprocessed pseudogenes corresponding to exons 6
and 3 of the parent gene
14
Rearrangement of processed pseudogene
From Adam F.
mRNA dot plot
pseudogene similar to part of ribosomal protein
L3 (RPL3)
Following insertion, one end of the RPL3
pseudogene has been flipped onto the opposite
strand (with some loss of internal sequence)
Protein dot plot
15
Overlaps by TAR/transfrag subset
- Nb overlapped by interrogated regions (Affy
  arrays): 180 (90.9%)
- Nb overlapped by Yale TARs or Affy transfrags
  (union): 106 (53.5% of all; 58.9% of interrogated)
- Nb overlapped by Yale TARs (union): 84 (42.4% of
  all; 46.7% of interrogated)
- Nb overlapped by Affy transfrags (union): 102
  (51.5% of all; 56.7% of interrogated)
- Nb overlapped by polyA TARs/transfrags (union):
  105 (53% of all; 58.3% of interrogated)
- Nb overlapped by total RNA TARs (union): 61
  (30.8% of all; 33.9% of interrogated)
From France D.
16
Expression from pseudogene locus (1): putative
novel transcript
Aligned proteins (column collapsed)
HAVANA sialyltransferase pseudogene (RP3-477O4.5)
supported by protein evidence
Supporting EST (100% ID)
Putative novel transcript supported by a single
EST which has a polyA site and signal
polyA site and signal
There appears to be some transcription from this
locus, supported at the 3' end by a single EST
From Adam F.
17
Intersect Consensus Pseudogenes with ChIP-chip
Hits
Factors            E2F      H3K4me3 (0h)  H3K4me3 (30h)  Sp3       STAT1
Group              UCDavis  UCSD          UCSD           Stanford  Yale
Total Hits         400      1000          1000           400       400
Known Genes (405)  145      149           154            86        15
Ψgenes (198)       4        25            24             3         7
From Deyou Z.