Title: ??(Penaeus monodon)???????
1????????????????????????????????
- ???? ??? ??
- ??? ???
- 2012/07/26
2??
3??
4??--????
- ????????
- ??(Penaeus monodon)???????
- 1970? 73? ? 1987? 78,548?
- 1988? ???????
- 2000???,??????????
????FAO
5??--????
- ???????
- 2000? 146,362? ? 2009? 2,327,534?
- ????????66??
- ?????????????
????FAO
6??--????
- ?????????
- ????????????????
- ?????????????????
- ????????????
- ??????pH???
- ??????,????,???? (White Spot virus disease)
- ??(1988)???(1993)???(1996)????????????
- ????
- ???????
7??--????
- Expressed sequence tag (EST, ??????)
- 1991?,?????????????
- ?????????? cDNA library?,?????????,?????(???)????
- ??????????????
- ??
- ???????????????
- ???????????
(Adams, et al. 1991)
8??--????
- EST????
- ??????,???????
- ?????????
- ????EST??????,??????????,????? ESTs ????,???????
?? ESTs????
????? 10,446
?? 161,241
?? 39,397
??? 3,156
Total 214,240
????NCBI Taxonomy (Date March, 2012)
9??--????
- ??EST??,??????????(transcriptome)
- ???????
- ??????
- ???????????
- ???????????????
- ???????????????,????????????????,??,???????????,??
??????????????????
10??--????
??
?? Gross et al. O'Leary et al. Clavero-Sales et al.
?? 2001 2006 2007
EST? 2,045 13,656 601
???? ???????????? vs ???????????? ??????????????????????? ?(??????)
assembly N Y Y
assembly ?? de novo (CAP3) de novo (CAP3)
11??--????
??? ????
?? Rojtinnakorn et al. Yamano ? Unuma Dong and Xiang Jianhai et al.
?? 2002 2006 2007 2008
EST? 1,005 1,988 2,371 10,446
???? ??? ?????? vs ?? ?? ??? ???
assembly N N N Y
assembly ?? de novo (CAP3)
12??--????
??
?? Lehnert et al. Supungul et al. Supungul et al. Tassanakajon et al. Preechaphol et al. Leu et al. Pongsomboon et al. Leelatanawit et al.
?? 1999 2002 2004 2006 2007 2007 2008 2009
EST? 176 615 1,062 10,100 1,051 13,934 1,033 896
???? ??? ?? ?? ??? ??? ?????? vs ?? ?? ??? ???? ??? ???? ?? ?? ?? ????? vs ???? ???? ?????? Vs ?? ??
assembly N N N N N Y N N
assembly ?? de novo (CAP3)
13??--????
- ????????????EST????
- ???????assembly
- assembly?? de novo assembly
14????
15????--????
- ???EST
- ??NCBI dbEST (Database of Expressed Sequence Tags)
- ??2011/3/17
- 84? cDNA libraries
Sequence set | Number | Min. length (bp) | Max. length (bp) | Avg. length (bp)
ESTs of Litopenaeus vannamei | 161,241 | 19 | 2,143 | 494
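The summary row above (number of sequences, minimum, maximum and average length) can be recomputed from any FASTA file. The following is a minimal sketch under stated assumptions: the function names `read_fasta_lengths` and `summarize` are hypothetical, and the three-record demo input is made up for illustration rather than taken from dbEST.

```python
from io import StringIO


def read_fasta_lengths(handle):
    """Yield the sequence length of each record in a FASTA stream."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def summarize(lengths):
    """Return number, min, max and rounded average of the lengths."""
    lengths = list(lengths)
    return {
        "number": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "avg": round(sum(lengths) / len(lengths)),
    }


# Made-up demo records (12, 5 and 10 bases long).
demo = StringIO(">est1\nACGTACGT\nACGT\n>est2\nACGTA\n>est3\nACG\nTACGTAC\n")
stats = summarize(read_fasta_lengths(demo))
print(stats)
```

For a real file, the same two functions can be applied to an opened file handle instead of the `StringIO` demo.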
16????--????
- ????????
- ????JGI
- ??2011/4/13
- ??FilteredModelsv1.1.na.fasta.gz
Sequence set | Number | Min. length (bp) | Max. length (bp) | Avg. length (bp)
Genes of Daphnia pulex | 30,907 | 150 | 24,144 | 1,061
17????--????
- ??????
- ????FlyBase
- ??2011/6/29
- ??dmel-all-translation-r5.38
Sequence set | Number | Min. length (aa) | Max. length (aa) | Avg. length (aa)
Proteins of Drosophila melanogaster | 23,711 | 11 | 22,971 | 632
18????--???????
19????--???????
- ??
- MIRA 3.2.1
- ??Sanger?454?Solexa
- ?high confidence region (HCR)???,??????HCR????????
??,??????,????????????,?????????
20????--???? (??)
Singleton I
21????--???? (??)
????? (??????)
No Hit
??ESTs
alignment length > 50 aa
MIRA, de novo assembly
Contig II
Singleton II
22????--???? (DE NOVO ASSEMBLY)
Singleton II
Singleton III
Contig III
23????--????
- ????
- Contig I--????????????????????????
- Contig II--???????????????????????
- Contig III--de novo assembly????
- Singleton III--????? ESTs ?????contig
- ???????????
- blastx ? NCBI nr (E-value < 10^-5)
- rpsblast ? Pfam (E-value < 10^-3)
- ??????????,??? E-value ???????????
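The E-value cut-offs above can be applied to search results with a few lines of code. This is a sketch only, assuming the BLAST results were written in the standard 12-column tabular format (query ID in column 1, subject ID in column 2, E-value in column 11). Keeping the single lowest-E-value hit per query is also an assumption, and `best_hits` and the demo rows are hypothetical.

```python
def best_hits(lines, max_evalue):
    """Keep, per query, the hit with the lowest E-value below max_evalue."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue >= max_evalue:
            continue  # the threshold is strict: E-value < max_evalue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def row(query, subject, evalue):
    """Build a made-up 12-column tabular row for the demo."""
    return "\t".join([query, subject, "80.0", "120", "5", "0",
                      "1", "360", "1", "120", evalue, "150"])


demo = [
    row("contig1", "subjB", "3e-08"),
    row("contig1", "subjA", "1e-40"),
    row("contig2", "subjC", "0.002"),   # fails the 1e-5 cut-off
    row("contig3", "subjD", "1e-5"),    # equal to the cut-off, so rejected
]
annotation = best_hits(demo, max_evalue=1e-5)
print(annotation)
```

The same function covers the rpsblast step by passing `max_evalue=1e-3`.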
24????--????
- Gene Ontology
- ????????
- ?????????????????????
- cellular component (CC)??????????????
- molecular function (MF)?????????
- biological process (BP)???????????
- ??pfam2go???,???????GO??
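A pfam2go lookup can be sketched as follows. The parser assumes the line layout `Pfam:<accession> <name> > GO:<term name> ; GO:<id>` with `!` marking comment lines. That layout, the function name `parse_pfam2go`, and the demo lines are illustrative assumptions and should be checked against the actual mapping file used.

```python
from collections import defaultdict


def parse_pfam2go(lines):
    """Map each Pfam accession to the set of GO IDs listed for it."""
    mapping = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # header and comment lines start with "!"
        left, right = line.split(" > ", 1)
        pfam_acc = left.split()[0].split(":", 1)[1]   # "Pfam:PF00001" -> "PF00001"
        go_id = right.rsplit(";", 1)[1].strip()       # text after the last ";"
        mapping[pfam_acc].add(go_id)
    return mapping


# Illustrative lines in the assumed layout.
demo = [
    "!version date: illustrative",
    "Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930",
    "Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor signaling pathway ; GO:0007186",
    "Pfam:PF00002 7tm_2 > GO:membrane ; GO:0016020",
]
pfam_to_go = parse_pfam2go(demo)
print(sorted(pfam_to_go["PF00001"]))
```

With such a dictionary, every contig annotated with a Pfam domain inherits the GO terms listed for that domain.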
25????--?????????
Library ID | Library description | Number of ESTs
Lib.22684 | Litopenaeus vannamei eyestalk cDNA library | 29,575
Lib.22686 | Litopenaeus vannamei hemocyte cDNA library | 27,369
Lib.22685 | Litopenaeus vannamei gills cDNA library | 24,296
Lib.22688 | Litopenaeus vannamei lymphoid organ cDNA library | 24,214
Lib.22687 | Litopenaeus vannamei hepatopancreas cDNA library | 22,272
Lib.22689 | Litopenaeus vannamei nerve cord cDNA library | 20,179
Total | | 147,905
- ?????????????????????
- ??????20,000?
- ??EST ?????????,??????????
26????--?????????
27????--???????????
- Digital Differential Display (DDD)
- -- ??????????????,?EST??????????
- ????contigs vs. ?????????contigs
- contig?EST? ? TPM??
- Fisher's exact test??,?? p-value < 10^-3 ?contigs
- ??????????????????????contigs
28????--???????????
 | Selected library | Other libraries | Total
TPM value of selected contigs | A | B | A+B
TPM value of non-selected contigs | C | D | C+D
Total | A+C | B+D | A+B+C+D (N)
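The 2x2 table above can be tested with a one-sided Fisher's exact test. The sketch below is an assumption-laden illustration, not the original analysis code: `tpm` and `fisher_right_tail` are hypothetical names, the test is implemented directly from the hypergeometric distribution with the standard library, and it expects integer cell counts (for example EST counts, or TPM values rounded to integers; how the original analysis handled this is not stated).

```python
from math import comb


def tpm(est_in_contig, est_in_library):
    """Scale a contig's EST count to transcripts per million."""
    return est_in_contig * 1_000_000 / est_in_library


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test: P(X >= a) with all margins fixed.

    a, b, c, d are the integer cells of the 2x2 table in the order
    (selected contig, selected library), (selected contig, other libraries),
    (other contigs, selected library), (other contigs, other libraries).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    total = comb(n, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / total


# Small made-up table: 3 of 4 "selected" counts fall in the selected library.
p = fisher_right_tail(3, 1, 1, 3)
print(round(p, 6))
print(tpm(5, 1_000_000))
```

A contig would then be called library-enriched when `p` falls below the 10^-3 threshold named on the previous slide.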
29????--???????????
- Gene Ontology Enrichment Analysis
30????--???????????
 | The selected library | Complement of the selected library | Total
The number of selected GO term | a | b | a+b
The number of complement of the selected GO term | c | d | c+d
Total number | a+c | b+d | a+b+c+d (n)
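If the enrichment of a GO term is scored with a one-sided Fisher's exact test, as in the DDD step (an assumption; the slide does not name the test), the p-value follows from the hypergeometric distribution over the table cells a, b, c, d with n = a + b + c + d:

```latex
p \;=\; \sum_{k=a}^{\min(a+b,\;a+c)}
        \frac{\dbinom{a+b}{k}\,\dbinom{c+d}{\,a+c-k\,}}{\dbinom{n}{a+c}},
\qquad n = a+b+c+d .
```

Here k runs over all tables at least as extreme as the observed one while the row and column totals stay fixed.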
31????--???????????
- Venn diagram and KEGG pathway enrichment analysis
- Venn diagram
- ???????????????????
- ????libraries???contigs????
- KEGG PATHWAY database
- ???????????????
32????--???????????
33????--???????????
 | The selected sets | Complement of the selected sets | Total
The number of selected pathway | α | β | α+β
The number of complement of the selected pathway | γ | δ | γ+δ
Total number | α+γ | β+δ | α+β+γ+δ (k)
34?????
35??--??EST???????????
- ?????????Contig I?Contig II ???1/3?ESTs
- ?de novo ???Contig III ???????ESTs
36??--???????
Sequence set | Number | Avg. length | Min. length | Max. length
ESTs | 161,241 | 494 | 19 | 2,143
Contig I | 3,361 | 839 | 83 | 2,789
Contig II | 920 | 712 | 99 | 2,199
Contig III | 12,605 | 635 | 80 | 4,501
Singleton III | 20,515 | 400 | 19 | 2,143
- ?de novo ???contig ????? (Contig III )
- Contig III ????????4000bp ???
- ??????ESTs ????contig
37??--?????????????
- Contig???300-900 bp ?800 bp??
- Singleton???200-800 bp ?200 bp??
38??
- Contig I?46,471?ESTs??,Contig II?7,501?ESTs??
- ???????????,???????????
- ?????????EST??????
- Contig III?86,754?ESTs??
- ???????ESTs ???????????????
- ???????????
39??
- ???????????????
- Contig???ESTs ????
- Singleton ???????
?? ?? ????
?? | ??? | O'Leary et al. | Clavero-Sales et al. | Tassanakajon et al. | Leu et al. | Jianhai et al.
all EST | 161,241 | 13,656 | 601 | 10,100 | 15,981 | 10,446
EST in contig | 140,726 (87%) | 8,171 (59.8%) | 404 (67%) | 6,172 (61%) | 7,723 (48%) | 8,725 (83.5%)
singleton | 20,515 (12.7%) | 5,484 (40.2%) | 197 (33%) | 3,928 (39%) | 8,258 (52%) | 1,721 (16.5%)
40??--??????
- blastx nr database ? 11,565? 30.92%
- rpsblast Pfam database ? 15,398? 41.17%
???32 ??????40 ????28
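The two annotation rates can be checked against the sequence counts reported elsewhere in the deck (3,361 + 920 + 12,605 contigs and 20,515 singletons). The snippet below is just that arithmetic, with a hypothetical helper `pct`:

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


contigs = 3361 + 920 + 12605      # Contig I + Contig II + Contig III
singletons = 20515                # Singleton III
total = contigs + singletons      # all non-redundant sequences

nr_rate = pct(11565, total)       # sequences with a blastx hit in nr
pfam_rate = pct(15398, total)     # sequences with an rpsblast hit in Pfam
print(total, nr_rate, pfam_rate)
```

Both rates use the 37,401 non-redundant sequences as the denominator, not the 161,241 raw ESTs.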
41??
- nr database?11,565?,?78???Pfam database???????
- ??Pfam??????
- nr Pfam ? 48
42??--???ESTS????
43??--???ESTS????
Tissue | eyestalk | gills | hemocyte | hepatopancreas | lymphoid organ | nerve cord
EST | 29575 | 24296 | 27369 | 22272 | 24214 | 20179
annotated ESTs (%) | 20712 (70.03) | 15166 (62.42) | 17183 (62.78) | 16641 (74.72) | 16006 (66.10) | 12374 (61.32)
contigs | 3743 | 2905 | 2973 | 2825 | 2530 | 2836
singleton | 1567 | 964 | 1025 | 1456 | 785 | 1126
- ???? -- EST????
- ????? ????EST??????
44??--????????? (GO - CC)
45??--????????? (GO - MF)
46??--????????? (GO - BP)
47??
- ???????????
- ?????????(GO term) ???
- ???????level 1???????
48??--???????????
- ?DDD???,??????????????
- ?? - 451? contigs
- ? - 530? contigs
- ??? - 572? contigs
- ??? - 732? contigs
- ????- 590? contigs
- ??? - 410? contigs
49??-- GO ENRICHMENT ANALYSIS (MF)
Tissue | GO term | Proportion (%) | P-value
eyestalk | structural constituent of cuticle | 13.3 | 3.14E-34
eyestalk | pattern binding | 3.33 | 2.09E-08
eyestalk | carbohydrate binding | 3.33 | 2.84E-08
eyestalk | structural constituent of ribosome | 6.21 | 4.49E-08
gills | structural constituent of cuticle | 4.72 | 1.19E-05
gills | structural constituent of ribosome | 4.34 | 0.000247
hemocyte | structural constituent of cuticle | 4.9 | 1.77E-06
hepatopancreas | hydrolase activity | 12.70 | 7.52E-33
hepatopancreas | pattern binding | 4.64 | 2.68E-22
hepatopancreas | carbohydrate binding | 4.64 | 5.82E-22
hepatopancreas | substrate-specific transporter activity | 3.14 | 4.69E-07
lymphoid organ | structural constituent of ribosome | 4.92 | 3.17E-06
lymphoid organ | pattern binding | 2.37 | 3.50E-06
lymphoid organ | carbohydrate binding | 2.37 | 4.57E-06
lymphoid organ | structural constituent of cuticle | 4.4 | 2.56E-05
lymphoid organ | ion binding | 3.56 | 0.000697
nerve cord | structural constituent of cuticle | 4.39 | 0.000456
50??
- ??????????? ? ?????????????
- ????????? ? ????????????
- structural constituent of cuticle
?????????(??????????) - ????????cuticle protein???
- ???????cuticle proteins????????cuticle
- ???????????
51?? -- VENN DIAGRAM
- ???? ? 3,742 ?????
- ????? ? 2,825 ?????
- ?Venn Diagram ????? 1,002 ?????
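The Venn split amounts to plain set operations on contig identifiers. A minimal sketch with made-up IDs follows; the final lines only re-derive the "only" counts from the totals quoted above.

```python
# Made-up contig IDs standing in for the two tissue-specific contig lists.
eyestalk = {"c1", "c2", "c3", "c4"}
hepatopancreas = {"c3", "c4", "c5"}

intersection = eyestalk & hepatopancreas          # present in both tissues
eyestalk_only = eyestalk - hepatopancreas         # eyestalk but not hepatopancreas
hepatopancreas_only = hepatopancreas - eyestalk   # hepatopancreas but not eyestalk
print(sorted(intersection), sorted(eyestalk_only), sorted(hepatopancreas_only))

# Same bookkeeping with the counts from the slide.
eyestalk_only_count = 3742 - 1002
hepatopancreas_only_count = 2825 - 1002
print(eyestalk_only_count, hepatopancreas_only_count)
```

The two derived counts agree with the contig row of the KEGG pathway table (2740 and 1823).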
52?? KEGG PATHWAY
Set | eyestalk_only | hepatopancreas_only | intersection
contigs | 2740 | 1823 | 1002
E-value < 10^-3 | 1310 (48%) | 1146 (63%) | 759 (76%)
associated pathway | 247 | 238 | 205
- Intersection ???????????
- Eyestalk_only ???????????
53?? KEGG PATHWAY ENRICHMENT ANALYSIS
eyestalk_only | P-value | hepatopancreas_only | P-value | intersection | P-value
Tight junction | 2.41E-14 | Metabolic pathways | 1.13E-57 | Ribosome | 9.91E-75
Regulation of actin cytoskeleton | 4.88E-13 | Betalain biosynthesis | 2.3E-44 | Metabolic pathways | 8.16E-45
Focal adhesion | 5.83E-13 | Isoquinoline alkaloid biosynthesis | 2.3E-44 | Oxidative phosphorylation | 9.92E-38
Glycosphingolipid biosynthesis - ganglio series | 5.67E-11 | Riboflavin metabolism | 4.2E-43 | Parkinson's disease | 3.15E-37
Glycosphingolipid biosynthesis - globo series | 1.66E-10 | Melanogenesis | 4.98E-36 | Huntington's disease | 3.85E-33
GnRH signaling pathway | 2.58E-10 | Tyrosine metabolism | 3.99E-35 | Alzheimer's disease | 1.63E-30
Leukocyte transendothelial migration | 7.57E-10 | Biosynthesis of secondary metabolites | 1.59E-34 | RNA transport | 3.79E-13
Glycosaminoglycan degradation | 3.02E-09 | Protein digestion and absorption | 3.24E-20 | Cardiac muscle contraction | 2.91E-12
Dilated cardiomyopathy | 4.19E-09 | Tuberculosis | 1.72E-19 | Proteasome | 2.75E-11
Viral myocarditis | 1.21E-08 | Pancreatic secretion | 7.25E-18 | Carbon fixation in photosynthetic organisms | 1.09E-10
54??
- Intersection Ribosome
- ??????142?contigs,??????????
- Hepatopancreas_only Betalain biosynthesis
- betalain?????????????????
- ????????????(hemocyanin)
- ??????????????
- eyestalk_only Tight junction
- ??????????actin myosin
- actin?myosin???????????
- ???????????????(membrane vesicle)?????(cell crawling)
- ?????????
55??
- ??GO ?KEGG PATHWAY ,??????????
- ?????????????
- ????????????
- ???????
- ?????????
- Functional enrichment???????????????
- ????
- ??????
- ??????
- ????????
- ?????????????
- ????,?????????????,?????????functional
enrichment?????,???????????????
56??
57??
- ?????????????--?????,???????,??????(Mapping)?de novo assembly?????,?????????
- ?? 20,515?singletons?16,886?contigs,?37,401??????
- ??nr ? Pfam ???????,?48 ????????????
- ???????????????,????????????
- ??????????????????????,?????????????
- ?????????????actin?myosin?????
- ??????????,??????? (hemocyanin)?
58Thank you for your attention
59??
- Contig III???????4,000 bps?????
- nr???,??????neuroblast differentiation-associated protein AHNAK
- ORFinder
- 2????(Frame)????,???????3????ORF??
- ???ORF?2,574 bps,????857 aa
- ???ORF????neuroblast differentiation-associated protein AHNAK??????,?2??????????ORF,?????neuroblast differentiation-associated protein AHNAK??????,?????????????neuroblast differentiation???
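An ORF scan like the one described above can be sketched in plain Python. This is not the ORFinder tool itself: `longest_orf` is a hypothetical helper that looks for the longest ATG-to-stop stretch in the six reading frames, and the demo sequence is made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Return (strand, frame, start, end, length_nt) of the longest ORF.

    An ORF here runs from an ATG to the first in-frame stop codon,
    stop included; coordinates refer to the scanned strand.
    """
    best = None
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    length = i + 3 - start
                    if best is None or length > best[4]:
                        best = (strand, frame, start, i + 3, length)
                    start = None
    return best


# Made-up sequence: 2 leading bases, ATG, four GCT codons, TAA, 1 trailing base.
demo = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"
orf = longest_orf(demo)
protein_length = orf[4] // 3 - 1   # codons minus the stop codon
print(orf, protein_length)

# Same conversion for the ORF length quoted on the slide.
print(2574 // 3 - 1)
```

The last line reproduces the slide's figures: an ORF of 2,574 bp, stop codon included, encodes 857 amino acids.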
61Introduction
- Assembly
- reconstruct contigs from a set of partially overlapping sequences
(Figure: overlapping sequences assembled into contigs)
62Introduction
- map on genome
- align ESTs to reference genome
(Figure: ESTs aligned to a reference sequence)
63Introduction
(Figure: ESTs localized on a reference sequence)
- localization of ESTs
- cluster ESTs of the same gene first, then construct contigs
64Introduction
- de novo assembly
- used for non-model organisms with no complete genome
- uses overlapping and consensus regions of sequences for assembly
65Introduction
- Daphnia pulex (water fleas)
- a crustacean arthropod
- lives in ponds and lakes
- draft genome was published in 2011
(Colbourne, et al. 2011)
66purpose
- Provide more gene information for L. vannamei by integrating functional and comparative genomics.
67EST of the white shrimp from NCBI
Daphnia pulex as reference genome
Annotate the ESTs of the white shrimp
Comparative analysis
68Materials and Methods
69Materials and Methods
??????????????????????
http://genome.jgi-psf.org/Dappu1/Dappu1.home.html
70Analysis pipeline
blast
L.v est
Water fleas genome (transcript)
??best hit
hit????gene?est
???????gene?est
cluster together
de novo assembly
assembly
contig
no assembly
contig
71Analysis pipeline
contig
no assembly
domain mapping
similarity search
Gene Ontology
UniProt
KEGG
comparative analysis
physiology comparative
Drosophila comparative
other penaeid shrimps
72Results
73Results
Statistic | L. vannamei (ESTs) | D. pulex (transcripts)
sequence number | 161241 | 30907
average length | 494 | 1061
maximum length | 2143 | 24144
minimum length | 19 | 150
GC ratio | 0.43 | 0.46
74Results
75Discussions
- repeat regions are included in EST sequences
- ESTs may be assembled at the wrong site
(Figure: Assembly)
76Discussions
- diverse quantity
- normalization or subtraction
- genes with low expression rate
- no sequence data
- remove highly abundant genes
- e.g., mitochondrial sequences
77References
- Food and Agriculture Organization of the United Nations
- ??? (2006) ?????????????????????????????????????????????
- ??? (2007) ????? (SPF) ???????????????????????
- ??????????
- Adams MD, Kelley JM, Gocayne JD, Dubnick M, Polymeropoulos MH, Xiao H, Merril CR, Wu A, Olde B et al. (1991) Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252: 1651-1656
- Colbourne JK et al. (2011) The Ecoresponsive Genome of Daphnia pulex. Science 331: 555-561
- Leu JH, Chen SH, Wang YB, Chen YC, Su SY, Lin CY, Ho JM, Lo CF (2010) A review of the major penaeid shrimp EST studies and the construction of a shrimp transcriptome database based on the ESTs from four penaeid shrimp. Marine Biotechnology (e-published first, 17 April 2010, doi:10.1007/s10126-010-9286-y)
- Montagné N, Desdevises Y, Soyez D, Toullec JY (2010) Molecular evolution of the crustacean hyperglycemic hormone family in ecdysozoans. BMC Evol Biol 10: 62. doi:10.1186/1471-2148-10-62
- Nagaraj SH, Gasser RB, Ranganathan A (2006) A hitchhiker's guide to expressed sequence tag (EST) analysis. Briefings in Bioinformatics 8: 6-21
- O'Leary NA, Trent HF III, Robalino J, Peck MET, McKillen DJ, Gross PS et al. (2006) Analysis of multiple tissue-specific cDNA libraries from the Pacific whiteleg shrimp, Litopenaeus vannamei. Integrative and Comparative Biology 46: 931-939
79Integrative Functional and Comparative Genomics for Pacific White Shrimp, Litopenaeus vannamei
- ???? ??? ??
- ??? ???
- 2011/04/14
80Introduction
1968
1987
1988
1998
2003
(?, 2006) (?, 2007)
81Introduction
????FAO
82Introduction
- ?????????
- ???
- ????????
- ??????
- ????
??????????????
83Introduction
- An EST (expressed sequence tag) is a sequence segment of a cDNA clone
- partial sequence
- provides information about the transcript
- costs less than sequencing the full length of a cDNA
(Adams, et al. 1991)
84Introduction
organism
organ
tissue
or
or
mRNA
reverse transcriptase
cDNA
double strand cDNA
cDNA library
85Select a cDNA clone randomly
double strand cDNA
5' end sequencing
3' end sequencing
or
ESTs
86Introduction
- ESTs can
- discover novel genes
- profile gene expression patterns
- facilitate proteome analysis
- guide single nucleotide polymorphism characterization
- support genome annotation
(Nagaraj, et al. 2006)
88Introduction
- has reference genome
- align ESTs to reference genome
(Figure: ESTs aligned to a reference sequence)
89Introduction
(Figure: ESTs localized on a reference sequence)
- localization of ESTs
- cluster ESTs of the same gene first, then construct contigs
90Introduction
- de novo assembly
- uses overlapping and consensus regions of sequences for assembly
91Introduction
- de novo assembly
- used for non-model organisms with no complete genome
93Introduction
(Montagné, et al. 2010)
96Materials and Methods
97Materials and Methods
- Litopenaeus vannamei
- Eukaryota
- Metazoa
- Arthropoda
- Crustacea
- Malacostraca
- Eumalacostraca
- Eucarida
- Decapoda
- Dendrobranchiata
- Penaeoidea
- Penaeidae
- Litopenaeus
??????????????????????
98Materials and Methods
- Daphnia pulex
- Eukaryota
- Metazoa
- Arthropoda
- Crustacea
- Branchiopoda
- Diplostraca
- Cladocera
- Anomopoda
- Daphniidae
- Daphnia
http://genome.jgi-psf.org/Dappu1/Dappu1.home.html
100Analysis pipeline
contig
no assembly
domain mapping
similarity search
Gene Ontology
UniProt
KEGG
ppi
comparative analysis
physiology comparative
Drosophila comparative
other penaeid shrimps
101Results
103Anticipated result
eyestalk
ventral nerve cord
104Anticipated result
- white shrimp vs. other penaeid shrimp
105Anticipated result
- white shrimp vs. Drosophila
1084? 5? 6? 7? 8?
EST clustering
blast to database
data analysis
thesis
110??--???????????
Tissue | GO term | Proportion (%) | P-value
eyestalk | non-membrane-bounded organelle | 6.65 | 8.20E-09
eyestalk | ribonucleoprotein complex | 6.2 | 1.65E-07
eyestalk | intracellular organelle | 7.98 | 1.77E-07
eyestalk | intracellular | 5.32 | 5.09E-06
eyestalk | intracellular part | 10.42 | 0.000104
eyestalk | protein-DNA complex | 1.33 | 0.000522
gills | non-membrane-bounded organelle | 4.72 | 8.45E-05
gills | ribonucleoprotein complex | 4.34 | 0.000745
hemocyte | - | - | -
hepatopancreas | membrane part | 7.38 | 8.32E-07
lymphoid organ | non-membrane-bounded organelle | 5.25 | 8.89E-07
lymphoid organ | intracellular | 4.75 | 7.48E-06
lymphoid organ | ribonucleoprotein complex | 4.92 | 1.64E-05
lymphoid organ | intracellular organelle | 5.93 | 0.000221
lymphoid organ | intracellular part | 9.32 | 0.000672
nerve cord | - | - | -
111
Tissue | GO term | Proportion (%) | P-value
eyestalk | macromolecule metabolic process | 13.3 | 5.01E-09
eyestalk | primary metabolic process | 14.19 | 2.75E-08
eyestalk | biosynthetic process | 7.54 | 2.39E-05
eyestalk | multicellular organismal reproductive process | 1.33 | 0.000178
gills | - | - | -
hemocyte | - | - | -
hepatopancreas | primary metabolic process | 18.85 | 4.88E-28
hepatopancreas | macromolecule metabolic process | 16.67 | 2.67E-25
hepatopancreas | nitrogen compound metabolic process | 5.19 | 7.11E-07
hepatopancreas | cellular process involved in reproduction | 0.82 | 2.62E-06
lymphoid organ | biosynthetic process | 7.29 | 5.97E-06
lymphoid organ | macromolecule metabolic process | 9.83 | 0.000197
lymphoid organ | primary metabolic process | 10.85 | 0.000265
nerve cord | - | - | -