Title: Application of Bioinformatics in Cancer Research
1??????????????
??????????????????
?????????/????? ????? ?????
2????????
- A. ???????????????,????????????????,??????????????
????????????,???????????????
B. ???,??????????????,??????????,????????????????
3??????????
- ???(???/???)?????
- ????
- ????
- Internet?????????
4?????????
????
???????
Blast
Blast
5?????????????
How is a tumor generated?
61996, 97????????
??????????????,????? VS IBM??
7What Can Bioinformatics Do in Cancer Research? How to Do It?
8???????????
- ?????
- ???????????
- ?????????????
- ??????????????
- ????????
9?????(??Linux???MySQL?????)
- 1) Reference, LocusLink, UniGene, Mapview???GenBank????
- 2) Gene Ontology ???, ?????,????,?????????????????????????????
10?????(?)
- 3) UCSC Human Genome ??? (Golden Path)?
- 4) ????Blast??? (nt, nr, human_est, htg, swissprot, yeast, mouse_est)?
- 5) ????????????????(????)
11Computational Speed Doubles Every 18 Months
DNA Data Quantity Doubles Every 14 Months
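The two doubling times above imply a widening gap between data volume and computing speed. A minimal arithmetic sketch, assuming both quantities grow as pure exponentials with the stated doubling times (the function names are illustrative, not from the slides):

```python
def growth_factor(months: float, doubling_time: float) -> float:
    """Fold increase after `months` for a quantity with a fixed doubling time."""
    return 2.0 ** (months / doubling_time)


def gap_doubling_time(fast: float, slow: float) -> float:
    """Months for the ratio (faster-growing / slower-growing) to double.

    `fast` and `slow` are the doubling times in months, with fast < slow.
    """
    return 1.0 / (1.0 / fast - 1.0 / slow)


COMPUTE_DOUBLING = 18.0  # months, from the slide
DATA_DOUBLING = 14.0     # months, from the slide

for years in (1, 5, 10):
    months = 12 * years
    data = growth_factor(months, DATA_DOUBLING)
    compute = growth_factor(months, COMPUTE_DOUBLING)
    print(f"{years:>2} y: data x{data:8.1f}, compute x{compute:8.1f}, "
          f"data/compute x{data / compute:5.2f}")

print("data/compute ratio doubles every",
      round(gap_doubling_time(DATA_DOUBLING, COMPUTE_DOUBLING), 1), "months")
```

Under these assumptions the data-to-compute ratio itself doubles roughly every five years.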
12???????????
- ?????
- ???????????
- ?????????????
- ??????????????
- ????????
13???????????
- cDNA, mRNA?EST????????????
- ??????????
- ???? (Gene-gene interaction)????
????????????,?????????????????
14cDNA, mRNA?EST????????????
- a) ?????,??????,????????,?????????,?Fasta?????
- b) ??Reference mRNA???Unigene??,??????,????????
- c) ?????????????????,???????????,???????
15cDNA, mRNA?EST???????????? (?)
- d) ?EST????,??cDNA???????????????
- e) ?????SNP??
- f) ???????? (PDF??)
16Primary Analysis of Lung Cancer SSH cDNA Library
????
17Definition of EST
????
EST (Expressed Sequence Tag) ?cDNA
???????????????,?????500bp?
- ESTs offer a rapid and inexpensive route to gene discovery, reveal expression and regulation data (Vasmatzis et al., 1998), highlight gene sequence diversity and splicing (Wolfsberg and Landsman, 1997), and may identify more than half of known human genes (Hillier et al., 1996).
18Procedures
Sequencing Result
Mask Vector and Format
Blast to Reference mRNA DB
Reference DB No-hit ESTs
Blast to Human EST DB
Human EST DB No-hit ESTs
Screened Known Genes
Cluster ESTs by Gene
Blast to Human Genome
Map to Human Genome
New Genes
Garbage ESTs
Gene Expression Map
Point Mutation/SNP Analysis
In silico EST Elongation
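One possible reading of the flowchart above is a cascade: each EST is searched against the reference mRNA database first, and only ESTs without a hit move on to the next database. The sketch below assumes that reading. The function name, the category labels, and the idea of passing pre-computed hit sets are all illustrative assumptions; running BLAST itself is not shown.

```python
from typing import Dict, Iterable, Set


def triage_ests(est_ids: Iterable[str],
                mrna_hits: Set[str],
                est_db_hits: Set[str],
                genome_hits: Set[str]) -> Dict[str, str]:
    """Assign each EST to one category by cascading through the hit sets.

    Each set holds the IDs of ESTs that had an acceptable BLAST hit
    against the corresponding database (hypothetical inputs).
    """
    labels: Dict[str, str] = {}
    for est in est_ids:
        if est in mrna_hits:
            labels[est] = "known_gene"          # hit in reference mRNA DB
        elif est in est_db_hits:
            labels[est] = "known_est"           # hit only in human EST DB
        elif est in genome_hits:
            labels[est] = "new_gene_candidate"  # maps to genome only
        else:
            labels[est] = "garbage"             # no hit anywhere
    return labels


if __name__ == "__main__":
    # Toy IDs, made up for illustration.
    result = triage_ests(
        ["est1", "est2", "est3", "est4"],
        mrna_hits={"est1"},
        est_db_hits={"est1", "est2"},
        genome_hits={"est1", "est2", "est3"},
    )
    for est, label in result.items():
        print(est, label)
```

Known-gene ESTs would then feed the clustering and expression-map steps, while genome-only ESTs are the candidates for in silico elongation.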
19Original sequence Database
??????
20??????????
Cluster Result Database
21???EST
New Gene (EST) Database
22Elongated EST
???EST??????
- >ID No2_rlcrt0-000159.fas Length 2540
- ......AGCGGGTCCCGCCTCCCAGCGACTCTCGGCAGTGCCGGAGTCGG
GTGGGTTGGCGGCTATAAAGCTGGTAGCGAAGGGGAGGCGCCGCGGACTG
TCCTAGGTACACTTTTCTCATAAAGTTTAGCCTACAGAAACTATCGCCAC
CCAAATTAAACATCACCCAAGCTAATATTCTTTCCTCCTTCTAAAGATGA
GCTAGCGAAACTTTTTATAGGTTGTCCCTTTAATGCAGCTTTTTAGAATA
AACATTTTTACATTTTTTCTTAAAAGAATTATTTTTTGAAGTCTGAGGAA
AAATCCGCTTGCCTAGTGAATTTGGCACACACAGAGTAACAACAAATCAA
ACTTTAAGCTAGCAACCAACACACAAAATAAGCATGCAAGGAATAGAATA
AGTTTTATATGGATAAGGTATTTTAGCCAACTCCACTTATAAGGTATTAC
AAAATCTCTATATNGTTTTGAAGCTATGTGTCGCAGTTTAAAGTTACTTT
TAACAATAATACGTATATTTACAATTGACTTAAAAAACTATTTTCAAGGA
AGTTAGAAACCTATGGCACACCAACGCATCTTCTGGAAAATGAAGACGAT
ACAATGTCATGTGGCAAGTTTCAATATATGAAGGACTAGACCAGTG....
..........
23Using Reference mRNA Database Blast Output to Search for Mutations/SNPs
24Mutations Found from Blast Output Analysis
???/SNP????
25Deletion (ClustalX 1.82)
???/SNP???? (?)
26???/SNP????
Insertion (ClustalW 1.82)
27???/SNP???? (?)
Here, "-" indicates an insertion
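Scanning a pairwise alignment for candidate mutations can be sketched as below. This is a minimal illustration, assuming the reference and EST rows of one alignment have already been extracted as two equal-length gapped strings. Parsing the Blast or Clustal output itself is not shown, and the function name and event labels are assumptions.

```python
from typing import List, Tuple

Event = Tuple[int, str, str, str]  # (reference position, kind, ref base, EST base)


def scan_alignment(ref_aln: str, est_aln: str, ref_start: int = 1) -> List[Event]:
    """List substitutions, insertions and deletions in a gapped alignment.

    A "-" in the reference row marks an insertion in the EST (reported at
    the last reference base before it); a "-" in the EST row marks a
    deletion. Columns containing the ambiguous base N are skipped.
    """
    if len(ref_aln) != len(est_aln):
        raise ValueError("aligned rows must have the same length")
    events: List[Event] = []
    ref_pos = ref_start - 1  # position of the last reference base consumed
    for r, q in zip(ref_aln.upper(), est_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r == q or "N" in (r, q):
            continue
        if r == "-":
            events.append((ref_pos, "insertion", "-", q))
        elif q == "-":
            events.append((ref_pos, "deletion", r, "-"))
        else:
            events.append((ref_pos, "substitution", r, q))
    return events


if __name__ == "__main__":
    # Toy alignment, made up for illustration.
    for event in scan_alignment("ACGT-ACGT", "ACCTTAC-T"):
        print(event)
```

A difference seen in a single EST may be a sequencing error, so in practice a candidate SNP would need support from several independent reads.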
29Further Analysis
30??????
Gene Expression Map of 6 SSH Libraries on the Human Genome (122X)
31Expression level of genes in SSH libraries
Different Colors correspond to different libraries
32Is there a LOH?
Expressed in two down-regulated libraries
33LOH map vs. SSH map
Lung Cancer Related LOH
Lung Cancer Related SSH
34??????????
- a) ?raw data???????,???threshold?(???????????,??????)
- b) ??R/S,SAS???????????????????????
- c) ??????????????,???????????????????????????array?????????,??????????(hierarchical, SOM and K-means clustering) ??Gene Ontology, BioCarta, KEGG???????pathway???
35Normalization
????
36PathWay Analysis
????
37????
Genome-wide Gene Expression Map and Analysis of
Non-Small Cell Lung Cancer Based on Microarray
38PNAS November 20, 2001 vol. 98 no. 24
39Original Array Data
- Chip: Human U95A oligonucleotide probe arrays (Affymetrix, Santa Clara, CA), 12,600 cDNA clones
- Sample
- Squamous cell lung carcinomas: 21
- Adenocarcinomas: 127
- Normal lung: 17
- Array data (normalized)
40Analysis Procedures
- Finding genes among the 12,600 cDNA clones
- Obtained 7,932 genes
- Flagging values lower than the threshold value
- About half of the values were kept
- Performing a t-test with SAS/R for each gene
- Hierarchical clustering
- Dividing genes into two groups: up-regulated and down-regulated
- Construction of the Gene Expression Map and Transcriptome Map
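The flag, test, and split steps of this procedure can be sketched in plain Python. The slides only say that a t-test was run in SAS/R. The sketch below swaps in Welch's t statistic (no p-value) so that it needs only the standard library. The threshold, the cutoff, the gene names, and the data layout are all hypothetical.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Optional, Sequence, Tuple


def flag_low(values: Sequence[float], threshold: float) -> List[Optional[float]]:
    """Replace values below the threshold with None (flagged)."""
    return [v if v >= threshold else None for v in values]


def welch_t(a: Sequence[Optional[float]], b: Sequence[Optional[float]]) -> Optional[float]:
    """Welch's t statistic on the unflagged values; None if it cannot be computed."""
    xs = [v for v in a if v is not None]
    ys = [v for v in b if v is not None]
    if len(xs) < 2 or len(ys) < 2:
        return None
    se = math.sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    if se == 0:
        return None
    return (mean(xs) - mean(ys)) / se


def split_by_direction(expr: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                       threshold: float,
                       t_cutoff: float) -> Tuple[List[str], List[str]]:
    """Return (up-regulated, down-regulated) gene lists, tumor vs. normal."""
    up, down = [], []
    for gene, (tumor, normal) in expr.items():
        t = welch_t(flag_low(tumor, threshold), flag_low(normal, threshold))
        if t is None:
            continue  # too few values survive flagging
        if t >= t_cutoff:
            up.append(gene)
        elif t <= -t_cutoff:
            down.append(gene)
    return up, down


if __name__ == "__main__":
    # Toy intensities, made up for illustration.
    toy = {
        "GENE_A": ([900, 950, 1000], [300, 320, 310]),
        "GENE_B": ([200, 210, 190], [600, 650, 620]),
        "GENE_C": ([400, 420, 380], [410, 390, 400]),
        "GENE_D": ([50, 60, 500], [300, 310, 305]),
    }
    print(split_by_direction(toy, threshold=100, t_cutoff=3.0))
```

A real analysis would convert the statistic to a p-value and correct for testing thousands of genes at once.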
41Clustering Result
42??????? (3????)
?????????(3????)
43Analysis Procedures (Cont.)
- High-resolution detection of differentially expressed chromosomal regions in NSCLC was obtained by using the moving-median method
- Screening of important NSCLC-related genes
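A moving median smooths per-gene expression ratios along a chromosome so that isolated outliers vanish while longer runs of aberrant expression remain. The sketch below assumes the genes are already sorted by chromosomal position and that each value is a log expression ratio. The window size and the edge handling are illustrative choices, not taken from the slides.

```python
from statistics import median
from typing import List, Sequence


def moving_median(values: Sequence[float], window: int) -> List[float]:
    """Median of each centered window; windows are truncated at the ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed


if __name__ == "__main__":
    # Toy log ratios along one chromosome arm: a single spike, then a
    # run of down-regulated genes (made up for illustration).
    ratios = [0, 0, 5, 0, 0, -2, -2, -2]
    print(moving_median(ratios, window=3))
```

In the toy data the lone spike is removed, while the run of three down-regulated genes survives as a candidate region.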
44Results
???????????????
- 75% (24 of 32) of our results were consistent with previous studies. The corresponding regions in other reports, which were normally larger, were narrowed down, and many specific genes in these regions were identified.
- 4 new aberrant regions in squamous carcinoma (2q31-32, 12q23-24, 14q22-q24 and Xp11.4-p11.23) were discovered.
45???? (Gene-gene Interaction) ????
- a) ?????,? GO??????????,?? extracellular???????
- b) ??GO, BioCarta?Kegg???????????????????,?????
- c) ?????????,????????????,?????????????
- d) ????????(????
46????
Gene Ontology Pathway Network
??DAG (???), ???????
47????
BioCarta Cell Cycle Pathway
?????????
48????
KEGG ???? Pathway
?????????
49?????????????
????
- GO:0003673 -> biological_process -> physiological processes -> cell growth and/or maintenance (D10S170) -> cell proliferation (FTH1, AKR1C3, C20orf1) -> cell cycle (AHR, BUB1, STAG2) -> DNA replication and chromosome cycle -> chromosome segregation (STAG2) -> mitotic chromosome segregation
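A root-to-term path like the one above can be recovered by walking parent links in the ontology graph. The sketch below is a minimal illustration: the parent table is hand-copied from the single path on this slide rather than loaded from a Gene Ontology release, and the function name is an assumption. Because GO is a DAG, a term with several parents would simply yield several paths.

```python
from typing import Dict, List


def paths_to_root(term: str, parents: Dict[str, List[str]]) -> List[List[str]]:
    """Return every path from a root down to `term`, following parent links."""
    term_parents = parents.get(term, [])
    if not term_parents:
        return [[term]]  # no parents: this term is a root
    paths = []
    for parent in term_parents:
        for path in paths_to_root(parent, parents):
            paths.append(path + [term])
    return paths


# Child -> parents, copied from the path shown on the slide.
PARENTS: Dict[str, List[str]] = {
    "biological_process": ["GO:0003673"],
    "physiological processes": ["biological_process"],
    "cell growth and/or maintenance": ["physiological processes"],
    "cell proliferation": ["cell growth and/or maintenance"],
    "cell cycle": ["cell proliferation"],
    "DNA replication and chromosome cycle": ["cell cycle"],
    "chromosome segregation": ["DNA replication and chromosome cycle"],
    "mitotic chromosome segregation": ["chromosome segregation"],
}

if __name__ == "__main__":
    for path in paths_to_root("mitotic chromosome segregation", PARENTS):
        print(" -> ".join(path))
```

Annotating each node with the genes assigned to it, as the slide does for STAG2 and others, would then be a lookup per term on the path.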
50???????????????????,?????????????
????
- 26 N 15 219 T 78 429 nucleotide binding
- 32 N 28 396 T 120 728 nucleic acid binding
- 2 N 1 31 T 21 91 structural constituent of ribosome
- 47 N 0 0 T 1 7 apoptosis inhibitor activity
- 38 N 0 0 T 10 50 transcription factor activity
- 37 N 1 13 T 1 13 enzyme inhibitor activity
- 46 N 6 46 T 6 46 metal ion binding
51??
- ???????,??????????????????????
- 1)??????????????????????,?????,????????
- 2)????????????????????
- 3)???????????????????
- 4)????????????????
52???????????
- ?????
- ???????????
- ?????????????
- ??????????????
- ????????
53?????????????
- 1) ??mRNA/cDNA???siRNA??????siRNA???????,?????Human Genome???????,????????????????
- 2) ??????????,?????? cDNA?????cDNA??(image clone)???
- 3) DNA???????cDNA????????????????????????
54?????????????(?)
- 4) ????????????????? (????)???,?????????,?Gene Ontology??????????????domain??????????
- 5) ????accession number,??????????????????????,???????????
55?????????????(?)
- 6) ???SAGEmap??,??????????NCBI SAGEmap????????????
- 7) DNA/RNA ??ORF?????,???????,cDNA?EST???ORF?????????
- 8) ??EST???????cDNA???
56?????????????(?)
- 9) ???????????,????????????,????????????????????overlap??????,?????????????
- 10) ??Blast??????
- 11) ???????contig????????????????,????????
57????
??Gene Ontology????EMBOSS?????SSH?????????(??2???
?)
58Screened siRNA target sites for X1blue
????
- >ID X1_blue Nonsense 0 Length 21 GC 38% A 8 G 5 C 3 T 5
- AAAGATGTGGAAAGTTACCTC
- siRNA
- Sense: AGAUGUGGAAAGUUACCUC UU
- Antisense: GAGGUAACUUUCCACAUCU UU
- Negative Control Sense: GGAUGUACGGCAAAUUCUAUU
- Negative Control Antisense: UAGAAUUUGCCGUACAUCCUU
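The sense and antisense strands on this slide can be derived mechanically from the 21-nt target site. The sketch below assumes the pattern visible in the example: a target of the form AA followed by 19 nt, with UU overhangs on both strands. The screening criteria themselves (for example the "Nonsense" field or any GC range) are not reproduced, and the function names are illustrative.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def design_sirna(target: str, overhang: str = "UU") -> Tuple[str, str]:
    """Derive (sense, antisense) RNA strands from an AA(N19) DNA target site."""
    target = target.upper()
    if len(target) != 21 or not target.startswith("AA") or set(target) - set("ACGT"):
        raise ValueError("target must be 21 nt of A/C/G/T starting with AA")
    core = target[2:]  # the 19 nt following the leading AA
    sense = core.replace("T", "U") + overhang
    # Antisense strand: reverse complement of the core, written as RNA.
    antisense = core.translate(COMPLEMENT)[::-1].replace("T", "U") + overhang
    return sense, antisense


if __name__ == "__main__":
    site = "AAAGATGTGGAAAGTTACCTC"  # target site from the slide
    sense, antisense = design_sirna(site)
    print("GC%      ", round(gc_percent(site)))
    print("Sense    ", sense)
    print("Antisense", antisense)
```

For the slide's target site this reproduces the listed sense and antisense strands and the stated GC content of 38%.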
59???SAGEmap??
????
60?accession number??????????????????????
????
>NM_005400 Homo sapiens protein kinase C, epsilon (PRKCE), mRNA.
CTCCCCGCCCCGACCATGGTAGTGTTCAATGGCCTTCTTAAGATCAAAATCTGCGAGGCC
GTGAGCTTGAAGCCCACAGCCTGGTCGCTGCGCCATGCGGTGGGACCCCGGCCGCAGACT
TTCCTTCTCGACCCCTACATTGCCCTCAATGTGGACGACTCGCGCATCGGCCAAACGGCC.
>NM_005813 Homo sapiens protein kinase C, nu (PRKCN), mRNA.
AAAGTTCATCCCCCCAGAATGAAAATGAGGACATTTGAGAAGGTGATTTAAGGTGTGGAC
ATTTGAGAAGGTGTCCTATCAAATTAGTAAACCAAAGGAAAAGTACTGAATAGATTAATC
>HSPKCB2A Human mRNA for protein kinase C (PKC) type beta II.
CAGAGCCGGCGCAGGGGAAGCGCCCGGGGCCCCGGGTGCAGCAGCGCCCGCCGCCTCCCG
- NM_002737
- NM_002738
- X07109
- NM_002739
- NM_002740
- NM_006255
- NM_005400
- NM_002742
- NM_005813
- L07032
- NM_002744
- NM_006254
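Pulling sequences out of a local FASTA file by accession number can be sketched as follows. This is a minimal illustration, assuming the accession is the first whitespace-delimited token of each header line, as in the records above. The function names and the toy sequences are made up, and database access (for example the MySQL tables mentioned earlier) is not shown.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_by_accession(lines: Iterable[str],
                        accessions: Iterable[str]) -> Dict[str, str]:
    """Return {accession: sequence} for the requested accession numbers."""
    wanted = set(accessions)
    picked: Dict[str, str] = {}
    for header, seq in parse_fasta(lines):
        fields = header.split()
        if not fields:
            continue
        accession = fields[0].split(".")[0]  # drop a version suffix such as ".2"
        if accession in wanted:
            picked[accession] = seq
    return picked


if __name__ == "__main__":
    # Toy records: real accession numbers from the slide, made-up sequences.
    demo = [
        ">NM_005400 Homo sapiens protein kinase C, epsilon (PRKCE), mRNA.",
        "ACGTACGT",
        "TTGGCCAA",
        ">NM_005813 Homo sapiens protein kinase C, nu (PRKCN), mRNA.",
        "GGGGCCCC",
        ">X07109 toy record",
        "ATATATAT",
    ]
    print(select_by_accession(demo, ["NM_005400", "X07109"]))
```

With a real file, `lines` would be an open file handle, so records are streamed rather than loaded into memory at once.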
61??
- ??????????????,????????
- ??????????????????????????????????????,???????????
???????????
62???????????
- ?????
- ???????????
- ?????????????
- ??????????????
- ????????
63??????????????
- EMBOSS (????????????,?????Linux?????GCG???,???????? )
- JaMBW (Java based Molecular Biologists Workbench)???????????,?????? European Molecular Biology Laboratory of Heidelberg?Java????????????,???????????????,????,???? ?
64??????????????(?)
- SMS (The Sequence Manipulation Suite)???DNA???????????????,???????? ???????????HTML??,??????????,????HTML?????????
- PredictProtein: Protein sequence analysis and structure prediction.
- GeneScan ?????genomic DNA??????????
- RepeatMasker ??????
65??????????????(?)
- Cross_match/Consed/phred/Phrap/phd2fasta ???????,?????????????,??Contig ???????
- Primer3 ??web?PCR???????
- sim4 EST?????????
- HMMER ???????????(HMM)?????????,??????????????????
- ClustalW/X ??????????
- Artemis DNA????????????
- BioPerl ?????????Perl???
67???????????
- ?????
- ???????????
- ?????????????
- ??????????????
- ????????
68???????????????
- 2000?????????
- 2001?4?????? HP-LH3000
- 2001?9???????????????????(????????????24??),??????
- 2002????????????????
69????????? http://bioinfor.cicams.ac.cn
70- ???
- ????? ?????
- ???? ???520??
- ???? 67781331-8665
- Email xyzh_at_263.net.cn
- ??http://bioinfor.cicams.ac.cn
?????????????????.doc??, ????8665???email?
71Thanks !