Title: WWW.NCBI.NLM.NIH.GOV
1. WWW.NCBI.NLM.NIH.GOV
PubMed: Scientific Journals
Entrez: Keyword Search of Database
BLAST: Sequence Queries
OMIM: Online Mendelian Inheritance in Man
Books
TaxBrowser
Structure: 3D Molecular Structures
2. Sequence Files
Since the information relevant to biological processes is contained in the gene or protein sequence, all genetic and protein data are stored in sequence files. Importantly, the directionality that exists in nature is conserved in the sequence file.
Nucleic acids are always written 5' to 3' (5' and 3' refer to the sugar hydroxyl groups joined by the phosphodiester bond).
nucleic acids (genes): 5'-AGCTCGTGTAGACCATTC-3'
Amino acids are always written with the free amino group (N-terminus) first and the carboxylic acid (C-terminus) last.
amino acids (proteins): amino-IPKERYRGQIESIWA-carboxy
3. DNA is Double Stranded
Anti-parallel configuration.
The top strand is ALWAYS written 5' to 3'.
When DNA is written to a file, only the top strand is recorded; the bottom strand is implied.
5'-AGTCGTGATCTGCTAAATGTCTCGAAGTTCGATGCTAG-3'
3'-TCAGCACTAGACGATTTACAGAGCTTCAAGATACGATC-5'
Courier font is preferred for writing sequence
data since letter spacing is independent of
character content.
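Because only the top strand is stored, the bottom strand has to be derived whenever it is needed. The following is a minimal Python sketch of that derivation, assuming plain A/C/G/T input with no ambiguity codes; the function names are illustrative and do not come from any NCBI tool.

```python
# Watson-Crick pairing for unambiguous DNA bases
# (assumption: the input contains only A, C, G, T in either case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def complement(seq):
    """Return the paired bases in the same left-to-right order (reads 3' to 5')."""
    return seq.translate(COMPLEMENT)

def reverse_complement(seq):
    """Return the bottom strand written in the conventional 5' to 3' direction."""
    return complement(seq)[::-1]

top = "AGCTCGTGTAGACCATTC"      # 5'->3', the nucleic acid example used earlier
print(complement(top))          # bottom strand as it would be drawn under the top strand
print(reverse_complement(top))  # the same bottom strand rewritten 5'->3'
```

Applying reverse_complement twice returns the original top strand, which is a quick sanity check for this kind of helper.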
4.
>gi|1924939|emb|X98411.1|HSMYOSIE Homo sapiens partial mRNA for myosin-IF
CAGGAGAAGCTGACCAGCCGCAA
GATGGACAGCCGCTGGGGCGGGCGCAGCGAGTCCATCAATGTGACCC
TCAACGTGGAGCAGGCAGCCTACACCCGTGATGCCCTGGCCAAGGGGCTC
TATGCCCGCCTCTTCGACTT CCTCGTGGAGGCCATCAACCGTGCTATGC
AGAAACCCCAGGAAGAGTACAGCATCGGTGTGCTGGACATT
TACGGCTTCGAGATCTTCCAGAAAAATGGCTTCGAGCAGTTTTGCATCAA
CTTCGTCAATGAGAAGCTGC AGCAAATCTTTATCGAACTTACCCTGAAG
GCCGAGCAGGAGGAGTATGTGCAGGAAGGCATCCGCTGGAC
TCCAATCCAGTACTTCAACAACAAGGTCGTCTGTGACCTCATCGAAAACA
AGCTGAGCCCCCCAGGCATC ATGAGCGTCTTGGACGACGTGTGCGCCAC
CATGCACGCCACGGGCGGGGGAGCAGACCAGACACTGCTGC
AGAAGCTGCAGGCGGCTGTGGGGACCCACGAGCATTTCAACAGCTGGAGC
GCCGGCTTCGTCATCCACCA CTACGCTGGCAAGGTCTCCTACGACGTCA
GCGGCTTCTGCGAGAGGAACCGAGACGTTCTCTTCTCCGAC
CTCATAGAGCTGATGCAGTCCAGTGACCAGGCCTTCCTCCGGATGCTCTT
CCCCGAGAAGCTGGATGGAG ACAAGAAGGGGCGCCCCAGCACCGCCGGC
TCCAAGATCAAGAAACAAGCCAACGACCTGGTGGCCACACT
GATGAGGTGCACACCCCACTACATCCGCTGCATCAAACCCAACGAGACCA
AGCACGCCCGAGACTGGGAG GAGAACAGAGTCCAGCACCAGGTGGAATA
CCTGGGCCTGAAGGAAAACATCAGGGTGCGCAGAGCCGGCT
TCGCCTACCGCCGCCAGTTCGCCAAATTCCTGCAGAGGTATGCCATTCTG
ACCCCCGAGACGTGGCCGCG GTGGCGTGGGGACGAACGCCAGGGCGTCC
AGCACCTGCTTCGGGCGGTCAACATGGAGCCCGACCAGTAC
CAGATGGGGAGCACCAAGGTCTTTGTCAAGAACCCAGAGTCGCTTTTCCT
CCTGGAGGAGGTGCGAGAGC GAAAGTTCGATGGCTTTGCCCGAACCATC
CAGAAGGCCTGGCGGCGCCACGTGGCTGTCCGGAAGTACGA
GGAGATGCGGGAGGAAGCTTCCAACATCCTGCTGAACAAGAAGGAGCGGA
GGCGCAACAGCATCAATCGG AACTTCGTCGGGGACTACCTGGGGCTGGA
GGAGCGGCCCGAGCTGCGTCAGTTCCTGGGCAAGAAGGAGC
GGGTGGACTTCGCCGATTCGGTCACCAAGTACGACCGCCGCTTCAAGCCC
ATCAAGCGGGACTTGATCCT GACGCCCAAGTGTGTGTATGTGATTGGGC
GAGAGAAGATGAAGAAGGGACCTGAGAAAGGTCCAGTGTGT
GAAATCTTGAAGAAGAAATTGGACATCCAGGCTCTGCGGGGGGTCTCCCT
CAGCACGCGACAGGACGACT TCTTCATCCTCCAAGAGGATGCCGCCGAC
AGCTTCCTGGAGAGCGTCTTCAAGACCGAGTTTGTCAGCCT
TCTGTGCAAGCGCTTCGAGGAGGCGACGCGGAGGCCCCTGCCCCTCACCT
TCAGCGACACACTACAGTTT CGGGTGAAGAAGGAGGGCTGGGGCGGTGG
CGGCACCCGCAGCGTCACCTTCTCCCGCGGCTTCGGCGACT
TGGCAGTGCTCAAGGTTGGCGGTCGGACCCTCACGGTCAGCGTGGGCGAT
GGGCTGCCCAAGAACTCCAA GCCTACCGGAAAGGGATTGGCCAAGGGTA
AACCTCGGAGGTCGTCCCAAGCCCCTACCCGGGCGGCCCCT
GGCGCCCCCCAAGGCATGGATCGAAATGGGGCCCCCCTCTGCCCACAGGG
GGGGGCCCCCTGCCCCCTGG AGAAATTCATTTGGCCCAGGGGGCACCCA
CAGGCCTCCCCGGCCCTCCGTCCACATCCCTGGGATGCCAG
CAGACGACCCCGGGCACGTCCGCCCTCAGAGCACAACACAGAATTCCTCA
ACGTGCCTGACCAGGGGATG GCCGGCATGCAGAGGAAGCGCAGCGTGGG
GCAACGGCCAGTGCCTGTGGGCCGACCCAAGCCCCAGCCTC
GGACACATGGTCCCAGGTGCCGGGCCCTATACCAGTACGTGGGCCAAGAT
GTGGACGAGCTGAGCTTCAA CGTGAACGAGGTCATTGAGATCCTCATGG
AAGATCCCTCGGGCTGGTGGAAGGGCCGGCTTCACGGCCAG
GAGGGCCTTTTCCCAGGAAACTACGTGGAGAAGATCTGAGCTGGGCCCTG
GGATACTGCCTTCTCTTTCG CCCGCCTATCTGCCTGCCGGCCTGGTGGG
GAGCCAGGCCCTGCCAATGAAAGCCTCGTTTACCTGGGCTG
CAATAGCCTAAAAGTCCAATCCTTTGGCCTCCAGTCCTTGCCCAGGCCCT
GGGTCACCAGGTCACTGGTG CAGCCCCCGCCCCTGGGCCCTGGTTTTCC
TCCAACATCACACCTGCTGCCCATTGTCCAAAACTGTGTGT
GTCAAAGGGGACTAACAGCAGAATTTACCTCCCAACTGCCATGTGATTAA
GAAATGGGTCTTGAGTCCTG TGCTGTTGGCAAAGTTCCAGGCACAGTTG
GGGAGGGGGGGCCGGAATCCGC
FASTA File Format
5. FASTA File Format
- A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.
- 1) The description line starts with a greater-than symbol (">").
- 2) The word immediately following the ">" is the "ID" (name) of the sequence; the rest of the line is the description. Both the "ID" and the description are optional.
- 3) All lines of text should be shorter than 80 characters.
- 4) A sequence ends when another ">" appears at the beginning of a line, which starts the next sequence.
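The four rules above translate almost directly into a reader. The following is a minimal Python sketch, assuming the whole file fits in memory as one string; parse_fasta is a hypothetical helper written for illustration, not a function from an existing library.

```python
def parse_fasta(text):
    """Split FASTA text into (id, description, sequence) tuples.

    Rule 1: a record starts at a line beginning with ">".
    Rule 2: the first word after ">" is the ID, the rest is the description
            (both may be empty).
    Rule 4: the record ends where the next ">" line begins.
    """
    records = []
    seq_id, description, chunks = None, "", []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, description, "".join(chunks)))
            header = line[1:].split(None, 1)
            seq_id = header[0] if header else ""
            description = header[1] if len(header) > 1 else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)  # sequence data may span many lines
    if seq_id is not None:
        records.append((seq_id, description, "".join(chunks)))
    return records

example = """>seq1 first test record
ACGT
ACGT
>seq2
TTTT
"""
for rec in parse_fasta(example):
    print(rec)
```

Sequence lines are concatenated without separators, so the 80-character line limit affects only how the file is laid out, not the sequence that is recovered.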
6. FASTA File Format
The following example contains two protein sequences (Example1, Example2):
>Example1 envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
>Example2 synthetic peptide
HITREPLKHIPKERYRGTNDTLSPQIESIWAAELDRYKLVKTNCSNVS
7. FASTA File Format
- Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions:
- Lower-case letters are accepted and are mapped into upper-case.
- A single hyphen or dash can be used to represent a gap of indeterminate length.
- In amino acid (protein) sequences, U and * are acceptable letters.
- Use N for an unknown nucleic acid residue and X for an unknown amino acid residue.
- mRNA is often listed as cDNA, with U replaced by T.
The nucleic acid codes supported are:
A → adenosine          M → A C (amino)
C → cytidine           S → G C (strong)
G → guanine            W → A T (weak)
T → thymidine          B → G T C
U → uridine            D → G A T
R → G A (purine)       H → A C T
Y → T C (pyrimidine)   V → G C A
K → G T (keto)         N → A G C T (any)
                       - → gap of indeterminate length
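The code table above can be held in a small lookup structure for checking input before it is submitted. The following is a minimal Python sketch, assuming the table exactly as listed on this slide; the helper names (normalize, invalid_symbols, matches) are made up for illustration.

```python
# Each IUPAC nucleic acid code mapped to the bases it can stand for.
# The gap symbol "-" stands for no particular base.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA", "N": "AGCT",
    "-": "",
}

def normalize(seq):
    """Map lower-case letters to upper-case and drop whitespace."""
    return "".join(seq.split()).upper()

def invalid_symbols(seq):
    """Return the sorted symbols in seq that are not accepted nucleic acid codes."""
    return sorted(set(normalize(seq)) - set(IUPAC_NUCLEOTIDES))

def matches(code, base):
    """True if the concrete base is one of the bases the code can represent."""
    return base.upper() in IUPAC_NUCLEOTIDES[code.upper()]

print(normalize("acgt nry-"))     # ACGTNRY-
print(invalid_symbols("ACGTXZ"))  # ['X', 'Z']
print(matches("R", "G"))          # True: R is a purine (G or A)
```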
8. FASTA File Format
For those programs that use amino acid (protein) query sequences (e.g. BLASTP and TBLASTN), the accepted amino acid codes are:
A → alanine                   P → proline
B → aspartate or asparagine   Q → glutamine
C → cysteine                  R → arginine
D → aspartate                 S → serine
E → glutamate                 T → threonine
F → phenylalanine             U → selenocysteine
G → glycine                   V → valine
H → histidine                 W → tryptophan
I → isoleucine                Y → tyrosine
K → lysine                    Z → glutamate or glutamine
L → leucine                   X → any
M → methionine                * → translation stop
N → asparagine                - → gap of indeterminate length
9.
>gi|1924939|emb|X98411.1|HSMYOSIE Homo sapiens partial mRNA for myosin-IF
(nucleotide sequence identical to the gi|1924939 record shown earlier)
FASTA File Format
10.
>gi|1924940|emb|CAA67058.1| myosin-IF [Homo sapiens]
QEKLTSRKMDSRWGGRSESINVTLNVEQAAYTRDALAKGLY
ARLFDFLVEAINRAMQKPQEEYSIGVLDI YGFEIFQKNGFEQFCINFVN
EKLQQIFIELTLKAEQEEYVQEGIRWTPIQYFNNKVVCDLIENKLSPPGI
MSVLDDVCATMHATGGGADQTLLQKLQAAVGTHEHFNSWSAGFVIHHYA
GKVSYDVSGFCERNRDVLFSD LIELMQSSDQAFLRMLFPEKLDGDKKGR
PSTAGSKIKKQANDLVATLMRCTPHYIRCIKPNETKHARDWE
ENRVQHQVEYLGLKENIRVRRAGFAYRRQFAKFLQRYAILTPETWPRWRG
DERQGVQHLLRAVNMEPDQY QMGSTKVFVKNPESLFLLEEVRERKFDGF
ARTIQKAWRRHVAVRKYEEMREEASNILLNKKERRRNSINR
NFVGDYLGLEERPELRQFLGKKERVDFADSVTKYDRRFKPIKRDLILTPK
CVYVIGREKMKKGPEKGPVC EILKKKLDIQALRGVSLSTRQDDFFILQE
DAADSFLESVFKTEFVSLLCKRFEEATRRPLPLTFSDTLQF
RVKKEGWGGGGTRSVTFSRGFGDLAVLKVGGRTLTVSVGDGLPKNSKPTG
KGLAKGKPRRSSQAPTRAAP GAPQGMDRNGAPLCPQGGAPCPLEKFIWP
RGHPQASPALRPHPWDASRRPRARPPSEHNTEFLNVPDQGM
AGMQRKRSVGQRPVPVGRPKPQPRTHGPRCRALYQYVGQDVDELSFNVNE
VIEILMEDPSGWWKGRLHGQ EGLFPGNYVEKI
FASTA File Format
<?xml version="1.0"?>
<!DOCTYPE TSeq PUBLIC "-//NCBI//NCBI TSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">
<TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_gi>1924939</TSeq_gi>
  <TSeq_accver>X98411.1</TSeq_accver>
  <TSeq_taxid>9606</TSeq_taxid>
  <TSeq_orgname>Homo sapiens</TSeq_orgname>
  <TSeq_defline>Homo sapiens partial mRNA for myosin-IF</TSeq_defline>
  <TSeq_length>2711</TSeq_length>
  <TSeq_sequence>CAGGAGAAGCTGACCAGCCGCAAGATGGACAGCCGCTGGGGCGGGCGCAGCGAGTCCATCAATGT...</TSeq_sequence>
</TSeq>
TinySeq XML
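A TinySeq record carries the same information as a FASTA record, only in tagged fields. The following is a minimal Python sketch of converting one into the other with the standard library's xml.etree.ElementTree. The element names are the ones visible on the slide; the function name tinyseq_to_fasta, the choice of the accession.version as the FASTA ID, and the shortened sample sequence are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def tinyseq_to_fasta(xml_text, width=60):
    """Convert TinySeq XML (one or more TSeq elements) into FASTA text."""
    root = ET.fromstring(xml_text)
    lines = []
    # iter() also yields the root itself when the root is a TSeq element.
    for rec in root.iter("TSeq"):
        accver = rec.findtext("TSeq_accver", default="")
        defline = rec.findtext("TSeq_defline", default="")
        seq = "".join(rec.findtext("TSeq_sequence", default="").split())
        lines.append(f">{accver} {defline}".rstrip())
        # Wrap the sequence so every line stays well under 80 characters.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines)

# Sample record: only the first bases shown on the slide, not the full sequence.
sample = """<TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_accver>X98411.1</TSeq_accver>
  <TSeq_defline>Homo sapiens partial mRNA for myosin-IF</TSeq_defline>
  <TSeq_sequence>CAGGAGAAGCTGACCAGCCGCAAGATGGACAGCC</TSeq_sequence>
</TSeq>"""
print(tinyseq_to_fasta(sample))
```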
11. FASTA File Format (note: U → T)
>gi|1234|my name from genetic code in DNA
ATGATTTGTCACGCTGAGCTC-AAAGCTAACGAGTAA
>gi|1234|my name translated into protein
MICHAEL-KANE
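The DNA-to-protein example on this slide can be reproduced with a codon lookup. The following is a minimal Python sketch, assuming the standard genetic code, reading frame 1, and the slide's convention that a single "-" in the DNA becomes a single "-" in the protein; the translate function is written for illustration and is not taken from an existing package.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna, stop_at_stop=True):
    """Translate a DNA (or RNA) string codon by codon, passing gaps through."""
    seq = dna.upper().replace("U", "T")  # mRNA is handled as cDNA: U -> T
    protein = []
    i = 0
    while i < len(seq):
        if seq[i] == "-":                # gap of indeterminate length
            protein.append("-")
            i += 1
            continue
        codon = seq[i:i + 3]
        if len(codon) < 3:               # ignore an incomplete trailing codon
            break
        aa = CODON_TABLE.get(codon, "X") # X = unknown (e.g. ambiguity codes)
        if aa == "*" and stop_at_stop:   # translation stop
            break
        protein.append(aa)
        i += 3
    return "".join(protein)

print(translate("ATGATTTGTCACGCTGAGCTC-AAAGCTAACGAGTAA"))  # MICHAEL-KANE
```

Passing stop_at_stop=False keeps the "*" symbol in the output instead of ending translation at the first stop codon.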
12. Where do we get DNA sequence information?
DNA Sequencing Methods: conversion of biological/bioanalytical data into sequence information.
There are automated, high-throughput sequencing centers that COMPLETELY automate (with robotics and information systems) DNA sequencing, preliminary identification, and publishing.
13. DNA Sequencing (old method)
5'-AAACCAGGCCGATAAGGTACTACACGAAAAAAA-3'
TTTTTTT
AAACCAGGCCGATAAGGTACTACACGAAAAA
Step 1. Extend the complementary sequence using free nucleotides with limiting amounts of radioactive terminating nucleotides.
Step 2. Run the product out on an electrophoresis gel.
Step 3. Place the gel against radiographic film and develop.
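The logic of reading a sequence from terminated fragments can be sketched in a few lines of Python. This is an idealized model under stated assumptions: one reaction (lane) per terminating nucleotide, termination observed at every position, and fragment length counted from the first base added after the primer. The function names are illustrative, and the input "CGTGTAGTACC" is the first eleven bases that would be added to the TTTTTTT primer on the template shown above.

```python
def terminated_fragments(new_strand):
    """Step 1 (idealized): for each terminating base, list the fragment lengths
    at which chain extension could stop."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(length)
    return lanes

def read_gel(lanes):
    """Steps 2-3 (idealized): order the bands by fragment length, shortest first,
    and read off which lane each band is in."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)

lanes = terminated_fragments("CGTGTAGTACC")
print(lanes)            # band positions per lane
print(read_gel(lanes))  # sequence of the newly synthesized strand, 5'->3'
```

The sequence read from the gel is that of the newly synthesized strand, so the template sequence is its reverse complement.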
14. DNA Sequencing (new method)
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/D/DNAsequencing.html
15.
GGATCCTGCAAGGAGGGATACAAATTACATACATTTGTCAAAACCCACAG
CATGTTGACCACCAGGAGGAGACCCCATGTGACTCCAGGACCCTGGTTGA
TAACAACGTATCGAGATTCCTCACATGGAACCAGTGCGCTCCTGTGGTGG
AGGGTGTACCTGTGTCAGGGCAGGGGGTACGTGGACATTTTCTGCAGTTT
TTGATCAATTTTGCAATGAACTAAATCTGTGGTATAAAAATAAAGTCTAT
TAAAAGAATCCAAGGCTCCCTCTCATCTCACGATAAGATAAAGTCCCCAT
CCATTTTACTCCTCTCAGCCCTGGAGAAAGGAGAGGCCAGGTCCCACCAC
CTTCCACCAGCATGGACCCCCAGTCCAGACCCCACGCCTTTTCTCAGCAT
CCTCAGACCAGCAGGACTTGCAGCAATGGGGAATTAGGCACCTGACTTCT
CCTTCATCTACCTTTGGCTGGGGGCCTCCAGCCTTGACCTTCGCTCTGAG
AGTCTCAGGCAGGTCCAGAGCCAGTTCTCCCATGACGTGATATGTTTCCA
GAGCAGGTTCCTGGGTGAGATAAAAGGATTTGGGCTGAACAGGGTGGAGG
GAGCATTGGAATGGCACTCAGGGCAAAGGCAGAGGTGTGCGTGGCAGCGC
CCTGGCTGTCCCTGCAAAGGGCACGGGCACTGGGCACTAGAGCCGCTCGG
GCCCCTAGGACGGTGCTGCCGTTTGAAGCCATGCCCCAGCATCCAGGCAA
CAGGTGGCTGAGGCTGCTGCAGATCTGGAGGGAGCAGGGTTATGAGCACC
TGCACCTGGAGATGCACCAGACCTTCCAGGAGCTGGGGCCCATTTTCAGG
TAAAGCCCTCCCTGGCCCTCGCTGGGAACACCCAGATCCCTGCCCCTGCT
GCCCAGGACCCTGCCAGGCACTCAGCACTGCCATTCCCAGCAGGTCCCGG
CACTCTGCATCCTTTGGAGGATGGGGAAGGAGTGCAGCACATGCTGGTCT
GTGGTGCTGCCAGGGCAGGGGATAGTGCAGAGAAAACCCCAGCTCACTGC
AGAGAGGGCAGGACTCAGAAGCACTAAAGTTGAAAGGTTCCAGGGAGCCA
GCAGGAGGGCTTTAGCTGTGAAGCCGCTAATCCAGGAGCAGGGAGGGTGG
ACAGGAGACACTTTGGATTGGGACTGCAGGGTGGGGCCACGAGGGACATG
ACCCCGTCCAGCAGGGCCTCCTGCTTGGCCCCACAGGTACAACTTGGGAG
GACCACGCATGGTGTGTGTGATGCTGCCGGAGGATGTGGAGAAGCTGCAA
CAGGTGGACAGCCTGCATCCCTGCAGGATGATCCTGGAGCCCTGGGTGGC
CTACAGACAACATCGTGGGCACAAATGTGGCGTGTTCTTGTTGTAAGCGG
CGAGTTGGGAGCTGAGAGCTGGGAGCAGGGTGGGCAGCCTGGGTGTAGGG
GGGAGGCGAGAGAGGTAGGACCCAAAAGCACATCTGCCCTGGGCCCCTGT
GGTGGGCAGTGAGGGTGAGCACCCGGCCCAGAGGACGGCCATCCTGTGGG
GTCGCGTCTGCACTGTGGGTTGGGGAAGCAGGGCGGTGGTGGAGAAATGG
GCACGGGCACCTCTGCAGAGAAGACGCAGAGCAATGAGCCCTTCTGTGTA
GTGAGAACCCGCTCTGCACCAACCTCGGCGGCTGCTTTCTCTTGCGGTCT
GGGGACTGTCCTTCCCATAGGTCAGAAAACTGAGGCCCTGAGAAGGGGAC
TTCCACTGGCCCAGGTCACAGGCTGAGTGCTGAGCCTGGTGTTCGCCGGG
GCCGCAGCCTCCCTCAGGGCGCTCAGGGTCCCTGCAGTCCTGGCAAACCT
TCCTGATGGGGACAGTCCGGGGCAGGAGGCAGGTGGGGACGCAGGTGGCT
GGTGGTTCCGTTGTTCTCAGAAGCAAGGCACAAGGTGGGGCGGTTGATGG
CACTGGGGAGGATGTTTCCTGGCCCGTGGAGAGGGTGGCGCCTGGTCAGG
TGGGCAGGGAGAGGCTGATGCTTGGAGTCGGTCACCTGCAGGGATGTTGT
CATTAGGACGGGGGAAGGACTGGATGAGGATGTCACAGTGGTGACAGCCC
CCACTCCATGGTAGGAAGGGAACGCTATTGGGAATAGTGGGGTTTAGGTA
AAAGGGCACCCGTGGGTCGGGGCCTTCACTGAGGCTGGCCTATAGATGAC
ATCTGGGAGAGAGTCAGGACCCAGGAAGGCAGGTCCAGGA