Title: Biomedical Informatics
1Biomedical Informatics CIT 499B
Michael D. Kane, Ph.D. FALL 2009
2The Cell is a Living Machine
3DNA is Information Storage
4Zipped Files
Decompression
Executable Files
5DNA is Double Stranded
One strand is the coding strand; the other strand stabilizes the DNA sequence when it is not in use.
Double-stranded DNA is very durable in our environment.
6CAGGACCATGGAACTCAGCGTCCTCCTCTTCCTTGCACTCCTCACAGGAC
TCTTGCTACTCCTGGTTCAGCGCCACCCTAACACCCATGACCGCCTCCCA
CCAGGGCCCCGCCCTCTGCCCCTTTTGGGAAACCTTCTGCAGATGGATAG
AAGAGGCCTACTCAAATCCTTTCTGAGGTTCCGAGAGAAATATGGGGACG
TCTTCACGGTACACCTGGGACCGAGGCCCGTGGTCATGCTGTGTGGAGTA
GAGGCCATACGGGAGGCCCTTGTGGACAAGGCTGAGGCCTTCTCTGGCCG
GGGAAAAATCGCCATGGTCGACCCATTCTTCCGGGGATATGGTGTGATCT
TTGCCAATGGAAACCGCTGGAAGGTGCTTCGGCGATTCTCTGTGACCACT
ATGAGGGACTTCGGGATGGGAAAGCGGAGTGTGGAGGAGCGGATTCAGGA
GGAGGCTCAGTGTCTGATAGAGGAGCTTCGGAAATCCAAGGGGGCCCTCA
TGGACCCCACCTTCCTCTTCCAGTCCATTACCGCCAACATCATCTGCTCC
ATCGTCTTTGGAAAACGATTCCACTACCAAGATCAAGAGTTCCTGAAGAT
GCTGAACTTGTTCTACCAGACTTTTTCACTCATCAGCTCTGTATTCGGCC
AGCTGTTTGAGCTCTTCTCTGGCTTCTTGAAATACTTTCCTGGGGCACAC
AGGCAAGTTTACAAAAACCTGCAGGAAATCAATGCTTACATTGGCCACAG
TGTGGAGAAGCACCGTGAAACCCTGGACCCCAGCGCCCCCAAGGACCTCA
TCGACACCTACCTGCTCCACATGGAAAAAGAGAAATCCAACGCACACAGT
GAATTCAGCCACCAGAACCTCAACCTCAACACGCTCTCGCTCTTCTTTGC
TGGCACTGAGACCACCAGCACCACTCTCCGCTACGGCTTCCTGCTCATGC
TCAAATACCCTCATGTTGCAGAGAGAGTCTACAGGGAGATTGAACAGGTG
ATTGGCCCACATCGCCCTCCAGAGCTTCATGACCGAGCCAAAATGCCATA
CACAGAGGCAGTCATCTATGAGATTCAGAGATTTTCCGACCTTCTCCCCA
TGGGTGTGCCCCACATTGTCACCCAACACACCAGCTTCCGAGGGTACATC
ATCCCCAAGGACACAGAAGTATTTCTCATCCTGAGCACTGCTCTCCATGA
CCCACACTA
7THEREDCAT_HSDKLSD_WASNOTHOTBUT_WKKNASDNKSAOJ.ASDNA
LKS_WASWET_ASDFLKSDOFIJEIJKNAWDFN_ANDMAD_WERN.JSND
FJN_YETSAD_MNSFDGPOIJD_BUTTHEFOX_SDKMFIDSJIR.JER_G
OTWET_JSN.DFOIAMNJNER_ANDATEHIM.
8Start with a thin 2 x 4 lego block
10What are the comparative genome sizes of humans
and other organisms being studied?
Genome size does not correlate with evolutionary
status, nor is the number of genes proportional
to genome size.
11Molecular Biology TERMS
DNA: deoxyribonucleic acid, a polymer of 4 nucleotides (bases) arranged in an anti-parallel, double-helix structure. Permanent genetic information.
  Adenine (A), Guanine (G), Thymine (T), Cytosine (C)
RNA: ribonucleic acid, represented by 4 nucleotides (bases). Transient genetic information. Same as DNA, except Thymine (T) is replaced by Uracil (U).
  mRNA: messenger RNA
  hnRNA: heterogeneous nuclear RNA
  rRNA: ribosomal RNA
  tRNA: transfer RNA
Gene: subsection of a chromosome that encodes a specific protein.
Protein: structural and active component of living organisms.
cDNA: complementary DNA; refers to the other strand in a double helix, and often describes the DNA strand complementary to an mRNA. Many mRNAs are published as cDNA.
Genomics: the study of genetic content in a high-throughput or high-content manner.
Proteomics: the study of proteins in a high-throughput or high-content manner.
SNP: Single Nucleotide Polymorphism, a single base pair that differs between 2 genes or between 2 or more people, but does not necessarily lead to a disease or disorder (unlike a mutation).
Oligonucleotide: short DNA strand, usually chemically synthesized and less than 100 nucleotides in length. Slang: "oligo".
12Cell Biology TERMS
Nucleus: subcellular organelle that harbors chromosomes (genes, genetic content) and where RNA (hnRNA) splicing occurs.
Ribosome: subcellular organelle where mRNA is TRANSLATED into proteins.
Chromosome: the large genetic structures within the nucleus, made up of packaged genes.
Plasma Membrane or Cell Membrane: lipid bilayer that represents the outside wall of the cell; some subcellular organelles also have a lipid bilayer wall (e.g. nucleus, mitochondria).
Cytosol or Cytoplasm: intracellular contents of the cell, within the plasma membrane.
13Concepts
- Survey the NCBI web site, resources, capabilities.
- Describe key components of the FASTA file structure.
- Annotation of double-stranded DNA (5' to 3', "5 prime to 3 prime")
- Sequencing Technology
14WWW.NCBI.NLM.NIH.GOV
PubMed: Scientific Journals
Entrez: Keyword Search of Database
BLAST: Sequence Queries
OMIM: Online Mendelian Inheritance in Man
Books
TaxBrowser
Structure: 3D Molecular Structures
15Sequence Files
Since the information relevant to biological processes is contained in the gene or protein sequence, all genetic and protein data are contained in sequence files. Importantly, there is a directionality that exists in nature that is conserved in the sequence file.
Nucleic acids are always written 5' to 3' (describing the 5' and 3' positions on the sugar that take part in the phosphodiester bond).
nucleic acids (genes): 5'-AGCTCGTGTAGACCATTC-3'
Amino acids are always written with the free amino group (N-terminus) first and the carboxylic acid (C-terminus) last.
amino acids (proteins): amino-IPKERYRGQIESIWA-carboxy
16DNA is Double Stranded
Anti-parallel Configuration
The top strand is ALWAYS written 5' to 3'. When DNA is written in a file, the top strand is represented and the bottom strand is assumed.
5'-AGTCGTGATCTGCTAAATGTCTCGAAGTTCGATGCTAG-3'
3'-TCAGCACTAGACGATTTACAGAGCTTCAAGCTACGATC-5'
Courier font is preferred for writing sequence
data since letter spacing is independent of
character content.
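Because only the top strand is stored, the bottom strand has to be derived whenever it is needed. The following is a minimal Python sketch of that derivation, assuming an unambiguous A/C/G/T sequence. The names `complement` and `reverse_complement` are illustrative and do not come from the slides.

```python
# Hypothetical helpers; the names are illustrative, not from the course material.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def complement(top_strand: str) -> str:
    """Bottom strand read 3' to 5', aligned base-for-base under the top strand."""
    return top_strand.translate(COMPLEMENT)

def reverse_complement(top_strand: str) -> str:
    """Bottom strand rewritten in the conventional 5' to 3' direction."""
    return complement(top_strand)[::-1]

top = "AGCTCGTGTAGACCATTC"  # example sequence from the Sequence Files slide
print("5'-" + top + "-3'")
print("3'-" + complement(top) + "-5'")
print("5'-" + reverse_complement(top) + "-3'")
```

Applying `reverse_complement` twice returns the original top strand, which is a quick sanity check on any implementation.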
17>gi|1924939|emb|X98411.1|HSMYOSIE Homo sapiens partial mRNA for myosin-IF
CAGGAGAAGCTGACCAGCCGCAA
GATGGACAGCCGCTGGGGCGGGCGCAGCGAGTCCATCAATGTGACCC
TCAACGTGGAGCAGGCAGCCTACACCCGTGATGCCCTGGCCAAGGGGCTC
TATGCCCGCCTCTTCGACTT CCTCGTGGAGGCCATCAACCGTGCTATGC
AGAAACCCCAGGAAGAGTACAGCATCGGTGTGCTGGACATT
TACGGCTTCGAGATCTTCCAGAAAAATGGCTTCGAGCAGTTTTGCATCAA
CTTCGTCAATGAGAAGCTGC AGCAAATCTTTATCGAACTTACCCTGAAG
GCCGAGCAGGAGGAGTATGTGCAGGAAGGCATCCGCTGGAC
TCCAATCCAGTACTTCAACAACAAGGTCGTCTGTGACCTCATCGAAAACA
AGCTGAGCCCCCCAGGCATC ATGAGCGTCTTGGACGACGTGTGCGCCAC
CATGCACGCCACGGGCGGGGGAGCAGACCAGACACTGCTGC
AGAAGCTGCAGGCGGCTGTGGGGACCCACGAGCATTTCAACAGCTGGAGC
GCCGGCTTCGTCATCCACCA CTACGCTGGCAAGGTCTCCTACGACGTCA
GCGGCTTCTGCGAGAGGAACCGAGACGTTCTCTTCTCCGAC
CTCATAGAGCTGATGCAGTCCAGTGACCAGGCCTTCCTCCGGATGCTCTT
CCCCGAGAAGCTGGATGGAG ACAAGAAGGGGCGCCCCAGCACCGCCGGC
TCCAAGATCAAGAAACAAGCCAACGACCTGGTGGCCACACT
GATGAGGTGCACACCCCACTACATCCGCTGCATCAAACCCAACGAGACCA
AGCACGCCCGAGACTGGGAG GAGAACAGAGTCCAGCACCAGGTGGAATA
CCTGGGCCTGAAGGAAAACATCAGGGTGCGCAGAGCCGGCT
TCGCCTACCGCCGCCAGTTCGCCAAATTCCTGCAGAGGTATGCCATTCTG
ACCCCCGAGACGTGGCCGCG GTGGCGTGGGGACGAACGCCAGGGCGTCC
AGCACCTGCTTCGGGCGGTCAACATGGAGCCCGACCAGTAC
CAGATGGGGAGCACCAAGGTCTTTGTCAAGAACCCAGAGTCGCTTTTCCT
CCTGGAGGAGGTGCGAGAGC GAAAGTTCGATGGCTTTGCCCGAACCATC
CAGAAGGCCTGGCGGCGCCACGTGGCTGTCCGGAAGTACGA
GGAGATGCGGGAGGAAGCTTCCAACATCCTGCTGAACAAGAAGGAGCGGA
GGCGCAACAGCATCAATCGG AACTTCGTCGGGGACTACCTGGGGCTGGA
GGAGCGGCCCGAGCTGCGTCAGTTCCTGGGCAAGAAGGAGC
GGGTGGACTTCGCCGATTCGGTCACCAAGTACGACCGCCGCTTCAAGCCC
ATCAAGCGGGACTTGATCCT GACGCCCAAGTGTGTGTATGTGATTGGGC
GAGAGAAGATGAAGAAGGGACCTGAGAAAGGTCCAGTGTGT
GAAATCTTGAAGAAGAAATTGGACATCCAGGCTCTGCGGGGGGTCTCCCT
CAGCACGCGACAGGACGACT TCTTCATCCTCCAAGAGGATGCCGCCGAC
AGCTTCCTGGAGAGCGTCTTCAAGACCGAGTTTGTCAGCCT
TCTGTGCAAGCGCTTCGAGGAGGCGACGCGGAGGCCCCTGCCCCTCACCT
TCAGCGACACACTACAGTTT CGGGTGAAGAAGGAGGGCTGGGGCGGTGG
CGGCACCCGCAGCGTCACCTTCTCCCGCGGCTTCGGCGACT
TGGCAGTGCTCAAGGTTGGCGGTCGGACCCTCACGGTCAGCGTGGGCGAT
GGGCTGCCCAAGAACTCCAA GCCTACCGGAAAGGGATTGGCCAAGGGTA
AACCTCGGAGGTCGTCCCAAGCCCCTACCCGGGCGGCCCCT
GGCGCCCCCCAAGGCATGGATCGAAATGGGGCCCCCCTCTGCCCACAGGG
GGGGGCCCCCTGCCCCCTGG AGAAATTCATTTGGCCCAGGGGGCACCCA
CAGGCCTCCCCGGCCCTCCGTCCACATCCCTGGGATGCCAG
CAGACGACCCCGGGCACGTCCGCCCTCAGAGCACAACACAGAATTCCTCA
ACGTGCCTGACCAGGGGATG GCCGGCATGCAGAGGAAGCGCAGCGTGGG
GCAACGGCCAGTGCCTGTGGGCCGACCCAAGCCCCAGCCTC
GGACACATGGTCCCAGGTGCCGGGCCCTATACCAGTACGTGGGCCAAGAT
GTGGACGAGCTGAGCTTCAA CGTGAACGAGGTCATTGAGATCCTCATGG
AAGATCCCTCGGGCTGGTGGAAGGGCCGGCTTCACGGCCAG
GAGGGCCTTTTCCCAGGAAACTACGTGGAGAAGATCTGAGCTGGGCCCTG
GGATACTGCCTTCTCTTTCG CCCGCCTATCTGCCTGCCGGCCTGGTGGG
GAGCCAGGCCCTGCCAATGAAAGCCTCGTTTACCTGGGCTG
CAATAGCCTAAAAGTCCAATCCTTTGGCCTCCAGTCCTTGCCCAGGCCCT
GGGTCACCAGGTCACTGGTG CAGCCCCCGCCCCTGGGCCCTGGTTTTCC
TCCAACATCACACCTGCTGCCCATTGTCCAAAACTGTGTGT
GTCAAAGGGGACTAACAGCAGAATTTACCTCCCAACTGCCATGTGATTAA
GAAATGGGTCTTGAGTCCTG TGCTGTTGGCAAAGTTCCAGGCACAGTTG
GGGAGGGGGGGCCGGAATCCGC
FASTA File Format
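The description line of this record follows the legacy NCBI layout in which fields are separated by vertical bars: `gi`, the GI number, a database tag (`emb` for EMBL), the accession with version, and a locus name, followed by free-text description. A minimal Python sketch of splitting such a line is shown below. The function name `parse_gi_defline` and the returned dictionary keys are assumptions chosen for illustration.

```python
def parse_gi_defline(defline: str) -> dict:
    """Split a 'gi|number|db|accession|locus description' line into fields."""
    if not defline.startswith(">"):
        raise ValueError("a FASTA description line must start with '>'")
    # The identifier is the first word; everything after the first space is description.
    identifier, _, description = defline[1:].partition(" ")
    fields = identifier.split("|")
    # Assumed layout: gi | GI number | database | accession.version | locus
    if len(fields) != 5 or fields[0] != "gi":
        raise ValueError("not a gi|number|db|accession|locus identifier")
    return {
        "gi": fields[1],
        "database": fields[2],
        "accession": fields[3],
        "locus": fields[4],
        "description": description.strip(),
    }

header = ">gi|1924939|emb|X98411.1|HSMYOSIE Homo sapiens partial mRNA for myosin-IF"
for key, value in parse_gi_defline(header).items():
    print(key, "=", value)
```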
18FASTA File Format
A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.
1) The description line starts with a greater-than symbol (">").
2) The word immediately following the greater-than symbol (">") is the "ID" (name) of the sequence; the rest of the line is the description. The "ID" and the description are optional.
3) All lines of text should be shorter than 80 characters.
4) The sequence ends if another line begins with a greater-than symbol (">"), which starts the next sequence.
19FASTA File Format
The following example contains two protein sequences (Example1, Example2):
>Example1 envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
>Example2 synthetic peptide
HITREPLKHIPKERYRGTNDTLSPQIESIWAAELDRYKLVKTNCSNVS
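The format rules translate directly into a short reader: a line starting with ">" opens a new record, its first word is the ID, the rest is the description, and every following line is sequence data until the next ">". The sketch below is one minimal way to write this in Python. The function name `parse_fasta` is an assumption, Example1 is shortened to its first two lines for brevity, and any sequence text appearing before the first description line is silently ignored, which is a design choice rather than part of the format.

```python
def parse_fasta(text: str) -> list:
    """Return a list of (id, description, sequence) tuples from FASTA text."""
    records = []
    seq_id, description, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if seq_id is not None:
                records.append((seq_id, description, "".join(chunks)))
            seq_id, _, description = line[1:].partition(" ")
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, description, "".join(chunks)))
    return records

# Example1 is truncated to two lines here; Example2 is complete.
example = """>Example1 envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
>Example2 synthetic peptide
HITREPLKHIPKERYRGTNDTLSPQIESIWAAELDRYKLVKTNCSNVS
"""

for seq_id, description, sequence in parse_fasta(example):
    print(seq_id, "|", description, "|", len(sequence), "residues")
```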
20FASTA File Format
Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions:
- Lower-case letters are accepted and are mapped into upper-case.
- A single hyphen or dash can be used to represent a gap of indeterminate length.
- In amino acid (protein) sequences, U and * are acceptable letters.
- N for unknown nucleic acid residue or X for unknown amino acid residue.
- mRNA is often listed as cDNA, and the U is replaced with T.
The nucleic acid codes supported are:
A --> adenosine           M --> A C (amino)
C --> cytidine            S --> G C (strong)
G --> guanine             W --> A T (weak)
T --> thymidine           B --> G T C
U --> uridine             D --> G A T
R --> G A (purine)        H --> A C T
Y --> T C (pyrimidine)    V --> G C A
K --> G T (keto)          N --> A G C T (any)
                          - --> gap of indeterminate length
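The ambiguity codes in this table can be treated as small sets of bases, which makes it easy to search a sequence for a pattern written in IUPAC notation. The Python sketch below converts such a pattern into a regular expression. The dictionary and the function name `iupac_to_regex` are assumptions for illustration, U is mapped to T to follow the cDNA convention noted above, and the gap character is deliberately not supported.

```python
import re

# Each IUPAC nucleotide code mapped to the bases it can stand for.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

def iupac_to_regex(pattern: str) -> re.Pattern:
    """Compile an IUPAC nucleotide pattern into a regular expression."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC_NUCLEOTIDES[code]  # raises KeyError for unsupported characters
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))

motif = iupac_to_regex("GAYTR")  # Y = C or T, R = A or G
match = motif.search("CCGATTACC")
print(match.group() if match else "no match")
```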
21FASTA File Format
For those programs that use amino acid (protein) query sequences (e.g. BLASTP and TBLASTN), the accepted amino acid codes are:
A --> alanine                    P --> proline
B --> aspartate or asparagine    Q --> glutamine
C --> cysteine                   R --> arginine
D --> aspartate                  S --> serine
E --> glutamate                  T --> threonine
F --> phenylalanine              U --> selenocysteine
G --> glycine                    V --> valine
H --> histidine                  W --> tryptophan
I --> isoleucine                 Y --> tyrosine
K --> lysine                     Z --> glutamate or glutamine
L --> leucine                    X --> any
M --> methionine                 * --> translation stop
N --> asparagine                 - --> gap of indeterminate length
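A query can be checked against this alphabet before it is submitted. The sketch below is a minimal Python check that reports every character outside the accepted codes. The function name `invalid_residues` is an assumption, and lower-case input is accepted because it is mapped to upper-case.

```python
# Accepted one-letter codes from the table, plus '*' (stop) and '-' (gap).
PROTEIN_CODES = set("ABCDEFGHIKLMNPQRSTUVWXYZ*-")

def invalid_residues(sequence: str) -> list:
    """Return (position, character) pairs, 1-based, for unaccepted characters."""
    return [
        (position, residue)
        for position, residue in enumerate(sequence, start=1)
        if residue.upper() not in PROTEIN_CODES
    ]

print(invalid_residues("IPKERYRGQIESIWA"))  # sequence from the Sequence Files slide
print(invalid_residues("MJOHN"))            # J and O are not in the table
```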
22>gi|1924939|emb|X98411.1|HSMYOSIE Homo sapiens partial mRNA for myosin-IF
CAGGAGAAGCTGACCAGCCGCAA
GATGGACAGCCGCTGGGGCGGGCGCAGCGAGTCCATCAATGTGACCC
TCAACGTGGAGCAGGCAGCCTACACCCGTGATGCCCTGGCCAAGGGGCTC
TATGCCCGCCTCTTCGACTT CCTCGTGGAGGCCATCAACCGTGCTATGC
AGAAACCCCAGGAAGAGTACAGCATCGGTGTGCTGGACATT
TACGGCTTCGAGATCTTCCAGAAAAATGGCTTCGAGCAGTTTTGCATCAA
CTTCGTCAATGAGAAGCTGC AGCAAATCTTTATCGAACTTACCCTGAAG
GCCGAGCAGGAGGAGTATGTGCAGGAAGGCATCCGCTGGAC
TCCAATCCAGTACTTCAACAACAAGGTCGTCTGTGACCTCATCGAAAACA
AGCTGAGCCCCCCAGGCATC ATGAGCGTCTTGGACGACGTGTGCGCCAC
CATGCACGCCACGGGCGGGGGAGCAGACCAGACACTGCTGC
AGAAGCTGCAGGCGGCTGTGGGGACCCACGAGCATTTCAACAGCTGGAGC
GCCGGCTTCGTCATCCACCA CTACGCTGGCAAGGTCTCCTACGACGTCA
GCGGCTTCTGCGAGAGGAACCGAGACGTTCTCTTCTCCGAC
CTCATAGAGCTGATGCAGTCCAGTGACCAGGCCTTCCTCCGGATGCTCTT
CCCCGAGAAGCTGGATGGAG ACAAGAAGGGGCGCCCCAGCACCGCCGGC
TCCAAGATCAAGAAACAAGCCAACGACCTGGTGGCCACACT
GATGAGGTGCACACCCCACTACATCCGCTGCATCAAACCCAACGAGACCA
AGCACGCCCGAGACTGGGAG GAGAACAGAGTCCAGCACCAGGTGGAATA
CCTGGGCCTGAAGGAAAACATCAGGGTGCGCAGAGCCGGCT
TCGCCTACCGCCGCCAGTTCGCCAAATTCCTGCAGAGGTATGCCATTCTG
ACCCCCGAGACGTGGCCGCG GTGGCGTGGGGACGAACGCCAGGGCGTCC
AGCACCTGCTTCGGGCGGTCAACATGGAGCCCGACCAGTAC
CAGATGGGGAGCACCAAGGTCTTTGTCAAGAACCCAGAGTCGCTTTTCCT
CCTGGAGGAGGTGCGAGAGC GAAAGTTCGATGGCTTTGCCCGAACCATC
CAGAAGGCCTGGCGGCGCCACGTGGCTGTCCGGAAGTACGA
GGAGATGCGGGAGGAAGCTTCCAACATCCTGCTGAACAAGAAGGAGCGGA
GGCGCAACAGCATCAATCGG AACTTCGTCGGGGACTACCTGGGGCTGGA
GGAGCGGCCCGAGCTGCGTCAGTTCCTGGGCAAGAAGGAGC
GGGTGGACTTCGCCGATTCGGTCACCAAGTACGACCGCCGCTTCAAGCCC
ATCAAGCGGGACTTGATCCT GACGCCCAAGTGTGTGTATGTGATTGGGC
GAGAGAAGATGAAGAAGGGACCTGAGAAAGGTCCAGTGTGT
GAAATCTTGAAGAAGAAATTGGACATCCAGGCTCTGCGGGGGGTCTCCCT
CAGCACGCGACAGGACGACT TCTTCATCCTCCAAGAGGATGCCGCCGAC
AGCTTCCTGGAGAGCGTCTTCAAGACCGAGTTTGTCAGCCT
TCTGTGCAAGCGCTTCGAGGAGGCGACGCGGAGGCCCCTGCCCCTCACCT
TCAGCGACACACTACAGTTT CGGGTGAAGAAGGAGGGCTGGGGCGGTGG
CGGCACCCGCAGCGTCACCTTCTCCCGCGGCTTCGGCGACT
TGGCAGTGCTCAAGGTTGGCGGTCGGACCCTCACGGTCAGCGTGGGCGAT
GGGCTGCCCAAGAACTCCAA GCCTACCGGAAAGGGATTGGCCAAGGGTA
AACCTCGGAGGTCGTCCCAAGCCCCTACCCGGGCGGCCCCT
GGCGCCCCCCAAGGCATGGATCGAAATGGGGCCCCCCTCTGCCCACAGGG
GGGGGCCCCCTGCCCCCTGG AGAAATTCATTTGGCCCAGGGGGCACCCA
CAGGCCTCCCCGGCCCTCCGTCCACATCCCTGGGATGCCAG
CAGACGACCCCGGGCACGTCCGCCCTCAGAGCACAACACAGAATTCCTCA
ACGTGCCTGACCAGGGGATG GCCGGCATGCAGAGGAAGCGCAGCGTGGG
GCAACGGCCAGTGCCTGTGGGCCGACCCAAGCCCCAGCCTC
GGACACATGGTCCCAGGTGCCGGGCCCTATACCAGTACGTGGGCCAAGAT
GTGGACGAGCTGAGCTTCAA CGTGAACGAGGTCATTGAGATCCTCATGG
AAGATCCCTCGGGCTGGTGGAAGGGCCGGCTTCACGGCCAG
GAGGGCCTTTTCCCAGGAAACTACGTGGAGAAGATCTGAGCTGGGCCCTG
GGATACTGCCTTCTCTTTCG CCCGCCTATCTGCCTGCCGGCCTGGTGGG
GAGCCAGGCCCTGCCAATGAAAGCCTCGTTTACCTGGGCTG
CAATAGCCTAAAAGTCCAATCCTTTGGCCTCCAGTCCTTGCCCAGGCCCT
GGGTCACCAGGTCACTGGTG CAGCCCCCGCCCCTGGGCCCTGGTTTTCC
TCCAACATCACACCTGCTGCCCATTGTCCAAAACTGTGTGT
GTCAAAGGGGACTAACAGCAGAATTTACCTCCCAACTGCCATGTGATTAA
GAAATGGGTCTTGAGTCCTG TGCTGTTGGCAAAGTTCCAGGCACAGTTG
GGGAGGGGGGGCCGGAATCCGC
FASTA File Format
23>gi|1924940|emb|CAA67058.1| myosin-IF [Homo sapiens]
QEKLTSRKMDSRWGGRSESINVTLNVEQAAYTRDALAKGLY
ARLFDFLVEAINRAMQKPQEEYSIGVLDI YGFEIFQKNGFEQFCINFVN
EKLQQIFIELTLKAEQEEYVQEGIRWTPIQYFNNKVVCDLIENKLSPPGI
MSVLDDVCATMHATGGGADQTLLQKLQAAVGTHEHFNSWSAGFVIHHYA
GKVSYDVSGFCERNRDVLFSD LIELMQSSDQAFLRMLFPEKLDGDKKGR
PSTAGSKIKKQANDLVATLMRCTPHYIRCIKPNETKHARDWE
ENRVQHQVEYLGLKENIRVRRAGFAYRRQFAKFLQRYAILTPETWPRWRG
DERQGVQHLLRAVNMEPDQY QMGSTKVFVKNPESLFLLEEVRERKFDGF
ARTIQKAWRRHVAVRKYEEMREEASNILLNKKERRRNSINR
NFVGDYLGLEERPELRQFLGKKERVDFADSVTKYDRRFKPIKRDLILTPK
CVYVIGREKMKKGPEKGPVC EILKKKLDIQALRGVSLSTRQDDFFILQE
DAADSFLESVFKTEFVSLLCKRFEEATRRPLPLTFSDTLQF
RVKKEGWGGGGTRSVTFSRGFGDLAVLKVGGRTLTVSVGDGLPKNSKPTG
KGLAKGKPRRSSQAPTRAAP GAPQGMDRNGAPLCPQGGAPCPLEKFIWP
RGHPQASPALRPHPWDASRRPRARPPSEHNTEFLNVPDQGM
AGMQRKRSVGQRPVPVGRPKPQPRTHGPRCRALYQYVGQDVDELSFNVNE
VIEILMEDPSGWWKGRLHGQ EGLFPGNYVEKI
FASTA File Format
<?xml version="1.0"?>
<!DOCTYPE TSeq PUBLIC "-//NCBI//NCBI TSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">
<TSeq>
<TSeq_seqtype value="nucleotide"/>
<TSeq_gi>1924939</TSeq_gi>
<TSeq_accver>X98411.1</TSeq_accver>
<TSeq_taxid>9606</TSeq_taxid>
<TSeq_orgname>Homo sapiens</TSeq_orgname>
<TSeq_defline>Homo sapiens partial mRNA for myosin-IF</TSeq_defline>
<TSeq_length>2711</TSeq_length>
<TSeq_sequence>CAGGAGAAGCTGACCAGCCGCAAGATGGACAGCCGCTGGGGCGGGCGCAGCGAGTCCATCAATGT...</TSeq_sequence>
</TSeq>
TinySeq XML
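TinySeq XML carries the same record as the FASTA file, but with each piece of the description line in its own element, so it can be read with any standard XML parser. The Python sketch below uses the standard library's `xml.etree.ElementTree`. The function name `read_tinyseq` and the dictionary keys are assumptions, the XML declaration and DOCTYPE are omitted, and the sequence is shortened for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record; the sequence is truncated for illustration.
TINYSEQ = """<TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_gi>1924939</TSeq_gi>
  <TSeq_accver>X98411.1</TSeq_accver>
  <TSeq_taxid>9606</TSeq_taxid>
  <TSeq_orgname>Homo sapiens</TSeq_orgname>
  <TSeq_defline>Homo sapiens partial mRNA for myosin-IF</TSeq_defline>
  <TSeq_length>2711</TSeq_length>
  <TSeq_sequence>CAGGAGAAGCTGACCAGCCGCAAGATGGACAGCCGCTGG</TSeq_sequence>
</TSeq>"""

def read_tinyseq(xml_text: str) -> dict:
    """Extract the main fields of a single TSeq element."""
    root = ET.fromstring(xml_text)
    return {
        "seqtype": root.find("TSeq_seqtype").get("value"),
        "gi": root.findtext("TSeq_gi"),
        "accession": root.findtext("TSeq_accver"),
        "taxid": int(root.findtext("TSeq_taxid")),
        "organism": root.findtext("TSeq_orgname"),
        "defline": root.findtext("TSeq_defline"),
        "length": int(root.findtext("TSeq_length")),
        "sequence": root.findtext("TSeq_sequence"),
    }

record = read_tinyseq(TINYSEQ)
print(record["accession"], record["organism"], record["length"])
```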
24FASTA File Format (note: U is written as T)
>gi|1234|my name from genetic code in DNA
ATGATTTGTCACGCTGAGCTC-AAAGCTAACGAGTAA
>gi|1234|my name translated into protein
MICHAEL-KANE
A --> alanine                    P --> proline
B --> aspartate or asparagine    Q --> glutamine
C --> cysteine                   R --> arginine
D --> aspartate                  S --> serine
E --> glutamate                  T --> threonine
F --> phenylalanine              U --> selenocysteine
G --> glycine                    V --> valine
H --> histidine                  W --> tryptophan
I --> isoleucine                 Y --> tyrosine
K --> lysine                     Z --> glutamate or glutamine
L --> leucine                    X --> any
M --> methionine                 * --> translation stop
N --> asparagine                 - --> gap of indeterminate length
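The translation on this slide can be reproduced by reading the DNA three bases at a time and looking each codon up in the standard genetic code. The Python sketch below builds the codon table from the conventional 64-letter summary string (bases ordered T, C, A, G). The function name `translate` is an assumption. Copying a single in-frame hyphen through as a gap mirrors the slide's example rather than any formal rule, and unrecognized codons are reported as X.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TTT, TTC, TTA, TTG, TCT, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")  # accept mRNA written with U
    protein = []
    i = 0
    while i < len(dna):
        if dna[i] == "-":
            # Assumption taken from the slide: a lone hyphen becomes a gap.
            protein.append("-")
            i += 1
            continue
        codon = dna[i:i + 3]
        if len(codon) < 3:
            break  # ignore an incomplete trailing codon
        residue = CODON_TABLE.get(codon, "X")
        if residue == "*":
            break  # translation stop
        protein.append(residue)
        i += 3
    return "".join(protein)

print(translate("ATGATTTGTCACGCTGAGCTC-AAAGCTAACGAGTAA"))
```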
25Where do we get DNA sequence information?
DNA Sequencing Methods: conversion of biological/bioanalytical data into sequence information.
There are automated, high-throughput sequencing centers that COMPLETELY automate (robotics and information systems) DNA sequencing, preliminary identification and publishing.
26DNA Sequencing (old method)
5'-AAACCAGGCCGATAAGGTACTACACGAAAAAAA-3'
TTTTTTT
AAACCAGGCCGATAAGGTACTACACGAAAAA
Step 1. Extend the complementary sequence using free nucleotides with limiting amounts of radioactive terminating nucleotides.
Step 2. Run the product out on an electrophoresis gel.
Step 3. Place the gel against radiographic film and develop.
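The read-out logic behind Steps 1 to 3 can be sketched in a few lines of Python. This is an idealized model, not a simulation of the chemistry: it assumes one reaction (lane) per terminating nucleotide, that every possible terminated fragment is present, and that fragment length is counted from the end of the primer. The function names `run_lanes` and `read_gel` are hypothetical, and the example strand is the first 13 bases that would be added to the TTTTTTT primer on the template shown above.

```python
def run_lanes(new_strand: str) -> dict:
    """For each terminating nucleotide, list the fragment lengths ending in it."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        # A fragment of this length exists in the lane whose terminator is `base`.
        lanes[base].append(length)
    return lanes

def read_gel(lanes: dict) -> str:
    """Read bands from shortest to longest, noting which lane each band is in."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)

extension = "CGTGTAGTACCTT"  # complement of the template next to the poly-A tail
lanes = run_lanes(extension)
for base in "ACGT":
    print(base, "lane bands at lengths:", lanes[base])
print("sequence read from gel:", read_gel(lanes))
```

Shorter fragments migrate further on the gel, so reading the bands in order of increasing length recovers the newly synthesized strand in its 5' to 3' direction.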
27DNA Sequencing (new method)
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/D/DNAsequencing.html