Biomedical Informatics - PowerPoint PPT Presentation

1
Biomedical Informatics CIT 499B
Michael D. Kane, Ph.D. FALL 2009
2
The Cell is a Living Machine
3
DNA is Information Storage
4
Zipped Files
Decompression
Executable Files
5
DNA is Double Stranded
One strand is the "coding strand" and the other
strand is there to stabilize the DNA sequence when
not in use. Double-stranded DNA is very durable in
our environment.
6
CAGGACCATGGAACTCAGCGTCCTCCTCTTCCTTGCACTCCTCACAGGAC
TCTTGCTACTCCTGGTTCAGCGCCACCCTAACACCCATGACCGCCTCCCA
CCAGGGCCCCGCCCTCTGCCCCTTTTGGGAAACCTTCTGCAGATGGATAG
AAGAGGCCTACTCAAATCCTTTCTGAGGTTCCGAGAGAAATATGGGGACG
TCTTCACGGTACACCTGGGACCGAGGCCCGTGGTCATGCTGTGTGGAGTA
GAGGCCATACGGGAGGCCCTTGTGGACAAGGCTGAGGCCTTCTCTGGCCG
GGGAAAAATCGCCATGGTCGACCCATTCTTCCGGGGATATGGTGTGATCT
TTGCCAATGGAAACCGCTGGAAGGTGCTTCGGCGATTCTCTGTGACCACT
ATGAGGGACTTCGGGATGGGAAAGCGGAGTGTGGAGGAGCGGATTCAGGA
GGAGGCTCAGTGTCTGATAGAGGAGCTTCGGAAATCCAAGGGGGCCCTCA
TGGACCCCACCTTCCTCTTCCAGTCCATTACCGCCAACATCATCTGCTCC
ATCGTCTTTGGAAAACGATTCCACTACCAAGATCAAGAGTTCCTGAAGAT
GCTGAACTTGTTCTACCAGACTTTTTCACTCATCAGCTCTGTATTCGGCC
AGCTGTTTGAGCTCTTCTCTGGCTTCTTGAAATACTTTCCTGGGGCACAC
AGGCAAGTTTACAAAAACCTGCAGGAAATCAATGCTTACATTGGCCACAG
TGTGGAGAAGCACCGTGAAACCCTGGACCCCAGCGCCCCCAAGGACCTCA
TCGACACCTACCTGCTCCACATGGAAAAAGAGAAATCCAACGCACACAGT
GAATTCAGCCACCAGAACCTCAACCTCAACACGCTCTCGCTCTTCTTTGC
TGGCACTGAGACCACCAGCACCACTCTCCGCTACGGCTTCCTGCTCATGC
TCAAATACCCTCATGTTGCAGAGAGAGTCTACAGGGAGATTGAACAGGTG
ATTGGCCCACATCGCCCTCCAGAGCTTCATGACCGAGCCAAAATGCCATA
CACAGAGGCAGTCATCTATGAGATTCAGAGATTTTCCGACCTTCTCCCCA
TGGGTGTGCCCCACATTGTCACCCAACACACCAGCTTCCGAGGGTACATC
ATCCCCAAGGACACAGAAGTATTTCTCATCCTGAGCACTGCTCTCCATGA
CCCACACTA
7
THEREDCAT_HSDKLSD_WASNOTHOTBUT_WKKNASDNKSAOJ.ASDNA
LKS_WASWET_ASDFLKSDOFIJEIJKNAWDFN_ANDMAD_WERN.JSND
FJN_YETSAD_MNSFDGPOIJD_BUTTHEFOX_SDKMFIDSJIR.JER_G
OTWET_JSN.DFOIAMNJNER_ANDATEHIM.
8
Start with a thin 2 x 4 lego block
9
(No Transcript)
10
What are the comparative genome sizes of humans
and other organisms being studied?
Genome size does not correlate with evolutionary
status, nor is the number of genes proportionate
with genome size.
11
Molecular Biology TERMS
DNA: deoxyribonucleic acid, a polymer of 4 nucleotides or bases arranged in an anti-parallel, double helix structure. Permanent genetic information.
  Adenine (A), Guanine (G), Thymine (T), Cytosine (C)
RNA: ribonucleic acid, represented by 4 nucleotides or bases. Transient genetic information. Same as DNA, except Thymine (T) is replaced by Uracil (U).
  mRNA: messenger RNA
  hnRNA: heterogeneous nuclear RNA
  rRNA: ribosomal RNA
  tRNA: transfer RNA
Gene: subsection of a chromosome that encodes a specific protein.
Protein: structural and active component of living organisms.
cDNA: complementary DNA, refers to the other strand in a double helix, and often describes the complementary DNA strand to mRNA. Many mRNAs are published as cDNA.
Genomics: the study of genetic content in a high-throughput or high-content manner.
Proteomics: the study of proteins in a high-throughput or high-content manner.
SNP: Single Nucleotide Polymorphism, a single base pair that differs between 2 genes or 2 or more people, but does not necessarily lead to a disease or disorder (unlike a mutation).
Oligonucleotide: short DNA strand, usually chemically synthesized and less than 100 nucleotides in length. Slang: "oligo".
12
Cell Biology TERMS
Nucleus: subcellular organelle that harbors chromosomes (genes, genetic content) and where RNA (hnRNA) splicing occurs.
Ribosome: subcellular organelle where mRNA is TRANSLATED into proteins.
Chromosome: the large genetic structures within the nucleus made up of packaged genes.
Plasma Membrane or Cell Membrane: lipid bilayer that represents the outside wall of the cell; some subcellular organelles also have a lipid bilayer wall (e.g. nucleus, mitochondria).
Cytosol or Cytoplasm: intracellular contents of the cell, within the plasma membrane.
13
  • Concepts
  • Survey the NCBI web site, resources,
    capabilities.
  • Describe key components of the FASTA file
    structure.
  • Annotation of double-stranded DNA (5' to 3', i.e.
    5 prime to 3 prime)
  • Sequencing Technology

14
WWW.NCBI.NLM.NIH.GOV
PubMed: Scientific Journals
Entrez: Keyword Search of Database
BLAST: Sequence Queries
OMIM: Online Mendelian Inheritance in Man
Books
TaxBrowser
Structure: 3D Molecular Structures
15
Sequence Files
Since the information relevant to biological
processes is contained in the gene or protein
sequence, all genetic and protein data are
contained in sequence files. Importantly, there is
a directionality that exists in nature that is
conserved in the sequence file.
Nucleic acids are always written 5' to 3' (named
for the 5' and 3' positions of the sugar that take
part in the phosphodiester bond).
nucleic acids (genes): 5'-AGCTCGTGTAGACCATTC-3'
Amino acids are always written with the free amino
group (N-terminus) first and the carboxylic acid
(C-terminus) last.
amino acids (proteins): amino-IPKERYRGQIESIWA-carboxy
16
DNA is Double Stranded
Anti-parallel Configuration
Top strand is ALWAYS written 5' to 3'.
When DNA is written in a file, the top strand is
represented and the bottom strand is assumed.
5'-AGTCGTGATCTGCTAAATGTCTCGAAGTTCGATGCTAG-3'
3'-TCAGCACTAGACGATTTACAGAGCTTCAAGCTACGATC-5'
Courier font is preferred for writing sequence
data since letter spacing is independent of
character content.
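Because the bottom strand is implied by the top strand, it can be derived rather than stored. The following is a minimal Python sketch of that idea for the top strand shown on this slide; the names `PAIRS`, `complement`, and `reverse_complement` are illustrative and do not come from the course material.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(top: str) -> str:
    """Bottom strand as it sits under the top strand (reads 3' to 5')."""
    return "".join(PAIRS[base] for base in top.upper())

def reverse_complement(top: str) -> str:
    """Bottom strand rewritten in the conventional 5' to 3' direction."""
    return complement(top)[::-1]

top = "AGTCGTGATCTGCTAAATGTCTCGAAGTTCGATGCTAG"
print("5'-" + top + "-3'")
print("3'-" + complement(top) + "-5'")
print("5'-" + reverse_complement(top) + "-3'")
```

The sketch only handles the four unambiguous bases; any other character raises a `KeyError`.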
17
>gi|1924939|emb|X98411.1|HSMYOSIE Homo sapiens partial mRNA for myosin-IF
CAGGAGAAGCTGACCAGCCGCAA
GATGGACAGCCGCTGGGGCGGGCGCAGCGAGTCCATCAATGTGACCC
TCAACGTGGAGCAGGCAGCCTACACCCGTGATGCCCTGGCCAAGGGGCTC
TATGCCCGCCTCTTCGACTT CCTCGTGGAGGCCATCAACCGTGCTATGC
AGAAACCCCAGGAAGAGTACAGCATCGGTGTGCTGGACATT
TACGGCTTCGAGATCTTCCAGAAAAATGGCTTCGAGCAGTTTTGCATCAA
CTTCGTCAATGAGAAGCTGC AGCAAATCTTTATCGAACTTACCCTGAAG
GCCGAGCAGGAGGAGTATGTGCAGGAAGGCATCCGCTGGAC
TCCAATCCAGTACTTCAACAACAAGGTCGTCTGTGACCTCATCGAAAACA
AGCTGAGCCCCCCAGGCATC ATGAGCGTCTTGGACGACGTGTGCGCCAC
CATGCACGCCACGGGCGGGGGAGCAGACCAGACACTGCTGC
AGAAGCTGCAGGCGGCTGTGGGGACCCACGAGCATTTCAACAGCTGGAGC
GCCGGCTTCGTCATCCACCA CTACGCTGGCAAGGTCTCCTACGACGTCA
GCGGCTTCTGCGAGAGGAACCGAGACGTTCTCTTCTCCGAC
CTCATAGAGCTGATGCAGTCCAGTGACCAGGCCTTCCTCCGGATGCTCTT
CCCCGAGAAGCTGGATGGAG ACAAGAAGGGGCGCCCCAGCACCGCCGGC
TCCAAGATCAAGAAACAAGCCAACGACCTGGTGGCCACACT
GATGAGGTGCACACCCCACTACATCCGCTGCATCAAACCCAACGAGACCA
AGCACGCCCGAGACTGGGAG GAGAACAGAGTCCAGCACCAGGTGGAATA
CCTGGGCCTGAAGGAAAACATCAGGGTGCGCAGAGCCGGCT
TCGCCTACCGCCGCCAGTTCGCCAAATTCCTGCAGAGGTATGCCATTCTG
ACCCCCGAGACGTGGCCGCG GTGGCGTGGGGACGAACGCCAGGGCGTCC
AGCACCTGCTTCGGGCGGTCAACATGGAGCCCGACCAGTAC
CAGATGGGGAGCACCAAGGTCTTTGTCAAGAACCCAGAGTCGCTTTTCCT
CCTGGAGGAGGTGCGAGAGC GAAAGTTCGATGGCTTTGCCCGAACCATC
CAGAAGGCCTGGCGGCGCCACGTGGCTGTCCGGAAGTACGA
GGAGATGCGGGAGGAAGCTTCCAACATCCTGCTGAACAAGAAGGAGCGGA
GGCGCAACAGCATCAATCGG AACTTCGTCGGGGACTACCTGGGGCTGGA
GGAGCGGCCCGAGCTGCGTCAGTTCCTGGGCAAGAAGGAGC
GGGTGGACTTCGCCGATTCGGTCACCAAGTACGACCGCCGCTTCAAGCCC
ATCAAGCGGGACTTGATCCT GACGCCCAAGTGTGTGTATGTGATTGGGC
GAGAGAAGATGAAGAAGGGACCTGAGAAAGGTCCAGTGTGT
GAAATCTTGAAGAAGAAATTGGACATCCAGGCTCTGCGGGGGGTCTCCCT
CAGCACGCGACAGGACGACT TCTTCATCCTCCAAGAGGATGCCGCCGAC
AGCTTCCTGGAGAGCGTCTTCAAGACCGAGTTTGTCAGCCT
TCTGTGCAAGCGCTTCGAGGAGGCGACGCGGAGGCCCCTGCCCCTCACCT
TCAGCGACACACTACAGTTT CGGGTGAAGAAGGAGGGCTGGGGCGGTGG
CGGCACCCGCAGCGTCACCTTCTCCCGCGGCTTCGGCGACT
TGGCAGTGCTCAAGGTTGGCGGTCGGACCCTCACGGTCAGCGTGGGCGAT
GGGCTGCCCAAGAACTCCAA GCCTACCGGAAAGGGATTGGCCAAGGGTA
AACCTCGGAGGTCGTCCCAAGCCCCTACCCGGGCGGCCCCT
GGCGCCCCCCAAGGCATGGATCGAAATGGGGCCCCCCTCTGCCCACAGGG
GGGGGCCCCCTGCCCCCTGG AGAAATTCATTTGGCCCAGGGGGCACCCA
CAGGCCTCCCCGGCCCTCCGTCCACATCCCTGGGATGCCAG
CAGACGACCCCGGGCACGTCCGCCCTCAGAGCACAACACAGAATTCCTCA
ACGTGCCTGACCAGGGGATG GCCGGCATGCAGAGGAAGCGCAGCGTGGG
GCAACGGCCAGTGCCTGTGGGCCGACCCAAGCCCCAGCCTC
GGACACATGGTCCCAGGTGCCGGGCCCTATACCAGTACGTGGGCCAAGAT
GTGGACGAGCTGAGCTTCAA CGTGAACGAGGTCATTGAGATCCTCATGG
AAGATCCCTCGGGCTGGTGGAAGGGCCGGCTTCACGGCCAG
GAGGGCCTTTTCCCAGGAAACTACGTGGAGAAGATCTGAGCTGGGCCCTG
GGATACTGCCTTCTCTTTCG CCCGCCTATCTGCCTGCCGGCCTGGTGGG
GAGCCAGGCCCTGCCAATGAAAGCCTCGTTTACCTGGGCTG
CAATAGCCTAAAAGTCCAATCCTTTGGCCTCCAGTCCTTGCCCAGGCCCT
GGGTCACCAGGTCACTGGTG CAGCCCCCGCCCCTGGGCCCTGGTTTTCC
TCCAACATCACACCTGCTGCCCATTGTCCAAAACTGTGTGT
GTCAAAGGGGACTAACAGCAGAATTTACCTCCCAACTGCCATGTGATTAA
GAAATGGGTCTTGAGTCCTG TGCTGTTGGCAAAGTTCCAGGCACAGTTG
GGGAGGGGGGGCCGGAATCCGC
FASTA File Format
18
FASTA File Format
  • A sequence in FASTA format begins with a
    single-line description, followed by lines of
    sequence data.
  • 1) The description line starts with a greater
    than symbol (">").
  • 2) The word immediately following the greater
    than symbol (">") is the "ID" (name) of the
    sequence; the rest of the line is the
    description. The "ID" and the description are
    optional.
  • 3) All lines of text should be shorter than 80
    characters.
  • 4) The sequence ends if there is another greater
    than symbol (">") at the beginning of a
    line, at which point another sequence begins.
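The rules above are enough to read a FASTA file with a few lines of code. Below is a minimal Python sketch, not a full parser: the function name `parse_fasta` and the two toy records are made up for illustration, and real tools handle many more edge cases.

```python
from typing import Dict, Iterable, Tuple

def parse_fasta(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each sequence ID to a (description, sequence) pair.

    A line starting with ">" opens a new record; all following lines
    up to the next ">" are joined into one sequence string.
    Records that share an ID overwrite each other in this sketch.
    """
    records: Dict[str, Tuple[str, str]] = {}
    seq_id, description, chunks = None, "", []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records[seq_id] = (description, "".join(chunks))
            header = line[1:].split(None, 1)  # ID, then optional description
            seq_id = header[0] if header else ""
            description = header[1] if len(header) > 1 else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        records[seq_id] = (description, "".join(chunks))
    return records

# Toy input (hypothetical records, not real database entries).
example = """>seq1 first toy record
ACGTAC
GTTA
>seq2 second toy record
MKV
"""
records = parse_fasta(example.splitlines())
for name, (desc, seq) in records.items():
    print(name, "|", desc, "|", len(seq))
```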

19
FASTA File Format
The following example contains two protein
sequences (Example1, Example2):
>Example1 envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
>Example2 synthetic peptide
HITREPLKHIPKERYRGTNDTLSPQIESIWAAELDRYKLVKTNCSNVS
20
FASTA File Format
  • Sequences are expected to be represented in the
    standard IUB/IUPAC amino acid and nucleic acid
    codes, with these exceptions
  • Lower-case letters are accepted and are mapped
    into upper-case
  • A single hyphen or dash can be used to represent
    a gap of indeterminate length
  • In amino acid (protein) sequences, U and * are
    acceptable letters.
  • N for unknown nucleic acid residue or X for
    unknown amino acid residue.
  • mRNA is often listed as cDNA, and the U is
    replaced with T

The nucleic acid codes supported are:
A --> adenosine           M --> A C (amino)
C --> cytidine            S --> G C (strong)
G --> guanine             W --> A T (weak)
T --> thymidine           B --> G T C
U --> uridine             D --> G A T
R --> G A (purine)        H --> A C T
Y --> T C (pyrimidine)    V --> G C A
K --> G T (keto)          N --> A G C T (any)
                          - --> gap of indeterminate length
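The ambiguity codes in this table can be treated as sets of allowed bases. The sketch below, written in Python, checks whether a concrete sequence is compatible with a pattern that uses these codes. The names `IUPAC` and `matches` are illustrative assumptions, U is treated as T (following the cDNA convention noted on this slide), and the gap character is deliberately not supported.

```python
# IUPAC nucleotide codes, each mapped to the concrete bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

def matches(pattern: str, sequence: str) -> bool:
    """True if each base of `sequence` is allowed by the code at that position."""
    pattern = pattern.upper()
    sequence = sequence.upper().replace("U", "T")  # lower case and U are accepted
    if len(pattern) != len(sequence):
        return False
    # Unknown pattern characters (including "-") allow nothing in this sketch.
    return all(base in IUPAC.get(code, "") for code, base in zip(pattern, sequence))

print(matches("GAYTNR", "GATTCA"))  # Y allows T, N allows C, R allows A
print(matches("GAYTNR", "GAATCA"))  # A is not allowed by Y (T or C)
```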
21
FASTA File Format
For those programs that use amino acid (protein)
query sequences (e.g. BLASTP and TBLASTN), the
accepted amino acid codes are:
A --> alanine                    P --> proline
B --> aspartate or asparagine    Q --> glutamine
C --> cysteine                   R --> arginine
D --> aspartate                  S --> serine
E --> glutamate                  T --> threonine
F --> phenylalanine              U --> selenocysteine
G --> glycine                    V --> valine
H --> histidine                  W --> tryptophan
I --> isoleucine                 Y --> tyrosine
K --> lysine                     Z --> glutamate or glutamine
L --> leucine                    X --> any
M --> methionine                 * --> translation stop
N --> asparagine                 - --> gap of indeterminate length
22
>gi|1924939|emb|X98411.1|HSMYOSIE Homo sapiens partial mRNA for myosin-IF
(Nucleotide sequence repeated from slide 17.)
FASTA File Format
23
>gi|1924940|emb|CAA67058.1| myosin-IF [Homo sapiens]
QEKLTSRKMDSRWGGRSESINVTLNVEQAAYTRDALAKGLY
ARLFDFLVEAINRAMQKPQEEYSIGVLDI YGFEIFQKNGFEQFCINFVN
EKLQQIFIELTLKAEQEEYVQEGIRWTPIQYFNNKVVCDLIENKLSPPGI
MSVLDDVCATMHATGGGADQTLLQKLQAAVGTHEHFNSWSAGFVIHHYA
GKVSYDVSGFCERNRDVLFSD LIELMQSSDQAFLRMLFPEKLDGDKKGR
PSTAGSKIKKQANDLVATLMRCTPHYIRCIKPNETKHARDWE
ENRVQHQVEYLGLKENIRVRRAGFAYRRQFAKFLQRYAILTPETWPRWRG
DERQGVQHLLRAVNMEPDQY QMGSTKVFVKNPESLFLLEEVRERKFDGF
ARTIQKAWRRHVAVRKYEEMREEASNILLNKKERRRNSINR
NFVGDYLGLEERPELRQFLGKKERVDFADSVTKYDRRFKPIKRDLILTPK
CVYVIGREKMKKGPEKGPVC EILKKKLDIQALRGVSLSTRQDDFFILQE
DAADSFLESVFKTEFVSLLCKRFEEATRRPLPLTFSDTLQF
RVKKEGWGGGGTRSVTFSRGFGDLAVLKVGGRTLTVSVGDGLPKNSKPTG
KGLAKGKPRRSSQAPTRAAP GAPQGMDRNGAPLCPQGGAPCPLEKFIWP
RGHPQASPALRPHPWDASRRPRARPPSEHNTEFLNVPDQGM
AGMQRKRSVGQRPVPVGRPKPQPRTHGPRCRALYQYVGQDVDELSFNVNE
VIEILMEDPSGWWKGRLHGQ EGLFPGNYVEKI
FASTA File Format
<?xml version="1.0"?>
<!DOCTYPE TSeq PUBLIC "-//NCBI//NCBI TSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">
<TSeq>
<TSeq_seqtype value="nucleotide"/>
<TSeq_gi>1924939</TSeq_gi>
<TSeq_accver>X98411.1</TSeq_accver>
<TSeq_taxid>9606</TSeq_taxid>
<TSeq_orgname>Homo sapiens</TSeq_orgname>
<TSeq_defline>Homo sapiens partial mRNA for myosin-IF</TSeq_defline>
<TSeq_length>2711</TSeq_length>
<TSeq_sequence>CAGGAGAAGCTGACCAGCCGCAAGATGGACAGCCGCTGGGGCGGGCGCAGCGAGTCCATCAATGT
</TSeq>
TinySeq XML
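Since TinySeq XML carries the same fields as a FASTA record, one form can be converted to the other. The Python sketch below uses the standard library `xml.etree.ElementTree` module and the element names shown on this slide. The record `toy_xml`, its accession `TOY0001.1`, and the function name `tinyseq_to_fasta` are made up for illustration and are not real database content.

```python
import xml.etree.ElementTree as ET

# Hypothetical, complete toy record using the TinySeq element names.
toy_xml = """<TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_accver>TOY0001.1</TSeq_accver>
  <TSeq_orgname>Homo sapiens</TSeq_orgname>
  <TSeq_defline>toy record for illustration</TSeq_defline>
  <TSeq_length>12</TSeq_length>
  <TSeq_sequence>CAGGAGAAGCTG</TSeq_sequence>
</TSeq>"""

def tinyseq_to_fasta(xml_text: str) -> str:
    """Build a two-line FASTA record from one TinySeq <TSeq> element."""
    root = ET.fromstring(xml_text)
    accver = root.findtext("TSeq_accver")
    defline = root.findtext("TSeq_defline")
    sequence = root.findtext("TSeq_sequence")
    return f">{accver} {defline}\n{sequence}"

print(tinyseq_to_fasta(toy_xml))
```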
24
FASTA File Format (note: U is written as T)
>gi|1234|my name from genetic code in DNA
ATGATTTGTCACGCTGAGCTC-AAAGCTAACGAGTAA
>gi|1234|my name translated into protein
MICHAEL-KANE
A --> alanine                    P --> proline
B --> aspartate or asparagine    Q --> glutamine
C --> cysteine                   R --> arginine
D --> aspartate                  S --> serine
E --> glutamate                  T --> threonine
F --> phenylalanine              U --> selenocysteine
G --> glycine                    V --> valine
H --> histidine                  W --> tryptophan
I --> isoleucine                 Y --> tyrosine
K --> lysine                     Z --> glutamate or glutamine
L --> leucine                    X --> any
M --> methionine                 * --> translation stop
N --> asparagine                 - --> gap of indeterminate length
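The step from the DNA record to the protein record on this slide can be reproduced in code. The Python sketch below builds the standard genetic code and reads the DNA three bases (one codon) at a time. Treating the hyphen as a separator between two in-frame segments is an assumption made to match the slide, and the names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna: str) -> str:
    """Translate each gap-separated segment codon by codon.

    Translation of a segment ends at its first stop codon ("*").
    Codons with ambiguous bases become "X" (any amino acid).
    """
    pieces = []
    for segment in dna.upper().replace("U", "T").split("-"):
        residues = []
        for i in range(0, len(segment) - 2, 3):
            aa = CODON_TABLE.get(segment[i:i + 3], "X")
            if aa == "*":
                break
            residues.append(aa)
        pieces.append("".join(residues))
    return "-".join(pieces)

print(translate("ATGATTTGTCACGCTGAGCTC-AAAGCTAACGAGTAA"))  # MICHAEL-KANE
```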
25
Where do we get DNA sequence information?
DNA Sequencing Methods: conversion of
biological/bioanalytical data into sequence
information.
There are automated, high-throughput sequencing
centers that COMPLETELY automate (robotics and
information systems) DNA sequencing, preliminary
identification and publishing.
26
DNA Sequencing (old method)
5'-AAACCAGGCCGATAAGGTACTACACGAAAAAAA-3'
TTTTTTT
AAACCAGGCCGATAAGGTACTACACGAAAAA


Step 1. Extend the complementary sequence using
free nucleotides with limiting amounts of
radioactive terminating nucleotides.
Step 2. Run the product out on an electrophoresis gel.
Step 3. Place the gel against radiographic film and develop.
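The logic behind these three steps can be mimicked with an idealized model: every possible terminated fragment is assumed to be present, each lane of the gel holds the fragments ending in one terminating nucleotide, and reading the bands from shortest to longest spells out the newly made strand. The Python sketch below uses the template and primer from this slide. The function names are illustrative, and the model ignores all experimental noise.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesized_strand(template: str) -> str:
    """Strand built by extension: reverse complement of the template, 5' to 3'."""
    return "".join(PAIRS[b] for b in reversed(template.upper()))

def terminated_fragments(new_strand: str, primer_len: int) -> dict:
    """Step 1: for each terminating nucleotide, the fragment lengths it yields."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length in range(primer_len + 1, len(new_strand) + 1):
        lanes[new_strand[length - 1]].append(length)
    return lanes

def read_gel(lanes: dict) -> str:
    """Steps 2 and 3: read bands from shortest to longest fragment."""
    by_length = {n: base for base, lengths in lanes.items() for n in lengths}
    return "".join(by_length[n] for n in sorted(by_length))

template = "AAACCAGGCCGATAAGGTACTACACGAAAAAAA"
primer = "TTTTTTT"  # anneals to the run of A's at the 3' end of the template
new_strand = synthesized_strand(template)
lanes = terminated_fragments(new_strand, len(primer))
read = read_gel(lanes)
print(read)                      # bases added after the primer, 5' to 3'
print(synthesized_strand(read))  # the template region that was sequenced
```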
27
DNA Sequencing (new method)
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/D/DNAsequencing.html