Title: InterPro/prosite UCSC Genome Browser Exercise 3
1InterPro/prosite UCSC Genome Browser Exercise 3
2Turning information into knowledge
- The outcome of a sequencing project is masses of raw data
- The challenge is to turn this raw data into biological knowledge
- A valuable tool for this challenge is an automated diagnostic pipeline through which newly determined sequences can be passed
3From sequence to function
- Nature tends to innovate rather than invent
- Proteins are composed of functional elements: domains and motifs
- Domains are structural units that carry out a certain function
- The same domains are shared between different proteins
- Motifs are shorter sequences with certain biological activity
4http://www.ebi.ac.uk/interpro/
5InterPro
- An integrated documentation resource for protein families, domains and sites
- Groups signatures describing the same protein family or domain
- Combines a number of databases that use different methodologies to derive protein signatures
- UniProt: UniProtKB (Swiss-Prot, TrEMBL), UniRef, UniParc
- prosite: documented DB on domains, families and functional sites
- Pfam: a DB of protein families represented by MSAs
6Member databases
- Sequence-motif methods
- Protein signature DBs with different focus
- Sequence-cluster methods
- Hierarchically clustered sequence/structure DBs
7InterPro search
10http://www.expasy.ch/prosite/
11prosite
- A method for determining the function of uncharacterized translated protein sequences
- Consists of a DB of annotated biologically important sites/patterns/motifs/signatures/fingerprints
12prosite
- Entries are represented with patterns or profiles
profile
     1     2     3     4     5
A    0.66  1     0     0     .
T    0     0     0     1     .
C    0.33  0     0.66  0     .
G    0     0     0.33  0     .
pattern
[AC]-A-[GC]-T-[TC]-[GC]
- Profiles are used in prosite when the motif is
relatively divergent, and it is difficult to
represent as a pattern
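The difference between a pattern and a profile can be sketched in Python. This is a minimal illustration, not how prosite builds its entries: real prosite profiles use weighted scores and gap penalties, while the sketch below only counts residue frequencies per column. The three aligned sequences are hypothetical, chosen so that the frequencies resemble the first four columns of the profile on the slide, and all function names are made up for this example.

```python
from collections import Counter


def build_profile(aligned):
    """Return one {residue: frequency} dict per alignment column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        profile.append({res: n / len(column) for res, n in counts.items()})
    return profile


def profile_to_pattern(profile):
    """Collapse a profile into a pattern: every observed residue is allowed."""
    elements = []
    for column in profile:
        residues = "".join(sorted(column))
        elements.append(residues if len(residues) == 1 else "[" + residues + "]")
    return "-".join(elements)


def score(profile, segment):
    """Multiply the per-column frequencies of the residues in `segment`."""
    result = 1.0
    for column, residue in zip(profile, segment):
        result *= column.get(residue, 0.0)
    return result


# Hypothetical alignment (an assumption made for illustration only).
alignment = ["AACT", "AAGT", "CACT"]
profile = build_profile(alignment)
pattern = profile_to_pattern(profile)
```

The pattern only says which residues are allowed at each position, so "AACT" and "CAGT" both match it equally. The profile keeps the frequencies, so `score(profile, "AACT")` is higher than `score(profile, "CAGT")`, which is the extra information that helps with divergent motifs.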
13Scanning prosite
Query: sequence. Result: all patterns found in the sequence
Query: pattern. Result: all sequences which adhere to this pattern
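Both scan directions can be sketched with regular expressions. The converter below handles only a simplified subset of the prosite pattern syntax (elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` repeats, and `<` / `>` anchors at the ends of the pattern). The function names, the example sequences and the tiny in-memory "databases" are assumptions made for illustration; this is not the code behind the prosite web server.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simplified prosite-style pattern into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as x(2) or x(2,4).
        m = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"
        if repeat:
            core += "{" + repeat + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_hits(pattern, sequence):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


def scan_sequence(sequence, patterns):
    """Query = sequence: report every named pattern that occurs in it."""
    hits = {name: find_hits(pat, sequence) for name, pat in patterns.items()}
    return {name: found for name, found in hits.items() if found}


def scan_database(pattern, sequences):
    """Query = pattern: report the names of all sequences that contain it."""
    return [name for name, seq in sequences.items() if find_hits(pattern, seq)]
```

For example, `scan_sequence("TTAAGTCGAA", {"demo": "[AC]-A-[GC]-T-[TC]-[GC]"})` reports one hit for the hypothetical entry `demo`, and `scan_database` with the same pattern returns only the names of the sequences that contain it.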
14Patterns with a high probability of occurrence
- Entries describing commonly found post-translational modifications or compositionally biased regions.
- Found in the majority of known protein sequences
- High probability of occurrence
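A rough calculation shows why short, permissive patterns turn up almost everywhere. The sketch below assumes that all 20 amino acids are equally likely and independent at each position, which is a simplification (real residue frequencies differ), and it supports only single residues, `x`, `[...]` and `{...}` elements without repeat counts. The example pattern `N-{P}-[ST]-{P}` is the commonly cited N-glycosylation site pattern, used here purely as an illustration; the function names are hypothetical.

```python
ALPHABET_SIZE = 20  # assumption: uniform, independent amino acid frequencies


def element_probability(element):
    """Chance that a random residue satisfies one pattern element."""
    if element == "x":
        return 1.0
    if element.startswith("["):          # any of the listed residues
        return (len(element) - 2) / ALPHABET_SIZE
    if element.startswith("{"):          # anything except the listed residues
        return (ALPHABET_SIZE - (len(element) - 2)) / ALPHABET_SIZE
    return 1.0 / ALPHABET_SIZE           # one specific residue


def pattern_probability(pattern):
    """Chance that the pattern matches at one given position."""
    probability = 1.0
    for element in pattern.split("-"):
        probability *= element_probability(element)
    return probability


def expected_hits(pattern, sequence_length):
    """Expected number of chance matches in a random sequence."""
    width = len(pattern.split("-"))
    positions = max(sequence_length - width + 1, 0)
    return positions * pattern_probability(pattern)
```

Under these assumptions the pattern matches at a given position with probability (1/20)(19/20)(2/20)(19/20), about 0.0045, so a random 400-residue protein is expected to contain roughly 1.8 chance matches. A hit for such a pattern therefore says little on its own.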
15prosite sequence query
17prosite pattern query
20UCSC Genome Browser
21UCSC Genome Browser - Gateway
22UCSC Genome Browser - Gateway
23UCSC Genome Browser - Gateway
24UCSC Genome Browser query results
25UCSC Genome Browser Annotation tracks
Base position
UCSC Genes
UTR
RefSeq
mRNA (GenBank)
Intron
Exon
Gene direction
SNPs
Repeats
26UCSC Gene
27UCSC Genome Browser - movement
Zoom x3 Center
28UCSC Genome Browser Base view
29Annotation track options
dense
squish
pack
full
30Annotation track options
31BLAT
- BLAT: Blast-Like Alignment Tool
- BLAT is designed to find similarity of >95% on DNA, >80% for protein
- Rapid search by indexing entire genome.
- Good for:
- Finding genomic coordinates of cDNA
- Determining exons/introns
- Finding human (or chimp, dog, cow) homologs of another vertebrate sequence
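The idea of "indexing the entire genome" can be illustrated with a toy k-mer index. This is only a sketch of the general principle: the genome is cut into non-overlapping k-mers that are stored in a lookup table, and every overlapping k-mer of the query is looked up to find candidate positions (seeds). BLAT's actual implementation differs in many details (k-mer size, seed clustering, alignment extension and stitching). The value k=4, the short sequences and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(genome, k):
    """Map each non-overlapping k-mer of the genome to its start positions."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def find_seeds(query, index, k):
    """Look up every overlapping k-mer of the query in the genome index.

    Returns (query_position, genome_position) pairs, both 0-based.
    """
    seeds = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            seeds.append((qpos, gpos))
    return seeds


def candidate_starts(seeds):
    """Genome offsets at which the query would start, implied by each seed."""
    return sorted({gpos - qpos for qpos, gpos in seeds})


# Hypothetical toy data (not real genomic sequence).
genome = "ACGTTGCAAGGCTTAC"
query = "GCAAGGCT"
index = build_index(genome, 4)
seeds = find_seeds(query, index, 4)
```

Because the index is built once and each lookup is a dictionary access, the query never has to be compared against every genome position. Seeds that imply the same start offset point to a region worth aligning in full; for a cDNA query, separate groups of seeds along the genome would correspond to exons, with the gaps between them suggesting introns.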
32BLAT on UCSC Genome Browser
33BLAT on UCSC Genome Browser
34BLAT Results
35BLAT Results
Match
Non-Match(mismatch/indel)
Indel boundaries
36BLAT Results
37BLAT Results on the browser
38Getting DNA sequence of region
39Getting DNA sequence of region