C. elegans Bioinformatics - PowerPoint PPT Presentation

1
C. elegans Bioinformatics
  • Vuokko Aarnio
  • 27.8.2008

2
Bioinformatics
  • Applies math, statistics and computer science to
    understand biological processes, usually on a
    molecular level
  • Data often from high-throughput techniques
    (sequencing, microarrays etc.)
  • Many research areas, e.g.
    - Sequence alignment
    - Sequence pattern finding
    - Gene expression (microarray bioinformatics)
    - Assembly of genomes from short sequences (in
      genome projects)
    - Protein structure prediction from sequence
    - Visualization of protein-protein interaction
      networks
    - Modeling of evolution (genetic algorithms)
  • The C. elegans genome is quite well annotated
    and accessible with many bioinformatics tools

3
Overview of lecture
  • Sequence analyses
    - Sequence alignments
    - PCR primer design
    - Prediction of transcription factor binding sites
  • Wormbase
  • Microarray bioinformatics
    - Data analysis overview
    - Functional annotation
    - Data repository GEO

4
Sequence analysis - Identifier
  • Name of the sequence
  • Different bioinformatics databases and tools
    operate with different identifiers
  • Example: different identifiers for the same gene
    - Wormbase locus ID: ahr-1
    - Ensembl gene ID: C41G7.5
    - Entrez Gene ID: 172788
    - EMBL (GenBank) ID: Z81048
    - RefSeq mRNA ID: NM_001025865
    - WB Gene ID: WBGene00000096

5
Sequence analysis - Format
  • How the sequence is written
  • Tools require the sequence in a correct format
  • Most common format: FASTA

>Y48G10A.1
ATTAATCTTTAGACATCAATAATACTTGCCTCTAAAAAAGCGTTTGGCTCCGCTTGGATCACTCTCAGCATCAAGTTCTTCTTTTTTCCGGGAAGGAAACGCTCATTTTCGATTAATATTAATTTTGATCGATTCGAAATCGATTGAAACCCGTTTTTGTCATCGAAAATTCTGAAAATATCCGTATTTAGCTCGAATAAAACTATTTAATTTCCATTAAAAATCCGTTTTTAATTGAATTCCGTTCGAAATTTCCTGTTGGAAAAATAAATAAATAAATACGAAGAAGCGTGCGGCGCATTGCAAAAAGCCGTGCGGCGCATTGCGAAGGACTGTGCGGCGGGCTTGCGAAAAGGCGCGCCGCACATTGCCCTACTGAAAGCGTTCCTTACGAAAAAATCCCCTTACGAAAACGTACCCCCTCTTTAATTTCGCGAAAAATAGTTTTTTGCCGAAAATAGAGTTAATTAGGCTAAAAAGCTGTTTTACAATCAATTTTGTTAAAGAAAAACCGCAAAAAACCTGAAAATTGACGAAAAAAAGCCAAAAAAAAAAAAAATTTTGCTTTTTAGTTCTACGCAGGAAAAGTGCGGCGATGGGTTTAAAACTAGAGAAATATTAGAAATCGTTAAATTTAATAGTAGAAAATTACAAAAAACCTAGATTTTCTGTGGAAAATACACGAAAAACAACGAAAAACTTTGGGAATTAAATTAAAAATTCGAAATCTAGCAAATCGTTCTCTACGTCTCCACTCTCTACCGCGTGGCGATGGAGCGCGTTTGCTTTTTACTGATTTTAATTAATTTTAATTAATTGAATTTCAGCGTATTTTCGCTGAATTTCTAGTGTTTTCCCGATAAAAACAAATCGAAATTCAACGTTTCCACTAATTTCAAGCTTTTTTCCTCTATTTTCAGGAAGTATACGCAAAAATCTGTATTTTTCTCTGACGCCTCACTCGGCAATTTTCCACAATTTCTTATCAATTTTGTCTCTT
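A FASTA record is a header line starting with ">" followed by one or more sequence lines. As a minimal sketch in plain Python (no Biopython), a hypothetical read_fasta helper can collect records into a dictionary; here it is fed the first 60 nucleotides of the sequence above, wrapped at 30 characters per line.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    The identifier is the first word after '>'; wrapped sequence
    lines are joined and upper-cased. Blank lines are ignored.
    """
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""  # first word of the header is the ID
            records[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


example = """>Y48G10A.1
ATTAATCTTTAGACATCAATAATACTTGCC
TCTAAAAAAGCGTTTGGCTCCGCTTGGATC
"""
seqs = read_fasta(example.splitlines())
print(len(seqs["Y48G10A.1"]))  # 60
```

Line wrapping does not matter to the parser, which is why the same record can be written on one long line, as above, or wrapped.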
6
Annotation
  • Any information on a sequence
  • Can be identifier, description,...

7
Gene Ontology
  • Project that provides a controlled vocabulary to
    describe gene product attributes in any organism
  • Ontologies: biological process, molecular
    function, cellular component
  • Each term has a code and a common name,
    e.g. GO:0007186 - G protein-coupled receptor
    (GPCR) signaling pathway
  • Gene Ontology annotation: characterization of
    gene products using the ontology terms
  • Based on wet-lab experiments or on sequence
    similarity with other characterized genes

8
BioMart tool
  • By EBI (European Bioinformatics Institute)
  • Finds annotations from databases e.g. Ensembl
    genome database
  • Good tool to e.g. convert gene identifiers and
    download sequences
  • Also finds chromosomal locations and Gene
    Ontology terms
  • Free web interface at http://www.biomart.org

9
Using the BioMart tool
  • 1. Choose database and organism
  • 2. Define what your input is (e.g. list of
    Ensembl gene IDs)
  • 3. Specify what you want as output (e.g. gene
    sequences with Entrez gene IDs)
  • 4. Run search and export your results
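The same four steps can be scripted, because BioMart also accepts queries written as XML. The sketch below only builds the query string. The dataset, filter and attribute names (celegans_gene_ensembl, ensembl_gene_id, entrezgene_id) are assumptions based on Ensembl's BioMart naming and should be checked against the mart you use; a real service may also expect extra attributes on the Query element.

```python
from typing import List
from xml.etree import ElementTree as ET


def build_biomart_query(dataset: str, filter_name: str, values: List[str],
                        attributes: List[str]) -> str:
    """Build a BioMart XML query mirroring steps 1-3.

    dataset     -> step 1: database and organism
    filter_name -> step 2: what kind of IDs the input list contains
    attributes  -> step 3: what to output
    """
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="1", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    # BioMart filters take several input values as one comma-separated string
    ET.SubElement(ds, "Filter", name=filter_name, value=",".join(values))
    for attr in attributes:
        ET.SubElement(ds, "Attribute", name=attr)
    return ET.tostring(query, encoding="unicode")


xml = build_biomart_query(
    dataset="celegans_gene_ensembl",                  # assumed dataset name
    filter_name="ensembl_gene_id",                    # assumed filter name
    values=["C41G7.5"],
    attributes=["ensembl_gene_id", "entrezgene_id"],  # assumed attribute names
)
print(xml)
```

For step 4, the XML string would be URL-encoded and sent as the query parameter to a BioMart martservice endpoint (for Ensembl, assumed to be http://www.ensembl.org/biomart/martservice), which returns the results as tab-separated text.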

10
Sequence alignments - BLAST
  • Basic Local Alignment Search Tool
  • Sequence similarity search program
  • Finds matching sequences in NCBI databases
  • The sequence can be nucleotide, protein,
    translated, genome,...
  • Free web interface at http://www.ncbi.nlm.nih.gov/blast

11
Two types of Sequence alignments in BLAST
  • 1. Compare two given sequences, e.g.
    - Does your PCR product have the right sequence?
    - How closely is your protein of interest related
      to its homolog in another species?
  • 2. Compare one sequence against a genome,
    transcriptome, proteome etc., e.g.
    - Does a sequence correspond to any known gene or
      regulatory region?
    - Will a PCR primer bind to one or many sites in
      the genome?
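BLAST does not fill a full dynamic-programming table; it looks for short word matches (seeds) and extends them, which makes it fast but heuristic. The exact local alignment score that it approximates can be sketched with the Smith-Waterman algorithm. The scores used here (match +2, mismatch -1, linear gap penalty -2) are illustrative assumptions, not BLAST's defaults, and the sequences are made up.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty).

    Only two rows of the dynamic-programming matrix are kept in memory.
    """
    prev = [0] * (len(b) + 1)  # previous row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # 0 lets a local alignment start fresh at any cell
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGTTGCA", "TTACGTTGCG"))  # 14: ACGTTGC matches exactly
```

The table costs time proportional to the product of the sequence lengths, which is why searching a whole genome this way is slow and BLAST's seed-and-extend shortcut is used instead.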

12
Example: Alignment of two sequences with BLAST
(Screenshot labels: sequences to be compared; button that starts the alignment)
13
Example: The two sequences were almost identical
87/96 identical nucleotides
3 missing nucleotides (gaps)
Both sequences were in the 5'-3' orientation
6 mismatched nucleotides
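The numbers in this example are counts over the aligned columns: 96 = 87 identities + 6 mismatches + 3 gaps, so the identity is 87/96 ≈ 90.6%. A minimal sketch of the same bookkeeping, assuming the two aligned sequences are given as equal-length strings with "-" marking a gap (the small alignment is made up):

```python
from typing import Dict


def alignment_stats(top: str, bottom: str) -> Dict[str, float]:
    """Count identities, mismatches and gaps in a pairwise alignment.

    Both strings must have the same length; '-' marks a gap.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    identities = mismatches = gaps = 0
    for x, y in zip(top.upper(), bottom.upper()):
        if x == "-" or y == "-":
            gaps += 1
        elif x == y:
            identities += 1
        else:
            mismatches += 1
    return {"identities": identities, "mismatches": mismatches, "gaps": gaps,
            "percent_identity": 100.0 * identities / len(top)}


print(alignment_stats("ACGT-ACGTA", "ACGTTACCTA"))
# {'identities': 8, 'mismatches': 1, 'gaps': 1, 'percent_identity': 80.0}
print(round(100 * 87 / 96, 1))  # 90.6, the identity in the example above
```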
14
Using Nucleotide BLAST
Enter the sequence in FASTA format
Choose the type of nucleotide sequences to search (RNA,
genomic DNA, expressed sequence tags etc.)
Choose the organism
Run the search
15
Example: a cloned and sequenced gene compared to
the genome
It corresponds to a known gene (cyp-42A1)
The sequence was backwards (it matched the opposite
strand of the genome)
It matches closely but not perfectly
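A "backwards" hit means that the query matched the opposite DNA strand (BLAST reports this as Strand=Plus/Minus), i.e. the query is the reverse complement of the genomic sequence. Reverse-complementing a sequence takes one line; this sketch complements A, C, G, T and N only and leaves any other character unchanged.

```python
# Complement of each base; N (unknown base) stays N
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (other IUPAC codes are left unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("AACGTG"))  # CACGTT
```

Applying the function twice returns the original sequence, which is a quick sanity check.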
16
Multiple sequence alignment
  • Compares several given sequences
  • Builds a hierarchical tree that shows how closely
    each sequence is related to the others
  • Several tools, e.g. ClustalW
  • Several tools to visualize the tree, e.g.
    HyperTree, JalView,...
  • E.g. a family of proteins in different species:
    which ones are most closely related?
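How a hierarchical tree can be built from pairwise distances is sketched below with UPGMA (average-linkage clustering), which repeatedly merges the two closest clusters. This is a deliberately simple stand-in: ClustalW builds its guide tree with the neighbor-joining method by default, and the four sequences and their distances here are hypothetical.

```python
from typing import Dict, FrozenSet, List


def upgma(names: List[str], dist: Dict[str, Dict[str, float]]) -> str:
    """Cluster sequences by UPGMA and return the tree topology in Newick format.

    dist[a][b] is the distance between sequences a and b.
    Branch lengths are omitted to keep the sketch short.
    """
    size = {n: 1 for n in names}  # cluster label -> number of sequences
    d: Dict[FrozenSet[str], float] = {
        frozenset((a, b)): dist[a][b]
        for i, a in enumerate(names) for b in names[i + 1:]
    }
    while len(size) > 1:
        a, b = sorted(min(d, key=d.get))  # closest pair of clusters
        merged = f"({a},{b})"
        for c in size:
            if c not in (a, b):
                # distance to the new cluster: size-weighted average
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
    return next(iter(size)) + ";"


names = ["A", "B", "C", "D"]
dist = {
    "A": {"B": 0.1, "C": 0.4, "D": 0.5},
    "B": {"A": 0.1, "C": 0.4, "D": 0.5},
    "C": {"A": 0.4, "B": 0.4, "D": 0.3},
    "D": {"A": 0.5, "B": 0.5, "C": 0.3},
}
print(upgma(names, dist))  # ((A,B),(C,D));
```

The Newick string is the usual input format for tree viewers.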

17
ClustalW multiple sequence alignment tool
  • Compares each given sequence with every other
    given sequence
  • Free web interface at http://ebi.ac.uk/Tools/clustalW/index.html
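The all-against-all comparison can be illustrated with p-distances (the fraction of positions that differ). This sketch assumes the sequences are already aligned to equal length, with "-" for gaps; ClustalW instead aligns every pair itself before computing its distances. The three sequences are made up.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions, skipping columns where either sequence has a gap."""
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no comparable (ungapped) columns")
    return sum(x != y for x, y in columns) / len(columns)


def distance_matrix(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """p-distance for every pair of sequences (each pair once)."""
    return {(n1, n2): p_distance(seqs[n1], seqs[n2])
            for n1, n2 in combinations(seqs, 2)}


seqs = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACCTAGGA"}
for pair, d in distance_matrix(seqs).items():
    print(pair, d)
# ('s1', 's2') 0.125
# ('s1', 's3') 0.375
# ('s2', 's3') 0.25
```

For n sequences this makes n(n-1)/2 comparisons, so the cost grows quickly with the number of sequences.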

18
Example: Hierarchical tree of C. elegans CYP
proteins (HyperTree)
19
PCR primer design with Primer3
  • Finds optimized primers for a given sequence
  • Takes into account the desired melting
    temperature, GC content and primer length
  • Improvements made at the University of Tartu
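The kind of checks a primer design tool makes can be sketched with two simple formulas: GC content, and the Wallace rule Tm = 2 °C × (A + T) + 4 °C × (G + C), which is only a rough estimate for short oligonucleotides. Primer3 itself uses more accurate nearest-neighbor thermodynamics, and the acceptable ranges below are hypothetical defaults. The example primer is a 20-mer taken from the Y48G10A.1 sequence shown earlier.

```python
def gc_content(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> float:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2.0 * (p.count("A") + p.count("T")) + 4.0 * (p.count("G") + p.count("C"))


def acceptable(primer: str, length=(18, 25), gc=(40.0, 60.0), tm=(55.0, 65.0)) -> bool:
    """Check a primer against hypothetical inclusive target ranges."""
    return (length[0] <= len(primer) <= length[1]
            and gc[0] <= gc_content(primer) <= gc[1]
            and tm[0] <= wallace_tm(primer) <= tm[1])


primer = "AGCGTTTGGCTCCGCTTGGA"
print(gc_content(primer), wallace_tm(primer), acceptable(primer))  # 60.0 64.0 True
```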

20
In silico PCR predicts what will be amplified
with given primers
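The core idea of in silico PCR can be sketched as a string search: the forward primer must occur on the template's top strand, and the reverse complement of the reverse primer must occur downstream of it; the predicted product runs from the start of one site to the end of the other. This sketch allows exact matches only and searches one strand, whereas dedicated tools also handle mismatches and both strands; the max_size limit and the toy template are hypothetical.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_all(text: str, sub: str) -> List[int]:
    """Start positions of all (possibly overlapping) exact matches of sub in text."""
    hits, i = [], text.find(sub)
    while i != -1:
        hits.append(i)
        i = text.find(sub, i + 1)
    return hits


def in_silico_pcr(template: str, forward: str, reverse: str,
                  max_size: int = 2000) -> List[str]:
    """Predicted products as top-strand sequences, primer sites included."""
    t = template.upper()
    fwd = forward.upper()
    rev_site = reverse_complement(reverse)  # reverse primer's binding site, read on the top strand
    products = []
    for f in find_all(t, fwd):
        for r in find_all(t, rev_site):
            end = r + len(rev_site)
            if r >= f + len(fwd) and end - f <= max_size:
                products.append(t[f:end])
    return products


template = "GGGATCGATCAAAATTTTCCGGTACCGGG"
print(in_silico_pcr(template, "ATCGATC", "GGTACCGG"))  # ['ATCGATCAAAATTTTCCGGTACC']
```

More than one product in the result would suggest that the primers can amplify several regions.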
21
Transcription factors
(Figure, labeled: translation)
22
Prediction of transcription factor binding sites
  • Transcription factors bind to specific short DNA
    sequences to induce or repress transcription
  • The binding sites of a transcription factor can
    sometimes vary by one or a few nucleotides
  • There are several tools to predict transcription
    factor binding sites, e.g. POXO
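Because a binding site may vary by one or a few nucleotides, a consensus motif is often written with IUPAC ambiguity codes (e.g. R = A or G). A minimal sketch, assuming such a consensus is known, translates it into a regular expression and reports every match on one strand. The motif TGTTTRC and the sequence are hypothetical examples, and tools such as POXO use their own methods.

```python
import re
from typing import List

# IUPAC nucleotide codes -> regular-expression character classes
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
         "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}


def motif_to_regex(motif: str) -> str:
    return "".join(IUPAC[ch] for ch in motif.upper())


def find_sites(seq: str, motif: str) -> List[int]:
    """0-based start positions of all (including overlapping) matches on this strand."""
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")  # lookahead allows overlaps
    return [m.start() for m in pattern.finditer(seq.upper())]


print(find_sites("AATGTTTACGGTGTTTGCTT", "TGTTTRC"))  # [2, 11]
```

To scan the other strand as well, the same search can be run on the reverse complement of the sequence.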

23
Different tools in POXO
Kankainen, M. et al. Nucl. Acids Res. 2006,
34:W534-W540. doi:10.1093/nar/gkl296
24
Finding of enriched patterns
  • Input the upstream sequences of your genes of
    interest (these can be obtained from BioMart or
    with POXO sequence retrieval)
  • POCO finds which patterns occur frequently
  • It compares their counts with how likely each
    pattern is to occur by chance; for example, some
    nucleotides are generally more common in the
    genome than others
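This comparison can be sketched by counting every k-mer (pattern of length k) and dividing by the count expected if nucleotides occurred independently with given frequencies, so that patterns made of common nucleotides are not over-rewarded. This zero-order background is an assumption for illustration; POCO's own statistics may differ.

```python
from collections import Counter
from math import prod
from typing import Dict, List, Optional


def kmer_enrichment(seqs: List[str], k: int,
                    base_freq: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Observed/expected ratio for every k-mer found in seqs.

    Expected counts assume independent bases with frequencies base_freq
    (by default, the base composition of seqs themselves).
    """
    seqs = [s.upper() for s in seqs]
    if base_freq is None:
        counts = Counter("".join(seqs))
        total = sum(counts.values())
        base_freq = {b: n / total for b, n in counts.items()}
    observed = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    positions = sum(max(len(s) - k + 1, 0) for s in seqs)  # number of k-mer windows
    return {w: n / (positions * prod(base_freq[b] for b in w))
            for w, n in observed.items()}


ratios = kmer_enrichment(["ACAC"], k=2)
print(sorted(ratios, key=ratios.get, reverse=True))  # ['AC', 'CA']
```

Passing genome-wide base frequencies as base_freq would make the background reflect the genome rather than the input sequences.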

25
Clustering of found patterns
  • Puts together patterns that have something in
    common
  • Forms longer patterns
  • Also allows some differences
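One simple way to group similar patterns is to put together patterns of equal length that differ at no more than a given number of positions (Hamming distance). The greedy sketch below shows only that grouping step, with hypothetical patterns; it does not merge overlapping patterns into longer ones and is not POXO's own clustering algorithm.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def group_patterns(patterns: List[str], max_mismatches: int = 1) -> List[List[str]]:
    """Put each pattern into the first group that contains an equal-length
    pattern within max_mismatches; otherwise start a new group."""
    groups: List[List[str]] = []
    for p in patterns:
        for g in groups:
            if any(len(q) == len(p) and hamming(p, q) <= max_mismatches for q in g):
                g.append(p)
                break
        else:
            groups.append([p])
    return groups


print(group_patterns(["TGTTTAC", "TGTTTGC", "CACGTG", "CACGTA", "GGGGGG"]))
# [['TGTTTAC', 'TGTTTGC'], ['CACGTG', 'CACGTA'], ['GGGGGG']]
```

Because the grouping is greedy, the result can depend on the order of the input patterns.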

26
Checking if a given pattern is enriched in the
sequences
  • POBO counts how many times the given pattern is
    present in the sequences
  • Compares this with how many times the pattern is
    present in the "background"
  • The background differs between tools
  • POXO creates several lists of random sequences
    (same number and length as in the given sequence
    list)
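The random-background comparison can be sketched as follows: count the pattern in your sequences, then in many random sequence lists with the same number and lengths of sequences, and report how often the random lists reach the observed count. Here the random bases are drawn uniformly from A, C, G and T, which is an assumption; POBO's own background options may differ.

```python
import random
from typing import List, Tuple


def count_pattern(seqs: List[str], pattern: str) -> int:
    """Total number of (possibly overlapping) exact occurrences of pattern."""
    total = 0
    for s in seqs:
        i = s.find(pattern)
        while i != -1:
            total += 1
            i = s.find(pattern, i + 1)
    return total


def random_like(seqs: List[str], rng: random.Random) -> List[str]:
    """A random sequence list with the same number and lengths as seqs."""
    return ["".join(rng.choice("ACGT") for _ in range(len(s))) for s in seqs]


def pattern_vs_background(seqs: List[str], pattern: str, n_lists: int = 100,
                          seed: int = 0) -> Tuple[int, float, float]:
    """Return (observed count, mean background count, empirical p-value)."""
    rng = random.Random(seed)  # fixed seed for reproducible results
    observed = count_pattern(seqs, pattern)
    background = [count_pattern(random_like(seqs, rng), pattern) for _ in range(n_lists)]
    mean_bg = sum(background) / n_lists
    # share of random lists with at least the observed count (+1 correction)
    p_value = (1 + sum(c >= observed for c in background)) / (1 + n_lists)
    return observed, mean_bg, p_value
```

For example, pattern_vs_background(upstream_seqs, "TGTTTAC") would compare a hypothetical list of upstream sequences against 100 random lists.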

27
Wormbase
  • Major publicly available database of information
    about C. elegans
  • Essential for worm researchers
  • Search e.g. for information on a gene

28
Names
29
Sequence: exons and introns
(exons colored, introns white)
30
Chromosomal location, anatomic expression pattern
31
Information on gene product function collected
from publications (RNAi, microarray), with links to
the publications
33
Information on how a mutant allele differs
from the wild-type gene
In this example, a point mutation...
...that leads to a stop codon in the middle of the
protein
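Whether a point mutation creates a premature stop codon can be checked by scanning the coding sequence codon by codon. A minimal sketch, assuming both sequences start at the ATG and use the standard genetic code; the sequences are hypothetical (TGG, tryptophan, mutated to the stop codon TAG).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def first_stop(cds: str) -> int:
    """Codon index of the first in-frame stop codon, or -1 if there is none."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return -1


def is_premature_stop(wild_type: str, mutant: str) -> bool:
    """True if the mutant's first stop codon comes earlier than the wild type's."""
    wt, mut = first_stop(wild_type), first_stop(mutant)
    return mut != -1 and (wt == -1 or mut < wt)


wild_type = "ATGTGGCAATAA"  # ATG TGG CAA TAA: Met-Trp-Gln-stop
mutant = "ATGTAGCAATAA"     # TGG -> TAG: translation stops after Met
print(first_stop(wild_type), first_stop(mutant), is_premature_stop(wild_type, mutant))  # 3 1 True
```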
34
Link to the Caenorhabditis Genetics Center (CGC),
from which you can request worm strains
35
Microarrays
36
Microarray bioinformatics
  • Expression levels of tens of thousands of genes
    in one experiment
  • Quantification of intensities from an image
  • Data analysis
  • Finding annotations to genes - DAVID tool
  • Using existing microarray data - GEO

37
Image quantification
  • TIGR Spotfinder
    - freely downloadable software
  • Input: image files
  • Compose a grid
    - each spot in its own square
  • The segmentation method decides which part is
    spot and which is background
  • The intensity value of each spot reflects the
    amount of the corresponding RNA in the original
    sample
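For a two-color array, spot quantification output is typically turned into a background-subtracted log2 ratio per spot. A minimal sketch, assuming you already have foreground and local background intensities for both channels; the floor value that keeps the logarithm defined is a hypothetical choice.

```python
from math import log2


def spot_log_ratio(red_fg: float, red_bg: float,
                   green_fg: float, green_bg: float, floor: float = 1.0) -> float:
    """log2(red/green) for one spot after local background subtraction.

    Background-subtracted intensities below `floor` are raised to `floor`
    so that the logarithm is always defined.
    """
    red = max(red_fg - red_bg, floor)
    green = max(green_fg - green_bg, floor)
    return log2(red / green)


print(spot_log_ratio(1200, 200, 700, 200))  # 1.0: twice as much signal in red
```

A log2 ratio of +1 or -1 then means a two-fold difference in either direction, which makes up- and down-regulation symmetric.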

38
Data analysis
  • Several commercial programs, e.g. GeneSpring
  • Normalization
    - Sometimes one label (of a 2-color experiment) is
      stronger than the other, or some chips or chip
      areas have been hybridized more efficiently
    - Normalization makes these different labels, chips
      or chip areas comparable
  • Statistics
    - Which genes are significantly under- or
      over-expressed?
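Two of the simplest versions of these steps are sketched below: global median centering, which shifts one array's log2 ratios so that their median is 0, and Welch's t statistic for one gene measured on replicate arrays in two groups. These are illustrative choices with made-up values; programs such as GeneSpring offer other normalizations (e.g. intensity-dependent ones), and a real analysis also needs p-values corrected for testing thousands of genes.

```python
from math import sqrt
from statistics import mean, median, variance
from typing import List


def median_center(log_ratios: List[float]) -> List[float]:
    """Global normalization: shift one array's log2 ratios so their median is 0."""
    m = median(log_ratios)
    return [x - m for x in log_ratios]


def welch_t(group_a: List[float], group_b: List[float]) -> float:
    """Welch's t statistic for one gene measured in replicate arrays of two groups."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


print(median_center([0.5, 1.0, 1.5]))  # [-0.5, 0.0, 0.5]
print(round(welch_t([2.0, 2.2, 1.8], [1.0, 1.2, 0.8]), 2))  # 6.12
```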

39
Finding generally overrepresented functions in
your gene list
  • DAVID annotation tool
  • Finds which annotations are over-represented
    in your gene list
  • Good for showing general trends in a large gene
    list
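Over-representation is commonly tested with the hypergeometric distribution (a one-sided Fisher's exact test): if K of the N background genes carry an annotation, how likely is it that at least k of your n genes carry it by chance? A minimal sketch; DAVID itself reports a modified Fisher's exact test (the EASE score), so its values will differ somewhat.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background, K: background genes with the annotation,
    n: genes in your list,      k: list genes with the annotation.
    """
    top = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / comb(N, n)
```

For example, enrichment_p_value(12, 200, 150, 20000) would test a hypothetical term carried by 12 of 200 listed genes and by 150 of 20,000 background genes. Because thousands of terms are tested, the p-values also need multiple-testing correction.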

40
DAVID Functional Annotation Tool
  • By NIAID, National Institutes of Health
  • Finds significantly enriched annotations for gene
    products in your list:
    - Gene Ontology terms
    - Identifiers
    - Protein domains
    - Pathways
    - etc.

41
Example: Functional annotation chart
42
Microarray data repository - GEO
  • Gene Expression Omnibus, NCBI
  • MIAME: the Minimum Information About a Microarray
    Experiment that should be provided
  • Submit your own microarray data to GEO
  • Browse, search and retrieve microarray data
  • http://www.ncbi.nlm.nih.gov/geo/
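Data retrieved from GEO is often downloaded as a "series matrix" text file: tab-separated lines starting with "!" hold metadata, and the expression table lies between !series_matrix_table_begin and !series_matrix_table_end. A minimal parser sketch, assuming that layout; the miniature file, including its sample IDs, is made up.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def parse_series_matrix(
        lines: Iterable[str]) -> Tuple[Dict[str, List[str]], List[str], Dict[str, List[str]]]:
    """Return (metadata, sample IDs, {probe ID: values}) from series matrix lines."""
    meta: Dict[str, List[str]] = {}
    samples: List[str] = []
    table: Dict[str, List[str]] = {}
    in_table = have_header = False
    for row in csv.reader(lines, delimiter="\t"):  # csv also strips the quotes
        if not row:
            continue
        key = row[0]
        if key == "!series_matrix_table_begin":
            in_table = True
        elif key == "!series_matrix_table_end":
            break
        elif in_table and not have_header:
            samples, have_header = row[1:], True  # header: ID_REF, then sample IDs
        elif in_table:
            table[key] = row[1:]                   # one row per probe
        elif key.startswith("!"):
            meta.setdefault(key, []).extend(row[1:])
    return meta, samples, table


example = [
    '!Series_title\t"Example series"',
    '!Sample_title\t"wild type"\t"mutant"',
    "!series_matrix_table_begin",
    '"ID_REF"\t"GSM_A"\t"GSM_B"',
    '"probe_1"\t7.1\t8.4',
    "!series_matrix_table_end",
]
meta, samples, table = parse_series_matrix(example)
print(samples, table)  # ['GSM_A', 'GSM_B'] {'probe_1': ['7.1', '8.4']}
```

Values are kept as strings because some files mark missing values with text; converting them with float() is left to the caller.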

43
Summary Bioinformatics
  • More and larger data sets available
  • -omics level approach
  • Performs tasks that would be extremely
    tedious to do manually
  • Tools are easy to use
    - but some information consists of predictions
      based on similarity with other gene products, so
      wet-lab experiments are still needed