C. elegans Bioinformatics

1
C. elegans Bioinformatics

Vuokko Aarnio
27.8.2008

2
Bioinformatics

Applies math, statistics and computer science to
understand biological processes, usually on a
molecular level
Data often from high-throughput techniques
(sequencing, microarrays etc.)
Many research areas
Sequence alignment
Sequence pattern finding
Gene expression - microarray bioinformatics
Assembly of genome from short sequences (in
genome projects)
Protein structure prediction from sequence
Visualization of protein-protein interaction
networks
Modeling of evolution (genetic algorithms)
C. elegans genome is quite well annotated and
accessible with many bioinformatics tools

3
Overview of lecture

Sequence analyses
Sequence alignments
PCR primer design
Prediction of transcription factor binding sites
Wormbase
Microarray bioinformatics
Data analysis overview
Functional annotation
Data repository GEO

4
Sequence analysis - Identifier

Name of the sequence
Different bioinformatics databases and tools
operate with different identifiers
Example: different identifiers for one gene
Wormbase locus ID: ahr-1
Ensembl Gene ID: C41G7.5
Entrez gene ID: 172788
EMBL (Genbank) ID: Z81048
RefSeq DNA ID: NM_001025865
WB Gene ID: WBGene00000096

5
Sequence analysis - Format

How the sequence is written
Tools require the sequence in a correct format
Most common format FASTA

>Y48G10A.1
ATTAATCTTTAGACATCAATAATACTTGCCTCTAAAAAAGCGTTTGGCTCCGCTTGGATCACTCTCAGCATCAAGTTCTTCTTTTTTCCGGGAAGGAAACGCTCATTTTCGATTAATATTAATTTTGATCGATTCGAAATCGATTGAAACCCGTTTTTGTCATCGAAAATTCTGAAAATATCCGTATTTAGCTCGAATAAAACTATTTAATTTCCATTAAAAATCCGTTTTTAATTGAATTCCGTTCGAAATTTCCTGTTGGAAAAATAAATAAATAAATACGAAGAAGCGTGCGGCGCATTGCAAAAAGCCGTGCGGCGCATTGCGAAGGACTGTGCGGCGGGCTTGCGAAAAGGCGCGCCGCACATTGCCCTACTGAAAGCGTTCCTTACGAAAAAATCCCCTTACGAAAACGTACCCCCTCTTTAATTTCGCGAAAAATAGTTTTTTGCCGAAAATAGAGTTAATTAGGCTAAAAAGCTGTTTTACAATCAATTTTGTTAAAGAAAAACCGCAAAAAACCTGAAAATTGACGAAAAAAAGCCAAAAAAAAAAAAAATTTTGCTTTTTAGTTCTACGCAGGAAAAGTGCGGCGATGGGTTTAAAACTAGAGAAATATTAGAAATCGTTAAATTTAATAGTAGAAAATTACAAAAAACCTAGATTTTCTGTGGAAAATACACGAAAAACAACGAAAAACTTTGGGAATTAAATTAAAAATTCGAAATCTAGCAAATCGTTCTCTACGTCTCCACTCTCTACCGCGTGGCGATGGAGCGCGTTTGCTTTTTACTGATTTTAATTAATTTTAATTAATTGAATTTCAGCGTATTTTCGCTGAATTTCTAGTGTTTTCCCGATAAAAACAAATCGAAATTCAACGTTTCCACTAATTTCAAGCTTTTTTCCTCTATTTTCAGGAAGTATACGCAAAAATCTGTATTTTTCTCTGACGCCTCACTCGGCAATTTTCCACAATTTCTTATCAATTTTGTCTCTT
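
A FASTA record is a header line starting with ">" followed by one or more lines of sequence. The sketch below shows one way to read such text into a dictionary. The function name `parse_fasta` is made up for this illustration, and the example uses only the first 40 nucleotides of the sequence above, split over two lines.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {identifier: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: take the first word after '>' as the identifier.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence line: append to the record that is currently open.
            records[current_id] += line.upper()
    return records


example = """>Y48G10A.1
ATTAATCTTTAGACATCAAT
AATACTTGCCTCTAAAAAAG
"""
records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq), seq)
```

Sequence lines may be wrapped at any width, so a reader has to join them until the next header.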
6
Annotation

Any information on a sequence
Can be identifier, description,...

7
Gene Ontology

Project that provides a controlled vocabulary to
describe gene product attributes in any organism
Ontologies: biological process, molecular
function, cellular component
Ontology term: code and a common name
e.g. GO:0007186 - GPCR protein signaling pathway
Gene Ontology annotation: characterization of
gene products using the ontology terms
Based on wet lab experiments or sequence
similarity with other characterized genes

8
BioMart tool

By EBI (European Bioinformatics Institute)
Finds annotations from databases e.g. Ensembl
genome database
Good tool to e.g. convert gene identifiers and
download sequences
Also finds chromosomal locations and Gene
Ontology terms
Free web interface at http://www.biomart.org

9
Using the BioMart tool

1. Choose database and organism
2. Define what your input is (e.g. list of
Ensembl gene IDs)
3. Specify what you want as output (e.g. gene
sequences with Entrez gene IDs)
4. Run search and export your results

10
Sequence alignments - BLAST

Basic Local Alignment Search Tool
Sequence similarity search program
Finds matching sequences in NCBI database
The sequence can be nucleotide, protein,
translated, genome,...
Free web interface at http://www.ncbi.nlm.nih.gov/blast

11
Two types of Sequence alignments in BLAST

1. Compare two given sequences, e.g.
Does your PCR product have the right sequence?
How closely related is your protein of interest
to its homolog in another species?
2. Compare one sequence against genome,
transcriptome, proteome etc.
Does a sequence correspond to any known gene or
regulatory area?
Will a PCR primer bind to one or many sites in
the genome?

12
Example: Alignment of two sequences with BLAST
(Screenshot labels: sequences to be compared, button that starts alignment)
13
Example: The two sequences were almost the same
87 of 96 nucleotides matched
6 wrong nucleotides
3 missing nucleotides
Both sequences were in 5'-3' orientation
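
The numbers in this example (87 matches out of 96 aligned positions, i.e. about 90.6 % identity) can be counted directly from a gapped pairwise alignment. The sketch below assumes the two sequences are already aligned and use "-" for gaps. The function name and the short test sequences are made up for illustration, and this is not how BLAST itself computes its report.

```python
def alignment_stats(aligned_a: str, aligned_b: str) -> dict:
    """Count matches, mismatches and gaps in an already aligned sequence pair."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            gaps += 1          # nucleotide missing in one of the sequences
        elif a.upper() == b.upper():
            matches += 1       # same nucleotide in both sequences
        else:
            mismatches += 1    # different nucleotides
    length = len(aligned_a)
    identity = matches / length if length else 0.0
    return {
        "matches": matches,
        "mismatches": mismatches,
        "gaps": gaps,
        "length": length,
        "identity": identity,
    }


# Hypothetical 10-column alignment with one mismatch and one gap.
stats = alignment_stats("ACGTTGCA-T", "ACGATGCAGT")
print(stats)
```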
14
Using Nucleotide BLAST
The sequence in FASTA format
What type of nucleotides (RNA, genomic DNA,
expressed sequence tags etc.)
Organism
Run search
15
Example: gene cloned and sequenced - compared to
genome
Corresponds to a known gene (cyp-42A1)
The sequence was backwards
Matches closely but not perfectly
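
A read that comes out "backwards" typically matches the opposite strand, so it is compared as its reverse complement. The sketch below assumes that interpretation and checks only exact matches (BLAST also reports close, imperfect matches). The function names and sequences are hypothetical.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orientation(query: str, reference: str):
    """Return '+' if the query matches the reference as given,
    '-' if only its reverse complement matches, and None otherwise."""
    query, reference = query.upper(), reference.upper()
    if query in reference:
        return "+"
    if reverse_complement(query) in reference:
        return "-"
    return None


reference = "AAATGCCGTT"   # hypothetical stretch of genome
read = "CGGCAT"            # hypothetical sequencing read
print(reverse_complement(read), find_orientation(read, reference))
```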
16
Multiple sequence alignment

Compares several given sequences
Builds a hierarchical tree that shows how closely
each sequence is related to the others
Several tools, e.g. ClustalW
Several tools to visualize the tree e.g.
HyperTree, JalView,...
e.g. family of proteins in different species -
which ones most closely related

17
ClustalW multiple sequence alignment tool

Compares each given sequence to each other given
sequence
Free web interface at http://ebi.ac.uk/Tools/clustalW/index.html

18
Example: Hierarchical tree of C. elegans CYP
proteins (HyperTree)
19
PCR primer design with Primer3

Finds optimized primers from sequence
Takes into account the desired melting
temperature, GC content and primer length
Improvements made at the University of Tartu
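
The three primer properties named above can be estimated by hand. The sketch below uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C), as a stand-in: Primer3 itself uses a more accurate nearest-neighbor thermodynamic model, which is not reproduced here. The thresholds in `primer_ok` and the example primer are assumptions chosen only for illustration.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees Celsius (Wallace rule, short oligos only)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def primer_ok(primer: str, min_len: int = 18, max_len: int = 25,
              gc_range=(0.4, 0.6), tm_range=(50, 65)) -> bool:
    """Check a primer against illustrative (assumed) length, GC and Tm limits."""
    return (min_len <= len(primer) <= max_len
            and gc_range[0] <= gc_content(primer) <= gc_range[1]
            and tm_range[0] <= wallace_tm(primer) <= tm_range[1])


primer = "ATGCGTACGTTAGCCTAG"  # hypothetical 18-mer
print(len(primer), gc_content(primer), wallace_tm(primer), primer_ok(primer))
```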

20
In silico PCR predicts what will be amplified
with given primers
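
The idea behind in silico PCR can be sketched as a string search: the forward primer is found on the template as written, and the reverse primer binds where its reverse complement occurs. This minimal version assumes a linear template and exact primer matches only. Real tools also allow mismatches and limit product size. All names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str) -> list:
    """Return every product bounded by the forward primer and the
    reverse primer's binding site (exact matches only)."""
    template = template.upper()
    fwd = forward.upper()
    rev_site = reverse_complement(reverse.upper())
    products = []
    start = template.find(fwd)
    while start != -1:
        # Look for reverse-primer sites downstream of this forward site.
        end = template.find(rev_site, start + len(fwd))
        while end != -1:
            products.append(template[start:end + len(rev_site)])
            end = template.find(rev_site, end + 1)
        start = template.find(fwd, start + 1)
    return products


template = "GGGATGCCAAATTTCCCGGGTTTAAACGTACGGG"  # hypothetical template
products = in_silico_pcr(template, forward="ATGCCA", reverse="TACGTT")
for p in products:
    print(len(p), p)
```

More than one product in the returned list would indicate that the primer pair is not specific for a single site.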
21
Transcription factors
translation
22
Prediction of transcription factor binding sites

Transcription factors bind to specific short DNA
sequences to induce or repress transcription
Sometimes binding sites of a transcription factor
can vary in terms of one or few nucleotides
There are several tools to predict transcription
factor binding sites, e.g. POXO

23
Different tools in POXO
Kankainen, M. et al. Nucl. Acids Res. 2006;
34: W534-W540. doi:10.1093/nar/gkl296
24
Finding of enriched patterns

Put in sequences upstream of your genes of
interest (can be obtained from BioMart or POXO
sequence retrieval)
POCO finds which patterns occur a lot
Compares the count to the likelihood of finding
that pattern by chance; for example, certain
nucleotides are generally more common in the
genome than others
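
A rough way to picture "patterns that occur a lot, compared to their likelihood" is to count every short word (k-mer) and divide by the count expected from nucleotide frequencies alone. The sketch below makes that simplifying assumption (independent bases, frequencies taken from the input itself). It is not the statistic POCO actually uses, and the sequences are invented.

```python
from collections import Counter
from math import prod


def kmer_enrichment(sequences, k=6):
    """Return {kmer: (observed, expected, observed/expected)}.

    Expected counts assume each base occurs independently with the
    frequency seen in the input sequences."""
    seqs = [s.upper() for s in sequences]
    base_counts = Counter("".join(seqs))
    total = sum(base_counts[b] for b in "ACGT")
    if total == 0:
        return {}
    freq = {b: base_counts[b] / total for b in "ACGT"}

    observed = Counter()
    positions = 0
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip words containing N etc.
                observed[kmer] += 1
                positions += 1

    results = {}
    for kmer, obs in observed.items():
        expected = positions * prod(freq[b] for b in kmer)
        results[kmer] = (obs, expected, obs / expected)
    return results


# Hypothetical upstream sequences sharing the word TGACGT.
upstream = ["AATTGACGTAAT", "CCTGACGTTTAA", "ATATATTGACGT"]
ranked = sorted(kmer_enrichment(upstream).items(),
                key=lambda item: item[1][2], reverse=True)
for kmer, (obs, exp, ratio) in ranked[:3]:
    print(kmer, obs, round(exp, 4), round(ratio, 1))
```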

25
Clustering of found patterns

Puts together patterns that have something in
common
Forms longer patterns
Allows also some differences

26
Checking if a given pattern is enriched in the
sequences

POBO counts how many times the given pattern is
present in the sequences
Compares to how many times the sequence is
present in the "background"
Background different in different tools
POXO creates several lists of random sequences
(same number and length as in the given sequence
list)
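
The background comparison can be sketched as a small simulation: count the pattern in the given sequences, then in many random sequence lists of the same number and lengths, and see how often the random lists reach the observed count. The sketch assumes uniformly random nucleotides as background, which is a simplification; how the tool actually draws its background is not specified here. Function names and sequences are hypothetical.

```python
import random


def count_pattern(sequences, pattern):
    """Count (possibly overlapping) occurrences of pattern in all sequences."""
    pattern = pattern.upper()
    total = 0
    for s in sequences:
        s = s.upper()
        start = s.find(pattern)
        while start != -1:
            total += 1
            start = s.find(pattern, start + 1)
    return total


def background_test(sequences, pattern, n_random=200, seed=0):
    """Return (observed count, mean background count, empirical p-value)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = count_pattern(sequences, pattern)
    background = []
    for _ in range(n_random):
        # Same number of sequences, same lengths, random bases (assumption).
        random_set = ["".join(rng.choice("ACGT") for _ in s) for s in sequences]
        background.append(count_pattern(random_set, pattern))
    mean_bg = sum(background) / n_random
    # Add-one correction keeps the empirical p-value above zero.
    p_value = (1 + sum(c >= observed for c in background)) / (n_random + 1)
    return observed, mean_bg, p_value


seqs = ["TGACGTCA" * 5] * 4  # hypothetical sequences rich in one pattern
obs, mean_bg, p = background_test(seqs, "TGACGTCA", n_random=50, seed=1)
print(obs, mean_bg, p)
```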

27
Wormbase

Major publicly available database of information
about C. elegans
Essential for worm researcher
Search e.g. info on a gene

28
Names
29
Sequence: exons, introns
Sequence: exons colored, introns white
30
Chromosomal location, anatomic expression pattern
31
Information of gene product function collected
from publications (RNAi, microarray), links to
the publications
33
Information on how a mutant allele is different
from the wild-type gene
In this example a point mutation...
...that leads to a stop codon in the middle of
protein
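
How a single point mutation can truncate a protein is easy to show by translating a coding sequence before and after the change. The two short sequences below are invented for illustration and are not the allele shown on the slide. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping after the first stop codon ('*')."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


wild_type = "ATGAAATGGCAATTTTAA"  # hypothetical coding sequence
mutant = "ATGAAATGACAATTTTAA"     # same sequence with TGG -> TGA (G to A)
print(translate(wild_type))
print(translate(mutant))
```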
34
Link to C. elegans Genetics Center from where
you can request worm strains
35
Microarrays
36
Microarray bioinformatics

Expression levels of tens of thousands of genes
in one experiment
Quantification of intensities from an image
Data analysis
Finding annotations to genes - DAVID tool
Using existing microarray data - GEO

37
Image quantification

TIGR Spotfinder
- freely downloadable software
Input image files
Compose a grid
- each spot to its own square
Segmentation method decides which
part is spot and which is background
Intensity value for each spot represents amount
of RNA in the original sample

38
Data analysis

Several commercial programs e.g. GeneSpring
Normalization
Sometimes one label (of 2-color experiment) is
stronger than the other or some chips or chip
areas have been hybridized more efficiently
Normalization makes these different labels, chips
or chip areas comparable
Statistics
Which genes are significantly under- or
over-expressed
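
One simple normalization for a two-color chip is to take the log2 ratio of the two labels for each spot and subtract the chip's median, so that a label that is stronger overall no longer shifts every gene. This is only a minimal sketch of global median centering. Analysis packages often use more elaborate, intensity-dependent methods, and the intensity values below are made up.

```python
import math
from statistics import median


def log2_ratios(red, green):
    """Per-spot log2(red / green) for a two-color array."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def median_center(values):
    """Shift values so that their median becomes zero."""
    m = median(values)
    return [v - m for v in values]


# Hypothetical spot intensities: the red label is about 2x stronger overall,
# and only the fourth gene differs beyond that.
red = [200, 400, 800, 1600, 3200]
green = [100, 200, 400, 400, 1600]

raw = log2_ratios(red, green)
normalized = median_center(raw)
print(raw)
print(normalized)
```

After centering, most genes sit at zero and only the fourth spot still stands out.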

39
Finding generally overrepresented functions in
your gene list

DAVID annotation tool
Compares which annotations are over-represented
in your gene list
Good for showing general trends in a large gene
list
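
Over-representation of an annotation is usually judged with a hypergeometric-type test: given how many genes in the whole genome carry the annotation, how surprising is the number found in your list? The sketch below computes the plain hypergeometric tail probability. DAVID uses its own modified test, which is not reproduced here, and all the counts in the example are invented.

```python
from math import comb


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): probability of seeing at least k annotated genes in a
    list of n genes drawn from N genes, K of which carry the annotation."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical numbers: 20000 genes in total, 150 with a given GO term,
# a list of 200 genes of which 12 carry the term.
p = hypergeom_pvalue(k=12, n=200, K=150, N=20000)
print(p)
```

With many annotation terms tested at once, such p-values would also need a multiple-testing correction.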

40
DAVID Functional Annotation Tool

By NIAID, National Institutes of Health
Finds significantly enriched annotations to gene
products in your list
Gene Ontology terms
Identifiers
Protein domains
Pathways
etc.

41
Example: Functional annotation chart
42
Microarray data repository - GEO

Gene Expression Omnibus, NCBI
MIAME: the Minimum Information About a Microarray
Experiment that should be provided
Submit your own microarray data to GEO
Browse, search and retrieve microarray data
http://www.ncbi.nlm.nih.gov/geo/

43
Summary: Bioinformatics

More and larger data sets available
-omics level approach
Performs functions that would be extremely
tedious to do manually
Tools are easy to use
- information is sometimes a prediction based on
similarity with other gene products -> wet lab
experiments still needed
