Title: BIOINFORMATICS I, EXERCISE 2 (Bioinformatik I, Übung 2)
http://icbi.at/bioinf
mRNA processing
Splicing
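Spliceosome assembly
[Figure: spliceosome assembly on the pre-mRNA, showing the 5' splice site (GU), the branch point (A) and the 3' splice site (YAG); hnRNP, SR proteins and the snRNPs U1, U2, U4, U5 and U6; plus 200 non-snRNP proteins, including kinases and phosphatases, RNA helicases and cyclophilins.]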
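The splice-site signals of the spliceosome slide (GU at the 5' end of the intron, YAG at the 3' end) can be checked with a few lines of code. A minimal Python sketch, assuming the intron is given as DNA on the transcribed strand (so GU appears as GT); the function name and the example sequence are hypothetical, the branch-point A is not checked, and real introns include non-canonical variants (e.g. GC-AG) that this check rejects.

```python
# Sketch: check the canonical splice signals of an intron (DNA alphabet).
# Assumption: `intron` is exactly the intron sequence on the transcribed strand,
# from the G of the 5' GT to the G of the 3' AG.

PYRIMIDINES = {"C", "T"}  # "Y" in the YAG consensus


def has_canonical_splice_signals(intron: str) -> bool:
    """True if the intron starts with GT (GU in RNA) and ends with YAG."""
    seq = intron.upper()
    if len(seq) < 5:  # too short to hold both GT and YAG
        return False
    starts_with_gt = seq.startswith("GT")
    ends_with_yag = seq.endswith("AG") and seq[-3] in PYRIMIDINES
    return starts_with_gt and ends_with_yag


# Hypothetical intron fragment (not taken from the PC gene).
print(has_canonical_splice_signals("GTAAGTCTTACTAACCTTTCAG"))  # True
```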
Different levels of regulation
Regulation of transcription
ChIP procedure
[Figure: ChIP procedure on DNA; source: Farnham, Nature Reviews Genetics, 2009]
microRNAs
http://www.mirbase.org/
Ensembl BioMart
UCSC Table Browser
Notepad++ and regular expressions
^>.*\r\n
^     beginning of line
>     the ">" character
.     any symbol
*     0 or more times
\r    carriage return (CR)
\n    line feed (LF)
Notepad++ and regular expressions
Character  Meaning
\          escape; makes special characters non-special (e.g. \. matches a literal dot)
()         group; its contents can be retrieved, e.g. with \1 for the first group
[]         any character inside the brackets is considered a match
.          matches any character
*          matches the previous character 0 or more times
+          matches the previous character 1 or more times
{n}        matches the previous character exactly n times
^          as the first character of the regex: beginning of line; as the first character inside []: "not"
$          as the last character of the regex: end of line
\s         any whitespace character (e.g. space, tab)
\t         tab
\r         carriage return (CR)
\n         line feed (LF)
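To try the table entries outside the text editor, the same constructs can be tested with Python's `re` module. This is an assumption of convenience: the syntax agrees with the table for these entries, but regex engines differ in details (for instance in what `.` matches). The FASTA snippet is made up and uses Windows line endings (`\r\n`).

```python
import re

# Made-up FASTA record with Windows line endings (CR LF).
text = ">seq1 example header\r\nACGTNNACGT\r\n"

# ^>.*\r\n : a line that starts with ">", then any characters, then CR LF.
# re.MULTILINE lets ^ match at the start of every line, not only the first.
no_header = re.sub(r"^>.*\r\n", "", text, flags=re.MULTILINE)

# [^ACGT\r\n] : ^ as the first character inside [] means "not".
non_acgt = re.findall(r"[^ACGT\r\n]", no_header)

# (.{4}) : a group of exactly 4 characters; \1 re-inserts the group.
# $ : end of line (with re.MULTILINE).
first_four = re.sub(r"^(.{4}).*$", r"\1", no_header.strip(), flags=re.MULTILINE)

# \s+ : one or more whitespace characters (space, tab, ...).
squeezed = re.sub(r"\s+", " ", "chr11\t  NM_000920")

print(repr(no_header))  # 'ACGTNNACGT\r\n'
print(non_acgt)         # ['N', 'N']
print(first_four)       # ACGT
print(squeezed)         # chr11 NM_000920
```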
Notepad++ and regular expressions
^>.*\r\n             replace with (nothing)
ACGT.\r\n replace with
(.{20}).*\r\n        replace with \1\r\n
\r\n                 replace with (nothing)
>                    replace with \r\n>
repeatMasking=none   replace with \r\n
^>.*\r\n             replace with (nothing)
.*(.{20})            replace with \1
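The find/replace steps on this slide are easiest to follow on a toy example. The Python sketch below mimics them, assuming they run in this order: join all lines, start every record on a new line at `>`, split header from sequence at `repeatMasking=none`, delete the header lines, and keep the last 20 characters of each sequence. The function name and the two FASTA records are hypothetical; the records only imitate the header style of a UCSC Table Browser download. One detail deliberately differs: `[^\r\n]` replaces `.` in the last step, because Python's `.` also matches `\r`.

```python
import re


def last_n_bases(fasta_text: str, n: int = 20) -> list[str]:
    """Mimic the find/replace steps: return the last n bases of every record."""
    # Step 1: \r\n -> (nothing): the whole file becomes a single line.
    s = fasta_text.replace("\r\n", "")
    # Step 2: > -> \r\n>: every record starts on its own line again.
    s = s.replace(">", "\r\n>")
    # Step 3: repeatMasking=none -> \r\n: header and sequence on separate lines.
    s = s.replace("repeatMasking=none", "\r\n")
    # Step 4: ^>.*\r\n -> (nothing): delete the header lines.
    s = re.sub(r"^>.*\r\n", "", s, flags=re.MULTILINE)
    # Step 5: .*(.{n}) -> \1: keep only the last n characters of each line.
    s = re.sub(r"[^\r\n]*([^\r\n]{%d})" % n, r"\1", s)
    return [line for line in s.split("\r\n") if line]


# Hypothetical records in the style of a UCSC Table Browser FASTA download.
toy = (
    ">rec1 range=chr1:1-30 strand=+ repeatMasking=none\r\n"
    "AAAAAAAAAACCCCCCCCCC\r\n"
    "GGGGGGGGGG\r\n"
    ">rec2 range=chr1:41-65 strand=+ repeatMasking=none\r\n"
    "TTTTTACGTACGTACGTACGTACGT\r\n"
)
print(last_n_bases(toy))  # ['CCCCCCCCCCGGGGGGGGGG', 'ACGTACGTACGTACGTACGT']
```

Records shorter than n are left unchanged, just as the regex would leave them.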
Sequence Logo
http://icbi.at/logo
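A sequence logo stacks the letters at every position of an alignment, each scaled by its frequency times the information content of that position (log2(4) = 2 bits minus the Shannon entropy for DNA). The Python sketch below computes these numbers, which also underlie the frequency plots and logos of exercise 1.5. It assumes equal background base frequencies and leaves out the small-sample correction that logo tools apply; the function names and example sequences are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def frequency_matrix(seqs: list[str]) -> list[dict[str, float]]:
    """Relative frequency of A, C, G, T at each position of equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in seqs)
        matrix.append({b: counts[b] / len(seqs) for b in BASES})
    return matrix


def information_content(column: dict[str, float]) -> float:
    """Bits at one position: log2(4) minus the Shannon entropy (no small-sample correction)."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(BASES)) - entropy


def letter_heights(column: dict[str, float]) -> dict[str, float]:
    """Height of each letter in the logo stack = frequency x information content."""
    ic = information_content(column)
    return {b: p * ic for b, p in column.items()}


# Hypothetical aligned border sequences (not real PC data).
seqs = ["AG", "AC", "AG", "AT"]
for pos, col in enumerate(frequency_matrix(seqs), start=1):
    print(pos, round(information_content(col), 2), letter_heights(col))
```

A fully conserved position reaches 2 bits; a position where all four bases are equally frequent drops to 0 and disappears from the logo.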
KEGG
Protein domains
UniProt, Prosite, InterPro, Pfam, CD, SMART
Gene Ontology
The Gene Ontology (GO) project provides a controlled vocabulary to describe gene and gene product attributes in any organism.
Three organizing principles:
- cellular component (e.g. mitochondrion)
- biological process (e.g. lipid metabolism)
- molecular function (e.g. hydrolase activity)
Each entry in GO has a unique numerical identifier of the form GO:nnnnnnn and a GO term name.
Evidence codes:
- ISS Inferred from Sequence Similarity
- IEP Inferred from Expression Pattern
- IMP Inferred from Mutant Phenotype
- IGI Inferred from Genetic Interaction
- IPI Inferred from Physical Interaction
- IDA Inferred from Direct Assay
- RCA Inferred from Reviewed Computational Analysis
- TAS Traceable Author Statement
- NAS Non-traceable Author Statement
- IC Inferred by Curator
- ND No biological Data available
GO is structured as a directed acyclic graph (DAG) with different levels and 2 relations (part_of, is_a).
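Because GO is a DAG, a term can have several parents, and an annotation to a term is commonly propagated to all of its ancestors. A minimal Python sketch of that ancestor lookup, assuming the ontology is held as a child-to-parents mapping; the term names `term_A` to `term_D` are hypothetical placeholders, not real GO identifiers.

```python
from collections import deque

# Hypothetical toy ontology: child term -> list of (relation, parent term).
# term_D has two parents, which a tree could not express but a DAG can.
TOY_GO = {
    "term_D": [("is_a", "term_B"), ("part_of", "term_C")],
    "term_B": [("is_a", "term_A")],
    "term_C": [("is_a", "term_A")],
    "term_A": [],  # root
}


def ancestors(term, graph, relations=("is_a", "part_of")):
    """All terms reachable from `term` by following the given relations upwards."""
    found = set()
    queue = deque([term])
    while queue:  # breadth-first walk; the set avoids revisiting shared ancestors
        current = queue.popleft()
        for relation, parent in graph.get(current, []):
            if relation in relations and parent not in found:
                found.add(parent)
                queue.append(parent)
    return found


print(sorted(ancestors("term_D", TOY_GO)))  # ['term_A', 'term_B', 'term_C']
print(sorted(ancestors("term_D", TOY_GO, relations=("is_a",))))  # ['term_A', 'term_B']
```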
Orthologs
[Figure: gene tree for protein A illustrating homologs (A, B, C), orthologs (B1, C1), paralogs (C1, C2, C3), in-paralogs (C2, C3), out-paralogs (B2, C1) and xenologs (A1, AB1)]
Ortholog prediction
Ortholog databases
- YOGY (eukarYotic OrtholoGY): a web-based resource that integrates 5 independent resources (Sanger)
- COG (Clusters of Orthologous Groups of proteins) and KOG for 7 eukaryotic genomes (NCBI)
- InParanoid (Stockholm Bioinformatics Center)
- HomoloGene (NCBI)
- OrthoMCL: uses the Markov Clustering algorithm (University of Pennsylvania)
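Several of these resources build on pairwise sequence comparisons; InParanoid, for example, starts from pairs of proteins that are each other's best hit between two genomes. A minimal Python sketch of that reciprocal-best-hit idea, assuming the similarity scores (e.g. BLAST bit scores) are already computed; the protein identifiers and scores are hypothetical, and real pipelines add score cut-offs and in-paralog handling on top of this.

```python
def best_hits(scores: dict[str, dict[str, float]]) -> dict[str, str]:
    """For each query, the subject with the highest score (higher = more similar)."""
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit AND a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores between human (hsa_*) and mouse (mmu_*) proteins.
human_vs_mouse = {
    "hsa_1": {"mmu_1": 900.0, "mmu_2": 300.0},
    "hsa_2": {"mmu_1": 400.0},
}
mouse_vs_human = {
    "mmu_1": {"hsa_1": 880.0, "hsa_2": 390.0},
    "mmu_2": {"hsa_1": 310.0},
}
print(reciprocal_best_hits(human_vs_mouse, mouse_vs_human))  # [('hsa_1', 'mmu_1')]
```

Here hsa_2 points to mmu_1, but mmu_1 prefers hsa_1, so only one pair survives the reciprocity test.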
Multiple sequence alignment (ClustalW)
Progressive alignment along a guide tree
Jalview
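ClustalW first aligns every pair of sequences, converts the pairwise results into distances, builds a guide tree from them (neighbour-joining) and then aligns the sequences progressively along that tree. The Python sketch below covers only the first step: a global Needleman-Wunsch alignment with a simple match/mismatch/linear-gap score, and a distance of 1 minus the fraction of identical columns. These scores and this distance are simplifying assumptions (ClustalW itself uses substitution matrices and gap opening/extension penalties), and the function names are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match=1, mismatch=-1, gap=-2):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Traceback from the bottom-right cell along one optimal path.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        step = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def distance(a: str, b: str) -> float:
    """1 - fraction of identical (non-gap) columns in the pairwise alignment."""
    _, x, y = needleman_wunsch(a, b)
    identical = sum(p == q and p != "-" for p, q in zip(x, y))
    return 1 - identical / len(x)


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
print(distance("ACGT", "AGT"))          # 0.25
```

A matrix of such distances for all sequence pairs is what the guide tree would be built from.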
Exercise 2-1 REGULATORY GENOMICS
Pyruvate carboxylase as example
Ensembl BioMart
1.1 For the human transcript NM_000920 (pyruvate carboxylase), find the official gene symbol, the number of exons, the Ensembl transcript ID, the Ensembl gene ID, the 3'UTR sequence as a FASTA file, and the length of the 3'UTR.
microRNA target prediction
1.2 Is there a sequence within the 3'UTR of PC that is complementary to positions 2-8 (the seed region) of the microRNA hsa-mir-182?
UCSC genome browser
1.3 Find the positions of the transcription start site and the transcription end of pyruvate carboxylase (NM_000920) in the hg19 assembly.
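For 1.1 and 1.2, once the 3'UTR has been downloaded from BioMart as FASTA and the mature microRNA sequence has been taken from miRBase, the length and the seed check can be scripted. A minimal Python sketch, assuming a FASTA file and a mature microRNA given 5' to 3' in RNA letters; it looks only for perfect matches to the reverse complement of positions 2-8, which is the simplest reading of question 1.2 rather than a full target prediction. The function names and toy sequences are hypothetical; paste in the real sequences to answer the exercise.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record name: sequence}; the name is the first word after '>'."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {k: "".join(v) for k, v in records.items()}


RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mature_mirna: str) -> str:
    """DNA sequence a 3'UTR must contain to pair with miRNA positions 2-8."""
    seed = mature_mirna.upper().replace("T", "U")[1:8]  # positions 2-8, 1-based
    site_rna = seed.translate(RNA_COMPLEMENT)[::-1]      # reverse complement
    return site_rna.replace("U", "T")                    # the UTR from BioMart is DNA


def seed_matches(utr: str, mature_mirna: str) -> list[int]:
    """1-based start positions of perfect seed sites in the UTR."""
    site, utr = seed_site(mature_mirna), utr.upper()
    return [i + 1 for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


# Toy inputs (hypothetical, not PC or hsa-mir-182).
utr = read_fasta(">toy_utr\nAAATGACTGG\nCCCTGACTGG\n")["toy_utr"]
print(len(utr))                                      # 20
print(seed_matches(utr, "UCCAGUCAAAAAAAAAAAAAAA"))  # [4, 14]
```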
Find splicing signals
1.4 Using the UCSC Table Browser and Notepad++, get the sequences 10 bp on either side (±10 bp) of the intron-exon borders and of the exon-intron borders of pyruvate carboxylase.
1.5 In both cases, construct a sequence logo and a frequency plot. Can you identify (regulatory) sequence motifs?
Regulatory motifs (transcription factor binding sites)
1.6 Chromatin immunoprecipitation (ChIP-seq) experiments in a mouse cell line show that the transcription factor Pparg binds near the pyruvate carboxylase gene and hence potentially regulates its transcription (ppar.wig). Show the binding region as a custom track in the UCSC genome browser and extract its sequence.
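For 1.6, the wiggle file can be uploaded to the UCSC browser as it is; to decide which region to extract, it can help to list the intervals with a high signal first. A minimal Python sketch, assuming ppar.wig uses the standard variableStep/fixedStep wiggle layout with 1-based positions; the threshold and the toy data are hypothetical, and the resulting coordinates refer to whichever mouse assembly the file was produced for.

```python
def parse_wig(text: str) -> list[tuple[str, int, int, float]]:
    """Return (chrom, start, end, value) intervals, 1-based and inclusive."""
    intervals = []
    mode, chrom, span, step, pos = None, None, 1, 1, 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue  # header and comment lines carry no data
        fields = line.split()
        if fields[0] in ("variableStep", "fixedStep"):
            mode = fields[0]
            params = dict(f.split("=", 1) for f in fields[1:])
            chrom = params["chrom"]
            span = int(params.get("span", 1))
            if mode == "fixedStep":
                pos, step = int(params["start"]), int(params["step"])
            continue
        if mode == "variableStep":  # data line: "<position> <value>"
            start, value = int(fields[0]), float(fields[1])
        elif mode == "fixedStep":   # data line: "<value>", positions implied
            start, value = pos, float(fields[0])
            pos += step
        else:
            raise ValueError("data line before any variableStep/fixedStep line")
        intervals.append((chrom, start, start + span - 1, value))
    return intervals


def regions_above(intervals, threshold: float):
    """Merge touching or overlapping intervals whose value is >= threshold."""
    regions = []
    for chrom, start, end, value in sorted(intervals):
        if value < threshold:
            continue
        if regions and regions[-1][0] == chrom and start <= regions[-1][2] + 1:
            c, s, e, best = regions[-1]
            regions[-1] = (c, s, max(e, end), max(best, value))
        else:
            regions.append((chrom, start, end, value))
    return regions


toy_wig = """track type=wiggle_0 name="toy"
variableStep chrom=chr1 span=10
100 1.0
110 5.0
120 6.0
200 4.5
fixedStep chrom=chr1 start=300 step=10 span=10
0.5
7.0
"""
for chrom, start, end, peak in regions_above(parse_wig(toy_wig), threshold=4.0):
    print(f"{chrom}:{start}-{end}  max signal {peak}")
```

The printed `chrom:start-end` strings use the same 1-based, inclusive convention as the UCSC position box.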
Exercise 2-2 PROTEIN FUNCTION
Identify functions/processes/pathways for a protein
2.1 What is the function of pyruvate carboxylase, and in which pathways and processes is this enzyme involved? Show pathway maps and find the enzyme ID (EC number) using KEGG.
Identify functional domains and the Gene Ontology annotation of the protein sequence using UniProt, Prosite and Pfam.
Find orthologs and perform a multiple sequence alignment
2.2 Find ortholog protein sequences in Mus musculus, Rattus norvegicus and Saccharomyces cerevisiae, perform a multiple sequence alignment using ClustalW, and visualize it with Jalview.
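Once ClustalW has produced the alignment, a quick way to see where the orthologs agree is to score each column, similar in spirit to the identity-based colouring Jalview offers. A minimal Python sketch, assuming the aligned sequences are already loaded as equal-length strings with '-' for gaps; the scoring rule (share of sequences carrying the most common residue) and the toy alignment are illustrative assumptions, not results for pyruvate carboxylase.

```python
from collections import Counter


def column_identity(alignment: list[str]) -> list[float]:
    """Per column: fraction of sequences that carry the most common residue."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for i in range(length):
        residues = [s[i] for s in alignment if s[i] not in "-."]  # gaps never count
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(alignment))
    return scores


# Toy alignment of three hypothetical protein fragments.
toy_alignment = ["MKV-A", "MRV-A", "MKVLA"]
scores = column_identity(toy_alignment)
fully_conserved = [i + 1 for i, f in enumerate(scores) if f == 1.0]
print(fully_conserved)  # [1, 3, 5]
```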