Data Mining in Ensembl with BioMart - PowerPoint PPT Presentation

Provided by: xos63
Transcript and Presenter's Notes
1
Data Mining in Ensembl with BioMart
2
Simple Text-based Search Engine
3
Mouse Gene Gives Us Results
4
A More Complex Query is Not as Useful
5
BioMart: Data Mining
  • BioMart is a search tool that looks up several
    terms at once and returns them as a table.
  • For example: human gene IDs, chromosome and base
    pair position.
  • No programming required!

6
General or Specific Data-Tables
  • All the genes for one species
  • Or only genes on one specific region of a
    chromosome
  • Or genes on one region of a chromosome
    associated with a disease

7
BioMart Data Sets
  • Ensembl genes
  • Vega genes
  • SNPs
  • Markers
  • Phenotypes
  • Gene expression information
  • Gene ontology
  • Homology predictions
  • Protein annotation

8
Web Interface
With BioMart, quickly extract gene-associated
information from the Ensembl databases.
9
Information Flow
  • Choose the species of interest (Dataset).
  • Decide what you would like to know about the
    genes (Attributes), e.g. sequences, IDs,
    descriptions.
  • Narrow down to a smaller gene set using Filters,
    e.g. enter IDs or choose a region.
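
The three stages above can be pictured as three fields of one query. The sketch below is only an illustration: the MartQuery class and its field values are hypothetical names invented here, not part of BioMart.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MartQuery:
    """Hypothetical container for the three stages of a BioMart query."""
    dataset: str                                            # species of interest
    attributes: List[str] = field(default_factory=list)     # what we want to know
    filters: Dict[str, str] = field(default_factory=dict)   # what we already know

    def summary(self) -> str:
        """Describe the query in one line, in the order the stages are chosen."""
        wanted = ", ".join(self.attributes) or "default attributes"
        known = ", ".join(f"{k} = {v}" for k, v in self.filters.items()) or "no filters"
        return f"{self.dataset}: show {wanted}; restrict to {known}"


# Illustrative values only; real queries use the options offered by the web interface.
query = MartQuery(
    dataset="Homo sapiens genes",
    attributes=["gene ID", "description"],
    filters={"region": "one chromosome"},
)
print(query.summary())
```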

10
Web Interface
Choose the species of interest
Choose what information to view.
Choose the gene set using what we know.
Three main stages: Dataset, Attributes and
Filters.
11
The First Step: Choose the Dataset
Homo sapiens genes are the default.
12
The Second Step: Attributes
Four output pages.
Attributes are what we want to know about the
genes.
13
The SNP Attribute Page
Output variation information such as SNP
reference ID and alleles.
14
Filters Allow Gene Selection
Choose the gene set by region, gene ID(s),
protein/domain type.
15
Export Sequence or Tables
Genes and attributes are exported as sequence
(Fasta format) or tables.
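
For readers unfamiliar with Fasta, a record is a header line starting with ">" followed by the sequence on one or more lines. The sketch below writes such a record; the to_fasta helper, the header text, the placeholder sequence and the 60-character line width are all assumptions made for illustration, not BioMart output.

```python
def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as a Fasta record with fixed-width sequence lines."""
    # Cut the sequence into chunks of at most `width` characters.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{header}"] + lines) + "\n"


# Placeholder header and sequence, invented for this example.
record = to_fasta("example_gene", "ACGT" * 20)
print(record, end="")
```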
16
Query
  • For all mouse genes on chromosome 10 that are
    protein coding, I would like to know the IDs in
    both Ensembl and MGI.
  • In the query:
  • Attributes: what we want to know.
  • Filters: what we know.
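
The web interface needs no programming, but the same question can also be written down as a structured query. The sketch below builds an XML document of the kind BioMart servers are generally understood to accept. The internal names used (mmusculus_gene_ensembl, chromosome_name, biotype, ensembl_gene_id, mgi_symbol, mgi_id) and the Query element's settings are assumptions; the slides only give the display names, so check them against the live mart before use.

```python
import xml.etree.ElementTree as ET


def build_query_xml(dataset, attributes, filters):
    """Build a BioMart-style XML query string (structure assumed, see lead-in)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",          # ask for a tab-separated table
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters: what we know.
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Attributes: what we want to know.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Mouse, chromosome 10, protein coding; IDs in Ensembl and MGI (names assumed).
xml_text = build_query_xml(
    dataset="mmusculus_gene_ensembl",
    attributes=["ensembl_gene_id", "mgi_symbol", "mgi_id"],
    filters={"chromosome_name": "10", "biotype": "protein_coding"},
)
print(xml_text)
```

The example only constructs the query text; it does not contact any server.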

19
A Brief Example
Change the dataset to mouse (Mus musculus).
20
A Brief Example
Dataset has changed.
21
Attributes (Output Options)
Click Attributes.
Click on GENE.
Attributes allow us to choose what we wish to
know. IDs are found in the Features page.
22
Attributes (Output Options)
Ensembl Gene ID is selected
Default options selected: Ensembl Gene ID and
Transcript ID.
23
Attributes (Output Options)
The Markersymbol ID will give us the MGI ID.
Scroll down to select MGI symbol. Also select the
accession number.
24
The Results Table
Results give us Gene IDs for all mouse genes in
the Ensembl database.
25
Select a Smaller Gene Set
Select Filters.
Expand the REGION panel.
Instead of all mouse genes, select protein coding
genes on chromosome 10.
26
Select Genes on Chromosome 10
Select chromosome 10
Instead of all mouse genes, select protein coding
genes on chromosome 10.
27
Select Protein Coding Genes
Gene type: protein coding
Filters are set to chromosome 10 and
protein-coding genes. Genes must meet BOTH
criteria to be in the result table.
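
The "BOTH criteria" rule is a logical AND across filters. A minimal sketch of that behaviour on a toy table follows; the rows, field names and the apply_filters helper are invented for illustration and are not real Ensembl data.

```python
# Toy gene table; every value here is a made-up placeholder.
genes = [
    {"gene": "gene_A", "chromosome": "10", "gene_type": "protein_coding"},
    {"gene": "gene_B", "chromosome": "10", "gene_type": "pseudogene"},
    {"gene": "gene_C", "chromosome": "2", "gene_type": "protein_coding"},
]


def apply_filters(rows, filters):
    """Keep only rows that satisfy every filter (logical AND)."""
    return [
        row for row in rows
        if all(row.get(name) == value for name, value in filters.items())
    ]


selected = apply_filters(genes, {"chromosome": "10", "gene_type": "protein_coding"})
print([row["gene"] for row in selected])
```

Only gene_A passes: gene_B fails the gene type filter and gene_C fails the chromosome filter.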
28
Results (Preview)
For the full result table, click Go.
This is a preview. If you are happy with the
table, click Go.
29
Full Result Table
Transcript ID
MGI symbol
MGI Accession Number
Ensembl Gene ID
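
Once exported, a result table like this can be read back with a few lines of code. The sketch below assumes a tab-separated export whose header row uses the four column names shown on this slide; the column order and all data values are placeholders invented for the example.

```python
import csv
import io

# Placeholder export; real IDs would come from the BioMart result table.
tsv_text = (
    "Ensembl Gene ID\tTranscript ID\tMGI symbol\tMGI Accession Number\n"
    "GENE_ID_1\tTRANSCRIPT_ID_1\tsymbol_1\tACCESSION_1\n"
    "GENE_ID_1\tTRANSCRIPT_ID_2\tsymbol_1\tACCESSION_1\n"
    "GENE_ID_2\tTRANSCRIPT_ID_3\tsymbol_2\tACCESSION_2\n"
)

# Each row becomes a dict keyed by the column names in the header line.
rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# A gene with several transcripts appears on several rows; group them by gene.
transcripts_per_gene = {}
for row in rows:
    transcripts_per_gene.setdefault(row["Ensembl Gene ID"], []).append(row["Transcript ID"])

print(transcripts_per_gene)
```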
30
Original Query
  • For all mouse genes on chromosome 10 that are
    protein coding, I would like to know the IDs in
    both Ensembl and MGI.
  • In the query:
  • Attributes: columns in the Result Table.
  • Filters: what we know.

31
Other Export Options (Attributes)
  • Sequences: UTRs, flanking sequences, cDNA and
    peptides, etc.
  • Gene IDs from Ensembl and external sources (MGI,
    Entrez, etc.)
  • Microarray data
  • Protein Functions/descriptions (Interpro, GO)
  • Orthologous gene sets
  • SNP/ Variation Data

32
Central Server
www.biomart.org
33
WormBase
34
HapMap
36
Uniprot, MSD
37
GRAMENE
Rice, Maize, Arabidopsis genomes
38
How to Get There
  • Either www.biomart.org/biomart/martview
  • Or click on BioMart from Ensembl

39
Thanks
Arek Kasprzyk, Benoît Ballester, Syed Haider,
Richard Holland, Damian Smedley
Q & A