EMBOSS as a DAS Client

Title: EMBL-EBI Powerpoint Presentation Author: External Services Last modified by: Sanger Institute Created Date: 10/13/2005 9:35:17 AM Document presentation format – PowerPoint PPT presentation

1
EMBOSS as a DAS Client
  • Peter Rice, pmr@ebi.ac.uk
  • Mahmut Uludag, uludag@ebi.ac.uk
  • 3rd March 2011.

2
EMBOSS: A quick introduction
  • European Molecular Biology Open Software Suite
  • Open source package for sequence analysis
  • ANSI C source code
  • GPL licensed applications, LGPL libraries
  • 200 applications
  • 100 third party applications in 15 associated
    packages
  • Project started 1996 at Sanger Centre and HGMP
  • Now based at EBI
  • Release 6.3.0, 15th July 2010
  • Funded by UK-BBSRC and EMBL-EBI

3
EMBOSS history
  • Project started at Sanger Centre and SEQNET
    August 1996
  • Alan moved from SEQNET 1997 (Wellcome funding)
  • Peter moved to Lion Bioscience 2000
    (CCP11-BBSRC/MRC)
  • Peter moved to EBI 2003
  • HGMP closed 2005; Alan and Jon moved to EBI
  • BBSRC funding (limited) 2006-2009
  • BBSRC BBR funding 2009-2011
  • Major new developments
  • New data types
  • New data sources
  • Built-in ontologies

4
EMBOSS command line interface
  • EMBOSS applications run from the command line
  • This is not the only interface
  • There are over 100 interfaces and packaged
    systems available
  • Web interfaces
  • Graphical user interfaces (GUIs)
  • Web services
  • All applications have a command definition file
    (.acd)
  • Defines all inputs, outputs, and other options
  • Read at startup
  • Contains all command line options with
    descriptions
  • Template for any other interface

5
EMBOSS command line example
  • antigenic
  • Input protein sequence(s): uniprot:actb1_fugru
  • Minimum length of antigenic region [6]:
  • Output report [actb1_fugru.antigenic]:
  • antigenic uniprot:actb1_fugru -auto
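The non-interactive form of this call can be scripted. The following is a minimal Python sketch, assuming the `antigenic` program is installed and on the PATH, and that the `-minlen` and `-outfile` qualifiers correspond to the `minlen` and `outfile` entries of the program's ACD file. The helper name `antigenic_command` is hypothetical.

```python
import subprocess


def antigenic_command(usa, minlen=6, outfile=None):
    """Build the argument list for a non-interactive antigenic run.

    `usa` is the input sequence address, e.g. "uniprot:actb1_fugru".
    """
    cmd = ["antigenic", usa, "-minlen", str(minlen), "-auto"]
    if outfile is not None:
        cmd += ["-outfile", outfile]
    return cmd


def run_antigenic(usa, **options):
    """Run antigenic; requires a working EMBOSS installation."""
    return subprocess.run(antigenic_command(usa, **options), check=True)


print(" ".join(antigenic_command("uniprot:actb1_fugru")))
```

Passing the arguments as a list avoids shell quoting problems when the sequence address contains characters such as `:` or `@`.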

6
EMBOSS ACD File
  application: antigenic [
    documentation: "Finds antigenic sites in proteins"
    groups: "Protein:Motifs"
  ]
  section: input [
    information: "Input section"
    type: "page"
  ]
  seqall: sequence [
    parameter: "Y"
    type: "proteinstandard"
  ]
  endsection: input
  section: required [
    information: "Required section"
    type: "page"
  ]
  integer: minlen [
    standard: "Y"
    minimum: "1"
    maximum: "50"
    default: "6"
    information: "Minimum length of antigenic region"
  ]
  endsection: required
  section: output [
    information: "Output section"
    type: "page"
  ]
  report: outfile [
    parameter: "Y"
    rformat: "motif"
    multiple: "Y"
    taglist: "int:pos=Max_score_pos"
  ]
  endsection: output

7
EMBOSS ACD File with EDAM Annotation
  application: antigenic [
    documentation: "Finds antigenic sites in proteins"
    groups: "Protein:Motifs"
    relations: "EDAM:0000201 topic Immunological analysis"
    relations: "EDAM:0000416 operation Epitope mapping"
  ]
  section: input [
    information: "Input section"
    type: "page"
  ]
  seqall: sequence [
    parameter: "Y"
    type: "proteinstandard"
    relations: "EDAM:0001219 data Pure protein sequence"
    relations: "EDAM:0000849 data Sequence record"
    relations: "EDAM:0002178 data 1 or more"
  ]
  endsection: input
  section: required [
    information: "Required section"
    type: "page"
  ]
  integer: minlen [
    standard: "Y"
    minimum: "1"
    maximum: "50"
    default: "6"
    information: "Minimum length of antigenic region"
    relations: "EDAM:0001249 data Sequence length"
  ]
  endsection: required
  section: output [
    information: "Output section"
    type: "page"
  ]
  report: outfile [
    parameter: "Y"
    rformat: "motif"
    multiple: "Y"
    taglist: "int:pos=Max_score_pos"
    relations: "EDAM:0001534 data Peptide immunogenicity report"
  ]
  endsection: output
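Because the EDAM annotations are plain `relations` attributes, another interface could read them with very little code. The sketch below is illustrative only: it assumes the attributes are written as `relations: "EDAM:<id> <namespace> <label>"`, and the function name `edam_relations` is hypothetical. It is not the parser EMBOSS itself uses.

```python
import re

# A shortened ACD fragment, assuming the attribute syntax described above.
ACD_TEXT = '''
application: antigenic [
  documentation: "Finds antigenic sites in proteins"
  groups: "Protein:Motifs"
  relations: "EDAM:0000201 topic Immunological analysis"
  relations: "EDAM:0000416 operation Epitope mapping"
]
integer: minlen [
  standard: "Y"
  default: "6"
  relations: "EDAM:0001249 data Sequence length"
]
'''

RELATION = re.compile(r'relations:\s*"EDAM:(\d+)\s+(\w+)\s+([^"]+)"')


def edam_relations(acd_text):
    """Return (id, namespace, label) for every EDAM relation found."""
    return [(m.group(1), m.group(2), m.group(3).strip())
            for m in RELATION.finditer(acd_text)]


for term_id, namespace, label in edam_relations(ACD_TEXT):
    print(term_id, namespace, label)
```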

8
Documentation books
  • Three books at typesetting stage.
  • Administrators Manual
  • Users Manual
  • Developers Manual
  • Concomitant major revision of EMBOSS website.
  • Automation of website content addition.
  • Books to form basis of new website content.

9
EMBOSS Sequences
  • Uniform Sequence Address (USA): URL-style naming
  • Derived from the familiar "VMS logical name"
    syntax used by SRS and GCG.
  • database:entryname
  • embl:ecompa (ID or accession can be used in
    this way)
  • uniprot-id:opsd_bovin (SRS syntax for query by
    ID)
  • embl-acc:x13776 (SRS syntax for query by
    accession)
  • format::filename
  • fasta::/users/pmr/paamir.fa (filename with
    specific format)
  • ecoompa.genbank (with no format, can try all
    formats)
  • format::filename:entryname
  • fasta::unfinished:AH6.1 (most formats allow
    multiple sequences)
  • Also @listfile
  • and asis::gctgactgactgatg
  • Queries: database-field:query (SRS syntax for id,
    acc, sv, des, key, org)
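The address forms listed on this slide can be told apart mechanically. Below is a rough Python sketch of such a classifier, written with the `database:entry`, `format::file:entry`, `@listfile` and `asis::sequence` spellings. The function `parse_usa` and the `KNOWN_DATABASES` set are hypothetical; a real installation would take the database names from its own configuration, and EMBOSS's own parsing handles more cases than this.

```python
# Assumption: the set of database names defined for this installation.
KNOWN_DATABASES = {"embl", "uniprot"}


def parse_usa(usa, databases=KNOWN_DATABASES):
    """Classify a USA string and split it into its parts."""
    if usa.startswith("@"):
        return {"kind": "listfile", "file": usa[1:]}

    # An optional "format::" prefix comes first.
    fmt, sep, rest = usa.partition("::")
    if not sep:
        fmt, rest = None, usa
    if fmt == "asis":
        return {"kind": "asis", "sequence": rest}

    name, _, entry = rest.partition(":")
    dbname, _, field = name.partition("-")
    if fmt is None and dbname in databases:
        return {"kind": "database", "db": dbname,
                "field": field or None, "query": entry or None}
    return {"kind": "file", "format": fmt, "file": name,
            "entry": entry or None}


for example in ("embl:ecompa", "uniprot-id:opsd_bovin",
                "fasta::unfinished:AH6.1", "@listfile",
                "asis::gctgactgactgatg"):
    print(example, "->", parse_usa(example))
```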

10
New data resources
  • Aim to read all public data resources
  • Follow cross-references (explicit and implied)
  • UniProt
  • EMBL/GenBank/DDBJ
  • Other
  • Servers
  • Multiple data resources through a single server
    definition
  • DAS, Ensembl, BioMart, WsEbeye, DbFetch, SRS
  • Cache files of resource definitions for server
  • Data resource catalogue (drcat)
  • 600 data resources
  • Query terms and URLs
  • EDAM annotation of resources, formats,
    identifiers, terms

11
Data resource catalogue (drcat)
  • ID ArachnoServer
  • Acc DB-0145
  • Name ArachnoServer
  • Desc Spider toxin database
  • URL http://www.arachnoserver.org
  • Cat Organism-specific databases
  • Taxon 6845 Arachnida
  • EDAMres 0000621 Organism-specific
  • EDAMdat 0002400 Toxin annotation
  • EDAMid 0002578 ArachnoServer ID
  • Xref SP_explicit ArachnoServer IDToxin name
  • Query Toxin annotation HTML ArachnoServer
    ID www.arachnoserver.org/toxincard.html?ids
  • Example ArachnoServer ID AS000014
  • CCmisc BMC Genomics 10:375-375(2009) Pubmed:19674480
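A catalogue entry like this one is a list of tagged lines, so it maps naturally onto a dictionary. The sketch below is an illustration under a loud assumption: it treats each line as "Tag value" exactly as the slide shows it, which may not match the layout of the real catalogue file. The function `parse_drcat_entry` is hypothetical.

```python
from collections import defaultdict

# Lines copied from the slide, one "Tag value" pair per line (assumed layout).
ENTRY = """\
ID ArachnoServer
Acc DB-0145
Name ArachnoServer
Desc Spider toxin database
Cat Organism-specific databases
Taxon 6845 Arachnida
Example ArachnoServer ID AS000014
"""


def parse_drcat_entry(text):
    """Collect the values for each tag; a tag may occur more than once."""
    entry = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        tag, _, value = line.partition(" ")
        entry[tag].append(value.strip())
    return dict(entry)


record = parse_drcat_entry(ENTRY)
print(record["ID"], record["Taxon"])
```

Keeping a list per tag leaves room for entries that repeat a tag, for example several query or cross-reference lines.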

12
EMBOSS Data Types
  • Sequences
  • Nucleotide (DNA and RNA)
  • Protein
  • Features
  • Attached to sequences
  • Independent data objects
  • Bio-Ontologies (OBO)
  • Taxonomy (NCBI)
  • Data Resources
  • Assembled reads
  • Text
  • Text, HTML, XML

13
New data types
  • Reuse USA syntax
  • Server:Dbname:identifier (database has an
    access method)
  • Server:Dbname-field:query (general field
    names)
  • Data types: features, bio-ontologies, taxonomy,
    etc.
  • Access methods: HTTP, DAS, BioMart, Ensembl, ...
  • Multiple types and formats for a server/resource
  • type: sequence features
  • format: embl fasta
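The server-qualified form of the address splits into at most four parts. A minimal Python sketch, assuming the parts are joined as `server:dbname:identifier` or `server:dbname-field:query`; the `ServerQuery` type and `parse_server_query` function are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class ServerQuery(NamedTuple):
    server: str
    dbname: str
    field: Optional[str]  # None when the term is a plain identifier
    term: str


def parse_server_query(text):
    """Split "server:dbname[-field]:term" into its components.

    Raises ValueError if fewer than three colon-separated parts are present.
    """
    server, dbname, term = text.split(":", 2)
    dbname, _, field = dbname.partition("-")
    return ServerQuery(server, dbname, field or None, term)


print(parse_server_query("das:uniprot:P00280"))
print(parse_server_query("das:Homo_sapiens_GRCh37_reference-segment:11"))
```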

14
EMBOSS Query Language
  • Query fields are now made general
  • Any field queryable by the access method (DAS,
    SRS, ...)
  • Any index created by indexing applications
  • Any query term in the data resource catalogue
  • Multiple queries combined
  • For one data resource
  • AND, OR, to combine queries
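The idea of combining field queries with AND and OR can be pictured as composing small predicates. The Python sketch below only illustrates that idea; it is not how EMBOSS evaluates queries, and the records, field values and function names are all made up for the example.

```python
def field_query(field, term):
    """Match records whose `field` contains `term` (case-insensitive)."""
    term = term.lower()
    return lambda record: term in str(record.get(field, "")).lower()


def q_and(*queries):
    return lambda record: all(q(record) for q in queries)


def q_or(*queries):
    return lambda record: any(q(record) for q in queries)


# Illustrative records using the id, org and des field names.
RECORDS = [
    {"id": "opsd_bovin", "org": "Bos taurus", "des": "Rhodopsin"},
    {"id": "actb1_fugru", "org": "Takifugu rubripes", "des": "Actin"},
]

query = q_or(
    q_and(field_query("org", "bos"), field_query("des", "rhodopsin")),
    field_query("id", "actb1"),
)
print([r["id"] for r in RECORDS if query(r)])
```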

15
DAS Server Definitions
  SERVER das [
    method: "dassource"
    type: "sequence, features"
    url: "http://www.dasregistry.org/das/"
    comment: "access sequence/feature sources listed on das registry
      (http://www.dasregistry.org/das/)"
    cachefile: "server.dassource"
  ]

16
DAS Server Definitions
  SERVER ensembldas [
    method: "dassource"
    type: "sequence, features"
    url: "http://www.ensembl.org/das/"
    comment: "access sequence/feature sources on ensembl das server
      (http://www.ensembl.org/das/)"
    cachefile: "server.ensembldas"
  ]
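A server definition of this kind lets a client discover the individual sources by asking the DAS server for its source list. The Python sketch below shows how such a list might be read, assuming the response uses SOURCE, VERSION and CAPABILITY elements as in the DAS `sources` command. The sample document and its URLs are invented for illustration, and `list_sources` is a hypothetical helper, not part of EMBOSS.

```python
import xml.etree.ElementTree as ET

# Invented sample in the assumed layout of a DAS "sources" response.
SAMPLE = """\
<SOURCES>
  <SOURCE uri="example_1" title="example_source" description="Hypothetical">
    <VERSION uri="example_1" created="2011-03-03">
      <CAPABILITY type="das1:sequence"
                  query_uri="http://example.org/das/example_source/sequence"/>
      <CAPABILITY type="das1:features"
                  query_uri="http://example.org/das/example_source/features"/>
    </VERSION>
  </SOURCE>
</SOURCES>
"""


def list_sources(xml_text):
    """Map each source title to the sorted capabilities it advertises."""
    root = ET.fromstring(xml_text)
    sources = {}
    for source in root.iter("SOURCE"):
        capabilities = sorted(
            cap.get("type", "").split(":", 1)[-1]
            for cap in source.iter("CAPABILITY")
        )
        sources[source.get("title")] = capabilities
    return sources


print(list_sources(SAMPLE))
```

A result like this is the kind of information a cache file of resource definitions for a server would need to hold.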

17
DAS Example
  DB Ensembl_Human_Genes [
    method: "das"
    type: "Sequence, Features"
    taxon: "9606"
    format: "das, dasgff"
    url: "http://www.ebi.ac.uk/das-srv/genedas/das/Homo_sapiens.Gene_ID.reference"
    example: "ENSG00000139618"
    comment: "The Ensembl human Gene_ID reference source, serving
      sequences and non-location features."
    hasaccession: "N"
    identifier: "segment"
    fields: "segment, type, category, categorize, feature_id"
  ]
18
Ensembl DAS Example
  DB Felis_catus_CAT_prediction_transcript [
    method: "das"
    type: "Nucfeatures"
    taxon: "9685"
    format: "dasgff"
    url: "http://www.ensembl.org/das/Felis_catus.CAT.prediction_transcript"
    example: "scaffold_2099871550"
    comment: "Annotation source for Felis_catus prediction_transcript"
    hasaccession: "N"
    identifier: "segment"
    fields: "segment, type, category, categorize, feature_id"
  ]
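Given a database definition with a DAS source URL and a segment identifier, the request itself is a short URL. The Python sketch below builds one, assuming the DAS convention of a `features` or `sequence` command with a `segment` parameter, optionally written as `segment=ref:start,stop`. The function `das_request_url` is hypothetical and the range used in the usage line is a placeholder, not taken from the slides.

```python
from urllib.parse import urlencode


def das_request_url(source_url, command, segment, start=None, stop=None):
    """Build a DAS request URL for one segment of a source."""
    if command not in ("sequence", "features"):
        raise ValueError("unsupported DAS command: " + command)
    if start is not None and stop is not None:
        segment = f"{segment}:{start},{stop}"
    # Keep ":" and "," readable instead of percent-encoding them.
    query = urlencode({"segment": segment}, safe=":,")
    return f"{source_url.rstrip('/')}/{command}?{query}"


GENES = "http://www.ebi.ac.uk/das-srv/genedas/das/Homo_sapiens.Gene_ID.reference"
print(das_request_url(GENES, "features", "ENSG00000139618"))
print(das_request_url(GENES, "sequence", "1", start=1, stop=550))
```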

19
EMBOSS Query Language
  • das:ensembl_human_genes:ENSG00000139618
  • ensembldas:Felis_catus_CAT_prediction_transcript:
    scaffold_209987 1550
  • das:Homo_sapiens_GRCh37_transcript:10
    3288961132973347
  • das:uniprot:P00280
  • das:cath:5pti
  • das:uniparc:UPI000000000A
  • das:Homo_sapiens_GRCh37_reference-segment:11
    type supercontig

20
EMBOSS Query Language Future
  • Ontology-based searches of data resources
  • Taxonomy
  • EDAM terms
  • Resources
  • Data types
  • Identifiers
  • Descriptions
  • Search for applications matching data types
  • Sequences and features
  • Nucleotide and protein
  • Support for DAS advanced query ...

21
Acknowledgements
  • EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut
    Uludag, Martin Senger, Tom Oinn, Jaina Mistry,
    Rodrigo Lopez, Sharmilla Pillai, Hamish McWilliam
  • RFCGR/HGMP: Alan Bleasby, Jon Ison, Tim Carver,
    Hugh Morgan, Claude Beazley, Lisa Mullan, Damian
    Counsell, Gary Williams, Val Curwen, Mark Faller,
    Sinead O'Leary, Thon deBoer, Martin Bishop
  • Sanger Institute: Ian Longden, Richard
    Bruskiewich, Simon Kelley
  • LION: Mahmut Uludag, Thomas Laurent, Bijay
    Jassal, Bren Vaughan, Thure Etzold
  • National bioinformatics service providers in
    Norway, Spain, Italy, Netherlands, Germany,
    Belgium, Russia, China, Canada, Australia,
    Argentina
  • Others: Catherine Letondal, Don Gilbert, Rodger
    Staden, Bill Pearson, Webb Miller, Marie-Laetitia
    Denayer, Amandine Schurmann, Gabriele Weiler,
    Luke McCarthy, David Mathog, David Bauer,
    Henrikki Almusa, Thomas Siegmund, Scott Markel,
    Darryl Leon, Bastien Chevreux, Ivo Hofacker, ...
  • IBM, Hewlett-Packard, (Compaq), Apple, SGI, Sun,
    LION bioscience, SciTegic, Cambridge University
    Press
  • Open-Bio Foundation, Sourceforge, Debian, Fedora,
    CEH
  • ... And the British Antarctic Survey
  • http://emboss.sourceforge.net
  • http://emboss.open-bio.org/wiki/Latest_developments