Peer-to-peer bioinformatics iCapture (PowerPoint presentation)

1
Peer-to-peer bioinformatics
iCapture Lunch and Learn, November 2004
Stephen Montgomery
2
Searching for Regulatory Variation
3
GOAL
  • To facilitate context-driven hypotheses on the
    function of non-coding polymorphisms
      • Utilizing diverse annotation
      • State-of-the-art regulatory analysis tools
      • Characterized regulatory regions and rSNPs

4
Utilizing diverse annotation
(Figures: p53 alignment; SNP density)
5
State-of-the-art regulatory analysis tools
  • How do we get the best tools into our platform to
    help decode regulatory modules?
  • Storing all the results of various predictions
    across a genome can be unwieldy.
  • How do we facilitate access to the information
    encoded in the MANY bioinformatics tools?

6
How are bioinformatics tools developed?
(Diagram: Biologists, Bioinformaticians and Computer Scientists, with the labels "Data and Innovation", "Requirements", "Innovation" and "Organization and Interpretation")
7
Each community has its own culture
(Diagram: Biologists, Bioinformaticians, Computer Scientists)
8
An interaction map of biologists and bioinformaticians
(Diagram: Bioinformaticians, Biologists)
9
Things get more complicated
  • Each individual has
      • Access to different resources
          • Computational / Monetary / Personnel
      • Finite time available
      • A different social network
      • Professional obligations
  • Each group has
      • Organizational boundaries
      • A toolkit (suites and scripts)
      • A method of providing tools (OS, Internet,
        interfaces)

10
An improved interaction map of biologists and bioinformaticians
(Diagram: Bioinformaticians, Biologists; labels: Improved Communication, Access to communities, Access to resources, Retain sub-organization)
11
A community-based approach to bioinformatics analysis
  • Use the principles of peer-to-peer technology
      • Allow biologists to easily discover and run
        bioinformatics tools
      • Create a dynamic, reliable network for analysis
      • Reduce overlapping integration efforts
      • Improve communication within/outside
        organizations
  • Address problems relevant to bioinformatics
      • Attribution
      • Resource distribution
      • Specialized data

12
Discover and run jobs here or through Bioperl
13
(No Transcript)
14
Chinook Architecture
15
(No Transcript)
16
Architecture Outline
  • Chinook Configuration
  • Chinook Data

17
Chinook Configuration
  • Add new services
  • Set Server Mode
  • Open ports
  • Customize Advertisements
  • Run

18
Adding services (XML-based)
19
Adding services (GUI-based)
20
Chinook Server Modes: RMI vs. Web Services
  • Web Services
      • Runs over Tomcat and Axis (SOAP engine)
      • Harder to set up; more management features
      • Language independent
  • RMI
      • Uses the RMI registry (included in the JDK/JRE)
      • Easier to set up
      • Only allows Java-to-Java communication
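To make the RMI mode concrete, here is a minimal Java sketch of the registry pattern described above: a remote interface, an exported implementation, a name bound in the RMI registry, and a lookup followed by a remote call. The SequenceService interface, its reverse method, the binding name and the port are hypothetical illustrations, not Chinook's actual RMI interfaces. Server and client run in one JVM only so that the example is self-contained.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class RmiModeSketch {

    // Hypothetical remote interface; every remote method must declare RemoteException.
    public interface SequenceService extends Remote {
        String reverse(String dna) throws RemoteException;
    }

    // Hypothetical server-side implementation.
    public static class SequenceServiceImpl implements SequenceService {
        @Override
        public String reverse(String dna) {
            return new StringBuilder(dna).reverse().toString();
        }
    }

    // Starts a registry, publishes the service, then looks it up and calls it
    // the way a client would. Both sides share one JVM in this sketch.
    public static String roundTrip(int registryPort, String dna) throws Exception {
        // Put the loopback address into stubs so the call stays on this machine.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        Registry registry = LocateRegistry.createRegistry(registryPort);
        SequenceServiceImpl impl = new SequenceServiceImpl();
        boolean exported = false;
        try {
            // Port 0 exports the object on an anonymous port.
            SequenceService stub =
                    (SequenceService) UnicastRemoteObject.exportObject(impl, 0);
            exported = true;
            registry.rebind("SequenceService", stub);

            // A remote client would call LocateRegistry.getRegistry(host, port) instead.
            SequenceService client = (SequenceService) registry.lookup("SequenceService");
            return client.reverse(dna);
        } finally {
            // Unexport everything so the RMI threads let the JVM exit.
            if (exported) {
                UnicastRemoteObject.unexportObject(impl, true);
            }
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(21099, "ACGT"));
    }
}
```

Because the stub implements a plain Java interface, only Java clients can call it, which is the RMI limitation listed on the slide; the Web Services mode trades the simpler setup for SOAP messages that any language can produce.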

21
How the P2P works: JXTA Advertisements
22
Customizing your advertisements
  • Edit the files in the advertisements/ directory of
    your Chinook installation to include your endpoint
    location.
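Editing advertisements by hand is straightforward, but the change can also be scripted. The Java sketch below assumes, hypothetically, that an advertisement is an XML document whose endpoint is stored in an Endpoint element; the real element names in Chinook's advertisements/ files may differ, so check an existing file before relying on this. It uses only the JDK's DOM parser and identity transformer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AdvertisementEndpoint {

    // Replaces the text of every <Endpoint> element (a hypothetical element name)
    // and returns the updated advertisement as an XML string.
    public static String setEndpoint(String advertisementXml, String endpointUrl)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(advertisementXml)));

        NodeList endpoints = doc.getElementsByTagName("Endpoint");
        if (endpoints.getLength() == 0) {
            throw new IllegalArgumentException("advertisement has no <Endpoint> element");
        }
        for (int i = 0; i < endpoints.getLength(); i++) {
            endpoints.item(i).setTextContent(endpointUrl);
        }

        // Serialize the modified DOM back to text without an XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

To apply it to a file, read the file with Files.readString and write the result back with Files.writeString (Java 11 or later).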

23
Running Chinook
  • Use Ant
      • ant p2p-start
      • ant server-start
  • From an IDE
  • From an installer

24
Chinook Data
  • Adding new data objects to services
  • Defining data objects
  • Providing database support
  • Client interaction
25
Chinook Data / Databases
  • Each application defines its own data objects
    <data_entry_set>
      <name>dna_sequence</name>
      <maximum_count>2</maximum_count>
      <minimum_count>2</minimum_count>
      <data_entry_type_name>DNA_LOCATION</data_entry_type_name>
      <data_entry_type_name>DNA_FILE</data_entry_type_name>
      <set_output_class_name>ca.bcgsc.chinook.parsing.setoutput.impl.DataEntrySetOutputterImpl</set_output_class_name>
    </data_entry_set>
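As an illustration of how a service might consume this declaration, the sketch below reads the data_entry_set fields with the JDK's DOM parser and checks a request against them. It assumes, as an interpretation rather than documented Chinook behaviour, that minimum_count and maximum_count bound the number of entries in the set and that each entry must be one of the listed data_entry_type_name values (so dna_sequence would take exactly two sequences, each given as a location or a file). The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader for a <data_entry_set> declaration.
public class DataEntrySetSpec {

    public final String name;
    public final int minimumCount;
    public final int maximumCount;
    public final List<String> allowedTypes;

    public DataEntrySetSpec(String name, int minimumCount, int maximumCount,
                            List<String> allowedTypes) {
        this.name = name;
        this.minimumCount = minimumCount;
        this.maximumCount = maximumCount;
        this.allowedTypes = Collections.unmodifiableList(new ArrayList<>(allowedTypes));
    }

    // Parses the fields shown on the slide; other elements are ignored.
    public static DataEntrySetSpec parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        List<String> types = new ArrayList<>();
        NodeList typeNodes = root.getElementsByTagName("data_entry_type_name");
        for (int i = 0; i < typeNodes.getLength(); i++) {
            types.add(typeNodes.item(i).getTextContent().trim());
        }
        return new DataEntrySetSpec(
                firstText(root, "name"),
                Integer.parseInt(firstText(root, "minimum_count")),
                Integer.parseInt(firstText(root, "maximum_count")),
                types);
    }

    private static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("missing <" + tag + ">");
        }
        return nodes.item(0).getTextContent().trim();
    }

    // Assumed semantics: the number of supplied entries must lie within
    // [minimumCount, maximumCount] and every entry type must be allowed.
    public boolean accepts(List<String> suppliedTypes) {
        int n = suppliedTypes.size();
        return n >= minimumCount && n <= maximumCount
                && allowedTypes.containsAll(suppliedTypes);
    }
}
```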

26
Data Interpretation
    <data_entry>
      <data_entry_type_name>DNA_LOCATION</data_entry_type_name>
      <data_entry_class_name>ca.bcgsc.chinook.common.dataentry.objects.impl.SequenceCoordinate</data_entry_class_name>
      <data_entry_validation_class_name>ca.bcgsc.chinook.common.dataentry.validation.impl.SequenceCoordinateValidation</data_entry_validation_class_name>
      <data_entry_gui_support_class_name>ca.bcgsc.chinook.common.dataentry.guisupport.impl.SequenceCoordinateGUISupport</data_entry_gui_support_class_name>
      <data_entry_loader_class_name>ca.bcgsc.chinook.common.dataentry.loader.impl.SequenceCoordinateLoader</data_entry_loader_class_name>
    </data_entry>
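Each *_class_name element above names a Java class to be loaded at run time. One plausible way to do that is reflection: resolve the name with Class.forName, confirm the class implements the expected interface, and call its no-argument constructor. This is a hypothetical sketch; Chinook's own loading code, and the interfaces its validation, GUI-support and loader classes implement, are not shown in the slides, so the helper is written generically.

```java
// Hypothetical helper for instantiating classes named in configuration files.
public class ConfiguredClassLoader {

    // Loads className, checks that it implements expectedType, and creates an
    // instance through its public no-argument constructor.
    public static <T> T instantiate(String className, Class<T> expectedType)
            throws ReflectiveOperationException {
        Class<?> raw = Class.forName(className);
        if (!expectedType.isAssignableFrom(raw)) {
            throw new IllegalArgumentException(
                    className + " does not implement " + expectedType.getName());
        }
        // Safe after the check above; asSubclass yields a typed Class object.
        return raw.asSubclass(expectedType).getDeclaredConstructor().newInstance();
    }
}
```

With a hypothetical validation interface, a call might look like instantiate(validationClassName, DataEntryValidation.class). Failing fast on a type mismatch turns a misconfigured XML file into a clear error at startup rather than a ClassCastException later.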

27
Database support
  • Avoiding empty text boxes
    <database>
      <type>ENSEMBL</type>
      <parsing_class>ca.bcgsc.chinook.server.database.impl.EnsemblDatabaseType</parsing_class>
      <connection>
        <name>dbx440</name>
        <host>ensembldb.ensembl.org</host>
        <user>anonymous</user>
        <port>3306</port>
      </connection>
      <minimum_version>20</minimum_version>
    </database>
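The connection block carries everything a JDBC client needs. The sketch below reads the database declaration and builds a MySQL Connector/J style URL (jdbc:mysql://host:port/) together with a version check against minimum_version. The class name is hypothetical, and reading minimum_version as the lowest acceptable Ensembl release number is an assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical holder for a <database> declaration.
public class DatabaseConfig {

    public final String type;
    public final String connectionName;
    public final String host;
    public final int port;
    public final String user;
    public final int minimumVersion;

    private DatabaseConfig(String type, String connectionName, String host,
                           int port, String user, int minimumVersion) {
        this.type = type;
        this.connectionName = connectionName;
        this.host = host;
        this.port = port;
        this.user = user;
        this.minimumVersion = minimumVersion;
    }

    public static DatabaseConfig parse(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        // NodeList.item returns null when the element is absent.
        Element connection = (Element) root.getElementsByTagName("connection").item(0);
        if (connection == null) {
            throw new IllegalArgumentException("missing <connection>");
        }
        return new DatabaseConfig(
                text(root, "type"),
                text(connection, "name"),
                text(connection, "host"),
                Integer.parseInt(text(connection, "port")),
                text(connection, "user"),
                Integer.parseInt(text(root, "minimum_version")));
    }

    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("missing <" + tag + ">");
        }
        return nodes.item(0).getTextContent().trim();
    }

    // URL in the format used by MySQL Connector/J; no schema is selected.
    public String jdbcUrl() {
        return "jdbc:mysql://" + host + ":" + port + "/";
    }

    // Assumption: minimum_version is the lowest acceptable Ensembl release.
    public boolean supportsVersion(int release) {
        return release >= minimumVersion;
    }
}
```

Opening the connection would then be DriverManager.getConnection(config.jdbcUrl(), config.user, "") with a MySQL driver on the classpath; the sketch deliberately stops before any network access.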

28
Database support
  • Finding DNA_LOCATION
    public interface DNASequenceDatabase {

        public static final String DATA_TYPE = "DNA_LOCATION";

        /**
         * Get the DNA sequence for the genomic location.
         *
         * @param species    the species
         * @param version    the version
         * @param chromosome the chromosome number or contig
         * @param abs_start  absolute start (genomic coordinate)
         * @param abs_end    absolute end (genomic coordinate)
         * @param strand     "-1" or "1"
         * @return the DNA sequence
         * @throws DatabaseConnectionException if the database cannot be reached
         * @throws SequenceNotFoundException   if the sequence cannot be obtained
         */
        ...
    }
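To illustrate the contract described in the Javadoc, here is a hypothetical in-memory implementation of a similar lookup. The class name, the getSequence method name and the storage layout are assumptions, as is the use of 1-based inclusive coordinates (the convention Ensembl uses). IllegalArgumentException stands in for the slide's DatabaseConnectionException and SequenceNotFoundException, whose definitions are not shown. A strand of "-1" returns the reverse complement.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a DNA sequence database.
public class InMemorySequenceDatabase {

    // Key "species/version/chromosome" -> full chromosome sequence (upper case).
    private final Map<String, String> chromosomes = new HashMap<>();

    public void put(String species, String version, String chromosome, String sequence) {
        chromosomes.put(key(species, version, chromosome), sequence.toUpperCase());
    }

    // Returns bases absStart..absEnd (1-based, inclusive) from the given strand.
    public String getSequence(String species, String version, String chromosome,
                              int absStart, int absEnd, String strand) {
        String seq = chromosomes.get(key(species, version, chromosome));
        if (seq == null) {
            throw new IllegalArgumentException("unknown chromosome or contig: " + chromosome);
        }
        if (absStart < 1 || absEnd > seq.length() || absStart > absEnd) {
            throw new IllegalArgumentException("coordinates out of range");
        }
        String forward = seq.substring(absStart - 1, absEnd);
        switch (strand) {
            case "1":
                return forward;
            case "-1":
                return reverseComplement(forward);
            default:
                throw new IllegalArgumentException("strand must be \"1\" or \"-1\"");
        }
    }

    // Reverse complement; any base other than A, C, G, T becomes N in this sketch.
    public static String reverseComplement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (dna.charAt(i)) {
                case 'A': out.append('T'); break;
                case 'T': out.append('A'); break;
                case 'C': out.append('G'); break;
                case 'G': out.append('C'); break;
                default: out.append('N');
            }
        }
        return out.toString();
    }

    private static String key(String species, String version, String chromosome) {
        return species + "/" + version + "/" + chromosome;
    }
}
```

For example, after put("homo_sapiens", "20", "1", "ACGTTGCA"), positions 2 to 5 read CGTT on strand "1" and AACG on strand "-1".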

29
Client Interaction
30
Use cases of Chinook
  • Grid/Cluster computing.
  • Internally connect teams / individuals.
  • Collaborate with remote individuals.
  • Provide an API layer to your algorithms.
  • Insert bioinformatics analysis into applications.
  • Show off your tools.

31
Algorithms integrated into Chinook
ClustalW            Genscan
Conreal             Sim4
DIALIGN             MSCAN
LAGAN               ANN-Spec
Mauve               Recursive Gibbs Motif Sampler
ORCA                MEME
Shuffle-LAGAN       Motifsampler
T-Coffee            RSAT oligo analysis
Promoterwise        STUBB
Primer3             Teiresias
Eponine             wConsensus
ELPH                ContigMerger
32
Projects involving Chinook
  • OrthoSEQ plans to provide analysis through the
    Chinook/Bioperl Perl API.
  • Sockeye uses Chinook to deliver state-of-the-art
    alignment, PCR prediction, and regulatory
    analysis.
  • Pegasys plans to provide pipeline management for
    a subset of the services advertised by Chinook.
  • Bio-Linux is planning to integrate a subset of their
    algorithms.

33
Future Plans
34
Acknowledgements
  • GENOME SCIENCES CENTRE
      • Steve Jones
      • Tony Fu
      • Jun Guan
      • Keven Lin
      • Asim Siddiqui
      • Genereg team @ GSC
      • Mark Mayo
      • Bernard Li
  • CENTRE FOR MOLECULAR MEDICINE AND THERAPEUTICS
      • Wyeth Wasserman
      • Jonathan Lim
  • UBC BIOINFORMATICS CENTRE
      • Francis Ouellette
      • Graham McVicker
      • Sohrab Shah

Funding: MSFHR, Genome Canada
Visit http://smweb.bcgsc.bc.ca or http://www.bcgsc.bc.ca/chinook