Title: Peer-to-peer bioinformatics iCapture
1 Peer-to-peer bioinformatics
iCapture Lunch and Learn, November 2004
Stephen Montgomery
2 Searching for Regulatory Variation
3 GOAL
- To facilitate context-driven hypotheses on the function of non-coding polymorphisms
  - Utilizing diverse annotation
  - State-of-the-art regulatory analysis tools
  - Characterized regulatory regions and rSNPs
4 Utilizing diverse annotation
p53 alignment
SNP density
5 State-of-the-art regulatory analysis tools
- How do we get the best tools into our platform to help decode regulatory modules?
- Storing all the results of various predictions across a genome can be unwieldy.
- How do we facilitate access to the information encoded in the MANY bioinformatics tools?
6 How are bioinformatics tools developed?
Data and Innovation
Requirements
Biologists
Bioinformaticians
Computer Scientists
Innovation
Organization and Interpretation
7 Each community has its own culture
Biologists
Bioinformaticians
Computer Scientists
8 An interaction map of biologists and bioinformaticians
Bioinformaticians
Biologists
9 Things get more complicated
- Each individual has
  - Access to different resources
    - Computational / Monetary / Personnel
  - Finite time available
  - A different social network
  - Professional obligations
- Each group
  - Organizational boundaries
  - Toolkit (suites and scripts)
  - Method of providing tools (OS, Internet, Interfaces)
10 An improved interaction map of biologists and bioinformaticians
Bioinformaticians
Biologists
Improved Communication
Access to communities
Access to resources
Retain sub-organization
11 A community-based approach to bioinformatics analysis
- Use the principles of peer-to-peer technology
  - Allow biologists to easily discover and run bioinformatics tools
  - Create a dynamic, reliable network for analysis
  - Reduce overlapping integration efforts
  - Improve communication within/outside organizations
- Address problems relevant to bioinformatics
  - Attribution
  - Resource distribution
  - Specialized data
12 Discover and run jobs here or through BioPerl
14 Chinook Architecture
16 Architecture Outline
- Chinook Configuration
- Chinook Data
17 Chinook Configuration
- Add new services
- Set Server Mode
- Open ports
- Customize Advertisements
- Run
18 Adding services (XML-based)
19 Adding services (GUI-based)
20 Chinook Server Modes: RMI vs. Web services
- Web Services
  - Runs over Tomcat and Axis (SOAP engine)
  - Harder to set up, more management features
  - Language independent
- RMI
  - Uses the RMI registry (included in JDK/JRE)
  - Easier to set up
  - Only allows Java-to-Java communication
21 How the P2P works: JXTA Advertisements
22 Customizing your advertisements
- Edit the advertisements/ directory in your Chinook installation to include your endpoint location.
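The role of an advertisement that carries an endpoint location can be illustrated with a toy model. This is only a sketch of the idea and does not use the JXTA API: the class names `ServiceAdvertisement` and `AdvertisementBoard` and the example.org endpoints are hypothetical, and the real advertisement format in Chinook may differ.

```java
import java.util.ArrayList;
import java.util.List;

class AdvertisementDemo {

    /** Hypothetical advertisement: a service name plus the endpoint where it can be reached. */
    static final class ServiceAdvertisement {
        final String serviceName;
        final String endpoint;

        ServiceAdvertisement(String serviceName, String endpoint) {
            this.serviceName = serviceName;
            this.endpoint = endpoint;
        }
    }

    /** Toy stand-in for a peer group: peers publish advertisements, clients search them. */
    static final class AdvertisementBoard {
        private final List<ServiceAdvertisement> published = new ArrayList<>();

        void publish(ServiceAdvertisement ad) {
            published.add(ad);
        }

        /** Returns the endpoint of every peer advertising the given service (case-insensitive). */
        List<String> discover(String serviceName) {
            List<String> endpoints = new ArrayList<>();
            for (ServiceAdvertisement ad : published) {
                if (ad.serviceName.equalsIgnoreCase(serviceName)) {
                    endpoints.add(ad.endpoint);
                }
            }
            return endpoints;
        }
    }

    public static void main(String[] args) {
        AdvertisementBoard board = new AdvertisementBoard();
        // Endpoints are placeholders, not real Chinook servers.
        board.publish(new ServiceAdvertisement("ClustalW", "http://peer-a.example.org:8080/chinook"));
        board.publish(new ServiceAdvertisement("MEME", "http://peer-b.example.org:8080/chinook"));
        board.publish(new ServiceAdvertisement("ClustalW", "http://peer-c.example.org:8080/chinook"));
        System.out.println(board.discover("clustalw"));
    }
}
```

A client that discovers more than one endpoint for the same service can choose among them, which is one way a network of peers can spread load or survive a peer going offline.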
23 Running Chinook
- Use Ant
  - ant p2p-start
  - ant server-start
- From an IDE
- From an Installer
24 Chinook Data
Adding new data objects to services
Defining Data objects
Providing Database support
Client Interaction
25 Chinook Data / Databases
- Each application defines its own data objects

<data_entry_set>
  <name>dna_sequence</name>
  <maximum_count>2</maximum_count>
  <minimum_count>2</minimum_count>
  <data_entry_type_name>DNA_LOCATION</data_entry_type_name>
  <data_entry_type_name>DNA_FILE</data_entry_type_name>
  <set_output_class_name>ca.bcgsc.chinook.parsing.setoutput.impl.DataEntrySetOutputterImpl</set_output_class_name>
</data_entry_set>
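To make the data entry set definition more concrete, here is a minimal sketch of how such an element could be read with the JDK's DOM parser. The `DataEntrySet` holder class and the interpretation of `minimum_count` and `maximum_count` as bounds on the number of supplied entries are assumptions for illustration; this is not Chinook's own parsing code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DataEntrySetDemo {

    static final String SAMPLE_XML =
        "<data_entry_set>"
        + "<name>dna_sequence</name>"
        + "<maximum_count>2</maximum_count>"
        + "<minimum_count>2</minimum_count>"
        + "<data_entry_type_name>DNA_LOCATION</data_entry_type_name>"
        + "<data_entry_type_name>DNA_FILE</data_entry_type_name>"
        + "</data_entry_set>";

    /** Hypothetical in-memory view of one data_entry_set element. */
    static final class DataEntrySet {
        String name;
        int minimumCount;
        int maximumCount;
        final List<String> typeNames = new ArrayList<>();

        /** Assumed semantics: the number of supplied entries must lie within the two counts. */
        boolean acceptsCount(int suppliedEntries) {
            return suppliedEntries >= minimumCount && suppliedEntries <= maximumCount;
        }
    }

    /** Parses the XML text into a DataEntrySet, wrapping any parser failure in an unchecked exception. */
    static DataEntrySet parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            DataEntrySet set = new DataEntrySet();
            set.name = text(root, "name");
            set.minimumCount = Integer.parseInt(text(root, "minimum_count"));
            set.maximumCount = Integer.parseInt(text(root, "maximum_count"));
            NodeList types = root.getElementsByTagName("data_entry_type_name");
            for (int i = 0; i < types.getLength(); i++) {
                set.typeNames.add(types.item(i).getTextContent().trim());
            }
            return set;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse data_entry_set", e);
        }
    }

    /** Returns the trimmed text of the first descendant element with the given tag name. */
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        DataEntrySet set = parse(SAMPLE_XML);
        System.out.println(set.name + " accepts types " + set.typeNames
            + "; two entries allowed: " + set.acceptsCount(2));
    }
}
```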
26 Data Interpretation

<data_entry>
  <data_entry_type_name>DNA_LOCATION</data_entry_type_name>
  <data_entry_class_name>ca.bcgsc.chinook.common.dataentry.objects.impl.SequenceCoordinate</data_entry_class_name>
  <data_entry_validation_class_name>ca.bcgsc.chinook.common.dataentry.validation.impl.SequenceCoordinateValidation</data_entry_validation_class_name>
  <data_entry_gui_support_class_name>ca.bcgsc.chinook.common.dataentry.guisupport.impl.SequenceCoordinateGUISupport</data_entry_gui_support_class_name>
  <data_entry_loader_class_name>ca.bcgsc.chinook.common.dataentry.loader.impl.SequenceCoordinateLoader</data_entry_loader_class_name>
</data_entry>
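Because the configuration refers to implementation classes by name, a plausible reading is that they are instantiated by reflection at run time. The sketch below shows that general technique under stated assumptions: the `EntryValidation` interface and the `SimpleCoordinateValidation` class are hypothetical stand-ins, since the real Chinook interfaces are not shown in the slides.

```java
class ReflectiveLoadingDemo {

    /** Hypothetical validation contract; Chinook's real interface is not shown in the source. */
    interface EntryValidation {
        boolean isValid(String value);
    }

    /** Stand-in validator: accepts "chromosome:start-end" where start is not greater than end. */
    public static final class SimpleCoordinateValidation implements EntryValidation {
        public SimpleCoordinateValidation() { }

        @Override
        public boolean isValid(String value) {
            if (!value.matches("\\w+:\\d{1,18}-\\d{1,18}")) {
                return false;
            }
            String[] range = value.substring(value.indexOf(':') + 1).split("-");
            return Long.parseLong(range[0]) <= Long.parseLong(range[1]);
        }
    }

    /** Instantiates the class named in a configuration entry through its no-argument constructor. */
    static EntryValidation load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (EntryValidation) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // In a real deployment the name would come from the XML configuration.
        String configured = SimpleCoordinateValidation.class.getName();
        EntryValidation validation = load(configured);
        System.out.println(validation.isValid("chr17:7571720-7590868"));
    }
}
```

Naming classes in configuration keeps the server core independent of any one data type: supporting a new type means adding classes and an XML entry rather than changing the core.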
27 Database support
- Avoiding empty text boxes

<database>
  <type>ENSEMBL</type>
  <parsing_class>ca.bcgsc.chinook.server.database.impl.EnsemblDatabaseType</parsing_class>
  <connection>
    <name>dbx440</name>
    <host>ensembldb.ensembl.org</host>
    <user>anonymous</user>
    <port>3306</port>
  </connection>
  <minimum_version>20</minimum_version>
</database>
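As a rough illustration of how these connection settings might be used, the sketch below holds them in a small value class. Everything beyond the values themselves is an assumption: the `DatabaseConfig` class is hypothetical, `minimum_version` is read as the oldest supported release, the connection is assumed to be MySQL (suggested by port 3306), and `example_db` is a placeholder database name.

```java
class DatabaseConfigDemo {

    /** Hypothetical holder for the values found in a database configuration element. */
    static final class DatabaseConfig {
        final String type;
        final String host;
        final String user;
        final int port;
        final int minimumVersion;

        DatabaseConfig(String type, String host, String user, int port, int minimumVersion) {
            this.type = type;
            this.host = host;
            this.user = user;
            this.port = port;
            this.minimumVersion = minimumVersion;
        }

        /** Assumed meaning of minimum_version: releases older than this are not supported. */
        boolean supportsVersion(int version) {
            return version >= minimumVersion;
        }

        /** Builds a MySQL-style JDBC URL for the given database name. */
        String jdbcUrl(String databaseName) {
            return "jdbc:mysql://" + host + ":" + port + "/" + databaseName;
        }
    }

    public static void main(String[] args) {
        DatabaseConfig config =
            new DatabaseConfig("ENSEMBL", "ensembldb.ensembl.org", "anonymous", 3306, 20);
        // "example_db" is a placeholder, not a real Ensembl database name.
        System.out.println(config.jdbcUrl("example_db"));
        System.out.println("Supports version 19: " + config.supportsVersion(19));
    }
}
```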
28 Database support
- Finding DNA_LOCATION

public interface DNASequenceDatabase {
    public static final String DATA_TYPE = "DNA_LOCATION";

    /**
     * Get the DNA sequence for the genomic location
     * @param species the species
     * @param version the version
     * @param chromosome the chr number or contig
     * @param abs_start absolute start - genomic coordinate
     * @param abs_end absolute end - genomic coordinate
     * @param strand "-1" or "1"
     * @return the DNA sequence
     * @throws DatabaseConnectionException if cannot connect to database
     * @throws SequenceNotFoundException if cannot obtain sequence
     */
}
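The documented parameters suggest a lookup by species, version, chromosome, coordinates, and strand. The following is a minimal sketch of such a lookup, not the Chinook API: the interface name `SequenceSource`, the method name `getSequence`, the parameter types, the 1-based inclusive coordinates, and the in-memory implementation are all assumptions, and the exception is a local unchecked stand-in.

```java
import java.util.HashMap;
import java.util.Map;

class SequenceLookupDemo {

    /** Local stand-in for the exception named in the comment; unchecked to keep the sketch short. */
    static class SequenceNotFoundException extends RuntimeException {
        SequenceNotFoundException(String message) {
            super(message);
        }
    }

    /** Hypothetical declaration; only the parameter list follows the documented comment. */
    interface SequenceSource {
        String getSequence(String species, String version, String chromosome,
                           int absStart, int absEnd, String strand);
    }

    /** Toy implementation backed by a map instead of a real database. */
    static final class InMemorySequenceSource implements SequenceSource {
        private final Map<String, String> chromosomes = new HashMap<>();

        void addChromosome(String species, String version, String chromosome, String dna) {
            chromosomes.put(species + "/" + version + "/" + chromosome, dna);
        }

        @Override
        public String getSequence(String species, String version, String chromosome,
                                  int absStart, int absEnd, String strand) {
            String dna = chromosomes.get(species + "/" + version + "/" + chromosome);
            if (dna == null || absStart < 1 || absStart > absEnd || absEnd > dna.length()) {
                throw new SequenceNotFoundException(
                    "No sequence for " + chromosome + ":" + absStart + "-" + absEnd);
            }
            // Assumption: coordinates are 1-based and inclusive.
            String forward = dna.substring(absStart - 1, absEnd);
            return "-1".equals(strand) ? reverseComplement(forward) : forward;
        }
    }

    /** Reverse complement of a DNA string; characters other than A, C, G, T become N. */
    static String reverseComplement(String dna) {
        StringBuilder result = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (dna.charAt(i)) {
                case 'A': result.append('T'); break;
                case 'C': result.append('G'); break;
                case 'G': result.append('C'); break;
                case 'T': result.append('A'); break;
                default: result.append('N'); break;
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        InMemorySequenceSource db = new InMemorySequenceSource();
        db.addChromosome("example_species", "20", "1", "ACGTTGCA");
        System.out.println(db.getSequence("example_species", "20", "1", 2, 5, "1"));
        System.out.println(db.getSequence("example_species", "20", "1", 2, 5, "-1"));
    }
}
```

Hiding the data source behind an interface like this lets a client supply a genomic location instead of pasting sequence text, while the server decides which database answers the request.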
29 Client Interaction
30 Use cases of Chinook
- Grid/Cluster computing.
- Internally connect teams / individuals.
- Collaborate with remote individuals.
- Provide an API layer to your algorithms.
- Insert bioinformatics analysis into applications.
- Show off your tools.
31 Algorithms integrated into Chinook
ClustalW         Genscan
Conreal          Sim4
DIALIGN          MSCAN
LAGAN            ANN-Spec
Mauve            Recursive Gibbs Motif Sampler
ORCA             MEME
Shuffle-LAGAN    Motifsampler
T-Coffee         RSAT oligo analysis
Promoterwise     STUBB
Primer3          Teiresias
Eponine          wConsensus
ELPH             ContigMerger
32 Projects involving Chinook
- OrthoSEQ plans to provide analysis through the Chinook/BioPerl Perl API.
- Sockeye uses Chinook to deliver state-of-the-art alignment, PCR prediction, and regulatory analysis.
- Pegasys plans to provide pipeline management to a subset of services advertised by Chinook.
- Bio-Linux is planning to integrate a subset of their algorithms.
33 Future Plans
34 Acknowledgements
- GENOME SCIENCES CENTRE
  - Steve Jones
  - Tony Fu
  - Jun Guan
  - Keven Lin
  - Asim Siddiqui
  - Genereg team @ GSC
  - Mark Mayo
  - Bernard Li
- CENTRE FOR MOLECULAR MEDICINE AND THERAPEUTICS
  - Wyeth Wasserman
  - Jonathan Lim
- UBC BIOINFORMATICS CENTRE
  - Francis Ouellette
  - Graham McVicker
  - Sohrab Shah
Funding: MSFHR, Genome Canada
Visit http://smweb.bcgsc.bc.ca or http://www.bcgsc.bc.ca/chinook