Title: Chinook
Chinook: Peer-to-peer Bioinformatics Services
Montgomery SB, Fu T, Siddiqui AS, Jones SJM
1. Introduction
Bioinformatics techniques are used to identify
complex, recurring relationships in genetic
data. Genome sequencing projects and
high-throughput expression analyses have
contributed large amounts of data, both
complicating analysis and demanding higher-level
coordination of computational resources.
Furthermore, the variety of available
bioinformatics tools and algorithms, and their
diverse modes of usage, create a situation where
most users have trouble discerning where to
invest their time and resources. Chinook
resolves these issues by creating a virtual
network for bioinformatics analyses. A user is
able to dynamically resolve available
bioinformatics services (algorithms) over the
Internet or their local network. The user can
then validate a server's authenticity and submit
bioinformatics analyses to peers that publish
their ability to perform desired services.
Information such as bandwidth, jobs in queue and
the location of Chinook services is reported to
clients to aid in their job submission process.
A user is also able to visit the service
creator's website to identify what the particular
service does. Chinook allows a service provider
to create a new service by simply editing an XML
file: as long as the new service has a standard
output format, no additional programming is
required. The Chinook server runs over the JXTA
peer-to-peer network either in Java Remote Method
Invocation (RMI) mode or through Apache Axis web
services. Chinook has been successfully
integrated into the Sockeye Comparative Genome
Browser (Montgomery et al.). Chinook creates a
virtual community where researchers can rapidly
home in on applications of interest and run them
across multiple service providers, while shifting
the responsibility for application maintenance
from the client to the developer. The Chinook
virtual network will be used to facilitate
high-throughput gene regulatory analyses and the
dynamic multiple alignment of co-expressed and
orthologous genes.
JXTA: JXTA technology is a set of open protocols
that allow any connected device on the network,
ranging from cell phones and wireless PDAs to PCs
and servers, to communicate and collaborate in a
P2P manner.
Figure 4. Once a service has been selected, the
user is able to enter the required parameters and
sequences needed to run the job. If the service
allows it, new sequences can be added, removed or
validated by right-clicking on the sequence
table. Also, string-based parameters allow
regular expression validation: valid parameters
appear in green while invalid parameters appear
in red. Further functionality allows sequences
to be loaded from files or from EnsEMBL.
EnsEMBL-related parameters appear as pull-down
lists to ease parameter entry.
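
The regular expression validation described in
the Figure 4 caption can be illustrated with a
minimal Python sketch. The parameter names and
patterns below are hypothetical and are not taken
from Chinook; the sketch only shows the idea of
mapping a full-string match to the green or red
state mentioned in the caption.

```python
import re

# Hypothetical parameter specifications: name -> regular expression.
# These names and patterns are illustrative, not taken from Chinook.
PARAMETER_PATTERNS = {
    "gap_open_penalty": r"-?\d+(\.\d+)?",
    "output_format": r"(condensed|detailed)",
    "sequence_id": r"[A-Za-z0-9_.]+",
}


def validate_parameter(name: str, value: str) -> str:
    """Return "green" if value fully matches the pattern for name, else "red"."""
    pattern = PARAMETER_PATTERNS.get(name)
    if pattern is None:
        # Assumption of this sketch: unknown parameters count as invalid.
        return "red"
    return "green" if re.fullmatch(pattern, value) else "red"


def validate_all(values: dict) -> dict:
    """Validate every entered parameter and return name -> colour."""
    return {name: validate_parameter(name, value) for name, value in values.items()}


if __name__ == "__main__":
    entered = {"gap_open_penalty": "-12.5", "output_format": "verbose"}
    for name, colour in validate_all(entered).items():
        print(f"{name}: {colour}")
```

Using a full-string match rather than a search
means that a value with trailing junk is flagged
as invalid instead of being accepted on a partial
match.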
2. The Chinook Network
The Chinook network is established using the
JXTA protocol. On startup, each user or service
provider attempts to register with the Chinook
peergroup. The discovery of Chinook peers across
the Internet is facilitated by a special type of
peer called a rendezvous peer; rendezvous peers
are responsible for keeping a list of previously
discovered peers. Finding other members on the
Chinook network can be facilitated by static
rendezvous peers or through dynamic discovery
mechanisms. By using these discovery mechanisms,
service providers are able to launch Chinook
servers within the confines of a few computers, a
large-scale grid network, or the Internet.
Sockeye: A Java-based application that uses 3D
graphics technology to facilitate the
visualization of annotation and conservation
across multiple sequences. This software uses
the EnsEMBL database project to import sequence
and annotation information from several
eukaryotic species. For more information visit
http://www.bcgsc.bc.ca/sockeye
Figure 5. Once the parameters have been
submitted, the job is run by the service
provider. A user is able to watch the process
and monitor the execution time in the jobs table.
When the job returns successfully, the user can
right-click to view the report. Reports can be
viewed in condensed or detailed form, depending
on the initial input parameters.
4. The Chinook Server
The Chinook server is a highly configurable
application that is able to publish services
through the JXTA network using either the
Remote Method Invocation (RMI) protocol or Apache
Axis's implementation of the Web Services
protocol. Running a Chinook server requires
little effort: start-up takes no more than two
commands.
Furthermore, every service is defined in XML,
which makes it easy to add new services.
Below is an example of an XML service definition
that runs MLAGAN. Future versions of the server
will support application-specific metadata and
potentially allow biological data in input
parameter sets to be supplied as an LSID, a
generic object, or both.
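
To make the idea of an XML-defined service more
concrete, here is a minimal Python sketch. It is
not the Chinook schema and not the MLAGAN
definition from the poster: the element names,
attributes, executable path and flags are all
hypothetical. The sketch only shows how a
declarative definition could be turned into a
command line without extra programming.

```python
import xml.etree.ElementTree as ET

# Hypothetical service definition. The element names, attributes, path and
# flags are invented for illustration and are not the Chinook format.
SERVICE_XML = """
<service name="ExampleAligner" type="Alignment">
  <executable>/opt/example/aligner</executable>
  <parameter name="gap_open" flag="--gap-open" required="true"/>
  <parameter name="format" flag="--format" required="false" default="condensed"/>
</service>
"""


def load_service(xml_text):
    """Parse a service definition into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    parameters = []
    for node in root.findall("parameter"):
        parameters.append({
            "name": node.get("name"),
            "flag": node.get("flag"),
            "required": node.get("required") == "true",
            "default": node.get("default"),
        })
    return {
        "name": root.get("name"),
        "type": root.get("type"),
        "executable": root.findtext("executable"),
        "parameters": parameters,
    }


def build_command(service, values, sequence_files):
    """Build an argument list from the definition and user-supplied values."""
    command = [service["executable"], *sequence_files]
    for parameter in service["parameters"]:
        value = values.get(parameter["name"], parameter["default"])
        if value is None:
            if parameter["required"]:
                raise ValueError(f"missing required parameter: {parameter['name']}")
            continue  # optional parameter without a default is skipped
        command += [parameter["flag"], str(value)]
    return command


if __name__ == "__main__":
    service = load_service(SERVICE_XML)
    print(build_command(service, {"gap_open": "10"}, ["a.fa", "b.fa"]))
```

In a design like this, adding a service means
adding another definition file, while the parsing
and command-building code stays unchanged.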
Figure 1. Each Chinook peer is aware of all the
peers in the network. A peer started within a
local network behind a firewall requires a proxy
to get the peer lists held in rendezvous peers on
the Internet. If a peer is unable to find a
Chinook rendezvous peer on start-up, that peer
will become a new rendezvous peer.
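
The start-up rule in the Figure 1 caption can be
sketched as a small in-memory model in Python.
This is a toy simulation, not JXTA code: the Peer
class and its methods are hypothetical, and
firewalls and proxies are not modelled. It only
shows a peer registering with a rendezvous peer
it can reach, or becoming one when none is found.

```python
class Peer:
    """Toy model of a peer; not a JXTA API."""

    def __init__(self, name):
        self.name = name
        self.is_rendezvous = False
        self.rendezvous = None
        self._registry = set()  # peer list, only filled on a rendezvous peer

    def join(self, reachable_peers):
        """Register with a reachable rendezvous peer, or become one."""
        found = next((p for p in reachable_peers if p.is_rendezvous), None)
        if found is None:
            # No rendezvous peer found on start-up: take over that role.
            self.is_rendezvous = True
            found = self
        self.rendezvous = found
        found._registry.add(self.name)

    def peer_list(self):
        """Return the peer list held by this peer's rendezvous peer."""
        return sorted(self.rendezvous._registry)


if __name__ == "__main__":
    first = Peer("server-1")
    first.join([])            # nothing reachable, becomes a rendezvous peer
    second = Peer("client-1")
    second.join([first])      # registers with the existing rendezvous peer
    print(first.is_rendezvous, second.is_rendezvous)
    print(second.peer_list())
```

Because every peer reads the list from its
rendezvous peer, peers that joined earlier also
see peers that joined later.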
3. The Chinook Client
The Chinook client has been designed to run
standalone or as part of another application,
such as Sockeye (Montgomery et al.). On starting,
the Chinook client connects to the Chinook
peergroup and begins polling for service
advertisements. These advertisements contain
information about the location of Chinook servers
and what services they provide.
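
As an illustration of how a client might use
discovered advertisements, here is a minimal
Python sketch. The record fields and the ranking
rule (fewest queued jobs first, then highest
bandwidth) are assumptions made for this example;
the poster states only that location, jobs in
queue and bandwidth are reported to clients, not
how the Chinook client orders providers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceAdvertisement:
    """Hypothetical advertisement record; field names are illustrative."""
    service: str
    provider: str
    location: str
    jobs_in_queue: int
    bandwidth_kbps: float


def providers_for(advertisements, service):
    """Return advertisements for a service, least loaded and fastest first."""
    matches = [ad for ad in advertisements if ad.service.lower() == service.lower()]
    # Assumed ranking: fewer queued jobs wins, ties broken by higher bandwidth.
    return sorted(matches, key=lambda ad: (ad.jobs_in_queue, -ad.bandwidth_kbps))


if __name__ == "__main__":
    discovered = [
        ServiceAdvertisement("LAGAN", "Provider A", "host-a.example.org", 3, 500.0),
        ServiceAdvertisement("LAGAN", "Provider B", "host-b.example.org", 1, 200.0),
        ServiceAdvertisement("CLUSTALW", "Provider C", "host-c.example.org", 0, 900.0),
    ]
    for ad in providers_for(discovered, "LAGAN"):
        print(ad.provider, ad.jobs_in_queue, ad.bandwidth_kbps)
```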
Existing Chinook Services (Oct. 2003)
Chinook currently supports
1) LAGAN
2) MLAGAN
3) CLUSTALW
4) WATER (Emboss Smith-Waterman)
Chinook Services in Progress
1) RepeatMasker
2) SLAGAN
3) CMSGSC Primer Prediction (McKay, S)
4) Gibbs Sampler
5) TFFIND
Figure 6. The application specification of a
Chinook service in XML. A service provider is
also able to set system-wide parameters in XML or
define new services using this format. For
detailed instructions on creating new services,
see http://smweb.bcgsc.bc.ca
5. Applications to Genetics, Sockeye
Figure 2. The starting window of the Chinook
client. Here, services have been discovered
publishing their ability to perform LAGAN and
Clustalw alignments. A user is able to monitor
how many jobs the service provider is running
and the average bandwidth for communication
between the client and the service provider.
Furthermore, the client provides
information on who the Chinook service provider
is and who published the original software.
Chinook is a unique collaborative tool for
performing complicated bioinformatics analyses.
My goal is to encapsulate, within Chinook,
regulation prediction algorithms together with
tools for analyzing association studies for
functional variants. By being able to analyze and
correlate predicted regulatory regions with new
information from association studies and
disequilibrium mapping, Chinook will be capable
of performing analyses which provide new
insights into the mechanisms of disease
susceptibility, while making a large number of
these tools easily available to researchers.
This step is feasible, as Chinook has
already demonstrated its adaptability in
performing comparative genomics analyses in the
Sockeye application.
Figure 7. A multiple alignment using Human,
Mouse, Rat and Fugu of the Huntington's disease
protein generated by (m)LAGAN and Clustalw and
displayed in Sockeye. Sockeye generates this
alignment dynamically by allowing a user to
select regions of various genomes to align.
These regions are sent to Chinook, where
alignments are generated and condensed reports
are returned. The condensed reports contain the
gaps which are displayed here. It is interesting
to note that the Clustalw alignment does not
properly align the first exon here.
Figure 3. By clicking on the publisher name, the
Chinook client launches a lightweight Internet
browser. This allows a user to browse
information about who published the original
software that Chinook now offers as a service.
Furthermore, it allows service providers to
give credit to the original developers.
References and Acknowledgements
Thanks to Genome Canada and all the dedicated
people at the CMSGSC, to whom I am extremely
grateful.
Brudno, M et al. (2003). LAGAN and Multi-LAGAN:
efficient tools for large-scale multiple
alignment of genomic DNA. Genome Res 13: 721-731.
Chenna, R et al. (2003). Multiple sequence
alignment with the Clustal series of programs.
Nucleic Acids Res 31: 3497-3500.
Montgomery, SB et al. (2003). Sockeye: A 3D
Environment for Comparative Genomics (submitted
to Genome Research).
Once the user of the Chinook client finds a
service of interest, they are able to run it by
right-clicking on it. Currently, though, Chinook
does not ontologically classify services beyond
providing a type. A version is currently being
developed that allows users to home in on
services which they know and understand, for
example, Alignment, Primer prediction, or
Motif-scanning.