Microarrays data analysis using Java visualization

1
Microarrays data analysis using Java visualization

By Rui Ding
University of Minnesota, Morris

2
Background

Cancer classification is a very important
research area.
Class prediction and class discovery are the goals
of cancer classification.
Gene expression data from DNA microarrays are used
to do cancer classification.
Two commonly used microarrays.
Statistical methods used in microarray data.
Visualization techniques used in microarray data.
Our dataset is human acute leukemia data, with
7128 genes for each case.

3
Background: introduction to visualization technique

The term visualization refers to an approach to
transforming scientific data into understandable
figures or graphs
At present, visualization has been widely used in
many different areas
Web-based visualization.
Why do we use visualization?
Java and Java visualization

4
Background: our objective

Perform a Bayesian variable selection
Perform a principal component analysis
Use Java visualization technique to visualize
gene data for classification
Create the Java visualization website

5
Bayesian probit model with variable selection

We model the expression data using a binary
probit model.
Bayesian approaches to statistical inference
impose prior distributions on the unknown
parameters.
We use Bayesian variable selection to identify
important genes. One Bayesian variable selection
method is stochastic search variable selection
(SSVS), which uses mixture priors for the
regression coefficients to perform the selection.
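As a reading aid, the two ingredients named on this slide can be written out. This is a sketch under assumptions: the slides do not give the exact priors, so the normal mixture below is the commonly used SSVS form, and the symbols $\gamma_j$, $\tau_j$, $c_j$ and $p_j$ are notation introduced here rather than taken from the presentation.

```latex
% Binary probit model: y_i is the class label, x_i the gene expression vector,
% \Phi the standard normal cumulative distribution function.
P(y_i = 1 \mid x_i, \beta) = \Phi\left(x_i^{\top} \beta\right)

% SSVS mixture prior on each regression coefficient (assumed form):
% \gamma_j = 1 means gene j is selected, \gamma_j = 0 means it is effectively excluded.
\beta_j \mid \gamma_j \sim (1 - \gamma_j)\, N\left(0, \tau_j^{2}\right)
        + \gamma_j\, N\left(0, c_j^{2} \tau_j^{2}\right),
\qquad \gamma_j \sim \mathrm{Bernoulli}(p_j)
```

With a small $\tau_j$ and a large $c_j$, a coefficient drawn from the first component is close to zero, so the posterior probability that $\gamma_j = 1$ can be read as a measure of how important gene $j$ is.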

6
Principal Component Analysis

Principal Component Analysis (PCA) is a method
that reduces data dimensions.
The purpose of PCA in this research is to reduce
7128 gene variables in 72 different people to a
smaller number of principal components.

7
Principal Component Analysis: Principal Component

Linear combination of observed variables
$Y_1 = a_{11} X_1 + a_{12} X_2 + \dots + a_{1p} X_p$
Then, the ith principal component is
$Y_i = a_{i1} X_1 + a_{i2} X_2 + \dots + a_{ip} X_p$
We should decide how many components we need to
keep.
We could extract either two principal components
or three principal components.
The first component accounts for a maximal amount
of total variance in the observed variables.
The second component accounts for the second
largest amount of total variance in the observed
variables and is uncorrelated with the first
component.
The remaining components follow the same rule as
the second principal component.
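The linear combination above can be sketched in Java. This is a minimal illustration, not code from the presentation: the class and method names are hypothetical, and the input case is assumed to be centered already.

```java
// Illustrative sketch; class and method names are hypothetical.
final class ComponentScore {

    /** Returns Y_i = a_i1*X_1 + ... + a_ip*X_p for one case. */
    static double score(double[] coefficients, double[] x) {
        if (coefficients.length != x.length) {
            throw new IllegalArgumentException("coefficient and variable counts differ");
        }
        double y = 0.0;
        for (int j = 0; j < x.length; j++) {
            y += coefficients[j] * x[j];
        }
        return y;
    }

    /** Applies each row of coefficients (one row per component) to the same case. */
    static double[] scores(double[][] coefficientRows, double[] x) {
        double[] y = new double[coefficientRows.length];
        for (int i = 0; i < coefficientRows.length; i++) {
            y[i] = score(coefficientRows[i], x);
        }
        return y;
    }

    public static void main(String[] args) {
        // Two made-up, mutually orthogonal coefficient vectors for two variables.
        double[][] a = { {0.6, 0.8}, {-0.8, 0.6} };
        double[] x = {1.0, 2.0};
        double[] y = scores(a, x);
        System.out.println(y[0] + " " + y[1]);
    }
}
```

Keeping two or three rows of coefficients gives the two or three coordinates per case that a 2D or 3D plot needs.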

8
Principal Component: Covariance matrix and
Correlation matrix

There are two important matrices used in PCA:
the covariance matrix and the correlation matrix.
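The two matrices can be computed as follows. This is a small sketch with hypothetical names, assuming the data are stored with one row per case and one column per variable, and using the sample (n - 1) denominator.

```java
// Illustrative sketch; class and method names are hypothetical.
final class CovCorr {

    /** Sample covariance matrix of data[n][p] (rows = cases, columns = variables). */
    static double[][] covariance(double[][] data) {
        int n = data.length;
        int p = data[0].length;
        double[] mean = new double[p];
        for (double[] row : data) {
            for (int j = 0; j < p; j++) {
                mean[j] += row[j] / n;
            }
        }
        double[][] cov = new double[p][p];
        for (double[] row : data) {
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < p; k++) {
                    cov[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]) / (n - 1);
                }
            }
        }
        return cov;
    }

    /** Correlation matrix: each covariance divided by the two standard deviations. */
    static double[][] correlation(double[][] cov) {
        int p = cov.length;
        double[][] corr = new double[p][p];
        for (int j = 0; j < p; j++) {
            for (int k = 0; k < p; k++) {
                corr[j][k] = cov[j][k] / Math.sqrt(cov[j][j] * cov[k][k]);
            }
        }
        return corr;
    }

    public static void main(String[] args) {
        double[][] data = { {1, 2}, {2, 4}, {3, 6} }; // made-up values
        double[][] corr = correlation(covariance(data));
        System.out.println(corr[0][1]);
    }
}
```

The correlation matrix is the covariance matrix of the standardized variables, so choosing it gives every variable equal weight regardless of its measurement scale.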

9
Principal Component: Eigenvector and criterion in
solving PCA

Principal components and eigenvectors
Kaiser criterion (eigenvalue-one criterion)
Using a specified proportion of variance
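Both criteria for choosing the number of components can be sketched from a list of eigenvalues. The names below are hypothetical, the eigenvalues are assumed to be sorted in descending order, and the Kaiser rule is assumed to be applied to eigenvalues of a correlation matrix, where each variable contributes one unit of variance.

```java
// Illustrative sketch; class and method names are hypothetical.
final class ComponentCount {

    /** Kaiser (eigenvalue-one) criterion: keep components whose eigenvalue exceeds 1. */
    static int kaiser(double[] eigenvalues) {
        int count = 0;
        for (double ev : eigenvalues) {
            if (ev > 1.0) {
                count++;
            }
        }
        return count;
    }

    /**
     * Smallest number of leading components whose eigenvalues add up to at least
     * the target proportion of total variance. Assumes descending order.
     */
    static int byProportion(double[] eigenvalues, double target) {
        double total = 0.0;
        for (double ev : eigenvalues) {
            total += ev;
        }
        double running = 0.0;
        for (int k = 0; k < eigenvalues.length; k++) {
            running += eigenvalues[k];
            if (running / total >= target) {
                return k + 1;
            }
        }
        return eigenvalues.length;
    }

    public static void main(String[] args) {
        double[] eigenvalues = {2.5, 1.2, 0.2, 0.1}; // made-up values
        System.out.println(kaiser(eigenvalues));
        System.out.println(byProportion(eigenvalues, 0.9));
    }
}
```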

10
Principal Component Analysis: Implementation of
human leukemia data

Extract three coordinates of a 3D plot for Java
visualization from the Human Acute Leukemia data
set: 7128 gene expressions (test dataset and
training dataset) in three classes of leukemia,
38 cases of B-cell acute lymphoblastic leukemia
(ALL), 9 cases of T-cell ALL and 25 cases of acute
myeloid leukemia (AML).
Gene expression levels were measured using
Affymetrix high-density oligonucleotide arrays.
We obtained three principal components for the 72
cases. Each is a linear combination of the 7128
gene expression variables, with coefficients equal
to the components of an eigenvector of the
covariance matrix.
All computations for principal component analysis
were performed on a personal computer using
SAS Analyst (version 9.2).

11
Java visualization: Java 3D

Java 3D was developed by Sun Microsystems. It is a
convenient application programming interface (API)
for writing three-dimensional graphics applications
and applets.
High performance in Java 3D
Application of Java 3D

12
Java visualization: Scene organization

3D Universe: At the top of the tree structure in
Java 3D, it controls the real-time rendering of
the whole scene.
Scene Root: It stores all scene data and does a
real-time rendering.
View Port: A class of the universe. It is used to
control the location of the view, set the
projection model, set and display a substitution,
etc. We use the orthogonal projection model to
transform from 3D to 2D.
Objects: In our system, we have two types of
objects: atoms, represented as spheres, and
coordinate axes, represented as cylinders.
Lighting: This category sets up
ambient/directional light in this system.
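The orthogonal projection mentioned under View Port can be illustrated without the Java 3D library. The sketch below is plain Java with hypothetical names: it rotates a point about the vertical axis and then drops the depth coordinate, which is the essence of a parallel projection. In Java 3D itself this is normally requested through the view's projection policy (`View.PARALLEL_PROJECTION`); the slides do not show that code.

```java
// Illustrative sketch; class and method names are hypothetical.
final class OrthoProjection {

    /**
     * Rotates the point (x, y, z) about the y axis by the given angle, then
     * projects it onto the view plane by discarding the rotated z value.
     * Returns the 2D screen-plane coordinates {x', y'}.
     */
    static double[] project(double[] point, double yawRadians) {
        double c = Math.cos(yawRadians);
        double s = Math.sin(yawRadians);
        double xRotated = c * point[0] + s * point[2];
        double yRotated = point[1]; // rotation about y leaves y unchanged
        // The rotated depth (-s * x + c * z) is not needed for a parallel projection.
        return new double[] {xRotated, yRotated};
    }

    public static void main(String[] args) {
        double[] p = {1.0, 2.0, 3.0}; // made-up point
        double[] flat = project(p, 0.0);
        System.out.println(flat[0] + " " + flat[1]);
    }
}
```

Because depth is discarded rather than used for scaling, points keep the same apparent spacing no matter how far they are from the viewer, which suits a scatter plot of principal component scores.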

13
System overview

(1) control module
(2) data module
(3) visualization module

14
Visualization Module

The role of the visualization module is to
visualize data in a web page.
We use an applet in our implementation.
Two-dimensional raw data.
Three-dimensional raw data.
The user is able to zoom in/out, translate, or
rotate the image in the display panel.

15
Control Module

Scheduling between web pages, data forwarding,
handling errors and so on.
Our system involves only four pages.
Two pages are static.
One is the data visualization page.
One is the error handling page.
The applet introduced in the last section displays
the figure by requesting the data from the
server.
This requires communication between the applet
and the server.
We use HTTP to communicate.
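A request of this kind could look roughly like the sketch below, which uses only the standard `java.net` and `java.io` classes. The class name, method names and the line-oriented response format are assumptions; the slides do not describe the actual request or the server side.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; class and method names are hypothetical.
final class DataFetcher {

    /** Reads the response body line by line (assumed to be UTF-8 text). */
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Sends an HTTP GET to the data URL and returns the response lines. */
    static List<String> fetch(URL dataUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) dataUrl.openConnection();
        conn.setRequestMethod("GET");
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                // A caller could route this case to the error handling page.
                throw new IOException("Unexpected HTTP status " + status);
            }
            return readLines(conn.getInputStream());
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a server response so the example runs without a network.
        byte[] body = "1 2 3\n4 5 6\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readLines(new ByteArrayInputStream(body)));
    }
}
```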

16
Data Module

The task of this data module is to parse the data
file. This system works with the existing leukemia
data files.
For atom structure data, we take into account the
higher-level data structure.
We use an object-oriented method to parse the file.
Each object parses its own part of the data
(self-analysis), so the code is easier to maintain.
We construct the following class structure:
(1) The ConfigModel class corresponds to the whole
point structure data file. This class uses the
configElemList property to save each
configuration file's information.
(2) configElem represents one piece of
configuration information. Its attributes specify
the code of this configuration section. pointNum
represents the number of points. Similarly, this
class uses an array (pointModelList) to store the
point model data of the current configuration.
(3) The PointModel class represents a point model.
We have one point model in this system. Therefore,
we have constructed this class as an abstract
class. It uses an array (pointList) to store the 3D
location data of each point.
(4) The Point class represents the x, y, z
coordinate in the space.
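The class structure described above might be laid out as in the following sketch. It rests on several assumptions: the file format (one whitespace-separated "x y z" triple per line) is invented for illustration, `PlainPointModel` is a hypothetical concrete subclass that the slides do not name, `configElem` is capitalized as `ConfigElem` to follow Java naming, and lists stand in for the arrays mentioned in the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** One x, y, z coordinate in space. */
final class Point {
    final double x, y, z;

    Point(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    /** Parses a line such as "1.5 -0.2 3.0" (assumed format). */
    static Point parse(String line) {
        String[] t = line.trim().split("\\s+");
        if (t.length != 3) {
            throw new IllegalArgumentException("Expected 3 numbers: " + line);
        }
        return new Point(Double.parseDouble(t[0]),
                         Double.parseDouble(t[1]),
                         Double.parseDouble(t[2]));
    }
}

/** A point model stores the 3D location of each of its points. */
abstract class PointModel {
    final List<Point> pointList = new ArrayList<>();

    /** Each model parses its own lines, skipping blank ones. */
    void parse(List<String> lines) {
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                pointList.add(Point.parse(line));
            }
        }
    }
}

/** Hypothetical concrete model; the slides do not name one. */
final class PlainPointModel extends PointModel { }

/** One configuration section, identified by its code. */
final class ConfigElem {
    final String code;
    final List<PointModel> pointModelList = new ArrayList<>();

    ConfigElem(String code) {
        this.code = code;
    }

    /** Number of points across the models of this configuration. */
    int pointNum() {
        int n = 0;
        for (PointModel m : pointModelList) {
            n += m.pointList.size();
        }
        return n;
    }
}

/** The whole point structure data file. */
final class ConfigModel {
    final List<ConfigElem> configElemList = new ArrayList<>();

    /** Builds a model with a single configuration from the given lines. */
    static ConfigModel parse(String code, List<String> lines) {
        PointModel model = new PlainPointModel();
        model.parse(lines);
        ConfigElem elem = new ConfigElem(code);
        elem.pointModelList.add(model);
        ConfigModel config = new ConfigModel();
        config.configElemList.add(elem);
        return config;
    }

    public static void main(String[] args) {
        ConfigModel m = parse("demo", Arrays.asList("1 2 3", "4.5 5 -6"));
        System.out.println(m.configElemList.get(0).pointNum());
    }
}
```

Because parsing is delegated downward (ConfigModel to PointModel to Point), a change in how one coordinate line is written only touches `Point.parse`.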

17
Result: Bayesian Probit Model

Using the Bayesian probit model with variable
selection, we selected 6 out of 7128 genes.
The probabilities of these six genes are: 962nd
gene 0.56, 2118th gene 0.63, 2402nd gene 1.00,
3433rd gene 1.00, 4993rd gene 0.86, 5603rd
gene 0.41.
These six genes did not help to classify two (AML
and ALL) or three groups (AML, B-cell, and
T-cell), but those genes can be important
biomarkers, which may be used to see how well the
body responds to a treatment for a disease or
condition.

18
Result: PCA

The 3D figure shows two groups, AML (blue) and
ALL (red).
PCA is a good statistical technique for gene
classification.

19
Conclusion and Future Work

We used two statistical methods to do microarray
data analysis.
Using the Bayesian probit model with variable
selection, we selected 6 genes out of 7128 genes
which can be important biomarkers.
We performed principal component analysis with
7128 genes.
The results of the PCA classify the two groups
(AML and ALL) much better than the Bayesian
probit model with variable selection.
Our future study will be how to improve the
classification by the Bayesian probit model.

20
Question