Title: Microarrays data analysis using Java visualization
1 Microarrays data analysis using Java visualization
- By Rui Ding
- University of Minnesota, Morris
2 Background
- Cancer classification is a very important research area.
- Class prediction and class discovery are the goals of cancer classification.
- Gene expression data from DNA microarrays are used to do cancer classification.
- Two commonly used microarrays.
- Statistical methods used in microarray data.
- Visualization techniques used in microarray data.
- Our dataset is human acute leukemia data, with 7128 genes for each case.
3 Background: introduction to visualization technique
- The term visualization refers to an approach to transforming scientific data into understandable figures or graphs.
- At present, visualization is widely used in many different areas.
- Web-based visualization.
- Why do we use visualization?
- Java and Java visualization.
4 Background: our objective
- Perform a Bayesian variable selection.
- Perform a principal component analysis.
- Use Java visualization techniques to visualize gene data for classification.
- Create the Java visualization website.
5 Bayesian probit model with variable selection
- We model the expression data using a binary probit model.
- Bayesian approaches to statistical inference impose prior distributions on the unknown parameters.
- We use Bayesian variable selection to identify important genes. One Bayesian variable selection method is stochastic search variable selection (SSVS), which uses mixture priors for the regression coefficients to perform the selection.
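As a rough sketch of the model described above, one common way to write a binary probit model with an SSVS mixture prior is shown below. The symbols $\gamma_j$, $\tau_j$, $c_j$, and $p_j$, and the specific normal-mixture form, are assumptions borrowed from the usual textbook SSVS formulation; the slides do not state which mixture prior was used.

```latex
\begin{aligned}
P(y_i = 1 \mid x_i) &= \Phi\!\left(x_i^{\top}\beta\right), \qquad i = 1, \dots, n, \\
\beta_j \mid \gamma_j &\sim (1-\gamma_j)\, N\!\left(0, \tau_j^{2}\right) + \gamma_j\, N\!\left(0, c_j^{2}\tau_j^{2}\right), \\
\gamma_j &\sim \mathrm{Bernoulli}(p_j), \qquad j = 1, \dots, p .
\end{aligned}
```

Here $y_i$ is the class label of case $i$, $x_i$ is its vector of gene expressions, and $\Phi$ is the standard normal distribution function. Under this form, gene $j$ is treated as important when the posterior probability of $\gamma_j = 1$ is high.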
6 Principal Component Analysis
- Principal Component Analysis (PCA) is a method that reduces data dimensions.
- The purpose of PCA in this research is to reduce the 7128 gene variables measured on 72 different people to a smaller number of principal components.
7 Principal Component Analysis: principal components
- A principal component is a linear combination of the observed variables.
- The first principal component is $Y_1 = a_{11}X_1 + a_{12}X_2 + \dots + a_{1p}X_p$
- Then, the ith principal component is
- $Y_i = a_{i1}X_1 + a_{i2}X_2 + \dots + a_{ip}X_p$
- We should decide how many components we need to keep.
- We could extract either two or three principal components.
- The first component accounts for a maximal amount of total variance in the observed variables.
- The second component accounts for the second-largest amount of total variance in the observed variables and is uncorrelated with the first component.
- The remaining components follow the same rule as the second principal component.
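The rule in the bullets above can be written as a constrained maximization. This is the standard textbook statement, given here as a sketch; the symbols $\Sigma$ (covariance matrix of $X$), $a_i$ (coefficient vector of the ith component), and $\lambda_i$ (ith largest eigenvalue) are notation assumed for illustration.

```latex
\begin{aligned}
a_1 &= \arg\max_{\|a\| = 1} \operatorname{Var}\!\left(a^{\top}X\right)
     = \arg\max_{\|a\| = 1} a^{\top}\Sigma\, a, \\
a_i &= \arg\max_{\|a\| = 1} a^{\top}\Sigma\, a
     \quad \text{subject to} \quad a^{\top}\Sigma\, a_k = 0 \ \text{ for all } k < i, \\
Y_i &= a_i^{\top}X, \qquad \operatorname{Var}(Y_i) = \lambda_i .
\end{aligned}
```

The solutions $a_1, a_2, \dots$ are the eigenvectors of $\Sigma$ ordered by decreasing eigenvalue, which is why the components are mutually uncorrelated.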
8 Principal Component: covariance matrix and correlation matrix
- Two important matrices are used in PCA: the covariance matrix and the correlation matrix.
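A minimal Java sketch of how the two matrices relate is given below. The class and method names are hypothetical, the data layout (rows are cases, columns are variables) is an assumption, and the numbers in `main` are made up for illustration.

```java
import java.util.Arrays;

final class CovCorSketch {

    /** Sample covariance matrix; rows of x are cases, columns are variables. */
    static double[][] covariance(double[][] x) {
        int n = x.length, p = x[0].length;
        double[] mean = new double[p];
        for (double[] row : x)
            for (int j = 0; j < p; j++) mean[j] += row[j] / n;

        double[][] cov = new double[p][p];
        for (double[] row : x)
            for (int j = 0; j < p; j++)
                for (int k = 0; k < p; k++)
                    cov[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]) / (n - 1);
        return cov;
    }

    /** Correlation matrix: the covariance matrix rescaled by the standard deviations. */
    static double[][] correlation(double[][] x) {
        double[][] cov = covariance(x);
        int p = cov.length;
        double[][] cor = new double[p][p];
        for (int j = 0; j < p; j++)
            for (int k = 0; k < p; k++)
                cor[j][k] = cov[j][k] / Math.sqrt(cov[j][j] * cov[k][k]);
        return cor;
    }

    public static void main(String[] args) {
        double[][] x = { {1, 10}, {2, 20}, {3, 30} };  // made-up toy data
        System.out.println(Arrays.deepToString(covariance(x)));
        System.out.println(Arrays.deepToString(correlation(x)));
    }
}
```

PCA on the covariance matrix keeps the original measurement scales, so variables with large variance dominate; PCA on the correlation matrix is equivalent to standardizing every variable first.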
9 Principal Component: eigenvectors and criteria in solving PCA
- Principal components and eigenvectors
- Kaiser criterion (eigenvalue-one criterion)
- Using a specified proportion of variance
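The two criteria listed above can be sketched in a few lines of Java. The class name, method names, and the example eigenvalues are hypothetical; the sketch assumes the eigenvalues are already sorted in decreasing order, and the Kaiser rule is assumed to be applied to eigenvalues of a correlation matrix.

```java
final class ComponentCountSketch {

    /** Kaiser (eigenvalue-one) criterion: keep components whose eigenvalue exceeds 1. */
    static int kaiser(double[] eigenvalues) {
        int count = 0;
        for (double ev : eigenvalues)
            if (ev > 1.0) count++;
        return count;
    }

    /**
     * Proportion-of-variance criterion: the smallest number of leading
     * components whose eigenvalues sum to at least `target` of the total.
     * Assumes eigenvalues are sorted in decreasing order.
     */
    static int byProportion(double[] eigenvalues, double target) {
        double total = 0;
        for (double ev : eigenvalues) total += ev;

        double running = 0;
        for (int k = 0; k < eigenvalues.length; k++) {
            running += eigenvalues[k];
            if (running / total >= target) return k + 1;
        }
        return eigenvalues.length;
    }

    public static void main(String[] args) {
        double[] ev = {2.0, 0.75, 0.25};  // made-up eigenvalues of a 3x3 correlation matrix
        System.out.println("Kaiser keeps: " + kaiser(ev));
        System.out.println("80% of variance needs: " + byProportion(ev, 0.80));
    }
}
```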
10 Principal Component Analysis: implementation on human leukemia data
- Extract the three coordinates of a 3D plot for Java visualization from the human acute leukemia data set.
- 7128 gene expressions (test dataset and training dataset) in three classes of leukemia: 38 cases of B-cell acute lymphoblastic leukemia (ALL), 9 cases of T-cell ALL, and 25 cases of acute myeloid leukemia (AML).
- Gene expression levels were measured using Affymetrix high-density oligonucleotide arrays.
- We obtained three principal components for the 72 cases, each a linear combination of the 7128 gene expression variables with coefficients equal to the components of an eigenvector of the covariance matrix.
- All computations for principal component analysis were performed on a personal computer using SAS Analyst (version 9.2).
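The slides report that the computation was done in SAS. Purely as an illustration of what "coordinates from eigenvectors of the covariance matrix" means, the Java sketch below extracts the top k components with power iteration and deflation, a technique swapped in here for simplicity and not taken from the slides. All names and the toy data are hypothetical.

```java
import java.util.Arrays;

final class PcaScoresSketch {

    /** Scales v to unit length in place; returns false if v is the zero vector. */
    static boolean normalize(double[] v) {
        double norm = 0;
        for (double x : v) norm += x * x;
        norm = Math.sqrt(norm);
        if (norm == 0) return false;
        for (int j = 0; j < v.length; j++) v[j] /= norm;
        return true;
    }

    /** Leading unit eigenvector of a symmetric positive semi-definite matrix (power iteration). */
    static double[] leadingEigenvector(double[][] m, int iterations) {
        int p = m.length;
        double[] v = new double[p];
        for (int j = 0; j < p; j++) v[j] = 1.0 / (j + 1);  // arbitrary non-zero start
        normalize(v);
        for (int it = 0; it < iterations; it++) {
            double[] w = new double[p];
            for (int j = 0; j < p; j++)
                for (int l = 0; l < p; l++) w[j] += m[j][l] * v[l];
            if (!normalize(w)) break;  // m * v is zero: no variance left to extract
            v = w;
        }
        return v;
    }

    /** Coordinates of each case (row of x) on the top k principal components. */
    static double[][] scores(double[][] x, int k, int iterations) {
        int n = x.length, p = x[0].length;
        double[] mean = new double[p];
        for (double[] row : x)
            for (int j = 0; j < p; j++) mean[j] += row[j] / n;

        double[][] centered = new double[n][p];
        double[][] cov = new double[p][p];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < p; j++) centered[i][j] = x[i][j] - mean[j];
            for (int j = 0; j < p; j++)
                for (int l = 0; l < p; l++)
                    cov[j][l] += centered[i][j] * centered[i][l] / (n - 1);
        }

        double[][] out = new double[n][k];
        for (int c = 0; c < k; c++) {
            double[] v = leadingEigenvector(cov, iterations);
            double lambda = 0;  // eigenvalue estimate: v' * cov * v for unit v
            for (int j = 0; j < p; j++)
                for (int l = 0; l < p; l++) lambda += v[j] * cov[j][l] * v[l];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < p; j++) out[i][c] += centered[i][j] * v[j];
            for (int j = 0; j < p; j++)  // deflation: remove the component just found
                for (int l = 0; l < p; l++) cov[j][l] -= lambda * v[j] * v[l];
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] x = { {-2, 0.1}, {-1, -0.1}, {1, -0.1}, {2, 0.1} };  // made-up toy data
        System.out.println(Arrays.deepToString(scores(x, 2, 200)));
    }
}
```

With k = 3, each row of the returned array would give the x, y, z coordinates of one case in a 3D plot. The sign of each component is arbitrary, so a plot may appear mirrored relative to output from other software.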
11 Java visualization: Java 3D
- Java 3D was developed by Sun Microsystems. It is a convenient application programming interface used for writing three-dimensional graphics applications and applets.
- High performance in Java 3D
- Application of Java 3D
12 Java visualization: scene organization
- 3D Universe: at the top of the tree structure in Java 3D, it controls the real-time rendering of the whole scene.
- Scene Root: it stores all scene data and does the real-time rendering.
- View Port: a class of the universe. It is used to control the location of the view, set the projection model, set and display a substitution, etc. We use the orthogonal projection model to transform from 3D to 2D.
- Objects: in our system, we have two types of objects: atoms, represented as spheres, and coordinate axes, represented as cylinders.
- Lighting: this category sets up the environment and directional lights in this system.
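In Java 3D the projection is handled by the view itself, so the system described above would not need to compute it by hand. The standalone sketch below only shows the arithmetic behind an orthogonal projection from 3D to 2D: rotate the point, then drop the depth coordinate. The class name, method name, and the choice of rotating about the y axis are assumptions made for illustration.

```java
import java.util.Arrays;

final class OrthoProjectionSketch {

    /**
     * Rotates a 3D point {x, y, z} about the y axis by angleY (radians),
     * then projects orthogonally onto the screen plane by discarding depth.
     */
    static double[] project(double[] point, double angleY) {
        double cos = Math.cos(angleY), sin = Math.sin(angleY);
        double screenX = cos * point[0] + sin * point[2];
        double screenY = point[1];
        // The rotated depth, -sin * x + cos * z, is not used: with an
        // orthogonal projection, distance from the viewer does not change size.
        return new double[] { screenX, screenY };
    }

    public static void main(String[] args) {
        double[] atom = {1.0, 2.0, 3.0};  // made-up coordinates
        System.out.println(Arrays.toString(project(atom, 0.0)));
        System.out.println(Arrays.toString(project(atom, Math.PI / 2)));
    }
}
```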
13 System overview
- (1) control module
- (2) data module
- (3) visualization module
14 Visualization Module
- The visualization module is in charge of visualizing data in a webpage.
- We use an applet in our implementation.
- Two-dimensional raw data.
- Three-dimensional raw data.
- The user is able to zoom in/out, translate, or rotate the image in the display panel.
15 Control Module
- Scheduling between web pages, data forwarding, handling errors, and so on.
- Our system involves only four pages.
- Two pages are static.
- One is the data visualization page.
- One is the error handling page.
- The applet introduced in the last section displays the figure by requesting the data from the server.
- This creates communication problems between the applet and the server.
- We use HTTP to communicate.
16 Data Module
- The task of the data module is to parse the data file. This system works with the existing leukemia data files.
- For atom structure data, we take into account the higher-level data structure.
- We use an object-oriented method to parse the file. It lets each object parse itself, so the code is easier to maintain.
- We construct the following class structure:
- (1) The ConfigModel class corresponds to the whole point structure data file. This class uses the configElemList property to save each configuration file's information.
- (2) configElem represents one configuration. Its attributes specify the code of this configuration section; pointNum represents the number of points. Similarly, this class uses an array (pointModelList) to store the point model data of the current configuration.
- (3) The PointModel class represents a point model. We have one point model in this system. Therefore, we have constructed this class as an abstract class. It uses an array (pointList) to store the 3D location data of each point.
- (4) The Point class represents the x, y, z coordinates of a point in space.
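The class structure above might look roughly like the following Java skeleton. The class and field names follow the slides (with `ConfigElem` capitalized as a class name); everything else is an assumption made for illustration, including the `parse` methods, the whitespace-separated "x y z" line format, the concrete `SimplePointModel` subclass, and the use of `List` where the slides mention arrays.

```java
import java.util.ArrayList;
import java.util.List;

/** (4) The x, y, z coordinates of a point in space. */
final class Point {
    final double x, y, z;

    Point(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    /** Self-parsing: builds a point from one "x y z" line (assumed format). */
    static Point parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3)
            throw new IllegalArgumentException("Expected 3 coordinates: " + line);
        return new Point(Double.parseDouble(parts[0]),
                         Double.parseDouble(parts[1]),
                         Double.parseDouble(parts[2]));
    }
}

/** (3) Abstract point model holding the 3D location of each point. */
abstract class PointModel {
    final List<Point> pointList = new ArrayList<>();
}

/** Hypothetical concrete subclass, added only so the sketch can create a model. */
final class SimplePointModel extends PointModel { }

/** (2) One configuration section of the data file. */
final class ConfigElem {
    final String code;
    int pointNum;
    final List<PointModel> pointModelList = new ArrayList<>();

    ConfigElem(String code) {
        this.code = code;
    }

    /** Parses the lines of one section into a single point model (assumed layout). */
    static ConfigElem parse(String code, List<String> lines) {
        ConfigElem elem = new ConfigElem(code);
        PointModel model = new SimplePointModel();
        for (String line : lines) model.pointList.add(Point.parse(line));
        elem.pointModelList.add(model);
        elem.pointNum = model.pointList.size();
        return elem;
    }
}

/** (1) The whole point structure data file. */
final class ConfigModel {
    final List<ConfigElem> configElemList = new ArrayList<>();
}
```

Because each class knows how to parse its own part of the file, a change to the point format would only touch `Point.parse`.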
17 Result: Bayesian Probit Model
- Using the Bayesian probit model with variable selection, we selected 6 out of 7128 genes.
- The probabilities of these six genes are: gene 962, 0.56; gene 2118, 0.63; gene 2402, 1.00; gene 3433, 1.00; gene 4993, 0.86; gene 5603, 0.41.
- These six genes did not help to classify two groups (AML and ALL) or three groups (AML, B-cell, and T-cell), but they can be important biomarkers, which may be used to see how well the body responds to a treatment for a disease or condition.
18 Result: PCA
- The 3D figure shows two groups, AML (blue) and ALL (red).
- PCA is a good statistical technique for gene classification.
19 Conclusion and Future Work
- We used two statistical methods to do microarray data analysis.
- Using the Bayesian probit model with variable selection, we selected 6 genes out of 7128 genes, which can be important biomarkers.
- We performed principal component analysis with 7128 genes.
- The results of the PCA are much better for classifying the two groups (AML and ALL) than those of the Bayesian probit model with variable selection.
- Our future study will be how to improve the classification by the Bayesian probit model.
20 Question
- Please feel free to ask questions
21 Thanks for your time
- Thanks Jong-Min!
- Thanks Engin and Jon!
- Thanks to all the people who helped me complete the goal of the seminar.
- End