Microarrays data analysis using Java visualization - PowerPoint PPT Presentation

Provided by: DrI61
1
Microarrays data analysis using Java visualization
  • By Rui Ding
  • University of Minnesota, Morris

2
Background
  • Cancer classification is a very important
    research area.
  • Class prediction and class discovery are the
    goals of cancer classification.
  • Use gene expression data from DNA microarrays
    to do cancer classification.
  • Two commonly used microarrays.
  • Statistical methods used in microarray data.
  • Visualization techniques used in microarray data.
  • Our dataset is human acute leukemia data, with
    7128 genes for each case.

3
Background: introduction to visualization techniques
  • The term visualization refers to an approach to
    transforming scientific data into understandable
    figures or graphs.
  • At present, visualization is widely used in
    many different areas.
  • Web-based visualization.
  • Why do we use visualization?
  • Java and Java visualization.

4
Background: our objectives
  • Perform a Bayesian variable selection
  • Perform a principal component analysis
  • Use Java visualization technique to visualize
    gene data for classification
  • Create the Java visualization website

5
Bayesian probit model with variable selection
  • We model the expression data using a binary
    probit model.
  • Bayesian approaches to statistical inference
    impose prior distributions on the unknown
    parameters.
  • We use Bayesian variable selection to identify
    important genes. One Bayesian variable selection
    method is stochastic search variable selection
    (SSVS), which uses mixture priors for the
    regression coefficients to perform the
    selection.
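
In symbols, a binary probit model with a mixture prior can be sketched as below. The notation ($y_i$, $x_i$, $\beta$, $\gamma_j$, $\tau_j$, $c_j$, $p_j$) is assumed here for illustration and follows a common way of writing SSVS; the slides do not state the exact priors that were used.

```latex
% Probit link: y_i is the binary class label of case i,
% x_i its vector of gene expression values, \Phi the standard normal CDF.
\[
P(y_i = 1 \mid x_i) = \Phi\!\left(x_i^{\top} \beta\right)
\]

% Mixture prior on each coefficient: \gamma_j = 1 marks gene j as selected.
% \tau_j is small (coefficient close to zero) and c_j > 1 widens the prior.
\[
\beta_j \mid \gamma_j \sim (1 - \gamma_j)\, N\!\left(0, \tau_j^{2}\right)
  + \gamma_j\, N\!\left(0, c_j^{2} \tau_j^{2}\right),
\qquad
\gamma_j \sim \mathrm{Bernoulli}(p_j)
\]
```

Under this formulation, the posterior probability that $\gamma_j = 1$ is what ranks gene $j$ as important.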

6
Principal Component Analysis
  • Principal Component Analysis (PCA) is a method
    that reduces data dimensions.
  • The purpose of PCA in this research is to reduce
    the 7128 gene variables measured in 72 different
    people to a smaller number of principal
    components.

7
Principal Component Analysis: Principal Components
  • A principal component is a linear combination of
    the observed variables. The first one is
  • $Y_1 = a_{11} X_1 + a_{12} X_2 + \dots + a_{1p} X_p$
  • Then, the ith principal component is
  • $Y_i = a_{i1} X_1 + a_{i2} X_2 + \dots + a_{ip} X_p$
  • We should decide how many components we need to
    keep.
  • We could extract either two principal components
    or three principal components.
  • The first component accounts for a maximal amount
    of total variance in the observed variables.
  • The second component accounts for the largest
    amount of the remaining variance in the observed
    variables and is uncorrelated with the first
    component.
  • The remaining components follow the same rule as
    the second principal component.
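
As a rough illustration of the first principal component, the sketch below approximates the dominant eigenvector of a covariance matrix by power iteration and then computes a component score as the linear combination above. The class and method names are hypothetical, and power iteration is swapped in here for brevity; the analysis in this presentation was done in SAS, not with this code.

```java
/** Sketch: first principal component of a covariance matrix by power iteration. */
final class FirstComponentSketch {

    /**
     * Returns a unit-length approximation of the dominant eigenvector of a
     * symmetric matrix. Assumes the starting vector (all entries equal) is
     * not orthogonal to that eigenvector.
     */
    static double[] dominantEigenvector(double[][] cov, int iterations) {
        int p = cov.length;
        double[] v = new double[p];
        for (int i = 0; i < p; i++) {
            v[i] = 1.0 / Math.sqrt(p); // arbitrary unit-length start
        }
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[p];
            for (int i = 0; i < p; i++) {
                for (int j = 0; j < p; j++) {
                    next[i] += cov[i][j] * v[j];
                }
            }
            double norm = 0.0;
            for (double x : next) {
                norm += x * x;
            }
            norm = Math.sqrt(norm);
            for (int i = 0; i < p; i++) {
                v[i] = next[i] / norm; // renormalize to unit length
            }
        }
        return v;
    }

    /** Component score Y = a1*X1 + a2*X2 + ... + ap*Xp for one case. */
    static double score(double[] a, double[] x) {
        double y = 0.0;
        for (int j = 0; j < a.length; j++) {
            y += a[j] * x[j];
        }
        return y;
    }
}
```

Calling `score` with the first three eigenvectors in turn would give the three coordinates of one case in a 3D plot.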

8
Principal Components: Covariance matrix and
Correlation matrix
  • Two important matrices are used in PCA: the
    covariance matrix and the correlation matrix.
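
The two matrices can be computed as in the minimal sketch below, assuming the data is laid out with one row per case and one column per variable. The class and method names are hypothetical.

```java
/** Sketch: sample covariance and correlation matrices for data[case][variable]. */
final class CovCorSketch {

    /** Sample covariance matrix (divides by n - 1). */
    static double[][] covariance(double[][] data) {
        int n = data.length;
        int p = data[0].length;
        double[] mean = new double[p];
        for (double[] row : data) {
            for (int j = 0; j < p; j++) {
                mean[j] += row[j];
            }
        }
        for (int j = 0; j < p; j++) {
            mean[j] /= n;
        }
        double[][] cov = new double[p][p];
        for (double[] row : data) {
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < p; k++) {
                    cov[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]);
                }
            }
        }
        for (int j = 0; j < p; j++) {
            for (int k = 0; k < p; k++) {
                cov[j][k] /= (n - 1);
            }
        }
        return cov;
    }

    /** Correlation matrix: each covariance scaled by both standard deviations. */
    static double[][] correlation(double[][] cov) {
        int p = cov.length;
        double[][] cor = new double[p][p];
        for (int j = 0; j < p; j++) {
            for (int k = 0; k < p; k++) {
                cor[j][k] = cov[j][k] / Math.sqrt(cov[j][j] * cov[k][k]);
            }
        }
        return cor;
    }
}
```

PCA on the correlation matrix is equivalent to PCA on standardized variables, which matters when variables are measured on very different scales.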

9
Principal Components: Eigenvectors and criteria in
solving PCA
  • Principal components and eigenvectors
  • Kaiser criterion (eigenvalue-one criterion)
  • Using a specified proportion of variance
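
Both criteria for deciding how many components to keep can be sketched from a list of eigenvalues. The names below are hypothetical, and the eigenvalues are assumed to be sorted in descending order; the Kaiser criterion is normally applied to eigenvalues of the correlation matrix.

```java
/** Sketch: two rules for choosing the number of principal components. */
final class ComponentCountSketch {

    /** Kaiser (eigenvalue-one) criterion: keep components with eigenvalue > 1. */
    static int kaiserCount(double[] eigenvalues) {
        int count = 0;
        for (double ev : eigenvalues) {
            if (ev > 1.0) {
                count++;
            }
        }
        return count;
    }

    /**
     * Smallest number of leading components whose cumulative share of the
     * total variance reaches the target (for example 0.9 for 90%).
     */
    static int countForProportion(double[] eigenvaluesDescending, double target) {
        double total = 0.0;
        for (double ev : eigenvaluesDescending) {
            total += ev;
        }
        double cumulative = 0.0;
        for (int i = 0; i < eigenvaluesDescending.length; i++) {
            cumulative += eigenvaluesDescending[i];
            if (cumulative / total >= target) {
                return i + 1;
            }
        }
        return eigenvaluesDescending.length;
    }
}
```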

10
Principal Component Analysis: Implementation on
human leukemia data
  • Extract three coordinates of a 3D plot for Java
    visualization from the Human Acute Leukemia data
    set.
  • 7128 gene expressions (test dataset and training
    dataset) in three classes of leukemia: 38 cases
    of B-cell acute lymphoblastic leukemia (ALL), 9
    cases of T-cell ALL, and 25 cases of acute myeloid
    leukemia (AML).
  • Gene expression levels were measured using
    Affymetrix high-density oligonucleotide arrays.
  • We obtained three principal components for the 72
    cases. Each is a linear combination of the 7128
    gene expression variables, with coefficients
    equal to the components of an eigenvector of the
    covariance matrix.
  • All computations for principal component analysis
    were performed on a personal computer using
    SAS Analyst (version 9.2).

11
Java visualization: Java 3D
  • Java 3D was developed by Sun Microsystems. It is
    an application programming interface used for
    writing three-dimensional graphics applications
    and applets.
  • High performance in Java 3D
  • Application of Java 3D

12
Java visualization: Scene organization
  • 3D Universe: at the top of the tree structure in
    Java 3D, it controls the real-time rendering of
    the whole scene.
  • Scene Root: it stores all scene data and does the
    real-time rendering.
  • View Port: a class of the universe. It is used to
    control the location of the view, set the
    projection model, set and display a substitution,
    etc. We use the orthogonal projection model to
    transform from 3D to 2D.
  • Objects: in our system, we have two types of
    objects: atoms, represented as spheres, and
    coordinate axes, represented as cylinders.
  • Lighting: this category sets up
    environment/directional light in this system.
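
The idea behind an orthogonal (parallel) projection can be shown in plain Java, without the Java 3D API. The minimal sketch below rotates a point about the y axis and then discards its depth. The class and method names are hypothetical, and this is not how Java 3D exposes the projection model.

```java
/** Sketch: orthographic projection of a 3D point onto the 2D view plane. */
final class OrthographicSketch {

    /**
     * Rotates (x, y, z) about the y axis by angleRad, then drops the depth
     * coordinate. Returns the 2D coordinates {x', y'}.
     */
    static double[] project(double x, double y, double z, double angleRad) {
        double xRotated = x * Math.cos(angleRad) + z * Math.sin(angleRad);
        // The rotated depth would be -x*sin(angle) + z*cos(angle);
        // an orthographic projection ignores it, so distant points
        // are drawn at the same size as near ones.
        return new double[] { xRotated, y };
    }
}
```

Because depth is ignored, distances between plotted cases on screen are not distorted by perspective.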

13
System overview
  • (1) control module
  • (2) data module
  • (3) visualization module

14
Visualization Module
  • The visualization module is in charge of
    visualizing data in a webpage.
  • We use an applet in our implementation.
  • Two-dimensional raw data.
  • Three-dimensional raw data.
  • The user is able to zoom in/out, translate, or
    rotate the image in the display panel.

15
Control Module
  • The control module handles scheduling between
    web pages, data forwarding, error handling, and
    so on.
  • Our system involves only four pages.
  • Two pages are static.
  • One is the data visualization page.
  • One is the error handling page.
  • The applet introduced in the last section displays
    the figure by requesting the data from the
    server.
  • This raises the problem of communication between
    the applet and the server.
  • We use HTTP to communicate.
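
A client-side request of this kind might look like the sketch below, which issues an HTTP GET and reads the response body line by line. The class name, method names, and any URL passed in are hypothetical; the slides do not show the actual request code or the server's response format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch: a client fetching point data from the server over HTTP. */
final class DataFetchSketch {

    /** Reads every non-empty line of a response body. */
    static List<String> readLines(InputStream body) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(body, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    lines.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Issues an HTTP GET for the given URL and returns the body lines. */
    static List<String> fetch(String url) {
        try (InputStream body = URI.create(url).toURL().openStream()) {
            return readLines(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping `readLines` separate from `fetch` lets the parsing be exercised without a running server.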

16
Data Module
  • The task of the data module is to parse the data
    file. This system works with the existing
    leukemia data files.
  • For atom structure data, we take into account the
    higher-level data structure.
  • We use an object-oriented method to parse the
    file. Each object parses itself, which makes the
    code easier to maintain.
  • We construct the following class structure:
  • (1) The ConfigModel class corresponds to the whole
    point structure data file. This class uses the
    configElemList property to save each
    configuration file's information.
  • (2) configElem represents one piece of
    configuration information. Its attributes specify
    the code of this configuration section. pointNum
    represents the number of points. Similarly, this
    class uses an array (pointModelList) to store the
    point model data for the current configuration.
  • (3) The PointModel class represents a point model.
    We have one point model in this system.
    Therefore, we have constructed this class as an
    abstract class. It uses an array (pointList) to
    store the 3D location data of each point.
  • (4) The Point class represents the x, y, z
    coordinates of a point in space.
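
The class structure above might be sketched as follows. Only the names ConfigModel, configElemList, pointNum, pointModelList, PointModel, pointList, and Point come from the slides (configElem is written here as ConfigElem). Everything else is an assumption for illustration: the one-point-per-line "x y z" file format, the use of lists rather than arrays, pointNum as a computed method, and the concrete subclass SimplePointModel.

```java
import java.util.ArrayList;
import java.util.List;

/** The x, y, z coordinates of one point in space. */
final class Point {
    final double x;
    final double y;
    final double z;

    Point(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    /** Self-parsing: builds a Point from a line such as "1.5 -0.2 3.0" (assumed format). */
    static Point parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 coordinates: " + line);
        }
        return new Point(Double.parseDouble(parts[0]),
                         Double.parseDouble(parts[1]),
                         Double.parseDouble(parts[2]));
    }
}

/** Abstract point model that stores the 3D location of each point. */
abstract class PointModel {
    final List<Point> pointList = new ArrayList<>();

    /** Parses its own section of the file, skipping blank lines. */
    void parse(List<String> lines) {
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                pointList.add(Point.parse(line));
            }
        }
    }
}

/** Hypothetical concrete model; the slides name only the abstract class. */
final class SimplePointModel extends PointModel {
}

/** One configuration entry and the point models that belong to it. */
final class ConfigElem {
    final List<PointModel> pointModelList = new ArrayList<>();

    /** Total number of points across all point models. */
    int pointNum() {
        int total = 0;
        for (PointModel model : pointModelList) {
            total += model.pointList.size();
        }
        return total;
    }
}

/** The whole point structure data file. */
final class ConfigModel {
    final List<ConfigElem> configElemList = new ArrayList<>();
}
```

Letting each class parse its own part of the file keeps format changes local to one class.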

17
Result: Bayesian Probit Model
  • Using the Bayesian probit model with variable
    selection, we selected 6 out of 7128 genes.
  • The probabilities of these six genes are: 962nd
    gene 0.56, 2118th gene 0.63, 2402nd gene 1.00,
    3433rd gene 1.00, 4993rd gene 0.86, 5603rd
    gene 0.41.
  • These six genes did not help to classify two (AML
    and ALL) or three groups (AML, B-cell, and
    T-cell), but they may be important biomarkers,
    which could be used to see how well the body
    responds to a treatment for a disease or
    condition.

18
Result: PCA
  • The 3D figure shows two groups: AML (blue) and
    ALL (red).
  • PCA is a good statistical technique for gene
    classification.

19
Conclusion and Future Work
  • We used two statistical methods for microarray
    data analysis.
  • Using the Bayesian probit model with variable
    selection, we selected 6 genes out of 7128 which
    may be important biomarkers.
  • We performed principal component analysis with
    7128 genes.
  • The PCA results classify the two groups (AML and
    ALL) much better than the Bayesian probit model
    with variable selection.
  • Our future study will examine how to improve the
    classification by the Bayesian probit model.

20
Questions
  • Please feel free to ask questions

21
Thanks for your time
  • Thanks Jong-Min!
  • Thanks Engin and Jon!
  • Thanks to everyone who helped me complete the
    goal of the seminar.
  • End