Title: Applications in Bioinformatics: Identifying Differential Gene Expression
1Applications in Bioinformatics: Identifying
Differential Gene Expression
2Acknowledgments
- Collaborators
- John Oleskiewicz, GVSU
- Dr. Kyle Furge, Van Andel Research Institute
- Research Support
- Science Math Summer Research Award
- Travel Support
- For presentation at IEEE Bioinformatics
Conference, Stanford University, Palo Alto,
August 2002.
3First Thoughts
- "Perhaps bioinformatics--the shotgun marriage
between biology, mathematics, computer science,
and engineering--is like an elephant that
occupies a large chair in the scientific living
room ... There are probably many biologists who
feel that a major product of this bioinformatics
elephant is large piles of waste material."
-- Sylvia J. Spengler, Science, Feb. 18, 2000.
4Bioinformatics in Context
- Evolution of the Scientific Method
- Empirical science: observation
- Theoretical science: modeling
- Computational science: simulation
- in vivo → in vitro → in silico
- Computational Biology: analytical computations
- Bioinformatics: information management, data
mining
5Overview
- Gene expression
- DNA → transcription → mRNA → translation →
protein
- Genomic gains/losses
- Clinical prognosis
- Differences in gene expression level
- Regulatory gene defects
- DNA mutations
- Single Nucleotide Polymorphisms
6Microarray Technology
- Lithographic technique
- cDNA microchip
- Robotic application
- Oligonucleotides
- Multiple patient samples
7Microarray Implementation
DNA clones
mRNA
Control
Sample
Reverse transcription (cDNA)
PCR amplification
Fluorescent labeling
Purification
Hybridization
8Microarray Scanning
Laser excitation
CCD detection
Computer analysis
Scanned image
9Microarray Analysis
- Rows → genes
- Columns → samples
- Identifies
- Expression patterns
- Expression levels
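The row/column layout above can be sketched as a small data structure. This is only an illustration: the gene names, sample names, values, and the helper functions `column_indices` and `select_columns` are hypothetical and do not come from GAT.

```python
# Hypothetical toy data: each row is a gene, each column is a sample.
samples = ["mel_1", "mel_2", "other_1", "other_2"]
expression = {
    "GENE_A": [2.1, 1.9, 0.2, 0.4],
    "GENE_B": [0.5, 0.7, 0.6, 0.4],
}


def column_indices(samples, predicate):
    """Return the column positions of samples matching a user-defined rule."""
    return [i for i, name in enumerate(samples) if predicate(name)]


def select_columns(row, indices):
    """Pull the values for the chosen sample columns out of one gene row."""
    return [row[i] for i in indices]


# Split the columns into two groups with an assumed naming rule.
group1 = column_indices(samples, lambda s: s.startswith("mel"))
group2 = column_indices(samples, lambda s: not s.startswith("mel"))

for gene, row in expression.items():
    print(gene, select_columns(row, group1), select_columns(row, group2))
```

Keeping the groups as lists of column indices means the matrix is never copied; only the per-gene slices are built when a statistic is computed.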
10Project Goal: GAT
- Problem
- Hundreds of samples × thousands of genes
- Goal
- Improve existing analysis tools
- Develop parallel version
- Model
- CIT (Cluster Identification Tool)
- Method
- Reverse engineering
- Consultation
- Current papers
11Experimental Process
- Objectives
- Group experimental samples by user-defined
parameters
- Calculate statistical metrics on gene expression
data (t-stat, discrimination score,
p-value, false discovery rate)
- Employ statistical discrimination and permutation
analysis to identify genes that differentiate
between the groups
- Generate lists of discriminating genes sorted by
significance
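The objectives above can be sketched as a per-gene t-statistic followed by a sort. This is a minimal sketch, not GAT's code: it assumes the unequal-variance (Welch) form of the t-statistic, and the function names and toy data are hypothetical. The slides do not define the discrimination score, so it is not shown.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Welch's t-statistic for two lists of expression values.

    Assumes each group has at least two values and that the two
    groups are not both constant (otherwise the denominator is zero).
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(expression, group1, group2):
    """Score every gene and sort by |t|, most discriminating first."""
    scored = []
    for gene, row in expression.items():
        a = [row[i] for i in group1]
        b = [row[i] for i in group2]
        scored.append((gene, t_statistic(a, b)))
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: columns 0-2 form group 1, columns 3-5 form group 2.
expression = {
    "GENE_B": [2.0, 2.5, 1.5, 2.1, 2.4, 1.6],
    "GENE_A": [5.0, 5.2, 4.8, 1.0, 1.2, 0.8],
}
ranked = rank_genes(expression, group1=[0, 1, 2], group2=[3, 4, 5])
for gene, t in ranked:
    print(f"{gene}\t{t:.2f}")
```

In this toy data GENE_A separates the two groups cleanly, so it sorts ahead of GENE_B.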
12Discriminant Analysis
- Finding significant discriminating genes
- Identify differentially expressed genes
- Student's t-statistic
- Measure significance of difference
- Compare sample group t-stat to a random
distribution
- Calculate d-score, p-value
- Estimate FDR
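Comparing a t-stat to a random distribution can be sketched as a permutation test: shuffle the group labels many times and count how often the shuffled |t| is at least as large as the observed one. The slides do not say how the FDR is estimated, so the Benjamini-Hochberg adjustment below is a stand-in chosen for illustration, not necessarily what GAT or SAM does. All names and values are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Welch's t-statistic; assumes the two groups are not both constant."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def permutation_p_value(a, b, n_perm=1000, seed=0):
    """Two-sided p-value from random relabelings of the pooled samples."""
    rng = random.Random(seed)
    observed = abs(t_statistic(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        # Small relative tolerance so floating-point ties still count.
        if abs(t_statistic(perm_a, perm_b)) >= observed * (1 - 1e-9):
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy genes: one clearly separated, one not.
p_sep = permutation_p_value([5.1, 5.3, 4.9, 5.0], [1.0, 1.2, 0.8, 1.1])
p_null = permutation_p_value([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 3.9])
print(p_sep, p_null, benjamini_hochberg([p_sep, p_null]))
```

With only four samples per group there are few distinct relabelings, so the smallest attainable p-value is limited; larger groups give finer resolution.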
13Identifying Differential Gene Expression
Group1
Group2
Unsorted
GAT
14Performance Comparison
- Machine: 1.5 GHz P4 w/256 MB
- Microarray: 4550 genes
- Samples: 8 melanoma cell lines, 52 others
- Analysis Package
- VARI CIT 9.4
- Stanford SAM
- Alternative statistical approach
- GAT: Linux 2.4 w/gcc
- GAT: Windows 2000 Pro w/Visual C
15Performance: Execution Time
16Distributed Mode GAT
Perl Client: Connecting... 10.152.31.1 10.152.31.2 10.152.31.3
GAT Client
Server 1: 10.152.31.1, FreeBSD, listening port 3307, processed 1101
Server 2: 10.152.31.2, Linux, listening port 3308, processed 700
Server N: 10.152.31.3, Windows 2000, listening port 3307, processed 110
17Experimental Parameters
- Machines: 900 MHz PIII w/256 MB
- Microarray: 4550 genes
- Samples: 8 melanoma cell lines, 52 others
- Analysis package
- Distributed Mode GAT
- Parallel algorithm for concurrent execution
- Load balancing on multiple servers
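One way to picture load balancing across servers is a shared work queue: genes are cut into blocks, and each server takes the next block whenever it is free, so faster servers end up processing more. The sketch below is an assumption about the general idea, not GAT's protocol. Local threads stand in for the remote servers, no sockets are used, and `run_distributed`, `chunk`, and the scoring function are hypothetical.

```python
import queue
import threading


def chunk(items, size):
    """Split a list into consecutive blocks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_distributed(genes, score, n_servers=3, chunk_size=50):
    """Score all genes using a shared queue of gene blocks.

    Each stand-in server pulls a new block as soon as it finishes the
    previous one, which balances the load dynamically.
    """
    work = queue.Queue()
    for block in chunk(genes, chunk_size):
        work.put(block)

    results = {}
    processed = [0] * n_servers  # genes handled by each server
    lock = threading.Lock()

    def server(server_id):
        while True:
            try:
                block = work.get_nowait()
            except queue.Empty:
                return  # no work left
            scores = {gene: score(gene) for gene in block}
            with lock:
                results.update(scores)
                processed[server_id] += len(block)

    threads = [threading.Thread(target=server, args=(i,))
               for i in range(n_servers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, processed


# Hypothetical usage: 4550 gene names and a placeholder scoring function.
genes = [f"gene_{i}" for i in range(4550)]
results, processed = run_distributed(genes, score=len)
print(len(results), processed)
```

The per-server counts play the same role as the "processed" figures reported by each server in distributed mode. In CPython these threads do not give real CPU parallelism; they only illustrate the dispatch pattern.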
18Performance: Parallel Version
19Performance: Scalability
20Summary
- GAT
- produces output consistent with existing data
mining and analysis tools
- is faster than comparable tools
- scales to accommodate increasingly large files
- has a modular, extensible design
- has a distributed framework that can be applied
to other methods of gene expression analysis
- is an open-source project
21Future Work
- Incorporate a GUI
- Investigate scalability issues (MCBI)
- Add functions to the C statistical
core
http://www.csis.gvsu.edu/den