Applications in Bioinformatics: Identifying Differential Gene Expression

1
Applications in Bioinformatics: Identifying
Differential Gene Expression
2
Acknowledgments
  • Collaborators
  • John Oleskiewicz, GVSU
  • Dr. Kyle Furge, Van Andel Research Institute
  • Research Support
  • Science Math Summer Research Award
  • Travel Support
  • For presentation at IEEE Bioinformatics
    Conference, Stanford University, Palo Alto,
    August 2002.

3
First Thoughts
  • "Perhaps bioinformatics--the shotgun marriage
    between biology, mathematics, computer science,
    and engineering--is like an elephant that
    occupies a large chair in the scientific living
    room ... There are probably many biologists who
    feel that a major product of this bioinformatics
    elephant is large piles of waste material."
  • -- Sylvia J. Spengler, Science, Feb.
    18, 2000.

4
Bioinformatics in Context
  • Evolution of the Scientific Method
  • Empirical science: observation
  • Theoretical science: modeling
  • Computational science: simulation
  • in vivo → in vitro → in silico
  • Computational biology: analytical computations
  • Bioinformatics: information management, data
    mining

5
Overview
  • Gene expression
  • DNA → transcription → mRNA → translation →
    protein
  • Genomic gains/losses
  • Clinical prognosis
  • Differences in gene expression level
  • Regulatory gene defects
  • DNA mutations
  • Single Nucleotide Polymorphisms

6
Microarray Technology
  • Lithographic technique
  • cDNA microchip
  • Robotic application
  • Oligonucleotides
  • Multiple patient samples

7
Microarray Implementation
DNA clones
mRNA
Control
Sample
Reverse transcription (cDNA)
PCR amplification
Fluorescent labeling
Purification
Hybridization
8
Microarray Scanning
Laser excitation
CCD detection
Computer analysis
Scanned image
9
Microarray Analysis
  • Rows → genes
  • Columns → samples
  • Identifies
  • Expression patterns
  • Expression levels
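
The rows-as-genes, columns-as-samples layout can be sketched as a small table. This is only an illustration: the gene names, sample names, values, and the `column` helper are all made up and are not taken from GAT or CIT.

```python
# Toy expression matrix: one row per gene, one column per sample.
# All names and numbers here are hypothetical.
samples = ["s1", "s2", "s3", "s4"]
expression = {
    "geneA": [2.1, 1.9, 0.4, 0.5],
    "geneB": [1.0, 1.1, 0.9, 1.0],
}

def column(matrix, samples, sample_name):
    """Return one sample's values across all genes."""
    j = samples.index(sample_name)
    return {gene: values[j] for gene, values in matrix.items()}

row = expression["geneA"]                 # one gene's level in every sample
col = column(expression, samples, "s3")   # one sample's pattern across genes
```

Reading along a row compares expression levels of a single gene between samples; reading down a column gives the expression pattern of a single sample.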

10
Project Goal: GAT
  • Problem
  • Hundreds of samples × thousands of genes
  • Goal
  • Improve existing analysis tools
  • Develop parallel version
  • Model
  • CIT (Cluster Identification Tool)
  • Method
  • Reverse engineering
  • Consultation
  • Current papers

11
Experimental Process
  • Objectives
  • Group experimental samples by user-defined
    parameters
  • Calculate statistical metrics on gene expression
    data (t-stat, discrimination score,
    p-value, false discovery rate)
  • Employ statistical discrimination and permutation
    analysis to identify genes that differentiate
    between the groups
  • Generate lists of discriminating genes sorted by
    significance
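
The grouping and ranking objectives above can be sketched as follows. This is a minimal sketch under stated assumptions, not GAT's implementation: the sample annotations, the `split_columns` and `rank_genes` helpers, and the placeholder `mean_difference` score are all hypothetical.

```python
from statistics import mean

# Hypothetical sample annotations; the field names are made up for illustration.
samples = [
    {"id": "s1", "tissue": "melanoma"},
    {"id": "s2", "tissue": "melanoma"},
    {"id": "s3", "tissue": "other"},
    {"id": "s4", "tissue": "other"},
]
expression = {   # gene -> one value per sample, in the same order as `samples`
    "geneA": [2.1, 1.9, 0.4, 0.6],
    "geneB": [1.0, 1.2, 1.1, 0.9],
}

def split_columns(samples, predicate):
    """Return column indices of samples that do / do not satisfy the predicate."""
    inside = [j for j, s in enumerate(samples) if predicate(s)]
    outside = [j for j, s in enumerate(samples) if not predicate(s)]
    return inside, outside

def rank_genes(expression, group1, group2, score):
    """Score every gene on the two groups; sort by absolute score, largest first."""
    scored = []
    for gene, values in expression.items():
        a = [values[j] for j in group1]
        b = [values[j] for j in group2]
        scored.append((gene, score(a, b)))
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)

def mean_difference(a, b):
    # Placeholder score; a t-statistic or discrimination score would slot in here.
    return mean(a) - mean(b)

g1, g2 = split_columns(samples, lambda s: s["tissue"] == "melanoma")
ranked = rank_genes(expression, g1, g2, mean_difference)
```

Passing the score as a function keeps the grouping step independent of whichever statistic is used to rank the genes.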

12
Discriminant Analysis
  • Finding significant discriminating genes
  • Identify differentially expressed genes:
    Student's t-statistic
  • Measure significance of difference
  • Compare sample group t-stat to a random
    distribution
  • Calculate d-score, P-value
  • Estimate FDR
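
The steps above can be sketched for a single gene as a pooled-variance Student's t-statistic followed by a permutation p-value. The slides do not say how GAT estimates the FDR, so the Benjamini-Hochberg adjustment below is a swapped-in standard technique, not the authors' method. The function names, the number of permutations, and the seed are assumptions, and the d-score is not shown.

```python
import random
from math import sqrt
from statistics import mean, variance

def t_statistic(a, b):
    """Student's two-sample t-statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))

def permutation_p_value(a, b, n_perm=1000, seed=0):
    """Fraction of random label shuffles whose |t| is at least the observed |t|."""
    rng = random.Random(seed)
    observed = abs(t_statistic(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(t_statistic(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):   # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up values: the first gene separates the groups, the second does not.
p_diff = permutation_p_value([2.1, 1.9, 2.0, 2.2], [0.4, 0.5, 0.6, 0.3])
p_null = permutation_p_value([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

The sketch assumes each shuffled group still contains at least two distinct values; a gene with constant expression would give a zero pooled variance and needs separate handling.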

13
Identifying Differential Gene Expression
[Figure labels: Group 1, Group 2, Unsorted, GAT]
14
Performance Comparison
  • Machine: 1.5 GHz P4 w/256 MB
  • Microarray: 4550 genes
  • Samples: 8 melanoma cell lines, 52 others
  • Analysis Package
  • VARI: CIT 9.4
  • Stanford: SAM
  • Alternative statistical approach
  • GAT: Linux 2.4 w/gcc
  • GAT: Windows 2000 Pro w/Visual C

15
Performance: Execution Time
16
Distributed Mode: GAT
GAT Client
Perl Client Connecting... 10.152.31.1, 10.152.31.2, 10.152.31.3
Server 1: 10.152.31.1, FreeBSD, listening on port 3307, processed 1101
Server 2: 10.152.31.2, Linux, listening on port 3308, processed 700
Server N: 10.152.31.3, Windows 2000, listening on port 3307, processed 110
17
Experimental Parameters
  • Machines: 900 MHz PIII w/256 MB
  • Microarray: 4550 genes
  • Samples: 8 melanoma cell lines, 52 others
  • Analysis package
  • Distributed Mode GAT
  • Parallel algorithm for concurrent execution
  • Load balancing on multiple Servers
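
One way to picture load balancing across several servers is a shared work queue from which each worker pulls the next gene as soon as it is free, so faster workers end up processing more genes. The sketch below is an assumption-laden stand-in rather than GAT's protocol: local threads play the role of the remote servers, no sockets are involved, and `analyze`, `worker`, and `run_distributed` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue, Empty

def analyze(gene, values):
    # Stand-in for the per-gene statistics; here just the range of the values.
    return gene, max(values) - min(values)

def worker(name, tasks, results, processed):
    """Pull genes until the queue is empty; a faster worker naturally takes more."""
    while True:
        try:
            gene, values = tasks.get_nowait()
        except Empty:
            return
        results[gene] = analyze(gene, values)[1]
        processed[name] += 1   # each worker only updates its own counter

def run_distributed(expression, server_names):
    tasks = Queue()
    for item in expression.items():
        tasks.put(item)
    results = {}
    processed = {name: 0 for name in server_names}
    # Leaving the `with` block waits for every worker to finish.
    with ThreadPoolExecutor(max_workers=len(server_names)) as pool:
        for name in server_names:
            pool.submit(worker, name, tasks, results, processed)
    return results, processed

# Made-up data: 50 genes with three samples each.
expression = {f"gene{i}": [i, i + 1, i + 3] for i in range(50)}
results, processed = run_distributed(expression, ["server1", "server2", "server3"])
```

Because work is pulled rather than assigned up front, the per-worker counts in `processed` can be uneven while every gene is still processed exactly once.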

18
Performance: Parallel Version
19
Performance: Scalability
20
Summary
  • GAT
  • produces output consistent with existing data
    mining and analysis tools
  • is faster than comparable tools
  • scales to accommodate increasingly large files
  • has a modular, extensible design
  • has a distributed framework that can be applied
    to other methods of gene expression analysis
  • is an open-source project

21
Future Work
  • Incorporate a GUI
  • Investigate scalability issues (MCBI)
  • Add functions to the C statistical core

http://www.csis.gvsu.edu/den