CS491JH: Data Mining in Bioinformatics


Slides: 31

Provided by: titanBiot


Transcript and Presenter's Notes

1

CS491JH Data Mining in Bioinformatics
Introduction to Microarray Technology
Technology Background
Data Processing Procedure
Characteristics of Data
Data integration and Data mining

2
Substrates for High Throughput Arrays
Single label: P33
Single label: biotin-streptavidin
Dual label: Cy3, Cy5
3
GeneChip Probe Arrays
Hybridized Probe Cell
GeneChip Probe Array
Single stranded, labeled RNA target
Oligonucleotide probe
24 µm
Millions of copies of a specific oligonucleotide probe
1.28 cm
>200,000 different complementary probes
Image of Hybridized Probe Array
4
GeneChip Expression Array Design
Gene Sequence
Probes designed to be Perfect Match
Probes designed to be Mismatch
5
Procedures for Target Preparation
Cells
Poly(A)/Total RNA
cDNA
IVT (Biotin-UTP, Biotin-CTP)
Labeled transcript
Fragment (heat, Mg2+)
Labeled fragments
Hybridize (16 hours)
Wash & Stain
Scan
6
Microarray Technology
7
Printing Arrays on 50 slides
8
Ratio of expression of genes from two sources
Total or
9
GSI Lumonics
10
Cattle and Soy Controls
Beta Actin
PKG
HPRT
Beta 2 microglobulin
Rubisco
AB binding protein
Major latex protein homologue (MSG)
Array of cattle and soy spiking controls. 50 µg
of cattle brain total RNA was labeled with Cy3
(green). 1 µl each of in vitro transcribed soy
Rubisco (5 ng), AB binding protein (0.5 ng) and
MSG (0.05 ng) were labeled with Cy5. The two
labeled samples were cohybridized on superamine
slides (Telechem, Inc.). To the right of each
set of spots are five negative controls (water).
11
Fetal Spleen (Cy3) vs. Adult Spleen (Cy5)
Labeled spots: IgM, MYLK, IgM heavy chain, COL1A2
12
GenePix Image Analysis Software
Placenta vs. Brain 3800 Cattle Placenta Array
cy3 cy5
14
Microarray Data Process

Experimental Design
Image Analysis -> raw data
Normalization -> clean data
Data Filtering -> informative data
Model building
Data Mining (clustering, pattern recognition, etc.)
Validation
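
The middle steps of this process (raw data, normalization, filtering) can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say which normalization is used, so global median centering of log2(Cy5/Cy3) ratios is assumed here, and the gene names, intensities, and cutoff are made up.

```python
import math
from statistics import median

def log_ratios(cy5, cy3):
    """Return log2(Cy5/Cy3) per spot; intensities are assumed to be positive."""
    return [math.log2(r / g) for r, g in zip(cy5, cy3)]

def median_normalize(values):
    """Center the log ratios by subtracting their median (assumed method)."""
    m = median(values)
    return [v - m for v in values]

def filter_informative(genes, values, cutoff):
    """Keep genes whose absolute normalized log ratio is at least `cutoff`."""
    return [(g, v) for g, v in zip(genes, values) if abs(v) >= cutoff]

# Hypothetical spot intensities for five genes.
genes = ["g1", "g2", "g3", "g4", "g5"]
cy5 = [200.0, 400.0, 1600.0, 400.0, 100.0]
cy3 = [100.0, 200.0, 200.0, 200.0, 200.0]

raw = log_ratios(cy5, cy3)        # image analysis output -> raw data
clean = median_normalize(raw)     # normalization -> clean data
informative = filter_informative(genes, clean, cutoff=1.0)  # filtering
print(informative)
```

Median centering rests on the assumption that most genes do not change between the two samples, so the typical log ratio should be zero.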

15
Scatterplot of Normalized Data
Fetal
Adult
16
> 0.3
< -0.3
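
A minimal sketch of applying these two cutoffs, assuming they refer to a normalized log expression ratio (the slide does not state the quantity or the log base). The function name and labels are hypothetical.

```python
def classify(value, upper=0.3, lower=-0.3):
    """Label a normalized log ratio using the two cutoffs from the slide."""
    if value > upper:
        return "up"
    if value < lower:
        return "down"
    return "unchanged"

# Hypothetical normalized log ratios for three genes.
labels = [classify(v) for v in (0.5, -0.31, 0.3)]
print(labels)
```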
17
Characteristics of Data
Data can be viewed as an N x M matrix (N >> M)
N is the number of genes
M is the number of data points for each gene
Or N x (M + K)
K is the number of features describing each gene (genome location, functional description, metabolic pathway, etc.)
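
As a rough illustration of this layout, the sketch below builds a small table with one row per gene, assuming the K annotation features are simply appended to the M measurement columns. All gene names, values, and annotations are invented for the example.

```python
# M = 3 measurements per gene (hypothetical values).
expression = {
    "geneA": [0.1, 1.2, -0.4],
    "geneB": [0.0, -0.9, 0.3],
}
# K = 3 descriptive features per gene: location, function, pathway (hypothetical).
annotations = {
    "geneA": ["chr1", "kinase", "glycolysis"],
    "geneB": ["chr7", "transporter", "unknown"],
}

def combine(expression, annotations):
    """Return one row per gene: M expression values followed by K features."""
    return {gene: values + annotations[gene] for gene, values in expression.items()}

table = combine(expression, annotations)
n_genes = len(table)                     # N
n_columns = len(table["geneA"])          # M + K
print(n_genes, n_columns)
```

In real experiments N is in the thousands while M is often a handful of arrays, which is why the slide stresses N >> M.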
18
Model for Data Analysis

Gene expression is a dynamic process
Each microarray experiment is a snapshot of the
process
Basic biological knowledge is needed to build a model
For example:
Assumption: In most experiments, only a
small set of genes (100s/1000s) is
significantly affected.

19
Need for Data Mining
Data Mining

Data volumes are too large for traditional
analysis methods
Large number of records and high dimensional
data
Only small portion of data is analyzed
Decision support process becomes more complex

Functions of Data Mining
Use the data to build predictors: prediction,
classification, deviation detection, segmentation
Generate more sophisticated summaries and reports
to aid understanding of the data: find clusters,
partitions in data
20
Data Mining Methods
Classification, Regression (Predictive Modeling)
Clustering (Segmentation)
Association Discovery (Summarization)
Change and deviation detection
Dependency Modeling
Information Visualization
21
Clustered display of data from time course of
serum stimulation of primary human fibroblasts.
Cholesterol Biosynthesis
Cell Cycle
Immediate Early Response
Signaling and Angiogenesis
Wound Healing and Tissue Remodeling
Eisen et al., Proc. Natl. Acad. Sci. USA 95 (1998), p. 14865
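
The clustered display cited above groups genes with similar expression profiles. A minimal sketch of that idea is average-linkage hierarchical clustering on Pearson correlation, written here in plain Python. The gene names and time-course values are invented, and stopping at a fixed number of clusters is a simplification for the example rather than how the published figure was produced.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant input assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_linkage(profiles, n_clusters):
    """Repeatedly merge the two clusters with the highest mean pairwise correlation."""
    clusters = [[name] for name in profiles]

    def similarity(a, b):
        pairs = [pearson(profiles[i], profiles[j]) for i in a for j in b]
        return sum(pairs) / len(pairs)

    while len(clusters) > n_clusters:
        candidates = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = max(candidates, key=lambda ij: similarity(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical time-course profiles: two early responders, two late responders.
profiles = {
    "early1": [0.0, 2.0, 1.0, 0.0],
    "early2": [0.1, 1.8, 1.1, 0.0],
    "late1": [0.0, 0.0, 1.0, 2.0],
    "late2": [0.1, 0.0, 0.9, 2.2],
}
groups = average_linkage(profiles, 2)
print(groups)
```

Correlation is used instead of Euclidean distance so that genes with the same shape of response group together even when their magnitudes differ.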
24
Self Organizing Maps
25
Molecular Classification of Cancer
27
Gene Expression Profile of Aging and Its
Retardation by Caloric Restriction
Cheol-Koo Lee, Roger G. Klopp, Richard Weindruch,
Tomas A. Prolla
28
Expression Landscape of cell-cycle regulated
genes in yeast
29
Multi-dimension data visualization
