Title: Deep Learning for Big Data
1 Deep Learning for Big Data
- P. Baldi
- University of California, Irvine
Department of Computer Science
Institute for Genomics and Bioinformatics
Center for Machine Learning and Intelligent Systems
2 Intelligence in Brains and Machines
3 Intelligence in Brains and Machines
LEARNING
4 Intelligence in Brains and Machines
LEARNING
DEEP LEARNING
5 Cutting Edge of Machine Learning: Deep Learning in Neural Networks
- Engineering applications
- Computer vision
- Speech recognition
- Natural Language Understanding
- Robotics
6 Computer Vision: Image Classification
- ImageNet
- Over 1 million images, 1000 classes, different sizes (avg. 482x415), color
- 16.42% error: deep CNN with dropout in 2012
- 6.66% error: 22-layer CNN (GoogLeNet) in 2014
- 4.9% error (Google, Microsoft): super-human performance in 2015
Sources: Krizhevsky et al., ImageNet Classification with Deep Convolutional Neural Networks; Lee et al., Deeply-Supervised Nets, 2014; Szegedy et al., Going Deeper with Convolutions, ILSVRC 2014; Sanchez and Perronnin, CVPR 2011; http://www.clarifai.com/; Benenson, http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html
7 Deep Learning Applications
- Engineering
  - Computer Vision (e.g. image classification, segmentation)
  - Speech Recognition
  - Natural Language Processing (e.g. sentiment analysis, translation)
- Science
  - Biology (e.g. protein structure prediction, analysis of genomic data)
  - Chemistry (e.g. predicting chemical reactions)
  - Physics (e.g. detecting exotic particles)
- and many more
8 Deep Learning in Biology: Mining Omic Data
9 Deep Learning in Biology: Mining Omic Data
Solved
C. Magnan and P. Baldi. SSpro/ACCpro 5.0: Almost Perfect Prediction of Protein Secondary Structure and Relative Solvent Accessibility. Problem Solved? Bioinformatics, (advance access June 18), (2014).
12 P. Di Lena, K. Nagata, and P. Baldi. Deep Architectures for Protein Contact Map Prediction. Bioinformatics, 28, 2449-2457, (2012).
Deep Learning
13 Deep Learning Chemical Reactions
RCH=CH2 + HBr → RCH(Br)CH3
14 Deep Learning Chemical Reactions: ReactionPredictor
M. Kayala, C. Azencott, J. Chen, and P. Baldi. Learning to Predict Chemical Reactions. Journal of Chemical Information and Modeling, 51, 9, 2209-2222, (2011).
M. Kayala and P. Baldi. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. Journal of Chemical Information and Modeling, 52, 10, 2526-2540, (2012).
15 Deep Learning in Physics: Searching for Exotic Particles
17 Daniel Whiteson
Peter Sadowski
18 Higgs Boson Detection
- Deep network improves AUC by 8%
Nature Communications, July 2014
BDT: Boosted Decision Trees in TMVA package
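AUC (area under the ROC curve) is the metric quoted on this slide. As a rough illustration only, the sketch below computes AUC as the probability that a randomly chosen signal event receives a higher score than a randomly chosen background event. The function name `auc` and the labels and scores are made-up toy values chosen for illustration; they are not data or results from the Nature Communications paper.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = signal, 0 = background.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]  # scores from a hypothetical classifier A
scores_b = [0.9, 0.8, 0.6, 0.7, 0.3, 0.2]  # scores from a hypothetical classifier B

auc_a = auc(labels, scores_a)  # 7 of 9 pairs ranked correctly
auc_b = auc(labels, scores_b)  # 8 of 9 pairs ranked correctly
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}")
```

The pairwise loop is quadratic in the number of events, which is fine for a toy example; a rank-based computation would be the usual choice on large data sets.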
19 THANK YOU
20 Deep Learning in Chemistry: Predicting Chemical Reactions
RCH=CH2 + HBr → RCH(Br)CH3
- Many important applications (e.g. synthesis, retrosynthesis, reaction discovery)
- Two different approaches
  - Write a system of rules
  - Learn the rules from big data
21 Writing a System of Rules: Reaction Explorer
J. Chen and P. Baldi. No Electron Left Behind: A Rule-Based Expert System to Predict Chemical Reactions and Reaction Mechanisms. Journal of Chemical Information and Modeling, 49, 9, 2034-2043, (2009).
- ReactionExplorer system has about 1800 rules
- Covers undergraduate organic chemistry curriculum
- Interactive educational system
- Licensed by Wiley from ReactionExplorer and distributed worldwide
Jonathan Chen
22 Problems
- Very tedious
- Non-scalable
- Limited coverage (undergraduate)
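The coverage problem of a hand-written rule system can be illustrated with a toy sketch. This is not how ReactionExplorer is implemented; the `RULES` table and `predict_product` function are hypothetical names, and the single rule is the HBr addition shown earlier in the slides. Any reaction without a hand-written rule gets no prediction.

```python
# Hypothetical rule table: (substrate pattern, reagent) -> product pattern.
# A real expert system needs many such rules (about 1800 in ReactionExplorer).
RULES = {
    ("RCH=CH2", "HBr"): "RCH(Br)CH3",
}


def predict_product(substrate, reagent):
    """Return the product for a known rule, or None if no rule applies."""
    return RULES.get((substrate, reagent))


print(predict_product("RCH=CH2", "HBr"))  # covered by the one rule above
print(predict_product("RCH=CH2", "HCl"))  # no rule written, so no prediction
```

Every new reaction class requires another manually written entry, which is the tedium and limited coverage listed above; the learning-based approach aims to infer such mappings from reaction data instead.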