Math Models for Learning and Discovery

1
Math Models for Learning and Discovery
  • Kristin P. Bennett
  • Mathematical Sciences Department
  • Rensselaer Polytechnic Institute

2
The Learning Problem
  • The problem of understanding intelligence is said
    to be the greatest problem in science today and
    the problem for this century, as deciphering
    the genetic code was for the second half of the
    last one ... the problem of learning represents a
    gateway to understanding intelligence in man and
    machines.
  • -- Tomaso Poggio and Steve Smale, 2003

3
What do these problems have in common?
  • Design and Discovery of Pharmaceuticals
  • Target Marketing in Business
  • Diagnosis of Breast Cancer
  • Discovery of Novel Superconductors
  • Detection of Anthrax using TZ spectroscopy
  • Modeling and predicting global trade
  • RNA Transcription

4
DRUG TRIVIA (2000, old info)
  • In USA, $25B/yr for R&D of pharmaceuticals (33%
    clinicals)
  • Worth their weight in gold
  • 10-15 years from conception to market for a drug
  • Development cost: $0.5B/drug
  • First-year sales > $1B/drug
  • 1 drug approved per 5000 compounds tested
  • 1 out of 100 drugs succeeds to market
  • 19 Alzheimer's drugs in development
  • 20,000,000 Americans with Alzheimer's by 2050

DDASSL
RENSSELAER
5
Drugs: Worth Their Weight in GOLD
6
TOWARDS TREATING THE HIV EPIDEMIC
  • HIV Reverse-Transcriptase Inhibition modeling
  • Have a few molecules that have been tested
  • Can we predict if a new molecule will inhibit HIV?

7
What do we know?
  • The bioactivities of a small set of molecules
  • Many possible descriptors for each molecule
  • Molecular Weight
  • Electrostatic Potential
  • Ionization Potential
  • Can we predict a molecule's bioactivity?

8
Database Marketing
  • Bank has a $1.7 billion portfolio of home
    mortgages.
  • When a customer refinances, the bank may lose
    that customer.
  • Question: Will a customer refinance?
  • If so, offer that customer a good deal on
    refinancing.

9
What do we know?
  • For many customers, we know if they refinanced or
    not.
  • We know attributes of each customer
  • Income
  • Age
  • Residential Area
  • Payment History
  • Can we predict behavior of future customers?

10
Breast Cancer Diagnosis
  • Fine needle aspirate of breast tumor.

Is tumor benign or malignant?
11
What do we know?
  • For patients in initial study, we know whether
    tumor was benign or malignant.
  • Have a digital image of tumor aspirate.
  • Know characteristics doctors look at
  • Uniformity of cell shape
  • Uniformity of cell size
  • Cell Mitosis

13
Superconductivity
  • Superconductivity is the ability of a material to
    conduct current with no resistance and extremely
    low loss.
  • A few high temperature superconductors have been
    found.
  • What other compounds are superconductors?

14
Applications of Superconductivity
  • Magnetic Resonance Imaging

15
Applications of Superconductivity
  • Maglev Trains

16
Applications of Superconductivity
  • Very small and efficient motors
  • Better power transmission cables
  • Better cellular phone service
  • Find a cheap high-temperature superconductor
  • and you will get the NOBEL PRIZE.

17
What do we know?
  • Many compounds have been tested to see if they
    are superconductors.
  • Many descriptors exist for these compounds based
    on molecular properties.

18
What do all these problems have in common?
  • Each problem
  • Can be posed as a yes or no question.
  • Has examples known to be of the yes type or the
    no type.
  • Each example has an associated set of
    descriptors.

Learn a Classification Function!
19
Data Mining
  • Each problem has data.
  • Our job is to mine information from this data.
  • Information depends on the question asked.
  • In this case we must produce a predictive yes/no
    model (a.k.a. a classification model) based on
    the data.

20
Mathematical Model
  • Have data
  • Construct predictive function
  • f(x) ≈ y
  • Solve mathematical model to find f
  • Want f to generalize well on future data
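
The steps on this slide can be sketched in a few lines. This is only an illustration, not the course's own example: the toy data, the choice of a straight line for f, and the helper names `fit_line` and `mean_squared_error` are all assumptions. The model is least squares, solved in closed form, and generalization is checked on points that were held out of the fit.

```python
# Toy data, made up for illustration: y is roughly 2*x + 1 plus small noise.
train = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1)]
test = [(5.0, 11.0), (6.0, 13.1)]  # "future" data, never used for fitting


def fit_line(points):
    """Solve min over (w, b) of sum (w*x + b - y)^2 in closed form."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


def mean_squared_error(f, points):
    """Average squared gap between predictions f(x) and observed y."""
    return sum((f(x) - y) ** 2 for x, y in points) / len(points)


# "Solve mathematical model to find f"
w, b = fit_line(train)


def f(x):
    return w * x + b


# "Want f to generalize well on future data"
train_error = mean_squared_error(f, train)
test_error = mean_squared_error(f, test)
print("f(x) = %.2f * x + %.2f" % (w, b))
print("train error:", train_error, "test error:", test_error)
```

A small error on the held-out points, not just on the training points, is the sign that f generalizes.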

21
Types of Learning Problems
  • Classification
  • Regression
  • Clustering
  • Ranking

22
Data Mining
  • Classification yes/no models
  • Start with examples of yes and no.
  • Associate a set of descriptors with each example.
    Descriptors must be appropriate for the
    question you are asking.
  • Construct a model to split the two sets
  • Use the model to predict new examples.
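
As a rough sketch of this recipe, assuming made-up examples with two descriptors each, a nearest-centroid rule is one of the simplest models that splits the two sets. It is a stand-in chosen for brevity, not the method taught in the course, and the names `centroid` and `predict` are hypothetical.

```python
# Hypothetical examples: each has two descriptors and a known yes/no answer.
yes_examples = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0)]
no_examples = [(-1.0, 0.0), (0.0, -1.0), (-0.5, 0.5)]


def centroid(points):
    """Component-wise mean of a list of descriptor vectors."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dim))


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


# Construct the model: one centroid per class. The decision boundary is
# the set of points equally far from both centroids (a line in 2-D).
yes_center = centroid(yes_examples)
no_center = centroid(no_examples)


def predict(x):
    """Answer 'yes' if x is closer to the yes centroid, otherwise 'no'."""
    if squared_distance(x, yes_center) < squared_distance(x, no_center):
        return "yes"
    return "no"


# Use the model to predict new examples.
for new_example in [(3.0, 3.0), (-1.0, -1.0)]:
    print(new_example, "->", predict(new_example))
```

The quality of the split depends on the descriptors: if they carry no information about the question, no model will separate the two sets.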

23
Learning Model
  • What kind of learning task is it?
  • What sort of f should we use?
  • Kernel function
  • What loss function to use?
  • What regularization function?
  • How can we solve this learning model?
  • How well will the model predict new points?
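
One possible set of answers to these questions, offered only as an illustrative sketch and not as the course's method: classification as the task, a Gaussian kernel expansion for f, squared loss, and a ridge penalty, solved as one linear system (kernel ridge regression used as a classifier). The XOR-style toy data and the values of `gamma` and `lam` are assumptions.

```python
import math

# Toy XOR-style data (assumed): no straight line separates the two classes.
X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
y = [-1.0, -1.0, 1.0, 1.0]
gamma = 2.0  # kernel width parameter (assumed)
lam = 0.1    # regularization strength (assumed)


def kernel(a, b):
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


# Squared loss plus ridge penalty leads to the system (K + lam * I) alpha = y.
n = len(X)
K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
alpha = solve(A, y)


def f(x):
    """Learned function: f(x) = sum_i alpha_i * k(x_i, x)."""
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, X))


def predict(x):
    return 1 if f(x) > 0 else -1


print([predict(x) for x in X])   # predictions on the training points
print(predict((0.1, 0.9)))       # prediction for a new point
```

Changing the kernel, the loss, or the penalty gives a different learning model; how well any of them predicts new points still has to be checked on data held out of training.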

24
Class information
  • See course web page
  • http://www.rpi.edu/~bennek/class/mmld/index.htm

25
Assignment for Friday
  • Read and be prepared to discuss
  • Chapter 1, Shawe-Taylor and Cristianini
  • Lecturer: Gautam Kunapuli