CSE 591: Machine learning and Applications - PowerPoint PPT Presentation

1
CSE 591 Machine learning and Applications
  • Jieping Ye
  • Department of Computer Science and Engineering
  • Arizona State University

2
Brief Introduction
  • Dr. Jieping Ye
  • Assistant Professor at CSE Dept.
  • Affiliated with the Center for Evolutionary
    Functional Genomics at the Biodesign Institute
  • Research interests: machine learning, data mining,
    and their applications to bioinformatics
      • Dimensionality reduction
      • Semi-supervised learning
      • Kernel learning
      • Biological image analysis

3
Outline of lecture
  • Course information
  • Project
  • Introduction to ML
  • Course schedule
  • Survey

4
Course Information
  • Instructor: Dr. Jieping Ye
  • Office: BY 568
  • Phone: 727-7451
  • Email: jieping.ye_at_asu.edu
  • Web: http://www.public.asu.edu/jye02/CLASSES/Spring-2007/
  • Time: TTh 4:40 pm to 5:55 pm
  • Office hours: TTh 10:00 am to 11:45 am
  • Location: BYAC 270
  • TA: Jianhui Chen
      • Office hours: Th 3:30 pm to 4:30 pm

5
Course Information (Cont'd)
  • Prerequisites: basics of linear algebra,
    algorithm design and analysis.
  • Course textbook: none required. Papers and other
    materials are available on the class web page.
  • Objective: an in-depth understanding of some of
    the important machine learning methods and their
    applications in bioinformatics and other domains.
  • Topics: clustering, regression, classification,
    semi-supervised learning, feature reduction,
    manifold learning, ranking, and kernel learning.

6
Reference books
  • Pattern Classification. Duda et al., 2000.
  • The Elements of Statistical Learning: Data
    Mining, Inference, and Prediction. Hastie et al.,
    2001.
  • Kernel Methods in Computational Biology.
    Schölkopf et al. (editors), 2004.
  • Kernel Methods for Pattern Analysis. Shawe-Taylor
    and Cristianini, 2004.
  • Introduction to Data Mining. Tan et al., 2005.

7
Grading
  • Homework (3): 30%
  • Project: 40%. Groups of two to three students
    carry out a small research project, such as:
      • A survey of the state of the art in an area
        related to this course
      • Machine learning techniques for specific
        applications
      • A comparative study of several well-known
        algorithms
      • Design of a novel algorithm related to this
        course
  • Exam (1): 20%. There will be one open-book exam
    on 3/22/07.
  • Class participation: 10%. Students are required
    to attend lectures and participate in class
    discussion.
  • Grade scale: A 90-100, A- 85-89, B+ 80-84,
    B 70-79, C 60-70

8
Project
  • Project proposal: due 2/08/07
      • One half to one page
      • Topics, references, and plan
  • Intermediate project report: due 4/05/07
      • Five to ten pages
  • Final project report: due 4/26/07
      • Fifteen to twenty pages
  • Project presentation
      • About 5 minutes

9
Programming languages
  • Matlab
      • Tutorials
          • http://www.math.ufl.edu/help/matlab-tutorial/
          • http://www.math.mtu.edu/msgocken/intro/node1.html
  • R (Statistics)
      • http://www.r-project.org/
  • Or other languages

10
What is machine learning?
  • Machine learning is the study of computer systems
    that improve their performance through
    experience.
  • Learn existing and known structures and rules.
  • Discover new findings and structures.
  • Face recognition
  • Bioinformatics
  • Supervised learning vs. unsupervised learning
  • Semi-supervised learning

11
Machine learning versus data mining
  • A lot of common topics
      • Clustering
      • Classification
      • Many others
  • Different focuses
      • ML focuses more on theory (statistics)
      • DM focuses more on applications

12
Clustering
  • Finding groups of objects such that the objects
    in a group are similar (or related) to one
    another and different from (or unrelated to) the
    objects in other groups.
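
The definition above does not fix an algorithm. As one illustration (our choice, not necessarily the method used in the course), here is a minimal pure-Python sketch of k-means (Lloyd's algorithm), assuming numeric points, Euclidean distance, and a deterministic farthest-first initialisation. The toy points are made up.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the group of its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its group.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append([sum(coord) / len(members) for coord in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # empty group: keep the old centroid
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments cannot change
        centroids = new_centroids
    return centroids, labels


# Two obvious groups of 2-D points (illustrative data only).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Objects in the same group end up with the same label, and each centroid is the mean of its group, which is exactly the "similar within, different between" criterion stated on the slide.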

13
Applications of Cluster Analysis
  • Understanding
      • Group genes and proteins that have similar
        functionality, or group stocks with similar price
        fluctuations
  • Summarization
      • Reduce the size of large data sets

(Figure: Clustering precipitation in Australia)

14
Classification: Definition
  • Given a collection of records (training set)
  • Each record contains a set of attributes; one of
    the attributes is the class.
  • Find a model for the class attribute as a function
    of the values of the other attributes.
  • Goal: previously unseen records should be
    assigned a class as accurately as possible.
  • A test set is used to determine the accuracy of
    the model. Usually, the given data set is divided
    into training and test sets, with the training set
    used to build the model and the test set used to
    validate it.
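
A minimal sketch of the training/test protocol described above, assuming numeric attributes, a random 70/30 split, and a 1-nearest-neighbour rule as the model (our choice for illustration; the slide does not prescribe a model). The records are invented.

```python
import random


def split_train_test(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and cut it into (training set, test set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def predict_1nn(train, attributes):
    """Model: assign the class of the closest training record."""
    closest = min(train, key=lambda rec: sum((a - b) ** 2 for a, b in zip(rec[0], attributes)))
    return closest[1]


def holdout_accuracy(train, test):
    """Fraction of test records whose predicted class matches the true class."""
    correct = sum(predict_1nn(train, attrs) == cls for attrs, cls in test)
    return correct / len(test)


# Each record is (attribute tuple, class); made-up, well-separated data.
records = [((x, x), "low") for x in range(5)] + [((x, x), "high") for x in range(10, 15)]
train, test = split_train_test(records)
print(len(train), len(test))          # 7 3
print(holdout_accuracy(train, test))  # 1.0
```

The test records never influence the model, so the accuracy measured on them estimates how well the model handles previously unseen records.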

15
Classification Example
(Figure: a training set whose records have two categorical attributes, one continuous attribute, and a class attribute; a classifier is learned from the training set.)

16
Classification Application
  • Fraud detection
  • Goal: predict fraudulent cases in credit card
    transactions.
  • Approach:
      • Use credit card transactions and information
        about the account holder as attributes
        (when does a customer buy, what does he buy,
        how often does he pay on time, etc.).
      • Label past transactions as fraud or fair
        transactions. This forms the class attribute.
      • Learn a model for the class of the transactions.
      • Use this model to detect fraud by observing
        credit card transactions on an account.
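
The four steps of the approach map onto a few lines of code. This is a minimal sketch, not a production fraud detector: the three attributes (amount in hundreds of dollars, hour of day, on-time payment rate), the six labelled past transactions, and the choice of a nearest-centroid model on standardised attributes are all hypothetical.

```python
from statistics import mean, pstdev

# Steps 1 and 2: attributes per past transaction plus its class label.
# Hypothetical attributes: (amount in $100s, hour of day, on-time payment rate).
past_transactions = [
    ((0.4, 14, 0.95), "fair"),
    ((0.8, 10, 0.90), "fair"),
    ((1.2, 18, 0.97), "fair"),
    ((0.6, 12, 0.85), "fair"),
    ((9.5, 3, 0.40), "fraud"),
    ((12.0, 2, 0.35), "fraud"),
]


def learn_model(records):
    """Step 3: standardise each attribute, then store one centroid per class."""
    columns = list(zip(*(attrs for attrs, _ in records)))
    means = [mean(col) for col in columns]
    spreads = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero

    def standardise(attrs):
        return [(v - m) / s for v, m, s in zip(attrs, means, spreads)]

    centroids = {}
    for label in sorted({cls for _, cls in records}):
        scaled = [standardise(attrs) for attrs, cls in records if cls == label]
        centroids[label] = [mean(col) for col in zip(*scaled)]
    return standardise, centroids


def classify(model, attrs):
    """Step 4: label a new transaction with the class of the nearest centroid."""
    standardise, centroids = model
    z = standardise(attrs)
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(z, centroids[label])))


model = learn_model(past_transactions)
print(classify(model, (11.0, 4, 0.50)))  # fraud
print(classify(model, (0.5, 13, 0.92)))  # fair
```

Standardising first keeps the attribute with the largest numeric range (here the hour of day) from dominating the distance.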

17
Character Recognition
  • Given a digit representation, what is its class?
  • AT&T has used
      • Neural networks
      • Support vector machines
      • Error rates: 1.4%
  • Inputs are 28x28 greyscale images.
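
Two practical details on this slide, 28x28 greyscale inputs and an error rate, can be made concrete with a short sketch. It assumes an image is stored as a list of 28 rows of pixel intensities (0 to 255) and flattens it row by row into the 784-dimensional vector that vector-based classifiers expect. The error rate is the fraction of misclassified test digits. The image and predictions below are made up.

```python
def flatten(image):
    """Row-major flattening of a 2-D image (list of rows) into one vector."""
    return [pixel for row in image for pixel in row]


def error_rate(predicted, actual):
    """Fraction of examples whose predicted class differs from the true class."""
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


# A blank 28x28 greyscale image with a single bright pixel at row 3, column 5.
image = [[0] * 28 for _ in range(28)]
image[3][5] = 255
vector = flatten(image)
print(len(vector))        # 784
print(vector.index(255))  # 89, i.e. 3 * 28 + 5

# 14 mistakes out of 1,000 test digits give an error rate of 1.4%.
actual = [7] * 1000
predicted = [1] * 14 + [7] * 986
print(error_rate(predicted, actual))  # 0.014
```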

18
Other applications
  • Face recognition
  • Protein function prediction
  • Cancer detection
  • Document categorization

19
Data representation
  • Traditional algorithms work on vectors.
  • Images can be represented as matrices or vectors.
  • Abstract data
      • Graphs
      • Sequences
      • 3D structures

20
Kernel Methods: Basic Ideas
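
The content of this slide was a figure. One standard way to state the basic idea (our illustration, not recovered from the slide) is the kernel trick: a kernel function computes an inner product in a feature space without building the feature vectors. A minimal sketch, assuming 2-D inputs and the homogeneous degree-2 polynomial kernel k(x, z) = (x . z)^2, whose explicit feature map is phi(x) = (x1^2, x2^2, sqrt(2) x1 x2):

```python
import math


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def feature_map(x):
    """Explicit degree-2 feature map for 2-D input: (x1^2, x2^2, sqrt(2)*x1*x2)."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)


def poly2_kernel(x, z):
    """Kernel evaluated in the input space: k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, z))                   # 1.0
print(dot(feature_map(x), feature_map(z)))  # about 1.0 (tiny rounding error)
```

Both values agree up to floating-point rounding, yet the kernel never forms the 3-D vectors. For higher polynomial degrees and input dimensions that saving grows quickly, which is what lets kernel methods work in very large feature spaces.
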
21
Applications in bioinformatics
  • Protein sequence
  • Protein structure
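
Kernels let the same learning methods work directly on protein sequences. As one example (the choice of kernel is ours; the slide does not name one), the k-mer spectrum kernel counts the length-k substrings each sequence contains and takes the inner product of the two count vectors. A minimal sketch with k = 3 and made-up sequence fragments:

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every length-k substring (k-mer) of the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(s, t, k=3):
    """Inner product of the k-mer count vectors of s and t."""
    counts_s, counts_t = kmer_counts(s, k), kmer_counts(t, k)
    # A Counter returns 0 for k-mers that t does not contain.
    return sum(n * counts_t[kmer] for kmer, n in counts_s.items())


print(spectrum_kernel("MKVLA", "KVLAM"))  # 2 (shared 3-mers: KVL, VLA)
print(spectrum_kernel("MKVLA", "MKVLA"))  # 3
```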

22
Data integration
  • Genome-wide data
      • mRNA expression data
      • Hydrophobicity data
      • Protein-protein interaction data
      • Sequence data (gene, protein)

23
Curse of dimensionality
  • A large sample size is required for
    high-dimensional data.
  • Query accuracy and efficiency degrade rapidly as
    the dimension increases.
  • Strategies
      • Feature reduction
      • Feature selection
      • Manifold learning
      • Kernel learning
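
One way to see why query accuracy degrades: in high dimensions, the distances from a query to random points become nearly equal, so "nearest" carries little information. A minimal sketch, assuming points drawn uniformly from the unit cube [0, 1]^d and measuring the relative contrast (max distance - min distance) / min distance. The exact numbers depend on the random seed, but the contrast shrinks sharply as d grows. It uses `math.dist`, which needs Python 3.8 or later.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of the distances from one random query
    to n_points random points in the unit cube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [math.dist(query, [rng.random() for _ in range(dim)])
                 for _ in range(n_points)]
    return (max(distances) - min(distances)) / min(distances)


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```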

24
Manifold learning
  • A manifold is a topological space that is
    locally Euclidean: around every point it looks
    like ordinary Euclidean space.
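
A circle is the simplest example: a 1-dimensional manifold sitting in the 2-D plane. For two nearby points on it, the straight-line (Euclidean) distance is almost the same as the distance along the circle (geodesic distance). For distant points the two differ. A minimal sketch on the unit circle, using `math.dist` (Python 3.8 or later):

```python
import math


def euclidean_and_geodesic(theta):
    """Distances between the points at angles 0 and theta (0 <= theta <= pi)
    on the unit circle: straight-line distance and arc length."""
    p = (1.0, 0.0)
    q = (math.cos(theta), math.sin(theta))
    return math.dist(p, q), theta  # arc length on a unit circle equals the angle


for theta in (0.01, 0.5, math.pi):
    euclid, geodesic = euclidean_and_geodesic(theta)
    print(f"theta={theta:.2f}  euclidean={euclid:.4f}  "
          f"geodesic={geodesic:.4f}  ratio={euclid / geodesic:.4f}")
```

For small angles the ratio is essentially 1 (locally Euclidean). For opposite points it falls to 2/pi, about 0.64, which is why manifold learning methods typically trust Euclidean distances only between neighbouring points.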

25
Intuition: how does your brain store these pictures?
26
Model selection
  • Choose the best model from a set of different
    models to fit the data
      • Support vector machines (SVM), linear
        discriminant analysis (LDA)
  • Models are specified by certain parameters.
      • How to choose the best parameters?
      • Cross-validation (leave-one-out, k-fold CV)
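
A minimal sketch of choosing a model parameter by cross-validation. The model here is a k-nearest-neighbour classifier on 1-D points (our choice for illustration, not one named on the slide), and the parameter is the number of neighbours. Folds are interleaved (every n-th record), so setting `n_folds` to the number of records gives leave-one-out. The 20 labelled points are made up, with two deliberately flipped labels standing in for noise.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points closest to the scalar x.
    Ties in distance go to the earlier training record (sorted is stable)."""
    nearest = sorted(train, key=lambda rec: abs(rec[0] - x))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def cv_error(data, k, n_folds):
    """Average held-out error of k-NN over n_folds interleaved folds."""
    fold_errors = []
    for f in range(n_folds):
        test = data[f::n_folds]
        train = [rec for i, rec in enumerate(data) if i % n_folds != f]
        wrong = sum(knn_predict(train, x, k) != y for x, y in test)
        fold_errors.append(wrong / len(test))
    return sum(fold_errors) / n_folds


def select_k(data, candidates, n_folds):
    """Pick the candidate with the lowest cross-validation error (first one on ties)."""
    return min(candidates, key=lambda k: cv_error(data, k, n_folds))


# Points 0..19: class "a" below 10 and "b" from 10 up, with two flipped labels.
labels = ["a"] * 10 + ["b"] * 10
labels[4], labels[15] = "b", "a"
data = list(zip(range(20), labels))

loo = len(data)  # leave-one-out: one record per fold
for k in (1, 3, 5):
    print(k, cv_error(data, k, loo))
print("chosen k:", select_k(data, (1, 3, 5), loo))
```

Under leave-one-out, 1-NN lets each flipped label mislead its neighbours (error 0.25), while 3-NN and 5-NN outvote the noise (error 0.15 each); `select_k` keeps the first of the tied best values, k = 3.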

27
Machine learning applications
  • Bioinformatics: huge amounts of biological data
    from the Human Genome Project and human
    proteomics initiatives.
      • Goal: understanding of biological systems at the
        molecular level from diverse sources of
        biological data.
      • Challenges: scalability, multiple sources,
        abstract data.
      • Applications: microarray data analysis, protein
        classification, mass spectrometry data analysis,
        protein-protein interaction.
  • Others: computer vision, information retrieval,
    image processing, text mining, web mining, etc.

28
Course schedule
29
Survey
  • Why are you taking this course?
  • What would you like to gain from this course?
  • What topics are you most interested in learning
    about from this course?
  • Any other suggestions?

30
Next class
  • Topics
      • Basics of linear algebra
      • Basics of probability
  • Readings (available at the class web page)
      • Mini tutorial on the Singular Value Decomposition