Title: COMP60431 Machine Learning
1COMP60431 Machine Learning
- Advanced Computer Science MSc
- Lecturers: Magnus Rattray, Gavin Brown
2What is Machine Learning?
- Software that adapts to (learns from) data
- Concerned with creating and using mathematical
data structures that allow a computer to
exhibit behaviour that would normally require a
human.
3Applications
- Speech and hand-writing recognition
- Autonomous robot control
- Data mining and bioinformatics
- Playing games
- Fault detection
- Clinical diagnosis
- Spam email detection
- Inverse kinematics
- Applications are diverse, algorithms are generic.
4What will you be doing?
- We introduce the concepts and details behind various
ML methods, including how they work, and use
existing software packages to illustrate how they
are applied to data.
- Projects: explore the field, and reinvent methods if you
want.
5Machine Learning Methods
- Learning from labelled data (supervised learning)
(e.g. trying to predict the weather from a
dataset of historical patterns)
- Learning from unlabelled data (unsupervised
learning) (e.g. trying to identify natural
patterns in sales of books on Amazon.com)
- Learning from sequential data
(e.g. speech recognition, DNA sequence
analysis)
6Statistical Learning
- Different machine learning methods can be unified
within a framework of statistics.
- Data is considered to be drawn from a probability
distribution.
- Typically, we don't expect perfect learning but
only probably correct learning.
- Statistical concepts are the key to measuring our
future expected performance.
- Important: if you're not prepared to get into a bit of maths
(linear algebra, calculus, statistics), don't take
this course.
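One way to make "probably correct" and "future expected performance" concrete is a held-out test set: count the mistakes on data not used for training, then add a slack term so that the stated bound holds with high probability. The sketch below uses the one-sided Hoeffding inequality, which is our choice of illustration rather than a method named on the slides. It assumes the test examples are drawn independently from the same distribution as future data. The function name and the numbers are made up.

```python
import math


def holdout_error_bound(n_errors, n_test, delta=0.05):
    """Return (test_error, upper_bound); the bound holds with probability >= 1 - delta."""
    test_error = n_errors / n_test
    # One-sided Hoeffding inequality for the mean of n_test independent 0/1 losses.
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * n_test))
    return test_error, min(1.0, test_error + slack)


# Made-up numbers: 12 mistakes on 200 held-out examples.
error, bound = holdout_error_bound(12, 200, delta=0.05)
print(f"test error {error:.3f}, 95% upper bound on true error {bound:.3f}")
```

The slack shrinks like one over the square root of the test-set size, so quadrupling the held-out data halves the uncertainty.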
7Example 1: Hand-written digits
- Data: Greyscale images
- Task: Classification (0, 1, 2, 3, ..., 9)
- Problem features:
- Highly variable inputs from the same class, including
some weird inputs.
8US Postal Service Digits
Methods: K-Nearest Neighbour or Support Vector
Machines
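As a rough illustration of the K-Nearest Neighbour idea, a new image is given the label that is most common among the k training images closest to it. The sketch below is a minimal version in plain Python. It assumes Euclidean distance on pixel values. The 4-pixel "images" are made-up stand-ins, not the US Postal Service data, and the name knn_predict is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and vote on their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy stand-in for digit images: 4-pixel greyscale "images" (made-up values).
train_points = [
    (0.9, 0.1, 0.1, 0.9),  # hypothetical examples of class 0
    (0.8, 0.2, 0.1, 0.8),
    (0.1, 0.9, 0.9, 0.1),  # hypothetical examples of class 1
    (0.2, 0.8, 0.9, 0.2),
]
train_labels = [0, 0, 1, 1]

prediction = knn_predict(train_points, train_labels, (0.85, 0.15, 0.2, 0.9), k=3)
print(prediction)
```

An odd k avoids tied votes in two-class problems. With ten digit classes ties are still possible, and this sketch breaks them by whichever label was seen first among the neighbours.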
9Example 2: Predicting heart disease
- 1. age
- 2. sex
- 3. chest pain type (4 values)
- 4. resting blood pressure
- 5. serum cholesterol in mg/dl
- 6. fasting blood sugar > 120 mg/dl
- 7. resting electrocardiographic results (values 0, 1, 2)
- 8. maximum heart rate achieved
- 9. exercise induced angina
- 10. oldpeak: ST depression induced by exercise relative to rest
- 11. the slope of the peak exercise ST segment
- 12. number of major vessels (0-3) colored by fluoroscopy
10Example 2: Predicting heart disease
(2% of full dataset shown)
11Example 2: Predicting heart disease
Heuristics that make us smart
12Example 3: DNA microarrays
- DNA from 10,000 genes attached to a glass slide
called a microarray.
- Green and red labels attached to mRNA from two
different sample tissues.
13DNA microarrays
- Tasks: Sample classification, gene
classification, visualisation and clustering of
genes/samples.
- Problem features:
- Very high-dimensional data (many features) but
relatively small number of examples (samples)
- Extremely noisy data (noise signal)
- Lack of good domain knowledge
14DNA microarrays
Projection of 10,000-dimensional data onto 2D
using PCA effectively separates cancer subtypes.
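The projection step can be sketched in a few lines: centre the data, find the two directions of greatest variance, and take each sample's coordinates along them. The version below is a minimal sketch in plain Python. It finds the directions by power iteration with deflation, a simple stand-in for the SVD or eigen-decomposition routine a numerical library would normally provide. The data are made-up 5-dimensional points standing in for 10,000-dimensional microarray samples, and the function names are hypothetical. It assumes the data have at least as many directions of non-zero variance as components requested.

```python
import math
import random


def top_components(data, n_components=2, n_iter=200, seed=0):
    """Return (mean, components): the leading principal directions of `data`."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - mean[j] for j in range(d)] for row in data]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(n_iter):
            # Deflation: remove the directions that have already been found.
            for c in components:
                dot = sum(a * b for a, b in zip(v, c))
                v = [a - dot * b for a, b in zip(v, c)]
            # Power iteration step: v <- X^T (X v), then renormalise.
            scores = [sum(a * b for a, b in zip(row, v)) for row in centred]
            v = [sum(s * row[j] for s, row in zip(scores, centred)) for j in range(d)]
            norm = math.sqrt(sum(a * a for a in v))
            v = [a / norm for a in v]
        components.append(v)
    return mean, components


def project(data, mean, components):
    """Coordinates of each centred sample along each principal direction."""
    return [
        [sum((x - m) * c for x, m, c in zip(row, mean, comp)) for comp in components]
        for row in data
    ]


# Made-up data: two groups of three samples, five features each.
samples = [
    [1.0, 1.1, 0.9, 0.0, 0.1],
    [1.2, 0.9, 1.0, 0.1, 0.0],
    [0.9, 1.0, 1.1, 0.0, 0.2],
    [-1.0, -0.9, -1.1, 0.1, 0.0],
    [-1.1, -1.0, -0.9, 0.0, 0.1],
    [-0.9, -1.2, -1.0, 0.2, 0.0],
]
mean, components = top_components(samples)
coords = project(samples, mean, components)
for row in coords:
    print([round(value, 2) for value in row])
```

In this toy case the two groups fall on opposite sides of zero along the first coordinate. The sign of each direction is arbitrary, so which group ends up positive carries no meaning.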
15Relevant disciplines
- Algorithms
- Artificial intelligence
- Control
- Physics
- Information theory
- Dynamical systems
- Neurobiology
- Signal processing
- Statistics
- Linear algebra
- Etc, etc ..
- Researchers in ML come from a variety of
different backgrounds.
16Prerequisites
- Need: reasonable knowledge of calculus and
matrix/vector algebra.
- Don't need: previous experience of Matlab
programming; this will be learned during the
course.
17Module structure
- Assessed exercises (20%)
- Project (30%)
- January examination (50%)
- Period 1 (Tuesdays)
- 28th Sept to 3rd Nov
18Resources
- We'll provide full slides and notes.
- If you want a book, this is a suggestion:
- E. Alpaydin, Introduction to Machine Learning
19What now?
- Web page:
- http://intranet.cs.man.ac.uk/mlo/comp60431/
- The course begins on Tuesday 29th Sept.
- If you want to take the course:
- check the primer tutorial on the required maths,
- practise with Matlab (tutorial on website).
20Questions?
21Example: Speech recognition
- Data: features from spectral analysis of speech
signals (two in this simple example).
- Task: classification of vowel sounds in words of
the form h-?-d, e.g. head, hid, had, etc.
- Problem features:
- Highly variable data with same classification
- Good feature selection is very important
- This task is a small part of a larger task
22Method: Multilayer neural network
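To give a feel for what a multilayer network computes, the sketch below runs a single forward pass: two input features go through one hidden layer and come out as a probability for each vowel class. It assumes tanh hidden units and a softmax output, which are common choices but are not specified on the slide. All weights are made-up numbers and the name mlp_forward is hypothetical. In practice the weights would be learned from labelled examples.

```python
import math


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden tanh layer followed by a softmax over the output classes."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    scores = [
        sum(w * h for w, h in zip(weights, hidden)) + b
        for weights, b in zip(w_out, b_out)
    ]
    # Subtract the maximum score before exponentiating, for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up weights: 2 input features -> 3 hidden units -> 3 vowel classes.
w_hidden = [[0.5, -0.3], [-0.8, 0.6], [0.1, 0.9]]
b_hidden = [0.0, 0.1, -0.2]
w_out = [[1.0, -1.0, 0.5], [-0.5, 1.2, 0.3], [0.2, 0.1, -1.0]]
b_out = [0.0, 0.0, 0.0]

probs = mlp_forward([0.4, 1.3], w_hidden, b_hidden, w_out, b_out)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
```

The predicted class is the one with the largest output probability. Training would adjust the weights so that this class matches the labelled vowel as often as possible.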