Title: Making robust computer vision in games
1Making robust computer vision in games
Presented by Diarmid Campbell
2Introduction
- Who I am: Diarmid Campbell
- What I do: Run the Vision R&D group
- Where we do it: Sony's London development studio
- What we do: Research computer vision for camera-based games
- This talk: Making robust computer vision in games
3Contents
- What we do and why
- The development process
- Testing and videos
- Computer Vision Concepts
- A robust head tracker
- Marker based Augmented Reality
- The problems we faced
- A demo of EyePet
4Camera based games
- Camera mounted on the TV
- You see yourself on the TV
- Game is overlaid on top of you
5Past games on PS2
6Computer Vision is hard
Computer vision makes you want to kill
yourself -Dr Nick Lord 2009
7Why is it hard?
- Humans manage it effortlessly
- Image is a 2D array of numbers
- Take 5 images and plot them as a height map
8Pick the odd one out
9Pick the odd one out
Odd one out
10Pick the odd one out
Odd one out
11Pick the odd one out
Odd one out
12Factors affecting the pixels
- Background objects in scene
- Orientation/position of objects
- Lighting/Shadows
- Occlusion
13George is in the pixels
- Not interested in those
- George was hidden in the pixels
- Here is an image, what is it of?
- The general computer vision problem is hard
- If we constrain the problem, it is much easier
(but still hard)
14Robust Inputs
- We can use computer vision as an input mechanism
- Motion detection in EyeToy games
- Robustness is how consistently an input mechanism does what the player is expecting
- An input mechanism must be robust
15Importance of robustness
- If your fire button only worked 9 times out of
10, you would chuck your controller out.
16Importance of robustness
17Importance of robustness
- Imagine your gun is a champagne bottle
18Importance of robustness
- Each button click shakes it
- Eventually the top blows off
- The lack of robustness is hidden
19Importance of robustness
- Perhaps you need to now fight tortoises instead
of warriors
20Importance of robustness
- The mechanic is now robust
- But it is laggy and unresponsive
- Cannot rely on split-second timing
21Importance of robustness
- Illustrates a general point
- If the game copes well with non-robust inputs
- It will also cope well with someone not playing it well
- It creates a skill ceiling
- Manifests itself as lack of game-play depth
22Importance of robustness
- If you want deep, skill-based game mechanics
- Robust input is essential
23The Development Process
Computer Vision Researcher: "Tell me what the game mechanic is and I'll make you a state of the art solution"
Game designer: "Give me something that works and I'll see what we can make that's fun"
24The chicken and the egg
- You cannot do one before the other
- Both development timelines happen in parallel
- We are still figuring it out
- Here are some guidelines
25Research timeline
Something up and running
Convinced we can create the technology
Vision tech beta before game reaches alpha
26Required infrastructure
- Prototyping environment
- Matlab
- Octave
- Be able to capture videos
- Runtime algorithms
- OpenCV
- VXL
27Videos and testing
28Videos and testing
- Computer vision is hard because many variables affect the images
- The lighting
- The player's clothes
- The wallpaper
- Spectators
- 3D cameras have their own pros and cons
29Representative videos
- Videos allow us to capture these variables and test
- Videos MUST be representative
- Works in 99% of cases
- Useless if that 1% appears in 50% of living rooms
- Make videos early in development
- Demo head tracker capturing
30Head detection videos
- We run it through different algorithms
- Cell SDK face detector
- Show failure modes
- When it fails we can find the frame it failed in
and debug
31Regression testing
- Automated testing
- Run through a load of videos
- Compare with expected results
- Expected results could be "is the head visible?"
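A minimal sketch of such a regression test. The frame format, the `run_regression` helper and the toy detector are all hypothetical stand-ins for illustration; the talk does not describe the actual test harness.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # hypothetical stand-in for a decoded video frame


def run_regression(
    detector: Callable[[Frame], bool],
    videos: Dict[str, List[Frame]],
    expected: Dict[str, List[bool]],
) -> List[Tuple[str, int]]:
    """Return (video name, frame index) for every frame where the detector
    disagrees with the expected "is the head visible?" label."""
    failures = []
    for name, frames in videos.items():
        for index, frame in enumerate(frames):
            if detector(frame) != expected[name][index]:
                failures.append((name, index))
    return failures


def toy_detector(frame: Frame) -> bool:
    # Toy rule for illustration only: "head visible" if any pixel is bright.
    return any(pixel > 200 for row in frame for pixel in row)


videos = {
    "living_room": [[[0, 0], [0, 255]], [[0, 0], [0, 0]]],
    "dark_room": [[[0, 0], [0, 150]]],
}
expected = {"living_room": [True, False], "dark_room": [True]}

failures = run_regression(toy_detector, videos, expected)
print(failures)
```

Reporting the video name and frame index of each disagreement is what lets a failure be replayed and debugged at the exact frame.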
32When videos aren't enough
- SCEA R&D labs invented the forthcoming PlayStation Move controller
- Uses a camera and other sensors to track the controller
- Videos were good early on
- But cannot change a video
- Lighting
- Backgrounds
- Camera settings
33Solution
34Reasons to buy a robot arm (as if you really need persuading)
- Can test the same motion under many different conditions
- Can try special hard cases
35Computer Vision Concepts
36Computer Vision Concepts
- Videos tell us when it fails
- How do we fix it?
- This is the field of computer vision
- I cannot go into details of techniques
- Instead I will explain
- The common concepts
- How they link together
- This should help if you
- Read papers
- Talk to experts
37Feature extraction
- Images contain a lot of information
- This one is 900K
38Feature extraction
- Instead of using pixels directly, extract high-level properties of groups of pixels
- Results in less data, which is more relevant to the problem at hand
39Feature extraction
- PS3 Demo Basic image
- PS3 Demo Canny edge detector
- Invariant to lighting changes
- Store additional gradient info
- PS3 Demo Motion
- Used in all our camera games
- PS3 Demo Feature points
- Store image patch for each one
- Can match them frame to frame
40Likelihood functions
- Given that we have observed these features, what is the probability that we are observing what we modelled?
- Conditional probability
- Bayesian statistics underpins most vision algorithms
41Cost functions Likelihood functions
- Some terminology
- Sometimes you will hear about Cost functions
- They are the same concept
- Likelihood goes up with a good match
- Cost goes down
- One is (conceptually) the inverse of the other
42Cost functions
- Sum of Squared Differences (SSD)
SSD = 1532: high cost, bad match
SSD = 12: low cost, good match
43Cost functions
- Sum of Squared Differences (SSD)
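As a rough illustration of SSD as a cost function, here is a small sketch using NumPy. The patch values are made up for the example; the only point is that a close match gives a low cost and a poor match gives a high one.

```python
import numpy as np


def ssd(patch_a: np.ndarray, patch_b: np.ndarray) -> float:
    """Sum of squared differences between two equally sized patches."""
    # Convert to float first so that uint8 subtraction cannot wrap around.
    diff = patch_a.astype(np.float64) - patch_b.astype(np.float64)
    return float(np.sum(diff * diff))


template = np.array([[10, 20], [30, 40]], dtype=np.uint8)
good = np.array([[11, 19], [30, 42]], dtype=np.uint8)   # nearly the same patch
bad = np.array([[40, 30], [20, 10]], dtype=np.uint8)    # very different patch

print(ssd(template, good))  # low cost: good match
print(ssd(template, bad))   # high cost: bad match
```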
44Classifiers
- Compares observed features to a number of models
- Tells you which model fits the features best
Which model fits best
45Classifiers Face example
46Classifiers Face example
Classifier
- Classic detector (Viola-Jones)
- Models are trained on example images
47Classifiers Face example
48Detectors
- We have a model (with associated state)
- Given some observed features
- Detector returns
- Is the object present? What is its state?
- Its state (X,Y position/rotation/Human pose)
49Detectors Faces again
- Viola-Jones face detector
- Scans a box over the image
- Different positions and sizes
- Runs the classifier and returns any positives
- Recall face detection demo
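The scanning step can be sketched as below. This shows only the sliding window over positions and sizes; `toy_classifier` is a hypothetical stand-in, not the trained Viola-Jones cascade, and the window sizes and step are arbitrary.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

Box = Tuple[int, int, int]  # (x, y, size) of a square window


def scan(
    image: np.ndarray,
    classifier: Callable[[np.ndarray], bool],
    sizes: Sequence[int],
    step: int,
) -> List[Box]:
    """Run the classifier on every window position and size; return positives."""
    hits = []
    height, width = image.shape
    for size in sizes:
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                if classifier(image[y:y + size, x:x + size]):
                    hits.append((x, y, size))
    return hits


def toy_classifier(window: np.ndarray) -> bool:
    # Stand-in for a trained classifier: fires when the window is entirely bright.
    return bool(window.min() > 128)


image = np.zeros((8, 8), dtype=np.uint8)
image[2:6, 2:6] = 255  # a 4x4 bright square playing the role of a face

detections = scan(image, toy_classifier, sizes=(4, 6), step=1)
print(detections)
```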
50Trackers
- We have a model, some observed features and the previous state
- Tracker returns the next state
51Trackers Face example
- PS3 Demo SSD tracker
- PS3 Demo Wand game
- If we move quickly the tracker gets stuck in a
local minimum
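A sketch of why fast motion breaks a local tracker. It assumes the tracker only searches a small neighbourhood of the previous position for the lowest SSD; the exhaustive local search here is an illustrative substitute for whatever optimiser the demo actually used.

```python
import numpy as np


def track(image, template, prev_xy, radius):
    """Search a (2*radius+1)^2 neighbourhood of prev_xy for the lowest SSD."""
    th, tw = template.shape
    px, py = prev_xy
    best_xy, best_cost = prev_xy, float("inf")
    for y in range(max(0, py - radius), min(image.shape[0] - th, py + radius) + 1):
        for x in range(max(0, px - radius), min(image.shape[1] - tw, px + radius) + 1):
            diff = image[y:y + th, x:x + tw].astype(np.float64) - template
            cost = float(np.sum(diff * diff))
            if cost < best_cost:
                best_xy, best_cost = (x, y), cost
    return best_xy, best_cost


template = np.arange(1, 10, dtype=np.float64).reshape(3, 3) * 20


def frame_with_target(x, y):
    """A blank 20x20 frame with the template pasted at (x, y)."""
    frame = np.zeros((20, 20), dtype=np.uint8)
    frame[y:y + 3, x:x + 3] = template
    return frame


# Slow motion: the target moved one pixel, well inside the search radius.
slow_xy, slow_cost = track(frame_with_target(5, 5), template, prev_xy=(4, 4), radius=2)
# Fast motion: the target jumped out of the search radius, so the tracker
# settles on the best of a set of bad local candidates.
fast_xy, fast_cost = track(frame_with_target(15, 5), template, prev_xy=(4, 4), radius=2)
print(slow_xy, slow_cost)
print(fast_xy, fast_cost)
```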
52Learning more
- Computer Vision Conferences
- ICCV
- CVPR
- ECCV
- Read papers accepted by conferences
- Get friendly with an academic
- Or hire one!
53Robust Head Tracking
54Track rotation and scale
- The SSD based tracker did not track rotation and scale
- Next iteration of tracker does
- X, Y position
- Scale
- θ in-plane rotation
- PS3 Demo Hager tracker
- (swap demo)
55Track rotation and scale
- Tracked more types of movement
- But very fragile
- Problem
- A 2D image patch is not a good model of a head
56Track rotation and scale
- Does not deal with out-of-plane rotation
57Track rotation and scale
- Even in-plane rotation is not right
58Colour histograms
- Let's move away from comparing pixels and think about features
- Consider these images of the same objects
59Colour histograms
- If we compared them pixel for pixel they would seem very different
- But look at a histogram of the colours that appear in them and they look the same
60Colour histograms
- Histograms are a feature that throws away all spatial information
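A small sketch of this property, using a joint RGB histogram with an arbitrary bin count (4 per channel is an assumption for the example). Rotating the image by 180 degrees changes almost every pixel position but leaves the histogram untouched.

```python
import numpy as np


def colour_histogram(image: np.ndarray, bins_per_channel: int = 4) -> np.ndarray:
    """Normalised joint RGB histogram of an H x W x 3 uint8 image."""
    pixels = image.reshape(-1, 3).astype(np.int64)
    quantised = pixels * bins_per_channel // 256
    index = (
        quantised[:, 0] * bins_per_channel + quantised[:, 1]
    ) * bins_per_channel + quantised[:, 2]
    hist = np.bincount(index, minlength=bins_per_channel ** 3).astype(np.float64)
    return hist / hist.sum()


# A synthetic image and the same image rotated by 180 degrees.
image = (np.arange(16 * 16 * 3) % 256).astype(np.uint8).reshape(16, 16, 3)
rotated = image[::-1, ::-1]

same_pixels = bool(np.array_equal(image, rotated))
same_histogram = bool(np.allclose(colour_histogram(image), colour_histogram(rotated)))
print(same_pixels, same_histogram)
```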
61Where we are now
- Current system uses
- Colour histograms
- Keeps approximate spatial information
62Where we are now
- It has a foreground and a background model each
with its own histograms
63Where we are now
64Marker based Augmented Reality (AR)
65Marker based AR
- Marker based AR is in a published game EyePet
66Camera setup
Topics to discuss Camera based games What is
EyePet? Improving the tech Future research
67What the player sees on the TV
Real
Virtual
68Marker based AR
- We shipped a magic card with the game
- Allows the players to manipulate virtual objects
in 3D
69Finding the marker
70Finding the marker
71Finding the marker
72Finding the marker
73Finding the marker
- Actually, just keep pairs of quads
74Finding the marker
- Take corner positions
- Calculate a 2D transform
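One way to compute such a 2D transform from four corner correspondences is the direct linear transform, sketched below. This assumes the transform is a homography (a 3x3 projective map); the talk does not name the method used in EyePet, and the corner coordinates are made up.

```python
import numpy as np


def homography_from_corners(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Estimate the 3x3 homography mapping four src points to four dst points."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of this 8x9 system: the right
    # singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]  # assumes a non-degenerate configuration (h[2, 2] != 0)


def apply_homography(h: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Map N x 2 points through h using homogeneous coordinates."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    mapped = homogeneous @ h.T
    return mapped[:, :2] / mapped[:, 2:3]


# Marker corners in marker space (unit square) and where they were found in the image.
marker_corners = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=np.float64)
image_corners = np.array([[100, 120], [220, 130], [210, 260], [90, 240]], dtype=np.float64)

h = homography_from_corners(marker_corners, image_corners)
print(apply_homography(h, marker_corners))
```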
75Finding the marker
76Finding the marker
77Finding the marker
78Finding the marker
79Finding the marker
80Finding the marker
81Finding the marker
- Decompose the 2D transform
- Camera projection
- Model view matrix
- Use a Kalman filter
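To give a feel for what the Kalman filter contributes, here is a deliberately simplified scalar version that smooths one noisy coordinate. Which quantities the real system filters, and its noise settings, are not stated in the talk; `process_var` and `measurement_var` are illustrative assumptions.

```python
from typing import List, Sequence


def kalman_1d(
    measurements: Sequence[float],
    process_var: float,
    measurement_var: float,
) -> List[float]:
    """Scalar Kalman filter with a constant-position model."""
    estimate = measurements[0]
    variance = measurement_var
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed unchanged, but uncertainty grows.
        variance += process_var
        # Update: blend prediction and measurement according to the gain.
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed


# A marker coordinate that is really at 10.0 but measured with jitter.
measurements = [10.4, 9.7, 10.3, 9.6, 10.2, 9.9]
smoothed = kalman_1d(measurements, process_var=0.01, measurement_var=0.25)
print(smoothed)
```

A small `process_var` trusts the model and smooths heavily at the cost of lag; a large one follows the measurements more closely but passes more jitter through.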
82Finding the marker
83Problems we faced
84Picking the right threshold
- Threshold to find black and white regions
- But which one?
- Many clever solutions didn't work
- Brute force approach
- Try lots of thresholds (around 60)
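The brute force idea can be sketched as follows. The evenly spaced thresholds and the `toy_find_marker` function are assumptions for the example; the real marker search that runs on each binary image is not described at this level in the talk.

```python
import numpy as np


def candidate_thresholds(count: int = 60) -> np.ndarray:
    """Evenly spaced grey levels strictly between 0 and 255."""
    return np.linspace(0, 255, count + 2)[1:-1]


def detect_with_many_thresholds(image, find_marker, count: int = 60):
    """Binarise at each threshold in turn; return the first (threshold, result)."""
    for t in candidate_thresholds(count):
        result = find_marker(image < t)  # True where the pixel counts as "black"
        if result is not None:
            return float(t), result
    return None


def toy_find_marker(mask: np.ndarray):
    # Stand-in for the real marker search: accept a solid dark rectangle
    # that does not fill the whole image, and return its bounding box.
    ys, xs = np.nonzero(mask)
    if len(ys) == 0 or mask.all():
        return None
    x0, y0, x1, y1 = xs.min(), ys.min(), xs.max(), ys.max()
    if not mask[y0:y1 + 1, x0:x1 + 1].all():
        return None
    return (int(x0), int(y0), int(x1), int(y1))


# A dimly lit scene: "white" paper reads as 90 and the black marker as 40,
# so a single fixed threshold of 128 would call everything black.
image = np.full((10, 10), 90, dtype=np.uint8)
image[3:7, 2:6] = 40

found = detect_with_many_thresholds(image, toy_find_marker)
print(found)
```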
85Picking the right threshold
- PS3 Demo Thresholds
- PS3 Demo AR Thresholds
86Light sensitive matching
- Pattern matching used Sum of Squared Differences (SSD)
SSD 14
SSD 874
SSD 2242
- Brightness of image affected the score
87Light sensitive matching
- Use Normalised Cross Correlation (NCC) instead
NCC 0.9
NCC 0.9
NCC 0.8
88Light sensitive matching
- New way to look at images
- An image is an array of numbers
- We can list out every number and it becomes a
vector
(Figure: a 100 x 100 image unrolled into a vector of 10,000 numbers)
89Light sensitive matching
- This is a co-ordinate vector in image space
- Every 100X100 image corresponds to a single
unique point in image space
90Light sensitive matching
- This is a co-ordinate vector in image space
- Every 100X100 image corresponds to a single unique point in image space
- Brightening an image corresponds to scaling the position vector
91Light sensitive matching
- When comparing two images
- SSD corresponds to the distance between them in image space
- NCC corresponds to their angle
(Figure: two image vectors, labelled with the angle θ between them and the SSD distance separating them)
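The geometric picture can be checked numerically. In this sketch NCC is taken to be the cosine of the angle between the two image vectors, with no mean subtraction; some NCC variants subtract each image's mean first. The pixel values are made up.

```python
import numpy as np


def ssd(a: np.ndarray, b: np.ndarray) -> float:
    """Squared distance between two image vectors."""
    diff = a - b
    return float(diff @ diff)


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two image vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# A 2x2 pattern unrolled into a vector, then brightened by 50%.
pattern = np.array([[10, 200], [180, 30]], dtype=np.float64).ravel()
brighter = 1.5 * pattern
different = np.array([200, 10, 30, 180], dtype=np.float64)

print(ssd(pattern, brighter), ncc(pattern, brighter))    # distance grows, angle unchanged
print(ssd(pattern, different), ncc(pattern, different))  # a genuinely different pattern
```

Brightening scales the vector, so the distance (SSD) changes even though the pattern is the same, while the angle (NCC) does not.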
92Light sensitive matching
- Linear algebra is the other pillar of computer vision
- Feature extraction is just a transformation from one space to another
- Image space -> Feature space
- Classifiers are often just planes which divide up the space (e.g. into a region that contains faces and a region that doesn't)
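A classifier of this kind reduces to checking which side of a plane a feature vector falls on. In the sketch below the plane's `normal` and `offset` are hypothetical hand-picked values in a 2D feature space; in practice they would be learned from example images.

```python
import numpy as np


def is_face(features: np.ndarray, normal: np.ndarray, offset: float) -> bool:
    """True if the feature vector lies on the positive side of the plane."""
    return bool(np.dot(normal, features) + offset > 0)


# Hypothetical plane: the "face" region is where the first feature
# exceeds the second.
normal = np.array([1.0, -1.0])
offset = 0.0

print(is_face(np.array([3.0, 1.0]), normal, offset))
print(is_face(np.array([1.0, 3.0]), normal, offset))
```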
93Occlusion
- It is easy to occlude the marker with your fingers
94Occlusion
- Put a big red handle on and instruct the player to hold it
- Also put a handle on the back
95Occlusion Another approach (still in research phase)
- Edge based tracking
- Uses AR Marker to initialise
- Then tracks using edge features
- PS3 Demo
- (load EyePet)
96False positives
- When not occluded, we find the marker (almost) all the time
- Our home videos showed this
- False positives were a problem
- Not represented in our videos
- Added some Hollywood films to the video tests
- We knew that no markers were present
97False positives
- Saved out all spurious frames
98False positives
- Made a number of tweaks to algorithm
- E.g. Pattern matching whole marker, not just the centre pattern
- 20 times fewer false detections
99EyePet Demo
100EyePet Demo
- Use motion detection for normal interaction
- Call
- Jump
- Stroke
- Use AR card for health monitor
- Screen-facing case
- Needs stimulation
- Trampoline
- Finally
- Give him a shower
101Summary
- What we do and why
- The development process
- Testing and videos
- Computer Vision Concepts
- A robust head tracker
- Marker based Augmented Reality
- The problems we faced
- A demo of EyePet
102The End (please fill out your questionnaires)