Title:

New Directions in Analysis and Visualization

Description:

Research Methods Festival, St Catherine's College, Oxford. High-End Computing Terascale Resource ... How many species of water vole (Arvicola) in UK? Measurement data ...


1
New Directions in Analysis and Visualization
Visual Analytics
  • Dr Jeremy Walton
  • NAG Ltd, Oxford
  • jeremy.walton_at_nag.co.uk

2
Overview
  • Introduction
  • NAG, HECToR
  • Visualization
  • distribution, collaboration, steering
  • Data mining
  • classification, exploratory analysis
  • The ADVISE project
  • large data, interactive analysis

3
Overview
  • Introduction
  • NAG, HECToR
  • Visualization
  • distribution, collaboration, steering
  • Data mining
  • classification, exploratory analysis
  • The ADVISE project
  • large data, interactive analysis

4
NAG profile
  • Products
  • Mathematical, statistical, data analysis
    components
  • 3D visualization, compilers and tools
  • HPC software engineering services
  • HECToR support
  • Users
  • Academic researchers
  • Professional developers
  • Analysts / modelers
  • Founded 1976
  • Not-for-profit company

5
High-End Computing Terascale Resource
  • Latest high-end computing service for UK
  • funded by EPSRC, NERC and BBSRC
  • will run from 2007 to 2013
  • Partners
  • Hardware: Cray Inc
  • Service Provision: University of Edinburgh HPCx
    Ltd
  • hardware hosting, user services, help desk
  • CSE Support: NAG Ltd
  • technical assessment of project applications
  • porting / tuning / optimisation of user codes
  • training courses (inc. visualization)
  • best practice guides, documentation, FAQs

6
Overview
  • Introduction
  • NAG, HECToR
  • Visualization
  • distribution, collaboration, steering
  • Data mining
  • classification, exploratory analysis
  • The ADVISE project
  • large data, interactive analysis

7
Visualization toolkits
  • Help construct visualization applications
  • no wheel-reinvention, stone canoes, chocolate
    teapots
  • Proprietary, supported commercial systems
  • e.g. Excel, IRIS Explorer, Spotfire
  • Open source, freely available software
  • e.g. OpenDX, InfoVis

8
NAG's IRIS Explorer
  • General purpose toolkit for data visualization
  • Reusable building blocks (modules)
  • Connect modules to build application
  • Point-and-click development
  • Visual programming approach
  • Build, execute, reshape
  • Add new modules, if required
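
The building-block idea on this slide can be illustrated with a minimal sketch in plain Python. This is not IRIS Explorer's API: the `connect` helper and the three modules are hypothetical names made up for illustration, and a real toolkit wires modules graphically rather than in code.

```python
from functools import reduce

def connect(*modules):
    """Chain modules so that each one's output feeds the next one's input."""
    return lambda data: reduce(lambda value, module: module(value), modules, data)

# Hypothetical modules: read data, filter it, then "render" it as text bars.
def read_values(text):
    return [float(token) for token in text.split()]

def keep_positive(values):
    return [v for v in values if v > 0]

def render_bars(values):
    return "\n".join("#" * round(v) for v in values)

# Build an application by connecting modules, then execute it.
app = connect(read_values, keep_positive, render_bars)
print(app("3 -1 5"))

# Reshape the application by dropping the filter module.
unfiltered = connect(read_values, render_bars)
```

Because each module only agrees on what it receives and returns, adding a new module means writing one more function and inserting it into the chain.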

9
in action

10
Make the connections

11
Add more modules...

12
...and even more

13
Some examples

14
Trendalyzer (Gapminder)

15
Worldmapper: area

16
Worldmapper: deaths by disease

17
Many Eyes: shared visualization

Overview
  • Introduction
  • NAG, HECToR
  • Visualization
  • distribution, collaboration, steering
  • Data mining
  • classification, exploratory analysis
  • The ADVISE project
  • large data, interactive analysis

19
NAG Data Mining Tools
  • Data Cleaning
  • Data imputation - filling in missing values
  • Outlier detection - finding suspect data records
  • Data Transformation
  • Scaling Data - before distance computation
  • Principal Component Analysis - reducing number of
    variables
  • Model fitting
  • Cluster analysis - finding interesting groups
  • Classification techniques - number of groups is known
  • Regression - no groups, outcome is continuous
  • Linear / Non-linear / Time series
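
Why scaling comes before distance computation can be seen in a small NumPy sketch. The records below are made up for illustration, and z-score standardization is only one possible scaling; the slide does not say which one the NAG tools apply.

```python
import numpy as np

def standardize(data):
    """Scale each column to zero mean and unit standard deviation."""
    data = np.asarray(data, dtype=float)
    std = data.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns alone instead of dividing by zero
    return (data - data.mean(axis=0)) / std

def pairwise_distances(data):
    """Euclidean distance between every pair of rows."""
    diff = data[:, None, :] - data[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Made-up records: column 0 is on a scale of units, column 1 on a scale of thousands.
records = np.array([[1.0, 1000.0],
                    [2.0, 1010.0],
                    [1.1, 2000.0]])

raw = pairwise_distances(records)                  # dominated by column 1
scaled = pairwise_distances(standardize(records))  # both columns contribute
```

In `raw`, the second column decides almost everything because its values are numerically larger; after standardization both variables carry comparable weight.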

20
Example: exploratory data analysis
  • How many species of water vole (Arvicola) in UK?
  • Measurement data
  • Presence / absence of 13 skull characteristics
  • 300 observations, each in one of 14 regions
  • 3 groups
  • A. terrestris / A. sapidus / unclassified UK
    cases
  • Treatment
  • Average data within each region
  • Gives 14 data points in 13 dimensions
  • How to display dataset?
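
The averaging step in the treatment can be sketched as follows. The data here are randomly generated stand-ins, not the vole measurements, and the variable names are assumptions; only the sizes (300 observations, 13 characteristics, 14 regions) come from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)  # synthetic stand-in; NOT the vole measurements
n_obs, n_chars, n_regions = 300, 13, 14

# Presence (1) / absence (0) of each skull characteristic per observation.
presence = rng.integers(0, 2, size=(n_obs, n_chars))
# Region label for each observation, built so that every region occurs.
region = rng.permutation(np.arange(n_obs) % n_regions)

# Average the observations within each region: one row per region.
region_means = np.vstack([presence[region == r].mean(axis=0)
                          for r in range(n_regions)])
print(region_means.shape)  # (14, 13)
```

Each row of `region_means` is then one data point in 13 dimensions, holding the fraction of observations in that region showing each characteristic.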

21
2D scatterplots
22
Analysis
  • 2D scatterplots?
  • Structure is unclear
  • (13 x 12) / 2 = 78 plots needed
  • Principal components analysis?
  • 2 PCs explain 49% of the variance
  • 3 PCs explain 65% of the variance
  • Should be > 85% for confident representation
  • Fisher's iris dataset (4 variables) is 95%
  • Alternative technique
  • Metric scaling

23
Metric scaling
  • 14 data points one for each region
  • Each point has values for 13 variables
  • Construct 14 by 14 dissimilarity matrix, Δ
  • Δij = distance between points i and j in 13D space
  • Δ is symmetric, with zero diagonal elements
  • Want to find a new matrix
  • set of 14 new data points in 3D space that
    preserve Δ
  • Project Δ to the new matrix using metric scaling
  • Display data points in 3D
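
The steps on this slide can be sketched with classical metric scaling (principal coordinates analysis) in NumPy. That the talk used this particular variant is an assumption, the names `delta`, `classical_scaling` and `coords_3d` are illustrative, and the input is synthetic rather than the 14 regional averages.

```python
import numpy as np

def distance_matrix(points):
    """Euclidean distances between all pairs of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_scaling(delta, n_dims=3):
    """Classical metric scaling of a dissimilarity matrix into n_dims dimensions."""
    delta = np.asarray(delta, dtype=float)
    n = delta.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared dissimilarities to obtain an inner-product matrix.
    b = -0.5 * centring @ (delta ** 2) @ centring
    eigvals, eigvecs = np.linalg.eigh(b)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_dims]   # indices of the largest eigenvalues
    # Clip small negative eigenvalues (rounding error or non-Euclidean input).
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

rng = np.random.default_rng(2)
points_13d = rng.random((14, 13))     # synthetic stand-in for 14 points in 13D
delta = distance_matrix(points_13d)   # 14 x 14, symmetric, zero diagonal
coords_3d = classical_scaling(delta)  # 14 new points in 3D
```

The rows of `coords_3d` are the points to display. Their mutual distances approximate `delta` as well as three dimensions allow, and reproduce it exactly only when the original points already lie in a 3D subspace.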

24
(No Transcript)

25
(No Transcript)

26
(No Transcript)

27
(No Transcript)

Exploratory data analysis conclusions
  • 2D scatterplots don't indicate group structure
  • cf. iris dataset
  • 3D PCA unreliable here
  • Metric scaling of Δ used to reduce D from 13 to 3
  • 3D visualization reveals group structure
  • Distinct A. sapidus group
  • UK sample represents only A. terrestris

29
Overview
  • Introduction
  • NAG, HECToR
  • Visualization
  • distribution, collaboration, steering
  • Data mining
  • classification, exploratory analysis
  • The ADVISE project
  • large data, interactive analysis

30
The ADVISE project
  • DTI-funded research project, started March 2007
  • NAG / VSN / University of Leeds
  • Merge visualization and statistics (visual
    analytics)
  • use statistics to identify key characteristics of
    dataset
  • understand the characteristics through
    visualization
  • User community
  • pharmaceuticals
  • environmental science
  • engineering
  • Initial user meeting held September 2007

31
Large datasets
  • Size matters (but isn't everything)
  • Developer's view: "Too large for our current
    system"
  • Problems of
  • performance
  • robustness
  • User's view: "Too large for me to understand"
  • Current ADVISE datasets are only a few GB
  • complications (e.g. comparing several) could raise
    this
  • HECToR users have TB datasets

32
ADVISE ideas
  • Retention of visual programming interface
  • Re-use of algorithmic base
  • IRIS Explorer modules
  • GenStat statistics functionality (from VSN)
  • Three-layered architecture
  • User interface
  • Web service middleware
  • Visualization components
  • Distribution, tailored user interface,
    collaboration

33
ADVISE progress
  • Porting IE modules to standalone environment
  • some of these use GenStat for statistics
  • New system used to revisit air quality demo
  • early (IEEE Viz '96) web-based visualization
  • new system more efficient
  • Working with real user data

34
Conclusions
  • NAG offers software components for developers
  • no wheel-reinvention, stone canoes, chocolate
    teapots
  • Visualization and data mining crucial for analysis
  • distribution, steering, classification,
    exploration
  • interactivity / interrogation important
  • integration is an ongoing field of activity
  • ADVISE project
  • developing a new system for visual analysis
  • working with real user problems
  • improving understanding of data