Spreadsheet structure inspection using low level access and visualisation - PowerPoint PPT Presentation

About This Presentation
Title:

Spreadsheet structure inspection using low level access and visualisation

Description:

Spreadsheet structure inspection using low level access and visualisation Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences – PowerPoint PPT presentation

Number of Views:128
Avg rating:3.0/5.0
Slides: 16
Provided by: Daniel1372
Category:

less

Transcript and Presenter's Notes

Title: Spreadsheet structure inspection using low level access and visualisation


1
Spreadsheet structure inspection using low level
access and visualisation
  • Daniel Ballinger, Robert Biddle and James Noble
  • School of Mathematical and Computing Sciences
  • Email db, robert, kjx_at_mcs.vuw.ac.nz
  • http//www.mcs.vuw.ac.nz/db/honours.html

2
Motivation
  • Spreadsheets are a common form of end-user
    programming.
  • Unfamiliar spreadsheets can contain daunting
    amounts of information in the layout and
    inter-cell dependencies.
  • Methods for studying these structures are usually
    limited to what the application provides.
  • It is difficult to get a global sense of the
    structure of an individual formula that may have
    dependencies spread out all over the spreadsheet
    table. Users have to track down individual cell
    dependencies one by one, tacking back and fourth
    all over the spreadsheet.
  • - Bonnie Nardi, A Small Matter of Programming
    (1993)

3
Our Proposed Solution
  • Working outside the spreadsheet application
    allows for greater flexibility in addressing
    issues.
  • The flexibility allows for visualisations to aid
    in end-user understanding of spreadsheets beyond
    what the application is capable of.
  • A set of visualisations as interface enhancements
    that allow the user to progress from an abstract
    levels towards actual details present.
  • The greater flexibility is a trade-off with
    direct interaction.
  • We focused on Microsoft Excel due to its large
    market share.

4
Spreadsheet elements requiring extraction
  • Artefacts of interest are derived from low-level
    structures.
  • The basic unit of interest is any occupied cell.
  • Each occupied cell will have a value, and
    optionally a formula.
  • Formula should be in the same format shown to the
    user (not RPN).
  • Building our own BIFF reader would be a sizeable
    project in itself. Hence we use third party
    software.

5
IBM alphaWorks ExcelAccessor
  • A Java Bean to access and modify to contents of
    spreadsheets using a Windows DLL.
  • Requires using Windows native code and Excel to
    be installed. Limiting portability.
  • Excellent ability to extract all details
    correctly.
  • Prone to irregular crashes on larger corpus tasks.

6
Andy Khans JExcelAPI
  • Pure Java integrates better with toolkit.
  • Open source allows for easier expansion and bug
    fixes.
  • Some element types produce problems with
    extraction. E.g. Array functions, intersections,
    absolute references becoming relative, and
    earlier BIFF formats.

7
Spreadsheet Application Toolkit
Query
Gobbler
Google
  • Find and store spreadsheets from the Internet.
  • Extract low level structures. E.g. Cell values
    and formulas.
  • Analyse spreadsheet structures. Either individual
    or corpus.
  • Conveying the findings through visualisation.

URLs
Fetcher
Web Servers
XLS files
Extractor
BIFF Reader
Toolkit Files
Algorithms
Analyser
Processed Data
Metrics
Visualisation Tools
Image
8
Aspects of spreadsheet structure and use
  • The spreadsheet paradigm has two main
    characteristics
  • The spatial relationships between cells
  • The logical relationships created by formula
  • These characteristics are not always disjoint.
  • User problems are mapped onto a 2D table that
    shields them from low-level details of
    programming and allows for more natural
    expression of many problems.

9
Spreadsheet layout Real-estate Utilisation 2D
  • Understanding layout is an important first step
    in learning about a new spreadsheet.
  • Actual values and formulas are only shown as
    occupied cells.
  • The visualisation layout mimics that of Excel,
    with columns along the top of the x-axis and rows
    running down the y-axis.
  • Cells with a higher occupancy level are coloured
    towards the red end of the colour spectrum.

10
Spreadsheet layout Real-estate Utilisation 3D
  • Occupancy data is projected into 3D to create a
    surface map.
  • Discrete to continuous data transformation helps
    smooth the effects of spikes.
  • Coloured to give a Topographical terrain effect.
  • Full benefit is seen with user interaction.

11
Formula Inspection Data Flow
  • Visualising formula components that are
    extracted.
  • Fully trace worksheets in one view.
  • Single Cell, Range, Union, and Intersection.

Basic Referencing Components
12
Formula in a summation example
  • Visualising formula in a fairly common summation
    example
  • The relative complexity of the bottom-right
    formula is clear from the larger circle.

Igarashis Static Global View
13
Formula Inspection - Dependency Types
Row Absolute
  • Excel allows for combinations of relative and
    absolute positioning.
  • Understanding the referencing type is important
    when replicating formula and identifying regular
    patterns.

Fully Absolute
Column Absolute
Relative (default)
14
Related Work
  • Takeo Igarashi
  • Spreadsheets augment a visible tabular layout
    with invisible formulas.
  • Created visualisations to help reveal the hidden
    dataflow graphs and superficial tabular layouts
    of spreadsheets.
  • Markus Clermont
  • Most end-users are not trained programmers.
  • Many spreadsheets exist beyond being simple
    scratch pads.
  • Raymond Panko
  • Studies of empirical data into spreadsheet
    errors.
  • Found error rates can be disturbingly high.
  • Errors attributed to over confidence and lack of
    formal checking.
  • Margaret Burnett
  • The importance of scalability in visualisations.
  • Forms/3 and an embedded testing methodology.

15
Summary and Future Work
  • We created a Java toolkit to extract artefacts
    from spreadsheets and then convert the basic
    information into visualisations.
  • These visualisations are used to augment the
    information provided by Excel in helping users
    understand spreadsheets.
  • Future work will include detailed user studies
    and corpus analysis to find larger patterns.
  • We must also address visualisation scalability
    for larger, more complex, spreadsheets.

http//www.mcs.vuw.ac.nz/db/honours.html
Write a Comment
User Comments (0)
About PowerShow.com