Title: Developing open source GIS: what are the challenges?
1. Developing open source GIS: what are the challenges?
- Gilberto Câmara
- INPE Brasil
- www.terralib.org
Institute for Geoinformation, TU Wien, 16 June 2004
2. The Promise of Open Source
- When an OSS project reaches a critical size, we obtain many benefits
- Robustness
- "Given enough eyeballs, all bugs are shallow."
- Cooperation
- "Somebody finds the problem, and somebody else understands it." (Linus Torvalds)
- Continuous Improvement
- "Treating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging."
3. Naïve view of open source projects
- Software
- Product of an individual or small group (peer pressure)
- Based on a kernel with plausible promise
- Development network
- Large number of developers, single repository
- Open source products
- Viewed as complex, innovative systems (Linux)
- Incentives to participate
- Operate at an individual level (self-esteem)
- Wild-west libertarians (the John Waynes of the modern era)
4. Idealized model of open source software
Networks of committed individuals
5. The Reality of Open Source
- Previous existence of conceptual designs of similar products (the potential for reverse engineering)
- Design is the hardest part of software (Fred Brooks)
- Problem granularity (the potential for distributed development)
- Effective peer production requires high granularity
6. Potential for Reverse Engineering
- Post-mature
- A private company develops a software product.
- The product becomes popular and becomes part of the public commons.
- Others develop a public-domain equivalent (e.g., OpenOffice).
- Standards-led
- Standards consolidate a technology.
- They allow compatible solutions to compete in the marketplace.
- SQL database standard (e.g., MySQL and PostgreSQL)
- POSIX standard (guidance to Linux)
- OpenGIS specifications (e.g., deegree, MapServer, GeoServer)
7. Potential for Distributed Development
- Parts of a software product
- A kernel and additional functions that use it (its periphery)
- Operating systems (Linux)
- A well-defined kernel for process control
- A periphery consisting of programs such as device drivers, applications, compilers, and network tools
- Database management systems
- A strong kernel of highly integrated functions (such as the parser, scheduler, and optimizer)
- A much smaller periphery
8. Potential for Distributed Development
- The periphery/kernel ratio of each type of software product constrains its potential for distributed development
- Kernel
- A tightly organized and highly skilled programming team
- Periphery
- More widespread programmers of various skills
- Example
- Of more than 400 developers of the Apache web server, the top 15 programmers contribute 88% of added lines (Mockus, 2002)
9. Four Types of Open Source Software
- High reverse engineering, high distribution potential
- High reverse engineering, low distribution potential
- Low reverse engineering, high distribution potential
- Low reverse engineering, low distribution potential
10. Type 1: High-High
- High reverse engineering, high distribution potential
- Archetypical open source projects
- The Linux model
- Developers
- May have a separate job
- Time allocated in agreement with their employer
- Community-led projects
11. Type 2: High-Low
- High reverse engineering, low distribution potential
- Large number of projects
- Databases, office automation tools, web services
- Large presence of private companies
- Products similar to market leaders
- Reduced risk in reverse engineering
- Main design decisions take place within the institution
- Examples
- MySQL and PostgreSQL DBMS
- GNOME from Ximian
- Corporation-led projects
12. Type 3: Low-High
- Low reverse engineering, high distribution potential
- Stable kernel, innovative periphery
- Usually there is no commercial counterpart
- Projects share a relatively simple software kernel
- Origin
- Academic environments
- Examples
- GRASS GIS software and the R suite of statistical tools
- Collaborative projects
13. Type 4: Low-Low
- Low reverse engineering, low distribution potential
- Innovative kernel, small periphery
- Small teams under a public R&D contract
- Addressing specific requirements
- Aiming to demonstrate novel scientific work
- High mortality rate
- Most of them are restricted to the lifetime of a research grant
- Innovative products
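14. [Figure: quadrants High-High, High-Low, Low-High and Low-Low, defined by potential for reverse engineering and potential for distributed development, with example projects placed in them: MySQL, OpenOffice, Linux, PostgreSQL, Perl, Apache, GRASS, Postgres, R, NCSA browser]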
15. [Figure: the same quadrants labelled by project type: High-High = community, High-Low = corporate, Low-High = collaborative, Low-Low = innovative; annotated "Challenges?"]
16. Lessons from Open Source Projects
- "It's fairly clear that one cannot code from the ground up in bazaar style. One can test, debug and improve in bazaar style, but it would be very hard to originate a project in bazaar mode. Linus didn't try it. Your nascent developer community needs to have something runnable and testable to play with." (Eric Raymond)
17. Moving from the Low-Low Quadrant
- Software in the Low-Low quadrant
- Unsustainable in the long run
- Moving from an innovative to a collaborative project
- Sharing innovation
- Transforming a crude prototype into a modular, well-designed system
- How do you build innovation into a modular design?
18. Moving from the Low-Low Quadrant
- "Perfection in design is achieved not when there is nothing more to add, but rather when there is nothing more to take away." (Saint-Exupéry)
- How do you achieve perfection in information science?
- Good scientific foundation
- Usually, sound mathematical abstractions
- What is the situation in GIS?
19. Do we have a solid foundation for GIS?
- Databases: relations (e.g., faculty with attributes id, name, year) → relational algebra (selection, projection, Cartesian product, union, difference) → SQL query language (e.g., SELECT name FROM faculty WHERE year > 1960)
- GIS: spatio-temporal data types → spatial algebra (operations on ST types: ?) → GIS language
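The relational side of this comparison rests on a small, closed set of operators. As a minimal sketch in Python, relations are modelled here as lists of dicts. The helper names and the `faculty` rows are hypothetical sample data, not from the talk. In this model, the SQL query above is a projection applied to a selection:

```python
from typing import Callable, Iterable

Row = dict[str, object]  # one tuple of a relation: attribute name -> value


def _distinct(rows: Iterable[Row]) -> list[Row]:
    """Relations are sets: drop duplicate rows, keeping first-seen order."""
    out: list[Row] = []
    for row in rows:
        if row not in out:
            out.append(row)
    return out


def select(rel: Iterable[Row], pred: Callable[[Row], bool]) -> list[Row]:
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rel if pred(row)]


def project(rel: Iterable[Row], attrs: list[str]) -> list[Row]:
    """Projection: keep only the named attributes, then remove duplicates."""
    return _distinct({a: row[a] for a in attrs} for row in rel)


def product(r: Iterable[Row], s: Iterable[Row]) -> list[Row]:
    """Cartesian product; assumes r and s have disjoint attribute names."""
    s_rows = list(s)
    return [{**a, **b} for a in r for b in s_rows]


def union(r: Iterable[Row], s: Iterable[Row]) -> list[Row]:
    """Union of two relations with the same attributes."""
    return _distinct([*r, *s])


def difference(r: Iterable[Row], s: Iterable[Row]) -> list[Row]:
    """Rows of r that do not appear in s."""
    s_rows = list(s)
    return [row for row in _distinct(r) if row not in s_rows]


# Hypothetical sample relation faculty(id, name, year).
faculty = [
    {"id": 1, "name": "Ana", "year": 1955},
    {"id": 2, "name": "Bruno", "year": 1964},
    {"id": 3, "name": "Carla", "year": 1971},
]

# SELECT name FROM faculty WHERE year > 1960
result = project(select(faculty, lambda row: row["year"] > 1960), ["name"])
print(result)  # [{'name': 'Bruno'}, {'name': 'Carla'}]
```

The open question on the GIS side is what the equivalent closed set of operators over spatio-temporal types would be.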
20. Challenges for geoinformation
Source: Ghassem Asrar (NASA)
21. The Road Ahead: Smart Sensors
SMART DUST: autonomous sensing and communication in a cubic millimeter
Source: UC Berkeley, SmartDust project
22. Knowledge gap for spatial data
Source: John McDonald (MDA)
23. What's the Current Status of Open Source GIS?
- High-Low products
- Standards-based
- Spatial DBMS: MySQL, PostgreSQL
- OpenGIS Web: MapServer, deegree
- Low-High products
- Stable kernel, innovation at the periphery
- GRASS and R
- What about GIScience challenges?
- Spatio-temporal data models, geographical ontologies, spatial statistics and spatial econometrics, dynamic modelling and cellular automata, environmental modelling, neural networks for spatial data
24. TerraLib: Open source GIS library
- Data management
- All data (spatial and attribute) is stored in the database
- Functions
- Spatial statistics, image processing, map algebra
- Innovation
- Based on state-of-the-art techniques
- Same timing as similar commercial products
- Web-based cooperative development
- http://www.terralib.org
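Map algebra, listed above among TerraLib's functions, combines raster layers cell by cell. The sketch below shows only a local operation in the sense of Tomlin's map algebra, with grids as nested Python lists. It does not use or reproduce TerraLib's API. The layers `elevation` and `slope` and the thresholds are hypothetical.

```python
from typing import Callable

Grid = list[list[float]]  # a raster layer: rows of cell values


def local_op(f: Callable[..., float], *layers: Grid) -> Grid:
    """Local map algebra operation: output cell (i, j) is f applied to the
    values of cell (i, j) in every input layer. Layers must share one shape."""
    rows, cols = len(layers[0]), len(layers[0][0])
    for layer in layers:
        if len(layer) != rows or any(len(r) != cols for r in layer):
            raise ValueError("all layers must have the same shape")
    return [
        [f(*(layer[i][j] for layer in layers)) for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical 2 x 3 input layers.
elevation = [[120.0, 340.0, 560.0],
             [80.0, 410.0, 300.0]]
slope = [[2.0, 15.0, 30.0],
         [1.0, 8.0, 22.0]]

# Suitability: 1.0 where elevation < 400 m and slope < 10 degrees, else 0.0.
suitable = local_op(lambda e, s: 1.0 if e < 400 and s < 10 else 0.0, elevation, slope)
print(suitable)  # [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
```

Focal and zonal operations follow the same idea but read a neighbourhood or a zone instead of a single cell.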
25. Operational Vision of TerraLib
[Figure: TerraLib related to MapObjects, ArcSDE, cell spaces, and spatio-temporal models]
26. TerraLib applications
- Cadastral mapping
- Improving urban management of large Brazilian cities
- Public health
- Spatial statistical tools for epidemiology and health services
- Social exclusion
- Indicators of social exclusion in inner-city areas
- Land-use change modelling
- Spatio-temporal models of deforestation in Amazonia
- Emergency action planning
- Oil refineries and pipelines (Petrobras)
27. TerraCrime
28. Palm-top
29. Examples of Web Products
30. TerraLib Structure
[Figure: layered architecture, top to bottom: Java Interface, COM Interface, OGIS Services, C Interface; Functions; kernel (Spatio-Temporal Data Structures, File and DBMS Access, Visualization Controls); I/O Drivers; DBMS and External Files]
31. Spatio-Temporal Data Types
32. Events
[Figure: events plotted by position (x, y) and time: near in space, near in time?]
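For point events such as disease cases or crimes, one classical test of "near in space, near in time?" is the Knox statistic. It counts the event pairs that lie within a distance δ of each other and within a time gap τ. The following is a minimal sketch that assumes planar coordinates. The events and the thresholds are made up. Significance is estimated by a Monte Carlo permutation of event times, which is a common approach but not necessarily the one used in TerraLib or TerraCrime.

```python
import math
import random
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Event:
    x: float  # planar coordinates (assumed projected, e.g. km)
    y: float
    t: float  # time, e.g. days since the start of the study period


def knox_count(events: list[Event], delta: float, tau: float) -> int:
    """Knox statistic: pairs close in space (<= delta) AND in time (<= tau)."""
    return sum(
        1
        for a, b in combinations(events, 2)
        if math.hypot(a.x - b.x, a.y - b.y) <= delta and abs(a.t - b.t) <= tau
    )


def knox_test(events: list[Event], delta: float, tau: float,
              n_perm: int = 999, seed: int = 0) -> tuple[int, float]:
    """Observed Knox count and a Monte Carlo p-value. Shuffling times across
    locations breaks any space-time interaction but keeps the spatial pattern
    and the distribution of times."""
    rng = random.Random(seed)
    observed = knox_count(events, delta, tau)
    times = [e.t for e in events]
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(times)
        permuted = [Event(e.x, e.y, t) for e, t in zip(events, times)]
        if knox_count(permuted, delta, tau) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical events: two tight space-time clusters and one isolated case.
events = [
    Event(0.0, 0.0, 1), Event(0.5, 0.2, 2), Event(0.3, 0.4, 3),
    Event(10.0, 10.0, 40), Event(10.2, 9.9, 41), Event(5.0, 5.0, 90),
]
observed, p_value = knox_test(events, delta=1.0, tau=5, n_perm=199)
print(observed, p_value)  # observed is 4 for these events
```

The choice of δ and τ matters: the statistic answers the question only for the spatial and temporal scales chosen.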
33. Dynamical Spatial Model
[Figure: successive states f(I(t)), f(I(t1)), f(I(t2)), ..., f(I(tn)), linked by the transition F]
A dynamical spatial model is a mathematical representation of a real-world process in which a location changes in response to external forces (Burrough).
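One simple way to realise such a model is to apply a transition rule F repeatedly on a cell space, so that I(t_{k+1}) = F(I(t_k)). The rule below is a hypothetical spreading rule chosen for illustration, not a model from this talk: a cell switches from 0 to 1 when at least two of its eight neighbours are already 1.

```python
Grid = list[list[int]]  # cell states: 0 or 1


def live_neighbours(grid: Grid, i: int, j: int) -> int:
    """Count cells in state 1 among the (up to) 8 Moore neighbours of (i, j).
    Cells outside the grid are ignored (no wrap-around)."""
    rows, cols = len(grid), len(grid[0])
    return sum(
        grid[i + di][j + dj]
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0) and 0 <= i + di < rows and 0 <= j + dj < cols
    )


def transition(grid: Grid, threshold: int = 2) -> Grid:
    """Hypothetical rule F: a cell in state 1 stays 1; a cell in state 0 becomes 1
    when at least `threshold` of its neighbours are 1. Returns a new grid."""
    return [
        [1 if grid[i][j] == 1 or live_neighbours(grid, i, j) >= threshold else 0
         for j in range(len(grid[0]))]
        for i in range(len(grid))
    ]


def simulate(initial: Grid, steps: int) -> list[Grid]:
    """Trajectory I(t0), I(t1), ..., I(tn): each state is F of the previous one."""
    states = [initial]
    for _ in range(steps):
        states.append(transition(states[-1]))
    return states


initial = [[0, 0, 0, 0],
           [0, 1, 1, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]
states = simulate(initial, steps=2)
print([sum(map(sum, s)) for s in states])  # cells in state 1 per step: [2, 6, 14]
```

Because `transition` builds a new grid, every intermediate state I(t_k) is kept, which is what a spatio-temporal analysis of the trajectory needs.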
34. Spatial Simulation
Reality: Bauru in 1988
35. Cell Spaces: Old Wine, New Bottle
36. Regression with Spatial Data: Understanding Deforestation in Amazonia
37. Future Deforestation Scenarios
38. Modelling anisotropic space
Spatial relations in Amazonia are not isotropic!
39. Designing for Extensibility
- Algorithms
- The basic core of most successful GIS
- A large number of them do not depend on a particular implementation of a data structure
- They are based on a few fundamental semantic properties of the structure
- These properties can be, for example, the ability to get from one element of the data structure to the next, and to compare two elements of the data structure
- Spatial analysis algorithms
- Can be abstracted away from a particular data structure and described only in terms of these properties
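As an illustration, the function below relies on exactly those two properties. It steps to the next element through Python's iteration protocol and compares two elements through a key function. It therefore runs unchanged on a list, a deque, or a one-pass iterator. The function name `local_maxima` and the sample data are hypothetical.

```python
from collections import deque
from typing import Any, Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def local_maxima(items: Iterable[T],
                 key: Optional[Callable[[T], Any]] = None) -> Iterator[T]:
    """Yield each element that is strictly greater (by key) than both neighbours.

    Uses only two properties of the input: getting the next element
    (iteration) and comparing two elements (through key). It needs no
    indexing, no length, and no knowledge of the underlying container.
    """
    k = key if key is not None else (lambda v: v)
    it = iter(items)
    try:
        prev, cur = next(it), next(it)
    except StopIteration:
        return  # fewer than two elements: no interior element
    for nxt in it:
        if k(cur) > k(prev) and k(cur) > k(nxt):
            yield cur
        prev, cur = cur, nxt


profile = [3, 5, 4, 6, 9, 7, 2]  # hypothetical elevation profile
print(list(local_maxima(profile)))         # list     -> [5, 9]
print(list(local_maxima(deque(profile))))  # deque    -> [5, 9]
print(list(local_maxima(iter(profile))))   # one-pass -> [5, 9]

stations = [("A", 12.0), ("B", 18.5), ("C", 11.0)]  # hypothetical (id, value)
print(list(local_maxima(stations, key=lambda s: s[1])))  # -> [('B', 18.5)]
```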
40. Same Algorithm, Different Geometries
41. Generic GIS Programming
- How do we decouple algorithms from data structures?
- Idea: iterators (intelligent pointers)
- Algorithms are not classes!
- Decide which algorithms you want, then parametrize them so they work for a variety of suitable types and data structures
[Figure: algorithms access geometries through iterators]
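Here is a minimal sketch of the split between algorithms, iterators and geometries. It assumes each geometry type exposes a `coords()` iterator over its vertices, which is a hypothetical interface and not TerraLib's actual classes. The bounding-box algorithm is a free function rather than a method of any geometry class, and it only sees the iterator.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator

Coord = tuple[float, float]


@dataclass
class Line:
    vertices: list[Coord]

    def coords(self) -> Iterator[Coord]:
        return iter(self.vertices)


@dataclass
class Polygon:
    outer: list[Coord]
    holes: list[list[Coord]] = field(default_factory=list)

    def coords(self) -> Iterator[Coord]:
        # The iterator hides the ring structure: outer ring, then each hole.
        yield from self.outer
        for ring in self.holes:
            yield from ring


@dataclass
class PointSet:
    points: set[Coord]

    def coords(self) -> Iterator[Coord]:
        return iter(self.points)


def bounding_box(coords: Iterable[Coord]) -> tuple[float, float, float, float]:
    """Generic algorithm: (xmin, ymin, xmax, ymax) over any coordinate sequence.
    It knows nothing about lines, polygons, or point sets."""
    it = iter(coords)
    try:
        xmin, ymin = next(it)
    except StopIteration:
        raise ValueError("bounding box of an empty geometry") from None
    xmax, ymax = xmin, ymin
    for x, y in it:
        xmin, xmax = min(xmin, x), max(xmax, x)
        ymin, ymax = min(ymin, y), max(ymax, y)
    return xmin, ymin, xmax, ymax


line = Line([(0, 0), (2, 1), (3, -1)])
polygon = Polygon([(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)],
                  holes=[[(1, 1), (2, 1), (2, 2), (1, 1)]])
points = PointSet({(5, 5), (-1, 2)})
for geometry in (line, polygon, points):
    print(type(geometry).__name__, bounding_box(geometry.coords()))
# Line (0, -1, 3, 1)
# Polygon (0, 0, 4, 3)
# PointSet (-1, 2, 5, 5)
```

Adding a new geometry type only requires a new `coords()` iterator. The algorithm itself does not change.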
42. Scientific Challenges for Innovation in GIS
- How can we design an algebra for ST types?
- What are the spatio-temporal data types?
- How do we design a language for spatial modelling?
- Requires a characterization of measurements
- Cognitively meaningful interfaces
- Representation of space
- How do we represent anisotropic space?
- Extensibility of models and algorithms
- How do we design for extensibility?
43. Why am I here today at TU Wien?
- Innovation in GISystems
- Requires addressing challenges in GIScience
- Cooperation with Prof. Andrew Frank
- Generic GIS Programming
- Semantics of Geographical Measurements
- Spatio-Temporal Types and Algebras
- Methods for Representation of Anisotropic Space
44. Result of Sound Scientific Work
[Figure: quadrants defined by potential for reverse engineering and potential for distributed development, with the example projects MySQL, OpenOffice, Linux, PostgreSQL, Perl, Apache, GRASS, Postgres, R, NCSA browser, now also including TerraLib]
45. Conclusions
- Open source software model
- The Linux example is not applicable to all situations
- Moving from the individual level to the organization level
- Geoinformation
- Innovative open source GIS software has a large role
- Sound research is needed to support innovation
- Cooperation in GIScience is fundamental
- The problem is enormous and requires a combination of R&D
- There are only a few R&D groups
- Cooperation is the only way to ensure a future for GIScience