Title: Tony Rees
1C-squares - a new approach to representing,
querying, displaying and exchanging dataset
spatial extents at the metadata level
Tony Rees Divisional Data Centre CSIRO Marine
Research, Australia (Tony.Rees_at_csiro.au)
2Talk Outline
- Introduce myself, my agency, our approach to data
and metadata
- Review characteristics of metadata, and current
handling of spatial extents in metadata records
- Describe limitations of bounding rectangles
representation for non-rectangular / patchy data
- The C-squares approach
- Current c-squares resources / future possibilities
3Acknowledgements ...
- CMR staff and colleagues in Australia, Europe and
USA for helpful discussions
- WMO and Australian Blue Pages for nomenclature
for the squares and their subdivisions
- Miroslaw Ryba (CMR) for programming used in the
c-squares mapper and search interface
- David Hastings / NOAA GLOBE Task Team and CSIRO
Atmospheric Research for images used as base maps
- Doug Nebert / FGDC for hosting my US visit and
interest in the system
4Author/Agency Background
- From CSIRO Marine Research in Australia (located
in Hobart, Tasmania, plus 2 other locations; c. 300
staff)
5CMR's Data and Metadata Storage - similar to many
other agencies ...
6Metadata functions
- Dataset discovery - by providing a filtered
subset of all possible records (according to
user-specified criteria)
- Dataset description - permits a degree of
resource appraisal (will this data be what I
need?)
- Dataset surrogate - may enable some questions to
be answered, and/or statistics compiled, without
need to access the actual data
- Should also provide access route to the data if
required (online link or contact point)
- C-squares assists each of the first three
points above.
7Bounding Rectangles Representation - and
overlapping rectangles search method
- Current metadata systems hold a bounding
rectangle (bounding box) for each dataset (N, S,
E, W bounding coordinates)
- Spatial searching is carried out by an
overlapping rectangles test
Cases (1) and (2) include the tacit assumption
that the data rectangle is actually filled with
data: all overlaps with the data rectangle are
inferred to be overlaps with the actual data.
8The California Problem
- The State of California is a classic (previously
cited) case where the bounding rectangle is a
poor fit to the real spatial extent ...
- Search regions in Nevada, a little of
Arizona, plus the offshore Pacific Ocean will all
intersect this data rectangle (false hits)
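The overlapping rectangles test and the resulting false hit can be sketched as follows. This is a minimal illustration, not the code of any particular metadata system: the `Box` type, the function name and all coordinates are assumptions, the California rectangle is only approximate, and rectangles crossing the 180º meridian are not handled.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Bounding rectangle in decimal degrees (west/south negative)."""
    west: float
    south: float
    east: float
    north: float


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if the two rectangles share any area or edge."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


# Approximate bounding rectangle for California (illustrative values only).
california = Box(west=-124.4, south=32.5, east=-114.1, north=42.0)
# Hypothetical search rectangle in central Nevada: no Californian data here,
# yet it falls inside the data rectangle.
nevada_search = Box(west=-117.0, south=39.0, east=-116.0, north=40.0)
# Hypothetical search rectangle well to the east of the data rectangle.
far_east_search = Box(west=-100.0, south=39.0, east=-99.0, north=40.0)

print(boxes_overlap(california, nevada_search))    # a false hit
print(boxes_overlap(california, far_east_search))
```

The test itself is cheap and correct for rectangles; the false hit comes entirely from the assumption that the data rectangle is filled with data.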
9False Hits from Overlapping Rectangles Searches
Potential problems can be deconstructed into 3
contributing ones ...
(a) Filled polygons, but a poor fit to their bounding rectangle
(b) Multiple discrete polygons
(c) Incompletely filled polygons
10Consequences of False Hits ...
- Can get nonsensical results (sea ice at the
Equator, marine species in the desert)
- Time / effort wasted accessing inappropriate
datasets
- Cannot use resultsets quantitatively, e.g.
  - how many records / species occur in this defined
region
  - compare content of one defined region with
another
  - sum the results of consecutive searches
  - etc.
11Author's Agency Data (typical)
12C-squares approach
- gives flexibility to represent a variety of
dataset shapes, also patchiness (gaps in data
coverage)
13Highlighted Squares
- Can be expressed as a set of codes (labels) in
an ASCII string, e.g. code1 code2
code5 code7 code13 code14 code15
code21 (etc.)
- List of codes is potentially more succinct
(concise) than original data
  - codes potentially terse in themselves
  - multiple points in single square only coded once
  - empty cells not coded
- Now has capability for increased precision of
querying (on individual square, not bounding
rectangle)
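The data reduction described above (points falling in the same square coded once, empty cells not coded at all) can be sketched as below. The cell labels (`r<row>c<col>`), the function names and the sample points are hypothetical placeholders used only for illustration; they are not c-squares codes.

```python
import math


def cell_label(lat, lon, cell_size=1.0):
    """Hypothetical label for the grid cell containing a point."""
    row = math.floor(lat / cell_size)
    col = math.floor(lon / cell_size)
    return f"r{row}c{col}"


def footprint_string(points, cell_size=1.0):
    """Space-separated list of occupied cells; each cell appears once."""
    labels = {cell_label(lat, lon, cell_size) for lat, lon in points}
    return " ".join(sorted(labels))


# Four made-up sampling positions; three share a 1 x 1 degree cell.
points = [(-42.9, 147.3), (-42.5, 147.8), (-43.2, 147.1), (-42.1, 147.9)]
print(footprint_string(points))
```

Four points reduce to two labels here, and no label is emitted for any cell without data.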
14What Notation to Use? (choosing a taxonomy of
space)
- Available coding systems (global grids)
- Lat/long-based systems
- 10 x 10 degree squares (WMO squares, Marsden
Squares)
- 6 x 4 degree squares (International Map of the
World)
- 2 x 1 degree squares (Maidenhead locators)
- Equal-area systems
- UTM grids
- other National or local grids (e.g. US, UK
national systems; local mapsheet refs)
- commercial products (e.g. Go2, MapPlanet)
- Dutton's Quaternary Triangular Mesh (basis for
MS Encarta)
- ...Other numeric systems (e.g. postcodes,
numbered features or zones) - unsuitable because
of local usage only, and/or lack of scalability
15Basis for C-squares Codes ...
- WMO (World Meteorological Organization) 10 x 10
degree squares chosen as starting point for codes
- Subsequent subdivisions are base 10 (with
intermediate base 2 divisions embedded), for
compatibility with decimal degrees
- Name: C-squares (Concise Spatial Query and
Representation System)
- any square (at any resolution) encoded according
to this method can also be termed a c-square.
16WMO 10 x 10 degree squares - Numbering Principle
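(Figure: world map spanning 180º W to 180º E and 90º N to 90º S, divided at the Equator and at 0º E/W into four global quadrants, whose square numbers begin NW (7xxx), NE (1xxx), SE (3xxx) and SW (5xxx); label shown: 1817.)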
17WMO 10 x 10 degree squares in practice(examples)
(Maps courtesy R. Curry/WHOI)
18Basis for Recursive Subdivision (e.g. in NW
global quadrant)
(Principle as used in Australian Blue Pages
metadata system, 1996)
- 10 x 10 deg. square - e.g. 7307
- divided as follows (Blue Pages nomenclature)
  - 73074 (5 x 5 deg. square)
  - 7307487 (1 x 1 deg. square)
- C-squares then extends this principle
recursively, e.g. ...
  - 73074873 (0.5 x 0.5 deg. square)
  - 7307487393 (0.1 x 0.1 deg. square)
  - etc.
(NB, arrangement is mirror image across 0º
latitude and 0º longitude; 100 is always closest
to the global origin, 499 is furthest away)
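The subdivision principle can be sketched as a point encoder. This is an illustrative reading of the slides, not any of the published encoders: the function name is an assumption, codes are written without separators as they appear in this transcript, zero latitude or longitude is assigned to the north or east side, and the treatment of points exactly at 90º or 180º is a guess. `Decimal` is used so that values such as 38.9 are not disturbed by binary floating point.

```python
from decimal import Decimal

# Leading digit of the 10 x 10 degree square, keyed by (is_north, is_east).
GLOBAL_QUADRANT = {(True, True): 1, (False, True): 3,
                   (False, False): 5, (True, False): 7}


def encode_csquare(lat, lon, resolution="1"):
    """Code of the square containing (lat, lon).

    resolution is the square size in degrees: 10, 5, 1, 0.5, 0.1, ...
    """
    target = Decimal(str(resolution))
    lat_d, lon_d = Decimal(str(lat)), Decimal(str(lon))
    quadrant = GLOBAL_QUADRANT[(lat_d >= 0, lon_d >= 0)]
    # Assumption: points exactly on 90 or 180 fall in the outermost square.
    a_lat = min(abs(lat_d), Decimal("89.999999"))
    a_lon = min(abs(lon_d), Decimal("179.999999"))
    code = f"{quadrant}{int(a_lat // 10)}{int(a_lon // 10):02d}"
    lat_rem, lon_rem = a_lat % 10, a_lon % 10
    cell = Decimal(10)
    while cell > target:
        step = cell / 10
        i, j = int(lat_rem // step), int(lon_rem // step)  # digits 0-9
        # Intermediate quadrant: 1 nearest the global origin, 4 furthest.
        k = 1 + 2 * (i >= 5) + (j >= 5)
        if target == cell / 2:
            return code + str(k)
        code += f"{k}{i}{j}"
        lat_rem -= i * step
        lon_rem -= j * step
        cell = step
    if cell != target:
        raise ValueError(f"unsupported resolution: {resolution}")
    return code


# A point near 38.95 N, 77.35 W at each resolution named on the slide.
for res in ("10", "5", "1", "0.5", "0.1"):
    print(res, encode_csquare(38.95, -77.35, res))
```

For this point the sketch reproduces the slide's sequence 7307, 73074, 7307487, 73074873, 7307487393.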
19Actual Size Examples 10 x 10, 5 x 5 degree
squares
20Actual Size Examples 5 x 5, 1 x 1 degree
squares (1 x 1 degree squares are approx. 110 x
70 km)
(Figure: 5 x 5 degree square 73074 and its 1 x 1 degree squares.)
Code follows template: 7307487 is bounded by 38º N (digits 2 and 6) and 77º W (digits 3-4 and 7); 7307487393 would be bounded by 38.9º N (digits 2, 6 and 9) and 77.3º W (digits 3-4, 7 and 10).
21Actual Size Examples 0.1 x 0.1 degree
squares (approx. 11 x 7 km)
(Figure: grid of 0.1 x 0.1 degree squares spanning 38.8º to 39.1º N and 76.8º to 77.4º W, covering parts of 1 x 1 degree squares 7307486, 7307487, 7307496 and 7307497.)
22Efficiency via Data Reduction Available ...
- Global coverage requires up to ...
- 648 10 x 10 degree squares
- 64,800 1 x 1 degree squares
- 259,200 0.5 x 0.5 degree squares
- To reduce the number of codes required to
represent large areas without compromising
resolution, a wildcard notation is permitted,
e.g.
  - 3414* to indicate 34141 through 34144 (4
codes)
  - 3414*** to indicate 3414100 through 3414499
(100 codes)
  - 3414**** to indicate 34141001 through
34144994 (400 codes)
  - (etc.)
- Result is similar to a quadtree approach (only
subdivide as far as necessary, to match varying
levels of detail required)
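The counts on this slide can be checked with a short sketch that expands a coarse code into the explicit finer codes it stands for. The function names are assumptions, and codes are written without separators as they appear in this transcript.

```python
def finer_codes(code):
    """Codes one level finer than `code`."""
    if (len(code) - 4) % 3 == 0:
        # 10, 1, 0.1 ... degree square: four half-size quadrants.
        return [code + str(k) for k in (1, 2, 3, 4)]
    # Otherwise the code ends in an intermediate quadrant digit k, which
    # fixes which half of the 0-9 digit range each axis may use.
    k = int(code[-1])
    lat0, lon0 = 5 * ((k - 1) // 2), 5 * ((k - 1) % 2)
    return [f"{code}{i}{j}"
            for i in range(lat0, lat0 + 5)
            for j in range(lon0, lon0 + 5)]


def expand(code, levels):
    """Expand `code` through the given number of finer levels."""
    codes = [code]
    for _ in range(levels):
        codes = [finer for c in codes for finer in finer_codes(c)]
    return codes


# 18 latitude bands x 36 longitude bands of 10 x 10 degree squares.
ten_degree_squares = (180 // 10) * (360 // 10)
print(ten_degree_squares, ten_degree_squares * 100, ten_degree_squares * 400)

for levels in (1, 2, 3):
    codes = expand("3414", levels)
    print(len(codes), codes[0], codes[-1])
```

Under these assumptions one code for square 3414 replaces 4, 100 or 400 explicit codes, depending on the resolution it is standing in for.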
23Real-world c-squares implementation (example 1)
24Real-world c-squares implementation (example 2)
603 squares, at 0.1 deg. resolution; 7838
characters / 8 Kb
25Encode - Decode methods
- Encoders currently available (3 versions)
  - original at CSIRO Marine Research (Oracle PL/SQL)
  - another in use at OBIS, USA (Java)
  - another at FishBase, ICLARM (ColdFusion)
  - source code for all three available via
c-squares website
  - (all these are for encoding point data)
- Decoding
  - not needed for searching (see
following slide), or for mapping if the c-squares
mapper is invoked (mapper does the decoding)
  - otherwise, is a very simple algorithm if needed
(or can do by inspection!)
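Decoding can be sketched in a few lines. This is an illustration only, not one of the three published implementations: the function name and the returned tuple layout are assumptions, the input is assumed to be a well-formed code written without separators as in this transcript, and no validation is attempted.

```python
from decimal import Decimal


def decode_csquare(code):
    """Return (south, west, north, east) in decimal degrees."""
    lat_sign = 1 if code[0] in "17" else -1   # 1 and 7: northern quadrants
    lon_sign = 1 if code[0] in "13" else -1   # 1 and 3: eastern quadrants
    lat = Decimal(int(code[1]) * 10)
    lon = Decimal(int(code[2:4]) * 10)
    size = Decimal(10)
    rest = code[4:]
    while rest:
        k = int(rest[0])
        if len(rest) == 1:
            # Trailing intermediate quadrant digit: a half-size square.
            size /= 2
            lat += size * ((k - 1) // 2)
            lon += size * ((k - 1) % 2)
            rest = ""
        else:
            # Full cycle: quadrant digit, latitude digit, longitude digit.
            size /= 10
            lat += size * int(rest[1])
            lon += size * int(rest[2])
            rest = rest[3:]
    lats = sorted([lat_sign * lat, lat_sign * (lat + size)])
    lons = sorted([lon_sign * lon, lon_sign * (lon + size)])
    return (lats[0], lons[0], lats[1], lons[1])


print(decode_csquare("7307487393"))
print(decode_csquare("73074"))
```

The intermediate quadrant digit carries no extra information in a full cycle, since it is implied by the two digits that follow it; that is why only the trailing one is used here.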
26C-squares search mechanism (behind the scenes)
- Look for a text match between search dataset
extent (expressed as c-square/s) and c-squares
string for any dataset, e.g.
  - does 3111499 (or 31114, or 3111) appear
anywhere in the string
3013497 3111468 3111478 3111479 3111488
3111489 3111499 3112122 3112123 3112131 3112132
(etc.)
- Advantage 1: needs no special, vector-based
searching overhead (simple text search)
- Advantage 2: nested nomenclature means that
searching can be carried out at any level of the
hierarchy equal to, or greater than, the encoded
resolution
- Advantage 3: search precision is now potentially
to the level of an individual c-square (much
better than bounding rectangle).
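The text match can be sketched as a prefix test against each code in a dataset's string. The example string from this slide is treated here as eleven 7-character codes separated by spaces; that separator, the function name and the choice of a per-code prefix test (rather than a plain substring search, as a database might use) are assumptions made for illustration.

```python
def matches(query_code, dataset_string):
    """True if any code in the dataset string starts with the query code."""
    return any(code.startswith(query_code)
               for code in dataset_string.split())


# The slide's example string, split into individual 1 x 1 degree codes.
dataset = ("3013497 3111468 3111478 3111479 3111488 3111489 "
           "3111499 3112122 3112123 3112131 3112132")

for query in ("3111499", "31114", "3111", "3113"):
    print(query, matches(query, dataset))

# Why a separator matters: with the codes run together, the digits 7311
# straddle two neighbouring codes and a naive substring search would hit.
print("7311" in dataset.replace(" ", ""), matches("7311", dataset))
```

Because the nomenclature is nested, the same test serves a query at 1, 5 or 10 degree resolution against codes stored at 1 degree resolution.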
27C-squares search interface (example from CMR's
MarLIN metadata system)
- Point-and-click user interface, e.g.
28C-squares Search Result
29View Metadata Record (initial portion) ...
30C-squares Search Result (continued)
- If no c-squares string is held, the search defaults
to a standard bounding rectangles search, and the
record is returned as a "possible match", e.g.
(this way, c-squares-enabled and non-c-squares-enabled
records can co-exist in the same metadata
repository or in distributed searches)
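The fallback behaviour can be sketched as below. The record layout (a dictionary with an optional `csquares` string and a `bbox` tuple of west, south, east, north), the function name, the result labels other than "possible match", and the sample records are all hypothetical; this is not the MarLIN implementation.

```python
def search_record(record, query_code, query_box):
    """Classify one metadata record against a one-square search."""
    codes = record.get("csquares")
    if codes:
        # C-squares held: result is decided at individual-square precision.
        hit = any(c.startswith(query_code) for c in codes.split())
        return "match" if hit else "no match"
    # No c-squares held: fall back to the overlapping rectangles test.
    w, s, e, n = record["bbox"]
    qw, qs, qe, qn = query_box
    overlap = w <= qe and qw <= e and s <= qn and qs <= n
    return "possible match" if overlap else "no match"


# Made-up records; the search square 7307487 spans 38-39 N, 77-78 W.
records = {
    "A": {"csquares": "7307486 7307487", "bbox": (-78.0, 38.0, -76.0, 39.0)},
    "B": {"bbox": (-80.0, 35.0, -70.0, 40.0)},
    "C": {"csquares": "7307496", "bbox": (-77.0, 39.0, -76.0, 40.0)},
}
query_code, query_box = "7307487", (-78.0, 38.0, -77.0, 39.0)
for name, record in records.items():
    print(name, search_record(record, query_code, query_box))
```

Record C shows the gain in precision: its rectangle touches the search rectangle, so a rectangles-only search would return it, but its c-squares string rules it out.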
31C-squares as Explicit Spatial Extent Code/s
- C-squares can also be quoted explicitly in
metadata records, or any other web document
referring to a point or region
32 Can Then Utilize Capabilities of a Standard
Internet Search Engine, e.g.
33C-squares applicable to a Variety of Data Types,
e.g.
34Pause to Take Stock ...
- Light, portable, metadata-friendly system for
describing a wide variety of dataset footprint
types
- Could be expressed as an XML element (e.g.
<csquares> </csquares>)
- Codes can be easily derived from lats/longs in
decimal degrees (and vice versa)
- Can be used for visualization of dataset spatial
extents via web link to the c-squares mapper (or
similar)
- Amenable to text searching via current text / web
search technology - no additional hardware or
software overhead needed
- Improves reliability of search resultsets, fewer
or no false hits (results suitable for
quantitative analysis)
- Could provide an interoperable nomenclature for
previously binned data (e.g. into 0.1 x 0.1
degree cells, etc.)
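As a small illustration of the XML suggestion, a list of codes can be wrapped in a single element using Python's standard library. Only the element name `csquares` comes from the slide; the space separator, the function name and the sample codes are assumptions.

```python
import xml.etree.ElementTree as ET


def csquares_element(codes):
    """Serialise a list of codes as one <csquares> element."""
    element = ET.Element("csquares")
    element.text = " ".join(codes)
    return ET.tostring(element, encoding="unicode")


print(csquares_element(["7307486", "7307487"]))
```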
35C-squares Potential Uses ...
36C-squares Potential Uses - continued
- Spatially enabled web pages?? (like the
dot.geo concept, but requiring no
administrative / hardware overhead)
37Strengths / Weaknesses ...
- Strengths ...
  - C-squares is a concise and flexible method of
encoding simple to moderately complex forms
  - Encoding/decoding is easy and follows previously
documented methods; also directly related to lats
and longs in decimal degrees
  - Spatial searching is a standard text string
matching operation - already supported by most
database search applications (and web search
engines)
  - C-squares mapper utility available via simple
web call
  - Can be used as adjunct to bounding coordinates
searches
  - No proprietary software or hardware required to
implement the system
  - Potentially globally applicable and
interoperable; equally suitable to marine and
terrestrial data.
38Strengths / Weaknesses ...
- Weaknesses
  - WMO square nomenclature (and subdivisions) is
only one of several available (competing?)
taxonomies of space - further effort may be
needed to promote it as a common/interoperable
solution
  - C-squares is not an equal-area system - not
amenable to rapid computation of areas or
distances
  - Coding is inefficient near the poles (needs
larger number of codes for same size areas)
  - Strings can become quite long for large, complex
regions (e.g. Pacific Ocean) - need to be able
to incorporate data reduction using wildcard
method
  - Encoding algorithms not yet developed for line/
polygon vector data, only for points
  - Method can be ambiguous at boundaries of natural
features or administrative areas (since these
will not always coincide neatly with c-square
boundaries).
39Resources Currently Available
- C-squares website www.marine.csiro.au/csquares/ -
includes
  - C-squares draft specification and general
background
  - Sample code for lat/long to c-squares conversion
  - On-line lat/long to c-squares converter
  - How to link to the c-squares mapper
  - Sample presentations, and links to c-squares
enabled metadata records
- Abstracts, presentations from 2 conferences (May,
November 2002)
- Paper describing c-squares submitted for
publication in Oceanography, late 2002
(anticipated publication date March 2003)
40Some Questions to Consider ...
- Does the system have value in the context of the
present audience's needs?
- Who would be potential users?
- What mechanisms could / should be utilized to
promote it?
- Who might have an interest in further concept /
system development, if needed?
- Is there a place for c-squares in formal metadata
standards?