Geographic Information Retrieval (GIR) - PowerPoint PPT Presentation

Geographic Information Retrieval (GIR) Ranking
Methods for Digital Libraries
Ray R. Larson and Patricia Frontiera, School of
Information Management and Systems and College of
Environmental Design, University of California,
Berkeley -- ray_at_sims.berkeley.edu
  • The Geographic Footprint
  • In GIR applications, the geographic footprint is
    typically the only quantitative spatial
    characteristic that is encoded and utilized.
  • The footprint is a geometric representation of
    the extent of the geographic content of the
    information object being described. It is usually
    expressed in geographic coordinates (i.e.
    latitude and longitude).
  • Points maintain a general sense of location but
    not extent or shape
  • Polygons identify location, extent, and shape
    with varying degrees of precision
  • The minimum aligned bounding rectangle (MBR) is
    the most commonly used polygonal spatial
    representation in GIR systems.
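
As a rough illustration of the MBR described above, the sketch below derives a bounding rectangle from the vertices of a polygon footprint. The names `mbr_of`, `Point`, and `MBR` are hypothetical, and the code assumes plain (longitude, latitude) pairs that do not cross the antimeridian; it is not taken from the presentation.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]               # (longitude, latitude), assumed order
MBR = Tuple[float, float, float, float]   # (min_lon, min_lat, max_lon, max_lat)


def mbr_of(points: Iterable[Point]) -> MBR:
    """Return the axis-aligned bounding rectangle of a set of vertices."""
    pts = list(points)
    if not pts:
        raise ValueError("a footprint needs at least one vertex")
    lons = [p[0] for p in pts]
    lats = [p[1] for p in pts]
    return (min(lons), min(lats), max(lons), max(lats))


# Hypothetical triangular footprint.
triangle = [(-123.0, 38.0), (-122.0, 38.0), (-122.5, 39.0)]
print(mbr_of(triangle))
```

The rectangle keeps location and extent but discards shape, which is why it is cheap to store and compare.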
  • Spatial query formation
  • A spatial approach to GIR requires a geographic
    interface to support spatial thinking and query
    formation.
  • Spatial Queries: key issues
  • Communicating with the user: if the user selects
    a place name from a list, what type of geometric
    approximation is used to represent the query
    region (a point, a simple bounding box polygon, or
    a complex polygon)?
  • Level of detail in a graphic interface needs to
    be sufficient to support geographic queries.
  • How can queries for more complex spatial
    characteristics be supported?
  • Density, dispersion, pattern
  • Spatial Query Example: 1st and 2nd generation
    interfaces from the FGDC/NSDI efforts

Geographic data are an extremely important
resource for a wide range of scientists,
planners, policy makers, and analysts who study
natural and planned environments. Notably, the
landscape of geographic analysis has been
changing rapidly from data- and computation-poor
to data- and computation-rich. Developments in
digital electronic technologies, such as
satellites, integrated GPS units, digital
cameras, and miniature sensors, are dramatically
increasing the types and amounts of digitally
available raw geographic data and derived
information products. At the same time, advances
in computer hardware, software and network
technologies continue to improve our ability to
store and analyze these large, complex data sets.
These factors are contributing to a growing
political, social, scientific and economic
awareness of the value of geographic information
and driving new applications for its use. In
response to this, geographic digital libraries
that specialize in providing access to these data
are growing in number, collection size, and
sophistication. Moreover, mainstream digital
libraries, i.e. those that deal primarily with
text materials, are increasingly considering
geographic access methods for information
resources that, while not specifically about
geographic features, have important geographic
characteristics. Simply stated, most of the
objects in digital libraries are, to a greater or
lesser extent, about or related to particular
places on or near the surface of the Earth.

Place name georeferencing is extremely
effective because names are the primary means by
which people refer to geographic locations.
However, place names have well-documented lexical
and geographical problems. Lexical problems
include lack of uniqueness, alternate names or
spellings, and name changes. Geographical
problems include boundaries that change over
time, places with ambiguous boundaries, and
geographic features or areas of interest without
known place names. Unlike place names,
geographic coordinate representations provide an
unambiguous and persistent method for locating
geographic areas or features. However, the use
of coordinates presents many challenges in terms
of storage, indexing, processing and user
interface design that have only recently begun to
be investigated in the context of geographic
information retrieval (GIR).
  • Spatial Similarity Measures: Matching and Spatial
    Ranking
  • Spatial similarity can be considered an
    indicator of relevance: documents whose spatial
    content is more similar to the spatial content of
    the query will be considered more relevant to the
    information need represented by the query.
  • Need to consider both qualitative, non-geometric
    spatial attributes and quantitative, geometric
    spatial attributes
  • Three basic approaches to spatial similarity
    measures and ranking:
  • Method 1: Simple Overlap
  • Candidate geographic information objects (GIOs)
    that have any overlap with the query region are
    retrieved.
  • Included in the result set are any GIOs that are
    contained within, overlap, or contain the query
    region.
  • The spatial score for all GIOs is either relevant
    (1) or not relevant (0).
  • The result set cannot be ranked:
  • topological relationship only, no metric
    refinement
  • Method 2: Topological Overlap
  • Spatial searches are constrained to only those
    candidate GIOs that either
  • are completely contained within the query region,
  • overlap with the query region,
  • or contain the query region.
  • Each category is exclusive and all retrieved
    items are considered relevant.
  • The result set cannot be ranked:
  • categorized topological relationship only,
  • no metric refinement
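
The Simple Overlap and Topological Overlap methods can be sketched for MBR footprints roughly as follows. This is a minimal illustration, not the presenters' implementation: the function names are hypothetical, rectangles are assumed to be `(min_x, min_y, max_x, max_y)` tuples, rectangles that merely touch along an edge are assumed not to overlap, and identical rectangles are arbitrarily reported as "within".

```python
from typing import Optional, Tuple

MBR = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def overlaps(a: MBR, b: MBR) -> bool:
    """True when the two rectangles share some interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def contains(outer: MBR, inner: MBR) -> bool:
    """True when `inner` lies entirely inside `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def simple_overlap_score(query: MBR, gio: MBR) -> int:
    """Simple Overlap: 1 (relevant) for any overlap, otherwise 0."""
    return 1 if overlaps(query, gio) else 0


def topological_category(query: MBR, gio: MBR) -> Optional[str]:
    """Topological Overlap: one exclusive category, or None if not retrieved."""
    if not overlaps(query, gio):
        return None
    if contains(query, gio):
        return "within"    # GIO completely contained within the query region
    if contains(gio, query):
        return "contains"  # GIO contains the query region
    return "overlap"       # partial overlap only


query = (0.0, 0.0, 10.0, 10.0)
for gio in [(2, 2, 4, 4), (5, 5, 15, 15), (-5, -5, 20, 20), (20, 20, 30, 30)]:
    print(gio, simple_overlap_score(query, gio), topological_category(query, gio))
```

Neither function yields a graded score, which is why neither result set can be ranked.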
  • Method 3: Degree of Overlap
  • Candidate geographic information objects (GIOs)
    that have any overlap with the query region are
    retrieved.
  • A spatial similarity score is determined based on
    the degree to which the candidate GIO overlaps
    with the query region.
  • The greater the overlap with respect to the query
    region, the higher the spatial similarity score.
  • This method provides a score by which the result
    set can be ranked
  • topological relationship: overlap
  • metric refinement: area of overlap
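
A degree-of-overlap ranking can be sketched as below. The score used here, overlap area divided by query area, is one plausible reading of "overlap with respect to the query region" and is an assumption, as are the function names and the treatment of coordinates as planar.

```python
from typing import Dict, List, Tuple

MBR = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def area(r: MBR) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def overlap_area(a: MBR, b: MBR) -> float:
    """Area of the intersection of two rectangles (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def degree_of_overlap(query: MBR, gio: MBR) -> float:
    """Fraction of the query region covered by the candidate GIO."""
    q = area(query)
    if q == 0:
        raise ValueError("query region must have non-zero area")
    return overlap_area(query, gio) / q


def rank(query: MBR, gios: Dict[str, MBR]) -> List[Tuple[str, float]]:
    """Retrieve overlapping GIOs and sort them by descending score."""
    scored = [(name, degree_of_overlap(query, r)) for name, r in gios.items()]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)


query = (0.0, 0.0, 10.0, 10.0)
candidates = {
    "a": (0.0, 0.0, 5.0, 10.0),
    "b": (0.0, 0.0, 10.0, 10.0),
    "c": (9.0, 9.0, 20.0, 20.0),
    "d": (20.0, 20.0, 30.0, 30.0),
}
print(rank(query, candidates))
```

Unlike the binary methods, the score orders the retrieved items, and candidates with no overlap drop out of the result set entirely.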
  • Our Approach: a probabilistic estimate of the
    probability of relevance, based on logistic
    regression from a sample of data with
    relevance judgements.
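
To make the logistic regression idea concrete, the sketch below applies the standard logistic function to a weighted sum of feature values. The function name and the coefficient values are hypothetical placeholders; the presentation does not report fitted coefficients, so the numbers here only illustrate the form of the calculation.

```python
import math
from typing import Sequence


def probability_of_relevance(features: Sequence[float],
                             coefficients: Sequence[float],
                             intercept: float) -> float:
    """Logistic model: P(relevant) = 1 / (1 + exp(-(b0 + sum(bi * xi))))."""
    if len(features) != len(coefficients):
        raise ValueError("need exactly one coefficient per feature")
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Placeholder coefficients, NOT values from the presentation.
b0, b = -2.0, [3.0, 1.5]
low = probability_of_relevance([0.1, 0.1], b, b0)
high = probability_of_relevance([0.9, 0.9], b, b0)
print(round(low, 3), round(high, 3))
```

In practice the coefficients would be estimated from the sample of judged query and document pairs, and candidates would then be ranked by the resulting probability.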
  • Test Data
  • 2554 metadata records indexed by 322 unique
    geographic regions (represented as MBRs) and
    associated place names.
  • 2072 records (81%) indexed by 141 unique CA place
    names
  • 881 records indexed by 42 unique counties (out of
    a total of 46 unique counties indexed in CEIC
    collection)
  • 427 records indexed by 76 cities (of 120)
  • 179 records by 8 bioregions (of 9)
  • 3 records by 2 national parks (of 5)
  • 309 records by 11 national forests (of 11)
  • 3 records by 1 regional water quality control
    board region (of 1)
  • 270 records by 1 state (CA)
  • 482 records (19%) indexed by 179 unique
    user-defined areas (approx 240) for regions within
    or overlapping CA
  • 12% represent onshore regions (within the CA
    mainland)
  • 88% (158 of 179) represent offshore or coastal
    regions
  • Geographic Approximations for CA Counties, UDAs,
    and training sample
  • X1 = area of overlap(query region, candidate GIO)
    / area of query region
  • X2 = area of overlap(query region, candidate GIO)
    / area of candidate GIO
  • X3 = 1 - abs(fraction of overlap region that is
    onshore - fraction of candidate GIO that is
    onshore)
  • where the range for all variables is 0 (not
    similar) to 1 (same)
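
The three variables can be computed for MBR footprints roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, coordinates are treated as planar, and the two onshore fractions are passed in as precomputed inputs because deriving them requires a coastline dataset that is outside the scope of this example.

```python
from typing import Tuple

MBR = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def area(r: MBR) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def overlap_area(a: MBR, b: MBR) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def similarity_variables(query: MBR, gio: MBR,
                         overlap_onshore: float,
                         gio_onshore: float) -> Tuple[float, float, float]:
    """Return (X1, X2, X3), each in the range 0..1.

    overlap_onshore and gio_onshore are fractions (0..1) supplied by the
    caller; how they are measured is not specified here.
    """
    ov = overlap_area(query, gio)
    x1 = ov / area(query)                       # share of the query covered
    x2 = ov / area(gio)                         # share of the GIO covered
    x3 = 1.0 - abs(overlap_onshore - gio_onshore)
    return x1, x2, x3


query = (0.0, 0.0, 10.0, 10.0)
gio = (5.0, 0.0, 15.0, 10.0)
print(similarity_variables(query, gio, overlap_onshore=0.75, gio_onshore=0.25))
```

Identical query and candidate rectangles with matching onshore fractions give (1, 1, 1), the "same" end of the range.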
  • Geographic Information Retrieval (GIR)
  • Definitions: Geographic information retrieval
    (GIR) is concerned with spatial approaches to the
    retrieval of geographically referenced, or
    georeferenced, information objects (GIOs):
  • information objects that are about specific
    regions or features on or near the surface of the
    Earth.
  • Geospatial data are a special type of
    georeferenced information that encodes a specific
    geographic feature or set of features along with
    associated attributes
  • maps, air photos, satellite imagery, digital
    geographic data, etc.
  • Georeferencing and GIR
  • Within a GIR system, e.g., a geographic digital
    library, information objects can be georeferenced
    by place names or by geographic coordinates (i.e.
    longitude and latitude)
  • GIR is not GIS
  • GIS is concerned with spatial representations,
    relationships, and analysis at the level of the
    individual spatial object or field.
  • GIR is concerned with the retrieval of geographic
    information resources (and geographic information
    objects at the set level) that may be relevant to
    a geographic query region.
  • Spatial Approaches to GIR

Geodata.gov
NSDI Clearinghouse
  • 42 of 58 counties referenced in the test
    collection metadata
  • 10 counties randomly selected as query regions to
    train LR model
  • 32 counties used as query regions to test model

The Geodata.gov site provides better support for
a query on wetlands near Petaluma because of the
increased cartographic detail that appears as the
user zooms in. (You can't even find Petaluma on
the NSDI site, and you can get even more lost if
you zoom in further.)
  • These results suggest:
  • Convex hulls perform better than MBRs
  • Expected result, given that the CH is a
    higher-quality approximation
  • A probabilistic ranking based on MBRs can perform
    as well as, if not better than, a
    non-probabilistic ranking method based on convex
    hulls
  • Since any approximation other than the MBR
    requires great expense, this suggests that the
    exploration of new ranking methods based on the
    MBR is a good way to go.
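
The difference in approximation quality between a convex hull and an MBR can be illustrated by comparing their areas for the same footprint. The sketch below uses Andrew's monotone chain algorithm for the hull and the shoelace formula for area; the function names and the sample points are hypothetical and planar coordinates are assumed.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices: List[Point]) -> float:
    """Shoelace formula for a simple polygon."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def mbr_area(points: List[Point]) -> float:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


footprint = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (1.0, 1.0)]
hull = convex_hull(footprint)
print(polygon_area(hull), mbr_area(footprint))
```

For this triangular footprint the hull covers half the area of the MBR, so the MBR includes a large region that the object does not actually occupy. The hull is also more costly to compute and compare than four coordinates.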

Acknowledgements
This research was sponsored at U.C. Berkeley by
the National Science Foundation and the Joint
Information Systems Committee (UK) under the
International Digital Libraries Program award
IIS-99755164. Additional support was provided by
the Institute for Museum and Library Services as
part of the Going Places in the Catalog project.
Conservative Approximations