Databases and Database Design - PowerPoint PPT Presentation

1 / 32

About This Presentation

Title:

Databases and Database Design

Description:

'Software that controls the organization, storage, retrieval, security and ... but slightly different) char, varchar, number, float, text, BLOB (image), etc. ... – PowerPoint PPT presentation

Number of Views:53

Avg rating:3.0/5.0

Slides: 33

Provided by: ggr28

Category:

more less

Transcript and Presenter's Notes

Title: Databases and Database Design

1
Databases andDatabase Design

Overview of key relational database concepts and
how they relate to GIS.

2
Definitions

Database Management Systems (DBMS)
Software that controls the organization,
storage, retrieval, security and integrity of
data in a database. It accepts requests from the
application and instructs the operating system to
transfer the appropriate data. (Computer Desktop
Encyclopedia, 2007)
DBMS may work with traditional programming
languages (COBOL, C, etc.) or they may include
their own programming language for application
development
Key features Data Security Data Integrity
Interactive Query Interactive Data Manipulation
Data Independence
May be Network, Hierarchical, Object, Relational.
Relational is by far the most commonly-used and
well-established, and handles most real-world
data management problems very well.

3
Definitions

Relational Database Management Systems (RDBMS)
A database that maintains a set of separate,
related files (tables), but combines data
elements from the files for queries and reports
when required. The concept was developed in 1970
by Edgar Codd, whose objective was to accommodate
a user's ad hoc request for selected data. See
Codd Article for details. (Computer Desktop
Encyclopedia, 2007)
Data stored in separate tables, each containing
tabular data like a spreadsheet, joined together
as needed.
In the early days of RDBMS, many vendors claimed
to offer relational databases when they did not
Codd came up with 12 rules defining the
requirements for a database system to be truly
relational.
Concept of data normalization is key to
understanding how relational databases are
designed.

4
DBMS Perspective
5
Tables

Relational tables have these properties
Values Are Atomic
Column Values Are of the Same Kind
Each Row is Unique
The Sequence of Columns is Insignificant
The Sequence of Rows is Insignificant
Each column must have a unique name (within the
table)
Represents a single entity.

6
Columns

Table columns have these properties
They have a data type (similar to variables, but
slightly different) char, varchar, number,
float, text, BLOB (image), etc.
Column values should be independent from each
other.
Values may be required (not null) or nullable.
(Some databases differentiate between null and
empty).
Columns may be indexed to improve access speed.
A column may be used as the basis of the order of
the physical data (clustered index).

7
Normalization

The most elementary structure for a database is a
single table, which corresponds to a spreadsheet
or FeatureClass.
In this model there is one kind of entity
(consisting of rows) which has a uniform set of
attributes (consisting of columns).
Suppose our database is being used to store
information on land parcels. A basic
non-normalized structure might look something
like see Access sample.

8
Normalization

While this might adequately capture the data we
need, it has some drawbacks
There will be lots of redundant data if many
parcels are added. The township, county, and
owner names, for example, will all be replicated
for each row where these values are the same.
What happens if a single parcel is zoned for
multiple uses? Or if one parcel has several
owners? One solution would be to replicate the
parcel row and then put in some kind of indicator
that this is referring to the same parcel, but a
different zoning or owner.
How do you know for sure which parcel is which?
One solution would be to ensure that each
description is unique.
Update Access sample extra columns

9
Normalization

Even though this solves some of the initial
problems, it is obviously still very limited
why only three coverage types, for example? How
would one manage all the redundant data? What if
we wanted to store addresses for each owner? The
table would quickly become huge and unusable.
The solution is to begin normalizing the data.
You can look at this as a three-step process
implementing each of the Normal Forms.
First normal form rules
A row of data cannot contain repeating groups of
similar data (atomicity) and
Each row of data must have a unique identifier or
primary key.
Update Access sample remove redundant columns

10
Normalization

Removing columns solves the first problem, but,
Description no longer uniquely identifies each
row. What identifies each row now? It is a
combination of the parcel description and all the
other columns that used to be redundant. Each
row now represents a unique Parcel / Owner / Land
Cover / Zoning type.
We are now ready to apply the second normal form
rule, which states that No attribute can be
dependent on only a portion of the primary key.
Every column must depend only on the entire
primary key if it is dependent on one or more
other columns, these should be moved into new
tables.
The dead giveaway for a table not being in second
normal form (2NF) is that column values are
required to be repeated in multiple rows. In our
example, one parcel with five owners will require
that the description column be identical for each
of five rows. Therefore, it fails the second
normal form criteria.
Update Access sample add primary key fields,
create new tables

11
Normalization

The prior step took us a long way toward a
normalized database. The last main step is third
normal form, which requires that there be No
dependencies within non-key attributes.
Suppose we wanted to add address information for
each of our owners. We might add address, city,
state, and zip to the owner table. Technically,
however, city and state should be in a separate
table since they are dependent on zip (in real
life you might not bother with this).
Another example might be storing the total
parcels owned by the owner in the owner table.
This would introduces a dependency between
non-key attributes, and is a form of
de-normalization.
Typically, when developing a database, you will
create a completely normalized data model, then
de-normalize it as required by the application.
Usually, any de-normalization steps will require
extra application code to maintain the redundant
data.

12
Primary Keys

The primary key is an attribute or a set of
attributes that uniquely identify a specific
instance of an entity.
Every entity in the data model must have a
primary key whose values uniquely identify
instances of the entity.
To qualify as a primary key for an entity, an
attribute must have the following properties
it must have a non-null value for each instance
of the entity.
the value must be unique for each instance of an
entity.
the values must not change or become null during
the life of each entity instance

13
Composite and Artificial Keys

Sometimes it requires more than one attribute to
uniquely identify an entity. A primary key that
made up of more than one attribute is known as a
composite key. In our example, OwnerID and
ParcelID are a composite key in our joining table
between Owner and Parcel.
An artificial key is one that has no meaning to
the business or organization. These can be very
useful when no attribute has all the primary key
properties, or the primary key is complicated.

14
Foreign Keys

A foreign key is an attribute that completes a
relationship by identifying the parent entity.
Foreign keys provide a method for maintaining
integrity in the data (referential integrity) and
for navigating between different instances of an
entity. Every relationship in the model must be
supported by a foreign key.
Foreign keys make possible establishing
relationships in an Entity-Relationship Diagram
see samples below.
Usually you can design a database with an ERD
tool. SQL Server, for example, ships with a
2-way tool for visually designing your database.
Typically, an RDBMS will let you specify whether
dependent entities are deleted (cascade delete)
when a parent is deleted, or disallowed. This
ensures the referential integrity of the
database.

15
Entity-Relationship Diagrams

Simple Access diagram

16
Entity-Relationship Diagrams
A typical ERD for a small application
17
Entity-Relationship Diagrams
A typical table design
18
SQL

Structured Query Language is the interface you
use to communicate with an RDBMS.
It consists of DML data manipulation language
and DDL data definition language.
SQL is standardized, but different DB vendors
have different flavors and extensions (Oracle
Spatial, for example, adds spatial keywords to
SQL).
SQL is not really a full featured language like
C, although most database vendors have SQL-based
languages like PL/SQL or TSQL that let you embed
SQL statements directly in procedural code.
Typically, a program will interact with a
database by submitting SQL statements, one by
one, to a database using a data access layer that
sends the requests to the database and returns
the results to the program.

19
SQL

DDL statements - examples
CREATE TABLE - creates a new database table
ALTER TABLE - alters (changes) a database table
DROP TABLE - deletes a database table
CREATE INDEX - creates an index (search key)
DROP INDEX - deletes an index

20
SQL

DML statements - example
SELECT - extracts data from a database table
UPDATE - updates data in a database table
DELETE - deletes data from a database table
INSERT INTO - inserts new data into a database
table

21
SQL

Sample all the rows in a table
SELECT from LandParcel
Order by a field
SELECT from LandParcel ORDER BY TaxID
Add a where clause
SELECT from LandParcel
WHERE ParcelId lt 5
ORDER BY TaxID

22
SQL

Joining two tables
SELECT LandParcel.ParcelID, ParcelLandCover.LandCo
ver, LandParcel.ParcelShape, LandParcel.Descriptio
n
FROM LandParcel INNER JOIN ParcelLandCover ON
LandParcel.ParcelID ParcelLandCover.ParcelId
Joining three tables
SELECT LandParcel., Owner.
FROM Owner INNER JOIN (LandParcel INNER JOIN
OwnerParcel ON LandParcel.ParcelID
OwnerParcel.OwnerId) ON Owner.OwnerId
OwnerParcel.OwnerId
Others
SELECT DISTINCT LandCoverType from
ParcelLandCover
SELECT COUNT() from Owner
There are many many more variations on SELECT
statements
(for example, see this discussion.)

23
Data Access Layers

There are many ways of accessing relational
database via code, such as ODBC, ADO, ADO.NET,
JDBC, OLE DB (to see a list of only Microsofts
for example, see http//msdn2.microsoft.com/en-us/
library/ms810810.aspx.
ADO.NET is commonly used in C
SQLServerConnection Conn new
SQLServerConnection("hostnc-starport1433 User
IDtest01Passwordtest01
Database NameTest")
try
Conn.Open()
catch (SQLServerException ex)
Console.WriteLine(ex.Message) return
try
string strSQL "SELECT ename FROM emp WHERE
salgt50000"
SQLServerCommand DBCmd new SQLServerCommand(st
rSQL, Conn)

24
Indexes
25
Other Features of RDBMS

Triggers
Transaction Support
Stored Procedures
Views
User-Defined Functions
User-Defined Data Types
Extended features
Full-text search
Spatial data
Replication
Others

26
Spatial Databases

GIS Systems are often built on top of RDBMS
systems.
Requirements are more involved, including
referential integrity based on topological
geometrical relationships, not just foreign-key
constraints.
OGC has a spec for simple features storage very
similar to the simple features spec used for geo
tools http//www.opengeospatial.org/standards/sfs
ESRIs geodatabases can comply with the Simple
Features Spec for SQL.
Personal geodatabase is MS Access SDE database
can be server database like Oracle or SQL Server

27
Simple Features SQL Spec

Key points from the spec
In a SQL-implementation, a collection of features
of a single type are stored as a "feature table"
usually with some geometric valued attributes
(columns).
Each feature is primarily represented as a row in
this feature table, and described by that and
other tables logically linked to this base
feature table using standard SQL techniques.
The non-spatial attributes of features are mapped
onto columns whose types are drawn from the set
of SQL data types, potentially including SQL3
user defined types (UDT).
The spatial attributes of features are mapped
onto columns whose types are based on the
geometric data types for SQL defined in this
standard and its references.
Feature-table schemas are described for two sorts
of SQL-implementations implementations based a
more classical SQL relational model using only
the SQL predefined data types and SQL with
additional types for geometry.
In any case, the geometric representations have a
set of SQL accessible routines to support
geometric behavior and query.

28
Simple Features SQL Spec

Key points from the spec, continued
In an implementation based on predefined data
types, a geometry-valued column is implemented
using a "geometry ID" reference into a geometry
table.
A geometry value is stored using one or more rows
in a single geometry table all of which have the
geometry ID as part of their primary key. The
geometry table may be implemented using standard
SQL numeric types or SQL binary types schemas
for both are described in this standard.
The term SQL with Geometry Types is used to
refer to a SQL-implementation that has been
extended with a set of Geometry Types.
In this environment, a geometry-valued column is
implemented as a column whose SQL type is drawn
from this set of Geometry Types. The mechanism
for extending the type system of an SQL
implementation is through the definition of user
defined User Defined Types.