Databases and Database Design - PowerPoint PPT Presentation

1 / 32
About This Presentation
Title:

Databases and Database Design

Description:

'Software that controls the organization, storage, retrieval, security and ... but slightly different) char, varchar, number, float, text, BLOB (image), etc. ... – PowerPoint PPT presentation

Number of Views:53
Avg rating:3.0/5.0
Slides: 33
Provided by: ggr28
Category:

less

Transcript and Presenter's Notes

Title: Databases and Database Design


1
Databases andDatabase Design
  • Overview of key relational database concepts and
    how they relate to GIS.

2
Definitions
  • Database Management Systems (DBMS)
  • Software that controls the organization,
    storage, retrieval, security and integrity of
    data in a database. It accepts requests from the
    application and instructs the operating system to
    transfer the appropriate data. (Computer Desktop
    Encyclopedia, 2007)
  • DBMS may work with traditional programming
    languages (COBOL, C, etc.) or they may include
    their own programming language for application
    development
  • Key features Data Security Data Integrity
    Interactive Query Interactive Data Manipulation
    Data Independence
  • May be Network, Hierarchical, Object, Relational.
    Relational is by far the most commonly-used and
    well-established, and handles most real-world
    data management problems very well.

3
Definitions
  • Relational Database Management Systems (RDBMS)
  • A database that maintains a set of separate,
    related files (tables), but combines data
    elements from the files for queries and reports
    when required. The concept was developed in 1970
    by Edgar Codd, whose objective was to accommodate
    a user's ad hoc request for selected data. See
    Codd Article for details. (Computer Desktop
    Encyclopedia, 2007)
  • Data stored in separate tables, each containing
    tabular data like a spreadsheet, joined together
    as needed.
  • In the early days of RDBMS, many vendors claimed
    to offer relational databases when they did not
    Codd came up with 12 rules defining the
    requirements for a database system to be truly
    relational.
  • Concept of data normalization is key to
    understanding how relational databases are
    designed.

4
DBMS Perspective
5
Tables
  • Relational tables have these properties
  • Values Are Atomic
  • Column Values Are of the Same Kind
  • Each Row is Unique
  • The Sequence of Columns is Insignificant
  • The Sequence of Rows is Insignificant
  • Each column must have a unique name (within the
    table)
  • Represents a single entity.

6
Columns
  • Table columns have these properties
  • They have a data type (similar to variables, but
    slightly different) char, varchar, number,
    float, text, BLOB (image), etc.
  • Column values should be independent from each
    other.
  • Values may be required (not null) or nullable.
    (Some databases differentiate between null and
    empty).
  • Columns may be indexed to improve access speed.
  • A column may be used as the basis of the order of
    the physical data (clustered index).

7
Normalization
  • The most elementary structure for a database is a
    single table, which corresponds to a spreadsheet
    or FeatureClass.
  • In this model there is one kind of entity
    (consisting of rows) which has a uniform set of
    attributes (consisting of columns).
  • Suppose our database is being used to store
    information on land parcels. A basic
    non-normalized structure might look something
    like see Access sample.

8
Normalization
  • While this might adequately capture the data we
    need, it has some drawbacks
  • There will be lots of redundant data if many
    parcels are added. The township, county, and
    owner names, for example, will all be replicated
    for each row where these values are the same.
  • What happens if a single parcel is zoned for
    multiple uses? Or if one parcel has several
    owners? One solution would be to replicate the
    parcel row and then put in some kind of indicator
    that this is referring to the same parcel, but a
    different zoning or owner.
  • How do you know for sure which parcel is which?
    One solution would be to ensure that each
    description is unique.
  • Update Access sample extra columns

9
Normalization
  • Even though this solves some of the initial
    problems, it is obviously still very limited
    why only three coverage types, for example? How
    would one manage all the redundant data? What if
    we wanted to store addresses for each owner? The
    table would quickly become huge and unusable.
  • The solution is to begin normalizing the data.
    You can look at this as a three-step process
    implementing each of the Normal Forms.
  • First normal form rules
  • A row of data cannot contain repeating groups of
    similar data (atomicity) and
  • Each row of data must have a unique identifier or
    primary key.
  • Update Access sample remove redundant columns

10
Normalization
  • Removing columns solves the first problem, but,
    Description no longer uniquely identifies each
    row. What identifies each row now? It is a
    combination of the parcel description and all the
    other columns that used to be redundant. Each
    row now represents a unique Parcel / Owner / Land
    Cover / Zoning type.
  • We are now ready to apply the second normal form
    rule, which states that No attribute can be
    dependent on only a portion of the primary key.
    Every column must depend only on the entire
    primary key if it is dependent on one or more
    other columns, these should be moved into new
    tables.
  • The dead giveaway for a table not being in second
    normal form (2NF) is that column values are
    required to be repeated in multiple rows. In our
    example, one parcel with five owners will require
    that the description column be identical for each
    of five rows. Therefore, it fails the second
    normal form criteria.
  • Update Access sample add primary key fields,
    create new tables

11
Normalization
  • The prior step took us a long way toward a
    normalized database. The last main step is third
    normal form, which requires that there be No
    dependencies within non-key attributes.
  • Suppose we wanted to add address information for
    each of our owners. We might add address, city,
    state, and zip to the owner table. Technically,
    however, city and state should be in a separate
    table since they are dependent on zip (in real
    life you might not bother with this).
  • Another example might be storing the total
    parcels owned by the owner in the owner table.
    This would introduces a dependency between
    non-key attributes, and is a form of
    de-normalization.
  • Typically, when developing a database, you will
    create a completely normalized data model, then
    de-normalize it as required by the application.
  • Usually, any de-normalization steps will require
    extra application code to maintain the redundant
    data.

12
Primary Keys
  • The primary key is an attribute or a set of
    attributes that uniquely identify a specific
    instance of an entity.
  • Every entity in the data model must have a
    primary key whose values uniquely identify
    instances of the entity.
  • To qualify as a primary key for an entity, an
    attribute must have the following properties
  • it must have a non-null value for each instance
    of the entity.
  • the value must be unique for each instance of an
    entity.
  • the values must not change or become null during
    the life of each entity instance

13
Composite and Artificial Keys
  • Sometimes it requires more than one attribute to
    uniquely identify an entity. A primary key that
    made up of more than one attribute is known as a
    composite key. In our example, OwnerID and
    ParcelID are a composite key in our joining table
    between Owner and Parcel.
  • An artificial key is one that has no meaning to
    the business or organization. These can be very
    useful when no attribute has all the primary key
    properties, or the primary key is complicated.

14
Foreign Keys
  • A foreign key is an attribute that completes a
    relationship by identifying the parent entity.
    Foreign keys provide a method for maintaining
    integrity in the data (referential integrity) and
    for navigating between different instances of an
    entity. Every relationship in the model must be
    supported by a foreign key.
  • Foreign keys make possible establishing
    relationships in an Entity-Relationship Diagram
    see samples below.
  • Usually you can design a database with an ERD
    tool. SQL Server, for example, ships with a
    2-way tool for visually designing your database.
  • Typically, an RDBMS will let you specify whether
    dependent entities are deleted (cascade delete)
    when a parent is deleted, or disallowed. This
    ensures the referential integrity of the
    database.

15
Entity-Relationship Diagrams
  • Simple Access diagram

16
Entity-Relationship Diagrams
A typical ERD for a small application
17
Entity-Relationship Diagrams
A typical table design
18
SQL
  • Structured Query Language is the interface you
    use to communicate with an RDBMS.
  • It consists of DML data manipulation language
    and DDL data definition language.
  • SQL is standardized, but different DB vendors
    have different flavors and extensions (Oracle
    Spatial, for example, adds spatial keywords to
    SQL).
  • SQL is not really a full featured language like
    C, although most database vendors have SQL-based
    languages like PL/SQL or TSQL that let you embed
    SQL statements directly in procedural code.
  • Typically, a program will interact with a
    database by submitting SQL statements, one by
    one, to a database using a data access layer that
    sends the requests to the database and returns
    the results to the program.

19
SQL
  • DDL statements - examples
  • CREATE TABLE - creates a new database table
  • ALTER TABLE - alters (changes) a database table
  • DROP TABLE - deletes a database table
  • CREATE INDEX - creates an index (search key)
  • DROP INDEX - deletes an index

20
SQL
  • DML statements - example
  • SELECT - extracts data from a database table
  • UPDATE - updates data in a database table
  • DELETE - deletes data from a database table
  • INSERT INTO - inserts new data into a database
    table

21
SQL
  • Sample all the rows in a table
  • SELECT from LandParcel
  • Order by a field
  • SELECT from LandParcel ORDER BY TaxID
  • Add a where clause
  • SELECT from LandParcel
  • WHERE ParcelId lt 5
  • ORDER BY TaxID

22
SQL
  • Joining two tables
  • SELECT LandParcel.ParcelID, ParcelLandCover.LandCo
    ver, LandParcel.ParcelShape, LandParcel.Descriptio
    n
  • FROM LandParcel INNER JOIN ParcelLandCover ON
    LandParcel.ParcelID ParcelLandCover.ParcelId
  • Joining three tables
  • SELECT LandParcel., Owner.
  • FROM Owner INNER JOIN (LandParcel INNER JOIN
    OwnerParcel ON LandParcel.ParcelID
    OwnerParcel.OwnerId) ON Owner.OwnerId
    OwnerParcel.OwnerId
  • Others
  • SELECT DISTINCT LandCoverType from
    ParcelLandCover
  • SELECT COUNT() from Owner
  • There are many many more variations on SELECT
    statements
  • (for example, see this discussion.)

23
Data Access Layers
  • There are many ways of accessing relational
    database via code, such as ODBC, ADO, ADO.NET,
    JDBC, OLE DB (to see a list of only Microsofts
    for example, see http//msdn2.microsoft.com/en-us/
    library/ms810810.aspx.
  • ADO.NET is commonly used in C
  • SQLServerConnection Conn new
    SQLServerConnection("hostnc-starport1433 User
    IDtest01Passwordtest01
  • Database NameTest")
  • try
  • Conn.Open()
  • catch (SQLServerException ex)
  • Console.WriteLine(ex.Message) return
  • try
  • string strSQL "SELECT ename FROM emp WHERE
    salgt50000"
  • SQLServerCommand DBCmd new SQLServerCommand(st
    rSQL, Conn)

24
Indexes
25
Other Features of RDBMS
  • Triggers
  • Transaction Support
  • Stored Procedures
  • Views
  • User-Defined Functions
  • User-Defined Data Types
  • Extended features
  • Full-text search
  • Spatial data
  • Replication
  • Others

26
Spatial Databases
  • GIS Systems are often built on top of RDBMS
    systems.
  • Requirements are more involved, including
    referential integrity based on topological
    geometrical relationships, not just foreign-key
    constraints.
  • OGC has a spec for simple features storage very
    similar to the simple features spec used for geo
    tools http//www.opengeospatial.org/standards/sfs
  • ESRIs geodatabases can comply with the Simple
    Features Spec for SQL.
  • Personal geodatabase is MS Access SDE database
    can be server database like Oracle or SQL Server

27
Simple Features SQL Spec
  • Key points from the spec
  • In a SQL-implementation, a collection of features
    of a single type are stored as a "feature table"
    usually with some geometric valued attributes
    (columns).
  • Each feature is primarily represented as a row in
    this feature table, and described by that and
    other tables logically linked to this base
    feature table using standard SQL techniques.
  • The non-spatial attributes of features are mapped
    onto columns whose types are drawn from the set
    of SQL data types, potentially including SQL3
    user defined types (UDT).
  • The spatial attributes of features are mapped
    onto columns whose types are based on the
    geometric data types for SQL defined in this
    standard and its references.
  • Feature-table schemas are described for two sorts
    of SQL-implementations implementations based a
    more classical SQL relational model using only
    the SQL predefined data types and SQL with
    additional types for geometry.
  • In any case, the geometric representations have a
    set of SQL accessible routines to support
    geometric behavior and query.

28
Simple Features SQL Spec
  • Key points from the spec, continued
  • In an implementation based on predefined data
    types, a geometry-valued column is implemented
    using a "geometry ID" reference into a geometry
    table.
  • A geometry value is stored using one or more rows
    in a single geometry table all of which have the
    geometry ID as part of their primary key. The
    geometry table may be implemented using standard
    SQL numeric types or SQL binary types schemas
    for both are described in this standard.
  • The term SQL with Geometry Types is used to
    refer to a SQL-implementation that has been
    extended with a set of Geometry Types.
  • In this environment, a geometry-valued column is
    implemented as a column whose SQL type is drawn
    from this set of Geometry Types. The mechanism
    for extending the type system of an SQL
    implementation is through the definition of user
    defined User Defined Types.

29
Spatial Databases
  • Relational view of spatial data

30
Spatial Databases
  • Sample DDL with spatial columns

31
Spatial Databases
  • Sample DML with spatial columns

32
Spatial Databases
  • Mechanics of a spatial query
Write a Comment
User Comments (0)
About PowerShow.com