Choosing the correct data movement technology - PowerPoint PPT Presentation

1 / 79
About This Presentation
Title:

Choosing the correct data movement technology

Description:

Underlying technology is normally Two Phase Commits Sybase products: EAServer Not really data movement but is appropriate in some systems. – PowerPoint PPT presentation

Number of Views:177
Avg rating:3.0/5.0
Slides: 80
Provided by: Stephen662
Category:

less

Transcript and Presenter's Notes

Title: Choosing the correct data movement technology


1
EM416 Choosing the Correct Data Movement
Technology
Chris Kleisath Director of Engineering iAnywhere
Solutions kleisath_at_ianywhere.com
2
Overview
  • When choosing a data movement technology you need
    to consider
  • The business requirements for the distributed
    database.
  • The technological limitations of your
    environment.
  • The development and administrative resources
    available.

3
Enterprise Data Movement
Wired and Wireless Communication Links
4
Distributed Systems
  • Using any data movement technology means we have
    a distributed database.
  • This almost always implies some form of
    distributed application.

5
What is a Distributed System?
  • C.J. Dates working definition
  • A distributed database system consists of a
    collection of sites, connected together via some
    sort of network, in which
  • Each site is a database system in its own right
  • Sites have agreed to work together (if
    necessary), so that a user at any site can access
    data anywhere in the network exactly as if the
    data were all stored at the users own site.

6
Distributed SystemsPractical Factors
  • Not all systems require that all data be
    available to all sites.
  • Not all systems require that all data be
    consistent between all sites all of the time.
  • The degree to which your system must meet the
    ideal definition is the single biggest factor in
    choosing your data movement technology.

7
Issues when distributing data
  • Local autonomy
  • Data partitioning (fragmentation)
  • Consistency
  • Transaction control
  • Accessibility (connection)
  • Topology

8
Local autonomy
  • Each site should operate independently of the
    other sites.
  • No site should depend on another site for its
    successful functioning.
  • A centralized database provides the lowest level
    of local autonomy.
  • Decentralized systems provide the highest level
    of local autonomy.

9
Data Partitioning
  • Also known as fragmentation.
  • Only the data needed by a site is present at the
    site.
  • The database at a site is a complete subset of
    the data.
  • Some data will need to be duplicated between
    sites.

10
Data PartitioningUpdate Anywhere
  • Primary keys must be unique across the entire
    distributed system.
  • If multiple sites insert into the same table.
  • Requires a conflict detection and resolution
    mechanism.
  • If multiple sites are able to change the same row.

11
Tight vs. Loose Consistency
  • Which version of the data is being used?

Waterloo
Paris
12
Consistency
  • Tight consistency requires all data to be in a
    consistent state.
  • Loose consistency permits data to be
    out-of-date.
  • Latency is the measure of how long it takes the
    data to become consistent.
  • In some cases it is never consistent since there
    are always changes that have not been moved.

13
Transaction Control
  • Your chosen technology must pass the ACID test
  • Atomicity, Consistency, Isolation, Durability
  • Only committed data should move.
  • Committed data must move.
  • Failure to successfully move committed data must
    be detectable.
  • Changes must be applied in the same order on all
    databases.

14
Accessibility
  • What kind of network do you have between the
    sites?
  • High-speed LAN/WAN
  • Low-speed Dial-up (RAS)
  • Wireless
  • Indirect (email, ftp)
  • Internet (HTTP)
  • Sneaker-net

15
Topology
  • What kind of relationship exists between the
    sites.
  • Peer-to-peer
  • Each site can transfer data to any other site.
  • No centralized master copy can exist.
  • Conflict resolution is extremely difficult
  • There is no place to detect and resolve the
    conflict

16
Topology
  • Hierarchical
  • Each site passes data up and down the hierarchy.
  • A central master copy (consolidated database)
    exists.
  • Data must pass through the consolidated to move
    to another site.
  • Conflict detection and resolution is implemented
    on the consolidated.

17
Other Issues
  • Number of sites
  • Some technologies are better suited to mass
    deployment.
  • Vendors
  • Are the databases at each site the same product?
  • Is the technology commercially available and
    supported?

18
FactorsSummary
  • Each of the following factors will influence your
    choice of data movement technologies
  • Local autonomy
  • Data partitioning (fragmentation)
  • Consistency
  • Transaction control
  • Accessibility (connection)
  • Topology

19
Types of Data Movement
  • All technologies can be categorized as one of
  • Online
  • Synchronization
  • Replication

20
Online
  • Changes are made simultaneously on all
    databases.

Please wait while your account is being updated
Withdraw 100
Waterloo
Paris
21
Online
  • In its simplest form the application updates
    all of the databases directly.
  • Underlying technology is normally Two Phase
    Commits
  • Sybase products
  • EAServer
  • Not really data movement but is appropriate in
    some systems.

22
Online CharacteristicsLocal Autonomy
  • Very low level of local autonomy.
  • If one site is down the entire system is down.

X
Sorry the System is Unavailable
Withdraw 100
Waterloo
Paris
23
Online CharacteristicsData Partitioning
  • Data can be partitioned as required.
  • If the data is partitioned the application must
    update the row(s) everywhere.
  • Since transactions are applied at all databases
    simultaneously no primary key or conflict issues
    arise.

24
Online CharacteristicsConsistency
  • Use when tight consistency is an absolute
    requirement.
  • Transactions will succeed or fail on all
    databases.

25
Online CharacteristicsTransaction Control
  • A Distributed Transaction Server (DTS) should be
    used.
  • Ensures the transaction is applied on all sites
    or not at all.
  • Very expensive to code yourself.
  • Both ASA and ASE provide support for a DTS

26
Online CharacteristicsAccessibility
  • Requires a reliable network connection between
    sites.
  • Transactions will fail if one database is
    unavailable.
  • Application speed will be affected by network
    speed.

27
Online Characteristics
  • Typically a peer-to-peer topology.
  • Since all databases are updated at once no master
    copy is required.

28
Online CharacteristicsOther Issues
  • Simple to understand
  • Looks just like a centralized database.
  • Very few sites can be supported
  • Consider the cost of updating many databases at
    once.
  • Vendors
  • Heterogeneous environments are easily supported.

29
Synchronization
  • Current state of the data is moved between
    databases.
  • Can be a complete refresh or only the rows that
    have changed.
  • Sybase products
  • MobiLink

30
Synchronization
1234
10
UPDATE Product SET qty_oh 8WHERE sku_key 1234
1234
10
31
Synchronization CharacteristicsLocal Autonomy
  • High local autonomy
  • Site database must have all of the data required
    for the application to run.

32
Synchronization CharacteristicsData Partitioning
  • Data is usually partitioned.
  • Each site has common data and site specific data.
  • Update anywhere requires
  • Unique primary keys.
  • Conflict detection and resolution mechanism.

33
Synchronization CharacteristicsConsistency
  • Low to high consistency.
  • Data is only consistent immediately after
    synchronization.
  • Frequency of synchronization affects level of
    consistency but in all cases there is some
    latency.

34
Synchronization CharacteristicsTransaction
Control
  • Transaction boundaries are not maintained.
  • Some operation sequences can not be synchronized.
    (i.e. insert then delete of a row with the same
    primary key value)
  • Most synchronization technologies batch the
    operations.
  • e.g. all deletes, then inserts, then updates

35
Synchronization CharacteristicsAccessibility
  • Requires a stable network connection during the
    synchronization process.
  • Connection speed affects the amount of data that
    can be reasonably synchronized.

36
Synchronization CharacteristicsTopology
  • Both peer-to-peer and hierarchical topologies are
    possible.
  • Peer-to-peer is difficult if update anywhere is
    permitted.
  • Which copy of the data is correct?
  • Who resolves an update conflict?

37
Synchronization CharacteristicsOther Issues
  • Heterogeneous environments can be supported.
  • Be aware of compatibility issues. E.g. Oracle
    allows 1 varchar column/table, ASA has no limit.
  • Because each site synchronizes independently many
    sites can be supported.

38
MobiLink
ASA, ASE, Microsoft, Oracle, IBM
Serial
HTTP, TCPIP
HotSync, Wireless
ASA, PalmOS, CE, Pagers, Phones
39
MobiLink Characteristics
  • Complete local autonomy.
  • Complete control over data partitioning on the
    consolidated through the use of scripts.
  • Uses the consolidated databases scripting
    language or Java.
  • No partitioning allowed on the remote.

40
MobiLink Characteristics
  • Session based.
  • Only changed records are synchronized.
  • Connection only required while synchronizing.
  • Bi-directional by default.
  • Medium to high latency.
  • Low to medium data volume.

41
MobiLink Characteristics
  • Hierarchical topology.
  • Consolidated can be any ODBC-based database
  • Sybase, Microsoft, Oracle, IBM
  • ASA and/or UltraLite remotes.
  • Optimized for thousands of remotes.
  • Scalable based on consolidated databases
    capabilities.

42
MobiLink Synchronization Components
Consolidated Database Server
Consolidated Data Store
MobiLink Client(ASA or UltraLite)
Remote Database Server (ASA or UltraLite)
MobiLinkServer
Remote Data Store
43
MobiLink Synchronization Server
  • Provides interface between consolidated database
    and remote server.
  • Works with ODBC-based host databases.
  • Responsible for ensuring the synchronization
    process completes.
  • Supports multiple simultaneous synchronizations.

44
MobiLink Consolidated Synchronization Logic
  • SQL statements executed against the consolidated
    database.
  • Written in language of consolidated database or
    Java.
  • Guides the synchronization server.
  • Controls the flow of data in both directions.
  • Handles conflicts.

45
MobiLink Remote Synchronization Logic
  • ASA and UltraLite keep track of changes to the
    data.
  • A synchronization component is provided to
  • Scan for changes to create the upload stream
  • Receive the download stream and apply the
    changes to the remote

46
Replication
  • Transactions (changes) are moved between the
    databases.
  • Uses store and forward mechanism.
  • Site(s) must have a common starting point.
  • Sybase Products
  • SQL Remote
  • Replication Server

47
Replication
1234
10
UPDATE Product SET qty_oh 9WHERE sku_key 1234
UPDATE Product SET qty_oh 8WHERE sku_key 1234
1234
10
48
Replication CharacteristicsLocal Autonomy
  • High local autonomy.
  • Database must have all of the data required for
    the application to run.

49
Replication CharacteristicsData Partitioning
  • Data is usually partitioned.
  • Each site has common data and site specific data.
  • Update anywhere requires
  • Unique primary keys.
  • Conflict detection and resolution mechanism.

50
Replication CharacteristicsConsistency
  • Low to high consistency is possible.
  • Speed of store and forward messaging system
    determines how consistent the database is.
  • Some latency is always present.

51
Replication CharacteristicsTransaction Control
  • Mechanism must exist to guarantee transactions
    are
  • Sent and applied in the correct order.
  • No transactions are skipped

52
Replication CharacteristicsAccessibility
  • Whether a direct connection is required or not is
    dependant on the latency requirements.
  • Not required in high latency implementations.

53
Replication CharacteristicsTopology
  • Both peer-to-peer and hierarchical topologies can
    be used.
  • Conflict resolution normally requires a
    hierarchical model.

54
Replication CharacteristicsOther Issues
  • Only transactions are moved therefore
  • It is possible to support many sites.
  • Throughput is usually independent of database
    size.

55
Replication Server
Replicate Sites
Primary Sites
  • Adaptive Server

Replication Agent
Replication Server
  • DirectCONNECT
  • (Native drivers)
  • Adaptive Server/Enterprise
  • Adaptive Server/Anywhere
  • Oracle
  • Informix
  • OS/390 DB2
  • Replication Toolkit for MVS
  • DirectCONNECT/
  • Anywhere (ODBC)

56
Replication Server Characteristics
  • Transactions are sent to Replication Server which
    stores and forwards them to the interested sites.
  • Assumes there is normally a high speed
    connection.
  • Near real time (low latency).
  • High data volumes.
  • Moderate number of sites.
  • Heterogeneous databases supported.
  • Uni-directional by default.

57
Replication Server
Replicate Sites
Primary Sites
Replication Agent
Replication Server
58
Replication Server ComponentsPrimary Site
  • Origin of the data being moved.
  • Multiple vendors RDBMS supported.
  • Keeps a record of all transactions. Normally
    this is in the transaction log but it depends on
    the RDBMS.

59
Replication Server ComponentsReplication Agent
  • Scans the primary sites record of transactions.
  • Passes the committed transactions, in the order
    they were applied, to Replication Server.

60
Replication Server ComponentsReplication Server
  • Receives transactions from the Replication
    Agents.
  • Stores the transactions until they are
    successfully applied on all replicate sites.
  • Maintains a connection to all replicate sites.
  • Automatically recovers when a connection is
    dropped and restored
  • Determines which site(s) require the transaction
    and applies them in the correct order.

61
Replication Server ComponentsReplication Server
  • Prevents circular transactions.
  • Provides user programmable function strings to
    allow manipulation of the transaction.
  • Data conversions (e.g. date formats)
  • Conversion of SQL in heterogeneous environments.
  • Detects SQL errors.

62
Replication Server ComponentsReplicate Site
  • Applies SQL sent by Replication Server.
  • A replicate site can also be defined as a primary
    site if bi-directional replication is required.

63
SQL Remote
ASE
ASA
OR
ASA
ASA
64
SQL Remote Characteristics
  • Complete local autonomy.
  • Partitioning based on
  • Column values
  • Subqueries
  • Where clauses
  • Message based (no connection)
  • MAPI (Microsoft), VIM (Lotus), SMTP, FTP and File
  • Very loose consistency.

65
SQL Remote Characteristics
  • Built in guaranteed message delivery.
  • Hierarchical
  • Consolidated is either ASA or ASE
  • Remotes are ASA
  • Homogeneous.
  • Many (thousands) of remotes.
  • Low to medium data volumes.

66
SQL Remote Components
Consolidated DatabaseServer
Message Agent
Consolidated Data Store
Message System
Remote DatabaseServer
Message Agent
Remote Data Store
67
SQL Remote ComponentsConsolidated Database
  • Contains a copy of all data that is replicating.
  • Performs conflict detection and resolution.
  • Transactions are recorded in the transaction log.
  • Maintains additional data in the transaction log
    about what transactions are eligible to replicate
    and how they are partitioned.

68
SQL Remote ComponentsMessage Agent
  • Scans the transaction log for committed
    transactions that are eligible to replicate.
  • Builds messages for the sites that have
    subscribed to the transactions.
  • Interfaces with the message system.
  • Guarantees that transactions are
  • Sent in the correct order.
  • Applied in the correct order and only applied
    once.
  • No transactions are skipped.

69
SQL Remote ComponentsMessage Agent
  • Receives transactions from the message system.
  • Applies the transactions.
  • Prevents circular transactions.
  • Detects update conflicts.
  • Detects SQL errors.

70
SQL Remote ComponentsMessage System
  • Provides the store and forward technology.
  • Support for
  • MAPI
  • SMTP
  • VIM
  • File
  • FTP

71
SQL Remote ComponentsRemote Database
  • Contains data the site is subscribed to.
  • Transactions are recorded in the transaction log.
  • Maintains additional data in the transaction log
    about what transactions are eligible to replicate
    and how they are partitioned.

72
Which Technology should I Choose?
  • Depends on the business requirements and
    technological infrastructure available.
  • Consistency and latency are the biggest factors.

73
Use EAServer When
  • Absolute consistency is required (Zero latency).
  • Transactions must fail when one of the site
    databases is unavailable.
  • There are very few sites.

74
Use MobiLink When
  • Latency is permitted.
  • Local autonomy is required.
  • A reliable connection exists.
  • You have low to medium data volumes.
  • There are heterogeneous databases.
  • You do not require transaction boundaries to be
    maintained.
  • You have a hierarchical topology.
  • You have many remotes.
  • You must know when your changes have been
    synchronized.

75
Use Replication Server When
  • Near real-time consistency is required.
  • You have high data volumes.
  • Local autonomy is required.
  • There are heterogeneous databases.
  • You require transaction boundaries to be
    maintained.
  • You have a peer-to-peer topology.
  • You can implement a hierarchical topology
  • A small number of sites.

76
Use SQL Remote when
  • Latency is not a factor.
  • No direct connection exists (or is not permitted)
    or the connection is unreliable,.
  • Local autonomy is required.
  • You have low to medium data volumes.
  • Homogeneous (ASA ASE) databases.
  • Hierarchical topology.
  • You have many remotes.
  • You require transaction boundaries to be
    maintained.

77
Or Combine them
  • All four products are compatible.
  • Use the strengths of each to solve your business
    problems.
  • Replication Server or EAServer between main
    geographical databases.
  • MobiLink or SQL Remote for mass deployed devices.

78
Summary
  • Sybase has many different methods of maintaining
    data in distributed databases.
  • Your business requirements dictate which method
    is best.
  • All the technologies can be used together on the
    same database.

79
EM 416 Choosing the Correct Data Movement
Technology
Chris Kleisath Director of Engineering iAnywhere
Solutions kleisath_at_ianywhere.com
Write a Comment
User Comments (0)
About PowerShow.com