Title: Designing a Data Exchange - Best Practices
1. Designing a Data Exchange - Best Practices
- Data Exchange Scenarios
  - Sender- vs. receiver-initiated exchanges
- Node Design
- Best Practices
  - Handling Large Transactions
  - State Management
  - Data Services
  - Data Validation
  - Schema Design
2. Data Exchange Scenarios
3. Requesting Data (1 of 3)
- Simple Query
  - Synchronous process
  - Ideal for small data sets
  - Ideal for both ad hoc and planned exchanges
  - Onus is on the requester to initiate the exchange
4. Requesting Data (2 of 3)
- Solicit with Download
  - Asynchronous process
  - Good for larger data sets
  - Data provider can schedule processing of the request
  - Requester can call GetStatus to see whether the data is ready yet
5. Requesting Data (3 of 3)
- Solicit with Submit
  - Asynchronous process
  - Good for larger data sets
  - Data provider submits the results to the requester's node when they are ready
  - Does not require the requester to continuously poll the data provider to see whether the data is ready
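The three request patterns above differ mainly in who waits and who initiates the next step. The following Python sketch simulates them in a single process; `ProviderNode`, its method names, the `Pending`/`Completed` status strings, and the sample data are all hypothetical stand-ins, not the Exchange Network web service API. A worker thread plays the role of the provider's scheduled processing.

```python
import queue
import threading
import time
import uuid

# Hypothetical provider data: facility IDs keyed by state code.
DATA = {"WA": ["FAC1", "FAC2"], "OR": ["FAC3"]}


class ProviderNode:
    """In-process stand-in for a data provider's node (not a real Node API)."""

    def __init__(self):
        self.status = {}   # transaction ID -> "Pending" / "Completed" (hypothetical values)
        self.results = {}  # transaction ID -> result rows
        self.jobs = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def query(self, state):
        # Simple Query: the result comes back in the same synchronous call.
        return list(DATA.get(state, []))

    def solicit(self, state, callback=None):
        # Solicit: accept the request, return a transaction ID immediately,
        # and do the work later on the worker thread.
        txn = str(uuid.uuid4())
        self.status[txn] = "Pending"
        self.jobs.put((txn, state, callback))
        return txn

    def get_status(self, txn):
        return self.status[txn]

    def download(self, txn):
        return self.results[txn]

    def _worker(self):
        while True:
            txn, state, callback = self.jobs.get()
            rows = list(DATA.get(state, []))  # stands in for expensive processing
            self.results[txn] = rows          # store the result before flipping status
            self.status[txn] = "Completed"
            if callback is not None:
                # Solicit with Submit: push the result to the requester's node.
                callback(txn, rows)
            self.jobs.task_done()


provider = ProviderNode()

# 1. Simple Query
query_rows = provider.query("WA")

# 2. Solicit with Download: poll GetStatus (with a deadline), then download.
txn = provider.solicit("WA")
deadline = time.monotonic() + 5
while provider.get_status(txn) != "Completed" and time.monotonic() < deadline:
    time.sleep(0.01)
downloaded_rows = provider.download(txn)

# 3. Solicit with Submit: no polling; wait for the provider's submission.
submitted = {}
arrived = threading.Event()


def on_submit(txn_id, rows):
    submitted[txn_id] = rows
    arrived.set()


txn2 = provider.solicit("OR", callback=on_submit)
arrived.wait(timeout=5)
```

In a real deployment the callback in scenario 3 would be a Submit call to the requester's own node rather than a local function.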
6. Sending Data (1 of 2)
- Simple Submit
  - Very simple and very common process
  - Typical for traditional regulatory flows
  - Hides data, since it is not exposed as a service
7. Sending Data (2 of 2)
- Notify with Download
  - Asynchronous alternative to Simple Submit
  - Receiver can perform the download at a time of its own choosing
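The difference between the two sending patterns is whether the document travels with the call or only a pointer to it does. This is a minimal in-process sketch; `SenderNode`, `ReceiverNode`, and their methods are hypothetical names, and a plain string stands in for the XML document.

```python
import uuid
from collections import deque


class SenderNode:
    """Hypothetical sending node that keeps prepared documents for download."""

    def __init__(self):
        self.outbox = {}  # transaction ID -> document

    def prepare(self, document):
        txn = str(uuid.uuid4())
        self.outbox[txn] = document
        return txn

    def download(self, txn):
        return self.outbox[txn]


class ReceiverNode:
    """Hypothetical receiving node."""

    def __init__(self):
        self.inbox = []               # documents that have actually arrived
        self.notifications = deque()  # (sender, transaction ID) pointers

    def submit(self, document):
        # Simple Submit: the document itself travels with the call.
        self.inbox.append(document)

    def notify(self, sender, txn):
        # Notify: only a pointer arrives; the data stays on the sender's node.
        self.notifications.append((sender, txn))

    def download_pending(self):
        # Run whenever the receiver chooses (e.g. off-hours).
        while self.notifications:
            sender, txn = self.notifications.popleft()
            self.inbox.append(sender.download(txn))


sender, receiver = SenderNode(), ReceiverNode()
receiver.submit("<Document>daily</Document>")       # Simple Submit
txn = sender.prepare("<Document>large</Document>")  # Notify with Download
receiver.notify(sender, txn)
before_download = list(receiver.inbox)  # only the submitted document so far
receiver.download_pending()
```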
8. Data Exchange Scenarios
- Nodes wait for requests
- Nodes may initiate actions (e.g. Submit)
- How can a node do both?
9. Node Components
Example Node Architecture
10. Node Components
- A node can be divided into components, each playing a different role
- The Web Services Interface
  - Acts as a listener for inbound requests and submissions
  - Hosted on a web server (e.g. IIS, WebSphere)
  - Should not do any heavy lifting (e.g. data processing)
11. Node Components (continued)
- Request Processor
  - Performs all data processing
  - Composes XML files for outbound delivery
  - Decomposes and processes inbound XML files
  - Coupled with a scheduler component
    - Enables the node to process Solicit requests at a time of the node administrator's choosing
    - Automatically kicks off outbound processes (e.g. a daily Submit)
  - Flow agnostic
    - Decoupled from specific flow implementations
  - Ideally installed on an application server
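A scheduler coupled to the request processor can hold Solicit requests until an administrator-chosen time. The sketch below assumes a single daily processing time and a hypothetical `Scheduler` class that something external calls periodically; it is not taken from any particular node product.

```python
from datetime import datetime, time, timedelta


def next_run(now, run_at):
    """Next datetime at or after `now` whose clock time is `run_at`."""
    candidate = datetime.combine(now.date(), run_at)
    return candidate if candidate >= now else candidate + timedelta(days=1)


class Scheduler:
    """Hypothetical scheduler attached to a flow-agnostic request processor."""

    def __init__(self, run_at, process):
        self.run_at = run_at    # administrator-chosen daily processing time
        self.process = process  # request-processor entry point
        self.deferred = []      # Solicit requests waiting for the window
        self.next_due = None

    def defer(self, request, now):
        self.deferred.append(request)
        if self.next_due is None:
            self.next_due = next_run(now, self.run_at)

    def tick(self, now):
        """Call periodically (e.g. once a minute); runs deferred work when due."""
        if self.next_due is None or now < self.next_due:
            return 0
        batch, self.deferred = self.deferred, []
        self.next_due = None
        for request in batch:
            self.process(request)
        return len(batch)


processed = []
scheduler = Scheduler(time(2, 0), processed.append)
scheduler.defer("RCRA.GetFacilities(WA)", datetime(2024, 1, 1, 14, 0))
ran_early = scheduler.tick(datetime(2024, 1, 1, 23, 0))   # window not reached yet
ran_on_time = scheduler.tick(datetime(2024, 1, 2, 2, 0))  # window reached
```

The same `next_run` calculation could drive a recurring outbound job such as a daily Submit.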
12. Node Components (continued)
- Node Administration Utility
  - Create and manage local accounts
  - Install new data exchange components
  - Set processing schedules
  - Audit node activity
  - Extract documents (both inbound and outbound documents should be stored)
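Storing every inbound and outbound document alongside the audit record makes later extraction straightforward. A minimal sketch using SQLite from the Python standard library; the `node_audit` table, its columns, and the helper functions are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE node_audit (
           txn_id    TEXT NOT NULL,
           direction TEXT NOT NULL CHECK (direction IN ('inbound', 'outbound')),
           service   TEXT NOT NULL,
           logged_at TEXT NOT NULL,
           document  BLOB
       )"""
)


def log_document(txn_id, direction, service, document):
    """Record node activity and keep a copy of the document itself."""
    with conn:
        conn.execute(
            "INSERT INTO node_audit VALUES (?, ?, ?, ?, ?)",
            (txn_id, direction, service,
             datetime.now(timezone.utc).isoformat(), document),
        )


def extract_documents(txn_id):
    """Return (direction, document) pairs stored for one transaction."""
    return conn.execute(
        "SELECT direction, document FROM node_audit WHERE txn_id = ? ORDER BY rowid",
        (txn_id,),
    ).fetchall()


log_document("T1", "inbound", "RCRA.GetFacilityList", b"<Request/>")
log_document("T1", "outbound", "RCRA.GetFacilityList", b"<FacilityList/>")
extracted = extract_documents("T1")
```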
13. Node Components (continued)
- Flow-specific components
  - Discrete components tailored to a specific data exchange
  - Hot-swappable
- The service interface is generic
  - Node configuration determines which services are internal or public
  - Node configuration determines whether a given service is available through Query or Solicit
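One way to keep the service interface generic while letting flow-specific components be swapped is a registry keyed by service name, with per-service configuration. Everything here (`ServiceRegistry`, `ServiceConfig`, the handler signatures) is a hypothetical sketch rather than a description of an existing node implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ServiceConfig:
    public: bool   # exposed to partners, or internal only
    query: bool    # may be invoked through Query
    solicit: bool  # may be invoked through Solicit


class ServiceRegistry:
    """Generic service interface; flow-specific handlers plug in by name."""

    def __init__(self):
        self._handlers: Dict[str, Callable[..., object]] = {}
        self._config: Dict[str, ServiceConfig] = {}

    def install(self, name, handler, config):
        # Installing an existing name replaces its handler: a "hot swap".
        self._handlers[name] = handler
        self._config[name] = config

    def invoke(self, name, method, caller_is_partner, *params):
        config = self._config.get(name)
        if config is None:
            raise LookupError(f"unknown service {name}")
        if caller_is_partner and not config.public:
            raise PermissionError(f"{name} is internal")
        allowed = {"Query": config.query, "Solicit": config.solicit}
        if not allowed.get(method, False):
            raise PermissionError(f"{name} is not available through {method}")
        return self._handlers[name](*params)


registry = ServiceRegistry()
registry.install("RCRA.GetFacilityList", lambda state: ["FAC1"],
                 ServiceConfig(public=True, query=True, solicit=True))
registry.install("RCRA.GetFacilities", lambda state: ["FAC1 (full detail)"],
                 ServiceConfig(public=True, query=False, solicit=True))
```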
14. Node Components (continued)
Flow-to-Node Interface
15. Large Transactions
- Can cause problems in several areas
  - Data retrieval (SQL)
  - XML serialization (sender side)
  - Transmission over Internet
  - XML deserialization (receiver side)
  - Schema validation (both sender and receiver)
16. Large Transactions (continued)
- Stage data in a model similar to the one used by the schema
  - XML is hierarchical, whereas an RDBMS is relational
  - More secure
  - Source system is unaffected by node operations
- Index query parameter fields (SQL)
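The staging approach can be sketched with SQLite: source tables are copied into a node-owned table shaped like the schema, and the columns used as data service parameters are indexed. The table names, columns, and refresh strategy (delete and reload) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical normalized source-system tables.
    CREATE TABLE facility (fac_id TEXT PRIMARY KEY, name TEXT, state TEXT);
    CREATE TABLE handler_type (fac_id TEXT, type_code TEXT);

    -- Node-owned staging table, pre-joined to match a hypothetical Facility element.
    CREATE TABLE stage_facility (fac_id TEXT, name TEXT, state TEXT, type_code TEXT);

    -- Index the fields that data service parameters filter on.
    CREATE INDEX ix_stage_state_type ON stage_facility (state, type_code);
    """
)
conn.executemany("INSERT INTO facility VALUES (?, ?, ?)",
                 [("FAC1", "Mill A", "WA"), ("FAC2", "Plant B", "WA"),
                  ("FAC3", "Yard C", "OR")])
conn.executemany("INSERT INTO handler_type VALUES (?, ?)",
                 [("FAC1", "LQG"), ("FAC2", "SQG"), ("FAC3", "LQG")])


def refresh_staging(conn):
    """Copy source data into the staging model (e.g. on a nightly schedule)."""
    with conn:
        conn.execute("DELETE FROM stage_facility")
        conn.execute(
            """INSERT INTO stage_facility
               SELECT f.fac_id, f.name, f.state, h.type_code
               FROM facility f LEFT JOIN handler_type h ON h.fac_id = f.fac_id"""
        )


refresh_staging(conn)
SQL = "SELECT fac_id FROM stage_facility WHERE state = ? AND type_code = ?"
plan = conn.execute("EXPLAIN QUERY PLAN " + SQL, ("WA", "LQG")).fetchall()
rows = conn.execute(SQL, ("WA", "LQG")).fetchall()
```

Data service requests then read only from `stage_facility`, so the source tables see load only during the refresh.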
17. Large Transactions (continued)
- Use an asynchronous exchange
  - Use Solicit, not Query
- Schema design considerations
  - Schema KEY/KEYREF constraints are discouraged
  - Element naming may significantly affect file size
    - <MailingAddressStateUSPSCode>OR</MailingAddressStateUSPSCode>
- Query costing
  - Calculate the size of a given result set (e.g. with COUNT()) before running the full query
  - Not much experience in this area yet
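Query costing can be as simple as running `COUNT(*)` with the same filter before deciding how to serve the request. In this sketch the 1,000-row threshold and the policy of routing large requests to Solicit are hypothetical choices, and the staging table is illustrative.

```python
import sqlite3

MAX_QUERY_ROWS = 1000  # hypothetical threshold chosen by the node administrator

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stage_facility (fac_id TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO stage_facility VALUES (?, ?)",
    [(f"FAC{i}", "WA" if i % 10 else "OR") for i in range(5000)],
)


def cost_request(conn, state):
    """Count the result set before running the full query."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM stage_facility WHERE state = ?", (state,)
    ).fetchone()
    return count


def choose_exchange(conn, state):
    """Serve small results synchronously; send large ones through Solicit."""
    count = cost_request(conn, state)
    return ("Query" if count <= MAX_QUERY_ROWS else "Solicit"), count
```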
18. Large Transactions (continued)
- A well-designed flow can help avoid large transactions
- List services can return only high-level data
  - Scenario 1
    - RCRA.GetFacilities(WA)
  - Scenario 2
    - RCRA.GetFacilityList(WA)
    - RCRA.GetFacilityDetail(WA,FACID1234)
- Data service parameters can be used to limit transaction size
  - Scenario 3
    - RCRA.GetFacilitiesByType(WA,LQG)
- All options affect schema design
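The three scenarios can be compared with a toy in-memory data set. The Python functions below mirror the slide's service names but are hypothetical, the records are made up, and `json.dumps` length is only a rough proxy for the size of the serialized XML.

```python
import json

# Hypothetical staged RCRA data; real detail records would be far larger.
FACILITIES = {
    "FACID1234": {"state": "WA", "name": "Mill A", "type": "LQG",
                  "permits": ["P-1", "P-2"], "violations": ["V-9"]},
    "FACID5678": {"state": "WA", "name": "Plant B", "type": "SQG",
                  "permits": ["P-3"], "violations": []},
    "FACID9012": {"state": "OR", "name": "Yard C", "type": "LQG",
                  "permits": [], "violations": []},
}


def get_facilities(state):
    """Scenario 1: every detail for the state in one transaction."""
    return {fid: rec for fid, rec in FACILITIES.items() if rec["state"] == state}


def get_facility_list(state):
    """Scenario 2, step 1: high-level data only (ID and name)."""
    return [{"id": fid, "name": rec["name"]}
            for fid, rec in FACILITIES.items() if rec["state"] == state]


def get_facility_detail(state, fac_id):
    """Scenario 2, step 2: full detail for one facility."""
    rec = FACILITIES.get(fac_id)
    return rec if rec is not None and rec["state"] == state else None


def get_facilities_by_type(state, type_code):
    """Scenario 3: a parameter narrows the result set."""
    return {fid: rec for fid, rec in get_facilities(state).items()
            if rec["type"] == type_code}


def payload_size(result):
    return len(json.dumps(result))  # rough proxy for serialized document size
```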
19. Large Transactions (continued)
- File compression
  - Zipping files can reduce file size by over 90%
  - Compact storage (archiving)
  - Significant reduction in time to transmit
- Disk I/O versus memory I/O
  - If possible, avoid techniques that require the system to read the entire document into memory in order to process it (this can be difficult)
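Compression and streaming work together: the document can be zipped for transmission and storage, then parsed record by record straight out of the archive without building the whole tree. This sketch uses `zipfile` and `xml.etree.ElementTree.iterparse` from the Python standard library; the document structure is hypothetical, and the archive is held in memory here only to keep the example self-contained.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def build_document(n):
    """Build a hypothetical facility document with n repetitive records."""
    parts = ["<FacilityList>"]
    for i in range(n):
        parts.append(
            f"<Facility><FacilityIdentifier>FAC{i:06d}</FacilityIdentifier>"
            "<MailingAddressStateUSPSCode>OR</MailingAddressStateUSPSCode></Facility>"
        )
    parts.append("</FacilityList>")
    return "".join(parts).encode("utf-8")


def zip_document(xml_bytes, name="payload.xml"):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, xml_bytes)
    return buf.getvalue()


def count_facilities_streaming(zip_bytes, name="payload.xml"):
    """Read Facility records incrementally from the zipped document."""
    count = 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf, zf.open(name) as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == "Facility":
                count += 1
                elem.clear()  # drop the finished record's children and text
    return count


xml_bytes = build_document(10_000)
zipped = zip_document(xml_bytes)
facility_count = count_facilities_streaming(zipped)
ratio = len(zipped) / len(xml_bytes)
```

Clearing each finished `Facility` element releases its children; a production version might also detach processed elements from the root so their empty shells do not accumulate.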
20. State Management
- State management is required any time two systems must be synchronized
  - Contrast with a data publishing exchange
- Typically the sender's burden, but it does not have to be
- Partial rejects compound the difficulty
21. State Management (continued)
- Flagging source data
  - Set a submission status indicator on the source data
  - Complexity is directly related to transaction granularity
  - Compounded if record-level rejects are performed
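Flagging can be sketched as a status column on the source table. The `submission_status` column and its `PENDING`, `SUBMITTED`, `ACCEPTED`, and `REJECTED` values are hypothetical; the point is that record-level rejects return only the rejected rows to the queue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE facility (
           fac_id TEXT PRIMARY KEY,
           name TEXT,
           submission_status TEXT NOT NULL DEFAULT 'PENDING'
       )"""
)
conn.executemany("INSERT INTO facility (fac_id, name) VALUES (?, ?)",
                 [("FAC1", "Mill A"), ("FAC2", "Plant B"), ("FAC3", "Yard C")])
conn.commit()


def select_for_submission(conn):
    """Pick up records not yet accepted and mark them as in flight."""
    with conn:
        ids = [row[0] for row in conn.execute(
            "SELECT fac_id FROM facility "
            "WHERE submission_status IN ('PENDING', 'REJECTED') ORDER BY fac_id")]
        conn.executemany(
            "UPDATE facility SET submission_status = 'SUBMITTED' WHERE fac_id = ?",
            [(fac_id,) for fac_id in ids])
    return ids


def apply_receipt(conn, submitted_ids, rejected_ids):
    """Record-level rejects: only the rejected rows go back into the queue."""
    rejected = set(rejected_ids)
    with conn:
        conn.executemany(
            "UPDATE facility SET submission_status = ? WHERE fac_id = ?",
            [("REJECTED" if fac_id in rejected else "ACCEPTED", fac_id)
             for fac_id in submitted_ids])


first_batch = select_for_submission(conn)
apply_receipt(conn, first_batch, rejected_ids=["FAC2"])
second_batch = select_for_submission(conn)
```

A source edit to an already accepted record would also need to reset its flag to `PENDING`, which is where the complexity grows with transaction granularity.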
22. State Management (continued)
- Exchange Network Header
  - The same schema can be used to perform different transactions
  - Can remove the need for a TransactionCode (e.g. INSERT, UPDATE, DELETE) in the schema
- Delta to derive data changes since the last submission
  - Many systems do not store deleted data
  - Compare the last submission snapshot with the current snapshot to derive what has changed
- Incremental and full refresh services
  - e.g. Facility Flow
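Deriving a delta from snapshots needs nothing from the source system beyond its current contents, which matters when deleted rows are not retained. The sketch below assumes each snapshot is a dictionary keyed by a stable record identifier; the record layout and sample values are hypothetical.

```python
def derive_delta(last_snapshot, current_snapshot):
    """Compare two {key: record} snapshots and classify the changes."""
    inserts = {k: v for k, v in current_snapshot.items() if k not in last_snapshot}
    updates = {k: v for k, v in current_snapshot.items()
               if k in last_snapshot and last_snapshot[k] != v}
    # Deletes are only visible by their absence from the current snapshot.
    deletes = sorted(k for k in last_snapshot if k not in current_snapshot)
    return inserts, updates, deletes


last = {"FAC1": ("Mill A", "WA"), "FAC2": ("Plant B", "WA"), "FAC3": ("Yard C", "OR")}
current = {"FAC1": ("Mill A", "WA"), "FAC2": ("Plant B North", "WA"),
           "FAC4": ("Depot D", "WA")}

inserts, updates, deletes = derive_delta(last, current)
# After a successful submission, `current` becomes the next `last` snapshot.
```

Each of the three groups could be sent as its own document, with the operation carried in the Exchange Network Header instead of a TransactionCode element. Passing an empty last snapshot yields everything as inserts, which corresponds to a full refresh.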
23. Data Service Best Practices
- Data service naming conventions
  - Prefix.ActionObjectByParameter(s)
  - e.g. FacID.GetFacilityByName
- Work in progress
  - What about versioning?
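A naming convention is easiest to enforce when it can be checked mechanically. This regular expression is one hypothetical reading of `Prefix.ActionObjectByParameter(s)`; the `ACTIONS` vocabulary is an assumption, since the convention does not list the allowed actions.

```python
import re

# Hypothetical action vocabulary; extend as the convention matures.
ACTIONS = ("Get",)
ACTION_PATTERN = "|".join(map(re.escape, ACTIONS))

SERVICE_NAME = re.compile(
    r"^(?P<prefix>[A-Za-z][A-Za-z0-9]*)\."
    rf"(?P<action>{ACTION_PATTERN})"
    r"(?P<object>[A-Z][A-Za-z0-9]*?)"           # shortest object name that fits
    r"(?:By(?P<parameters>[A-Z][A-Za-z0-9]*))?$"
)


def parse_service_name(name):
    """Split Prefix.ActionObjectByParameter(s) into parts, or return None."""
    match = SERVICE_NAME.match(name)
    return match.groupdict() if match else None
```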
24. Data Service Best Practices (continued)
- Documenting data services
  - Data service name
  - Whether the service is supported by Query, Solicit, or both
  - Parameters
    - Parameter name
    - Index (order)
    - Required/optional
    - Minimum/maximum allowed values
    - Data type (string, integer, Boolean, date)
    - Whether multiple values can be supplied to the parameter
    - Whether wildcard searches are supported, and the default wildcard behavior
    - Special formatting considerations
  - Access/security settings
  - Return schema
  - Special fault conditions
- Wildcards
- Parameter delimiter (pipe character)
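The documentation checklist above maps naturally onto a small data structure that a node could also use to validate incoming parameters. In this sketch the pipe delimiter comes from the slide, while the `*` wildcard character, the `ParameterSpec` fields, and the example parameters (including `ReportingYear`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

DELIMITER = "|"  # multiple values are pipe-delimited
WILDCARD = "*"   # hypothetical wildcard character; document your own choice


@dataclass(frozen=True)
class ParameterSpec:
    name: str
    index: int
    required: bool
    data_type: type = str             # str or int in this sketch
    multi_valued: bool = False
    wildcards: bool = False
    min_value: Optional[int] = None   # numeric bounds apply to int parameters
    max_value: Optional[int] = None


def parse_parameter(spec: ParameterSpec, raw: Optional[str]) -> List[object]:
    """Validate one raw parameter value against its documented spec."""
    if raw is None or raw == "":
        if spec.required:
            raise ValueError(f"{spec.name} is required")
        return []
    values = raw.split(DELIMITER) if spec.multi_valued else [raw]
    parsed = []
    for value in values:
        if WILDCARD in value:
            if not (spec.wildcards and spec.data_type is str):
                raise ValueError(f"{spec.name} does not accept wildcards")
            parsed.append(value)  # pass the pattern through to the query layer
            continue
        converted = spec.data_type(value)  # int("abc") raises ValueError
        if spec.min_value is not None and converted < spec.min_value:
            raise ValueError(f"{spec.name} is below {spec.min_value}")
        if spec.max_value is not None and converted > spec.max_value:
            raise ValueError(f"{spec.name} is above {spec.max_value}")
        parsed.append(converted)
    return parsed


# Hypothetical documentation for a service such as RCRA.GetFacilitiesByType
STATE = ParameterSpec("State", 0, required=True, multi_valued=True)
TYPE = ParameterSpec("Type", 1, required=False, wildcards=True)
YEAR = ParameterSpec("ReportingYear", 2, required=False, data_type=int,
                     min_value=1980, max_value=2100)
```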
25. Data Validation Best Practices
- XML instance files should be validated against the schema by the sender before submittal
- CDX is offering pre-submittal validation services for some flows
- Schematron (Doug Timms)
26. Schema Design Best Practices
- Design Rules and Conventions (DRC) 1.0 and 1.1
- Schema Namespace
- Schema Versioning
- Exchange Network Schema Types
- Use the Shared Schema Components