Title: Data Migration
1Data Migration
- Massachusetts Biotechnology Council
- 11-July-2008
- Brian K. Perry, President
- BKP Technologies, Inc.
2Highlights
- What is Data Migration?
- Anatomy of Migration Projects
- Migration Types and Strategies
- Technical Considerations
- Validation Considerations
3What is Data Migration?
- "Data migration is the process of transferring data between storage types, formats, or computer systems. Data migration is usually performed programmatically to achieve an automated migration, freeing up human resources from tedious tasks." (Wikipedia)
4What is Data Migration?
[Diagram: source systems (CDMS, EDC, Safety, Preclin Data) pass through a Migration ETL Process into a Data Warehouse or Datamart, which supports Analytics]
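To make the Migration ETL Process on this slide concrete, here is a minimal sketch of the extract, transform and load steps. It assumes a flat CSV export as the source and an in-memory SQLite table standing in for the data mart. The column names, the `subject` table and the sex code mapping are illustrative assumptions, not taken from any particular CDMS or safety system.

```python
import csv
import io
import sqlite3

# Hypothetical flat export from a source system (column names are assumptions).
SOURCE_CSV = """subject_id,sex,weight_kg
1001,M,70.5
1002,F,58.0
"""

def extract(text):
    """Read rows from a flat export into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Convert types and map source codes to the destination's codes."""
    sex_map = {"M": "MALE", "F": "FEMALE"}  # assumed target code list
    return [
        (int(r["subject_id"]), sex_map[r["sex"]], float(r["weight_kg"]))
        for r in rows
    ]

def load(conn, records):
    """Insert transformed records into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS subject "
        "(subject_id INTEGER PRIMARY KEY, sex TEXT, weight_kg REAL)"
    )
    conn.executemany("INSERT INTO subject VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the data mart
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT * FROM subject ORDER BY subject_id").fetchall()
print(loaded)
```

Keeping the three steps as separate functions lets each one be unit tested on its own, which helps later when the mapping has to be qualified.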
5Anatomy of Migration Projects
- Planning and Analysis Phase
- Team Selection and Planning
- Migration Strategy
- Analyze Data Sources
- Execution Phase
- Validation
- Migration
- Data Mapping/Programming
- Go Live
6Team Selection and Planning
- Project Management
- Clinical Data Management
- Pre-Clinical
- Product Safety/Pharmacovigilance
- Information Technology
- QA/Validation
7Analysis of Data Sources
- CDMS, EDC or Safety System
- Direct Database Transfer
- Flat Data Export Files
- CDISC or E2B XML Exports
- SAS Datasets
- Other electronic sources
- Microsoft Excel Spreadsheets
- Home-grown databases
- Paper Documents (Source)
- Regulatory Submissions (NDA, 3500A, etc)
8Migration Types
- Single Use
- End of In-House study
- End of CRO study
- Migration from legacy system
- Acquisition/License of compound/product
- Continuous
- On-going Studies
- Safety Data and Post Marketing Data
9Migration Strategies: Single Use
- Data Formats
- Full Database Dump
- Flat File Exports
- SAS Datasets
- Structured Files (XML w/CDISC or E2B)
- Considerations
- Cleanliness of data source
- Static nature of data
10Migration Strategies: Continuous
- Data Formats
- Full Database Dump
- Structured Files (XML w/CDISC or E2B)
- Considerations
- Dynamic nature of data
- Ability to adapt to changes in source system
- Validation on-going
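One way to approach "ability to adapt to changes in source system" is to compare the source structure against the structure the mapping was written for before each run. The sketch below is only an illustration: the column names and type labels are hypothetical, and a real migration would read them from the source system's catalog or export definition.

```python
def schema_changes(expected, actual):
    """Compare two {column: type} mappings and report the differences."""
    shared = set(expected) & set(actual)
    return {
        "added": sorted(set(actual) - set(expected)),
        "removed": sorted(set(expected) - set(actual)),
        "retyped": sorted(c for c in shared if expected[c] != actual[c]),
    }

# Structure the mapping was validated against (hypothetical).
expected = {"case_id": "int", "onset_date": "date", "event_term": "str"}
# Structure found in the latest extract (hypothetical).
actual = {
    "case_id": "int",
    "onset_date": "str",
    "event_term": "str",
    "seriousness": "str",
}

changes = schema_changes(expected, actual)
print(changes)
```

Any non-empty entry is a signal to pause the scheduled run and revisit the mapping and its validation before more data is loaded.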
11Migration Strategies: CDISC/E2B
- Leverages existing CDISC and E2B export functionality of CDMS, EDC and Safety systems
- Data mapping is simplified because the standards are defined
- But not all data in the source database may be present in CDISC or E2B
12Migration Strategies: Database Transfer
- Provides access to all data fields in the source and destination systems
- More complicated mapping than CDISC or E2B options
- May not be an option for single-use migrations where the source system is contained at a partner company or CRO
13Migration Strategies: Tools
- Commercial Data Integration/Manipulation and ETL Tools
- BizTalk Server (Microsoft Corporation)
- Data Junction (Pervasive Software Inc.)
- DataMirror Transformation Server (DataMirror Corporation)
- Data Transformation Services (DTS) (Microsoft Corporation)
- XML Spy (Altova)
- Open Source Tools
- Perl
- PHP
14Technical Considerations
- CDMS/Safety System view of Data
- Optimized for Data Entry, Cleaning, Review and Regulatory Submission Preparation
- Operational and transactional data model
- Different data models and coded values
- Data Mart/Data Warehouse view of Data
- Optimized for data retrieval and analysis
- Unified data model
- Unified coding of values
- Normalized or dimensional model of data
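The move from "different data models and coded values" to "unified coding of values" can be sketched as a lookup table per source system. The source names, the code values and the severity code list below are hypothetical and only illustrate the idea.

```python
# Hypothetical per-source code lists mapped to one unified set of codes.
CODE_MAPS = {
    "cdms": {"1": "MILD", "2": "MODERATE", "3": "SEVERE"},
    "safety": {"MI": "MILD", "MO": "MODERATE", "SE": "SEVERE"},
}

def unify(source, value):
    """Return the unified code, or None when no mapping exists."""
    return CODE_MAPS.get(source, {}).get(value.strip().upper())

rows = [("cdms", "2"), ("safety", "se"), ("safety", "XX")]
unified = [unify(source, value) for source, value in rows]

# Values with no mapping are collected for review rather than silently dropped.
unmapped = [row for row, code in zip(rows, unified) if code is None]
print(unified, unmapped)
```

Returning `None` instead of guessing keeps unmapped values visible, so they can be resolved in the data mapping plan.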
15Technical Considerations
- Clinical vs. Safety view of data
16Technical Considerations
- Identifying Data Elements
- Data Fields and Values
- Derived and Computed Values
- Coding Dictionaries
- Events/History: COSTART, WHOART, MedDRA, ICD9/10, Custom dictionary
- Meds and Products: WHODRL, Custom dictionary
- Metadata
- Visit Structure
- Company Products, Studies, Licenses
- Code Lists
17Technical Considerations
- Data Element Issues
- Data Type Issues
- Data Field Size Issues
- CDISC and E2B Compliance Issues
- Cleanliness and Integrity of Source Data
- Transformations of Data
- Coded Values
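Data type and field size issues are easier to handle when they are detected before the load rather than during it. The following sketch checks each source record against a set of target field definitions. The field names, the ISO date format and the 10-character limit are assumptions made for the example.

```python
from datetime import datetime

def parse_date(text):
    """Parse an ISO date; the target format is an assumption for this sketch."""
    return datetime.strptime(text, "%Y-%m-%d").date()

# Hypothetical target definitions: field name -> (converter, max length or None).
TARGET_FIELDS = {
    "subject_id": (int, None),
    "visit_date": (parse_date, None),
    "site_name": (str, 10),
}

def check_record(record):
    """Return a list of data type and field size problems for one record."""
    issues = []
    for name, (convert, max_len) in TARGET_FIELDS.items():
        raw = record.get(name, "")
        try:
            convert(raw)
        except ValueError:
            issues.append(f"{name}: cannot convert {raw!r}")
        if max_len is not None and len(raw) > max_len:
            issues.append(f"{name}: length {len(raw)} exceeds {max_len}")
    return issues

good = {"subject_id": "1001", "visit_date": "2008-07-11", "site_name": "Boston"}
bad = {
    "subject_id": "10A1",
    "visit_date": "11-July-2008",
    "site_name": "Massachusetts General",
}

print(check_record(good))
print(check_record(bad))
```

Running a check like this over the full source early in the project gives a rough measure of how clean the source data is.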
18Technical Considerations
- Coded Data
- Events, Medical History and Labs
- Source data often has a multitude of dictionaries (COSTART, WHOART, MedDRA, ICD9/10, SNOMED)
- Issues in maintaining multiple dictionary versions
- Leveraging auto-encoders
- Products
- Typically WHODRL
- Managing company products
- Leveraging auto-encoders
19Technical Considerations
- Metadata
- Visit Structure
- Code Lists
- Time Units
- Dosing Units
- Weight/Age Units
- Product data (dose units, frequency, formulations, etc.)
- Lab Codes
- Causality codes
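Unit code lists such as weight and age units usually need to be normalized to one target unit during migration. The sketch below assumes kilograms and years as the target units. The unit spellings are hypothetical, and the age factor for months is a simple approximation.

```python
# Conversion factors to the assumed target units (kg for weight, years for age).
WEIGHT_TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}
AGE_TO_YEARS = {"years": 1.0, "months": 1 / 12}

def normalize(value, unit, factors):
    """Convert a value to the target unit, rejecting units with no mapping."""
    key = unit.strip().lower()
    if key not in factors:
        raise ValueError(f"unmapped unit: {unit!r}")
    return value * factors[key]

weight_kg = normalize(154, "LB", WEIGHT_TO_KG)
age_years = normalize(18, "months", AGE_TO_YEARS)
print(round(weight_kg, 2), round(age_years, 2))
```

Raising an error on an unknown unit forces the code list to be completed instead of letting unconverted values reach the destination.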
20Technical Considerations
The Golden Rule of data migration: Garbage In, Garbage Out
21Validation Considerations
- Validation Strategies
- Tools and Process
- Data Verification of Data Samples
- Key Decision Drivers
- Validation status of source system/data
- Whether the migration is single-use or continuous
22Validation Considerations
- Validation Artifacts
- User Requirements
- Technical Specifications/Data Mapping Plan
- Risk Assessment and Mitigation
- Migration Master Plan
- Unit Test Plan and Tests
- Qualifications
- Installation Qualification
- Operational Qualification
- Performance Qualification (Continuous Migrations)
- Traceability Matrix
- Final Report
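A traceability matrix links each user requirement to the tests that cover it. As a rough illustration, it can be built from a list of requirements and a mapping of tests to the requirements they exercise. The identifiers `UR-1`, `OQ-01` and so on are invented for this example.

```python
def build_matrix(requirements, tests):
    """Map each requirement ID to the sorted list of test IDs that cover it."""
    matrix = {req: [] for req in requirements}
    for test_id, covered in sorted(tests.items()):
        for req in covered:
            matrix.setdefault(req, []).append(test_id)
    return matrix

# Hypothetical requirement and test identifiers.
requirements = ["UR-1", "UR-2", "UR-3"]
tests = {"OQ-01": ["UR-1"], "OQ-02": ["UR-1", "UR-3"]}

matrix = build_matrix(requirements, tests)
untested = [req for req, covering in matrix.items() if not covering]
print(matrix)
print(untested)
```

An empty `untested` list is one simple check to run before the final report is written.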
23Validation Considerations
- Qualification
- Installation Qualification (IQ) of Migration Tools
- Operational Qualification (OQ) of Mapping/Transforms
- Performance Qualification (PQ) for Continuous Migrations
- Data Verification
- Manual sampling and comparison of cases between data sources and destination safety system
- Sample Size
- ANSI Z1.4 (MIL-105)
- Sqrt(n) + 1
- 10%
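The simpler sample size options can be worked through in a few lines. This sketch assumes that the "Sqrt(n)" entry refers to the common square root of n plus one rule and that "10" means a 10 percent sample; both readings are assumptions, as is the choice to round up. ANSI Z1.4 sampling relies on published lookup tables and is not reproduced here.

```python
import math
import random

def sqrt_plus_one(n):
    """Sample size under the assumed sqrt(n) + 1 rule, rounding the root up."""
    return math.ceil(math.sqrt(n)) + 1

def percent_sample(n, percent=10):
    """Sample size as a percentage of the population, rounded up."""
    return math.ceil(n * percent / 100)

def pick_cases(case_ids, size, seed=0):
    """Draw a reproducible random sample of case IDs for manual comparison."""
    return sorted(random.Random(seed).sample(case_ids, size))

total_cases = 1000
print(sqrt_plus_one(total_cases))   # 33
print(percent_sample(total_cases))  # 100

selected = pick_cases(list(range(1, total_cases + 1)), sqrt_plus_one(total_cases))
print(len(selected))                # 33
```

Fixing the random seed makes the selection reproducible, so the same cases can be pulled again if the verification needs to be repeated or audited.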
24Questions and Discussion
25Contact Information
- Brian K. Perry
- President
- BKP Technologies, Inc.
- bkp_at_bkptech.com
- 1.617.964.2100