Title: Beyond the CDISC SDTM V3.1 Model: Statistical Programming Considerations
1. Beyond the CDISC SDTM V3.1 Model: Statistical Programming Considerations
American Statistical Association 2004 FDA/Industry Statistics Workshop
Washington, DC, September 23, 2004
- William J. Qubeck IV, MS, MBA
- Electronic Submissions Data Group Leader
- Global Clinical Data Services, Pfizer Inc.
2. Agenda
- Model Overview of CDISC SDTM V3.1
- Programming, Statistical and Submission Considerations
- Costs/Benefits of 3 Implementation Strategies
- Implications and Summary
3. CDISC SDTM Version 3.1
4. CDISC SDTM Material (www.cdisc.org)
Source of model information: www.cdisc.org
5. SDTM V3.1 Characteristics
- SDTM applies to all Case Report Tabulation (CRT) data across all phases of clinical trial development and generally refers to collected data
- V3 added new variables to represent additional timing descriptions, flags and descriptive attributes
- All variables must come from the SDTM model; it does not allow sponsor-defined variables to be added
- Numerous changes from Version 2 variables and labels
- Removed most, if not all, selection variables from domains
- Added
  - Study Design (planned versus actual) datasets
  - Special Purpose/Relationship Datasets
6. V3 - Study Data Information Model
- 3 main types of observations (data domains)
  - Interventions, events, findings, and other
- Interventions
  - Are related to the therapeutic and experimental treatments (expanded to include other things)
- Events
  - Observations from subjects on adverse reactions
- Findings
  - Evaluations/examinations to address specific questions (when in doubt, it's a finding)
7. SDS V3 Standard Data Structures
Interventions: EX, CM, SU
Events: AE, DS, MH
Findings: IE, LB, VS, SC, PE, EG
8. Standard Model Variables
- Topic
  - Identifies the focus of the observation
- Unique identifiers
  - Identify the subject of the observation
- Timing
  - Describes the start and end of the observation
- Qualifiers
  - Describe the traits of the observation
9. An Example Observation
Subject 123 had a severe headache starting on study day 2
- Unique Subject Identifier: Subject 123
- Topic: headache
- Qualifier: severe
- Timing: starting on study day 2
10. Dataset Structure
Role         Subject Identifier         Topic                                Qualifier           Timing
Var Names    USUBJID                    AETERM                               AESEV               AESTDY
Labels       Unique Subject Identifier  Reported Term for the Adverse Event  Severity/Intensity  Study Day of Start of Event
Observation  123                        HEADACHE                             SEVERE              2
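The role-based structure of this observation can be sketched in a few lines of Python. This is only an illustration, not part of the SDTM model itself: the `Variable` class and the `by_role` helper are hypothetical names, while the variable names, labels and values are the ones from the example observation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str   # SDTM variable name
    label: str  # descriptive label
    role: str   # Identifier, Topic, Qualifier or Timing


# Variable metadata taken from the AE example on the slide.
AE_VARIABLES = [
    Variable("USUBJID", "Unique Subject Identifier", "Identifier"),
    Variable("AETERM", "Reported Term for the Adverse Event", "Topic"),
    Variable("AESEV", "Severity/Intensity", "Qualifier"),
    Variable("AESTDY", "Study Day of Start of Event", "Timing"),
]

# "Subject 123 had a severe headache starting on study day 2"
observation = {"USUBJID": "123", "AETERM": "HEADACHE", "AESEV": "SEVERE", "AESTDY": 2}


def by_role(record, variables):
    """Group the values of one record by the role of each variable."""
    grouped = {}
    for var in variables:
        grouped.setdefault(var.role, {})[var.name] = record[var.name]
    return grouped


roles = by_role(observation, AE_VARIABLES)
for role, values in roles.items():
    print(role, values)
```

Each record in a domain carries the same four kinds of information, which is what lets one general structure describe many different domains.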
11. Core Variables Definition
- A required variable is any variable that is basic to the identification of a data record (i.e., essential key identifiers and a topic variable); it cannot be null
- An expected variable is any variable necessary to make a record meaningful in the context of a specific domain (the variable should be included). Some values may be null
- Permissible variables should be used as appropriate when collected or derived
- Any general timing variable not explicitly mentioned in a domain model is permissible to be included
- Only qualifier variables specified in a domain model are allowed for that domain
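The required/expected/permissible rules above lend themselves to a simple automated check. The sketch below is one possible reading of those rules, not an official validator: `check_core` is a hypothetical helper, the core designations in `AE_CORE` are illustrative assumptions that should be confirmed against the domain model, and the records (including the date) are made up.

```python
# Illustrative core designations (assumptions; confirm against the domain model).
AE_CORE = {
    "USUBJID": "Required",
    "AETERM": "Required",
    "AESTDTC": "Expected",
    "AESEV": "Permissible",
}


def check_core(records, core):
    """Return a list of violations of the core-variable rules."""
    problems = []
    # A variable counts as present if any record carries it as a column.
    present = set().union(*(rec.keys() for rec in records))
    for var, designation in core.items():
        if var not in present:
            # Required and expected variables must be included in the dataset.
            if designation in ("Required", "Expected"):
                problems.append(f"{var}: {designation} variable is missing from the dataset")
            continue
        if designation == "Required":
            # Required variables cannot be null; expected values may be.
            for i, rec in enumerate(records):
                if rec.get(var) in (None, ""):
                    problems.append(f"{var}: null value in record {i}")
    return problems


# Made-up records for illustration only.
records = [
    {"USUBJID": "123", "AETERM": "HEADACHE", "AESTDTC": "2004-01-02", "AESEV": "SEVERE"},
    {"USUBJID": "124", "AETERM": "", "AESTDTC": None, "AESEV": None},
]
problems = check_core(records, AE_CORE)
print(problems)
```

Only the empty `AETERM` is reported: the null `AESTDTC` is tolerated because expected variables may contain null values.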
12. A Brief Look at the Domain Classes
13. Model Topic Variables and Qualifiers
- Events Domain Class
  - Topic Variable: --TERM (Reported Term)
  - Approx. 12 qualifiers (e.g., Modified Term, Seriousness)
- Interventions Domain Class
  - Topic Variable: --TRT (Treatment)
  - Approx. 6 qualifiers (e.g., Dose, Unit)
- Findings Domain Class
  - Topic Variable: --TESTCD (Test Code)
  - Many qualifiers (e.g., Units, Standardized Results)
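The `--` in these names stands for the two-letter domain code, so the topic variable of a domain follows from its class. A minimal sketch, assuming the class assignments shown for the domains named earlier in the deck (`topic_variable` is a hypothetical helper):

```python
# Topic variable suffix for each general observation class (from the slide).
TOPIC_SUFFIX = {"Events": "TERM", "Interventions": "TRT", "Findings": "TESTCD"}

# Class membership for a few of the domains named earlier in the deck.
DOMAIN_CLASS = {
    "AE": "Events",
    "MH": "Events",
    "CM": "Interventions",
    "EX": "Interventions",
    "VS": "Findings",
    "LB": "Findings",
}


def topic_variable(domain):
    """Replace the '--' placeholder with the two-letter domain code."""
    return domain + TOPIC_SUFFIX[DOMAIN_CLASS[domain]]


for code in DOMAIN_CLASS:
    print(code, topic_variable(code))
```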
14. Example Events Data (MH)
15. Example Findings Data (VS)
16. Creating a New Domain
Superset of Variables
17. Programming, Statistical and Submission Considerations
18. PFE and CDISC SDTM
- Pfizer has contributed and continues to contribute to CDISC, participated in the FDA pilots and has implemented CDISC Version 2.0
- We delivered our first CDISC SDTM compliant submission in August
  - Submitted 5 protocols of partial data
  - Over 11,000 patients' worth of data
  - Included all CDISC defined domains plus 5 additional, as well as the define.xml
  - Converted several of the analysis datasets into SDTM compliant structures
19. Submission Data Processes
20. Mapping Events and Interventions
Internal Dataset A
Retain the SEQ numbers
21. Lessons learned
- Mapping was straightforward
- eSub data documentation was not affected (e.g., define.pdf)
- Only a few variables were mapped to SUPPQUAL (the exception, not the rule)
- Technical challenges
  - Increased dependencies: SUPPQUAL and CO become dependent on all contributing source datasets; 1 to many (source to target domain)
  - Several defined internal datasets may map to 1 target domain
  - May rethink how XPTs are generated: one at a time or in batches
- No specific statistical considerations
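The many-to-one mapping and the SUPPQUAL dependency described above can be illustrated with a small sketch. Everything on the internal side is hypothetical: the column names in `COLUMN_MAP`, the two source datasets, the study identifier and the `map_to_ae` helper are invented for illustration, and only a subset of the SUPPQUAL variables is populated.

```python
from collections import defaultdict

# Hypothetical internal column names mapped to SDTM AE variables.
COLUMN_MAP = {"SUBJ": "USUBJID", "EVENT": "AETERM", "SEVERITY": "AESEV", "STARTDAY": "AESTDY"}


def map_to_ae(sources, studyid):
    """Map several internal datasets to one AE domain plus SUPPQUAL rows."""
    ae, suppqual = [], []
    next_seq = defaultdict(int)  # running sequence number per subject
    for rows in sources:
        for row in rows:
            rec = {"STUDYID": studyid, "DOMAIN": "AE"}
            extras = {}
            for col, value in row.items():
                if col in COLUMN_MAP:
                    rec[COLUMN_MAP[col]] = value
                else:
                    extras[col] = value  # no home in the domain model
            next_seq[rec["USUBJID"]] += 1
            rec["AESEQ"] = next_seq[rec["USUBJID"]]
            ae.append(rec)
            # Unmapped items go to SUPPQUAL, keyed back to the parent record.
            for qnam, qval in extras.items():
                suppqual.append({
                    "STUDYID": studyid,
                    "RDOMAIN": "AE",
                    "USUBJID": rec["USUBJID"],
                    "IDVAR": "AESEQ",
                    "IDVARVAL": str(rec["AESEQ"]),
                    "QNAM": qnam,
                    "QVAL": str(qval),
                })
    return ae, suppqual


# Two made-up internal datasets that both feed the AE domain.
serious_events = [{"SUBJ": "123", "EVENT": "HEADACHE", "SEVERITY": "SEVERE", "STARTDAY": 2, "HOSPDAYS": 1}]
other_events = [{"SUBJ": "123", "EVENT": "NAUSEA", "SEVERITY": "MILD", "STARTDAY": 5}]

ae, suppqual = map_to_ae([serious_events, other_events], "STUDY01")
print(ae)
print(suppqual)
```

Because the sequence numbers and the SUPPQUAL rows are derived from every contributing source, neither output can be finalized until all sources have been processed, which is one reason to reconsider generating XPT files one at a time.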
22. Lessons learned
- May need to rethink how you organize your data into CDISC SDTM structures
- For example,
23. Example Exercise
- Does each item go into the Demographics Domain?
24. Answer: NO
Demographics
Vital Signs
Subject Characteristics
Substance Use
25. Mapping Findings
Internal Dataset B (horizontal dataset)
Retain the SEQ numbers
26. Lessons learned
- It describes the majority of the data in a submission
- More complicated, because you need to retain the transposed information, which should be provided in define.xml
- Statistical programming considerations
  - Data stored in a non-traditional structure
  - The structure is flexible enough to contain both collected and analysis data; do you continue to keep them separate?
- eSub data documentation is affected
  - Need to change CRF annotations and provide column (variable) and record-level metadata
27. An Example: Vital Signs (VS)
Example Dataset
USUBJID  VISIT  DIABP  SYSBP  BMI   HEIGHT
0001     1      70     110    25.3  55
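Mapping this horizontal dataset into the Findings structure means transposing it: each measurement column becomes its own record, identified by a test code. The following is a minimal sketch of that transposition using the example row; `to_vertical` is a hypothetical helper, storing the numeric visit in VISITNUM is an assumption, and units are omitted because the example does not give them.

```python
# Measurement columns of the horizontal dataset, used as test codes.
TEST_CODES = ["DIABP", "SYSBP", "BMI", "HEIGHT"]


def to_vertical(rows):
    """Transpose one-row-per-visit data into one record per test."""
    vs = []
    seq = {}  # running VSSEQ per subject
    for row in rows:
        for testcd in TEST_CODES:
            if row.get(testcd) is None:
                continue  # nothing collected, so no record is created
            subject = row["USUBJID"]
            seq[subject] = seq.get(subject, 0) + 1
            vs.append({
                "DOMAIN": "VS",
                "USUBJID": subject,
                "VSSEQ": seq[subject],
                "VSTESTCD": testcd,
                "VSORRES": str(row[testcd]),  # result as originally collected
                "VISITNUM": row["VISIT"],     # assumption: numeric visit
            })
    return vs


horizontal = [
    {"USUBJID": "0001", "VISIT": 1, "DIABP": 70, "SYSBP": 110, "BMI": 25.3, "HEIGHT": 55},
]
vertical = to_vertical(horizontal)
for record in vertical:
    print(record)
```

The single source row becomes four VS records. The original column names survive only as VSTESTCD values, which is why the transposed information has to be documented in define.xml.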
28. Additional define.pdf/xml Section
29. VS Annotated Page (blankcrf.pdf)
OR VSORRES, where VSTESTCD = XYZ
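Record-level annotations of this form can be generated from the list of test codes rather than typed by hand. This is a hypothetical sketch: `record_level_annotation` is an invented helper, and the test codes are borrowed from the earlier vital signs example.

```python
def record_level_annotation(domain, testcd):
    """Build an annotation of the form '<domain>ORRES where <domain>TESTCD = <code>'."""
    return f"{domain}ORRES where {domain}TESTCD = {testcd}"


# One annotation per CRF field, using the vital signs test codes.
annotations = {
    field: record_level_annotation("VS", field)
    for field in ["DIABP", "SYSBP", "BMI", "HEIGHT"]
}
for field, text in annotations.items():
    print(field, "->", text)
```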
30. Overall Statistical Programming Considerations
- Where to implement the data standards?
  - At the end (at XPT generation)
  - During the table production process
  - All the way back to the database
- Must prioritize what's important
  - Having minimal impact on your internal data storage and/or table creation process algorithms
  - Implementing versions quickly (time and resource issues)
  - End-game mapping costs
  - Software re-use?
31. Implementation Strategies
32. Benefits/Costs of Mapping to SDTM Post-CSR
- Benefits
  - Versions have minimal impact on data storage and processing
  - Version changes can be quickly implemented
  - Supports early adoption of the standards
- Costs
  - Mapping costs (for each study and type of data)
  - Could add time to the critical path
  - Data used to produce the outputs (tables, listings and graphs) may not match the submitted data (e.g., variable names, data structure; records may be placed into different domains), which raises questions regarding data exchanges for Rapid Response
  - Additional QC steps
33. Benefits/Costs of Mapping to SDTM within CSR Process
- Benefits
  - Data used to produce the outputs matches the submitted data
  - Previously developed software can be used to answer reviewer questions (supports software reuse)
  - Additional time does not have to be added to the critical path
- Costs
  - Version changes affect the application of algorithms plus output generation software
  - Mapping the data (for each study and type of data)
  - Although time is not added to the end, additional time is needed to complete the mappings
  - Annotated CRFs from the clinical trials database do not match the data submitted
34. Benefits/Costs of Mapping to SDTM within Database (data storage)
- Benefits
  - The standards would be used throughout the entire clinical data storage, processing, and reporting processes
  - The extra time needed to implement the standards is an up-front cost
  - No additional QC step because no mapping is necessary
  - Supports software reuse
  - Facilitates Electronic Data Interchange (cost savings)
    - CDISC estimates that the average data transfer cost per study is approximately $35k, or $122.5M annually
    - Standardizes the exchange between researchers, study sponsors, regulatory authorities and the applicant
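The two cost figures can be related with simple arithmetic. The sketch below assumes the annual figure is the per-study cost multiplied by the number of studies per year; that interpretation is an assumption, as is the reading of both figures in the same currency.

```python
cost_per_study = 35_000    # approximate average data transfer cost per study
annual_cost = 122_500_000  # estimated annual cost

# Assumption: annual cost = cost per study x number of studies per year.
implied_studies_per_year = annual_cost / cost_per_study
print(implied_studies_per_year)
```

Under that assumption, the estimate corresponds to about 3,500 studies per year.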
35. Benefits/Costs of Mapping to SDTM within Database (2)
- Costs
  - Version changes can have a significant impact upon the entire clinical data storage, processing, and reporting processes
  - Raises change control and implementation issues
  - Drug development programs may span many different versions due to length of time in development
  - Software version control and output reproducibility
  - How to roll out new versions of the standards?
36. Implications to the Industry
- All sponsors are facing implementation strategy challenges
- Analysis datasets should also be provided in addition to the SDTM datasets
  - At this point they don't need to conform to V3
  - Will be provided separately (e.g., in a different submission directory)
- Standardized datasets will enable the use of standardized review tools and could lead to more thorough and efficient reviews (e.g., decreased learning curve)
37. Summary
- There are significant differences between CDISC SDS V2 and V3 in terms of scope, design and philosophy
- For more information regarding SDTM Version 3.1: www.cdisc.org
- Thank you!
- William_J_Qubeck_at_Groton.Pfizer.com