Title: PS1 PSPS Object Data Manager Design
1PS1 PSPS Object Data Manager Design
- PSPS Critical Design Review
- November 5-6, 2007
- IfA
2Outline
- ODM Overview
- Critical Requirements Driving Design
- Work Completed
- Detailed Design
- Spatial Querying (AS)
- ODM Prototype (MN)
- Hardware/Scalability (JV)
- How Design Meets Requirements
- WBS and Schedule
- Issues/Risks
- AS = Alex, MN = Maria, JV = Jan
3ODM Overview
- The Object Data Manager will
- Provide a scalable data archive for the Pan-STARRS data products
- Provide query access to the data for Pan-STARRS users
- Provide detailed usage tracking and logging
4ODM Driving Requirements
- Total size 100 TB
- 1.5 x 10^11 P2 detections
- 8.3 x 10^10 P2 cumulative-sky (stack) detections
- 5.5 x 10^9 celestial objects
- Nominal daily rate (divide by 3.5 x 365)
- P2 detections 120 Million/day
- Stack detections 65 Million/day
- Objects 4.3 Million/day
- Cross-Match requirement 120 Million / 12 hrs, about 2800 / s
- DB size requirement
- 25 TB / yr
- 100 TB by end of PS1 (3.5 yrs)
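The daily rates follow from dividing the survey totals by the number of days in 3.5 years. A minimal sketch of that arithmetic is below; the variable names are illustrative and not part of the ODM design.

```python
# Survey totals quoted on the slide, spread over the 3.5-year PS1 survey.
SURVEY_DAYS = 3.5 * 365

totals = {
    "P2 detections": 1.5e11,
    "Stack detections": 8.3e10,
    "Objects": 5.5e9,
}

# Nominal daily rate for each data product.
daily = {name: count / SURVEY_DAYS for name, count in totals.items()}

# The cross-match requirement is one day's P2 detections in a 12-hour window.
xmatch_per_sec = 120e6 / (12 * 3600)

for name, rate in daily.items():
    print(f"{name}: {rate / 1e6:.1f} million/day")
print(f"Cross-match: {xmatch_per_sec:.0f} detections/s")
```

The P2 figure works out to roughly 117 million/day, which the slide rounds up to 120 million/day before deriving the per-second cross-match rate.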
5Work completed so far
- Built a prototype
- Scoped and built prototype hardware
- Generated simulated data
- 300M SDSS DR5 objects, 1.5B Galactic plane objects
- Initial Load done: created 15 TB DB of simulated data
- Largest astronomical DB in existence today
- Partitioned the data correctly using Zones algorithm
- Able to run simple queries on distributed DB
- Demonstrated critical steps of incremental loading
- It is fast enough
- Cross-match > 60k detections/sec
- Required rate is 3k/sec
6Detailed Design
- Reuse SDSS software as much as possible
- Data Transformation Layer (DX) Interface to IPP
- Data Loading Pipeline (DLP)
- Data Storage (DS)
- Schema and Test Queries
- Database Management System
- Scalable Data Architecture
- Hardware
- Query Manager (QM CasJobs for prototype)
7High-Level Organization
8Detailed Design
- Reuse SDSS software as much as possible
- Data Transformation Layer (DX) Interface to IPP
- Data Loading Pipeline (DLP)
- Data Storage (DS)
- Schema and Test Queries
- Database Management System
- Scalable Data Architecture
- Hardware
- Query Manager (QM CasJobs for prototype)
9Data Transformation Layer (DX)
- Based on SDSS sqlFits2CSV package
- LINUX/C application
- FITS reader driven off header files
- Convert IPP FITS files to
- ASCII CSV format for ingest (initially)
- SQL Server native binary later (3x faster)
- Follow the batch and ingest verification procedure described in ICD
- 4-step batch verification
- Notification and handling of broken publication cycle
- Deposit CSV or Binary input files in directory structure
- Create ready file in each batch directory
- Stage input data on LINUX side as it comes in from IPP
10DX Subtasks
DX
- Initialization Job: FITS schema, FITS reader, CSV Converter, CSV Writer
- Batch Ingest: Interface with IPP, Naming convention, Uncompress batch, Read batch, Verify Batch
- Batch Verification: Verify Manifest, Verify FITS Integrity, Verify FITS Content, Verify FITS Data, Handle Broken Cycle
- Batch Conversion: CSV Converter, Binary Converter, batch_ready, Interface with DLP
11DX-DLP Interface
- Directory structure on staging FS (LINUX)
- Separate directory for each JobID_BatchID
- Contains a batch_ready manifest file
- Name, rows and destination table of each file
- Contains one file per destination table in ODM
- Objects, Detections, other tables
- Creation of batch_ready file is signal to loader to ingest the batch
- Batch size and frequency of ingest cycle TBD
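The handshake described above can be sketched as a loader-side scan of the staging directory. This is only an illustration: the slides do not specify the manifest layout, so the comma-separated `name,rows,table` line format, the headerless CSV files, and the function names are all assumptions.

```python
import csv
from pathlib import Path


def find_ready_batches(staging_root):
    """Return batch directories that already contain a batch_ready file."""
    root = Path(staging_root)
    return sorted(d for d in root.iterdir()
                  if d.is_dir() and (d / "batch_ready").is_file())


def read_manifest(batch_dir):
    """Parse batch_ready into (file name, row count, destination table) tuples.

    ASSUMPTION: one comma-separated line per data file.
    """
    with open(Path(batch_dir) / "batch_ready", newline="") as fh:
        return [(name, int(rows), table) for name, rows, table in csv.reader(fh)]


def verify_batch(batch_dir):
    """Check that each listed file exists and has the row count the manifest claims."""
    problems = []
    for name, rows, table in read_manifest(batch_dir):
        path = Path(batch_dir) / name
        if not path.is_file():
            problems.append(f"{name}: missing (destination {table})")
            continue
        with open(path, newline="") as fh:
            actual = sum(1 for _ in csv.reader(fh))  # assumes no header row
        if actual != rows:
            problems.append(f"{name}: manifest says {rows} rows, found {actual}")
    return problems
```

Because the batch_ready file is written last, a directory without it is simply skipped, so the loader never picks up a half-written batch.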
12Detailed Design
- Reuse SDSS software as much as possible
- Data Transformation Layer (DX) Interface to IPP
- Data Loading Pipeline (DLP)
- Data Storage (DS)
- Schema and Test Queries
- Database Management System
- Scalable Data Architecture
- Hardware
- Query Manager (QM CasJobs for prototype)
13Data Loading Pipeline (DLP)
- sqlLoader: SDSS data loading pipeline
- Pseudo-automated workflow system
- Loads, validates and publishes data
- From CSV to SQL tables
- Maintains a log of every step of loading
- Managed from Load Monitor Web interface
- Has been used to load every SDSS data release
- EDR, DR1-6, 15 TB of data altogether
- Most of it (since DR2) loaded incrementally
- Kept many data errors from getting into database
- Duplicate ObjIDs (symptom of other problems)
- Data corruption (CSV format invaluable in catching this)
14sqlLoader Design
- Existing functionality
- Shown for SDSS version
- Workflow, distributed loading, Load Monitor
- New functionality
- Schema changes
- Workflow changes
- Incremental loading
- Cross-match and partitioning
15sqlLoader Workflow
- Distributed design achieved with linked servers and SQL Server Agent
- LOAD stage can be done in parallel by loading into temporary task databases
- PUBLISH stage writes from task DBs to final DB
- FINISH stage creates indices and auxiliary (derived) tables
- Loading pipeline is a system of VB and SQL scripts, stored procedures and functions
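The three stages can be illustrated with a small stand-in. The sketch below uses Python and SQLite files in place of the VB/SQL scripts, linked servers and SQL Server Agent of the real sqlLoader; the table layout and function names are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

DDL = "CREATE TABLE IF NOT EXISTS Detections (detID INTEGER, ra REAL, dec REAL)"


def load_task(task_db_path, rows):
    """LOAD: fill one temporary task database; tasks run independently."""
    con = sqlite3.connect(task_db_path)
    con.execute(DDL)
    con.executemany("INSERT INTO Detections VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()
    return task_db_path


def publish(final_con, task_db_paths):
    """PUBLISH: copy rows from each task DB into the final DB."""
    final_con.execute(DDL)
    for path in task_db_paths:
        final_con.execute("ATTACH DATABASE ? AS task", (path,))
        final_con.execute("INSERT INTO Detections SELECT * FROM task.Detections")
        final_con.commit()
        final_con.execute("DETACH DATABASE task")


def finish(final_con):
    """FINISH: build indices once all data has been published."""
    final_con.execute("CREATE INDEX IF NOT EXISTS ix_det_id ON Detections (detID)")
    final_con.commit()


def run_pipeline(batches, workdir):
    """Run LOAD in parallel, then PUBLISH and FINISH against the final DB."""
    paths = [str(Path(workdir) / f"task_{i}.db") for i in range(len(batches))]
    with ThreadPoolExecutor() as pool:
        loaded = list(pool.map(load_task, paths, batches))
    final_con = sqlite3.connect(str(Path(workdir) / "final.db"))
    publish(final_con, loaded)
    finish(final_con)
    return final_con
```

Deferring index creation to FINISH keeps the bulk inserts in LOAD and PUBLISH cheap, which is the usual reason for ordering the stages this way.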
16Load Monitor Tasks Page
17Load Monitor Active Tasks
18Load Monitor Statistics Page
19Load Monitor New Task(s)
20Data Validation
- Tests for data integrity and consistency
- Scrubs data and finds problems in upstream pipelines
- Most of the validation can be performed within the individual task DB (in parallel)
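As an illustration of the kind of per-task check meant here, the sketch below flags rows with the wrong column count and duplicate ObjIDs within one task's data. The function name and the position of the ID column are assumptions; the actual sqlLoader validation is done in SQL stored procedures.

```python
from collections import Counter


def validate_rows(rows, expected_columns, id_column=0):
    """Return a list of human-readable problems found in one task's rows."""
    problems = []

    # Integrity: every row must have the expected number of columns.
    for line_no, row in enumerate(rows, start=1):
        if len(row) != expected_columns:
            problems.append(
                f"row {line_no}: expected {expected_columns} columns, got {len(row)}")

    # Consistency: an ObjID must not appear more than once.
    ids = Counter(row[id_column] for row in rows if len(row) > id_column)
    for obj_id, count in ids.items():
        if count > 1:
            problems.append(f"duplicate ObjID {obj_id} appears {count} times")

    return problems
```

Since the check only looks at one task's rows, any number of tasks can be validated at the same time without coordination.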
21Distributed Loading
[Diagram labels: Samba-mounted CSV/Binary Files; Load Monitor; Master (LoadAdmin); two Slaves; LoadSupport on each of three servers; View of Master Schema; three Task DBs; Publish; Finish]
22Schema Changes
- Schema in task and publish DBs is driven off a list of schema DDL files to execute (xschema.txt)
- Requires replacing DDL files in schema/sql directory and updating xschema.txt with their names
- PS1 schema DDL files have already been built
- Index definitions have also been created
- Metadata tables will be automatically generated using metadata scripts already in the loader
23Workflow Changes
- Cross-Match and Partition steps will be added to the workflow
- Cross-match will match detections to objects
- Partition will horizontally partition data, move it to slice servers, and build DPVs on main
[Workflow diagram] LOAD: Export, Check CSVs, Create Task DBs, Build SQL Schema, Validate, XMatch; PUBLISH: Partition
24Matching Detections with Objects
- Algorithm described fully in prototype section
- Stored procedures to cross-match detections will be part of the LOAD stage in loader pipeline
- Vertical partition of Objects table kept on load server for matching with detections
- Zones cross-match algorithm used to do 1 and 2 matches
- Detections with no matches saved in Orphans table
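A minimal in-memory sketch of a zone-based cross-match follows, to show the idea of bucketing objects into declination zones so each detection is compared only with nearby candidates. It is not the ODM stored procedure: the zone height, the tuple layout, and the function names are assumptions, and RA wrap-around at 0/360 degrees is ignored for brevity.

```python
import math
from collections import defaultdict

ZONE_HEIGHT_DEG = 0.01  # assumed zone height, for illustration only


def zone_id(dec_deg, zone_height=ZONE_HEIGHT_DEG):
    """Zones are horizontal stripes of constant height in declination."""
    return int(math.floor((dec_deg + 90.0) / zone_height))


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cross_match(objects, detections, radius_deg, zone_height=ZONE_HEIGHT_DEG):
    """Match each (detID, ra, dec) to its nearest (objID, ra, dec) within radius.

    Returns (matches, orphans): (detID, objID) pairs and unmatched detIDs.
    """
    zones = defaultdict(list)
    for obj_id, ra, dec in objects:
        zones[zone_id(dec, zone_height)].append((ra, dec, obj_id))

    matches, orphans = [], []
    for det_id, ra, dec in detections:
        # The RA search window widens away from the equator.
        cos_dec = max(math.cos(math.radians(abs(dec) + radius_deg)), 1e-9)
        ra_window = radius_deg / cos_dec
        best = None
        z_lo = zone_id(dec - radius_deg, zone_height)
        z_hi = zone_id(dec + radius_deg, zone_height)
        for z in range(z_lo, z_hi + 1):
            for o_ra, o_dec, obj_id in zones.get(z, ()):
                if abs(o_ra - ra) > ra_window:
                    continue  # cheap box test before the exact distance
                d = angular_sep_deg(ra, dec, o_ra, o_dec)
                if d <= radius_deg and (best is None or d < best[0]):
                    best = (d, obj_id)
        if best is None:
            orphans.append(det_id)
        else:
            matches.append((det_id, best[1]))
    return matches, orphans
```

In a database the same effect comes from an index on (zone, ra), which turns the inner loops into a range scan; the linear scan within a zone here is a simplification.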
25XMatch and Partition Data Flow
26Detailed Design
- Reuse SDSS software as much as possible
- Data Transformation Layer (DX) Interface to IPP
- Data Loading Pipeline (DLP)
- Data Storage (DS)
- Schema and Test Queries
- Database Management System
- Scalable Data Architecture
- Hardware
- Query Manager (QM CasJobs for prototype)
27Data Storage Schema
28PS1 Table Sizes Spreadsheet
29PS1 Table Sizes - All Servers
Table            Year 1   Year 2   Year 3   Year 3.5
Objects            4.63     4.63     4.61       4.59
StackPsfFits       5.08    10.16    15.20      17.76
StackToObj         1.84     3.68     5.56       6.46
StackModelFits     1.16     2.32     3.40       3.96
P2PsfFits          7.88    15.76    23.60      27.60
P2ToObj            2.65     5.31     8.00       9.35
Other Tables       3.41     6.94    10.52      12.67
Indexes (20%)      5.33     9.76    14.18      16.48
Total             31.98    58.56    85.07      98.87
Sizes are in TB
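The table is internally consistent if the "20" in the Indexes row is read as 20% of the summed data tables; that reading is an inference from the numbers, not something the slide states. A short check, with illustrative names:

```python
# Data table sizes in TB for Year 1, Year 2, Year 3, Year 3.5 (from the table above).
data_tables_tb = {
    "Objects":        [4.63, 4.63, 4.61, 4.59],
    "StackPsfFits":   [5.08, 10.16, 15.20, 17.76],
    "StackToObj":     [1.84, 3.68, 5.56, 6.46],
    "StackModelFits": [1.16, 2.32, 3.40, 3.96],
    "P2PsfFits":      [7.88, 15.76, 23.60, 27.60],
    "P2ToObj":        [2.65, 5.31, 8.00, 9.35],
    "Other Tables":   [3.41, 6.94, 10.52, 12.67],
}
YEARS = ["Year 1", "Year 2", "Year 3", "Year 3.5"]
INDEX_FRACTION = 0.20  # assumed meaning of the "20" in the Indexes row


def size_summary():
    """Return (year, data TB, index TB, total TB) for each column."""
    summary = []
    for i, year in enumerate(YEARS):
        data = sum(sizes[i] for sizes in data_tables_tb.values())
        indexes = INDEX_FRACTION * data
        summary.append((year, data, indexes, data + indexes))
    return summary


for year, data, indexes, total in size_summary():
    print(f"{year}: data {data:.2f} TB, indexes {indexes:.2f} TB, total {total:.2f} TB")
```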
30Data Storage Test Queries
- Drawn from several sources
- Initial set of SDSS 20 queries
- SDSS SkyServer Sample Queries
- Queries from PS scientists (Monet, Howell, Kaiser, Heasley)
- Two objectives
- Find potential holes/issues in schema
- Serve as test queries
- Test DBMS integrity
- Test DBMS performance
- Loaded into CasJobs (Query Manager) as sample queries for prototype
31Data Storage DBMS
- Microsoft SQL Server 2005
- Relational DBMS with excellent query optimizer
- Plus
- Spherical/HTM (C library SQL glue)
- Spatial index (Hierarchical Triangular Mesh)
- Zones (SQL library)
- Alternate spatial decomposition with dec zones
- Many stored procedures and functions
- From coordinate conversions to neighbor search functions
- Self-extracting documentation (metadata) and diagnostics
32Documentation and Diagnostics
33Data Storage Scalable Architecture
- Monolithic database design (a la SDSS) will not do it
- SQL Server does not have cluster implementation
- Do it by hand
- Partitions vs Slices
- Partitions are file-groups on the same server
- Parallelize disk accesses on the same machine
- Slices are data partitions on separate servers
- We use both!
- Additional slices can be added for scale-out
- For PS1, use SQL Server Distributed Partitioned Views (DPVs)
34Distributed Partitioned Views
- Difference between DPVs and file-group partitioning
- FG on same database
- DPVs on separate DBs
- FGs are for scale-up
- DPVs are for scale-out
- Main server has a view of a partitioned table that includes remote partitions (we call them slices to distinguish them from FG partitions)
- Accomplished with SQL Server's linked server technology
- NOT truly parallel, though
35Scalable Data Architecture
- Shared-nothing architecture
- Detections split across cluster
- Objects replicated on Head and Slice DBs
- DPVs of Detections tables on the Headnode DB
- Queries on Objects stay on head node
- Queries on detections use only local data on slices
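The head-node view over slices can be pictured with a toy model. The sketch below is plain Python, not SQL Server: the `Slice` and `HeadNode` classes are hypothetical, and partitioning detections by objID range is an assumption made for illustration, since the slides do not name the partition key here.

```python
from dataclasses import dataclass, field


@dataclass
class Slice:
    """One slice server holding the detections for a range of objIDs."""
    name: str
    lo: int  # inclusive lower bound of the objID range (hypothetical key)
    hi: int  # exclusive upper bound
    detections: list = field(default_factory=list)

    def query(self, lo, hi):
        """Answer a range query using only this slice's local data."""
        return [d for d in self.detections if lo <= d["objID"] < hi]


class HeadNode:
    """Holds the Objects table and a view over the slices' Detections."""

    def __init__(self, slices, objects):
        self.slices = slices
        self.objects = objects  # Objects queries are answered on the head node

    def insert_detection(self, det):
        """Route a detection to the slice whose range contains its objID."""
        for s in self.slices:
            if s.lo <= det["objID"] < s.hi:
                s.detections.append(det)
                return s.name
        raise ValueError(f"no slice covers objID {det['objID']}")

    def detections_view(self, lo, hi):
        """Union of per-slice results; slices outside the range are skipped."""
        rows = []
        for s in self.slices:  # visited one after another, not in parallel
            if s.lo < hi and lo < s.hi:
                rows.extend(s.query(lo, hi))
        return rows
```

Skipping slices whose range cannot match mirrors how range constraints on the member tables let a partitioned view avoid touching irrelevant servers; the sequential loop reflects the "not truly parallel" caveat.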
36Hardware - Prototype
[Diagram: prototype hardware layout]
- Server naming convention (by function): LX = Linux, L = Load server, S/Head = DB server, M = MyDB server, W = Web server
- PS0x = 4-core, PS1x = 8-core
- Servers: LX = PS01, W = PS02, S2 = PS03, S3 = PS04, L2/M = PS05, Head = PS11, S1 = PS12, L1 = PS13
- Storage: 10A = 10 x 13 x 750 GB, 3B = 3 x 12 x 500 GB
- Functions: Web, Staging, Loading, DB, MyDB
- Total space (as listed): 9 TB, 39 TB, 0 TB, 10 TB
- RAID config (as listed): RAID10, RAID5, RAID10, RAID10
- Disk/rack config (as listed): 12D/4W, 14D/3.5W
37Hardware PS1
- Ping-pong configuration to maintain high availability and query performance
- 2 copies of each slice and of main (head) node database on fast hardware (hot spares)
- 3rd spare copy on slow hardware (can be just disk)
- Updates/ingest on offline copy then switch copies when ingest and replication finished
- Synchronize second copy while first copy is online
- Both copies live when no ingest
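The ping-pong cycle can be written down as a small state sequence. The class below is a hypothetical model of the two fast copies only (the slow third copy is left out), meant to show that at least one copy is serving queries at every step.

```python
class PingPongPair:
    """Two copies, A and B, of one database (a slice or the head node)."""

    def __init__(self):
        self.version = {"A": 0, "B": 0}
        self.online = {"A", "B"}  # both copies live when no ingest is running
        self.history = []         # record of which copies were online at each step

    def _record(self):
        self.history.append(sorted(self.online))

    def ingest(self, new_version):
        # 1. Take B offline and ingest into it; A keeps serving the old version.
        self.online = {"A"}
        self._record()
        self.version["B"] = new_version

        # 2. Switch: B comes online with the new data, A goes offline.
        self.online = {"B"}
        self._record()

        # 3. Synchronize A while B is online.
        self.version["A"] = new_version

        # 4. Both copies are live again until the next ingest.
        self.online = {"A", "B"}
        self._record()
```

Users always see a complete version of the data: the old one until the switch, the new one afterwards.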
38Detailed Design
- Reuse SDSS software as much as possible
- Data Transformation Layer (DX) Interface to IPP
- Data Loading Pipeline (DLP)
- Data Storage (DS)
- Schema and Test Queries
- Database Management System
- Scalable Data Architecture
- Hardware
- Query Manager (QM CasJobs for prototype)
39Query Manager
- Based on SDSS CasJobs
- Configure to work with distributed database, DPVs
- Direct links (contexts) to slices can be added later if necessary
- Segregates quick queries from long ones
- Saves query results server-side in MyDB
- Gives users a powerful query workbench
- Can be scaled out to meet any query load
- PS1 Sample Queries available to users
- PS1 Prototype QM demo
40ODM Prototype Components
- Data Loading Pipeline
- Data Storage
- CasJobs
- Query Manager (QM)
- Web Based Interface (WBI)
- Testing
41Spatial Queries (Alex)
42Prototype (Maria)
43Hardware/Scalability (Jan)
44How Design Meets Requirements
- Cross-matching detections with objects
- Zone cross-match part of loading pipeline
- Already exceeded requirement with prototype
- Query performance
- Ping-pong configuration for query during ingest
- Spatial indexing and distributed queries
- Query manager can be scaled out as necessary
- Scalability
- Shared-nothing architecture
- Scale out as needed
- Beyond PS1 we will need truly parallel query plans
45WBS/Development Tasks
[WBS chart: effort per task in person-months (PM); values as listed: 2, 3, 1, 3, 3, 1, 2, 2, 2, 2, 4, 4, 4, 2 PM]
- Refine Prototype/Schema
- Staging/Transformation
- Initial Load
- Load/Resolve Detections
- Workflow Systems
- Logging
- Data Scrubbing
- SSIS (?) C
- Resolve/Synchronize Objects
- Create Snapshot
- Replication Module
- Query Processing
- Hardware
- Redistribute Data
- Documentation
- Testing
- Total Effort 35 PM
- Delivery 9/2008
46Personnel Available
- 2 new hires (SW Engineers) 100%
- Maria 80%
- Ani 20%
- Jan 10%
- Alainna 15%
- Nolan Li 25%
- Sam Carliles 25%
- George Fekete 5%
- Laszlo Dobos 50% (for 6 months)
47Issues/Risks
- Versioning
- Do we need to preserve snapshots of monthly versions?
- How will users reproduce queries on subsequent versions?
- Is it ok that a new version of the sky replaces the previous one every month?
- Backup/recovery
- Will we need 3 local copies rather than 2 for safety?
- Is restoring from offsite copy feasible?
- Handoff to IfA beyond scope of WBS shown
- This will involve several PMs
48Mahalo!
49Query Manager
MyDB table that query results go into
Check query syntax
Name that this query job is given
Context that query is executed in
Get graphical query plan
Run query in quick (1 minute) mode
Query buffer
Load one of the sample queries into query buffer
Submit query to long (8-hour) queue
50Query Manager
51Query Manager
MyDB context is the default, but other contexts can be selected
User can browse DB Views, Tables, Functions and Procedures
The space used and total space available
Multiple tables can be selected and dropped at once
Table list can be sorted by name, size, type.
52Query Manager
The query that created this table
53Query Manager
Context to run search on
Search radius
Table to hold results