Title: DDS, A Seismic Processing Architecture
1 DDS, A Seismic Processing Architecture
- Reproducible research workshop, UBC, Vancouver, 2006
- Randall L. Selzler, RSelzler _at_ Data-Warp.com
- Jerry Ehlers, Jerry.Ehlers _at_ BP.com
- Joseph A. Dellinger, Joseph.Dellinger _at_ BP.com
2 DDS ORIGINS: Amoco TRC, early 90s
- DDS began at the Amoco Tulsa Research Center at a time of great organizational strain.
- The job of the TRC was to do research and crunch data, not to write software.
- Creating software is expensive!
- Amoco's solution was an edict that everyone will use DISCO, or else.
3 Else!
- But DISCO just wasn't good enough!
- And so chaos ensued...
- We were mired in seismic processing diversity.
- DDS grew up surrounded by
- USP (Amoco internal trace-header based)
- SEPlib (ASCII header pointing to data cubes)
- SU (SEGY trace-header based)
- DISCO (proprietary monitor-based system)
- ... and needed to be compatible with all of these!
4
- Although formally cast as a research group, in fact the TRC also functioned as an internal contractor processing shop.
- 1) So to catch on, not only would any software have to be usable for quick-turnaround research, but
- 2) the ability to process large datasets efficiently and in parallel was also of vital importance.
- Terabytes of data, Connection Machines, MPI, OpenMP
- 3) The group had accumulated a considerable number and variety of computers. All Unix, but
- CM5, Cray, Sun, SGI, Linux, Linux clusters, 32 and 64 bit...
- 4) Finally, there was an urgent need for software that could accommodate all the various mutant SEGY formats coming into the shop, as well as DISCO, SEPlib, SU, and USP!
5 ... and out of the chaos came...
- John Etgen was using SEPlib for migration algorithm research on the CM200, a machine that required massively parallel data I/O.
- He showed SEPlib to Randy Selzler:
- "I want something that looks like THIS, but can handle the large industrial-strength jobs I need to do!"
- And thus DDS was born...
6 How SEPlib did it
header file:
    ... processing history ...
    esize=4 (bytes)
    data_format=xdr_float
    in=data_location
    n1=trace_length
    n2=number_traces_per_record
    n3=number_records
    d1=sample_interval
    o1=starting sample
    etc...
data file:
    regularly sampled cube of IEEE 4-byte floats of dimension n1 x n2 x n3
SEPlib was the system favored by the folks writing programs that worked on large data volumes instead of individual traces.
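The header/data split can be illustrated with a minimal Python sketch. It is not SEPlib code: the function names and the sample header values are hypothetical, and it assumes that when processing history appends a parameter that is already present, the later assignment wins. It parses `key=value` pairs and computes how many bytes the data cube should occupy.

```python
import re

def parse_sep_header(text):
    """Collect key=value pairs; a later assignment overrides an earlier one."""
    params = {}
    # A value is either a double-quoted string or a run of non-space characters.
    for key, value in re.findall(r'(\w+)=("[^"]*"|\S+)', text):
        params[key] = value.strip('"')
    return params

def cube_nbytes(params):
    """Expected size of the data file: esize * n1 * n2 * n3 bytes."""
    nbytes = int(params.get("esize", 4))
    for axis in ("n1", "n2", "n3"):
        nbytes *= int(params.get(axis, 1))
    return nbytes

# Hypothetical header: the last line stands for a later processing step
# that windowed the third axis and appended the new n3.
header = '''
n1=1000 n2=96 n3=24 d1=0.008 o1=0.0 esize=4
data_format="xdr_float" in="/tmp/example.H@"
n3=12
'''
params = parse_sep_header(header)
print(params["in"], cube_nbytes(params))
```

Because the header is plain text, a program needs nothing beyond such a lookup to find the binary cube and know its shape.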
7 DDS can look a lot like SEPlib
SEPlib header file:
    ... processing history ...
    esize=4 (bytes)
    data_format=xdr_float
    in=data_location
    n1=trace_length
    n2=number_traces_per_record
    n3=number_records
    d1=sample_interval
    o1=starting sample
    label1=seconds
    etc...
DDS dictionary file:
    ... processing history ...
    type= float4
    format= fcube
    data= data location
    axis= t offset cdp
    size.t= trace length
    size.offset= number traces per record
    size.cdp= number records
    delta.t= sample_interval
    origin.t= starting sample
    units.t= seconds
    etc...
8 DDS can look a lot like SEPlib
data file:
    regularly sampled cube of IEEE 4-byte floats of dimension size.t x size.offset x size.cdp
(command-line arguments look a LOT like SEPlib too)
9 DDS's Generalizations
- N-Dimensional Array of I/O Records
- Densely populated for random access
- Sequential access if sparse
- Meaningful Axis Names
- t, x, y, z, w, kx, ky, kz, cmp, shot, offset, ...
- Extensible Axis Attributes
- Regular grid (size, origin, delta, units, ...)
- Variable grid (grid.z= 1 3 5 7 11, ...)
- Non-numeric (label.attr= Vp Vs rho)
Dictionary:
    axis= t y cmp
    size.t= 1000
    size.y= 96
    size.cmp= 24
    delta.t= 0.008
    units.t= s
    origin.y= 5000
    units.y= m
    format= segy
    data= oak39__at_
Great for research! Exotic algorithms and
unforeseen domains can be accurately represented
and processed as easily as traditional ones.
Binary Data:
    Card Header | Line Header | Traces
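To show why named axes with a regular grid make random access straightforward when the array is densely populated, here is a small Python sketch. It is a toy reading of the `name= value` layout in the dictionary example, not the real DDS dictionary parser, and all function names are hypothetical. It assumes a headerless cube with the first axis varying fastest.

```python
def parse_dictionary(text):
    """Toy parser: each line is 'name= value ...'; not the real DDS grammar."""
    defs = {}
    for line in text.strip().splitlines():
        name, _, value = line.partition("=")
        defs[name.strip()] = value.split()
    return defs

def axis_table(defs):
    """Gather per-axis attributes in axis order, with simple defaults."""
    axes = []
    for name in defs["axis"]:
        axes.append({
            "name": name,
            "size": int(defs[f"size.{name}"][0]),
            "origin": float(defs.get(f"origin.{name}", ["0"])[0]),
            "delta": float(defs.get(f"delta.{name}", ["1"])[0]),
        })
    return axes

def flat_index(axes, position):
    """Element index of one sample in a dense cube (first axis fastest)."""
    index, stride = 0, 1
    for axis in axes:
        i = position[axis["name"]]
        if not 0 <= i < axis["size"]:
            raise IndexError(f"{axis['name']}={i} is out of range")
        index += i * stride
        stride *= axis["size"]
    return index

text = """
axis= t y cmp
size.t= 1000
size.y= 96
size.cmp= 24
delta.t= 0.008
origin.y= 5000
"""
axes = axis_table(parse_dictionary(text))
print(flat_index(axes, {"t": 10, "y": 2, "cmp": 1}))
```

With sparse data no such arithmetic is possible, which is consistent with the slide's fallback to sequential access.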
10 How USP did it
Unix Seismic Processing (USP) was Amoco's internally home-grown trace-based processing system, beloved of Amoco's signal processors. USP is similar to SU in concept. USP uses longer trace headers than SU, but they still turned out to not be long enough! USP is still used as much as ever today.
USP-format data file:
    historical line header (processing history and 3 data dimensions)
    traces:
        element count | trace header | trace samples
        element count | trace header | trace samples
        element count | trace header | trace samples
        ...
11 SU and USP use fixed-format trace headers defined by include files
    /* hdr.h: SU include file for segy offset array */
    static struct {
        char *key;  char *type;  int offs;
    } hdr[] = {
        {"tracl",  "i", 0},
        {"tracr",  "i", 4},
        {"fldr",   "i", 8},
        {"tracf",  "i", 12},
        {"ep",     "i", 16},
        {"cdp",    "i", 20},
        {"cdpt",   "i", 24},
        {"trid",   "h", 28},
        {"nvs",    "h", 30},
        {"nhs",    "h", 32},
        {"duse",   "h", 34},
        {"offset", "i", 36},
        {"gelev",  "i", 40},
        {"selev",  "i", 44},
        {"sdepth", "i", 48},
        {"gdel",   "i", 52},
        ...
    };
12 DDS also plays well with USP
DDS dictionary file:
    type= float4
    format= usp
    data= data location
    axis= t offset cdp comp
    size.t= trace length
    size.offset= number traces per record
    size.cdp= number records
    size.comp= number components
    delta.t= sample_interval
    origin.t= starting sample
    units.t= seconds
    etc...
USP-format data file:
    line header (three dimensions)
    traces:
        element count | trace header | trace samples
        element count | trace header | trace samples
        element count | trace header | trace samples
        ...
DDS knows what USP headers look like!
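Knowing what a fixed-format header looks like amounts to holding a table of field names, types and byte offsets, like the SU `hdr.h` excerpt. The Python sketch below decodes a raw trace header from such a table. The first twelve entries are copied from that excerpt; the big-endian byte order and the function name are assumptions made for illustration, and this is not how DDS itself is implemented.

```python
import struct

# (key, type, byte offset) as in the hdr.h excerpt:
# "i" is a 4-byte integer, "h" is a 2-byte integer.
HDR = [
    ("tracl", "i", 0), ("tracr", "i", 4), ("fldr", "i", 8), ("tracf", "i", 12),
    ("ep", "i", 16), ("cdp", "i", 20), ("cdpt", "i", 24), ("trid", "h", 28),
    ("nvs", "h", 30), ("nhs", "h", 32), ("duse", "h", 34), ("offset", "i", 36),
]

def read_header(buf, byteorder=">"):
    """Decode the listed fields from a raw trace header (big-endian assumed)."""
    return {key: struct.unpack_from(byteorder + typ, buf, offs)[0]
            for key, typ, offs in HDR}

# Build a fake 240-byte header with two fields set, then decode it.
raw = bytearray(240)
struct.pack_into(">i", raw, 20, 1234)   # cdp
struct.pack_into(">h", raw, 28, 1)      # trid
fields = read_header(bytes(raw))
print(fields["cdp"], fields["trid"])
```

The limitation of this approach is visible in the table itself: a quantity with no entry has nowhere to live, which is one reading of the remark that USP headers "turned out to not be long enough".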
13 ... and SEGY...
DDS dictionary file
SEGY-format data file:
    EBCDIC cards | binary header
    traces:
        trace header | IBM-format samples
        trace header | IBM-format samples
        trace header | IBM-format samples
        ...
Note: DDS only bothers to convert back to SEGY's archaic IBM floats when writing to disk!
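For readers who have not met IBM floats: a 32-bit IBM hexadecimal float has a sign bit, a 7-bit base-16 exponent with a bias of 64, and a 24-bit fraction. The following Python sketch converts such samples to IEEE floats. It only illustrates the number format; it is not DDS's conversion routine, and the function names are hypothetical.

```python
import struct

def ibm32_to_float(word):
    """Convert one 32-bit IBM hexadecimal float (given as an int) to a float."""
    sign = -1.0 if word & 0x80000000 else 1.0
    exponent = (word >> 24) & 0x7F            # base-16 exponent, excess 64
    fraction = (word & 0x00FFFFFF) / float(1 << 24)
    return sign * fraction * 16.0 ** (exponent - 64)

def ibm_samples_to_ieee(raw):
    """Decode big-endian IBM samples and repack them as big-endian IEEE float32."""
    count = len(raw) // 4
    words = struct.unpack(f">{count}I", raw[:count * 4])
    return struct.pack(f">{count}f", *(ibm32_to_float(w) for w in words))

# Three samples: 100.0, -118.625 and 0.0 in IBM format.
raw = bytes.fromhex("42640000" "C276A000" "00000000")
print(struct.unpack(">3f", ibm_samples_to_ieee(raw)))
```

The IBM format covers a wider range than IEEE float32, so a production converter would also need to handle values that overflow; this sketch does not.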
14 DDS can speak SU
(note: input format auto-detected)
    editd in=minute2.usp \
        3s=16 3e=16 2s=2 2e=32 2i=2 \
        out_format= su \
        out_data= stdout \
        | supswigp clip=.2 > wiggle.ps
15 DDS dictionaries can point at dictionaries!
dict.comp1
dict.comp2
16 DDS plays well with mutant SEGY
    bridge in= Atlantis_EQ.segy \
        in_format=segy \
        out_format=usp \
        comment="Component Type" \
        map:segy:usp.RcComp= "TotalStatic" \
        \
        comment="Src and rec locations" \
        map:segy:usp.SrPtXC= "SrcX / 10" \
        map:segy:usp.SrPtYC= "SrcY / 10" \
        map:segy:usp.SrPtEl= "15" \
        map:segy:usp.ShtDep= "SrcDepth / 10" \
        \
        map:segy:usp.RcPtXC= "GrpX / 10" \
        map:segy:usp.RcPtYC= "GrpY / 10" \
        map:segy:usp.GrpElv= "Spare.I4[10] / 10" \
        map:segy:usp.CabDep= "Spare.I4[10]" \
        map:segy:usp.DstSgn= "DstSgn / 10" \
        \
        comment="Rec point and line numbers" \
        map:segy:usp.DpPtLn= "Spare.I4[8]" \
        map:segy:usp.DpPtLt= "Spare.I4[9]" \
        \
        comment="Dead or Live" \
        map:segy:usp.StaCor= '( TrcIdCode - 1 ) * 30000' \
        | editd in= stdin 3e=106 out_data= raw.usp
Kinds of mapping shown:
- straight map
- fixed number
- arithmetic calculation
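The three kinds of mapping can be mimicked in a few lines of Python. This is a hypothetical rule table, not DDS's mapping engine: the field names are taken from the command above, the input values are made up, and the StaCor rule reads the expression in the command as a multiplication, which is an assumption.

```python
# Each rule takes the input header (a dict) and returns the output field value.
RULES = {
    "RcComp": lambda h: h["TotalStatic"],               # straight map
    "SrPtEl": lambda h: 15,                             # fixed number
    "SrPtXC": lambda h: h["SrcX"] / 10,                 # arithmetic calculation
    "StaCor": lambda h: (h["TrcIdCode"] - 1) * 30000,   # arithmetic calculation
}

def map_header(in_header, rules, defaults=None):
    """Build an output header by applying one rule per output field."""
    out = dict(defaults or {})
    for field, rule in rules.items():
        out[field] = rule(in_header)
    return out

# Made-up input header values, for illustration only.
segy_header = {"TotalStatic": 2, "SrcX": 123450, "TrcIdCode": 2}
usp_header = map_header(segy_header, RULES)
print(usp_header)
```

Keeping the rules as data rather than code is the point of the slide: a new mutant format needs a new rule table, not a recompiled program.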
17 Data formats and mappings
- This is how DDS differs from SEPlib...
- The properties of the binary data, and all the elements within the binary data, are looked up in the dictionary.
- Even the array of trace samples is just another trace field as far as DDS is concerned.
- DDS knows a few default formats, but can use any format that you can define.
- It can also map to and from any format that you can define the necessary mappings for.
- This has the important side effect of documenting the data format, making future reproducibility possible.
18 DDS supports generic formats
In fact, besides having a few built-in default
formats such as USP, SU, and SEGY that are
convenient for geophysicists, there is nothing
in the core of DDS that limits it to being a
seismic processing system!
19 Internal data formats
- Programs can define their own internal data formats as well, simply by writing definitions into their own internal dictionary:
- fdds_printf(MOD_FIELD, "float MyHeader1, MyHeader2\n\0")
- DDS will then convert from the format of the data, as documented by its dictionary, to the internal format specified by the program.
- On output, the internal format will be converted back into whatever output format has been requested on the command line; by default, the output format will be the same as the input format.
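The idea of converting between a documented data format and a program's internal format can be sketched in Python. Everything here is hypothetical: the field lists, the `(name, code, count)` notation and the function names are illustrative and are not the DDS API. The sketch treats the sample array as just another field and fills fields that are missing from the input with a default value.

```python
import struct

# A format is an ordered list of (name, struct code, count).
DATA_FORMAT = [("cdp", "i", 1), ("offset", "i", 1), ("Samples", "f", 4)]
INTERNAL_FORMAT = [("Samples", "f", 4), ("cdp", "i", 1), ("MyHeader1", "f", 1)]

def record_struct(fmt, byteorder=">"):
    """Build a struct.Struct describing one record of the given format."""
    codes = "".join(f"{count}{code}" for _, code, count in fmt)
    return struct.Struct(byteorder + codes)

def unpack_record(fmt, buf):
    """Decode one record into a dict keyed by field name."""
    values = iter(record_struct(fmt).unpack(buf))
    record = {}
    for name, _, count in fmt:
        items = [next(values) for _ in range(count)]
        record[name] = items if count > 1 else items[0]
    return record

def convert(record, out_fmt, fill=0):
    """Repack a record into another format; missing fields get a fill value."""
    flat = []
    for name, _, count in out_fmt:
        value = record.get(name, [fill] * count if count > 1 else fill)
        flat.extend(value if count > 1 else [value])
    return record_struct(out_fmt).pack(*flat)

buf = record_struct(DATA_FORMAT).pack(7, 250, 0.0, 1.0, 0.5, -1.0)
internal = unpack_record(INTERNAL_FORMAT,
                         convert(unpack_record(DATA_FORMAT, buf), INTERNAL_FORMAT))
print(internal)
```

Converting back on output is the same call with the two formats swapped; fields the internal format did not carry (here `offset`) would be filled by default rather than preserved, which a real system would need to handle more carefully.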
20 Leverage Diversity? Interoperate! Data handling is fundamental
Format and API Emulation With Random Access I/O
- USP Re-link: 1998 Proof of Concept
- DISCO Support: 1997-2003
[Diagram layers: DDS Application, Generic I/O, API Emulation, Foreign Library, Foreign Format]
21 Are you scared yet?
- You can probably imagine that all this translating between formats can get very complicated...
    ...
    fmt:SAMPLE_TYPE= typedef float4 SAMPLE_TYPE
    fmt:USP_ADJUST= typedef enum4 { USP_LINE_PAD \= 0, USP_TRACE_PAD \= 0, USP_HLH_SIZE \= 2236 } USP_ADJUST
    fmt:SEQUENCE= typedef USP_TRACE SEQUENCE
    alias:fmt:USP_TRACE_PAD= fmt:USP_ADJUST
    alias:fmt:USP_HLH_SIZE= fmt:USP_ADJUST
    alias:fmt:USP_LINE_PAD= fmt:USP_ADJUST
    usp_NumRec= 2056
    ...
But still better than having to change your code or relink your code for every different mutant data format! It also makes it possible to interoperate with historical data formats without too much pain.
22 DDS scripting as a Rosetta stone
    /apps/global/bin/bridge \
        in= /hpc/dat13/zdsr01/Node/EQ/all.segy \
        in_format=segy out_format=usp \
        comment="Component Type" \
        map:segy:usp.RcComp= "TotalStatic" \
        comment="Src and rec locations" \
        map:segy:usp.SrPtXC= "SrcX / 10" \
        map:segy:usp.SrPtYC= "SrcY / 10" \
        map:segy:usp.SrPtEl= "15" \
        map:segy:usp.ShtDep= "SrcDepth / 10" \
        comment="Azimuth, Roll Tilt" \
        map:segy:usp.TVPT01= "100 * Spare.F4[11]" \
        map:segy:usp.TVPT02= "100 * Spare.F4[12]" \
        map:segy:usp.TVPT03= "100 * Spare.F4[13]" \
        comment="Dead or Live" \
        map:segy:usp.StaCor= '( TrcIdCode - 1 ) * 30000' \
        comment="Shot Time" \
        map:segy:usp.TVPT15=Date.DateYear \
        map:segy:usp.TVPT16=Date.DateDay \
        map:segy:usp.TVPT17=Date.DateHour \
        map:segy:usp.TVPT18=Date.DateMin \
        map:segy:usp.TVPT19=Date.DateSec \
        ....
23 In Conclusion: caveats
- Things aren't so complicated if you use DDS as if it were SEPlib, but then what's the point?
- Because so much functionality already exists in USP, there has been little motivation to flesh out DDS.
- The external distribution is a subset of the same stuff we use internally. There has been little effort put into improving the packaging.
- While there is some documentation, it is somewhat lacking!
24 In Conclusion: upsides
- The software infrastructure inside BP today is based almost entirely on DDS and USP. It is BP's infrastructure both for research and for processing. BP's advanced imaging team in Houston is BP's largest contractor.
- The DDS I/O library was released publicly in 2003 on freeusp.org. The core of the USP system was released a year or so earlier on the same web site, along with some ARCO-heritage processing systems.
- By releasing USP and DDS, BP hoped to make it easier to share algorithms with academia and contractors.
- Randy Selzler now wants to create a successor to DDS, but that's his talk, as the prophet, to give...