Title: File and Data Conversion
1 File and Data Conversion
- Jonathan Carter
- NERSC User Services
- jcarter_at_nersc.gov
- 510-486-7514
2 Introduction
- Converting files and data for use on the IBM SP
- IBM uses IEEE data representation
- Industry-standard Fortran unformatted file structure
- Tools available on the Cray systems
- Tools available on the IBM SP
3 Demand for File Conversion
- Currently, CTSS text files
- ctou, rlib will be available on the IBM SP
- After decommissioning the Cray systems in October 2002
- Cray Fortran unformatted files
- Cray C binary files
4 Tools on the Cray Systems - FFIO
- Flexible File I/O: a general system for specifying how data should be written or read
- Can be used without recompiling or relinking (Fortran)
- Can be changed at runtime
- Various layers available to convert both file structure and data
- Controlled via the assign command
5 assign Command
- Can specify how I/O is done
- On a Fortran unit basis: assign -F f77 u:10
- On a filename basis: assign -F f77 f:filename
- Common options
- Clear assigns: assign -R
- See current assigns in effect: assign -V
6 Fortran Unformatted Sequential-access Files
- Cray uses a vendor-specific format called COS blocked, or simply blocked
- IBM (and most Unix vendors) use f77 blocking
- Use the -F f77 option to have the FFIO f77 blocking layer used instead of the default COS blocking
- assign -F f77 u:10
- T3E already uses IEEE arithmetic, so -F f77 is sufficient
- Note that default real and integer data types on the T3E are 64 bit
- SV1 data needs to be converted, so an IEEE conversion layer is needed
- -N ieee performs basic conversion
- assign -F f77 -N ieee f:filename
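To make the f77 blocking mentioned above concrete, here is a minimal Python sketch of reading such a file. It assumes the common layout in which every record is framed by a 4-byte length before and after the data, stored big-endian; the marker size, the byte order, and the helper names read_f77_records and write_f77_record are assumptions for illustration and should be checked against the compiler that wrote the file. It does not handle COS blocked files.

```python
import io
import struct


def read_f77_records(stream, endian=">"):
    """Yield the payload bytes of each record in an f77-blocked file.

    Assumption: each record is <4-byte length><data><4-byte length>.
    """
    while True:
        head = stream.read(4)
        if not head:
            return  # clean end of file
        if len(head) < 4:
            raise ValueError("truncated record header")
        (nbytes,) = struct.unpack(endian + "I", head)
        payload = stream.read(nbytes)
        tail = stream.read(4)
        if len(payload) != nbytes or len(tail) != 4:
            raise ValueError("truncated record")
        if struct.unpack(endian + "I", tail)[0] != nbytes:
            raise ValueError("leading and trailing record lengths differ")
        yield payload


def write_f77_record(stream, payload, endian=">"):
    """Write one record using the same assumed framing (used for the demo)."""
    marker = struct.pack(endian + "I", len(payload))
    stream.write(marker + payload + marker)


# Demo on an in-memory file: two records, read back in order.
buf = io.BytesIO()
write_f77_record(buf, struct.pack(">2d", 1.5, -2.0))
write_f77_record(buf, b"label")
buf.seek(0)
records = list(read_f77_records(buf))
```

Checking the trailing length against the leading one is a cheap way to notice that a file is not f77 blocked at all, for example a COS blocked file copied over without conversion.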
7 Fortran Unformatted Direct-access Files
- Files are not blocked on Cray or IBM
- Data conversion layers can be used as in sequential-access files for the SV1 machines
- assign -N ieee u:20
- T3E files don't need any conversion
8 C Binary Files
- Files are not blocked on Cray or IBM
- FFIO conversion layer not easy to use
- Use library routines such as cry2cri
9 Using FFIO to Convert a File
- Isolate the I/O statements for the file from the program to make a simple conversion program
- Pair each read with a write
- Use assign to have all written data converted, or use data conversion routines
10 Tools on the IBM SP - NCARU Library
- Library developed by the SCD at NCAR
- Reads COS blocked files
- Converts Cray data to IEEE data
- Does not use the Fortran API, so program modification is required
- Basic calls are crayopen, crayread, crayrew, crayback, crayclose
- Calls to crayread can convert data if the record is composed of one data type only; otherwise the user must handle conversion explicitly
- Conversion routines are ctodpf, ctospf, ctospi
- Cray Fortran I/O sometimes inserts padding, which the user must handle explicitly
11 Using the NCARU Library
- To use
- module load ncaru
- xlf -o a.out b.f $NCARU
- Limitations
- 2GB limit for unblocked files
- Currently no 64 bit address space support
- Not thread-safe
- No support for 128 bit data
12 Dealing with Different Files
- Open using the blocked option to crayopen for Fortran unformatted sequential access; open with the unblocked option for Fortran unformatted direct access
- If written on the SV1, use the conversion option on read, or call conversion routines directly
- C binary files can be read by the unblocked I/O calls, or by usual C I/O followed by data conversion routines
13 Records with Mixed Data Types
- Read into a buffer and convert items one by one
      real x(50)
      integer n(50)
      real*8 buffer(100)
!     open in blocked mode
      ifc = crayopen(filename, 10, 0)
!     read record without converting
      nwds = crayread(ifc, buffer, 100, 0)
!     convert data
      call ctospf(buffer, x, 50)
      call ctospi(buffer(51), n, 50)
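For readers who want to see what converting a mixed record involves, here is a hedged Python sketch of the same idea: split a raw record of 64-bit Cray words into reals and integers and convert each part. It is not the NCARU implementation. It assumes the classic Cray floating-point layout (1 sign bit, 15-bit exponent biased by 40000 octal, 48-bit mantissa with no hidden bit) and 64-bit two's-complement integers; the function names cray_float_to_ieee and decode_mixed_record are hypothetical. Unlike ctospf, which produces single precision, the result here is a Python float (IEEE double).

```python
import math
import struct


def cray_float_to_ieee(word):
    """Convert one 64-bit Cray floating-point word (an int) to a Python float.

    Assumed layout: sign (1 bit), exponent (15 bits, bias 0o40000),
    mantissa (48 bits, no hidden bit), value = 0.mantissa * 2**exponent.
    Values outside the IEEE double range raise OverflowError.
    """
    sign = -1.0 if word >> 63 else 1.0
    exponent = ((word >> 48) & 0x7FFF) - 0o40000
    mantissa = word & 0xFFFFFFFFFFFF
    # mantissa is an integer holding the fraction scaled by 2**48.
    return sign * math.ldexp(mantissa, exponent - 48)


def decode_mixed_record(raw, n_reals, n_ints):
    """Split a record of big-endian 64-bit words: reals first, then integers."""
    words = struct.unpack(f">{n_reals + n_ints}Q", raw)
    reals = [cray_float_to_ieee(w) for w in words[:n_reals]]
    # Reinterpret unsigned words as two's-complement signed integers.
    ints = [w - (1 << 64) if w >> 63 else w for w in words[n_reals:]]
    return reals, ints


# Demo: a 4-word record holding Cray 1.0 and -2.5, then integers 7 and -3.
raw = struct.pack(">2Q2q", 0x4001800000000000, 0xC002A00000000000, 7, -3)
reals, ints = decode_mixed_record(raw, 2, 2)
```

The split point between the two parts plays the role of buffer(51) in the Fortran fragment: the caller has to know the record layout, because nothing in the file records the data types.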
14 Data Padding
- With Cray Fortran I/O, extra bytes are sometimes inserted into the user data.
- In cases where padding occurs, bytes are inserted so that any datum of length 8 bytes is at a byte offset, measured from the beginning of the record, that is a multiple of 8 bytes. The end of the record is then padded so that the whole record length is a multiple of 8.
- Padding will only occur if you have used character variables whose lengths are not a multiple of 8, or have used real*4 or integer*4 data on the T3E (on the SV1 systems, 8 bytes are used).
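The padding rule can be turned into a small calculation. The Python sketch below is only an illustration of the rule as stated on this slide, not a description of the Cray runtime: each item is given as a hypothetical (name, element size in bytes, element count) tuple in write order, items with 8-byte elements are moved up to the next multiple of 8, and the record end is rounded up to a multiple of 8. Character data is modelled as elements of size 1.

```python
def padded_layout(items):
    """Return ({name: byte offset}, total record length) after padding.

    items: list of (name, element_size_bytes, count) in write order.
    """
    offsets = {}
    pos = 0
    for name, elem_size, count in items:
        if elem_size == 8:
            pos += -pos % 8  # pad so 8-byte data starts on a multiple of 8
        offsets[name] = pos
        pos += elem_size * count
    pos += -pos % 8  # pad the end of the record to a multiple of 8
    return offsets, pos


# SV1-style record: character*17 label, then two 50-element 8-byte arrays.
sv1_offsets, sv1_length = padded_layout(
    [("label", 1, 17), ("n", 8, 50), ("a", 8, 50)]
)
# sv1_offsets == {"label": 0, "n": 24, "a": 424}, sv1_length == 824
```

When reading such a record into a buffer, these offsets tell you which byte ranges hold data and which hold pad bytes to skip before calling the conversion routines.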
15 Example
A Fortran record is written on an SV1:
      real a(50)
      integer n(50)
      character*17 label
      write(50) n, a, label
The elements of n and a are 8 bytes each, and label is 17 bytes long. Within the Fortran record, n starts at offset 0, a at offset 400, and label at offset 800. The only padding that occurs is at the end of the record, where 7 bytes are added to the 817 bytes of data to make the total record length 824 bytes, which is a multiple of 8.
16 Example
A Fortran record is written on an SV1:
      real a(50)
      integer n(50)
      character*17 label
      write(50) label, n, a
Without padding, the alignments are label at offset 0, n at offset 17, and a at offset 417. Since n has elements of length 8 bytes, it must be written at an offset that is a multiple of 8 bytes; therefore a pad of 7 bytes is inserted between the end of label and the beginning of n. In the record that is written to the file, the alignments are label at offset 0, n at offset 24, and a at offset 424.
17 Example
A Fortran record is written on the T3E:
      real a(40), b(40)
      integer*4 n(13), m(13)
      character*12 label
      write(50) label, n, a, m, b
The data have lengths: label 12 bytes, n and m 52 bytes each, and a and b 320 bytes each. Without padding, the alignments are label at offset 0, n at offset 12, a at offset 64, m at offset 384, and b at offset 436. a and b need to be at offsets that are a multiple of 8 bytes; the offset of a is already correct, but 4 bytes must be inserted before b, so that it starts at offset 440.
18 crayconv Utility
- crayconv automatically converts files written on the SV1 to IBM-compatible format
- Basic Fortran data types only
- Sequential-access unformatted files only
- Possible problem if compiler option -Onofastint was used, or integer*8 was explicitly declared and written: integers over 2^46 are not correctly interpreted
- Pad data not removed
- Extension to T3E data and direct-access unformatted files planned
19 More Information
- http://hpcf.nersc.gov/computers/SP/ffio.html
- by Mike Stewart
- http://hpcf.nersc.gov/computers/crayretire.html
- man ncaru