Title: The New SAS Engine For CRSP
1The New SAS Engine For CRSP
- Benefits of using SAS engine for CRSP
- Setting up the interface between SAS and CRSP
- Examples of use
- Performance Benchmarks
2Benefits
- Access native CRSPAccess Databases.
- Most SAS Dataset options work without
modification (including keep, rename, and drop). - Native SAS datasets and SASECRSP dataset can be
used in the same statements. - Procedures can directly access SASECRSP datasets.
- Supports random access on PERMNO.
- Supports global PERMNO specific date
restrictions. - Supports functions and formats specific to CRSP
trading calendars. - Often requires no intermediate files.
- Saving coding time, CPU time, I/O, Disk space.
3Setting up the Interface
- After CRSP Stock Data and SAS/ETSTM 8.2 are
installed independently. Two notes on
installation - - CRSPDB_SASCAL environment variable must be set
to you latest CRSP data. - SASECRSP is currently only available for
Windows and HP/UX. - After that, the interface can be accessed using
simple extensions to the SAS libname statement.
For instance, - To access CRSP daily or monthly sample database
respectively - libname dstk sasecrsp c\dasample\'
setid10libname mstk sasecrsp c\masample\'
setid20 - The data files being accessed are the same ones
that are accessed by FORTRAN, C, ts_print, or
stk_print. Therefore disk space is only needed
for one installation regardless of how users
access the data. This saves not only disk space,
but labor each time new data is shipped.
4Printing HeadersExample 1 Use of keep
statement, and direct use of the proc print
procedure
- proc print data mstk.stkhead (keep permno
hcomnam begdt enddt) - Obs PERMNO BEGDT ENDDT
HCOMNAM - 1 10107 19860331 20010731
MICROSOFT CORP 2 10823 19251231
19980529 KEYSPAN ENERGY CORP 3 12079
19251231 20010731 GENERAL MOTORS CORP 4
12490 19251231 20010731 INTERNATIONAL
BUSINESS MACHS COR 5 14218 19251231
19890428 R J R NABISCO INC 6 14322
19251231 20010731 SEARS ROEBUCK CO 7
15069 19251231 20010731 U S X MARATHON
GROUP 8 18948 19300430 19961231
ROCKWELL INTERNATIONAL CORP 9 24360
19500731 20010731 KEYSPAN CORP10
28353 19620131 20010731 RALSTON PURINA
GROUP11 52070 19710331 19841031
ELECTRONIC DATA SYS CORP12 66931
19841031 19960628 GENERAL MOTORS CORP13
68451 19851231 20010731 GENERAL
MOTORS CORP14 75228 19850830
20010731 IVAX CORP15 75390 19881031
19990831 DEVON ENERGY CORP16 76597
19910328 20001229 NABISCO GROUP HOLDING
CORP17 78946 19930226 19970530
DEAN WITTER DISCOVER CO18 79323
19930630 20010731 ALLSTATE CORP19
79348 19930831 19950531 RALSTON
CONTINENTAL BAKING GROUP20 81191
19950131 20001229 NABISCO HOLDINGS CORP21
83596 19960628 20010731 ELECTRONIC
DATA SYSTEMS CORP22 84381 19961231
20010731 ROCKWELL INTERNATIONAL CORP NEW23
85349 19971031 20010731 ARVINMERITOR
INC24 86496 19990129 20010731
CONEXANT SYSTEMS INC25 87137 19990831
20010731 DEVON ENERGY CORP NEW
5Average Returns Example 2 Demonstrates date
restriction on a libname, direct use of proc
means, and output of native SAS datasets
- It is easy to restrict data to a specified date
range before passing it to SAS for calculation.
- To Calculate Average monthly returns for all the
securities in the sample database for the period
of January, 1996 December, 2000, use the
following commands. - libname mth5yr sasecrsp 'c\masample\'
range'19960131-20001229' setid20 - proc means datamth5yr.ret noprint var ret by
permno output meanavgret outavgret run
6Works with Proc SQLExample 3 Demonstrates
SASECRSP and native SAS dataset in the same
statement.
- / Join with the company names and restrict to
securities with exactly five years (60 months) of
non-missing returns, and sort output by the
avgret. / - proc sql
- select a.permno, a.hcomnam, b.avgret
- from mth5yr.stkhead a, avgret b
- where a.permnob.permno order and b._FREQ_ 60
by avgret - run
- PERMNO Company Name Header
Returns - _______________________________________
___________________________________ - 14322 SEARS ROEBUCK CO
0.004601 - 12079 GENERAL MOTORS CORP
0.010279 - 28353 RALSTON PURINA GROUP
0.013447 - 15069 U S X MARATHON GROUP
0.013855 - 79323 ALLSTATE CORP
0.018369 - 68451 GENERAL MOTORS CORP
0.022518 - 24360 KEYSPAN CORP
0.024243 - 75228 IVAX CORP
0.026936 - 12490 INTERNATIONAL BUSINESS MACHS COR
0.027325
7Performance MetricsExample 4 The SASECRSP
engines performs at near CRSPAccess C native
speed. This was run on the full December 2000
daily database on a local drive of Pentium III
733 MHz running Windows 2000. This example reads
over 50 million return observations, and
calculates the minimum, maximum, mean, and
standard deviation in approximately 2 ½ minutes.
- libname dlyall sasecrsp 'c\da200012\' setid10
- proc means datadlyall.ret
- var ret
- run
- N Mean Std Dev Minimum
Maximum - __________________________________________________
_________________ - 51398971 0.000845599 0.0442346
-0.9716981 12.7777777 - __________________________________________________
_________________ - (Reads 205 MB and over 51 million observations in
2 minutes and 34 seconds) - WARNING Defaulting to selecting all PERMNOs in
CRSP database. - NOTE There were 52365201 observations read from
the dataset DLYALL.RET. - NOTE PROCEDURE MEANS used
- real time 234.78
- cpu time 231.12
8Performance MetricsExample 5 This example
demonstrates that random access by PERMNO is very
efficient. Accessing the same underlying
51,000,000 observation dataset as the previous
example, accessing just a single permno, takes
just a faction of a second.
libname dlyibm sasecrsp 'c\da200012\' setid10
permno12490 proc means datadlyibm.ret var
ret run N Mean Std Dev
Minimum Maximum _______________________
___________________________________________ 9691
0.000533690 0.0159335 -0.2296296
0.1316364 _______________________________________
___________________________ (Reads one
security (IBM) and the 9,695 observations in 0.07
seconds) NOTE There were 9695 observations read
from the dataset DLYIBM.RET. NOTE PROCEDURE
MEANS used real time 0.07
seconds cpu time 0.04 seconds
9Performance MetricsExample 6 This example
demonstrates how a full pass of the header data
can be made to restrict the data and then used to
create an input list with date ranges. This
input list is then used to restrict access to the
51,000,000 observation dataset from the first
performance example, and it access just the
securities and dates requested. Creating the
list of securities, takes less than two seconds.
The restricted list takes a little more than ½
minute.
- / This example has two parts. First it does a
subset to just companies that delist due to
mergers and creates a dataset of PERMNO, delist
date, and 200 trading days before the delist
date. The second part on the next screen, then
then subsets based on that universe of perms / - data mrgdt (keeppermno date1 date2)
- set dlyall.stkhead
- where dlstcd gt 200 and dlstcd lt 299
- crspenddt crspdicd(enddt,0) / convert
enddt to a CRSP calendar date / - date1 crspdcid(crspenddt - 200) /
calculate 200 trading days prior to enddt
and store in YYYYMMDD format / - if date1 ne . and date1 gt begdt /
subsetting if - ignores companies that do
not have 200 trading days of CRSP data available.
/ - date2 enddt
- run
- NOTE There were 7790 observations read from the
dataset DLYALL.STKHEAD. - WHERE ((dlstcdgt200 and dlstcdlt299))
- NOTE The dataset WORK.MRGDT has 7640
observations and 3 variables. - NOTE DATA statement used
- real time 1.21 seconds
- cpu time 0.42 seconds
10Performance MetricsExample 6 Continued A
new libname needs to be created using the mrgdt
dataset created in the previous step. Any
datasets accessed using the dlymrg libname will
automatically be restricted using random access
to PERMNOs in the mrgdt dataset, and then after
being read, but before being processed by SAS,
restricted to the security specific date range
defined by date1 and date2.
libname dlymrg sasecrsp 'c\da200012\' setid10
insetmrgdt,PERMNO,PERMNO,date1,date2 proc
means datadlymrg.ret var ret run N
Mean Std Dev Minimum
Maximum __________________________________________
____________ 1532194 0.0022860 0.0400804
-0.8750000 2.5000000
NOTE There
were 1535640 observations read from the dataset
DLYMRG.RET. NOTE PROCEDURE MEANS used
real time 34.18 seconds cpu time
11.19 seconds
11CRSP User Support
Phone 773.834.1025930 A.M. 200 P.M. Central
Time Monday-Friday, Business Days Email
support_at_crsp.uchiccago.edu User Support Web Page
Form http//www.crsp.uchicago.edu/support/tech_hl
p.html