The New SAS Engine For CRSP - PowerPoint PPT Presentation

1 / 11
About This Presentation
Title:

The New SAS Engine For CRSP

Description:

6 14322 19251231 20010731 SEARS ROEBUCK & CO. 7 15069 19251231 20010731 U S X MARATHON GROUP ... 14322 SEARS ROEBUCK & CO 0.004601. 12079 GENERAL MOTORS CORP 0.010279 ... – PowerPoint PPT presentation

Number of Views:103
Avg rating:3.0/5.0
Slides: 12
Provided by: dan9154
Category:
Tags: crsp | sas | engine | new | roebuck

less

Transcript and Presenter's Notes

Title: The New SAS Engine For CRSP


1
The New SAS Engine For CRSP
  • Benefits of using SAS engine for CRSP
  • Setting up the interface between SAS and CRSP
  • Examples of use
  • Performance Benchmarks

2
Benefits
  • Access native CRSPAccess Databases.
  • Most SAS Dataset options work without
    modification (including keep, rename, and drop).
  • Native SAS datasets and SASECRSP dataset can be
    used in the same statements.
  • Procedures can directly access SASECRSP datasets.
  • Supports random access on PERMNO.
  • Supports global PERMNO specific date
    restrictions.
  • Supports functions and formats specific to CRSP
    trading calendars.
  • Often requires no intermediate files.
  • Saving coding time, CPU time, I/O, Disk space.

3
Setting up the Interface
  • After CRSP Stock Data and SAS/ETSTM 8.2 are
    installed independently. Two notes on
    installation -
  • CRSPDB_SASCAL environment variable must be set
    to you latest CRSP data.
  • SASECRSP is currently only available for
    Windows and HP/UX.
  • After that, the interface can be accessed using
    simple extensions to the SAS libname statement.
    For instance,
  • To access CRSP daily or monthly sample database
    respectively
  • libname dstk sasecrsp c\dasample\'
    setid10libname mstk sasecrsp c\masample\'
    setid20
  • The data files being accessed are the same ones
    that are accessed by FORTRAN, C, ts_print, or
    stk_print. Therefore disk space is only needed
    for one installation regardless of how users
    access the data. This saves not only disk space,
    but labor each time new data is shipped.

4
Printing HeadersExample 1 Use of keep
statement, and direct use of the proc print
procedure
  • proc print data mstk.stkhead (keep permno
    hcomnam begdt enddt)
  • Obs PERMNO BEGDT ENDDT
    HCOMNAM
  • 1 10107 19860331 20010731
    MICROSOFT CORP 2 10823 19251231
    19980529 KEYSPAN ENERGY CORP 3 12079
    19251231 20010731 GENERAL MOTORS CORP 4
    12490 19251231 20010731 INTERNATIONAL
    BUSINESS MACHS COR 5 14218 19251231
    19890428 R J R NABISCO INC 6 14322
    19251231 20010731 SEARS ROEBUCK CO 7
    15069 19251231 20010731 U S X MARATHON
    GROUP 8 18948 19300430 19961231
    ROCKWELL INTERNATIONAL CORP 9 24360
    19500731 20010731 KEYSPAN CORP10
    28353 19620131 20010731 RALSTON PURINA
    GROUP11 52070 19710331 19841031
    ELECTRONIC DATA SYS CORP12 66931
    19841031 19960628 GENERAL MOTORS CORP13
    68451 19851231 20010731 GENERAL
    MOTORS CORP14 75228 19850830
    20010731 IVAX CORP15 75390 19881031
    19990831 DEVON ENERGY CORP16 76597
    19910328 20001229 NABISCO GROUP HOLDING
    CORP17 78946 19930226 19970530
    DEAN WITTER DISCOVER CO18 79323
    19930630 20010731 ALLSTATE CORP19
    79348 19930831 19950531 RALSTON
    CONTINENTAL BAKING GROUP20 81191
    19950131 20001229 NABISCO HOLDINGS CORP21
    83596 19960628 20010731 ELECTRONIC
    DATA SYSTEMS CORP22 84381 19961231
    20010731 ROCKWELL INTERNATIONAL CORP NEW23
    85349 19971031 20010731 ARVINMERITOR
    INC24 86496 19990129 20010731
    CONEXANT SYSTEMS INC25 87137 19990831
    20010731 DEVON ENERGY CORP NEW

5
Average Returns Example 2 Demonstrates date
restriction on a libname, direct use of proc
means, and output of native SAS datasets
  • It is easy to restrict data to a specified date
    range before passing it to SAS for calculation.
  • To Calculate Average monthly returns for all the
    securities in the sample database for the period
    of January, 1996 December, 2000, use the
    following commands.
  • libname mth5yr sasecrsp 'c\masample\'
    range'19960131-20001229' setid20
  • proc means datamth5yr.ret noprint var ret by
    permno output meanavgret outavgret run

6
Works with Proc SQLExample 3 Demonstrates
SASECRSP and native SAS dataset in the same
statement.
  • / Join with the company names and restrict to
    securities with exactly five years (60 months) of
    non-missing returns, and sort output by the
    avgret. /
  • proc sql
  • select a.permno, a.hcomnam, b.avgret
  • from mth5yr.stkhead a, avgret b
  • where a.permnob.permno order and b._FREQ_ 60
    by avgret
  • run
  • PERMNO Company Name Header
    Returns
  • _______________________________________
    ___________________________________
  • 14322 SEARS ROEBUCK CO
    0.004601
  • 12079 GENERAL MOTORS CORP
    0.010279
  • 28353 RALSTON PURINA GROUP
    0.013447
  • 15069 U S X MARATHON GROUP
    0.013855
  • 79323 ALLSTATE CORP
    0.018369
  • 68451 GENERAL MOTORS CORP
    0.022518
  • 24360 KEYSPAN CORP
    0.024243
  • 75228 IVAX CORP
    0.026936
  • 12490 INTERNATIONAL BUSINESS MACHS COR
    0.027325

7
Performance MetricsExample 4 The SASECRSP
engines performs at near CRSPAccess C native
speed. This was run on the full December 2000
daily database on a local drive of Pentium III
733 MHz running Windows 2000. This example reads
over 50 million return observations, and
calculates the minimum, maximum, mean, and
standard deviation in approximately 2 ½ minutes.
  • libname dlyall sasecrsp 'c\da200012\' setid10
  • proc means datadlyall.ret
  • var ret
  • run
  • N Mean Std Dev Minimum
    Maximum
  • __________________________________________________
    _________________
  • 51398971 0.000845599 0.0442346
    -0.9716981 12.7777777
  • __________________________________________________
    _________________
  • (Reads 205 MB and over 51 million observations in
    2 minutes and 34 seconds)
  • WARNING Defaulting to selecting all PERMNOs in
    CRSP database.
  • NOTE There were 52365201 observations read from
    the dataset DLYALL.RET.
  • NOTE PROCEDURE MEANS used
  • real time 234.78
  • cpu time 231.12

8
Performance MetricsExample 5 This example
demonstrates that random access by PERMNO is very
efficient. Accessing the same underlying
51,000,000 observation dataset as the previous
example, accessing just a single permno, takes
just a faction of a second.
libname dlyibm sasecrsp 'c\da200012\' setid10
permno12490 proc means datadlyibm.ret var
ret run N Mean Std Dev
Minimum Maximum _______________________
___________________________________________ 9691
0.000533690 0.0159335 -0.2296296
0.1316364 _______________________________________
___________________________ (Reads one
security (IBM) and the 9,695 observations in 0.07
seconds) NOTE There were 9695 observations read
from the dataset DLYIBM.RET. NOTE PROCEDURE
MEANS used real time 0.07
seconds cpu time 0.04 seconds
9
Performance MetricsExample 6 This example
demonstrates how a full pass of the header data
can be made to restrict the data and then used to
create an input list with date ranges. This
input list is then used to restrict access to the
51,000,000 observation dataset from the first
performance example, and it access just the
securities and dates requested. Creating the
list of securities, takes less than two seconds.
The restricted list takes a little more than ½
minute.
  • / This example has two parts. First it does a
    subset to just companies that delist due to
    mergers and creates a dataset of PERMNO, delist
    date, and 200 trading days before the delist
    date. The second part on the next screen, then
    then subsets based on that universe of perms /
  • data mrgdt (keeppermno date1 date2)
  • set dlyall.stkhead
  • where dlstcd gt 200 and dlstcd lt 299
  • crspenddt crspdicd(enddt,0) / convert
    enddt to a CRSP calendar date /
  • date1 crspdcid(crspenddt - 200) /
    calculate 200 trading days prior to enddt
    and store in YYYYMMDD format /
  • if date1 ne . and date1 gt begdt /
    subsetting if - ignores companies that do
    not have 200 trading days of CRSP data available.
    /
  • date2 enddt
  • run
  • NOTE There were 7790 observations read from the
    dataset DLYALL.STKHEAD.
  • WHERE ((dlstcdgt200 and dlstcdlt299))
  • NOTE The dataset WORK.MRGDT has 7640
    observations and 3 variables.
  • NOTE DATA statement used
  • real time 1.21 seconds
  • cpu time 0.42 seconds

10
Performance MetricsExample 6 Continued A
new libname needs to be created using the mrgdt
dataset created in the previous step. Any
datasets accessed using the dlymrg libname will
automatically be restricted using random access
to PERMNOs in the mrgdt dataset, and then after
being read, but before being processed by SAS,
restricted to the security specific date range
defined by date1 and date2.
libname dlymrg sasecrsp 'c\da200012\' setid10
insetmrgdt,PERMNO,PERMNO,date1,date2 proc
means datadlymrg.ret var ret run N
Mean Std Dev Minimum
Maximum __________________________________________
____________ 1532194 0.0022860 0.0400804
-0.8750000 2.5000000
NOTE There
were 1535640 observations read from the dataset
DLYMRG.RET. NOTE PROCEDURE MEANS used
real time 34.18 seconds cpu time
11.19 seconds
11
CRSP User Support
Phone 773.834.1025930 A.M. 200 P.M. Central
Time Monday-Friday, Business Days Email
support_at_crsp.uchiccago.edu User Support Web Page
Form http//www.crsp.uchicago.edu/support/tech_hl
p.html
Write a Comment
User Comments (0)
About PowerShow.com