Ch' 2: Getting Your Data into SAS - PowerPoint PPT Presentation

1 / 33
About This Presentation
Title:

Ch' 2: Getting Your Data into SAS

Description:

... Windows these are just collections of SAS data sets stored in directories. ... With v8 we can now define SAS data libraries that include multiple directories. ... – PowerPoint PPT presentation

Number of Views:81
Avg rating:3.0/5.0
Slides: 34
Provided by: Blo100
Category:

less

Transcript and Presenter's Notes

Title: Ch' 2: Getting Your Data into SAS


1
Ch. 2 Getting Your Data into SAS
  • At OSEDA we try to ease the burden of converting
    data by having data already in SAS data set
    format in the archive.
  • But it is still important that you know how to
    convert your own.
  • If you understand input statements and proc
    import, youll be well on your way to how to use
    put statements and proc export.

2
Four Basic Methods
  • Direct data entry via Viewtable or SAS/FSP
    windows.
  • Reading raw data files (ascii) with input
    statements.
  • Converting files from other packages such as dbf,
    xls, wkn -- usually via proc import.
  • Accessing data in other DB formats directly via
    special engines that make them look like SAS data
    sets. (Oracle access, for example.)

3
Entering Data Directly via Windows
  • Not used much here. This is usually done as part
    of an on-line transaction system, which is not
    something we do much with SAS.
  • It can be done, however, and can be quite
    sophisticated.

4
Reading Raw Data with Input Stmts
  • This has been traditionally the most important
    way we capture our data.
  • This requires the most knowledge of SAS language
    to do. Sometimes the format of the data makes it
    quite challenging.
  • Reading .csv files has become very important and
    common. SAS makes it pretty easy.

5
Converting xls, dbf, etc.
  • DDE (Windows only) option. Requires having the
    other application running during conversion.
  • Proc import and the Import Data Wizard is
    easiest.
  • Exporting the other applications data to csv and
    then importing remains an option.

6
Reading SPSS, Oracle, etc.
  • We now have special engines that can be used to
    make things such as Oracle tables look just like
    a SAS data set.
  • We have had good success accessing Oracle tables
    via the engine.
  • Nice part of doing it this way is that if the
    source data changes you do not have to reconvert.
    Sometimes, thats a problem.

7
Reading Raw Data Files
  • For small data collections you can imbed the data
    lines in your own code, preceded by a datalines
    statement.
  • Most of the time youll be dealing with data
    stored in external files.
  • The infile statement is used to point SAS to the
    file to be read. The input statement does the
    actual reading.

8
Accessing the Sample Code
  • Go to URLhttp//ftp.sas.com/samples/A56649
  • Cut and paste to get individual programs

9
Long Records
  • An annoying gotcha is the default of truncating
    input lines at 256 bytes.
  • Easy to circumvent if you know to code
    lrecl2000 (or some large value) as option on
    the infile statement. Sets up the buffer size -
    it does not mean the records have to really be
    that long. They just cant be longer.

10
Space Separated Data
  • Avoid this except for really small sets of data
    that you are keying in yourself.
  • Too easy for something to go wrong.
  • Character data containing blanks are a big
    problem.

11
Reading Fixed Format File
  • These are files where the fields are in the
    same column locations on each input line.
  • Read using a combination of column and formatted
    read specifications.
  • A good idea is to use formatted input for
    everything. Keeps it simple.

12
Formatted vs. Column Input
  • Input id 5. 1 year 4. 1 sales 7.
  • Input id 1-5 year 7-10 sales 12-18
  • input id char5. _at_7 year 4. _at_12 sales 7.
  • input id char5. 1 year 4. 1 sales 7 -!-
  • vs. free format
  • input id year sales

13
SAS builtin formats
  • Used within input (and put) statements (and
    functions) to convert to/from external formats to
    SAS internal formats.
  • always used as 1st char of a character format
    or informat.
  • Most references to formats require a period at
    the end to distinguish what it is.
  • Input dob date7 lt--what happens?-

14
Rich Collection of Formats
  • As seen on pp. 34-35 of TLSB.
  • Most of these can be used to write data (in a put
    statement) as well as read data. They just
    reverse the conversion process.
  • Note the wide assortment of data and time related
    formats.

15
Converting Raw Data at OSEDA
  • Major purpose of creating data archive is to make
    the expensive and error-prone task of getting the
    data properly converted something that is done
    just once and by pro. programmers.
  • Goal is to replace infile/input with set
    statements as much as possible. Make
    record-layout type documentation obsolete.

16
Need to Carefully Review
  • It is extremely easy to make a mistake when
    coding complex input statements.
  • Always review the resulting data set(s) very
    carefully before going on.
  • Use Proc Contents and Proc Print in batch, or the
    corresponding interactive windows in DM to
    carefully inspect data sets.

17
SAS Data Sets
  • Absolutely essential that you become very
    comfortable with what these are, how they are
    created, referenced, etc.
  • For some reason, many new users see the way SAS
    handles data set references as really hard.
  • But its really pretty simple.

18
New in V7 Data Set Literals
  • Starting with v7 you can code data
    c\temp\boone_county
  • Before v7 this was heresy. You always had to
    define the data library separately and use a
    2-level name to reference a data set within that
    directory libname tempsas c\temp data
    tempsas.boone_county

19
One Level Data Set Names
  • References to SAS data sets that have only one
    level (I.e. no period such as save.set1) are
    assumed to be stored in a special temporary SAS
    data library named work.
  • An important exception to this is if you define
    the specific user libref. If such a library is
    defined, SAS will store and look for all 1-level
    names in that data library.

20
SAS Data Libraries
  • On Unix and Windows these are just collections of
    SAS data sets stored in directories.
  • At least it used to be that simple. With v8 we
    can now define SAS data libraries that include
    multiple directories.
  • But the latter kind are still rare around here.

21
Creating a Permanent SAS Data Set
  • Typically involves something such as libname
    mysas s\myname\mysas data mysas.data0901
  • The directory specified in quotes in the libname
    statement must be allocated outside SAS (using
    mkdir or the equivalent.)
  • The data set is stored as a system file within
    that directory with name data0901. The extension
    will vary with the engine/SAS version/platform.

22
Entering Data with VT Window
  • I have never done this or known anyone who has.
  • One of our goals at OSEDA is to be able to have a
    record of how the data we use relates to the
    original source.
  • Keying in data like this can be tricky in that it
    is way too easy to make a mistake that the
    software cannot detect.

23
Reading Multiple Input Lines
  • The input statement typically reads one line of
    data to create one observation in a SAS data set.
    However, .
  • You can use the / to tell SAS to go to the next
    input line, or 2 (or 3, etc.) to tell it to
    position itself at the 2nd line (relative to
    where it began reading.)

24
Load Data and Code for 2.12
  • Go to the sample code at http//ftp.sas.com/sample
    s/A56649
  • Search for text 2.12
  • Go to Windows to create c\MyRawData.
  • Type DM command note rawdata to open a SAS
    notepad window.
  • Copy the data from your browser and paste it to
    the notepad window.

25
Data Code for 2.12 - cont.
  • Use File-Save as to save the contents of the
    notepad window to c\MyRawData\temperature.dat
  • Go back and get the sample data step code from
    the browser. Come back to SAS and paste the code
    into the editor window.
  • Submit the code.

26
Multiple Obs Per Line of Raw Data
  • Use the double trailing _at_ to tell SAS to stay on
    this line, even across cycles of the DATA step.
    (Default is to always flush the current record at
    the end of a d-s cycle.)
  • data scores input name score
    _at__at_datalinesMike 25 Samuel 36 Melanie 40 Me
    2 proc print sum score run

27
Trailing _at_ to Read Part of Line
  • A single trailing _at_ in an input stmt says to
    leave the data pointer where it is.
  • Subsequent input statements in the same step will
    pick up where last input left off.
  • But record is released at end of data step cycle
    (unlike with double trailing _at_).
  • Allows you to read part of a record and then
    conditionally continue reading, or have
    alternative reads based on record type.

28
Infile Statement Options
  • Firstobs and obs commonly used to begin/end
    at specific locations within file.
  • Missover, truncover, and stopover (not in TLSB)
    specify how to handle case where SAS needs to go
    to a new line to complete the input read.

29
Reading Delimited (.csv) Files
  • You almost always want to use the dsd option.
    Add option dlm09x for tab-delimited files.
  • A problem with reading this way is that character
    variables may not get right length imputed.
  • Strongly recommend declaring all vars before
    reading (prior to input statement.)

30
Example of Reading csv File
  • data class
  • length stud_id 6 Name 24 Address 40 City 20
    State 2 zip 5 gpa 5
  • infile datalines dsd
  • input stud_id -- gpa
  • put / _all_
  • datalines
  • 001234, Joe Smith,123 S 5th St,Columbia,MO,65201,3
    .0
  • 003456, Mary Jones,909 Nifong, Jefferson
    City,MO,65103,3.5
  • run

31
Running the Samples
  • We have captured the sample code for the text
    (TLSB) and edited/extended it for local use.
  • See the code in file s\sas8\Ch2_Samples.sas s
    \sas8\Ch3_Samples.sas etc.

32
The copyto Macro
  • Utility macro used to copy the sample data to our
    local data directory, s\sas8\RawData
  • Uses a single positional parameter and a single
    keyword parameter.
  • Generates a simple SAS data step that reads data
    from datalines and writes a file in the RawData
    directory with name determined by the parameters.

33
Run the Samples - Ch 2
  • Use the new enhanced editor window.
  • Do a file-open and select the Ch2_Samples.sas
    program file.
  • Note that all the steps which invoke copyto have
    already been run and the results are already out
    there. But it will not hurt if you rerun.
  • You can edit the copyto macro and make your own
    personal copies which you can then play with.
Write a Comment
User Comments (0)
About PowerShow.com