Title: An Introduction to SAS Character Functions including some new V9 functions
1An Introduction to SAS Character Functions
(including some new V9 functions)
2Some Functions We Will Discuss
- LENGTH
- SUBSTR
- COMPBL
- COMPRESS
- VERIFY
- INPUT
- PUT
- TRANWRD
- SCAN
- TRIM
- UPCASE
- LOWCASE
- INDEX
- INDEXC
- SPEDIS
- LENGTH
3Some V9 Functions
- ANYDIGIT
- ANYALPHA
- NOTDIGIT
- NOTALPHA
- CATX and CATS
- COMPARE
- LENGTHC
- LENGTHN
- STRIP
- COUNT
- COUNTC
- PROPCASE
- INDEXW
4Character Storage Lengths
- data chars1
- length string 7
- string 'abc'
- length length(string)
- storage_length lengthc(string)
- display "" string ""
- put storage_length /
- length /
- display
- run
5SAS Log
- 11 data chars1
- 12 length string 7
- 13 string 'abc'
- 14 storage_length lengthc(string)
- 15 length length(string)
- 16 display "" string ""
- 17 put storage_length /
- 18 length /
- 19 display
- 20 run
- storage_length7
- length3
- displayabc
6Moving the LENGTH Statement
- data chars2
- string 'abc'
- length string 7
- storage_length lengthc(string)
- length length(string)
- display "" string ""
- put storage_length /
- length /
- display
- run
7SAS Log
- 1 data chars2
- 2 string 'abc'
- 3 length string 7
- WARNING Length of character variable string has
already been set. Use the LENGTH statement as the
very first statement in the DATA STEP to declare
the length of a character variable. - 4 storage_length lengthc(string)
- 5 length length(string)
- 6 display "" string ""
- 7 put storage_length /
- 8 length /
- 9 display
- 10 run
- storage_length3
- length3
- displayabc
8The INPUT Function
- data special
- INPUT is a special function often used
- for character to numeric conversion
- length c_date 10 numeral 3
- input c_date numeral
- sas_date input(c_date,mmddyy10.)
- number input(numeral,3.)
- datalines
- 11/12/1950 123
- 9-15-2004 99
Listing of Data Set SPECIAL c_date numeral
sas_date number 11/12/1950 123 -3337
123 9-15-2004 99 16329 99
9The PUT Function
- data special
- PUT is a special function often used
- for numeric to character conversion
- input sas_date number ss
- c_date put(sas_date,date9.)
- money put(number,dollar8.)
- ss_char put(ss,ssn.)
- datalines
- 0 1234 123456789
-
Listing of Data Set SPECIAL sas_date number
ss c_date money ss_char 0
1234 123456789 01JAN1960 1,234 123-45-6789
10Converting Multiple Blanks to a Single Blank
- data multiple
- input 1 _at_1 Name 20.
- 2 _at_1 Address 30.
- 3 _at_1 City 15.
- _at_20 State 2.
- _at_25 Zip 5.
- name compbl(name)
- address compbl(address)
- city compbl(city)
- datalines
- Ron Cody
- 89 Lazy Brook Rd.
- Flemington NJ 08822
- Bill Brown
- 28 Cathy Street
- North City NY 11518
Multiple Name Address Ron
Cody 89 Lazy Brook Rd. Bill Brown 28
Cathy Street City State Zip
Flemington NJ 08822 North City NY
11518
11How to Remove Characters from a String
- data phone
- input phone 15.
- phone1 compress(phone)
- phone2 compress(phone,'(-) ')
- datalines
- (908)235-4490
- (201) 555-77 99
Phone phone phone1
phone2 (908)235-4490 (908)235-4490
9082354490 (201) 555-77 99 (201)555-7799
2015557799
12Another COMPRESS Example
- data social
- input ss_char 11.
- ss input(compress(ss_char,'-'),9.)
- easy_ss input(ss_char,comma11.)
- datalines
- 123-45-6789
-
ss 123456789 (numeric) ss_easy 123456789
(numeric)
13Compress Function (v9 changes)
- COMPRESS (char_value lt, comp_stringgt
lt,modifiersgt) - char_value is a SAS character value
- comp_string is a character value containing the
characters to remove from char_value. - modifiers add additional characters to the list
of - characters to remove or modify the way the
function works (see next slide).
14Compress Function Modifiers (v9)
- Selected list of COMPRESS modifiers (upper- or
lowercase) - a adds upper- and lowercase letters
- d adds numerals (digits)
- i ignores case
- k keeps listed characters instead of removing
them - s adds space (blank, tabs, lf, cr) to the list
- p adds punctuation
15Examples
For these examples, char "A C123XYZ" , phone
"(908) 777-1234"
16Using the Compress Modifiers
- data phone
- input phone 15.
- number compress(phone,,'dk')
- datalines
- (908)235-4490
- (201) 555-77 99
- Listing of Data Set PHONE
- phone number
- (908)235-4490 9082354490
- (201) 555-77 99 2015557799
17The VERIFY Function
- data verify
- input _at_1 id 3.
- _at_5 answer 5.
- position verify(answer,'abcde')
- datalines
- 001 acbed
- 002 abxde
- 003 12cce
- 004 abc e
Verify id answer position 001 acbed
0 002 abxde 3 003 12cce
1 004 abc e 4
18Watch Out for Trailing Blanks
- data trailing
- length string 10
- string 'abc'
- position verify(string,'abcde')
- run
String 'abc ' Position 4 (the position
of the first trailing blank)
19Watch Out for Trailing Blanks
- data trailing
- length string 10
- string 'abc'
- position
- verify(trim(string),'abcde')
- run
Position 0
20Using VERIFY for Data Cleaning
- data clean
- input id
- Valid ID's contain letters X,Y, or Z
- and digits
- if verify(trim(id),'XYZ0123456789') eq 0
- then valid 'Yes'
- else valid 'No'
- datalines
- 12X67YZ
- 67WXYZ
Listing of Data Set CLEAN id
valid 12X67YZ Yes 67WXYZ No
21Substring Example
- data pieces_parts
- input Id 9.
- length State 2
- state substr(id,1,2)
- Num input(substr(id,7,3),3.)
- datalines
- NYXXXX123
- NJ1234567
Listing of Data Set PIECES_PARTS Id
State Num NYXXXX123 NY 123 NJ1234567
NJ 567
22The SUBSTR Function on the Left-Hand Side of the
Equal Sign
- data pressure
- input sbp dbp _at__at_
- length sbp_chk dbp_chk 4
- sbp_chk put(sbp,3.)
- dbp_chk put(dbp,3.)
- if sbp gt 160 then
- substr(sbp_chk,4,1) ''
- if dbp gt 90 then
- substr(dbp_chk,4,1) ''
- datalines
- 120 80 180 92 200 110
23The SUBSTR Function on the Left-Hand Side of the
Equal Sign
- Listing of Data Set PRESSURE
- sbp dbp sbp_chk dbp_chk
- 120 80 120 80
- 180 92 180 92
- 200 110 200 110
24Unpacking a String
- data pack
- input string 5.
- datalines
- 12345
- 8 642
-
- data unpack
- set pack
- array x5
- do j 1 to 5
- xj input(substr(string,j,1),1.)
- end
- drop j
- run
Listing of Data Set UNPACK string x1 x2 x3
x4 x5 12345 1 2 3 4 5 8 642 8 .
6 4 2
25Parsing a String
- data take_apart
- input _at_1 date 10.
- month input(scan(date,1,'-/'),2.)
- day input(scan(date,2,'-/'),2.)
- year input(scan(date,3,'-/'),4.)
- sas_date input(date,mmddyy10.)
- format sas_date mmddyy10.
- datalines
- 10/21/1994
- 3-7-1960
- 1251970
Listing of Data Set TAKE_APART date month
day year sas_date 10/21/1994 10 21 1994
10/21/1994 3-7-1960 3 7 1960
03/07/1960 1251970 12 5 1970 12/05/1970
26Using the SCAN Function to Extract a Last Name
- data first_last
- input _at_1 name 20.
- _at_22 phone 13.
- extract the last name from name
- last_name scan(name,-1,' ')
- minus value scans from the right
- datalines
- Jeff W. Snoker (908)782-4382
- Raymond Albert (732)235-4444
- Alfred Edward Newman (800)123-4321
- Steven J. Foster (201)567-9876
- Jose Romerez (516)593-2377
27Using the SCAN Function to Extract a Last Name
- Names and Phone Numbers in Alphabetical Order (by
Last Name) - Name Phone Number
- Raymond Albert (732)235-4444
- Steven J. Foster (201)567-9876
- Alfred Edward Newman (800)123-4321
- Jose Romerez (516)593-2377
- Jeff W. Snoker (908)782-4382
28Locating the Position of One String Within
Another String
- data locate
- input string 10.
- first index(string,'xyz')
- first_c indexc(string,'x','y','z')
- datalines
- abczyx1xyz
- 1234567890
- abcx1y2z39
Listing of Data Set LOCATE string first
first_c abczyx1xyz 8 4 1234567890
0 0 abcx1y2z39 0 4
29Locating One Word in a StringV9 Function INDEXW
- data find_word
- string 'anything goes any where'
- index index(string,'any')
- indexw indexw(string,'any')
- put index indexw
- run
index 1 indexw 15
30Changing Case
- Data case
- input name 15.
- upper upcase(name)
- lower lowcase(name)
- proper propcase(name)
- Datalines
- gEOrge SMITH
- The end
Listing of Data Set CASE name upper
lower proper gEOrge SMITH GEORGE
SMITH george smith George Smith The end
THE END the end The End
31Substituting One Word for Another in a String
- data convert
- input _at_1 address 20.
- Convert Street, Avenue and
- Boulevard to their abbreviations
- Address tranwrd(address,'Street','St.')
- Address tranwrd(address,'Avenue','Ave.')
- Address tranwrd(address,'Road','Rd.')
- datalines
- 89 Lazy Brook Road
- 123 River Rd.
- 12 Main Street
Listing of Data Set CONVERT Obs Address 1
89 Lazy Brook Rd. 2 123 River Rd. 3
12 Main St.
32Spelling distance
- data compare
- length string1 string2 15
- input string1 string2
- points spedis(string1,string2)
- datalines
- same same
- same sam
- first xirst
- last lasx
- receipt reciept
Listing of Data Set COMPARE string1 string2
points same same 0 same
sam 8 first xirst 40 last
lasx 25 receipt reciept 7
33The "ANY" Functions
- data find_alpha_digit
- input string 20.
- first_alpha anyalpha(string)
- first_digit anydigit(string)
- datalines
- no digits here
- the 3 and 4
- 123 456 789
Listing of Data Set FIND_ALPHA_DIGIT
first_ first_ string alpha
digit no digits here 1 0 the 3
and 4 1 5 123 456 789 0
1
34The "NOT" FunctionsBeware of Trailing Blanks
- length string 10
- string '123'
- position notdigit(string)
- pos_trim notdigit(trim(string))
- position 4 (position of first blank)
- pos_trim 0
35The "NOT" Functions
- data data_cleaning
- input string 20.
- not_alpha notalpha(trim(string))
- not_digit notdigit(trim(string))
- datalines
- abcdefg
- 1234567
- abc123
- 1234abcd
Listing of Data Set DATA_CLEANING
not_ not_ string alpha digit abcdefg
0 1 1234567 1 0 abc123
4 1 1234abcd 1 5
36Concatenation Functions
- data join_up
- string1 'ABC '
- string2 ' XYZ '
- string3 '12345'
- cats cats(string1,string2)
- catx catx('',string1,string2,string3)
- run
cats 'ABCXYZ' catx 'ABCXYZ12345
37Some LENGTH Functions
- data how_long
- one 'ABC '
- two ' ' / character missing value /
- three ' XYZ'
- length_one length(one)
- lengthn_one lengthn(one)
- lengthc_one lengthc(one)
- length_two length(two)
- lengthn_two lengthn(two)
- lengthc_two lengthc(two)
- length_three length(three)
- lengthn_three lengthn(three)
- lengthc_three lengthc(three)
- run
3 3 6 1 0 1 6 6 6
38The COMPARE Function
- COMPARE(string1, string2 lt,'modifiers'gt)
- I ignore case
- L remove leading blanks
- truncate the longer string to the length of the
shorter string. The default is to pad the shorter
string with blanks before a comparison. - (Note similar to the comparison operator)
If string1 and string2 are the same, COMPARE
returns a value of 0. If the arguments differ,
the sign of the result is negative if string1
precedes string2 in a sort sequence, and positive
if string1 follows string2 in a sort sequence
The magnitude of the result is equal to the
position of the leftmost character at which the
strings differ.
39The COMPARE Function
- data compare
- input code _at__at_
- value 'V30.450'
- c1 compare(code,value)
- c2 compare(code,value,'')
- c3 compare(trim(code),value,'i')
- datalines
- V30 V30.450 v30.4
Listing of Data Set COMPARE code value
c1 c2 c3 V30 V30.450 -4 -4
0 V30.450 V30.450 0 0 0 v30.4
V30.450 1 1 0
40The STRIP Function
- data _null_
- file print
- one ' ABC '
- two ' XYZ '
- one_two '' one two ''
- strip '' strip(one) strip(two)
'' - concat cats('',one,two,'')
- put one_two /
- strip /
- concat
- run
one_two ABC XYZ stripABCXYZ concat
ABCXYZ
41COUNT and COUNTC Functions
- data Dracula / Get it Count Dracula /
- input string 20.
- count_abc count(string,'abc')
- countc_abc countc(string,'abc')
- count_abc_i count(string,'abc','i')
- datalines
- xxabcxABCxxbbbb
- cbacba
Listing of Data Set DRACULA
count_ countc_ count_ string
abc abc abc_i xxabcxABCxxbbbb 1
7 2 cbacba 0
6 0
42Contact Information
- Author Ron Cody
- You may download copies of the Powerpoint
presentation from - www2.umdnj.edu/codyweb/biocomputing