Title: Introduction to S-PLUS (Win2000/NT)
1 Introduction to S-PLUS (Win2000/NT)
2 Preliminaries
- History of S-Plus
- Open Source, Public Domain version of the S language
- Example of using R in a teaching context
- Useful resources
3 The History of S-Plus
- The S language is developed by AT&T Bell Labs (1980s)
- The commercial value-added version is marketed by StatSci Inc.
- MathSoft acquires StatSci in 1992 (markets Mathcad software)
- S-Plus 4.0 released in 1997
- S-Plus 4.5 released in 1998 (Win95/NT)
- S-Plus 2000 released in 2000 (marketed by Insightful Corp, http://www.insightful.com)
- S-Plus 6.0 released in fall 2001
4 Open Source Version of the S language
- R is an open source version of the S language (http://www.cran.r-project.org)
- Implemented for Linux, Windows, Macintosh
- No GUI, only command interface
- Has many libraries available covering almost all areas of modern statistics
5 An Example of Using R through a Web Application Server
- http://www.unt.edu/benchmarks/october01/rss.htm
- Programs can be submitted through a web browser interface
- Programs can be modified and resubmitted to get new results
- Useful as a teaching aid
6 Useful Resources
- http://www.insightful.com/
- The Basics of S and S-PLUS (second edition), Andreas Krause and Melvin Olson, Springer-Verlag, New York (2000)
- An Introduction to S-PLUS and the Hmisc and Design Libraries, C.F. Alzola and F.E. Harrell, freely available document
- Regression Modeling Strategies with Applications to Linear Models, Logistic Regression, and Survival Analysis, F.E. Harrell, Springer (2001)
7 Useful Resources (cont.)
- Modern Applied Biostatistical Methods Using S-PLUS, S. Selvin, Oxford University Press (1998), ISBN 0-19-512025-6
- Introduction to Robust Estimation and Hypothesis Testing, Rand Wilcox, Academic Press (1997), ISBN 0-12-751545-3
- Web links (S-news listserv and software archive)
- http://lib.stat.cmu.edu/s-news/
- http://lib.stat.cmu.edu/S/
8 Features of S-Plus
- S-Plus is an object-oriented programming language
- Version 2000 has a GUI which is extensible
- S-Plus has over 3,000 functions which allow total control of data and graphics
- S-Plus 2000 accommodates data analysts from novice to expert
- There is a very active S-Plus user group (S-news)
9 Getting Help in S-Plus
- Help for general tasks, S-Plus functions, etc.
- Can use the Help index from the GUI
- Alternatively, can use the help function in the Command Window
- Example: in the Command Window, at the > prompt, type
- > help(help)    # help on help
- > help(scan)    # help on the scan() function
- General help: Help menu from the GUI
- Function index, keyword search, on-line manuals (user manual, programmer manual, statistics manual, visual demo)
10 How to stop computations or quit an S-Plus session
- To cancel an ongoing computation, press ESC
- Cancels long outputs
- Aborts long computations
- To quit from S-Plus
- From the GUI use File - Exit
- From the Command Window use
- > q()
11 Getting Data into S-Plus
- A dialog box starts up by default
- Select existing data or import data from a file
- For existing data, enter the name of an S-Plus object in the Name box
12 Data Import Options
- Select the Options tab for import options
- By default, ASCII files are assumed to have
- column labels on row 1 (blank rows ignored)
- data from row 2 on
- columns separated by whitespace or comma
- Can change defaults with import options
- First data line, separators, column specifications
- Note: .dat files are assumed to be GAUSS files
13 Data Objects
- When data are imported, the data are displayed in a data window and a new object is created
- The name of the data object is chosen from the file name, or can be specified in the import dialog
14 Data Window
- The data appear in a spreadsheet
- This includes row names and column names
- Can click on a column name to select a column
- Use shift-click to select a block of columns
- Use ctrl-click to select non-adjacent columns
15 Data Window (cont.)
- The order of selection is important (selected columns will be used in graphs and analyses)
16 Object Browser
- S-Plus is object oriented
- All data, functions, results from analyses, plots, etc., are objects
- The object browser organizes all the objects you create
- Left half: object type. Right half: list of objects
17 Command Window
- Every menu and toolbar action generates S-Plus command language
- S-Plus is a complete programming language
- This language has loops, functions, expressions, and is object oriented
- Most of the functionality and flexibility of S-Plus is accessed through the language
18 Creating Graphs
- Open the Graph - 2D Plot or Graph - 3D Plot menu option
- Select column(s) for plotting from the Data Window
- ctrl-click the column for the x-axis
- ctrl-click the column for the y-axis
19 Creating Graphs (cont.)
- ctrl-click the column for the z-axis if needed
- click on the palette button for the desired plot
20 Creating Trellis Graphics
- Any plot can be conditioned by the value of any other variable in the data
- First create a plot, then select the conditioning column in the data window
- Drag the column (grabbing a cell) to the title area of the plot
21 Creating Trellis Graphics (cont.)
- The cursor will change when the column is ready to drop
- The plot is redrawn with panels for subranges of the conditioning variable
22 The S-Plus Language
- Expressions are entered at the prompt >
- S-Plus prints out the result
- > 3+3
- [1] 6
- > sin(pi)
- [1] 1.224606e-016
- > sqrt(100)
- [1] 10
23 Additional prompt
- An incomplete expression leads to a second prompt, +
- Can continue the expression at the second prompt
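As a small illustration of an expression continued over several lines, here is a minimal sketch written for R (assumed here as the open source S implementation mentioned earlier; the variable name total is hypothetical). When typed interactively, each continuation line would be entered at the second prompt.

```r
# The call to sum() is incomplete at the end of the first and second
# lines, so the interpreter keeps reading until the closing parenthesis.
total <- sum(1,
             2,
             3)
print(total)
```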
24 Getting stuck at prompt
- If the + prompt continues after hitting return, then enter many ) to get the > prompt
- Then start your expression again
- > sqrt(
- +
- + )))))
- Error in parse(text = txt): Syntax error: No opening parenthesis, before ")" at this point:
- sqrt(
- ))
- Dumped
- > sqrt(100)
25 Scalars and Assignments
- Read this as "weight gets 190"
- This assigns the value 190 to the scalar named weight
- The assignment operator is the sequence of characters <-
- > weight <- 190
- > weight
- [1] 190
26 Character Assignments
- Character values are inserted in quotes
- If the quotes are omitted, S-Plus will look for a data object called Jim to assign to person
- The result is not printed until you enter the object name
- > person <- "Jim"
- > person
- [1] "Jim"
27 Vectors
- > rnorm(10)
- [1] 0.3037020
- [2] -0.5248669
- [3] 1.4674553
- [4] 0.4536315
- [5] 0.4077797
- [6] 0.5362221
- [7] 0.0759569
- [8] 0.3239556
- [9] -1.3531665
- [10] -2.4226150
- The function rnorm() returns a vector of random deviates from the normal distribution
- The [n] on the left shows the index of the element that starts the row
28 Vectors (cont.)
- A single number is a vector of length 1
- We can make vectors using the concatenation function c()
- We assign the integers 1, 2, 3 to the vector x
- > mean(rnorm(10))
- [1] 0.5807564
- > x <- c(1,2,3)
- > x
- [1] 1 2 3
29 Vectors (cont.)
- We can create a vector of names
- We can create a vector of sequential integers using the operator a:b, where a is the starting integer and b is the ending integer
- > people <- c("Jim", "Sue", "Dave")
- > people
- [1] "Jim" "Sue" "Dave"
- > 5:10
- [1] 5 6 7 8 9 10
30 Object Names
- Object names may contain
- Letters: abcDEF
- Numbers: 0123456789
- Dot: .
- Examples of valid names
- height
- weight
- x.var
- .yvar
- x.y.var
- x110
31 Object names (cont.)
- Object names cannot contain an underscore or a hyphen, begin with a number, or be a reserved symbol
- Examples of invalid object names
- _xvar
- y_var
- x-yvar
- 120xvar
- T
- F
- NA
32 Handling Objects
- We can list out all of the objects
- > objects()
- [1] ".Last.value" ".Random.seed" "Cars" "last.dump"
- [5] "last.warning" "people" "person" "weight"
- [9] "weights" "x"
- Objects remain until removed, even if one quits S-Plus
- > rm(x)
- > x
- Error: Object "x" not found
- Dumped
- Equivalently, you can use the object browser
33 Objects as variables
- Objects can be used in expressions
- > x <- 1:10
- > mean(x)
- [1] 5.5
- > y <- c(x,10)
- > length(y)
- [1] 11
- > 2*y
- [1] 2 4 6 8 10 12 14 16 18 20 20
34 Vector Arithmetic
- Scalar functions work on an elementwise basis
- Can perform scalar and vector arithmetic
- > x <- 1:5
- > x^2
- [1] 1 4 9 16 25
- > 2*x
- [1] 2 4 6 8 10
- > 2*x+sqrt(x)
- [1] 3.000000 5.414214
- [3] 7.732051 10.000000
- [5] 12.236068
35 Logical Vectors
- Expressions with relational operators return logical vectors
- T is True, F is False
- > x <- rnorm(5)
- > x
- [1] 1.2698616 -1.1080517
- [3] 0.5627334 0.2454234
- [5] 0.2919052
- > x < 0
- [1] F T F F F
36 Missing values
- A missing value is represented by NA
- Operations on NA return NA
- The function is.na() checks for missing values
- > x <- c(1, NA, 3)
- > x
- [1] 1 NA 3
- > x+1
- [1] 2 NA 4
- > sum(x)
- [1] NA
- > is.na(x)
- [1] F T F
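Because operations on NA return NA, summaries usually need the missing values counted or dropped first. The following is a minimal sketch written for R (assumed here; details may differ in S-PLUS 2000), and the names n.missing, total and avg are hypothetical.

```r
x <- c(1, NA, 3)

# Count the missing values: is.na() gives a logical vector, sum() counts TRUEs.
n.missing <- sum(is.na(x))

# Ask sum() to drop missing values before adding.
total <- sum(x, na.rm = TRUE)

# Or remove them explicitly with a logical index, then summarise.
avg <- mean(x[!is.na(x)])

print(c(n.missing, total, avg))
```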
37 Vector indexing
- Use brackets, [ ], to select elements of a vector
- Negative indices remove elements
- > x <- c(2,4,6,8,10)
- > x
- [1] 2 4 6 8 10
- > x[1]
- [1] 2
- > x[3:5]
- [1] 6 8 10
- > x[c(1,3,5)]
- [1] 2 6 10
- > x[-(1:3)]
- [1] 8 10
38 Logical Indices
- > x <- rnorm(5)
- > x
- [1] -2.4592950 0.9074605
- [3] 0.5088648 -1.1184415
- [5] 0.5137160
- > x[x < 0]
- [1] -2.459295 -1.118441
- > log.x <- log(x)
- Warning messages:
- NAs generated in: log(x)
- > x[is.na(log.x)]
- [1] -2.459295 -1.118441
- > log.x[!is.na(log.x)]
- [1] -0.09710521 -0.67557299
- [3] -0.66608473
- A logical index selects elements
- Symbols for logical operators
- <   Less than
- >   Greater than
- <=  Less than or equal to
- >=  Greater than or equal to
- ==  Equal to
- !   Negation operator
- !=  Not equal to
39 Replacement
- You can use [ ] on the left hand side of an assignment, <-
- > x <- sample(1:8)
- > x
- [1] 2 6 8 3 1 5 7 4
- > x[6] <- NA
- > x
- [1] 2 6 8 3 1 NA 7 4
- > x[is.na(x)] <- 0
- > x
- [1] 2 6 8 3 1 0 7 4
40 Functions
- Functions are called like this
- function.name(argument, argument, ...)
- Functions always return a value
- NULL represents no value
- Example
- seq(from=1, to=end, by=1, length=inferred, along=NULL)
- > seq(1,5)
- [1] 1 2 3 4 5
- > seq(10, 20, length=6)
- [1] 10 12 14 16 18 20
- > seq(to=100, by=15, length=7)
- [1] 10 25 40 55 70 85 100
- > seq(length=10)
- [1] 1 2 3 4 5 6 7 8
- [9] 9 10
41 Functions (cont.)
- Function arguments have
- a position (first, second, ...)
- a name
- default values for function options (sometimes)
- Examples
- rnorm(n, mean=0, sd=1)
- rep(x, times=inferred, length.out=inferred)
- > rep(1:3, 2)
- [1] 1 2 3 1 2 3
- > rep(1:3, length=8)
- [1] 1 2 3 1 2 3 1 2
- > rep(1:3, c(3,2,1))
- [1] 1 1 1 2 2 3
- > rep()
- Error in rep: Argument "x" is missing, with no default: rep()
- Dumped
- > rep(1:3, c(3,2,1), 5)
- [1] 1 2 3 1 2
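To make the difference between matching arguments by position and by name concrete, here is a minimal sketch written for R (assumed here; argument names such as length.out follow R's rep() and seq(), and the variable names are hypothetical).

```r
# Arguments matched by position: x first, times second.
by.position <- rep(1:3, 2)

# The same call with arguments matched by name, in a different order.
by.name <- rep(times = 2, x = 1:3)

# Both calls produce the same vector.
same <- identical(by.position, by.name)

# Mixing styles: from and to by position, length.out by name;
# the step size is left for seq() to work out.
evenly.spaced <- seq(10, 20, length.out = 6)

print(same)
print(evenly.spaced)
```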
42 Matrices
- matrix(data=NA, nrow=inferred, ncol=inferred, byrow=F, dimnames=NULL)
- The function matrix() reads data into a matrix
- The number of columns is specified by using the argument ncol
- And/or the number of rows can be specified by the argument nrow
- > x <- matrix(1:10, nrow=2)
- > x
-      [,1] [,2] [,3] [,4] [,5]
- [1,]    1    3    5    7    9
- [2,]    2    4    6    8   10
- > dim(x)
- [1] 2 5
- > x.matrix <- matrix(c(20,10,3,1,7,4), ncol=2)
- > x.matrix
-      [,1] [,2]
- [1,]   20    1
- [2,]   10    7
- [3,]    3    4
43 Matrices (cont.)
- We can attach names to columns with the dimnames option
- > x <- matrix(c(1:20), ncol=4, byrow=T, dimnames=list(NULL, c("col1", "col2", "col3", "col4")))
- > x
-      col1 col2 col3 col4
- [1,]    1    2    3    4
- [2,]    5    6    7    8
- [3,]    9   10   11   12
- [4,]   13   14   15   16
- [5,]   17   18   19   20
44 Matrices (cont.)
- Specifying byrow=T forces S-Plus to read the data in row by row
- When the argument is not specified, or specified as byrow=F, S-Plus assumes the data are written in column by column
- > x.matrix <- matrix(c(20,10,3,1,7,4), ncol=2, byrow=T)
- > x.matrix
-      [,1] [,2]
- [1,]   20   10
- [2,]    3    1
- [3,]    7    4
45 Matrices (cont.) - Indexing
- > x <- matrix(1:15, nrow=3, byrow=T)
- > x
-      [,1] [,2] [,3] [,4] [,5]
- [1,]    1    2    3    4    5
- [2,]    6    7    8    9   10
- [3,]   11   12   13   14   15
- > x[2,3]
- [1] 8
- > x[2:3, 3:5]
-      [,1] [,2] [,3]
- [1,]    8    9   10
- [2,]   13   14   15
- > x[,1]
- [1] 1 6 11
- > x[1,]
- [1] 1 2 3 4 5
- To extract a value from a matrix, use two elements in the subscript
- The first element applies to rows
- The second element applies to columns
- If one dimension is not specified, all elements for that dimension are extracted
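The row and column subscripts can also be logical or negative, just as for vectors. A minimal sketch written for R (assumed here; the names keep, big.rows and no.col2 are hypothetical):

```r
x <- matrix(1:15, nrow = 3, byrow = TRUE)

# Logical vector with one entry per row: is the first column greater than 1?
keep <- x[, 1] > 1

# Use it as the row subscript; the empty column subscript keeps all columns.
big.rows <- x[keep, ]

# A negative column subscript drops that column.
no.col2 <- x[, -2]

print(big.rows)
print(no.col2)
```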
46 Constructing Matrices from vectors
- Matrices can be constructed from row vectors and column vectors using cbind and rbind
- Binding together vectors of different attributes (character and numeric, for example) cannot keep both types; the vectors will be coerced to a common attribute
- Numeric and character vectors will be coerced to character
- > x <- c(3,4,5)
- > y <- c(6,7,8)
- > x.y <- cbind(x,y)
- > x.y
-      x y
- [1,] 3 6
- [2,] 4 7
- [3,] 5 8
- > x <- c(3,4,5)
- > y <- c(6,7,8)
- > x.y <- rbind(x,y)
- > x.y
-   [,1] [,2] [,3]
- x    3    4    5
- y    6    7    8
47 Constructing Matrices from vectors (cont.)
- Example: binding together a character vector and a numeric vector coerces to a character matrix
- What is needed is a data object called a data frame (similar to SPSS or SAS datasets)
- > x <- c(3,4,5)
- > y <- c("Three","Four","Five")
- > x.y <- rbind(x,y)
- > x.y
-   [,1]    [,2]   [,3]
- x "3"     "4"    "5"
- y "Three" "Four" "Five"
48 Converting Matrices to Data Frames
- Data frames are data objects that allow one to bind data vectors of different types together, such that the data can be accessed like a matrix
- Most of the dialog boxes in the GUI operate on data frames
- > x <- c(3,4,5)
- > y <- c("Three","Four","Five")
- > x.y <- data.frame(x, y)
- > x.y
-   x     y
- 1 3 Three
- 2 4  Four
- 3 5  Five
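To show what "accessed like a matrix" can look like in practice, here is a minimal sketch written for R (assumed here; the names second.x and col.x are hypothetical). The numeric column keeps its type even though it sits next to a character column.

```r
x <- c(3, 4, 5)
y <- c("Three", "Four", "Five")
x.y <- data.frame(x, y)

# Matrix-style subscripts: row 2, column 1.
second.x <- x.y[2, 1]

# A whole column, selected by name.
col.x <- x.y[, "x"]

# Unlike the character matrix produced by rbind(), x is still numeric.
print(is.numeric(col.x))
print(dim(x.y))
```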
49 Arrays
- > x <- array(1:24, c(3, 4, 2))
- > x
- , , 1
-      [,1] [,2] [,3] [,4]
- [1,]    1    4    7   10
- [2,]    2    5    8   11
- [3,]    3    6    9   12
- , , 2
-      [,1] [,2] [,3] [,4]
- [1,]   13   16   19   22
- [2,]   14   17   20   23
- [3,]   15   18   21   24
- > x[,2,]
-      [,1] [,2]
- [1,]    4   16
- [2,]    5   17
- [3,]    6   18
- An array is a data construct that can be thought of as a multi-dimensional matrix (up to eight dimensions)
- An array is defined as
- array(data, dim)
- If we fix the second index at 2 we get a 3x2 matrix
50 The apply( ) function
- > x <- matrix(c(1:10), ncol=2, byrow=T)
- > x
-      [,1] [,2]
- [1,]    1    2
- [2,]    3    4
- [3,]    5    6
- [4,]    7    8
- [5,]    9   10
- > x.log <- apply(x, 2, log)
- > x.log
-          [,1]      [,2]
- [1,] 0.000000 0.6931472
- [2,] 1.098612 1.3862944
- [3,] 1.609438 1.7917595
- [4,] 1.945910 2.0794415
- [5,] 2.197225 2.3025851
- The apply function successively applies a function of your choice to each row, each column, or each level of a higher dimension of a matrix or array
- apply(data, dim, function)
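The second argument of apply() picks the dimension to work over: 1 for rows, 2 for columns. A minimal sketch written for R (assumed here; the names row.sums and col.means are hypothetical):

```r
x <- matrix(1:10, ncol = 2, byrow = TRUE)

# dim = 1: apply the function to each row.
row.sums <- apply(x, 1, sum)

# dim = 2: apply the function to each column.
col.means <- apply(x, 2, mean)

print(row.sums)
print(col.means)
```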
51 Lists
- > xlist <- list(dat=1:5, name="John", y=c(32, 45))
- > xlist
- $dat:
- [1] 1 2 3 4 5
- $name:
- [1] "John"
- $y:
- [1] 32 45
- > xlist$dat
- [1] 1 2 3 4 5
- > xlist$name
- [1] "John"
- > xlist$y
- [1] 32 45
- > xlist[[3]]
- [1] 32 45
- A list is an ordered collection of arbitrary objects
- A list can be indexed like a matrix or array
- Index an element of a list by using
- listname$elementname, listname[[index]], or listname$elementname[index]
52 lapply( ) function
- > L <- list(vec=1:10, mat=matrix(99:88, 3, 4))
- > L
- $vec:
- [1] 1 2 3 4 5 6 7 8 9 10
- $mat:
-      [,1] [,2] [,3] [,4]
- [1,]   99   96   93   90
- [2,]   98   95   92   89
- [3,]   97   94   91   88
- > lapply(L, mean)
- $vec:
- [1] 5.5
- $mat:
- [1] 93.5
- The tool lapply() is designed for working on all elements of a list using the same function
- Example: calculate the mean of every list element
53 lapply( ) function (cont.)
- Can use lapply to apply an arbitrary function to the corresponding elements of two lists
- Example: take the mean of the elements in the first position of one list, and add the mean of the elements in the first position of a second list
- > list1 <- list(c(2,4,6,8), c(10,12,14,16))
- > list2 <- list(c(18,20,22,24), c(26,28,30,32))
- > lapply(1:2, function(i, x, y) mean(x[[i]]) + mean(y[[i]]), x = list1, y = list2)
- [[1]]:
- [1] 26
- [[2]]:
- [1] 42
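In R specifically, the same pairing of two lists can be written without an explicit index by using mapply(). This is an R function and is offered here only as an assumed alternative; it may not be available in S-PLUS 2000. The name sums is hypothetical.

```r
list1 <- list(c(2, 4, 6, 8), c(10, 12, 14, 16))
list2 <- list(c(18, 20, 22, 24), c(26, 28, 30, 32))

# mapply() walks along both lists together, passing the i-th element
# of each to the function, and simplifies the results to a vector.
sums <- mapply(function(a, b) mean(a) + mean(b), list1, list2)

print(sums)
```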
54 unlist() function
- Simplifies the recursive structure of a list
- Usage
- unlist(data, recursive=T, use.names=T)
- Example: create a vector from a list with three elements
- > x.list <- list(c(2,4,6,8), c(10,12,14,16), c(18,20,22,24))
- > x.list
- [[1]]:
- [1] 2 4 6 8
- [[2]]:
- [1] 10 12 14 16
- [[3]]:
- [1] 18 20 22 24
- > unlist(x.list)
- [1] 2 4 6 8 10 12 14 16 18 20 22 24
55 Creating Functions
- Functions allow modular programming
- Functions can call other functions
- Functions have calling parameters enclosed in parentheses, with the main body enclosed in braces
- > x.power <- function(x, power) {
-     x <- x^power
-     x
- }
- > x <- c(2,4,6,8)
- > x.power(x, 2)
- [1] 4 16 36 64
- > x.power(x, 3)
- [1] 8 64 216 512
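User-written functions can also give their arguments default values, as the built-in functions shown earlier do. A minimal sketch written for R (assumed here): it redefines x.power with a default of 2 for power, which is an illustrative choice and not part of the original slides.

```r
# power = 2 is used whenever the caller does not supply a value.
x.power <- function(x, power = 2) {
  x^power
}

x <- c(2, 4, 6, 8)
squares <- x.power(x)             # default: power = 2
cubes   <- x.power(x, power = 3)  # default overridden by name

print(squares)
print(cubes)
```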
56 Looping
- Looping can allow iteration over indices of a scalar, vector, matrix, or list
- Looping is slow in S-Plus; vector operations are encouraged
- > temp <- 0
- > for (i in 1:4) {
-     temp <- i+temp
- }
- > temp
- [1] 10
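The loop above accumulates a total one element at a time; the same result is available from a single vector operation. A minimal sketch written for R (assumed here; the names vec.total and running are hypothetical):

```r
# Loop version, as on the slide.
temp <- 0
for (i in 1:4) {
  temp <- i + temp
}

# Vectorised version: one call, no explicit loop.
vec.total <- sum(1:4)

# cumsum() gives every intermediate value the loop passed through.
running <- cumsum(1:4)

print(c(temp, vec.total))
print(running)
```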
57 ifelse() function
- Evaluates a boolean condition elementwise, for example a comparison of the elements of two objects
- Can return scalar or vector values where the condition is true, and a different set of values where it is false
- > x <- c(1,3,5)
- > y <- c(6,4,2)
- > z <- ifelse(x<y, 1, 0)
- > z
- [1] 1 1 0
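The values returned for the true and false cases can themselves be vectors, in which case ifelse() picks from them position by position. A minimal sketch written for R (assumed here; the names smaller and label are hypothetical):

```r
x <- c(1, 3, 5)
y <- c(6, 4, 2)

# Vector return values: take x where x < y, otherwise y
# (the elementwise minimum of the two vectors).
smaller <- ifelse(x < y, x, y)

# Scalar return values are recycled; they need not be numeric.
label <- ifelse(x < y, "below", "above")

print(smaller)
print(label)
```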
58 Matrix Algebra
- > x <- cbind(rep(1,5), 1:5)
- > x
-      [,1] [,2]
- [1,]    1    1
- [2,]    1    2
- [3,]    1    3
- [4,]    1    4
- [5,]    1    5
- > t(x)
-      [,1] [,2] [,3] [,4] [,5]
- [1,]    1    1    1    1    1
- [2,]    1    2    3    4    5
- Matrix multiply: %*%
- > t(x) %*% x
-      [,1] [,2]
- [1,]    5   15
- [2,]   15   55
59 Matrix Algebra (cont.)
- Matrix inverse: solve()
- > solve(t(x) %*% x)
-      [,1] [,2]
- [1,]  1.1 -0.3
- [2,] -0.3  0.1
- Matrix decompositions
- Eigenvector: eigen()
- Singular value: svd()
- Cholesky: chol()
- > eigen(solve(t(x) %*% x))
- $values:
- [1] 1.18309519 0.01690481
- $vectors:
-            [,1]      [,2]
- [1,]  0.9637149 0.2669336
- [2,] -0.2669336 0.9637149
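The other two decompositions listed can be checked by multiplying their factors back together. A minimal sketch written for R (assumed here; in R, chol() returns an upper triangular factor and svd() returns components named u, d and v, which may differ in S-PLUS). The names xtx, R, s and the *.ok flags are hypothetical.

```r
x <- cbind(rep(1, 5), 1:5)
xtx <- t(x) %*% x

# Cholesky: upper triangular R with t(R) %*% R equal to xtx.
R <- chol(xtx)
chol.ok <- isTRUE(all.equal(t(R) %*% R, xtx))

# Singular value decomposition: x = u %*% diag(d) %*% t(v).
s <- svd(x)
svd.ok <- isTRUE(all.equal(s$u %*% diag(s$d) %*% t(s$v), x))

# The squared singular values of x are the eigenvalues of t(x) %*% x.
eig.ok <- isTRUE(all.equal(s$d^2, eigen(xtx)$values))

print(c(chol.ok, svd.ok, eig.ok))
```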
60 Exercises (a)
- Create the following matrix called marks and put in the appropriate label names
- > marks
-      Test1 Test2 Test3 Final
- [1,]    20    23    18    48
- [2,]    16    15    18    40
- [3,]    25    20    22    40
- [4,]    14    19    18    42
61 Solutions (a)
- > marks <- matrix(c(20,23,18,48,16,15,18,40,25,20,22,40,14,19,18,42), byrow=T, nrow=4, dimnames=list(NULL, c("Test1","Test2","Test3","Final")))
62 Exercises (b)
- Add the following row to the bottom of the matrix
- 10 15 14 30
63 Solutions (b)
- > marks <- rbind(marks, c(10,15,14,30))
- > marks
-      Test1 Test2 Test3 Final
- [1,]    20    23    18    48
- [2,]    16    15    18    40
- [3,]    25    20    22    40
- [4,]    14    19    18    42
- [5,]    10    15    14    30
64 Exercises (c)
- Change the fifth mark for test 2 from a 15 to a 17
65 Solutions (c)
- > marks[5,2] <- 17
- > marks
-      Test1 Test2 Test3 Final
- [1,]    20    23    18    48
- [2,]    16    15    18    40
- [3,]    25    20    22    40
- [4,]    14    19    18    42
- [5,]    10    17    14    30
66 Exercises (d)
- Print all the marks for test 3
67 Solutions (d)
- > marks[, 3]
- [1] 18 18 22 18 14
68 Exercises (e)
- Print the final marks for those people with marks greater than 16 on test 1
69 Solutions (e)
- > marks[marks[,1]>16, 4]
- [1] 48 40
70 Exercises (f)
- Print the marks matrix without the column for test 3
71 Solutions (f)
- > marks[,-3]
-      Test1 Test2 Final
- [1,]    20    23    48
- [2,]    16    15    40
- [3,]    25    20    40
- [4,]    14    19    42
- [5,]    10    17    30
72 Exercises (g)
- Print the number of rows in the matrix
73 Solutions (g)
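One possible answer to exercise (g), as a minimal sketch written for R (assumed here). The marks matrix is rebuilt in one step so the example is self-contained, using the values it has after solutions (a) to (c); the name n.rows is hypothetical.

```r
marks <- matrix(c(20, 23, 18, 48,
                  16, 15, 18, 40,
                  25, 20, 22, 40,
                  14, 19, 18, 42,
                  10, 17, 14, 30),
                byrow = TRUE, nrow = 5,
                dimnames = list(NULL, c("Test1", "Test2", "Test3", "Final")))

# nrow() returns the number of rows directly.
n.rows <- nrow(marks)

# dim() returns c(rows, columns), so its first element is the same number.
print(n.rows)
print(dim(marks)[1])
```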
74 Additional Exercises
- Create a vector INTS containing the integers from 1 to 50
- Create a vector X which is 2 to the power of INTS
- Create a vector Y which is INTS raised to the second power
- Create a T/F vector which contains a T when elements of X and Y are equal and an F when they are not equal
75 Solutions to additional exercises
- > INTS <- 1:50
- > X <- 2^INTS
- > Y <- INTS^2
- > X
- [1] 2.000000e+000 4.000000e+000 8.000000e+000 1.600000e+001 3.200000e+001 6.400000e+001 1.280000e+002
- [8] 2.560000e+002 5.120000e+002 1.024000e+003 2.048000e+003 4.096000e+003 8.192000e+003 1.638400e+004
- [15] 3.276800e+004 6.553600e+004 1.310720e+005 2.621440e+005 5.242880e+005 1.048576e+006 2.097152e+006
- [22] 4.194304e+006 8.388608e+006 1.677722e+007 3.355443e+007 6.710886e+007 1.342177e+008 2.684355e+008
- [29] 5.368709e+008 1.073742e+009 2.147484e+009 4.294967e+009 8.589935e+009 1.717987e+010 3.435974e+010
- [36] 6.871948e+010 1.374390e+011 2.748779e+011 5.497558e+011 1.099512e+012 2.199023e+012 4.398047e+012
- [43] 8.796093e+012 1.759219e+013 3.518437e+013 7.036874e+013 1.407375e+014 2.814750e+014 5.629500e+014
- [50] 1.125900e+015
- > Y
- [1] 1 4 9 16 25 36 49 64 81 100 121 144 169 196 225 256 289 324 361 400 441
- [22] 484 529 576 625 676 729 784 841 900 961 1024 1089 1156 1225 1296 1369 1444 1521 1600 1681 1764
- [43] 1849 1936 2025 2116 2209 2304 2401 2500
- > equal <- (X==Y)
- > equal
- [1] F T F T F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F