Introduction to R - Lecture 5: More loops

Provided by: andrewj

1
Introduction to R - Lecture 5: More loops
  • Andrew Jaffe
  • 10/4/2010

2
Overview
  • Review For Loop
  • Lists
  • Aside: Patterns
  • Application

3
Review For Loop
  • The syntax is: for(var in seq) { code }
  • The seq determines what values var will take in
    the loop
  • The loop is performed length(seq) times
  • On the nth iteration of the loop, var takes the
    value seq[n]
  • var is a completely new variable and not directly
    related to any other variable
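
The points above can be checked with a minimal sketch. The vector `seq_vals`, the counter `n`, and the storage vector `seen` are made-up names for illustration, not part of the lecture.

```r
# Hypothetical sequence to loop over
seq_vals <- c(10, 20, 30)
seen <- rep(0, length(seq_vals))  # preallocated storage
n <- 0                            # iteration counter

for (var in seq_vals) {
  n <- n + 1
  # On the n-th iteration, var equals seq_vals[n]
  seen[n] <- var
}

n     # number of iterations, equal to length(seq_vals)
seen  # the values var took, in order
```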

4
Review For Loop
  • Setting up your loop requires determining the
    correct seq to loop over, which is usually easy
  • The real challenge of looping is relating the
    values of seq to the dimensions/ indices of your
    data

5
Review For Loop
  • From last lecture: we were relating seq to the
    columns of the data
  • var is indirectly related to the data, as it
    links/relates to the column indices, but it
    only takes the values 1-12

Index = 4:15
mean_wt <- rep(0, length(Index))
for(i in 1:length(Index)) {
  ind = Index[i] # column index
  mean_wt[i] = mean(dog_dat[,ind])
}
6
Overview
  • Review For Loop
  • Lists
  • Aside: Patterns
  • Application

7
Lists
  • "An R list is an object consisting of an ordered
    collection of objects known as its components."
  • "Components are always numbered and may always be
    referred to as such": double brackets, [[ ]], can
    subset lists

CRAN. Intro to R
8
Lists
> L = list() # empty list
> L[[1]] = 1:4
> L[[2]] = 2:7
> L[[3]] = c("a","b","c")
> L[[4]] = matrix(rnorm(4), nrow = 2)
> L
[[1]]
[1] 1 2 3 4

[[2]]
[1] 2 3 4 5 6 7

[[3]]
[1] "a" "b" "c"

[[4]]
            [,1]       [,2]
[1,] -1.43944849 -0.4801696
[2,]  0.09923108  1.0783053
9
Lists
> names(L) = c("seq1","seq2","letters","mat")
> L
$seq1
[1] 1 2 3 4

$seq2
[1] 2 3 4 5 6 7

$letters
[1] "a" "b" "c"

$mat
          [,1]      [,2]
[1,]  1.824487 0.3431034
[2,] -0.533006 0.9406285
10
Lists
> L[[1]]
[1] 1 2 3 4
> str(L)
List of 4
 $ seq1   : int [1:4] 1 2 3 4
 $ seq2   : int [1:6] 2 3 4 5 6 7
 $ letters: chr [1:3] "a" "b" "c"
 $ mat    : num [1:2, 1:2] 1.824 -0.533 0.343 0.941
11
Lists
  • Why know lists?
  • Can store data of different lengths and types
  • Some functions return lists
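
As a small illustration of both points, the sketch below builds a list whose components differ in type and length, then calls `strsplit()`, a base R function that returns a list. The object names `rec` and `parts` and their contents are made up for this example.

```r
# A list can hold components of different types and lengths
rec <- list(id = 7, visits = 1:3, note = "ok")
sapply(rec, length)   # component lengths differ
sapply(rec, class)    # component types differ

# strsplit() is one base R function that returns a list
parts <- strsplit(c("wt_1", "len_12"), "_")
is.list(parts)
parts[[2]]            # double brackets pull out one component
```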

12
Lists
  • Load back in the lecture 4 data
  • We still have one problem to solve - the averages
    of weight, length, and food for each dog type at
    each visit

13
Lists
  • First we can create a list containing each group
    we care about

Indexes = list()
Indexes[[1]] = 4:15  # weight
Indexes[[2]] = 16:27 # length
Indexes[[3]] = 28:39 # food
names(Indexes) = c("weight", "length", "food")
14
Lists
> Indexes
$weight
 [1]  4  5  6  7  8  9 10 11 12 13 14 15

$length
 [1] 16 17 18 19 20 21 22 23 24 25 26 27

$food
 [1] 28 29 30 31 32 33 34 35 36 37 38 39
15
Lists
  • Next, we can create an output list for our
    results, and recreate the unique dog list for our
    loop

out <- list()
dogs = unique(dog_dat$dog_type)
16
Lists
  • We want to loop over the different covariates
    (wt, len, food) and within each, the different
    dog types
  • For looping over the groups, either works

> seq(along = Indexes)
[1] 1 2 3
> 1:length(Indexes)
[1] 1 2 3
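
Both forms give the same sequence here because `Indexes` is non-empty. As a side note that is not from the slides, they differ when the list is empty: `1:length(x)` counts down from 1 to 0, while `seq(along = x)` yields a zero-length sequence, so the loop body is skipped. The sketch below uses a made-up empty list named `empty`.

```r
empty <- list()          # hypothetical empty list

1:length(empty)          # counts down from 1 to 0, so a loop would run twice
seq(along = empty)       # zero-length sequence, so a loop would not run

runs <- 0
for (i in seq(along = empty)) runs <- runs + 1
runs
```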
17
Lists
for(i in seq(along = Indexes)) { # 1:3
  # take the i'th index from the list
  Index = Indexes[[i]]
  # for that variable, create a new matrix
  tmp = matrix(nrow = length(dogs), ncol = length(Index))
  ...
18
Lists
  • We can then fill in that temporary matrix with an
    inner 'for' loop
  • Note that this is the exact same loop as last
    week (note the j's)

# Index from the outer loop
for(j in 1:length(dogs)) {
  hold = dog_dat[dog_dat$dog_type == dogs[j],Index]
  tmp[j,] = colMeans(hold)
}
rownames(tmp) = dogs
colnames(tmp) = paste("month",1:12,sep="_")
19
Lists
  • Lastly, we save that tmp matrix in our output
    list

out[[i]] = tmp
20
for(i in seq(along = Indexes)) { # groups
  Index = Indexes[[i]]
  tmp = matrix(nrow = length(dogs), ncol = length(Index))
  for(j in 1:length(dogs)) { # dogs
    hold = dog_dat[dog_dat$dog_type == dogs[j],Index]
    tmp[j,] = colMeans(hold)
  }
  rownames(tmp) = dogs
  colnames(tmp) = paste("month",1:12,sep="_")
  out[[i]] = tmp
}
names(out) <- c("weight","length","food")
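
To try the nested loop without the lecture 4 file, the sketch below runs it on simulated data. Everything about the stand-in `dog_dat` is an assumption made for illustration: the column names, the two dog types, the eight rows, and the random values. Only the column layout (weight in 4:15, length in 16:27, food in 28:39) follows the slides.

```r
# Hypothetical stand-in for dog_dat: 3 leading columns, then 12 monthly
# columns each for weight (4:15), length (16:27) and food (28:39)
set.seed(5)
n <- 8
dog_dat <- data.frame(id = 1:n,
                      dog_type = rep(c("lab", "poodle"), each = n / 2),
                      age = rep(3, n),
                      stringsAsFactors = FALSE)
meas <- matrix(rnorm(n * 36, mean = 30, sd = 2), nrow = n)
colnames(meas) <- c(paste("wt", 1:12, sep = "_"),
                    paste("len", 1:12, sep = "_"),
                    paste("food", 1:12, sep = "_"))
dog_dat <- cbind(dog_dat, meas)

Indexes <- list(weight = 4:15, length = 16:27, food = 28:39)
dogs <- unique(dog_dat$dog_type)
out <- list()

for (i in seq(along = Indexes)) {            # groups
  Index <- Indexes[[i]]
  tmp <- matrix(nrow = length(dogs), ncol = length(Index))
  for (j in 1:length(dogs)) {                # dog types
    hold <- dog_dat[dog_dat$dog_type == dogs[j], Index]
    tmp[j, ] <- colMeans(hold)
  }
  rownames(tmp) <- dogs
  colnames(tmp) <- paste("month", 1:12, sep = "_")
  out[[i]] <- tmp
}
names(out) <- names(Indexes)

str(out)
```

Each component of `out` is a matrix with one row per dog type and one column per month, matching the shape of the lecture's output.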
21
> out
$weight
           month_1  month_2  month_3  month_4  month_5  month_6  month_7
lab       49.81840 48.69200 49.03360 50.26560 50.17600 49.67280 48.41600
poodle    49.40090 48.27297 48.61892 49.84414 49.76126 49.25856 47.99820
husky     49.26372 48.13097 48.48142 49.70088 49.61858 49.11327 47.86195
retriever 50.19474 49.06466 49.40602 50.62632 50.54361 50.04135 48.79248
           month_8  month_9 month_10 month_11 month_12
lab       46.54640 44.68640 45.15040 44.30640 45.88240
poodle    46.12613 44.26577 44.73243 43.89009 45.46306
husky     45.98761 44.12832 44.59469 43.75221 45.31858
retriever 46.91278 45.05263 45.51654 44.68496 46.24586

$length
           month_1  month_2  month_3  month_4  month_5  month_6  month_7
lab       19.91840 20.16800 20.28720 20.49600 20.57840 20.86400 20.96800
poodle    20.63964 20.88198 21.00090 21.20991 21.29189 21.58108 21.68198
husky     20.29115 20.54159 20.65575 20.86195 20.94867 21.23805 21.34071
retriever 20.47068 20.71955 20.83233 21.04135 21.12556 21.41729 21.51880
           month_8  month_9 month_10 month_11 month_12
lab       21.10400 21.20880 21.40720 21.57440 21.87440
poodle    21.82072 21.92432 22.12342 22.29009 22.58919
husky     21.47699 21.58142 21.77876 21.94779 22.24779
retriever 21.64962 21.75414 21.95263 22.12406 22.42406

$food
           month_1  month_2  month_3  month_4  month_5  month_6  month_7
lab       30.04000 29.77200 28.77680 28.20880 29.52240 30.24960 30.90160
poodle    30.03063 29.76306 28.77117 28.20631 29.51892 30.23874 30.89910
husky     30.12301 29.85221 28.85841 28.29646 29.60973 30.33363 30.98584
retriever 29.89248 29.62556 28.63008 28.06617 29.37744 30.10075 30.75564
           month_8  month_9 month_10 month_11 month_12
lab       29.20880 30.03200 29.89120 29.54240 30.89520
poodle    29.20631 30.02613 29.88739 29.53243 30.89550
husky     29.29646 30.11770 29.97345 29.62389 30.98053
retriever 29.06617 29.88722 29.74887 29.39248 30.75338
22
Overview
  • Review For Loop
  • Lists
  • Aside: Patterns
  • Application

23
Aside
  • This step is potentially dangerous:
  • Indexes[[1]] = 4:15 # weight
  • Indexes[[2]] = 16:27 # length
  • Indexes[[3]] = 28:39 # food
  • Is there a better way? YES! Each group shares a
    common term in the name
  • wt, len, food

24
Aside
  • grep(pattern, x) finds "pattern" in the elements
    of vector x and returns the indices that match

> grep("wt", names(dog_dat))
 [1]  4  5  6  7  8  9 10 11 12 13 14 15
> grep("len", names(dog_dat))
 [1] 16 17 18 19 20 21 22 23 24 25 26 27
> grep("food", names(dog_dat))
 [1] 28 29 30 31 32 33 34 35 36 37 38 39
25
Aside
> Indexes = list()
> Indexes[[1]] = grep("wt", names(dog_dat))
> Indexes[[2]] = grep("len", names(dog_dat))
> Indexes[[3]] = grep("food", names(dog_dat))
> Indexes
[[1]]
 [1]  4  5  6  7  8  9 10 11 12 13 14 15

[[2]]
 [1] 16 17 18 19 20 21 22 23 24 25 26 27

[[3]]
 [1] 28 29 30 31 32 33 34 35 36 37 38 39
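
One likely reason the hard-coded version is risky is that fixed positions silently pick the wrong columns if the column order ever changes, whereas name matching follows the columns. The sketch below shows the name-based approach on a made-up vector of column names, `nms`. The exact names (`wt_1`, `len_1`, and so on) are assumptions, since the slides only say that each group shares the terms wt, len, and food. The `patterns` vector and the loop over it are also illustrative.

```r
# Hypothetical column names laid out like dog_dat's
nms <- c("id", "dog_type", "age",
         paste("wt", 1:12, sep = "_"),
         paste("len", 1:12, sep = "_"),
         paste("food", 1:12, sep = "_"))

patterns <- c(weight = "wt", length = "len", food = "food")
Indexes <- list()
for (p in names(patterns)) {
  # grep() returns the positions of the names that contain the pattern
  Indexes[[p]] <- grep(patterns[p], nms)
}
Indexes
```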
26
Aside
  • grep can be a lot more powerful when combined
    with 'regular expressions', but we're not going
    to get into that

27
Aside
  • Opposite of paste: strsplit(x, split) splits
    term 'x' on the 'split' character or pattern
  • Returns a list

> x = paste("month",1:12,sep="_")
> head(strsplit(x,"_"),3)
[[1]]
[1] "month" "1"

[[2]]
[1] "month" "2"

[[3]]
[1] "month" "3"
28
Aside
  • If you want one element (in this case, the
    number), easiest to just use a 'for' loop
  • If you split each element separately, the output
    list only has 1 element, [[1]]
  • You then need to figure out which slot you want
    using the single bracket

29
Aside
x = paste("month",1:12,sep="_")
num = rep(0,length(x))
for(i in 1:length(x)) {
  num[i] = strsplit(x[i],"_")[[1]][2]
}
> i = 1
> strsplit(x[i],"_") # list
[[1]]
[1] "month" "1"

> strsplit(x[i],"_")[[1]] # vector
[1] "month" "1"
> strsplit(x[i],"_")[[1]][2] # element
[1] "1"
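
One detail worth noting: `strsplit()` returns character strings, so assigning its pieces into `num` turns that vector into character. The sketch below is a variant, not the slide's code. It wraps the extracted piece in `as.numeric()` so `num` stays numeric, and it shows one loop-free alternative using `sapply()`. The name `num2` is made up for the comparison.

```r
x <- paste("month", 1:12, sep = "_")

# Loop version from the slide, with as.numeric() added so num stays numeric
num <- rep(0, length(x))
for (i in 1:length(x)) {
  num[i] <- as.numeric(strsplit(x[i], "_")[[1]][2])
}

# Alternative: split everything at once, then take slot 2 of each component
num2 <- as.numeric(sapply(strsplit(x, "_"), function(parts) parts[2]))

num
identical(num, num2)
```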
30
Overview
  • Review For Loop
  • Lists
  • Aside: Patterns
  • Application

31
Applied Example
  • Load in "lec5_data.rda" from the course website
  • These are the people from "lec2_data.rda" that
    did not have a dog at baseline
  • Over monthly follow-up, some of these people
    borrowed dogs over the past month

32
Applied Example
  • dog_0: baseline dog ownership; all of these
    people should have "no"
  • dog_1 - dog_12: did you borrow a dog over the
    past month?

33
Applied Example
  • Determine person-time at risk for dog borrowing
  • Create a "survival" dataset from this data with
    columns: ID, start, end
  • Note that there is missing data

34
Applied Example
  • We want to convert each person's wide data into
    two numbers: start and end
  • Because of missing data, some people might have
    more than 1 row: people aren't at risk for dog
    borrowing if they did not report (i.e., are missing)

35
Applied Example
  • Take person 1

> dat[1,]
  id age sex height weight dog_0 dog_1 dog_2
1  1  40   F   63.5  134.5    no    no   yes
  dog_3 dog_4 dog_5 dog_6 dog_7 dog_8 dog_9
1   yes    no    no   yes   yes    no   yes
  dog_10 dog_11 dog_12
1   <NA>     no     no
36
Applied Example
  • Person 1 in the new dataset should be:
  • ID start end
  • 1 0 9
  • 1 11 12

37
Applied Example
  • Basic premise: write a for-loop that passes over
    each person and determines their non-missing
    follow-up time
  • Caveat: how many rows should our output matrix
    have?
  • Perfect opportunity for using rbind()

38
Applied Example
  • Create a matrix with 0 rows and 3 columns
  • Within the body of the loop, use rbind to
    append new rows (this is slow though)

> out = matrix(nr = 0, nc = 3)
> dim(out)
[1] 0 3
> p1 = c(1,0,9)
> out = rbind(out, p1)
> out
   [,1] [,2] [,3]
p1    1    0    9
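
Since the slide notes that growing a matrix with `rbind()` is slow, here is a sketch of one common alternative: collect the pieces in a list and bind them once at the end. This is an option to consider rather than the lecture's method, and the rows used here (ID, start 0, end 9 for three IDs) are made up.

```r
# Approach from the slide: start with 0 rows and grow with rbind()
out <- matrix(nrow = 0, ncol = 3)
for (id in 1:3) {
  out <- rbind(out, c(id, 0, 9))   # made-up rows: ID, start, end
}

# One alternative: store each piece in a list, then bind once at the end
pieces <- list()
for (id in 1:3) {
  pieces[[id]] <- c(id, 0, 9)
}
out2 <- do.call(rbind, pieces)

dim(out)
all(out == out2)
```

For a few thousand rows either approach is fine. The list version mainly pays off when the number of appended pieces is large.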
39
Applied Example
out = matrix(nrow = 0, ncol = 3)
cols = grep("dog", names(dat))
for(i in 1:nrow(dat)) {
  hold = as.numeric(dat[i,cols])
  ...
40
Applied Example
  • Here, the follow-up results are factors, which
    have underlying numerical codes

> dat[i,cols]
  dog_0 dog_1 dog_2 dog_3 dog_4 dog_5 dog_6
1    no    no   yes   yes    no    no   yes
  dog_7 dog_8 dog_9 dog_10 dog_11 dog_12
1   yes    no   yes   <NA>     no     no
> as.numeric(dat[i,cols])
 [1]  1  1  2  2  1  1  2  2  1  2 NA  1  1
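
The mapping from labels to codes can be seen on a small factor. In this sketch the vector `resp` is made up, and the level order ("no" then "yes") is set explicitly, which matches the codes shown on the slide.

```r
# Hypothetical responses for one person, with one missing visit
resp <- factor(c("no", "no", "yes", NA, "no"), levels = c("no", "yes"))

as.numeric(resp)          # underlying codes: "no" is 1, "yes" is 2, NA stays NA
levels(resp)              # the code-to-label mapping
is.na(as.numeric(resp))   # missing visits stay identifiable after conversion
```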
41
Applied Example
  • Now a cool little trick: rle(), run length
    encoding
  • Compute the lengths and values of runs of equal
    values in a vector
  • We're going to combine this with is.na()

42
Applied Example
  • This says that there are 10 FALSE in a row, then
    1 TRUE, then 2 FALSE
  • We need to get this in a better format

> x = rle(is.na(hold))
> x
Run Length Encoding
  lengths: int [1:3] 10 1 2
  values : logi [1:3] FALSE TRUE FALSE
43
Applied Example
> x = data.frame(cbind(x$values, x$length))
> names(x) <- c("missing", "length")
> x
  missing length
1       0     10
2       1      1
3       0      2
44
Applied Example
  • cumsum() returns the cumulative sum of a vector

> x$end <- cumsum(x$length)
> x$start <- x$end - x$length + 1
> x
  missing length end start
1       0     10  10     1
2       1      1  11    11
3       0      2  13    12
45
Applied Example
  • Note that we actually want all of the values to
    be one less, since our time starts at 0

> x$end <- cumsum(x$length) - 1
> x$start <- x$end - x$length + 1
> x
  missing length end start
1       0     10   9     0
2       1      1  10    10
3       0      2  12    11
46
Applied Example
  • Quick rearrangement

> x <- x[,c(1,2,4,3)]
> x
  missing length start end
1       0     10     0   9
2       1      1    10  10
3       0      2    11  12
47
Applied Example
  • We want the last two columns of the non-missing
    visits

> tmp = x[which(x$missing == 0),3:4]
> tmp
  start end
1     0   9
3    11  12
48
Applied Example
  • We then want to add a column of the individual ID
    to the front

id = dat[i,1]
tmp = cbind(rep(id,nrow(tmp)), tmp)
names(tmp)[1] = "ID"
> tmp
  ID start end
1  1     0   9
3  1    11  12
49
Applied Example
  • Lastly, bind the tmp matrix to the growing out
    matrix
  • This finishes off our loop body

out = rbind(out,tmp)
50
for(i in 1:nrow(dat)) {
  hold = as.numeric(dat[i,cols])
  x = rle(is.na(hold))
  x = data.frame(cbind(x$values, x$length))
  names(x) <- c("missing", "length")
  x$end <- cumsum(x$length) - 1
  x$start <- x$end - x$length + 1
  x <- x[,c(1,2,4,3)]
  tmp = x[which(x$missing == 0),3:4]
  id = dat[i,1]
  tmp = cbind(rep(id,nrow(tmp)), tmp)
  names(tmp)[1] = "ID"
  out = rbind(out,tmp)
}
rownames(out) = 1:nrow(out) # cleaning
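
The per-person steps in the loop body can also be collected into a small helper, which makes them easy to check on a single person. The sketch below is a compact variant, not the lecture's code. The function name `person_intervals` is hypothetical, and it works on the `rle()` vectors directly instead of building the intermediate data frame. It is run on person 1's codes as printed on the slides.

```r
# Hypothetical helper: turn one person's visit codes into (start, end) rows
person_intervals <- function(id, hold) {
  r <- rle(is.na(hold))
  end <- cumsum(r$lengths) - 1      # time starts at 0
  start <- end - r$lengths + 1
  keep <- !r$values                 # runs where the person reported
  data.frame(ID = rep(id, sum(keep)), start = start[keep], end = end[keep])
}

# Person 1's codes as shown on the slides (dog_0 through dog_12)
hold1 <- c(1, 1, 2, 2, 1, 1, 2, 2, 1, 2, NA, 1, 1)
p1 <- person_intervals(1, hold1)
p1
```

For person 1 this gives the same two intervals as the slides, before the final adjustment to the non-0 starts.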
51
> head(out,10)
   ID start end
1   1     0   9
2   1    11  12
3   2     0   5
4   2     7  12
5   3     0   2
6   3     5  12
7   4     0   3
8   4     6  12
9   5     0   0
10  5     3   8
> dim(out)
[1] 1414    3
52
Applied Example
  • One last adjustment needed, since we asked about
    borrowing a dog in the previous month
  • The non-0 starts must be reduced by 1, since these
    are currently visit indices, not time at risk

Before:
  ID start end
1  1     0   9
2  1    11  12
After:
  ID start end
1  1     0   9
2  1    10  12
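
The slides show the result of this adjustment but not the code for it. One way to write it, offered as a sketch, is with a logical index on the start column. The data frame below copies the first four rows of the output shown earlier, and the name `nonzero` is made up.

```r
# First rows of the output as shown on the slides
out <- data.frame(ID = c(1, 1, 2, 2),
                  start = c(0, 11, 0, 7),
                  end = c(9, 12, 5, 12))

# Subtract 1 from every start that is not 0
nonzero <- out$start != 0
out$start[nonzero] <- out$start[nonzero] - 1
out
```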
53
Applied Example
  • What is the total time at risk of this
    population?

> time = out$end - out$start
> sum(time)
[1] 4988 # person-months
54
Applied Example
  • Save the 'out' matrix as an rda so it can be used
    next week
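
A minimal sketch of saving and reloading with `save()` and `load()` follows. The file name `lec5_out.rda` and the use of a temporary directory are assumptions for illustration, as is the two-row `out` used here. In practice the full `out` object would be saved wherever the course files live.

```r
# Made-up 'out' and a temporary file path, for illustration only
out <- data.frame(ID = c(1, 1), start = c(0, 10), end = c(9, 12))
path <- file.path(tempdir(), "lec5_out.rda")   # hypothetical file name

save(out, file = path)   # write the object to disk

rm(out)
load(path)               # restores an object named 'out'
out
```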