Speedy R:

loops, parallelization, and the cloud

Andrés Cruz

UT GOV Methods Workshop, 2023-02-22

Intro

  • We’ll cover:
    • ➿ Loops: repeat operations
    • ⛓️ Parallelization: make loops fast
    • ☁️ The cloud: unlock ∞ computing resources
  • To follow along:
    1. Download the materials from the link below
    2. (Maybe) create a DigitalOcean account (there are separate sign-up links for students, via GitHub, and for non-students)

1. Loops

For loops (I)

for (i in 1:5){
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
# bootstrap the mean of MPG
set.seed(512)
for (i in 1:1000){
  m <- mtcars[sample(1:nrow(mtcars), size = nrow(mtcars), replace = TRUE), ]
  print(mean(m$mpg))
}
[1] 20.29062
[1] 19.2875
[1] 20.67188
[1] 19.8875
[1] 18.29062
[...] (output truncated; the loop prints 1,000 bootstrapped means in total)

For loops (II) – store results

set.seed(512)
bootstrapped_means <- vector(mode = "numeric", length = 1000)
for (i in 1:1000){
  m <- mtcars[sample(1:nrow(mtcars), size = nrow(mtcars), replace = TRUE), ]
  bootstrapped_means[[i]] <- mean(m$mpg)
}
hist(bootstrapped_means)
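
Once the results are stored, they can be summarized as well as plotted. A minimal sketch, assuming a standard error and a 95% percentile interval are the summaries of interest (the names `se` and `ci` and the 95% level are illustrative choices, not part of the original slides):

```r
# Re-create the stored bootstrap so this block runs on its own
set.seed(512)
bootstrapped_means <- vector(mode = "numeric", length = 1000)
for (i in 1:1000) {
  m <- mtcars[sample(1:nrow(mtcars), size = nrow(mtcars), replace = TRUE), ]
  bootstrapped_means[i] <- mean(m$mpg)
}

# Bootstrap standard error: standard deviation of the resampled means
se <- sd(bootstrapped_means)

# 95% percentile interval (the level is an illustrative choice)
ci <- quantile(bootstrapped_means, probs = c(0.025, 0.975))

print(se)
print(ci)
```

None of this would be possible with the first version of the loop, which only printed each mean and then discarded it.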

Functional loops (I)

set.seed(512)
l_bootstrapped_means <- lapply(
  1:1000, 
  function(x){
    m <- mtcars[sample(1:nrow(mtcars), size = nrow(mtcars), replace = TRUE), ]
    return(mean(m$mpg))
  }
)
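
Because lapply() returns a list, the result usually needs one more step before it can be plotted or summarized like the vector from the for loop. A small sketch, assuming a plain numeric vector is what you want (the name `v_bootstrapped_means` is made up for this example):

```r
set.seed(512)
l_bootstrapped_means <- lapply(1:1000, function(x) {
  m <- mtcars[sample(1:nrow(mtcars), size = nrow(mtcars), replace = TRUE), ]
  mean(m$mpg)
})

# Each list element is a single number; unlist() flattens them into a vector
v_bootstrapped_means <- unlist(l_bootstrapped_means)

class(l_bootstrapped_means)
class(v_bootstrapped_means)
summary(v_bootstrapped_means)
```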

Functional loops (II)

# generate 100 .csv datasets in the data/ folder
dir.create("data/")
for (i in 1:100){write.csv(mtcars, paste("data/dataset", i, ".csv", sep = ""))}
# recover the .csv files' filepaths
my_files <- list.files("data/", full.names = TRUE)
sample(my_files, size = 3)
[1] "data//dataset65.csv"  "data//dataset32.csv"  "data//dataset100.csv"
# load them into one unified dataset, with a column for the filepaths
l_datasets <- lapply(my_files, function(x){
  one_dataset <- read.csv(x)
  one_dataset$filepath <- x
  return(one_dataset)
})
unified_dataset <- do.call(rbind, l_datasets)
unified_dataset[c(1, nrow(unified_dataset)), c("X", "mpg", "filepath")]
              X  mpg            filepath
1     Mazda RX4 21.0  data//dataset1.csv
3200 Volvo 142E 21.4 data//dataset99.csv
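
The double slash in the paths above comes from the trailing slash in "data/": list.files() adds its own separator between folder and file name. A hedged variant of the same workflow, assuming you would rather store only the file name; the folder name `speedy_r_data`, the column name `filename`, and the use of a temporary folder are all choices made for this sketch:

```r
# Work in a temporary folder so the sketch does not touch your project
data_dir <- file.path(tempdir(), "speedy_r_data")  # hypothetical folder name
dir.create(data_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(mtcars, file.path(data_dir, paste0("dataset", i, ".csv")))
}

# No trailing slash on the folder; `pattern` keeps only .csv files
my_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

l_datasets <- lapply(my_files, function(x) {
  one_dataset <- read.csv(x)
  one_dataset$filename <- basename(x)  # file name without the folder
  one_dataset
})
unified_dataset <- do.call(rbind, l_datasets)

# One block of 32 rows (the size of mtcars) per file
table(unified_dataset$filename)
```

Note that list.files() sorts names alphabetically, which is why dataset99.csv, not dataset100.csv, is the last file in the output above.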

Loop limitations (I)

  • Loops are slower than native vectorized functions
v1 <- rnorm(1000000); v2 <- rnorm(1000000)
system.time(v1 + v2)
   user  system elapsed 
  0.002   0.000   0.002 
system.time(lapply(seq_along(v1), function(i){sum(v1[i], v2[i])}))
   user  system elapsed 
  0.792   0.016   0.809 
  • lapply() always outputs a list. For more flexibility and other goodies, check out the purrr package from the tidyverse.
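
To see what "always outputs a list" means in practice, here is a small base-R comparison. vapply() is swapped in as a base alternative that returns a vector and checks the type of each result; in purrr, map_dbl() plays a similar role. The vectors are shortened to 10 elements for illustration:

```r
v1 <- rnorm(10)
v2 <- rnorm(10)

# lapply(): a list with one element per iteration
sums_list <- lapply(seq_along(v1), function(i) v1[i] + v2[i])

# vapply(): a numeric vector; FUN.VALUE declares that each
# iteration must return exactly one number, or an error is raised
sums_vec <- vapply(seq_along(v1), function(i) v1[i] + v2[i],
                   FUN.VALUE = numeric(1))

class(sums_list)
class(sums_vec)

# Both agree with the vectorized version, which remains the fastest option
all.equal(sums_vec, v1 + v2)
```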

Loop limitations (II)

  • An error at just one of the iterations will “break” the loop
lapply(list(0, 1, "apple"), function(x){log(x + 1)})
Error in x + 1: non-numeric argument to binary operator
lapply(list(0, 1, "apple"), function(x){try(log(x + 1))})
Error in x + 1 : non-numeric argument to binary operator
[[1]]
[1] 0

[[2]]
[1] 0.6931472

[[3]]
[1] "Error in x + 1 : non-numeric argument to binary operator\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in x + 1: non-numeric argument to binary operator>
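
try() keeps the loop running, but it leaves an error object mixed in with the results. One possible alternative is tryCatch(), which lets you choose what a failed iteration returns. In this sketch, returning NA_real_ for failures is an assumption about what is convenient downstream, not the only option:

```r
inputs <- list(0, 1, "apple")

results <- lapply(inputs, function(x) {
  tryCatch(
    log(x + 1),
    error = function(e) NA_real_  # failed iterations become NA
  )
})

# Which iterations failed?
failed <- vapply(results, is.na, logical(1))
which(failed)

# Every element is now a single number, so the list flattens cleanly
unlist(results)
```

The trade-off is that the error message is discarded. If you need to know why an iteration failed, keep `conditionMessage(e)` somewhere instead of returning a bare NA.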

2. Parallelization

The idea behind parallelization

  • Modern processors can run several threads at the same time (laptops usually have 8–16)

  • By default, R uses only one thread…

  • So multiple solutions have been created to run R in parallel

  • Any set of independent operations can be parallelized

Source: Athena on Pexels, 2019 (footnote 4)
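
What "independent" means can be sketched with two toy loops (both are illustrative examples, not taken from the slides). In the first, each iteration needs only its own input. In the second, each step needs the previous step's result, so the iterations cannot simply be handed out to different threads:

```r
# Independent: iteration i uses only i, so the iterations
# could run in any order, or at the same time
squares <- lapply(1:5, function(i) i^2)

# Dependent: step i needs the result of step i - 1 (a running total)
running_total <- numeric(5)
running_total[1] <- 1
for (i in 2:5) {
  running_total[i] <- running_total[i - 1] + i
}

unlist(squares)
running_total
```

The bootstrap from section 1 is of the first kind: each resample is drawn without reference to any other.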

Functional loops in parallel (I)

  • To check how many threads are available in your system:
parallel::detectCores() # ↓ this is the output for my laptop ↓
[1] 16
system.time(lapply(1:8, function(x){Sys.sleep(1)}))
   user  system elapsed 
  0.001   0.000   8.008 
library(future.apply)
plan(multisession, workers = 8)
system.time(future_lapply(1:8, function(x){Sys.sleep(1)}))
   user  system elapsed 
  0.131   0.007   1.545 
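
The same idea can be tried without installing anything, using the `parallel` package that ships with R. This is a swapped-in alternative to future.apply, shown only as a sketch; the choice of 2 workers and the 0.5-second sleep are arbitrary:

```r
library(parallel)  # included with base R

cl <- makeCluster(2)  # start 2 worker sessions (illustrative number)

timing <- system.time(
  res <- parLapply(cl, 1:4, function(x) {
    Sys.sleep(0.5)  # stand-in for real work
    x * 2
  })
)

stopCluster(cl)  # release the workers when done

unlist(res)
timing["elapsed"]
```

With four half-second tasks split across two workers, the elapsed time should come out closer to 1 second than to 2, plus some overhead for communicating with the workers.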

Functional loops in parallel (II)

set.seed(512)
system.time(
  l_bootstrapped_means <- lapply(1:1000000, function(x){
      m <- mtcars[sample(1:nrow(mtcars), size = nrow(mtcars), replace = TRUE), ]
      return(mean(m$mpg))
  })
)
   user  system elapsed 
 61.080   0.004  61.087 
library(future.apply); plan(multisession, workers = 8)
system.time(
  l_bootstrapped_means <- future_lapply(1:1000000, function(x){
      m <- mtcars[sample(1:nrow(mtcars), size = nrow(mtcars), replace = TRUE), ]
      return(mean(m$mpg))
  }, future.seed = 512)
)
   user  system elapsed 
  2.453   0.062  15.131 
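
The two timings above can be turned into a speedup figure. The arithmetic below uses only the elapsed times printed on this slide; "efficiency" (speedup divided by number of workers) is a common convention, added here for illustration:

```r
elapsed_sequential <- 61.087  # lapply(), from the output above
elapsed_parallel   <- 15.131  # future_lapply() with 8 workers
workers <- 8

speedup    <- elapsed_sequential / elapsed_parallel
efficiency <- speedup / workers

round(speedup, 2)     # 4.04
round(efficiency, 2)  # 0.5
```

So 8 workers made this job about 4 times faster, not 8 times faster. Results on other machines and problems will differ.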

Parallelization problems

  • Some packages use multiple threads by default. If you parallelize on top of them, the threads end up competing for the same cores, which can make your code slower or unstable
    • Notable examples include fixest and data.table
    • They usually tell you that they do so, and the setting can be overridden, for instance with fixest::setFixest_nthreads(1) or data.table::setDTthreads(1)
  • There is a setup cost involved in distributing your computation across threads
    • It takes time, but also computer memory (RAM)
    • Ultimately, you have to test what works for your problem
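
One way to apply the thread limits mentioned above is to set them inside the function that each iteration runs. This rests on the assumption that a setting made in your main session does not necessarily carry over to separate worker sessions. `limit_threads()` is a hypothetical helper written for this sketch, and the requireNamespace() guards simply skip packages that are not installed:

```r
# Hypothetical helper: cap packages that multithread by default at one thread
limit_threads <- function() {
  if (requireNamespace("data.table", quietly = TRUE)) {
    data.table::setDTthreads(1)
  }
  if (requireNamespace("fixest", quietly = TRUE)) {
    fixest::setFixest_nthreads(1)
  }
  invisible(NULL)
}

# Call the helper first thing in each iteration, so the limit applies
# in whichever R session ends up doing the work
results <- lapply(1:3, function(i) {
  limit_threads()
  i * 2  # placeholder for the real computation
})

unlist(results)
```

lapply() is used here so the block runs anywhere; the same function body could be passed to future_lapply().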

3. The cloud

The cloud and when to use it

  • You rent a (super)computer for a limited time, billed by the second. Sometimes called a “virtual machine”
  • Useful when you need raw computing power. Choose:
    • Processor and # of threads
    • RAM
    • Storage capacity
  • Useful when you need to run something for a long time
  • Works beautifully for parallelized loops

Cloud providers

  • The most well-known providers are Amazon Web Services (AWS) and Google Cloud Platform (GCP)
  • Pros of DigitalOcean, which we'll use here: easy to use
  • Cons: fewer available configs; no GPUs

Source: Mark Drake, 2019 (footnote 6)

DigitalOcean

  • DigitalOcean’s virtual machines are called 💧droplets💧

  • Let’s check the possible configs and prices here

  • Now I’ll create a droplet in real time!

Droplet setup (I)

  • “Spin up a Droplet” or “Droplets” > “Create Droplet”

Droplet setup (II)

Droplet setup (III)

Droplet setup (IV)

  • “Create Droplet” and wait 1–2 minutes

  • In the console, type adduser rstudio, choose a password, and press Enter and then Y to confirm

  • In your web browser, go to <ip>:8787 (e.g., 165.227.82.108:8787) and log in with username “rstudio” and your newly-created password

RStudio server

  • We can run code, install packages, and do pretty much anything that we could do on the desktop

  • To upload or download files, use the bottom-right “Files” panel

    • Uploading .zip files will uncompress the folder automatically!
  • To terminate the droplet, go to its control panel and “Destroy”>“Destroy Droplet”
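
Since destroying a droplet also removes the files stored on it, it helps to write results to disk and download them through the “Files” panel first. A minimal sketch with base R; the object `results` and the file name are placeholders, and tempdir() stands in for your project folder on the server:

```r
# Stand-in for whatever your long-running job produced
results <- list(mean_mpg = mean(mtcars$mpg))

# Hypothetical location; on the server, use a folder visible in "Files"
out_file <- file.path(tempdir(), "results.rds")
saveRDS(results, out_file)

# Back on your own computer, after downloading the file:
restored <- readRDS(out_file)
identical(results, restored)
```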

Cloud conundrums

  • If you work with sensitive data, you might want to access a cloud provider via UT.

  • 💰 Remember to terminate your virtual machines! 💰

    • Activate billing alerts. On DigitalOcean, they are under “Billing”
  • You can host your own Shiny server on the cloud (see tutorial by Saskia A. Otto)

Thank you!

<andres.cruz@utexas.edu>

Footnotes

  1. Allison Horst, 2019. Illustrations from Hadley Wickham’s ACM talk “The Joy of Functional Programming (for Data Science)”

  2. Hadley Wickham, 2019. The Joy of Functional Programming (for Data Science)

  3. Juggler on Wikimedia Commons, 2009. Passing 2jugglers 6balls PPSS side.gif

  4. Athena on Pexels, 2019. Computer processor

  5. NASA on Wikimedia Commons, 2017. PIA21474-CrabNebula-5Observatories-Animation.gif

  6. Mark Drake, 2019. How To Probe the Depths of Nautically-Themed Open-Source Projects Using Moby Dick