---
title: "Preserve your computing environment"
---
Your analysis was run with a specific version of the software (e.g. R 4.4.3), specific versions of every package involved, and a specific operating system (OS). The good news is that there are tools to help you capture this information systematically.
In R:
`sessionInfo()` or `devtools::session_info()` are great ways to capture this information. You should save the output to a `session_info.txt` file and include it in your GitHub repository.
In Python:
`pip freeze > requirements.txt` will capture all the Python packages installed in your current environment, along with their exact versions.
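
For Python, a rough counterpart to `sessionInfo()` can be assembled from the standard library alone. The sketch below is one possible approach rather than an established tool: the function names `capture_session_info` and `format_session_info` are illustrative, and it assumes Python 3.8 or later so that `importlib.metadata` is available.

```python
import platform
from importlib import metadata


def capture_session_info():
    """Collect the interpreter version, the OS, and the installed packages."""
    packages = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        version = meta.get("Version") if meta is not None else None
        if name and version:  # skip distributions with incomplete metadata
            packages[name] = version
    ordered = dict(sorted(packages.items(), key=lambda item: item[0].lower()))
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "packages": ordered,
    }


def format_session_info(info):
    """Render the collected information as plain text, one item per line."""
    lines = [
        f"Python {info['python']} ({info['implementation']})",
        f"OS: {info['os']}",
        "Packages:",
    ]
    lines += [f"  {name}=={version}" for name, version in info["packages"].items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_session_info(capture_session_info()), end="")
```

Saved under a hypothetical name such as `session_info.py`, the script could be run as `python3 session_info.py > session_info.txt` to produce a file comparable to the R one.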
## Virtual environments
You can even go a step further and help others to recreate the same computing environment that you used independently of what versions you have installed on your machine.
### R
Here is a short introduction to `renv`, an R package that creates a virtual environment to encapsulate your R work. ![source: https://rstudio.github.io/renv/articles/renv.html](../../img/renv.png)
We are going to add renv to our shorebird data cleaning project. Make
sure you have the renv package installed:
```r
install.packages("renv")
```
You may also choose Tools > Project Options > Environments and check "use
renv for this project".
Or at the R console:
```r
renv::init()
```
Let's check the new files that we have...
- Look at the .gitignore
- Look at the renv.lock file
Let's say we reworked our script and now need the
[naniar](https://cran.r-project.org/web/packages/naniar/vignettes/getting-started-w-naniar.html)
R package to deal with the NAs. Let's update our virtual environment:
```r
renv::install("naniar")
```
Add some R code using this package, for example:
```r
library(naniar)
snowsurvey_csv %>%
  miss_var_summary()
```
Alright, now that the installation is completed and we added this package to our code, we can save, and take a snapshot:
```r
renv::snapshot()
```
This action will update the lock file, and we will see naniar and all
its dependencies included.
Let's check. If updated, you should be good to go. If your attempts to
update packages introduced new problems, you may run `renv::restore()` to
revert to the previous state as encoded in the lockfile.
Use `.libPaths()` to confirm where package installations are located!
Want to know more? Here is a good resource to get started: <https://rstudio.github.io/renv/articles/renv.html>
### Python
[`virtualenv`](https://docs.python-guide.org/dev/virtualenvs/#lower-level-virtualenv) is a tool to create Python virtual environments. The steps below use `venv`, the equivalent module that ships with Python 3. In a nutshell, here are the steps to follow:
Create a new folder, your project folder. I will use `first_example`. Make
sure your terminal's working directory is set to this folder.
- Note: The environment will use the version of Python that you are
currently running (in Conda we can specify the version we want).
To create the environment (the argument after `venv` is the name of the environment):
```bash
python3 -m venv first_example
```
- The `-m` flag runs the `venv` module with the active Python executable,
so the environment and its `pip` are tied to that interpreter.
**Time to activate it:**
```bash
source first_example/bin/activate
```
You can tell it is activated because it shows (first_example) in the prompt.
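
Besides looking at the prompt, the activation can also be checked from inside Python. This is a minimal sketch, assuming an environment created with `venv` as above (a Conda environment would not be detected this way); the helper name `in_virtual_environment` is illustrative.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the Python installation it was built from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("environment location:", sys.prefix)
```

Running it before and after `source first_example/bin/activate` should show the two states.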
Let's check which packages are there with a new `pip list`.
Nothing, right? (Only `pip` and, depending on your Python version, `setuptools`.) Nothing to worry about, it
should be this way! Let's proceed.
**Install libraries:**
We will be installing two packages for this example.
First:
```bash
pip install numpy
```
And then:
```bash
pip install pandas
```
_This should take a little longer!_
Run another `pip list`.
Alright, the packages and dependencies installed are right there!
**Export and allow future replication of the environment:**
Let's save the packages and dependencies we have after the installs.
```bash
pip freeze
```
That output should be stored in a `requirements.txt` file.
So let's redirect it to that file:
```bash
pip freeze > requirements.txt
```
Question: This file won't live inside the venv folder, but rather in
the project root folder. Any idea why?
Well, you only need that file to reproduce the environment. The venv
itself should be something you can throw away and rebuild easily! So, do not
put any project files in that folder, and treat it as disposable
after the `pip freeze`.
To double-check if all is good, we can run the following command:
```bash
cat requirements.txt
```
This file should be included in your repository to let others
reinstall your packages and dependencies as needed.
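
Once `requirements.txt` is in the repository, anyone can check whether their current environment matches it. The sketch below is an illustration, not part of `pip`: the helper names `parse_pins` and `find_mismatches` are made up for this example, and it only understands the simple `name==version` lines that `pip freeze` normally writes.

```python
from importlib import metadata


def parse_pins(text):
    """Return {name: version} for the 'name==version' lines in text."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if "==" not in line:
            continue  # skip blank lines and anything that is not a simple pin
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins):
    """List (name, wanted, installed) for pins that differ from the environment."""
    mismatches = []
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches.append((name, wanted, installed))
    return mismatches
```

Calling `find_mismatches(parse_pins(open("requirements.txt").read()))` in the activated environment returns an empty list when every pinned package is installed at the pinned version.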
**Deactivate**
If you are done with that, you should deactivate that environment by
typing:
```bash
deactivate
```
Then, you will see you no longer have the environment we created in our
prompt.
If you want to delete the environment altogether, just
delete the directory of the virtual environment.
Remove the folder:
```bash
rm -rf first_example/
```
**Reusing the Requirements**
*Create a new project folder to reuse the requirements*
```bash
mkdir my_project
```
*Create a virtual env for it*
```bash
python3 -m venv my_project/venv
```
*Activate it*
```bash
source my_project/venv/bin/activate
```
*Install required packages (run this from the folder that contains `requirements.txt`)*
```bash
pip install -r requirements.txt
```
- Attention! Never include project files in the venv folder.
- Do not commit the venv folder itself to source control (add it to
your `.gitignore`).
- You may install more packages and update `requirements.txt` with
the `pip freeze` command.
- You should commit/share only your `requirements.txt` file. That is
all that others and your future self need to recreate the
environment.
- The environment should be something you can throw away and
rebuild easily.
- Make sure to deactivate when done using it.
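
To illustrate how disposable an environment is, the sketch below builds one in a temporary folder with the standard-library `venv` module and deletes it again. It is an illustration under stated assumptions: the function name `build_and_discard_env` and the `first_example_` folder prefix are hypothetical, and `with_pip=False` is used only to keep the example fast and offline.

```python
import shutil
import tempfile
import venv
from pathlib import Path


def build_and_discard_env():
    """Create a virtual environment in a temporary folder, then delete it."""
    project_root = Path(tempfile.mkdtemp(prefix="first_example_"))
    env_dir = project_root / "venv"
    try:
        # with_pip=False skips installing pip; pass with_pip=True to get
        # closer to what `python3 -m venv` does on the command line.
        venv.create(env_dir, with_pip=False)
        created = (env_dir / "pyvenv.cfg").is_file()
    finally:
        shutil.rmtree(project_root)  # nothing of value lives in the environment
    return created, env_dir.exists()
```

The function returns whether the environment was created and whether it still exists afterwards. Nothing is lost by deleting it, because `requirements.txt` is enough to rebuild it.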
#### If you are using Conda
**Checking what is in the system:**
`conda list`
To create the environment:
```bash
conda create --name first_example
# When prompted with "Proceed ([y]/n)?", answer y
```
Time to activate it with `conda activate first_example`. This command only works on conda 4.6 and later versions. For
conda versions prior to 4.6, run:
- Windows: `activate first_example`; Linux and macOS: `source activate first_example`
You can tell it is activated because it shows (first_example) in the prompt.
Let's check which packages are there with a new
```bash
conda list -n first_example
```
Empty, right? Nothing to worry about, it should be this way! Let's
proceed.
**Install packages:**
We will be installing two packages for this exercise.
First:
```bash
conda install numpy
# When prompted with "Proceed ([y]/n)?", answer y
```
And then, one more package:
```bash
conda install pandas
# When prompted with "Proceed ([y]/n)?", answer y
```
Now check which packages are in the specific environment we are working
on:
```bash
conda list
```
Alright, the packages and dependencies installed are right there!
**Export and allow future replication of the environment:**
Let's save the packages and dependencies we have after the installs.
```bash
conda list --export
```
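That output should be stored in an `environment.yml` file. Note that `conda list --export` writes a plain-text list of `name=version=build` lines rather than YAML, whatever extension the file is given.
So let's redirect it to that file: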
```bash
conda list -e > environment.yml
```
This file should be included in your research compendium to let others
reinstall your packages and dependencies as needed.
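
The exported file is plain text, so it is easy to inspect programmatically. The sketch below assumes the layout that `conda list --export` normally produces, with `#` header comments followed by `name=version=build` lines; the helper name `parse_conda_export` is illustrative and not part of conda.

```python
def parse_conda_export(text):
    """Return (platform, {name: (version, build)}) from `conda list --export` text."""
    platform_name = None
    packages = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The header comments may include a line such as "# platform: linux-64".
            if line.startswith("# platform:"):
                platform_name = line.split(":", 1)[1].strip()
            continue
        parts = line.split("=")
        name = parts[0]
        version = parts[1] if len(parts) > 1 else None
        build = parts[2] if len(parts) > 2 else None
        packages[name] = (version, build)
    return platform_name, packages
```

Calling `parse_conda_export(open("environment.yml").read())` gives a quick overview of the recorded platform and package versions. The build strings are platform-specific, which is one reason such a file may not install cleanly on a different operating system.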
**Deactivate it:**
If you are done with that, you should deactivate that environment by
typing:
```bash
conda deactivate
```
Note: `conda deactivate` only works on conda 4.6 and later versions. For conda
versions before 4.6, run:
Windows: `deactivate` or Linux and macOS: `source deactivate`
Then, you will see you no longer have the environment we created in our
prompt.
If you want to delete the environment altogether, you can
remove it with `conda env remove --name first_example`.
**Back in base, we can create a new environment from the packages and
dependencies listed in the .yml file by running (the command is noted at the top of the
file):**
```bash
conda create --name my-env --file environment.yml
# When prompted with "Proceed ([y]/n)?", answer y
```
**Activate it:**
```bash
conda activate my-env  # or see above if conda version < 4.6
```
**Check if packages are there:**
```bash
conda list
```
**Remember to deactivate it when done:**
```bash
conda deactivate  # or see above if conda version < 4.6
```
Check all your environments with `conda env list`.
More info:
<https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html>