<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Vorakl's notes</title><link href="https://vorakl.com/" rel="alternate"></link><link href="https://vorakl.com/atom.xml" rel="self"></link><id>https://vorakl.com/</id><updated>2024-05-19T20:32:42-07:00</updated><entry><title>How to destroy your OS with tar</title><link href="https://vorakl.com/articles/tar-curdir/" rel="alternate"></link><published>2024-05-19T20:32:42-07:00</published><updated>2024-05-19T20:32:42-07:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2024-05-19:/articles/tar-curdir/</id><summary type="html"><p class="first last">A dangerous case of tar archive unpacking</p>
</summary><content type="html"><p>This is a short story about how dangerous a trivial tar extraction can be, and what can be done to minimize the risk or avoid it completely.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="section" id="the-mistake">
<h2>The mistake</h2>
<p>Recently, I was practicing installing <a class="reference external" href="https://voidlinux.org/">Void Linux</a> via chroot <a class="reference external" href="https://docs.voidlinux.org/installation/guides/chroot.html">using the XBPS method</a>. I needed the <a class="reference external" href="https://docs.voidlinux.org/xbps/index.html">XBPS Package Manager</a> installed on my Fedora Linux host to prepare Void Linux's base system. One option is to download an archive of statically built tools from the official repository. I chose <a class="reference external" href="https://repo-default.voidlinux.org/static/xbps-static-latest.x86_64-musl.tar.xz">https://repo-default.voidlinux.org/static/xbps-static-latest.x86_64-musl.tar.xz</a>.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="highlight"><pre><span></span>$ tar -tf xbps-static-latest.x86_64-musl.tar.xz <span class="p">|</span> head
./
./usr/
./usr/bin/
./usr/bin/xbps-uunshare
./usr/bin/xbps-uhelper
./usr/bin/xbps-uchroot
./usr/bin/xbps-rindex
./usr/bin/xbps-remove
./usr/bin/xbps-reconfigure
./usr/bin/xbps-query
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>I was so used to seeing 0:0 as the owner:group of every file in an archive that I didn't even check the actual permissions and owners. I just looked at the directory structure and noticed that all the executables were conveniently located under the relative path <em>&quot;./usr/bin/&quot;</em>. I quickly decided to extract them straight into my root directory, so they would be immediately available in my $PATH. This was a big mistake. Had I checked, I would have seen unusual permissions (700) on the current directory &quot;.&quot; and a non-standard owner:group on the entire archive content:</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="highlight"><pre><span></span>$ tar -tvf xbps-static-latest.x86_64-musl.tar.xz <span class="p">|</span> head
drwx------ duncaen/netusers <span class="m">0</span> <span class="m">2023</span>-09-18 <span class="m">06</span>:37 ./
drwxr-xr-x duncaen/netusers <span class="m">0</span> <span class="m">2023</span>-09-18 <span class="m">06</span>:37 ./usr/
drwxr-xr-x duncaen/netusers <span class="m">0</span> <span class="m">2023</span>-09-18 <span class="m">06</span>:37 ./usr/bin/
lrwxrwxrwx duncaen/netusers <span class="m">0</span> <span class="m">2023</span>-09-18 <span class="m">06</span>:37 ./usr/bin/xbps-uunshare -&gt; xbps-uunshare.static
lrwxrwxrwx duncaen/netusers <span class="m">0</span> <span class="m">2023</span>-09-18 <span class="m">06</span>:37 ./usr/bin/xbps-uhelper -&gt; xbps-uhelper.static
lrwxrwxrwx duncaen/netusers <span class="m">0</span> <span class="m">2023</span>-09-18 <span class="m">06</span>:37 ./usr/bin/xbps-uchroot -&gt; xbps-uchroot.static
lrwxrwxrwx duncaen/netusers <span class="m">0</span> <span class="m">2023</span>-09-18 <span class="m">06</span>:37 ./usr/bin/xbps-rindex -&gt; xbps-rindex.static
lrwxrwxrwx duncaen/netusers <span class="m">0</span> <span class="m">2023</span>-09-18 <span class="m">06</span>:37 ./usr/bin/xbps-remove -&gt; xbps-remove.static
lrwxrwxrwx duncaen/netusers <span class="m">0</span> <span class="m">2023</span>-09-18 <span class="m">06</span>:37 ./usr/bin/xbps-reconfigure -&gt; xbps-reconfigure.static
lrwxrwxrwx duncaen/netusers <span class="m">0</span> <span class="m">2023</span>-09-18 <span class="m">06</span>:37 ./usr/bin/xbps-query -&gt; xbps-query.static
</pre></div>
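<div class="line-block">
<div class="line"><br /></div>
</div>
<p>A quick look at the archive listing before extracting would have revealed the problem. As a minimal sketch (assuming GNU tar and a POSIX shell; the function name <em>has_top_dir</em> and the scratch archives are hypothetical, made up for this illustration), such a pre-flight check could test whether an archive contains a top-level &quot;./&quot; entry, whose owner and mode may be applied to the <em>-C</em> target directory, and print that entry's metadata:</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="highlight"><pre><span></span>#!/bin/sh
# Pre-flight check sketch: does a tar archive contain a top-level "./" entry?
# Assumes GNU tar; has_top_dir and the scratch files are hypothetical.
set -eu

# Succeeds if the archive has a member named exactly "." or "./"
has_top_dir() {
    tar -tf "$1" | grep -qx -e '\.' -e '\./'
}

# Build two test archives from the same tree in a scratch directory
work=$(mktemp -d)
mkdir -p "$work/src/usr/bin"
touch "$work/src/usr/bin/tool"
tar -C "$work/src" -cf "$work/with-dot.tar" .   # stores "./" itself
tar -C "$work/src" -cf "$work/no-dot.tar" usr   # stores only "usr/..."

for archive in "$work/with-dot.tar" "$work/no-dot.tar"; do
    if has_top_dir "$archive"; then
        echo "$(basename "$archive"): contains ./"
        # Show the mode and owner that extraction may apply to the -C target
        tar -tvf "$archive" | grep -E ' \./?$'
    else
        echo "$(basename "$archive"): no ./ entry"
    fi
done
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>In practice, <em>has_top_dir</em> would be called on the downloaded archive before running any <em>tar -x</em> as root.</p>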
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>But not knowing that, I ran...</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="highlight"><pre><span></span>$ sudo tar -C / -xvpf xbps-static-latest.x86_64-musl.tar.xz
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>In the seconds that followed, I watched my system rapidly fall apart. The windows of my XFCE session stopped redrawing, and then the X server itself shut down. I couldn't run sudo. I couldn't even boot my system again. It happened so quickly and unexpectedly that I could hardly believe my last command had caused the crash. Fortunately, booting into single-user mode and a detailed analysis of the tar archive revealed the root cause.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
</div>
<div class="section" id="the-root-cause">
<h2>The root cause</h2>
<p>The tar archive contains the current directory &quot;./&quot;, which became the root directory because I told tar to change to &quot;/&quot; before extracting (&quot;tar -C / ...&quot;). Restoring the owner and permissions of the archive's current (top) directory set mode 700 and owner:group 2002:2000 (the numeric IDs behind duncaen/netusers) on my root directory &quot;/&quot; itself. As a result, my own user completely lost access to the entire file system. Who could have expected that? ;)</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>For this little demo, I spun up a new VM. Don't try this on your running system!</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="highlight"><pre><span></span>$ sudo chmod <span class="m">700</span> /
$ ls -ld /
drwx------ <span class="m">17</span> root root <span class="m">4096</span> Mar <span class="m">27</span> <span class="m">11</span>:24 /
$ sudo chown <span class="m">2000</span>:2000 /
$ sudo chown <span class="m">2000</span>:2000 /usr
-bash: /usr/bin/sudo: Permission denied
$ sudo -s
-bash: /usr/bin/sudo: Permission denied
$ ls -ld /
-bash: /usr/bin/ls: Permission denied
</pre></div>
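<div class="line-block">
<div class="line"><br /></div>
</div>
<p>The same mechanism can also be reproduced without putting a real system at risk. The sketch below is a minimal example, assuming GNU tar and GNU coreutils (for <em>stat -c</em>) and meant to be run as an ordinary user; all paths are hypothetical scratch directories. It packs a directory with mode 700 using &quot;.&quot;, then extracts the archive into a 755 target directory that plays the role of &quot;/&quot;. As an ordinary user, tar cannot change the owner, so only the mode of the target is affected; run as root, the owner would be restored as well.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="highlight"><pre><span></span>#!/bin/sh
# Harmless reproduction of the "./" problem in a scratch directory.
# Assumes GNU tar and GNU stat; all paths are hypothetical.
set -eu
umask 022

work=$(mktemp -d)
mkdir -p "$work/src/usr/bin" "$work/target"
touch "$work/src/usr/bin/tool"
chmod 700 "$work/src"      # becomes the mode of the archive's "./" entry
chmod 755 "$work/target"   # stands in for "/" in the real incident

# Pack with "." so the top directory itself is stored as "./"
tar -C "$work/src" -cf "$work/bad.tar" .

before=$(stat -c %a "$work/target")
tar -C "$work/target" -xf "$work/bad.tar"   # default behavior (--overwrite-dir)
after=$(stat -c %a "$work/target")
echo "target mode before: $before, after: $after"
</pre></div>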
<div class="line-block">
<div class="line"><br /></div>
</div>
</div>
<div class="section" id="what-can-be-done-to-prevent-it">
<h2>What can be done to prevent it?</h2>
<p>In general, it is convenient to create a new archive with a relative directory tree using a command similar to</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="highlight"><pre><span></span>$ tar -C /path/to/rootfs -czf myarchive.tar.gz .
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>because you don't have to worry about the internal directory structure, and it's just one command. All files are addressed with a simple <em>&quot;.&quot;</em>. It is also useful during extraction, since <em>&quot;-C /some/path/&quot;</em> lets you choose any destination directory. On the other hand, this approach also stores the current directory itself in the archive (the first entry in the listings above), which cancels out all that convenience. The default behavior of GNU tar is <em>&quot;Overwrite metadata of existing directories when extracting&quot;</em>, which is equivalent to the <em>--overwrite-dir</em> option. For example, if an archive contains a backup of users' home directories with all the necessary permissions, restoring them could be as easy as running something like <em>&quot;tar -C /home -xpf homes.tar.gz&quot;</em>. But this only works as intended if the archive doesn't contain a current directory entry; otherwise, the owner and permissions of the target <em>&quot;/home/&quot;</em> are overwritten as well.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>A good way to avoid such pitfalls is to add the <strong>--no-overwrite-dir</strong> option, which <em>&quot;preserves metadata of existing directories&quot;</em>. So, if you run something like <em>&quot;tar -C /home --no-overwrite-dir -xpf homes.tar.gz&quot;</em>, all existing directories (including the current one) will remain unchanged!</p>
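<div class="line-block">
<div class="line"><br /></div>
</div>
<p>The effect of this option is easy to compare in a scratch directory. Here is a minimal sketch, assuming GNU tar and GNU coreutils (for <em>stat -c</em>); the paths and the <em>homes.tar</em> archive are hypothetical. The same archive, whose top-level &quot;./&quot; entry has mode 700, is extracted into two identical 755 directories, once with the default behavior and once with <strong>--no-overwrite-dir</strong>:</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="highlight"><pre><span></span>#!/bin/sh
# Compare default extraction with --no-overwrite-dir on an archive with "./".
# Assumes GNU tar and GNU stat; all paths are hypothetical.
set -eu
umask 022

work=$(mktemp -d)
mkdir -p "$work/src/alice" "$work/default" "$work/preserve"
touch "$work/src/alice/notes.txt"
chmod 700 "$work/src"                        # mode stored for "./"
chmod 755 "$work/default" "$work/preserve"   # two identical targets
tar -C "$work/src" -cf "$work/homes.tar" .

# Default (--overwrite-dir): the metadata of "./" is applied to the target
tar -C "$work/default" -xf "$work/homes.tar"
# --no-overwrite-dir: existing directories, including the target, keep theirs
tar -C "$work/preserve" --no-overwrite-dir -xf "$work/homes.tar"

mode_default=$(stat -c %a "$work/default")
mode_preserve=$(stat -c %a "$work/preserve")
echo "default: $mode_default, --no-overwrite-dir: $mode_preserve"
ls "$work/preserve/alice"   # the archived files are extracted either way
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>The first target is expected to end up with mode 700, while the second keeps 755; in both cases the archived files themselves are extracted.</p>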
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>There are also a few ways to create an archive without a current directory entry, but most of them require either changing the directory beforehand or explicitly listing all files and directories for the future archive. However, I found a way that, although it looks odd, does the job in one command:</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="highlight"><pre><span></span>$ tar --transform<span class="o">=</span><span class="s1">&#39;s|tmp/rootfs|.|&#39;</span> --show-transformed-names -cvf myarchive.tar /tmp/rootfs/*
<span class="c1"># or without a verbose mode</span>
$ tar --transform<span class="o">=</span><span class="s1">&#39;s|tmp/rootfs|.|&#39;</span> -cf myarchive.tar /tmp/rootfs/*
</pre></div>
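<div class="line-block">
<div class="line"><br /></div>
</div>
<p>To see what such a command stores, it is enough to list the resulting member names. Here is a minimal sketch, assuming GNU tar; the paths are hypothetical. To keep the substitution pattern predictable, it changes into a scratch directory and archives the relative path <em>rootfs/*</em> rather than an absolute one. Note that the shell glob <em>*</em> does not match hidden files, so dot-files directly under the source directory would not be included.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="highlight"><pre><span></span>#!/bin/sh
# Rename the "rootfs" prefix to "." so no top-level "./" member is stored.
# Assumes GNU tar; all paths are hypothetical.
set -eu

work=$(mktemp -d)
mkdir -p "$work/rootfs/usr/bin" "$work/rootfs/var/db"
touch "$work/rootfs/usr/bin/tool"
cd "$work"

# The glob expands to rootfs/usr and rootfs/var, never to rootfs itself
tar --transform='s|^rootfs|.|' -cf myarchive.tar rootfs/*

members=$(tar -tf myarchive.tar)
echo "$members"   # every name should start with "./", with no bare "./" entry
</pre></div>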
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Thanks to <a class="reference external" href="http://eradman.com/">Eric Radman</a> for pointing out that BSD tar has another option, <a class="reference external" href="https://man.openbsd.org/tar#s">-s</a>, for similar functionality.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Another fairly typical way to create such archives (packages) is to use <a class="reference external" href="https://wiki.debian.org/FakeRoot">fakeroot</a>. It runs as an unprivileged user and makes all files appear to be owned by root. In fact, it's just an illusion. Let's have a look at the directory with the original xbps tools extracted:</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="highlight"><pre><span></span>$ tree -agpu xbps-tools/ <span class="p">|</span> head
<span class="o">[</span>drwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-tools/
├── <span class="o">[</span>drwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> usr
│ └── <span class="o">[</span>drwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> bin
│ ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-alternatives -&gt; xbps-alternatives.static
│ ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-alternatives.static
│ ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-checkvers -&gt; xbps-checkvers.static
│ ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-checkvers.static
│ ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-create -&gt; xbps-create.static
│ ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-create.static
│ ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-dgraph -&gt; xbps-dgraph.static
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>And this is how it looks under <em>fakeroot</em>:</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="highlight"><pre><span></span>$ fakeroot /bin/bash
root@localhost&gt; tree -agpu xbps-tools/ <span class="p">|</span> head
<span class="o">[</span>drwxr-xr-x root root <span class="o">]</span> xbps-tools/
├── <span class="o">[</span>drwxr-xr-x root root <span class="o">]</span> usr
│ └── <span class="o">[</span>drwxr-xr-x root root <span class="o">]</span> bin
│ ├── <span class="o">[</span>lrwxrwxrwx root root <span class="o">]</span> xbps-alternatives -&gt; xbps-alternatives.static
│ ├── <span class="o">[</span>-rwxr-xr-x root root <span class="o">]</span> xbps-alternatives.static
│ ├── <span class="o">[</span>lrwxrwxrwx root root <span class="o">]</span> xbps-checkvers -&gt; xbps-checkvers.static
│ ├── <span class="o">[</span>-rwxr-xr-x root root <span class="o">]</span> xbps-checkvers.static
│ ├── <span class="o">[</span>lrwxrwxrwx root root <span class="o">]</span> xbps-create -&gt; xbps-create.static
│ ├── <span class="o">[</span>-rwxr-xr-x root root <span class="o">]</span> xbps-create.static
│ ├── <span class="o">[</span>lrwxrwxrwx root root <span class="o">]</span> xbps-dgraph -&gt; xbps-dgraph.static
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>This fake environment allows you to create a tar archive with files owned by root without changing their real owners.</p>
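<div class="line-block">
<div class="line"><br /></div>
</div>
<p>If the only goal is root ownership inside the archive, GNU tar can also record it by itself, without <em>fakeroot</em>: the <em>--owner</em> and <em>--group</em> options override the owner and group stored for every member. This is a different technique from <em>fakeroot</em>, shown here as a minimal sketch assuming GNU tar; the paths are hypothetical. Unlike <em>fakeroot</em>, it assigns the same owner to all members, so it does not suit packages that need mixed ownership.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="highlight"><pre><span></span>#!/bin/sh
# Store uid/gid 0 for every member without fakeroot or sudo.
# Assumes GNU tar; all paths are hypothetical.
set -eu

work=$(mktemp -d)
mkdir -p "$work/xbps-tools/usr/bin"
touch "$work/xbps-tools/usr/bin/tool"

# Only the archive records 0:0; the files on disk keep their real owner
tar -C "$work/xbps-tools" --owner=0 --group=0 --numeric-owner \
    -cf "$work/root-owned.tar" usr

# Collect the distinct owner/group pairs from the verbose listing
owners=$(tar --numeric-owner -tvf "$work/root-owned.tar" | awk '{print $2}' | sort -u)
echo "$owners"
</pre></div>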
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>One more nice solution is to use the <em>cpio</em> tool to create or extract <a class="reference external" href="https://vorakl.com/articles/posix/">POSIX</a> tar archives. This format can be selected during archive creation by adding <em>&quot;-H ustar&quot;</em>. During extraction, the format is detected automatically, and <em>cpio</em> doesn't change the permissions of the current directory, even if it is present in the archive! If you add the <em>&quot;-d&quot;</em> option (create leading directories where needed) and run <em>cpio</em> with <em>sudo</em>, any missing parent directories that are not themselves stored in the archive will be created as root:root, which is also very convenient.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="highlight"><pre><span></span>$ tree -agpu newroot/
<span class="o">[</span>drwxr-xr-x root root <span class="o">]</span> newroot/
$ xz -cd xbps-static-latest.x86_64-musl.tar.xz <span class="p">|</span> sudo cpio -D newroot -idv
.
./usr
./usr/bin
./usr/bin/xbps-uunshare
./usr/bin/xbps-uhelper
./usr/bin/xbps-uchroot
./usr/bin/xbps-rindex
./usr/bin/xbps-remove
./usr/bin/xbps-reconfigure
./usr/bin/xbps-query
./usr/bin/xbps-pkgdb
./usr/bin/xbps-install
./usr/bin/xbps-fetch
./usr/bin/xbps-fbulk
./usr/bin/xbps-digest
./usr/bin/xbps-dgraph
./usr/bin/xbps-create
./usr/bin/xbps-checkvers
./usr/bin/xbps-alternatives
./usr/bin/xbps-alternatives.static
./usr/bin/xbps-checkvers.static
./usr/bin/xbps-create.static
./usr/bin/xbps-dgraph.static
./usr/bin/xbps-digest.static
./usr/bin/xbps-fbulk.static
./usr/bin/xbps-fetch.static
./usr/bin/xbps-install.static
./usr/bin/xbps-pkgdb.static
./usr/bin/xbps-query.static
./usr/bin/xbps-reconfigure.static
./usr/bin/xbps-remove.static
./usr/bin/xbps-rindex.static
./usr/bin/xbps-uchroot.static
./usr/bin/xbps-uhelper.static
./usr/bin/xbps-uunshare.static
./var
./var/db
./var/db/xbps
./var/db/xbps/keys
./var/db/xbps/keys/60:ae:0c:d6:f0:95:17:80:bc:93:46:7a:89:af:a3:2d.plist
./var/db/xbps/keys/3d:b9:c0:50:41:a7:68:4c:2e:2c:a9:a2:5a:04:b7:3f.plist
<span class="m">179893</span> blocks
$ tree -agpu newroot/ <span class="p">|</span> head
<span class="o">[</span>drwxr-xr-x root root <span class="o">]</span> newroot/
├── <span class="o">[</span>drwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> usr
│ └── <span class="o">[</span>drwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> bin
│ ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-alternatives -&gt; xbps-alternatives.static
│ ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-alternatives.static
│ ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-checkvers -&gt; xbps-checkvers.static
│ ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-checkvers.static
│ ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-create -&gt; xbps-create.static
│ ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-create.static
│ ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-dgraph -&gt; xbps-dgraph.static
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Note that <em>newroot/</em> was left untouched and is still owned by root:root with 755 permissions. But <em>cpio</em> can do even more. You can create a POSIX tar archive and easily control which files go into it, because <em>cpio</em> archives exactly the file names it reads from standard input and does not descend into directories on its own. So you can get the file list with <em>find</em> and then filter the output to remove (for this particular example) <em>.</em>, <em>/usr</em>, <em>/usr/bin</em>, <em>/var</em>, and <em>/var/db</em>, and that's it. The result is safe and convenient to extract anywhere, while keeping a relative directory structure inside. Here is an example of how I created a tar archive with <em>cpio</em>, without any &quot;system&quot; directories, and then extracted it with <em>tar</em> in the usual way:</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="highlight"><pre><span></span><span class="c1"># Create a tar archive with &#39;cpio&#39; of previously unpacked xbps tools</span>
$ <span class="o">(</span><span class="nb">cd</span> xbps-tools <span class="o">&amp;&amp;</span> find . <span class="p">|</span> grep -v -e <span class="s1">&#39;^\.$&#39;</span> -e <span class="s1">&#39;^\./usr$&#39;</span> -e <span class="s1">&#39;^\./usr/bin$&#39;</span> -e <span class="s1">&#39;^\./var$&#39;</span> -e <span class="s1">&#39;^\./var/db$&#39;</span> <span class="p">|</span> cpio -ov -H ustar &gt; ../myxbps.tar<span class="o">)</span>
./var/db/xbps/
./var/db/xbps/keys/
./var/db/xbps/keys/3d:b9:c0:50:41:a7:68:4c:2e:2c:a9:a2:5a:04:b7:3f.plist
./var/db/xbps/keys/60:ae:0c:d6:f0:95:17:80:bc:93:46:7a:89:af:a3:2d.plist
./usr/bin/xbps-uunshare.static
./usr/bin/xbps-uhelper.static
./usr/bin/xbps-uchroot.static
./usr/bin/xbps-rindex.static
./usr/bin/xbps-remove.static
./usr/bin/xbps-reconfigure.static
./usr/bin/xbps-query.static
./usr/bin/xbps-pkgdb.static
./usr/bin/xbps-install.static
./usr/bin/xbps-fetch.static
./usr/bin/xbps-fbulk.static
./usr/bin/xbps-digest.static
./usr/bin/xbps-dgraph.static
./usr/bin/xbps-create.static
./usr/bin/xbps-checkvers.static
./usr/bin/xbps-alternatives.static
./usr/bin/xbps-alternatives
./usr/bin/xbps-checkvers
./usr/bin/xbps-create
./usr/bin/xbps-dgraph
./usr/bin/xbps-digest
./usr/bin/xbps-fbulk
./usr/bin/xbps-fetch
./usr/bin/xbps-install
./usr/bin/xbps-pkgdb
./usr/bin/xbps-query
./usr/bin/xbps-reconfigure
./usr/bin/xbps-remove
./usr/bin/xbps-rindex
./usr/bin/xbps-uchroot
./usr/bin/xbps-uhelper
./usr/bin/xbps-uunshare
<span class="m">179889</span> blocks
$ file myxbps.tar
myxbps.tar: POSIX tar archive
<span class="c1"># Check with &#39;tar&#39; that all files have non-root user/group and the archive doesn&#39;t contain . /usr /usr/bin /var /var/db</span>
$ tar -tvf myxbps.tar
drwxr-xr-x <span class="m">2002</span>/2000 <span class="m">0</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 var/db/xbps/
drwxr-xr-x <span class="m">2002</span>/2000 <span class="m">0</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 var/db/xbps/keys/
-rw-r--r-- <span class="m">2002</span>/2000 <span class="m">1410</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 var/db/xbps/keys/3d:b9:c0:50:41:a7:68:4c:2e:2c:a9:a2:5a:04:b7:3f.plist
-rw-r--r-- <span class="m">2002</span>/2000 <span class="m">1410</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 var/db/xbps/keys/60:ae:0c:d6:f0:95:17:80:bc:93:46:7a:89:af:a3:2d.plist
-rwxr-xr-x <span class="m">2002</span>/2000 <span class="m">5623104</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-uunshare.static
-rwxr-xr-x <span class="m">2002</span>/2000 <span class="m">5643584</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-uhelper.static
-rwxr-xr-x <span class="m">2002</span>/2000 <span class="m">5631296</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-uchroot.static
-rwxr-xr-x <span class="m">2002</span>/2000 <span class="m">6414144</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-rindex.static
-rwxr-xr-x <span class="m">2002</span>/2000 <span class="m">5779264</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-remove.static
-rwxr-xr-x <span class="m">2002</span>/2000 <span class="m">5643904</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-reconfigure.static
-rwxr-xr-x <span class="m">2002</span>/2000 <span class="m">5685440</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-query.static
-rwxr-xr-x <span class="m">2002</span>/2000 <span class="m">5643904</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-pkgdb.static
-rwxr-xr-x <span class="m">2002</span>/2000 <span class="m">5787648</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-install.static
-rwxr-xr-x <span class="m">2002</span>/2000 <span class="m">5639488</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-fetch.static
-rwxr-xr-x <span class="m">2002</span>/2000 <span class="m">5631296</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-fbulk.static
-rwxr-xr-x <span class="m">2002</span>/2000 <span class="m">5623104</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-digest.static
-rwxr-xr-x <span class="m">2002</span>/2000 <span class="m">5640384</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-dgraph.static
-rwxr-xr-x <span class="m">2002</span>/2000 <span class="m">6402240</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-create.static
-rwxr-xr-x <span class="m">2002</span>/2000 <span class="m">5644032</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-checkvers.static
-rwxr-xr-x <span class="m">2002</span>/2000 <span class="m">5643904</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-alternatives.static
lrwxrwxrwx <span class="m">2002</span>/2000 <span class="m">0</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-alternatives -&gt; xbps-alternatives.static
lrwxrwxrwx <span class="m">2002</span>/2000 <span class="m">0</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-checkvers -&gt; xbps-checkvers.static
lrwxrwxrwx <span class="m">2002</span>/2000 <span class="m">0</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-create -&gt; xbps-create.static
lrwxrwxrwx <span class="m">2002</span>/2000 <span class="m">0</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-dgraph -&gt; xbps-dgraph.static
lrwxrwxrwx <span class="m">2002</span>/2000 <span class="m">0</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-digest -&gt; xbps-digest.static
lrwxrwxrwx <span class="m">2002</span>/2000 <span class="m">0</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-fbulk -&gt; xbps-fbulk.static
lrwxrwxrwx <span class="m">2002</span>/2000 <span class="m">0</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-fetch -&gt; xbps-fetch.static
lrwxrwxrwx <span class="m">2002</span>/2000 <span class="m">0</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-install -&gt; xbps-install.static
lrwxrwxrwx <span class="m">2002</span>/2000 <span class="m">0</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-pkgdb -&gt; xbps-pkgdb.static
lrwxrwxrwx <span class="m">2002</span>/2000 <span class="m">0</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-query -&gt; xbps-query.static
lrwxrwxrwx <span class="m">2002</span>/2000 <span class="m">0</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-reconfigure -&gt; xbps-reconfigure.static
lrwxrwxrwx <span class="m">2002</span>/2000 <span class="m">0</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-remove -&gt; xbps-remove.static
lrwxrwxrwx <span class="m">2002</span>/2000 <span class="m">0</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-rindex -&gt; xbps-rindex.static
lrwxrwxrwx <span class="m">2002</span>/2000 <span class="m">0</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-uchroot -&gt; xbps-uchroot.static
lrwxrwxrwx <span class="m">2002</span>/2000 <span class="m">0</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-uhelper -&gt; xbps-uhelper.static
lrwxrwxrwx <span class="m">2002</span>/2000 <span class="m">0</span> <span class="m">2024</span>-05-21 <span class="m">16</span>:04 usr/bin/xbps-uunshare -&gt; xbps-uunshare.static
<span class="c1"># Created a new directory to emulate a root file system</span>
$ tree -agpu newroot2/
<span class="o">[</span>drwxr-xr-x root root <span class="o">]</span> newroot2/
├── <span class="o">[</span>drwxr-xr-x root root <span class="o">]</span> usr
│ └── <span class="o">[</span>drwxr-xr-x root root <span class="o">]</span> bin
└── <span class="o">[</span>drwxr-xr-x root root <span class="o">]</span> var
└── <span class="o">[</span>drwxr-xr-x root root <span class="o">]</span> db
<span class="c1"># Extract with &#39;tar&#39; in a usual way</span>
$ sudo tar -C newroot2 -xvf myxbps.tar
var/db/xbps/
var/db/xbps/keys/
var/db/xbps/keys/3d:b9:c0:50:41:a7:68:4c:2e:2c:a9:a2:5a:04:b7:3f.plist
var/db/xbps/keys/60:ae:0c:d6:f0:95:17:80:bc:93:46:7a:89:af:a3:2d.plist
usr/bin/xbps-uunshare.static
usr/bin/xbps-uhelper.static
usr/bin/xbps-uchroot.static
usr/bin/xbps-rindex.static
usr/bin/xbps-remove.static
usr/bin/xbps-reconfigure.static
usr/bin/xbps-query.static
usr/bin/xbps-pkgdb.static
usr/bin/xbps-install.static
usr/bin/xbps-fetch.static
usr/bin/xbps-fbulk.static
usr/bin/xbps-digest.static
usr/bin/xbps-dgraph.static
usr/bin/xbps-create.static
usr/bin/xbps-checkvers.static
usr/bin/xbps-alternatives.static
usr/bin/xbps-alternatives
usr/bin/xbps-checkvers
usr/bin/xbps-create
usr/bin/xbps-dgraph
usr/bin/xbps-digest
usr/bin/xbps-fbulk
usr/bin/xbps-fetch
usr/bin/xbps-install
usr/bin/xbps-pkgdb
usr/bin/xbps-query
usr/bin/xbps-reconfigure
usr/bin/xbps-remove
usr/bin/xbps-rindex
usr/bin/xbps-uchroot
usr/bin/xbps-uhelper
usr/bin/xbps-uunshare
$ tree -agpu newroot2/
<span class="o">[</span>drwxr-xr-x root root <span class="o">]</span> newroot2/
├── <span class="o">[</span>drwxr-xr-x root root <span class="o">]</span> usr
│   └── <span class="o">[</span>drwxr-xr-x root root <span class="o">]</span> bin
│       ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-alternatives -&gt; xbps-alternatives.static
│       ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-alternatives.static
│       ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-checkvers -&gt; xbps-checkvers.static
│       ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-checkvers.static
│       ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-create -&gt; xbps-create.static
│       ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-create.static
│       ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-dgraph -&gt; xbps-dgraph.static
│       ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-dgraph.static
│       ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-digest -&gt; xbps-digest.static
│       ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-digest.static
│       ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-fbulk -&gt; xbps-fbulk.static
│       ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-fbulk.static
│       ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-fetch -&gt; xbps-fetch.static
│       ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-fetch.static
│       ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-install -&gt; xbps-install.static
│       ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-install.static
│       ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-pkgdb -&gt; xbps-pkgdb.static
│       ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-pkgdb.static
│       ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-query -&gt; xbps-query.static
│       ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-query.static
│       ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-reconfigure -&gt; xbps-reconfigure.static
│       ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-reconfigure.static
│       ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-remove -&gt; xbps-remove.static
│       ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-remove.static
│       ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-rindex -&gt; xbps-rindex.static
│       ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-rindex.static
│       ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-uchroot -&gt; xbps-uchroot.static
│       ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-uchroot.static
│       ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-uhelper -&gt; xbps-uhelper.static
│       ├── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-uhelper.static
│       ├── <span class="o">[</span>lrwxrwxrwx <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-uunshare -&gt; xbps-uunshare.static
│       └── <span class="o">[</span>-rwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps-uunshare.static
└── <span class="o">[</span>drwxr-xr-x root root <span class="o">]</span> var
    └── <span class="o">[</span>drwxr-xr-x root root <span class="o">]</span> db
        └── <span class="o">[</span>drwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> xbps
            └── <span class="o">[</span>drwxr-xr-x <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> keys
                ├── <span class="o">[</span>-rw-r--r-- <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> 3d:b9:c0:50:41:a7:68:4c:2e:2c:a9:a2:5a:04:b7:3f.plist
                └── <span class="o">[</span>-rw-r--r-- <span class="m">2002</span> <span class="m">2000</span> <span class="o">]</span> <span class="m">60</span>:ae:0c:d6:f0:95:17:80:bc:93:46:7a:89:af:a3:2d.plist
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Note that all &quot;system&quot; directories, such as <em>/usr</em> or <em>/var/db</em>, are left untouched and keep their original owners and permissions.
In fact, you can get the same result with <em>tar</em> as well, by leaving those directories out of the list of entries to archive:</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="highlight"><pre><span></span>$ <span class="o">(</span><span class="nb">cd</span> xbps-tools <span class="o">&amp;&amp;</span> find . <span class="p">|</span> grep -v -e <span class="s1">&#39;^\.$&#39;</span> -e <span class="s1">&#39;^\./usr$&#39;</span> -e <span class="s1">&#39;^\./usr/bin$&#39;</span> -e <span class="s1">&#39;^\./var$&#39;</span> -e <span class="s1">&#39;^\./var/db$&#39;</span> <span class="p">|</span> tar --no-recursion --verbatim-files-from -T - -cvf ../myxbps.tar<span class="o">)</span>
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>That's how I would create archives whose files are meant to be extracted into the root filesystem.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
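<p>To see what actually ends up in such an archive, here is a small self-contained sketch with hypothetical directory names (<em>stage</em>, <em>app</em>). It assumes GNU <em>tar</em>, since <em>--verbatim-files-from</em> is a GNU option, and it adds <em>--no-recursion</em>: without that option, <em>tar</em> also descends into every directory it reads from the list and stores the files below it a second time.</p>
<div class="highlight"><pre>
# Self-contained sketch; assumes GNU tar. Names under $work are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/stage/usr/bin" "$work/stage/var/db/app/keys"
touch "$work/stage/usr/bin/tool" "$work/stage/var/db/app/keys/k.plist"
cd "$work/stage" || exit 1
# Drop "." and the directories that already exist on the target system,
# so extraction never changes their owner or permissions.
# --no-recursion: store each listed entry once, without descending into
# the directories that remain in the list.
find . | grep -v -e '^\.$' -e '^\./usr$' -e '^\./usr/bin$' -e '^\./var$' -e '^\./var/db$' |
    tar --no-recursion --verbatim-files-from -T - -cf ../app.tar
# The listing holds usr/bin/tool and the var/db/app subtree, each entry once
listing=$(tar -tf ../app.tar)
echo "$listing"
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>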
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>Do not blindly extract an archive if you don't know what it contains! It could be fatal to your system.</p>
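<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Listing an archive first costs nothing: it shows exactly which paths would be written and, with <em>-v</em>, which owner and permissions each entry carries, which would have revealed the unexpected 2002/2000 ownership shown above before anything was extracted. A minimal self-contained sketch (the archive and file names are hypothetical):</p>
<div class="highlight"><pre>
# Self-contained sketch: build a throwaway archive, then inspect it without extracting.
work=$(mktemp -d)
mkdir -p "$work/src/usr/bin"
touch "$work/src/usr/bin/tool"
tar -C "$work/src" -cf "$work/demo.tar" usr/bin/tool
# -t lists instead of extracting; -v adds mode, owner/group, size and date
tar -tvf "$work/demo.tar"
# Names only, e.g. to grep for entries that would land in system directories
names=$(tar -tf "$work/demo.tar")
echo "$names"
rm -rf "$work"
</pre></div>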
<div class="line-block">
<div class="line"><br /></div>
</div>
<!-- Links -->
</div>
</content><category term="os"></category><category term="linux"></category><category term="tools"></category></entry><entry><title>A few facts about POSIX</title><link href="https://vorakl.com/articles/posix/" rel="alternate"></link><published>2024-04-23T10:45:58-07:00</published><updated>2024-04-23T10:45:58-07:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2024-04-23:/articles/posix/</id><summary type="html"><p class="first last">A journey to portable software</p>
</summary><content type="html"><p><a class="reference internal" href="#summary">TLDR: quick summary of the article</a></p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="section" id="how-did-we-get-there">
<h2>How did we get there?</h2>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>In the early days of computing, programmers could only dream of portability. All programs were written directly in machine code for each computer architecture they were intended to run on. <a class="reference external" href="https://en.wikipedia.org/wiki/Assembly_language">Assembly languages</a> with mnemonic names for each CPU instruction and other goodies made programmers' lives a little easier, but programs were still architecture-specific. Operating systems (OS) had not yet been invented, so a program not only controlled the entire computer system, it also had to initialize and manage the peripherals. In fact, such bare-metal programs implemented drivers for every device they used. And every time a program needed to run on hardware with a different architecture, it was literally rewritten to accommodate a difference in the <a class="reference external" href="https://en.wikipedia.org/wiki/Instruction_set_architecture">CPU instruction</a> set, memory layout, and so on.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>This is exactly what happened with Unix, which was originally written in assembly language by Ken Thompson over 50 years ago. The first versions of Unix were written for the <a class="reference external" href="https://en.wikipedia.org/wiki/PDP-7">PDP-7</a> platform, and porting it to the <a class="reference external" href="https://en.wikipedia.org/wiki/PDP-11">PDP-11</a> meant rewriting the code. When Dennis Ritchie created the C programming language, and <a class="reference external" href="https://www.invent.org/sites/default/files/2019-02/Inductee-UNIX_Thompson_Ritchie.jpg">together they</a> rewrote most of the Unix code in it, software portability suddenly became possible. There are two main reasons for this. First, code written in a high-level programming language is platform-agnostic, because compilers translate it into the assembly language of a target architecture. This matters even more for target systems based on <a class="reference external" href="https://en.wikipedia.org/wiki/Reduced_instruction_set_computer">RISC CPUs</a>, as they require significantly more assembly instructions than <a class="reference external" href="https://en.wikipedia.org/wiki/Complex_instruction_set_computer">CISC CPUs</a> for the same task. Even porting Unix to another platform became mostly a matter of adapting the architecture-dependent parts of the code. Second, the operating system itself abstracts away all hardware specifics from user programs. Programmers no longer have to implement multitasking, memory management, or drivers for different devices as they used to, because it's all part of the OS kernel and runs in the kernel address space. In contrast, user programs run in the user address space and access all of the features provided by the OS through the system call interface. 
In <a class="reference external" href="https://en.wikipedia.org/wiki/Real-time_operating_system">Real-time OSes</a>, such as <a class="reference external" href="https://www.zephyrproject.org/">Zephyr OS</a>, it's <a class="reference external" href="https://www.youtube.com/watch?v=4_uL43V79xw">slightly different</a>, but the idea of memory isolation and protection for user programs is preserved. This leads to two conclusions:</p>
<ul class="simple">
<li><em>User programs become portable when they are written in a high-level programming language for a particular OS</em>. Once both requirements are met, programs are compiled into instructions for a target CPU and linked with system functions provided by the <a class="reference external" href="https://en.wikipedia.org/wiki/C_standard_library">libc</a> and OS-specific libraries to access the underlying hardware.</li>
<li>Portability is intended to be achieved <strong>at the source code level</strong>.</li>
</ul>
<div class="line-block">
<div class="line"><br /></div>
</div>
</div>
<div class="section" id="the-birth-of-posix">
<h2>The birth of POSIX</h2>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>This could have been the end of the story, but something fateful happened. Due to a legal restriction, AT&amp;T was not allowed to sell Unix, so there was no money to be made from the newly born OS, which became increasingly popular after it was introduced to the world. However, it turned out to be possible to distribute Unix to any interested organization for the cost of the media. That's how Unix got to Berkeley in 1974 and many other places, leading to the creation of a number of OS derivatives. Some of the best known, and still popular today, are OSes based on the software distributed by Berkeley (BSD), e.g. FreeBSD and OpenBSD. Despite sharing the same ancestors and principles, each operating system followed its own path, with its own interface (API), its own implementation of kernel subsystems and syscalls, different system tools, etc. Even libc, which provides common functionality and wrappers on top of syscalls, used to be very OS-specific. All of these OSes were Unix-like, but at the same time, it wasn't possible to take the source code of a program written for one OS and recompile it on another.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Over 35 years ago, these software portability problems led to the emergence of the first <a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/nframe.html">POSIX standard</a> in 1988. The acronym <a class="reference external" href="https://opensource.com/article/19/7/what-posix-richard-stallman-explains">was coined by Richard Stallman</a>, who added &quot;X&quot; to the end of <em>Portable Operating System Interface</em>. The <em>POSIX™</em> trademark is currently owned by <a class="reference external" href="https://www.ieee.org/about/index.html">IEEE</a>, and <em>UNIX®</em> is a registered trademark of <a class="reference external" href="https://www.opengroup.org/about-us">The Open Group</a>. POSIX is meant to provide a <a class="reference external" href="https://www.techtarget.com/whatis/definition/POSIX-Portable-Operating-System-Interface">specification of the interface</a> that different Unix operating systems should have in common, including <a class="reference external" href="https://stackoverflow.com/a/31865755">programming languages and tools</a>. It's important to note that <strong>the interface is portable</strong>, not the implementation.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>This was the common ground that made it possible to compile the same source code of a user program on any OS without modification, as long as both sides strictly followed the same standard. This is still true today, but only to some extent: most modern and widely used Unix-like systems, such as Linux and <cite>*BSD</cite>, do not strictly and completely follow the POSIX standard, but rather use it as a guide. In addition to POSIX, there is also the <a class="reference external" href="https://en.wikipedia.org/wiki/Single_UNIX_Specification">Single UNIX Specification</a> (SUS), which was consolidated with several POSIX standards in 2001. Today, the latest SUS (SUSv4, 2018 edition) extends the latest POSIX standard (POSIX.1-2017), which is essentially its base specification, with the X/Open Curses specification. There are <a class="reference external" href="https://en.wikipedia.org/wiki/POSIX#POSIX-oriented_operating_systems">a number of operating systems, such as macOS</a>, which are fully compliant with the POSIX and SUS standards, pass The Open Group conformance tests, and can therefore be called <a class="reference external" href="https://www.opengroup.org/openbrand/register/">Unix operating systems</a>, not just Unix-like. Originally, POSIX was created only for Unix-like OSes, but over time it became so popular that its specification, in the form of an <a class="reference external" href="https://en.wikipedia.org/wiki/Operating_system_abstraction_layer">Operating System Abstraction Layer (OSAL)</a>, was partially implemented (the subset of the interface applicable to the target system) in non-Unix OSes, such as <a class="reference external" href="https://en.wikipedia.org/wiki/Cygwin">Windows</a> (through Cygwin), <a class="reference external" href="https://www.freertos.org/FreeRTOS-Plus/FreeRTOS_Plus_POSIX/index.html">FreeRTOS</a>, <a class="reference external" href="https://docs.zephyrproject.org/latest/services/portability/posix/index.html">Zephyr</a>, etc.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
</div>
<div class="section" id="the-posix-spec">
<h2>The POSIX spec</h2>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>The very first standard was ratified by the IEEE in 1988 as IEEE Std 1003.1-1988, hence the name <em>POSIX.1-1988</em>. Since then, the standard has gone through several revisions, with different subsets of the specification being ratified under different names. For example, <em>POSIX.1-1990</em> (IEEE Std 1003.1-1990) defined <em>the system interface and computing environment</em>, <em>POSIX.2</em> (IEEE Std 1003.2-1992) defined <em>the command language (shell) and tools</em>, etc. A very good and brief overview of the standard's revisions can be found in the <a class="reference external" href="https://man7.org/linux/man-pages/man7/standards.7.html">standards(7)</a> Linux man page. You may even come across references to old revisions such as POSIX.2 when reading the <a class="reference external" href="https://git.savannah.gnu.org/cgit/bash.git/tree/jobs.c#n4269">Bash source code</a>. In 2001, POSIX.1, POSIX.2, and the Single UNIX Specification (SUS) were merged into a single document called <em>POSIX.1-2001</em>. Despite the somewhat misleading name, it does include the shell and tools specifications from POSIX.2. <strong>The latest version of the standard is POSIX.1-2017</strong>, also known as <a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/nframe.html">IEEE Std 1003.1-2017</a>, which is almost identical to POSIX.1-2008.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>The standard describes a specification that spans two environments (build time and run time) and is organized into four volumes:</p>
<ul class="simple">
<li><a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/toc.html">Base Definitions</a>: defines general terms and concepts common to all volumes, conformance requirements (symbolic constants, options, option groups), the computing environment (locales, regular expressions, directory structure, terminal interface, environment variables, etc.), and the C-language header files that conforming systems must implement.</li>
<li><a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/idx/xsh.html">System Interfaces</a>: builds on the C language standard (<a class="reference external" href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf">ISO C99, ISO/IEC 9899:1999</a>) and defines system service functions and the extensions to the C standard library (libc) in terms of header files and functions.</li>
<li><a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/idx/xcu.html">Shell &amp; Utilities</a>: defines a source code-level interface to the Shell Command Language (sh) and the system utilities (awk, sed, wc, cat, ...), including behavior, command line parameters, exit statuses, etc.</li>
<li><a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/idx/xrat.html">Rationale</a>: includes considerations for portability, subprofiling, and option groups, as well as additional rationale that doesn't fit in the other volumes.</li>
</ul>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>The current POSIX standard defines source code-level compatibility for <a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap02.html#tag_02_04">only two programming languages</a>: <em>the C language (C99)</em> and <em>the shell command language</em>. However, some of the programs defined under &quot;Utilities&quot;, such as <a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html">awk</a>, also have their own language. Strictly speaking, a C standard library (libc) doesn't have to implement any functionality (functions and headers) beyond what the C standard (ISO C99 in this case) defines, but most implementations do. For example, the ISO C99 standard defines 24 header files, including math functions (&lt;math.h&gt;), standard input/output (&lt;stdio.h&gt;), date and time (&lt;time.h&gt;), signal management (&lt;signal.h&gt;), string operations (&lt;string.h&gt;), and so on. The latest POSIX standard, however, defines 82 header files and, while being fully compliant with ISO C99, extends it with POSIX threads (&lt;pthread.h&gt;), semaphores (&lt;semaphore.h&gt;), and many others. Modern libc implementations, e.g. <a class="reference external" href="https://musl.libc.org/about.html">musl libc</a>, are also very OS-specific, providing library functions to access operating system services (wrappers for system calls). Sometimes, this overlap with the POSIX specification makes it difficult to implement the POSIX abstraction layer in non-Unix operating systems that also use portable standalone libc implementations with their own POSIX support, e.g. <a class="reference external" href="https://keithp.com/picolibc/">picolibc</a> together with <a class="reference external" href="https://docs.zephyrproject.org/latest/services/portability/posix/implementation/index.html">Zephyr's POSIX library</a>.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
</div>
<div class="section" id="options-and-option-groups">
<h2>Options and Option Groups</h2>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>While POSIX standardizes the system interface (C language headers and functions), shell, and utilities, a system doesn't have to implement the entire specification to be <a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap02.html#tag_02_01_03">POSIX conformant</a>. Some features in &quot;POSIX System Interfaces&quot;, &quot;POSIX Shell and Utilities&quot;, and &quot;XSI System Interfaces&quot; are optional. The <a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/unistd.h.html">&lt;unistd.h&gt; header file</a> contains definitions of the <em>standard symbolic constants</em> for <a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap02.html#tag_02_01_06">Options</a>, each of which reflects a particular feature, and <a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap02.html#tag_02_01_05">Option Groups</a>, which define a set of related functions or options. Unlike option names, names of option groups typically do not begin with an underscore. POSIX-conformant systems must implement a set of mandatory options and may support additional optional ones. The symbolic constants for mandatory options must have specific values, e.g. <em>200809L</em>, while the constants for other options may be:</p>
<ul class="simple">
<li><em>undefined or defined as -1</em>, which means that the option is not supported for compilation</li>
<li><em>0</em>, which means the option is supported for compilation but might or might not be supported at runtime</li>
<li><em>a value greater than zero</em>, which means the option is always supported</li>
</ul>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>These symbolic constants are used by user applications to check the availability of a particular feature. At the C source-code level, constants may be checked either at build time (in #if preprocessing directives) or at runtime, by calling one of the <em>sysconf()</em>, <em>pathconf()</em>, <em>fpathconf()</em>, or <em>confstr()</em> functions. In shell scripts, the <a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/getconf.html">getconf</a> utility can be used for runtime checks. A very good collection of the POSIX options, their corresponding names for use as sysconf(3) parameters, and the lists of header files and functions that these options represent can be found in the <a class="reference external" href="https://man7.org/linux/man-pages/man7/posixoptions.7.html">posixoptions(7)</a> Linux man page.</p>
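<div class="line-block">
<div class="line"><br /></div>
</div>
<p>As an illustration of the runtime side, the following sketch queries a few values with <em>getconf</em> and interprets them according to the rules listed above. The helper <em>classify_option</em> is hypothetical, and the set of option names a particular <em>getconf</em> accepts may vary between implementations, so an empty answer is reported separately:</p>
<div class="highlight"><pre>
# Sketch: runtime feature checks from a shell script via getconf(1).
# classify_option is a hypothetical helper: $1 = option name, $2 = its value.
classify_option() {
    case $2 in
        '')           echo "$1: not recognized by this getconf" ;;
        undefined|-1) echo "$1: not supported" ;;
        0)            echo "$1: available for compilation, support varies at runtime" ;;
        *)            echo "$1: always supported ($2)" ;;
    esac
}
# getconf prints "undefined" for a valid variable that is not defined
posix_version=$(getconf _POSIX_VERSION)
echo "POSIX.1 version reported at runtime: $posix_version"
for opt in _POSIX_THREADS _POSIX_SEMAPHORES; do
    classify_option "$opt" "$(getconf "$opt")"
done
</pre></div>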
<div class="line-block">
<div class="line"><br /></div>
</div>
<p><a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_subprofiles.html">Subprofiling Option Groups</a> are intended for systems where implementing the full POSIX specification is not reasonable. For example, real-time embedded systems are typically resource-constrained, often have neither a shell nor a user interface, and their OS kernels are often designed to run as a single process (with multiple threads). Such systems may only implement the subsets of related functions defined by option groups.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
</div>
<div class="section" id="summary">
<h2>Summary</h2>
<ul class="simple">
<li>The development of high-level programming languages like C, along with operating systems that abstract away hardware details, enabled software portability at the source code level.</li>
<li>The POSIX standard emerged in 1988 to provide a portable interface specification for Unix-like operating systems, allowing programs to be compiled across different platforms.</li>
<li>The POSIX standard has evolved over time, with the latest version being POSIX.1-2017 (IEEE Std 1003.1-2017).</li>
<li>Modern Unix-like systems like Linux and <cite>*BSD</cite> do not strictly follow the POSIX standard, but rather use it as a guide.</li>
<li>POSIX standardizes a C API (header files and functions), the shell, and utilities.</li>
<li>POSIX-compliant systems are expected to implement mandatory options and may support additional optional features.</li>
<li>Applications can check for POSIX feature availability at both compile-time and runtime using symbolic constants and system functions.</li>
<li>For resource-constrained systems like real-time embedded platforms, POSIX allows for the implementation of subsets of the full specification through &quot;subprofile&quot; option groups.</li>
</ul>
<div class="line-block">
<div class="line"><br /></div>
</div>
<!-- Links -->
</div>
</content><category term="it"></category><category term="os"></category><category term="programming"></category></entry><entry><title>How to sort arrays natively in Bash</title><link href="https://vorakl.com/articles/bash-sort/" rel="alternate"></link><published>2024-02-20T18:37:45-08:00</published><updated>2024-02-20T18:37:45-08:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2024-02-20:/articles/bash-sort/</id><summary type="html"><p class="first last">Sorting arrays in pure Bash with the asort built-in command</p>
</summary><content type="html"><p><a class="reference internal" href="#summary">TLDR: quick summary of the article</a></p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>What would you do if, while implementing some solution in Bash, you suddenly needed to have an array in sorted order? You might think of the <em>sort</em> tool from the <em>coreutils</em> package. Or you might even decide that it's time to switch to Python or some other language. But it turns out that Bash supports sorting arrays natively! All you need is the <strong>asort</strong> built-in command. However, it is not loaded by default, and on many modern Linux distributions it is not even packaged. In this article I'll show you how to build and install Bash with all loadable modules from source, load them, and start writing faster, more advanced Bash scripts with less use of external commands.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>First of all, check your Bash version. Version 5.2-release is the target of this article:</p>
<div class="highlight"><pre><span></span><span class="nb">echo</span> <span class="si">${</span><span class="nv">BASH_VERSION</span><span class="si">}</span>
</pre></div>
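<div class="line-block">
<div class="line"><br /></div>
</div>
<p>If a script depends on this release, it can also check the version itself. A minimal sketch in portable shell syntax; <em>version_at_least</em> is a hypothetical helper that compares only the major and minor numbers of a version string such as <em>5.2.26(1)-release</em>:</p>
<div class="highlight"><pre>
# Sketch: compare "major.minor" of a Bash-style version string.
# version_at_least is hypothetical: $1 = version, $2 = major, $3 = minor.
version_at_least() {
    major=${1%%.*}          # "5.2.26(1)-release" gives "5"
    rest=${1#*.}            # gives "2.26(1)-release"
    minor=${rest%%.*}       # gives "2"
    if [ "$major" -ne "$2" ]; then
        [ "$major" -gt "$2" ]
    else
        [ "$minor" -ge "$3" ]
    fi
}
# BASH_VERSION is empty outside Bash, so fall back to "0.0"
if version_at_least "${BASH_VERSION:-0.0}" 5 2; then
    echo "Bash ${BASH_VERSION} is recent enough"
else
    echo "Bash 5.2 or newer is required (found: ${BASH_VERSION:-none})"
fi
</pre></div>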
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>The built-in loadable modules are loaded with the <strong>enable</strong> command. Bash expects to find loadable modules in one of the paths specified in the <strong>BASH_LOADABLES_PATH</strong> environment variable, which is a colon-separated list of directories. Setting this variable and enabling all the necessary commands can be done, for example, in <em>.bashrc</em>. If you are currently running a pre-installed Bash, check whether the <em>asort</em> command is already loaded and, if not, whether it can be loaded at all (on many systems it can't, because the module is missing):</p>
<div class="highlight"><pre><span></span><span class="nb">enable</span> -p <span class="p">|</span> grep asort <span class="o">||</span> <span class="o">{</span> <span class="nb">enable</span> -f asort asort <span class="o">&amp;&amp;</span> <span class="nb">enable</span> -p <span class="p">|</span> grep asort<span class="p">;</span> <span class="o">}</span>
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>If you see &quot;<em>enable asort</em>&quot; on the screen then the <em>asort</em> builtin is loaded and you can start using it, for example, by checking its help message:</p>
<div class="highlight"><pre><span></span>asort --help
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Otherwise, let's build it from source. First of all, clone the project's official git repository and enter its directory:</p>
<div class="highlight"><pre><span></span>git clone https://git.savannah.gnu.org/git/bash.git <span class="o">&amp;&amp;</span> <span class="nb">cd</span> bash
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>The following procedure is pretty standard for any software written in C: you <em>configure</em> the build tools for the specific system, then you build the software, and then you install it on the system. During the configuration step you can, for example, change the default installation path prefix (/usr/local). Here I set it explicitly to the same directory as the default. The loadable built-in commands can only be built after the main tool set is built:</p>
<div class="highlight"><pre><span></span>./configure --prefix<span class="o">=</span>/usr/local
make
make -C examples/loadables all others
sudo make install
sudo make -C examples/loadables install
sudo cp -v examples/loadables/<span class="o">{</span>necho,hello,cat,pushd,asort<span class="o">}</span> /usr/local/lib/bash/
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Loadable built-in commands are installed in <em>/usr/local/lib/bash/</em> and Bash itself in <em>/usr/local/bin/</em>. The extra copy step is needed because the <em>asort</em> command is one of the extra commands and, as of this writing (Bash version 5.2.26), the Makefile doesn't support installing it. If all commands completed without errors, you'll find the loadable commands in the <em>/usr/local/lib/bash/</em> directory. They are <em>shared objects</em> that can be inspected in the usual way:</p>
<div class="highlight"><pre><span></span><span class="nb">cd</span> /usr/local/lib/bash
ldd asort
file asort
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>To load built-in commands from these files, you need to know the name of the structure defined for each of them in the source code. Some files contain only one command and therefore one such structure, while others contain two commands and two structures. You can find these names by checking the symbol table for the pattern <em>&lt;name&gt;_struct</em>:</p>
<div class="highlight"><pre><span></span>$ objdump -t asort <span class="p">|</span> grep _struct
00000000000040c0 g O .data <span class="m">0000000000000030</span> asort_struct
$ objdump -t truefalse <span class="p">|</span> grep _struct
<span class="m">0000000000004020</span> g O .data <span class="m">0000000000000030</span> false_struct
<span class="m">0000000000004060</span> g O .data <span class="m">0000000000000030</span> true_struct
</pre></div>
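<div class="line-block">
<div class="line"><br /></div>
</div>
<p>With many files in the directory, the same lookup can be done in a loop. A minimal sketch, assuming <em>objdump</em> from GNU binutils is installed and the modules live in <em>/usr/local/lib/bash</em>; <em>struct_names</em> is a hypothetical helper that takes the last field of each matching symbol line and strips the <em>_struct</em> suffix, leaving the names to pass to <em>enable -f</em>. Files that are not shared objects only make <em>objdump</em> print an error and yield no names:</p>
<div class="highlight"><pre>
# Sketch: print the builtin names each loadable module provides.
# struct_names is hypothetical; it reads 'objdump -t' output on stdin.
struct_names() {
    # Keep symbols whose last field ends in _struct, then strip the suffix
    awk '$NF ~ /_struct$/ { name = $NF; sub(/_struct$/, "", name); print name }'
}
dir=/usr/local/lib/bash
for f in "$dir"/*; do
    [ -f "$f" ] || continue
    names=$(objdump -t "$f" | struct_names)
    # Unquoted $names on purpose: several names are printed on one line
    if [ -n "$names" ]; then echo "${f##*/}:" $names; fi
done
</pre></div>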
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Make sure the <em>BASH_LOADABLES_PATH</em> environment variable is set and contains <em>/usr/local/lib/bash</em>, the directory where we installed the built-in commands. Now, everything is ready for testing. Let's start the newly built Bash and load <em>asort</em> and, as an example, a few other useful commands, using the names we found in the symbol table:</p>
<div class="highlight"><pre><span></span>/usr/local/bin/bash
<span class="nb">echo</span> <span class="si">${</span><span class="nv">BASH_VERSION</span><span class="si">}</span>
<span class="nb">echo</span> <span class="si">${</span><span class="nv">BASH_LOADABLES_PATH</span><span class="si">}</span>
<span class="nb">enable</span> -f asort asort
<span class="nb">enable</span> -f truefalse <span class="nb">true</span>
<span class="nb">enable</span> -f truefalse <span class="nb">false</span>
<span class="nb">enable</span> -f dsv dsv
dsv --help
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Finally, we can perform a reverse numerical sort in place, using only the built-in command:</p>
<div class="highlight"><pre><span></span>$ <span class="nb">declare</span> -a <span class="nv">arr</span><span class="o">=(</span><span class="m">3</span> <span class="m">1</span> <span class="m">15</span> <span class="m">6</span> <span class="m">4</span> <span class="m">5</span> <span class="m">3</span><span class="o">)</span>
$ <span class="nb">echo</span> <span class="si">${</span><span class="nv">arr</span><span class="p">[*]</span><span class="si">}</span>
<span class="m">3</span> <span class="m">1</span> <span class="m">15</span> <span class="m">6</span> <span class="m">4</span> <span class="m">5</span> <span class="m">3</span>
$ asort -nr arr
$ <span class="nb">echo</span> <span class="si">${</span><span class="nv">arr</span><span class="p">[*]</span><span class="si">}</span>
<span class="m">15</span> <span class="m">6</span> <span class="m">5</span> <span class="m">4</span> <span class="m">3</span> <span class="m">3</span> <span class="m">1</span>
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Having commands loaded as shared objects allows Bash to call them directly. This avoids creating new processes just to run external tools with the same functionality. Let's do a quick experiment with <em>mkdir</em>: first run as an external tool, then loaded into Bash:</p>
<div class="highlight"><pre><span></span>$ strace -e execve /usr/local/bin/bash -c <span class="s1">&#39;mkdir /tmp/mydir&#39;</span>
execve<span class="o">(</span><span class="s2">&quot;/usr/local/bin/bash&quot;</span>, <span class="o">[</span><span class="s2">&quot;/usr/local/bin/bash&quot;</span>, <span class="s2">&quot;-c&quot;</span>, <span class="s2">&quot;mkdir /tmp/mydir&quot;</span><span class="o">]</span>, 0x7ffd7723d6f0 /* <span class="m">68</span> vars */<span class="o">)</span> <span class="o">=</span> <span class="m">0</span>
execve<span class="o">(</span><span class="s2">&quot;/usr/bin/mkdir&quot;</span>, <span class="o">[</span><span class="s2">&quot;mkdir&quot;</span>, <span class="s2">&quot;/tmp/mydir&quot;</span><span class="o">]</span>, 0x1e2c010 /* <span class="m">67</span> vars */<span class="o">)</span> <span class="o">=</span> <span class="m">0</span>
</pre></div>
<div class="highlight"><pre><span></span>$ strace -e execve /usr/local/bin/bash -c <span class="s1">&#39;enable -f mkdir mkdir; mkdir /tmp/mydir2&#39;</span>
execve<span class="o">(</span><span class="s2">&quot;/usr/local/bin/bash&quot;</span>, <span class="o">[</span><span class="s2">&quot;/usr/local/bin/bash&quot;</span>, <span class="s2">&quot;-c&quot;</span>, <span class="s2">&quot;enable -f mkdir mkdir; mkdir /tm&quot;</span>...<span class="o">]</span>, 0x7ffd37695000 /* <span class="m">68</span> vars */<span class="o">)</span> <span class="o">=</span> <span class="m">0</span>
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>You can see that two executables are invoked when <em>mkdir</em> is called as an external tool: Bash itself and <em>/usr/bin/mkdir</em>. When <em>mkdir</em> is enabled as a built-in command, however, no external program is executed, because Bash calls the function directly. Besides being faster, the <em>asort</em> command has another big advantage over an external <em>sort</em> tool. Because <em>asort</em> operates directly on the array in memory, you don't have to worry about which characters the array elements contain; they are simply sorted in place. Elements can contain newlines <cite>(0x0a or \n)</cite> or characters that are special to Bash, such as <cite>*</cite> or <cite>?</cite>:</p>
<div class="highlight"><pre><span></span>$ <span class="nb">declare</span> -a <span class="nv">arr</span><span class="o">=(</span><span class="s1">&#39;**&#39;</span> <span class="s1">$&#39;abc\nxyz&#39;</span> <span class="s1">$&#39;abc\nefg&#39;</span> <span class="s1">&#39;*&#39;</span><span class="o">)</span>
$ <span class="nb">declare</span> -p arr
<span class="nb">declare</span> -a <span class="nv">arr</span><span class="o">=([</span><span class="m">0</span><span class="o">]=</span><span class="s2">&quot;**&quot;</span> <span class="o">[</span><span class="m">1</span><span class="o">]=</span><span class="s1">$&#39;abc\nxyz&#39;</span> <span class="o">[</span><span class="m">2</span><span class="o">]=</span><span class="s1">$&#39;abc\nefg&#39;</span> <span class="o">[</span><span class="m">3</span><span class="o">]=</span><span class="s2">&quot;*&quot;</span><span class="o">)</span>
$ <span class="nb">echo</span> <span class="s2">&quot;</span><span class="si">${</span><span class="nv">arr</span><span class="p">[1]</span><span class="si">}</span><span class="s2">&quot;</span>
abc
xyz
$ asort arr
$ <span class="nb">declare</span> -p arr
<span class="nb">declare</span> -a <span class="nv">arr</span><span class="o">=([</span><span class="m">0</span><span class="o">]=</span><span class="s2">&quot;*&quot;</span> <span class="o">[</span><span class="m">1</span><span class="o">]=</span><span class="s2">&quot;**&quot;</span> <span class="o">[</span><span class="m">2</span><span class="o">]=</span><span class="s1">$&#39;abc\nefg&#39;</span> <span class="o">[</span><span class="m">3</span><span class="o">]=</span><span class="s1">$&#39;abc\nxyz&#39;</span><span class="o">)</span>
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>It's also worth checking out other loadable commands such as <em>id</em>, <em>ln</em>, <em>mkfifo</em>, <em>cut</em>, <em>cat</em>, <em>stat</em>, <em>tee</em>, <em>uname</em>, and others (see the loadable modules directory). These are fairly common tools in Bash scripting. All of them can be loaded into Bash itself. This can noticeably improve the performance of scripts that call them often, because no external process has to be started for each call.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="section" id="summary">
<h2>Summary</h2>
<ul class="simple">
<li>Bash can sort arrays natively using the loadable built-in <strong>asort</strong> command.</li>
<li>The asort and other loadable commands are not enabled by default and may need to be compiled from source.</li>
<li>To build Bash and loadable commands from source, you clone the git repository, configure, make, and install it on your system.</li>
<li>The <em>enable -f</em> command loads a builtin from a shared object by name; the name matches the <em>&lt;name&gt;_struct</em> symbol found in the symbol table.</li>
<li>Common loadable commands include <em>asort</em>, <em>true</em> and <em>false</em> (from <em>truefalse</em>), <em>dsv</em>, <em>id</em>, <em>ln</em>, <em>mkdir</em>, <em>uname</em>, and many others.</li>
<li>Loading builtins avoids running external commands, improving performance.</li>
<li>Loadable builtins are shared objects that can be analyzed with <em>ldd</em>, <em>file</em>, and <em>objdump</em>.</li>
<li>Loadable commands are installed in <em>/usr/local/lib/bash</em> and need <em>BASH_LOADABLES_PATH</em> set to load.</li>
</ul>
<!-- Links -->
</div>
</content><category term="bash"></category><category term="programming"></category></entry><entry><title>Availability calculation in "nines" notation</title><link href="https://vorakl.com/articles/availability/" rel="alternate"></link><published>2024-02-18T20:50:49-08:00</published><updated>2024-02-18T20:50:49-08:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2024-02-18:/articles/availability/</id><summary type="html"><p class="first last">Estimating one of SRE's most common SLOs</p>
</summary><content type="html"><p><a class="reference internal" href="#summary">TLDR: quick summary of the article</a></p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>The rapidly growing interest in clouds, distributed systems, microservice architecture, and service-oriented applications has led to the emergence of a new branch of computer systems engineering: <em>Site Reliability Engineering</em> (SRE). One of the primary goals of SRE is to ensure that a service meets certain requirements for production readiness. Services are generally considered production-ready when they can be trusted and relied upon. A service provider and its customers, who usually pay for the service, document a common understanding of this trust in a <em>Service Level Agreement</em> (SLA). The SLA states all expectations in the form of <em>Service Level Objectives</em> (SLO), together with the penalties if these expectations are not met. SLOs are <strong>performance</strong> and <strong>availability</strong> goals for a production service, defined on an annual time scale. They cover the system characteristics that are most valuable to customers and that the provider is willing to commit to keeping within the defined expectations. SLOs are carefully quantified using <em>Service Level Indicators</em> (SLI). SLIs are chosen specifically for SLOs as measurable forms of some system property, such as a metric or a value derived from logs. SLIs are typically sampled over much shorter periods of time, from tens of seconds to a few minutes. The samples are then aggregated, for example averaged, to obtain a single value.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p><em>Site Reliability Engineers</em>, in turn, are responsible for ensuring that production services meet all target SLOs defined in the SLA. They do this by focusing on reliability through a set of practices that are more or less standardized across the industry. Some of the most common practices include:</p>
<ul class="simple">
<li>Continuous monitoring of availability and performance characteristics;</li>
<li>Troubleshooting failures and eliminating degradation issues;</li>
<li>Improving overall stability and scalability through automation to keep all key metrics within expected ranges;</li>
<li>Preparing for disaster recovery through continuous stress testing using the error budget, an agreed-upon amount of time during which a service can be degraded or unavailable.</li>
</ul>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Performance SLOs are important goals, but they only matter if a service is available. Availability is so important that it's sometimes <em>mistakenly</em> considered the only SLA component. Finding the right SLI to measure availability can be challenging. It is service-specific and depends on a variety of factors, such as the underlying infrastructure and architecture. In SLO form, availability is expressed as a percentage in what is called &quot;nines&quot; notation. For example, among cloud services the most common availability SLO is 99.9%, which is called &quot;3-nines&quot;. However, you are unlikely to find one higher than 99.999%, or &quot;5-nines&quot;. The actual availability of a service, in percent, is calculated as the ratio of the time the service was available to the total time (uptime plus downtime) over the past year.</p>
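<div class="line-block">
<div class="line"><br /></div>
</div>
<p>The ratio above can be illustrated with a minimal Python sketch. It assumes a 365-day year and a hypothetical one-hour outage. In practice, uptime and downtime would come from the SLIs you actually collect, and the function name <em>availability_percent</em> is only illustrative:</p>
<div class="highlight"><pre><span></span># Sketch: measured availability as a percentage of the total time.
# The one-hour outage below is a hypothetical input, not real data.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31536000, a 365-day year

def availability_percent(uptime_seconds, downtime_seconds):
    """Return availability in percent, where total time = uptime + downtime."""
    total_seconds = uptime_seconds + downtime_seconds
    return 100 * uptime_seconds / total_seconds

downtime = 3600  # one hour of outage during the year
uptime = SECONDS_PER_YEAR - downtime
print(f"{availability_percent(uptime, downtime):.4f}%")
</pre></div>
<p>With one hour of downtime, the result is about 99.9886%. That is enough to meet a 99.9% SLO, but not a 99.99% one.</p>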
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Interestingly, people who use the nines notation are implicitly also stating how long a service is allowed to be down. This downtime, explicitly permitted by the SLA, forms what is called the <em>error budget</em>. Targeting 100% availability is hardly feasible. Beyond that, it turns out that from a practical point of view it is more beneficial to commit to a lower availability, even when it is technically possible to provide more &quot;nines&quot;. Beyond a certain level, the extra availability will not be noticed by the majority of customers, so it's probably not worth the effort. Meanwhile, having some error budget opens the door to experimentation and less stressful deployments of new product features.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>It is also useful to know how to estimate potential downtime, the amount of time your system may be out of service. To do this, remember that the availability SLO is defined over a one-year period. Therefore, <em>60s by 60m by 24h by 365d</em> gives us <em>31536000</em> seconds of total time. If the availability is &quot;five-nines&quot; (99.999%), the allowed downtime is 0.001%, or <cite>31536000 * 0.001% =&gt; 31536000 * 0.00001 = 315.36</cite> sec. That is <em>5.256</em> minutes per year during which the service can be down. A similar calculation for &quot;three-nines&quot; (99.9%) availability shows that the service can be down for <cite>31536000 * 0.001 = 31536</cite> seconds, or 525.6 minutes, or <em>8.76</em> hours per year.</p>
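<div class="line-block">
<div class="line"><br /></div>
</div>
<p>The same arithmetic can be written as a small Python sketch that works for any availability target. Like the calculation above, it assumes a 365-day year and ignores leap years. The function name <em>allowed_downtime_seconds</em> is just illustrative:</p>
<div class="highlight"><pre><span></span># Sketch: allowed annual downtime for an availability SLO.
# Assumes a 365-day year, as in the text (leap years are ignored).
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31536000

def allowed_downtime_seconds(availability_percent):
    """Seconds per year a service may be down under the given SLO."""
    unavailable_fraction = (100 - availability_percent) / 100
    return SECONDS_PER_YEAR * unavailable_fraction

for slo in (99.9, 99.99, 99.999):
    seconds = allowed_downtime_seconds(slo)
    print(f"{slo}%: {seconds:.2f} s = {seconds / 60:.3f} min = {seconds / 3600:.2f} h")
</pre></div>
<p>The output is rounded because values such as <em>100 - 99.9</em> are not exactly representable as binary floating-point numbers. After rounding, the results match the numbers above: 8.76 hours for three nines and 5.256 minutes for five nines.</p>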
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="section" id="summary">
<h2>Summary</h2>
<ul class="simple">
<li><em>Site Reliability Engineering</em> (SRE) focuses on ensuring production services meet requirements for production readiness and can be trusted and relied upon.</li>
<li>A <em>Service Level Agreement</em> (SLA) contains expectations in the form of <em>Service Level Objectives</em> (SLOs) and penalties if not met.</li>
<li>SLOs define annual <em>performance</em> and <em>availability</em> goals for production services.</li>
<li><em>Service Level Indicators</em> (SLIs) are metrics chosen to measure SLOs, sampled over short periods like seconds to minutes.</li>
<li>SREs ensure services meet SLOs through standardized practices like monitoring, emergency response, and capacity planning.</li>
<li>Availability is the most important SLA component and is expressed as a percentage, or &quot;nines&quot;, which implies the amount of annual downtime allowed.</li>
<li>The 99.9% availability SLO allows 8.76 hours of annual downtime while 99.999% allows 5.256 minutes.</li>
<li>Allowing some downtime forms an &quot;<em>error budget</em>&quot; even if 100% uptime is technically possible.</li>
<li>Higher availability beyond a certain level may not be noticeable to most customers.</li>
<li>Calculating allowed downtime involves determining the total seconds in a year and applying the percentage downtime allowed.</li>
</ul>
<!-- Links -->
</div>
</content><category term="sre"></category></entry><entry><title>A little mess with function parameters in Python</title><link href="https://vorakl.com/articles/py-params/" rel="alternate"></link><published>2024-02-17T11:03:29-08:00</published><updated>2024-02-17T11:03:29-08:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2024-02-17:/articles/py-params/</id><summary type="html"><p class="first last">A variety of ways to define function parameters</p>
</summary><content type="html"><p><a class="reference internal" href="#summary">TLDR: quick summary of the article</a></p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>At first glance, Python functions look like those in most other languages, and they behave just as you'd expect. They take arguments, have default values, and can also return a value. This is intentional, of course. But once you dive deeper, you'll see how many specific nuances are hidden internally, providing a programmer with a number of features that make using functions in Python a much more powerful experience. Knowing the differences is critical to understanding why they behave the way they do, so you can get the most out of them.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>One of the key features is that functions in Python are objects, created as soon as they are defined. This allows you to use functions as arguments to other functions or as return values, just like any other Python object. A function's lifetime is different from its execution time: the function object still exists after a call has finished. Functions, being objects, also have a set of predefined attributes that can be extended at any time, and their state is maintained between calls. Parameters, on the other hand, become local variables. These exist only while the function is executing and are completely different entities from function attributes. Default values in the function definition can also be expressions, but they are evaluated only once. Function arguments are always passed by value, but the values they contain are references, which is why this model is sometimes called pass-by-object-reference. This also means that parameters, like any other variable in Python, are untyped and contain a copy of a reference to an object. Assigning to a parameter (a local variable) doesn't change the object passed as an argument; it only makes the parameter refer to another object. However, there is still a way to change an object passed as an argument, if it is mutable: modify the object itself rather than rebinding the variable. For example, you can update elements of a list or a dictionary.</p>
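<div class="line-block">
<div class="line"><br /></div>
</div>
<p>A short sketch of these points follows, with hypothetical function names chosen just for illustration. It shows three things. Rebinding a parameter does not affect the caller. Mutating the object the parameter refers to does. Attributes attached to a function object persist between calls:</p>
<div class="highlight"><pre><span></span>def rebind(items):
    # Rebinding the parameter only changes what the local name refers to.
    items = [0]
    return items

def mutate(items):
    # Mutating the object the parameter refers to is visible to the caller.
    items.append(4)

data = [1, 2, 3]
rebind(data)
print(data)  # [1, 2, 3]
mutate(data)
print(data)  # [1, 2, 3, 4]

# Functions are objects: attributes can be attached and persist between calls.
def counter():
    counter.calls += 1
    return counter.calls

counter.calls = 0
counter()
counter()
print(counter.calls)  # 2
</pre></div>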
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>This tutorial focuses only on parameters, their different types, and various ways to define them. Let's start with the most common case: a function definition with 4 parameters (a, b, c, d). They have no types, just names, and they live only while the function executes, i.e. they are created on the stack as local variables only during the call. When the function is called, it receives 4 arguments (w, x, y, z). These are also local variables living on the stack, but in the calling environment, and they contain references to some objects. Python takes the references stored in the arguments (w, x, y, z) and copies them into the parameters (a, b, c, d), which live as local variables on the stack in the called environment:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">myfunc</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">caller</span><span class="p">():</span>
    <span class="n">w</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">z</span> <span class="o">=</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">30</span><span class="p">,</span> <span class="mi">40</span>
    <span class="n">myfunc</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">z</span><span class="p">)</span> <span class="c1"># 10 20 30 40</span>
<span class="n">caller</span><span class="p">()</span>
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>When you call <em>myfunc()</em> this way, references to objects stored in arguments are copied as values to parameters according to their position, e.g. the value of <em>w</em> is copied to <em>a</em>, the value of <em>x</em> is copied to <em>b</em>, and so on. This is why such parameters are also called <strong>positional parameters</strong>: their position defines the value they get. However, you can assign values to parameters in any order by using <strong>keyword arguments</strong>, i.e. parameter_name=argument:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">myfunc</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">caller</span><span class="p">():</span>
    <span class="n">w</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">z</span> <span class="o">=</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">30</span><span class="p">,</span> <span class="mi">40</span>
    <span class="n">myfunc</span><span class="p">(</span><span class="n">a</span><span class="o">=</span><span class="n">z</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="n">y</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="n">x</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="n">w</span><span class="p">)</span> <span class="c1"># 40 30 20 10</span>
<span class="n">caller</span><span class="p">()</span>
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>So far, however, all 4 arguments must be supplied each time the function is called. This can be avoided by setting default values for parameters in the function definition. Parameters with default values must always be defined after those without them, and in a call, keyword arguments must follow positional ones:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">myfunc</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">2</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">caller</span><span class="p">():</span>
    <span class="n">w</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">z</span> <span class="o">=</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">30</span><span class="p">,</span> <span class="mi">40</span>
    <span class="n">myfunc</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="n">x</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="n">y</span><span class="p">)</span> <span class="c1"># 10 30 20 2</span>
    <span class="n">myfunc</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">z</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="c1"># 10 40 30 2</span>
<span class="n">caller</span><span class="p">()</span>
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Default values of parameters are stored in the function's <strong>__defaults__</strong> attribute. Python allows some neat tricks here: this attribute is writable, so you can assign a new tuple of default values directly to it. This even works for parameters that have no default value in the function definition and would normally have to be passed in the call:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">myfunc</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">2</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">myfunc</span><span class="o">.</span><span class="vm">__defaults__</span><span class="p">)</span> <span class="c1"># (2,)</span>
<span class="n">myfunc</span><span class="o">.</span><span class="vm">__defaults__</span> <span class="o">=</span> <span class="p">(</span><span class="mi">100</span><span class="p">,</span> <span class="mi">200</span><span class="p">,</span> <span class="mi">300</span><span class="p">,</span> <span class="mi">400</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">myfunc</span><span class="o">.</span><span class="vm">__defaults__</span><span class="p">)</span> <span class="c1"># (100, 200, 300, 400)</span>
<span class="c1"># note that arguments are not passed at all!</span>
<span class="n">myfunc</span><span class="p">()</span> <span class="c1"># 100 200 300 400</span>
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Default values can also be expressions, but they are evaluated only once, when the function is defined. For example, if a list is used as a default value, the list object is created once, and the same reference is assigned every time the default value is used. This may not be the behavior you expect, since a list mutated during a previous call will still be passed as the default value on the next call:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">myfunc</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="p">[]):</span>
    <span class="n">d</span><span class="o">.</span><span class="n">extend</span><span class="p">((</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">))</span>
    <span class="k">print</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span>
<span class="n">myfunc</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="c1"># 1 2 3 [1, 2, 3]</span>
<span class="n">myfunc</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">30</span><span class="p">)</span> <span class="c1"># 10 20 30 [1, 2, 3, 10, 20, 30]</span>
</pre></div>
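<div class="line-block">
<div class="line"><br /></div>
</div>
<p>To see that a default expression is evaluated only once, at definition time, here is a small sketch. It uses a hypothetical helper, <em>make_default</em>, that records each time it is evaluated:</p>
<div class="highlight"><pre><span></span>calls = []

def make_default():
    # Hypothetical helper: records every time it is evaluated.
    calls.append("evaluated")
    return len(calls)

def show_default(x=make_default()):
    # The default expression above ran once, when this def statement executed.
    return x

print(len(calls))  # 1
show_default()
show_default()
print(len(calls))  # 1, the default is not re-evaluated on each call
print(show_default.__defaults__)  # (1,)
</pre></div>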
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>A common workaround for having an empty list as the default value is to use <em>None</em> instead. <em>None</em> is a singleton: there is always exactly one instance. Check the parameter for identity with None (using <em>is</em>) and assign a new empty list during function execution:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">myfunc</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">d</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
        <span class="n">d</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="n">d</span><span class="o">.</span><span class="n">extend</span><span class="p">((</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">))</span>
    <span class="k">print</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span>
<span class="n">myfunc</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="c1"># 1 2 3 [1, 2, 3]</span>
<span class="n">myfunc</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">30</span><span class="p">)</span> <span class="c1"># 10 20 30 [10, 20, 30]</span>
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p><em>Positional</em> and <em>keyword</em> parameters (those with default values) can easily coexist in a relatively free form. The only caveat is that parameters with default values are always defined after those without. In general, when calling a function, arguments can be passed in a variety of combinations of positional and keyword form, or omitted when they have a default value:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">myfunc</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">2</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span>
<span class="n">myfunc</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="mi">30</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">20</span><span class="p">)</span> <span class="c1"># 3 30 20 2</span>
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>However, there are ways to force some parameters to be strictly positional, and others to be keyword-only. The latter is made possible by another nice feature: a variable number of parameters. Python supports <em>packing</em> and <em>unpacking</em> of arguments during a function call, which can be used to pass an arbitrary number of positional and keyword arguments. It has a special syntax for each case. Extra positional arguments are packed into a <em>tuple</em> if there is a parameter prefixed with a single asterisk, e.g. <strong>*params</strong>. Extra keyword arguments are packed into a <em>dictionary</em> if there is a parameter prefixed with a double asterisk, e.g. <strong>**kwparams</strong>. Note that a <cite>**kwparams</cite> parameter, if defined, must always come last. Parameters placed after a <cite>*params</cite> parameter, like <em>c</em> and <em>d</em> below, can only be passed by keyword:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">myfunc</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="o">*</span><span class="n">params</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="o">**</span><span class="n">kwparams</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span> <span class="c1"># 1 2 20 30</span>
    <span class="k">print</span><span class="p">(</span><span class="n">params</span><span class="p">)</span> <span class="c1"># (3, 4)</span>
    <span class="k">print</span><span class="p">(</span><span class="n">kwparams</span><span class="p">)</span> <span class="c1"># {&#39;e&#39;: 50, &#39;f&#39;: 60}</span>
<span class="n">myfunc</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">30</span><span class="p">,</span> <span class="n">e</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span> <span class="n">f</span><span class="o">=</span><span class="mi">60</span><span class="p">)</span>
</pre></div>
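<p>These ordering rules are checked when a definition is compiled, not when the function is called, so a wrong order is a <em>SyntaxError</em>. The following small sketch checks a few definitions without running them, using the built-in <em>compile</em> function and a hypothetical helper named <em>is_valid_def</em>:</p>

```python
def is_valid_def(source):
    """Return True if the function definition in `source` compiles."""
    try:
        compile(source, "example", "exec")
    except SyntaxError:
        return False
    return True

# the order used in the example above
print(is_valid_def("def f(a, b, *params, c=1, **kwparams): pass"))  # True
# **kwparams must be the very last parameter
print(is_valid_def("def f(a, **kwparams, b): pass"))                # False
# a parameter without a default can't follow one with a default...
print(is_valid_def("def f(a=1, b): pass"))                          # False
# ...unless it comes after *params, where it can only be passed by keyword
print(is_valid_def("def f(a=1, *params, b): pass"))                 # True
```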
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Also, note that the <em>params</em> tuple and the <em>kwparams</em> dictionary are used without asterisks in the function body. This also works the other way around: if you have a tuple or a dictionary with some values, you can unpack it into the positional or keyword arguments of a function call by prefixing it with <strong>*</strong> or <strong>**</strong>. Just keep an eye on the number of elements:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">myfunc</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">4</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span>
<span class="n">args</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
<span class="n">kwargs</span> <span class="o">=</span> <span class="p">{</span><span class="s1">&#39;b&#39;</span><span class="p">:</span> <span class="mi">20</span><span class="p">,</span> <span class="s1">&#39;c&#39;</span><span class="p">:</span> <span class="mi">30</span><span class="p">,</span> <span class="s1">&#39;d&#39;</span><span class="p">:</span> <span class="mi">40</span><span class="p">}</span>
<span class="n">myfunc</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="mi">40</span><span class="p">)</span> <span class="c1"># 1 2 10 40</span>
<span class="n">myfunc</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="c1"># 1 20 30 40</span>
</pre></div>
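<p>If the number of values or their names don't match the parameters, Python raises a <em>TypeError</em> at call time. A small sketch with a hypothetical helper <em>call_error</em> that returns the error message instead of letting it propagate (the exact wording of the messages may differ between Python versions):</p>

```python
def myfunc(a, b, c=3, d=4):
    return a, b, c, d

args = (1, 2, 10)
kwargs = {"b": 20, "c": 30, "d": 40}

def call_error(*call_args, **call_kwargs):
    """Call myfunc and return the TypeError message, or None on success."""
    try:
        myfunc(*call_args, **call_kwargs)
    except TypeError as err:
        return str(err)
    return None

print(call_error(*args, 40))         # None: four values for four parameters
print(call_error(*args, 40, 50))     # five values for four parameters
print(call_error(1, 2, **kwargs))    # b gets a value twice
print(call_error(1, 2, **{"e": 5}))  # e doesn't match any parameter
```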
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>To define a universal function that can take any number of arguments of any kind, its definition should pack both kinds of parameters, e.g. <em>myfunc(*params, **kwparams)</em>. In addition, this syntax strictly separates keyword parameters from positional ones: any named parameters defined after <cite>*params</cite> become <em>keyword-only parameters</em>, which can only be passed by keyword. Their default values are not stored in <em>__defaults__</em>, but in a separate attribute called <strong>__kwdefaults__</strong>:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">myfunc</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="o">*</span><span class="n">params</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="o">**</span><span class="n">kwparams</span><span class="p">):</span>
    <span class="k">pass</span>
<span class="k">print</span><span class="p">(</span><span class="n">myfunc</span><span class="o">.</span><span class="vm">__defaults__</span><span class="p">)</span> <span class="c1"># None</span>
<span class="k">print</span><span class="p">(</span><span class="n">myfunc</span><span class="o">.</span><span class="n">__kwdefaults__</span><span class="p">)</span> <span class="c1"># {&#39;c&#39;: 1, &#39;d&#39;: 2}</span>
</pre></div>
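<p>A catch-all signature like <em>myfunc(*params, **kwparams)</em> is handy for wrappers that forward whatever they receive to another function. In the sketch below, <em>logged</em> and <em>area</em> are hypothetical names used only for illustration. It also shows that a keyword-only parameter doesn't need a default value: without one, it becomes a required keyword argument, and <em>__kwdefaults__</em> stays <em>None</em>:</p>

```python
def area(width, height, *, unit):
    # unit is keyword-only and has no default, so callers must name it
    return f"{width * height} {unit}"

def logged(func):
    # the wrapper packs any arguments and unpacks them again for func
    def wrapper(*params, **kwparams):
        print("calling", func.__name__, params, kwparams)
        return func(*params, **kwparams)
    return wrapper

logged_area = logged(area)
print(logged_area(3, 4, unit="m2"))  # calling area (3, 4) {'unit': 'm2'}, then 12 m2
print(area.__kwdefaults__)           # None: there are no keyword-only defaults

try:
    area(3, 4, "m2")  # unit can't be passed positionally
except TypeError as err:
    print("TypeError:", err)
```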
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>This syntax also allows a simpler function definition when there is no need for an arbitrary number of positional arguments. Just put a bare asterisk between the positional and the keyword-only parameters:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">myfunc</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="o">*</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">2</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span>
<span class="c1"># this raises a TypeError now: c and d are keyword-only</span>
<span class="c1"># myfunc(1, 3, 4, 5)</span>
<span class="n">myfunc</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># 1 3 1 2</span>
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Nevertheless, there is still some room for improvisation: the positional parameters before the asterisk can also be passed as keywords, in any order:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">myfunc</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="o">*</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">20</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span>
<span class="n">myfunc</span><span class="p">(</span><span class="n">b</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">a</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># 4 3 1 2</span>
<span class="n">myfunc</span><span class="p">(</span><span class="n">a</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># 4 3 1 20</span>
<span class="n">myfunc</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span> <span class="c1"># 4 3 10 2</span>
<span class="n">myfunc</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="c1"># 4 3 10 20</span>
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Fortunately, since Python 3.8, there is also syntax to strictly separate positional-only parameters (which cannot be passed as keywords) from positional-or-keyword parameters (which can be passed either by position or by keyword). Both kinds can have default values, by the way. Just put a slash between them:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">myfunc</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="o">/</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="mi">30</span><span class="p">,</span> <span class="o">*</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">20</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span>
<span class="c1"># this raises a TypeError now: a is positional-only</span>
<span class="c1"># myfunc(a=1, b=2, c=4, d=3)</span>
<span class="n">myfunc</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># 4 3 1 2</span>
<span class="n">myfunc</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># 4 3 1 2</span>
<span class="n">myfunc</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span> <span class="c1"># 4 30 1 2</span>
<span class="n">myfunc</span><span class="p">(</span><span class="mi">4</span><span class="p">)</span> <span class="c1"># 4 30 10 20</span>
<span class="k">print</span><span class="p">(</span><span class="n">myfunc</span><span class="o">.</span><span class="vm">__defaults__</span><span class="p">)</span> <span class="c1"># (30,)</span>
<span class="k">print</span><span class="p">(</span><span class="n">myfunc</span><span class="o">.</span><span class="n">__kwdefaults__</span><span class="p">)</span> <span class="c1"># {&#39;c&#39;: 10, &#39;d&#39;: 20}</span>
</pre></div>
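<p>The standard <em>inspect</em> module can report which kind each parameter belongs to. A short sketch for the same function; the helper name <em>parameter_kinds</em> is just for illustration:</p>

```python
import inspect

def myfunc(a, /, b=30, *, c=10, d=20):
    print(a, b, c, d)

def parameter_kinds(func):
    """Map each parameter name to the name of its kind."""
    return {
        name: param.kind.name
        for name, param in inspect.signature(func).parameters.items()
    }

print(parameter_kinds(myfunc))
# {'a': 'POSITIONAL_ONLY', 'b': 'POSITIONAL_OR_KEYWORD', 'c': 'KEYWORD_ONLY', 'd': 'KEYWORD_ONLY'}
print(parameter_kinds(lambda x, *params, y=1, **kwparams: None))
# {'x': 'POSITIONAL_OR_KEYWORD', 'params': 'VAR_POSITIONAL', 'y': 'KEYWORD_ONLY', 'kwparams': 'VAR_KEYWORD'}
```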
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>As a good example, let's take a look at the signature of the built-in <em>sorted</em> function:</p>
<div class="highlight"><pre><span></span><span class="nb">sorted</span><span class="p">(</span><span class="n">iterable</span><span class="p">,</span> <span class="o">/</span><span class="p">,</span> <span class="o">*</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">reverse</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>This means that the first argument must always be passed positionally: you can't pass it as an <cite>iterable=&lt;something&gt;</cite> keyword. All subsequent arguments, in turn, must always be passed as keywords. Since they are keyword-only and have default values, they can be passed in any order, and any of them can be omitted.</p>
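<p>A quick check of these rules with a few hypothetical calls. The exact error messages are an implementation detail, so only the fact that a <em>TypeError</em> is raised matters here:</p>

```python
words = ["banana", "kiwi", "apple"]

# keyword-only arguments: any order, any subset
print(sorted(words))                         # ['apple', 'banana', 'kiwi']
print(sorted(words, key=len))                # ['kiwi', 'apple', 'banana']
print(sorted(words, reverse=True, key=len))  # ['banana', 'apple', 'kiwi']

# each of these calls raises a TypeError
for bad_call in (
    lambda: sorted(iterable=words),  # the iterable can't be a keyword
    lambda: sorted(words, len),      # key can't be positional
):
    try:
        bad_call()
    except TypeError as err:
        print("TypeError:", err)
```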
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>Another good example is the <em>pop</em> method of the <em>list</em> class:</p>
<div class="highlight"><pre><span></span><span class="nb">list</span><span class="o">.</span><span class="n">pop</span><span class="p">(</span><span class="n">index</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="o">/</span><span class="p">)</span>
</pre></div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p><em>index</em> is a positional-only parameter with a default value: if it's omitted, -1 is used, so the last item is removed.</p>
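<p>For instance, a short sketch of how <em>pop</em> behaves with and without its argument, and what happens if <em>index</em> is passed as a keyword:</p>

```python
stack = [10, 20, 30]
print(stack.pop())   # 30: the default index -1 removes the last item
print(stack.pop(0))  # 10
print(stack)         # [20]

try:
    stack.pop(index=0)  # index is positional-only
except TypeError as err:
    print("TypeError:", err)
```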
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="section" id="summary">
<h2>Summary</h2>
<ul class="simple">
<li>Functions in Python are objects that are created when defined, allowing them to be used as arguments or return values like any other object.</li>
<li>Parameters become local variables during function execution, while function attributes exist outside of execution.</li>
<li>Arguments are passed by value, but the value is a reference: parameters contain a copy of the reference to the original object. Rebinding a parameter doesn't change the original object, but modifying a mutable object passed as an argument does.</li>
<li>Parameters can be positional or keyword ones (with default values). Default value expressions are evaluated only once, when the function is defined.</li>
<li>The <em>__defaults__</em> attribute stores the default values of positional parameters as a tuple; the attribute is writable, so it can be reassigned directly.</li>
<li>An asterisk followed by a name (<cite>*var</cite>) packs positional arguments into a tuple, while a double asterisk followed by a name (<cite>**kwvar</cite>) packs keyword arguments into a dictionary.</li>
<li>Keyword arguments always follow positional arguments, with defaults filling in omitted values.</li>
<li>Using an asterisk and a slash together can be summarized as follows: <cite>&lt;positional-only parameters&gt;, /, &lt;positional-or-keyword parameters&gt;, *, &lt;keyword-only parameters&gt;</cite>.</li>
<li>The <em>__kwdefaults__</em> attribute stores the default values of keyword-only parameters, i.e. those defined after the asterisk.</li>
</ul>
<!-- Links -->
</div>
</content><category term="python"></category><category term="programming"></category></entry><entry><title>Using udp-link to enhance TCP connections stability</title><link href="https://vorakl.com/articles/udp-link/" rel="alternate"></link><published>2024-01-16T18:44:53-08:00</published><updated>2024-01-16T18:44:53-08:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2024-01-16:/articles/udp-link/</id><summary type="html"><p class="first last">A UDP transport layer implementation for proxying TCP connections</p>
</summary><content type="html"><p><a class="reference internal" href="#summary">TLDR: quick summary of the article</a></p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>I recently discovered <a class="reference external" href="https://github.com/pgul/udp-link">udp-link</a>, a very useful project for all those guys like
me who spend most of their working time in terminals over SSH connections.
The tool implements a UDP transport layer, which acts as a proxy for
TCP connections. It's designed to be integrated into the OpenSSH configuration.
However, with a little trick, it can also be used as a general-purpose
TCP-over-UDP proxy. <em>udp-link</em> greatly improves the stability of connections
over unreliable networks that experience packet loss and intermittent
connectivity. It also supports IP roaming, which allows TCP connections
to remain alive even if an IP address changes.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p><em>udp-link</em> is written in C by <a class="reference external" href="https://gul.kiev.ua">Pavel Gulchuk</a>, who has a lot of experience
in running unreliable networks. Despite being a young project, version
<a class="reference external" href="https://github.com/pgul/udp-link/releases/tag/v0.4">v0.4</a> is already pretty stable. Once configured, you won't think about it
anymore, except to be surprised every time your SSH connections don't break
and survive a laptop's sleep mode or switching between
different Wi-Fi networks.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>In the current architecture, the client-side instance of the tool takes data from the standard
input and sends it to the server side via UDP. Another instance of the same tool, running on the server,
receives that data on a specific UDP port and sends it to a TCP service
(local or remote from the server's perspective).
The destination TCP service and the UDP listening port on the server
side can be specified on the client at startup. Otherwise, a TCP connection
will be established with <em>127.0.0.1:22</em>, and a port will be randomly chosen from
a predefined port range. Note that the server firewall must allow
UDP traffic to this port range. The TCP service can also reside on a different
host if the server side is used as a jumpbox. In my opinion, one of the greatest
features of <em>udp-link</em> is that it requires zero server-side configuration: all
configuration tweaks happen only on the client side.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p><em>udp-link</em> on the server side doesn't run as a daemon or listen on a UDP port
all the time. Instead, the client starts the tool on the server side on demand,
in listening mode and with a randomly generated key, which is used to authenticate
the client connection. To do so, the client temporarily establishes a normal SSH
connection over TCP with the server side, just to run the tool in the background,
and then closes that connection.
This is where SSH's secure client authentication comes into play. <em>udp-link</em> <strong>doesn't
encrypt the transferred data</strong>, which is useful when it's used together with SSH
because it avoids double encryption, but this needs to be kept in mind when
it's used in other configurations.</p>
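<p>The article doesn't describe udp-link's wire format or how its key is actually checked, so the following is <strong>not</strong> udp-link's implementation. It's only a hypothetical Python sketch of the general idea: a per-session random key lets the server recognize its own client's datagrams, even after the client's IP address changes, while the payload itself stays unencrypted. The HMAC-SHA256 tag, the <em>seal</em> function, and the <em>ServerSession</em> class are all assumptions made for illustration:</p>

```python
import hashlib
import hmac
import secrets

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes

def new_session_key():
    # hypothetical: the client generates a fresh random key per session
    return secrets.token_bytes(32)

def seal(key, payload):
    # prefix the payload with an authentication tag; the payload itself
    # stays in cleartext: authenticated, but not encrypted
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload

class ServerSession:
    """Hypothetical server-side state for one client session."""

    def __init__(self, key):
        self.key = key
        self.peer = None  # where replies would go; updated on valid datagrams

    def accept(self, datagram, addr):
        tag, payload = datagram[:TAG_LEN], datagram[TAG_LEN:]
        expected = hmac.new(self.key, payload, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return None  # not from our client: drop it
        # following the sender's current address lets the session
        # survive an IP address change (roaming)
        self.peer = addr
        return payload

key = new_session_key()
session = ServerSession(key)
print(session.accept(seal(key, b"hello"), ("192.0.2.10", 40000)))   # b'hello'
print(session.accept(seal(key, b"again"), ("198.51.100.7", 40001)))  # b'again'
print(session.peer)  # ('198.51.100.7', 40001)
print(session.accept(seal(new_session_key(), b"spoofed"), ("203.0.113.5", 1)))  # None
```

<p>The point of the sketch is the split of responsibilities described above: the datagrams are authenticated with a key delivered over SSH, while confidentiality is left to SSH running on top of the tunnel.</p>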
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>To start using <em>udp-link</em>, you need to clone the repository, compile, and install
the tool on both sides</p>