---
editor_options:
markdown:
wrap: sentence
---
# Meeting Julia
## Why Julia
```{julia, echo=FALSE}
using Markdown
using InteractiveUtils
```
People have asked us why we wrote this book using Julia instead of Python or R, which are the current standards in the data science world.
While Python and R are also great choices, Julia is an up-and-coming language that is likely to have a growing impact in the coming years.
It can run much faster than pure R or Python code (often approaching the speed of C) while remaining just as readable, which allows us to write high-performance code in a simple way.
Julia is already used by many tech companies and in scientific research: scientists and engineers from many disciplines collaborate using Julia, which gives us a wide range of tools to approach different problems.
Often, libraries in languages like Python or R are optimized for performance, but this usually involves writing the performance-critical parts in lower-level languages better suited to the task, such as C or Fortran, as well as writing code to manage the communication between the high-level language and the low-level one.
Julia, on the other hand, expands the possibilities of people who have concrete problems involving a lot of computation.
Libraries can be made performant in plain Julia code by following some basic coding guidelines.
This enables useful libraries to be created by people without extensive programming or Computer Science expertise.
## Julia presentation
Julia is a free and open-source general-purpose language, designed and developed by Jeff Bezanson, Alan Edelman, Viral B. Shah and Stefan Karpinski at MIT.
Julia was created from scratch to be both fast and easy to understand, even for people who are not programmers or computer scientists.
It has the abstraction capabilities of high-level languages while also being very fast; as a popular description puts it, "Julia looks like Python, feels like Lisp, runs like Fortran".
Before Julia, programming languages tended to offer either a simple syntax and good abstraction capabilities, which makes them user-friendly, or high performance, which is necessary for resource-intensive computations.
This forced applied scientists not only to learn two different languages, but also to learn how to make them communicate with one another.
This difficulty is known as the two-language problem, and it is what Julia's creators set out to solve.
Julia is dynamically typed and great for interactive use.
It also uses multiple dispatch as a core design concept, which adds to the composability of the language.
In conventional, single-dispatch programming languages, one of the arguments of a method call (typically the object the method is called on) receives special treatment: it alone determines which of the function's methods is applied.
Multiple dispatch generalizes this to all the arguments of the function: the method applied is the most specific one that matches the types of all the arguments in the call.
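As a small illustration, the sketch below defines one function with three methods. The name `describe_pair` is hypothetical, chosen only for this example; which method runs depends on the types of *both* arguments.
```julia
# Hypothetical function with three methods; Julia picks the most specific
# method that matches the types of all the arguments.
describe_pair(a::Int, b::Int) = "two integers"
describe_pair(a::Int, b::Float64) = "an integer and a float"
describe_pair(a::Number, b::Number) = "two numbers of some kind"

println(describe_pair(1, 2))      # both arguments are Int
println(describe_pair(1, 2.0))    # first Int, second Float64
println(describe_pair(1.0, 2))    # only the generic (Number, Number) method matches
```
Note that `describe_pair(1, 2)` matches both the `(Int, Int)` and the `(Number, Number)` methods; Julia chooses the more specific one. Adding a method for a new combination of types does not require changing the existing ones.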
## Installation
For the installation process, we recommend following the instructions provided by the Julia team in [Platform Specific Instructions for Official Binaries](https://julialang.org/downloads/platform/), which will guide you through a fresh installation of Julia on your operating system.
This is a bare-bones installation, so it only includes Julia's standard library.
Throughout the book, we are going to use specific Julia packages that you have to install before using them in your code.
Julia has a built-in package manager that makes installing new packages and handling version compatibility very easy.
First, you need to start a Julia session.
To do this, type `julia` in your terminal:
``` {.julia}
~ julia
julia>
```
At this point, your Julia session will have started.
What you see right now is a **Julia REPL** (read-eval-print loop), an interactive command line prompt.
Here you can quickly evaluate Julia expressions, get help about different Julia functionalities and much more.
The REPL has a set of different modes you can activate with different keybindings.
The *Julian mode* is the default one, where you can directly type any Julia expression and press the Enter key to evaluate and print it.
The *help mode* is activated by typing a question mark, `?`.
You will notice that the prompt changes:
``` {.julia}
julia> ?
help?>
```
By typing the name of a function or a Julia package, you will get information about it as well as usage examples.
Another available mode is the *shell mode*.
This is just a way to input terminal commands in your Julia REPL. You can access this mode by typing a semicolon,
``` {.julia}
julia> ;
shell>
```
Perhaps the most frequently used mode, along with the default Julian mode, is the *package manager mode*.
In this mode, you can perform tasks such as adding and updating packages.
It is also useful for managing project environments and controlling package versions.
To switch to the package manager, type a closing square bracket, `]`:
``` {.julia}
julia> ]
(@v1.5) pkg>
```
If you see the word 'pkg' in the prompt, it means you accessed the package manager successfully.
To add a new package, you just need to write
``` {.julia}
(@v1.5) pkg> add NewPackage
```
It is as simple as that!
All Julia commands are case-sensitive, so be sure to spell package names (and, later on, function and variable names) correctly.
## First steps into the Julia world
As with every programming language, it is useful to know some of the basic operations and functionalities.
We encourage you to open a Julia REPL and experiment with all the code in this chapter, to start developing an intuition about what makes Julia code special.
The common arithmetic operations are all available in Julia:
- $+$: Addition operator
- $-$: Subtraction operator
- $*$: Multiplication operator
- $/$: Division operator
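A quick sketch of these operators applied to two integers (the variable names `a` and `b` are arbitrary):
```julia
a = 7
b = 2
println(a + b)   # addition: 9
println(a - b)   # subtraction: 5
println(a * b)   # multiplication: 14
println(a / b)   # division: 3.5 (a floating-point result)
```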
Julia syntax is designed to be very close to mathematical notation.
For example, assuming `x` has already been defined, instead of writing something like
``` {.julia}
julia> 2*x
```
you can simply write
``` {.julia}
julia> 2x
```
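Placing a number directly before a variable or a parenthesized expression is called a *numeric literal coefficient*, and it means multiplication. A minimal sketch with an arbitrary value of `x`:
```julia
x = 3
y = 2x + 1       # same as 2*x + 1
z = 2(x + 1)     # a coefficient can also multiply a parenthesized expression
w = 3x^2         # the power is applied first: 3 * (x^2)
println((y, z, w))
```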
For this same purpose, Julia supports a great variety of Unicode characters, which let us write things like Greek letters, subscripts and superscripts, making our code easier to read in mathematical form.
In general, a Unicode character is entered by typing a backslash, `\`, followed by the name of the character, and then pressing the Tab key.
For example,
``` {.julia}
julia> \beta # and next we press tab
julia> β
```
You can add subscripts by typing `\_` and superscripts by typing `\^`, followed by the character(s) you want to modify, and then pressing Tab.
For example,
``` {.julia}
julia> L\_0 # and next we press tab
julia> L₀
```
Unicode characters behave just like any other character you can type.
You can use them inside strings or as variable names and assign them a value.
``` {.julia}
julia> β = 5
5
julia> "The ⌀ of the circle is $β"
"The ⌀ of the circle is 5"
```
Some well-known mathematical constants are already predefined:
``` {.julia}
julia> \pi # and next we press tab
julia> π
π = 3.1415926535897...
julia> \euler # and next we press tab
julia> ℯ
ℯ = 2.7182818284590...
```
You can find all the Unicode characters supported by Julia, and how to type them, in the [Unicode Input](https://docs.julialang.org/en/v1/manual/unicode-input/) section of the Julia manual.
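Since Unicode names are ordinary identifiers, they can be used directly in formulas. The expressions below are arbitrary, chosen only to show `β`, `L₀`, `ℯ` and `π` working together:
```julia
# Unicode identifiers, typed as \beta<Tab> and L\_0<Tab>
β = 0.5
L₀ = 10.0
L = L₀ * ℯ^(-β)     # ℯ (typed \euler<Tab>) is the predefined Euler's number
area = π * L₀^2     # π (typed \pi<Tab>) is predefined as well
println(L)
println(area)
```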
Julia also supports the usual numeric types.
We can explore them with the function `typeof()`, which returns the type of its argument as it is represented in Julia.
Let's see some examples,
``` {.julia}
julia> typeof(2)
Int64
julia> typeof(2.0)
Float64
julia> typeof(3 + 5im)
Complex{Int64}
```
These were examples of integers, floats and complex numbers.
By convention, type names in Julia start with a capital letter.
Notice that if you compute something like
``` {.julia}
julia> 10/2
5.0
```
the output is a floating-point number, even though both numbers in the operation are integers.
This is because in Julia the `/` operator always returns a floating-point number when dividing integers.
When the conversion is exact, you can convert a value from one data type to another by calling the target type like a function:
``` {.julia}
julia> Int64(5.0)
5
```
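The sketch below contrasts `/` with integer division. `div` and `rem` are Base functions not covered in the text; `÷` (typed `\div<Tab>`) is an infix synonym for `div`:
```julia
q = 10 / 2          # `/` on two integers returns a Float64: 5.0
d = div(10, 3)      # integer division, truncating toward zero: 3 (also written 10 ÷ 3)
r = rem(10, 3)      # remainder of the integer division: 1
n = Int64(5.0)      # exact conversion back to an integer: 5

# Int64(5.5) is not exact, so it throws an InexactError instead of rounding.
inexact = try
    Int64(5.5)
    false
catch err
    err isa InexactError
end
```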
Continuing with the basics, let's look at how logical (boolean) operations are done in Julia.
Booleans are written as `true` and `false`.
The most important boolean operations for our purposes are the following:
``` {.julia}
!: "not" logical operator
&: "and" logical operator
|: "or" logical operator
==: "equal to" operator
!=: "not equal to" operator
>: "greater than" operator
<: "less than" operator
>=: "greater than or equal to" operator
<=: "less than or equal to" operator
```
Some examples of these,
``` {.julia}
julia> true & true
true
julia> true & false
false
julia> true & !false
true
julia> 3 == 3
true
julia> 4 == 5
false
julia> 7 <= 7
true
```
Comparisons can be chained, which reads much like mathematical notation:
``` {.julia}
julia> 10 <= 11 < 24
true
julia> 5 > 2 < 1
false
```
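A short sketch combining these operators. Note that `&` and `|` bind more tightly than comparisons in Julia, so comparisons combined with them need parentheses. Julia also provides the short-circuiting `&&` and `||` (not listed above), which only evaluate their right-hand side when needed and are the usual choice inside conditions:
```julia
x = 7
in_range = 0 <= x < 10              # chained comparison: 0 <= x and x < 10
big_and_odd = (x > 5) & isodd(x)    # parentheses needed: & binds tighter than >
either = (x < 0) | (x == 7)         # "or" of two comparisons
not_three = x != 3
# && skips the division entirely when x == 0
safe = (x != 0) && (10 / x > 1)
```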
The next topic in these Julia basics is the string data type and some basic string manipulations.
As in many other programming languages, strings are written between double quotes, `"`:
``` {.julia}
julia> "This is a Julia string!"
"This is a Julia string!"
```
You can access a particular character of a string by writing its index between square brackets right after the string; indexing in Julia starts at 1.
Likewise, you can access a substring by writing the first and last indices of the substring you want, separated by a colon, between square brackets.
This is called *slicing*, and it will be very useful later when working with arrays.
Here's an example:
``` {.julia}
julia> "This is a Julia string!"[1] # outputs the first character of the string, along with some information about it
'T': ASCII/Unicode U+0054 (category Lu: Letter, uppercase)
julia> "This is a Julia string!"[1:4] # outputs the substring going from the first to the fourth index
"This"
```
A really useful tool when working with strings is *string interpolation*.
This is a way to evaluate an expression inside a string and insert its result.
It is done by writing a dollar sign, `$`, followed by the expression in parentheses; for a single variable, as in `$β` above, the parentheses can be omitted.
For example,
``` {.julia}
julia> "The product between 4 and 5 is $(4 * 5)"
"The product between 4 and 5 is 20"
```
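The following sketch gathers the string operations above and adds two things not covered in the text: `length`, which counts the characters of a string, and `*`, which concatenates strings in Julia. Indexing and slicing behave as shown here for ASCII text; for strings with non-ASCII characters, indices refer to bytes, so slicing needs more care.
```julia
s = "This is a Julia string!"
first_char = s[1]                        # 'T', a Char (indexing starts at 1)
word = s[1:4]                            # "This", slicing returns a String
n_chars = length(s)                      # number of characters: 23
greeting = "Hello" * ", " * "Julia"      # `*` concatenates strings
msg = "The string has $(length(s)) characters"   # interpolation
println(msg)
```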
This wouldn't be a programming introduction if we didn't print "Hello World!", right?
Printing in Julia is very easy.
There are two functions for printing: `print()` and `println()`.
The former prints its argument without starting a new line afterwards, while the latter adds a newline at the end.
This means, for example, that if you call `print()` two or more times in a Julia REPL, the printed strings are concatenated on the same line, while successive calls to `println()` print each string on its own line.
To show this, we need to execute two print calls on a single REPL line.
To execute multiple expressions on one line, just separate them with a semicolon, `;`.
``` {.julia}
julia> print("Hello"); print(" world!")
Hello world!
julia> println("Hello"); println("world!")
Hello
world!
```
It's time now to start introducing collections of data in Julia.
We will start by talking about arrays.
As in many other programming languages, arrays in Julia can be created by listing objects between square brackets separated by commas.
For example,
``` {.julia}
julia> int_array = [1, 2, 3]
3-element Array{Int64,1}:
1
2
3
julia> str_array = ["Hello", "World"]
2-element Array{String,1}:
"Hello"
"World"
```
As you can see, arrays can store any type of data.
If all the elements in the array have the same type, Julia creates an array with that element type.
You can see this in what the Julia REPL prints out:
1. First, it displays how many elements there are in the collection.
   In our case, 3 elements in `int_array` and 2 elements in `str_array`.
   For arrays with more than one dimension, the shape (for example, `3×3`) is shown instead.
2. Second, the output shows the type and dimensionality of the array.
   The first element inside the curly braces specifies the type of the members of the array.
   If the elements have different types, Julia tries to promote them to a common type (mixing integers and floats, for example, gives `Float64`); when there is no common type, `Any` appears, meaning that the elements of the array are not homogeneous in their type.
   Julia code tends to run faster when arrays have a concrete element type, so it is recommended to use homogeneous types when possible.
   The second element inside the curly braces tells us how many dimensions the array has.
   Our examples show two one-dimensional arrays, hence a 1 is printed.
   Later, we will introduce matrices and, naturally, a 2 will appear in this place instead of a 1.
3. Finally, the content of the array is printed as a column.
Julia stores arrays in column-major order.
So you can think of standard one-dimensional arrays as column vectors, which is how they are treated in calculations involving vectors and matrices.
A row vector (or a $1 \times n$ array), on the other hand, can be defined using whitespace instead of commas,
``` {.julia}
julia> [3 2 1 4]
1×4 Array{Int64,2}:
3 2 1 4
```
In contrast to other languages, where matrices are written as 'arrays of arrays', in Julia we write the elements of each row separated by whitespace, as in the row vector above, and use a semicolon to mark the end of each row.
For example,
``` {.julia}
julia> [1 1 2; 4 1 0; 3 3 1]
3×3 Array{Int64,2}:
1 1 2
4 1 0
3 3 1
```
The length and shape of arrays can be obtained using the `length()` and `size()` functions respectively.
``` {.julia}
julia> length([1, -1, 2, 0])
4
julia> size([1 0; 0 1])
(2, 2)
julia> size([1 0; 0 1], 2) # you can also ask for the size along a single dimension (here, the number of columns)
2
```
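The sketch below collects these array constructions and adds `eltype`, `ndims` and `vec`, Base functions not mentioned in the text, to inspect the element type, the number of dimensions, and the column-major storage order of a matrix:
```julia
v = [1, 2, 3]                  # one-dimensional column vector
r = [3 2 1 4]                  # 1×4 row matrix (two-dimensional)
M = [1 1 2; 4 1 0; 3 3 1]      # 3×3 matrix; ';' ends each row
mixed = [1, "two", 3.0]        # no common type, so the element type is Any
promoted = [1, 2.0]            # Int and Float64 are promoted to Float64
println(size(M), " ", length(M), " ", ndims(v), " ", ndims(r))
# vec(M) lists the elements in memory order: column by column
println(vec(M))
```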
An interesting feature in Julia is *broadcasting*.
Suppose you wanted to add the number 2 to every element of an array.
You might be tempted to do
``` {.julia}
julia> 2 + [1, 1, 1]
ERROR: MethodError: no method matching +(::Int64, ::Array{Int64,1})
For element-wise addition, use broadcasting with dot syntax: scalar .+ array
Closest candidates are:
+(::Any, ::Any, ::Any, ::Any...) at operators.jl:538
+(::Complex{Bool}, ::Real) at complex.jl:301
+(::Missing, ::Number) at missing.jl:115
...
Stacktrace:
[1] top-level scope at REPL[18]:1
```
As you can see, the expression returns an error.
If you read this error message closely, you will see that it suggests what to do.
If we now try writing a period '.' right before the plus sign, we get
``` {.julia}
julia> 2 .+ [1, 1, 1]
3-element Array{Int64,1}:
3
3
3
```
What we did was broadcast the addition operator `+` over the entire array.
This is done by adding a period before the operator we want to broadcast.
In this way, we can write complicated expressions in a much cleaner and more compact way.
This can be done with any of the operators we have already seen,
``` {.julia}
julia> 3 .> [2, 4, 5] # this will output a bit array with 0s as false and 1s as true
3-element BitArray{1}:
1
0
0
```
When we broadcast an operation between two arrays of the same shape, the operation is applied element-wise.
For example,
``` {.julia}
julia> [7, 2, 1] .* [10, 4, 8]
3-element Array{Int64,1}:
70
8
8
julia> [10 2 35] ./ [5 2 7]
1×3 Array{Float64,2}:
2.0 1.0 5.0
julia> [5 2; 1 4] .- [2 1; 2 3]
2×2 Array{Int64,2}:
3 1
-1 1
```
If we instead broadcast an operator between a column vector and a row vector, each element of the column vector is combined with each element of the row vector, returning a matrix,
``` {.julia}
julia> [1, 0, 1] .+ [3 1 4]
3×3 Array{Int64,2}:
4 2 5
3 1 4
4 2 5
```
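The sketch below gathers these broadcasting patterns. It also uses the `@.` macro from Base, which is not mentioned in the text: it adds a dot to every function call and operator in an expression.
```julia
x = [1, 2, 3]
shifted = 2 .+ x                 # scalar and array: [3, 4, 5]
squares = x .^ 2                 # element-wise power: [1, 4, 9]
products = x .* [10, 20, 30]     # same shape, element-wise: [10, 40, 90]
table = x .* [10 20]             # column times row: a 3×2 matrix
y = @. 2x^2 + 1                  # same as 2 .* x .^ 2 .+ 1
```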
Another useful tool when dealing with arrays is concatenation.
Given two arrays, you can concatenate them vertically or horizontally.
This is best seen in an example:
``` {.julia}
julia> vcat([1, 2, 3], [4, 5, 6]) # this concatenates the two arrays vertically, giving us a new long array
6-element Array{Int64,1}:
1
2
3
4
5
6
julia> hcat([1, 2, 3], [4, 5, 6]) # this stacks the two arrays one next to the other, returning a matrix
3×2 Array{Int64,2}:
1 4
2 5
3 6
```
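The same concatenations can also be written with bracket syntax, reusing the separators we saw for building matrices: a semicolon stacks vertically like `vcat`, and whitespace places arrays side by side like `hcat`. A minimal sketch:
```julia
a = [1, 2, 3]
b = [4, 5, 6]
stacked = [a; b]        # vertical concatenation, same as vcat(a, b)
side = [a b]            # horizontal concatenation, same as hcat(a, b)
grid = [side; [7 8]]    # append a row to the 3×2 matrix, giving a 4×2 matrix
```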
With these basic tools to get your hands dirty in Julia, we can move on to other features, such as loops and function definitions.
Let's start with for loops.
A for loop starts with the `for` keyword, followed by the name of the iteration variable and the range of values we want the loop to cover.
Below this `for` statement, we write what we want to be done in each iteration, and we finish with the `end` keyword.
Here is a simple example, which prints the numbers from 1 to 100:
``` {.julia}
julia> for i in 1:100
    println(i)
end
```
The syntax `1:100` is the Julian way to define a range of all the numbers from 1 to 100, with a step of 1.
We could have written `1:2:100` to go through the numbers with a step size of 2.
We can also iterate over collections of data, like arrays.
Consider the following block of code, where we define an array and then iterate over it,
``` {.julia}
julia> arr = [1, 3, 2, 2]
julia> for element in arr
    println(element)
end
1
3
2
2
```
As you can see, the loop body was executed once for each element of the array.
Iterating directly over the elements of a collection like this is often more convenient than iterating over its indices.
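The sketch below puts these loop patterns together. `sum_elements` is a hypothetical helper written only for this example (Base already provides `sum`), and `push!` and `enumerate` are Base functions not covered above: `push!` appends an element to an array, and `enumerate` yields index-element pairs.
```julia
# Collect 2, 4, ..., 10 using a range with a step of 2.
evens = Int[]
for i in 2:2:10
    push!(evens, i)              # append i to the end of evens
end

# Hypothetical helper: add up the elements of any iterable collection.
function sum_elements(collection)
    total = 0
    for element in collection    # iterate directly over the elements
        total += element
    end
    return total
end

# enumerate gives (index, element) pairs when the position is also needed.
for (i, lang) in enumerate(["Julia", "Python"])
    println(i, ": ", lang)
end
```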
Conditional statements in Julia are very similar to those in most languages.
Essentially, a conditional statement starts with the `if` keyword, followed by the condition that must be evaluated to true or false, and then the body of the action to apply if the condition evaluates to true.
Then, optional `elseif` keywords may be used to check for additional conditions, and an optional `else` keyword at the end to execute a piece of code if all of the conditions above evaluate to false.
Finally, as usual in Julia, the conditional statement block finishes with an `end` keyword.
``` {.julia}
julia> x = 3
julia> if x > 2
    println("x is greater than 2")
elseif 1 <= x <= 2
    println("x is between 1 and 2")
else
    println("x is less than 1")
end
x is greater than 2
```
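Conditionals are often used inside functions. The sketch below wraps the same logic in a hypothetical function, `classify`, which returns a string instead of printing it, so that all three branches can be checked on values at and around the boundaries:
```julia
# Hypothetical function: the same if/elseif/else logic, returning a value.
function classify(x)
    if x > 2
        return "greater than 2"
    elseif 1 <= x <= 2
        return "between 1 and 2"
    else
        return "less than 1"
    end
end

for value in (3, 2, 1.5, 0)
    println(value, " is ", classify(value))
end
```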
Now consider the code block below, where we define a function that computes the first `m` terms of the Fibonacci sequence, starting from two given numbers,
``` {.julia}
julia> n1 = 0
julia> n2 = 1
julia> m = 10
julia> function fibonacci(n1, n2, m)
    fib = Array{Int64,1}(undef, m)
    fib[1] = n1
    fib[2] = n2
    for i in 3:m
        fib[i] = fib[i-1] + fib[i-2]
    end
    return fib
end
fibonacci (generic function with 1 method)
```
Here, we first made some variable assignments: the variables `n1`, `n2` and `m` were assigned the values 0, 1 and 10.
Variables are assigned simply by writing the name of the variable, followed by an equals sign, `=`, and finally the value you want to store in that variable.
There is no need to declare the data type of the value you are going to store.
Then, we defined the function that computes the Fibonacci sequence.
Function blocks start with the `function` keyword, followed by the name of the function and its arguments in parentheses, separated by commas.
In this function, the arguments are the first two numbers of the sequence and the total length of the sequence.
Inside the body of the function, everything is indented.
Although this is not strictly necessary for the code to run, it is a good habit to have from the beginning, since we want our code to be readable.
First, we preallocate a one-dimensional array of integers of length `m`.
Initializing the array this way is not strictly necessary; you could have started with an empty array and filled it later in the code.
But when, as here, we know in advance how long the array is going to be, preallocating it is good practice for performance in Julia.
The array type is written as we saw earlier in the REPL output for arrays:
`Array{Int64,1}` just means we want a one-dimensional array of integers.
The new part is the one in parentheses, `(undef, m)`.
This means the array is created with `m` uninitialized elements, whose arbitrary initial values we will overwrite later.
Don't worry too much if you don't understand all this right now, though.
We then assign the first two elements of the sequence and calculate the rest with a for loop.
Finally, an `end` keyword is needed to close the for loop, and another one to close the function definition.
Calling our function with the variables `n1`, `n2` and `m` defined above gives us:
``` {.julia}
julia> fibonacci(n1, n2, m)
10-element Array{Int64,1}:
0
1
1
2
3
5
8
13
21
34
```
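The text mentions that the array could instead start empty and be filled later. The sketch below shows both versions side by side; `fibonacci_push` is a hypothetical name for the growing-array variant, which uses the Base function `push!` to append elements. Both assume `m >= 2`.
```julia
# Preallocated version, as in the text: m uninitialized slots, then filled in.
function fibonacci(n1, n2, m)
    fib = Array{Int64,1}(undef, m)
    fib[1] = n1
    fib[2] = n2
    for i in 3:m
        fib[i] = fib[i-1] + fib[i-2]
    end
    return fib
end

# Hypothetical alternative: start from the first two terms and grow the array.
function fibonacci_push(n1, n2, m)
    fib = Int64[n1, n2]
    for i in 3:m
        push!(fib, fib[i-1] + fib[i-2])   # append the next term
    end
    return fib
end

println(fibonacci(0, 1, 10) == fibonacci_push(0, 1, 10))
```
Both return the same values; preallocation avoids resizing the array as it grows, although for a sequence this short the difference is negligible.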
Remember broadcasting, the dot we added in front of an operator to apply it to an entire collection of objects?
It turns out that this can be done with functions as well!
Consider the following function,
``` {.julia}
julia> function isPositive(x)
    if x > 0
        return true
    else
        return false
    end
end
isPositive (generic function with 1 method)
julia> isPositive(3)
true
julia> isPositive.([-1, 1, 3, -5])
4-element BitArray{1}:
0
1
1
0
```
As you can see, we broadcast the `isPositive()` function over every element of an array by adding a dot between the function name and its parentheses.
It is as easy as that!
Once you start using this feature, you will notice how useful it is.
Another convention worth knowing in Julia is the 'bang' (`!`) convention.
Functions whose names end with an exclamation mark (or bang) are functions that modify their arguments in place.
Consider the example of the `pop!` function from Julia's `Base` module.
Watch closely what happens to the array to which we apply the function.
``` {.julia}
julia> arr = [1, 2, 3]
julia> n = pop!(arr)
3
julia> arr
2-element Array{Int64,1}:
1
2
julia> n
3
```
Did you understand what happened?
First, we defined an array.
Then, we applied the `pop!()` function, which removes the last element of the array and returns it; we assigned that value to `n`.
But notice that when we inspect our `arr` variable to see what it is storing, the number 3 is gone.
This is what functions with a bang do, and what we mean by modifying *in place*.
Try to follow this convention whenever you define a function that will modify other objects in-place!
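Base follows this convention throughout; for example, `sort` returns a sorted copy, while `sort!` sorts its argument in place. The sketch below shows that pair, plus a hypothetical user-defined function, `double!`, written to follow the same convention:
```julia
arr = [3, 1, 2]
sorted_copy = sort(arr)      # new sorted array; arr is still [3, 1, 2]
unchanged = copy(arr)        # snapshot showing that sort did not modify arr
sort!(arr)                   # sorts arr itself: now [1, 2, 3]

# Hypothetical mutating function: the trailing ! signals that v is modified.
function double!(v)
    for i in eachindex(v)
        v[i] *= 2
    end
    return v
end

double!(arr)                 # arr is now [2, 4, 6]
```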
Sometimes, you will need to use a function that you don't really need to name and store, because it is not very relevant to the rest of your code.
For these kinds of situations, an *anonymous* or *lambda* function may be what you need.
Typically, anonymous functions are used as arguments to higher-order functions.
This is just a fancy name for functions that accept other functions as arguments; that is what makes them higher-order.
We can create an anonymous function and apply it to each element of a collection by using the `map()` function.
You can think of `map()` as a way to apply any function to every element of a collection.
Anonymous functions are created using the arrow `->` syntax.
On the left-hand side of the arrow, you write the names of the function's arguments.
On the right-hand side of the arrow, you write the recipe of what to do with these arguments.
Let's use an anonymous function to define a not-anonymous function, just to illustrate the point.
``` {.julia}
julia> f = (x,y) -> x + y
#1 (generic function with 1 method)
julia> f(2,3)
5
```
You can think about what we did as if $f$ were a variable that is storing some function.
Then, when calling $f(2,3)$ Julia understands we want to evaluate the function it is storing with the values 2 and 3.
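Since an anonymous function is an ordinary value, it can be called, broadcast with the dot syntax, or passed to other higher-order functions. The sketch below uses `filter` and the `by` keyword of `sort`, Base features not covered in the text; the input arrays are arbitrary:
```julia
f = (x, y) -> x + y                                    # anonymous function stored in a variable
total = f(2, 3)                                        # called like any named function: 5
cube_plus_one = (x -> x^3 + 1).([1, 2, 3])             # broadcast an anonymous function: [2, 9, 28]
positives = filter(x -> x > 0, [-2, 3, -1, 5])         # keep elements where the function returns true: [3, 5]
by_magnitude = sort([-4, 1, -2, 3], by = x -> abs(x))  # sort by absolute value: [1, -2, 3, -4]
```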
Let's see now how the higher-order function `map()` uses anonymous functions.
We will apply the anonymous function `x -> x^2 + 5` to all the elements of an array.
``` {.julia}
julia> map(x -> x^2 + 5, [2, 4, 6, 3, 3])
5-element Array{Int64,1}:
9
21
41
14
14
```
The first argument of the map function is another function.
You can define new functions and then use them inside map, but with the help of anonymous functions you can simply create a throw-away function inside map's arguments.
The function passed as the first argument is then applied to every element of the array passed as the second argument.
Now let's introduce another kind of collection: dictionaries.
A dictionary is a collection of key-value pairs.
You can think of them as arrays, but instead of being indexed by a sequence of numbers they are indexed by keys, each one linked to a value.
To create a dictionary, we call `Dict()` with the key-value pairs as arguments, each written as `key => value`:
`Dict(key1 => value1, key2 => value2)`.
``` {.julia}
julia> Dict("A" => 1, "B" => 2)
Dict{String,Int64} with 2 entries:
"B" => 2
"A" => 1
```
So we created our first dictionary.
Let's review what the Julia REPL prints out:
`Dict{String,Int64}` tells us the types that Julia automatically assigned to the keys and to the values, respectively.
In this example, the keys are strings and the values are integers.
Finally, it prints all the `key => value` pairs of the dictionary.
In Julia, the keys and values of a dictionary can be of any type.
``` {.julia}
julia> Dict("x" => 1.4, "y" => 5.3)
Dict{String,Float64} with 2 entries:
"x" => 1.4
"y" => 5.3
julia> Dict(1 => 10.0, 2 => 100.0)
Dict{Int64,Float64} with 2 entries:
2 => 100.0
1 => 10.0
```
Letting Julia assign the types automatically can cause errors later, for example when adding an entry whose value cannot be converted to the inferred value type.
Thus, it is good practice to specify the types of the dictionary ourselves.
To do so, we just indicate them in curly braces, `{}`, right after the `Dict` keyword:
`Dict{key type, value type}(key1 => value1, key2 => value2)`
``` {.julia}
julia> Dict{Int64,String}(1 => "Hello", 2 => "World")
Dict{Int64,String} with 2 entries:
2 => "World"
1 => "Hello"
```
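To see why specifying the types can matter, the sketch below compares a dictionary whose value type Julia infers as `Int64` with one declared as `Dict{String,Float64}`. The keys and values are made-up examples; `valtype` is a Base function, not covered in the text, that returns the value type of a dictionary. Storing `2.5` in the inferred dictionary throws an `InexactError`, because `2.5` cannot be converted exactly to an integer:
```julia
inferred = Dict("A" => 1, "B" => 2)                   # value type inferred as Int64
declared = Dict{String,Float64}("A" => 1, "B" => 2)   # integers are converted to Float64

declared["C"] = 2.5                                   # fine: 2.5 is already a Float64

failed = try
    inferred["C"] = 2.5                               # 2.5 cannot be converted exactly to Int64
    false
catch err
    err isa InexactError
end
println(valtype(inferred), " ", valtype(declared), " ", failed)
```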
Now let's look at some basic dictionary operations.
First, we will create a dictionary called `languages` that contains the names of programming languages as keys and their release years as values.
``` {.julia}
julia> languages = Dict{String,Int64}("Julia" => 2012, "Java" => 1995, "Python" => 1990)
Dict{String,Int64} with 3 entries:
"Julia" => 2012
"Python" => 1990
"Java" => 1995
```
To get the value associated with a key, we write the key between square brackets, `[]`.
``` {.julia}
julia> languages["Julia"]
2012
```
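Indexing with a key that is not in the dictionary throws a `KeyError`. The Base functions `haskey` and `get` (the latter with a default value) are common ways to handle possibly missing keys; they are not covered in the text, so treat this as a small supplementary sketch. The key `"Fortran"` is just an example of a missing key:
```julia
languages = Dict{String,Int64}("Julia" => 2012, "Java" => 1995, "Python" => 1990)

julia_year = languages["Julia"]                 # 2012
has_fortran = haskey(languages, "Fortran")      # false: no such key
fortran_year = get(languages, "Fortran", 0)     # returns the default, 0, instead of throwing

# Indexing with a missing key throws a KeyError.
missing_key_error = try
    languages["Fortran"]
    false
catch err
    err isa KeyError
end
sorted_names = sort(collect(keys(languages)))   # keys come in no particular order, so sort them
```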
We can easily add an element to the dictionary.
``` {.julia}
julia> languages["C++"] = 1985
1985
julia> languages
Dict{String,Int64} with 4 entries:
"Julia" => 2012
"Python" => 1990
"Java" => 1995
"C++" => 1985
```
We do something similar to modify a key's value:
``` {.julia}
julia> languages["Python"] = 1991
1991
julia> languages
Dict{String,Int64} with 4 entries:
"Julia" => 2012
"Python" => 1991
"Java" => 1995
"C++" => 1985
```
Notice that adding and modifying a value look exactly the same.
That is because each key in a dictionary is unique: assigning a value to a key that does not exist yet adds a new entry, while assigning a value to an existing key overrides the previous one.
To delete an element, we use the `delete!` function, which, following the bang convention, modifies the dictionary in place.
``` {.julia}
julia> delete!(languages, "Java")
Dict{String,Int64} with 3 entries:
"Julia" => 2012
"Python" => 1991
"C++" => 1985
```
To finish, let's see how to iterate over a dictionary.
``` {.julia}
julia> for (key, value) in languages
    println("$key was released in $value")
end
Julia was released in 2012
Python was released in 1991
C++ was released in 1985
```
Now that we have discussed the most important details of Julia's syntax, let's focus our attention on some of the packages in Julia's ecosystem.
## Julia's Ecosystem: Basic plotting and manipulation of DataFrames
Julia's ecosystem is composed of a variety of libraries focusing on technical domains such as Data Science (DataFrames.jl, CSV.jl, JSON.jl), Machine Learning (MLJ.jl, Flux.jl, Turing.jl) and Scientific Computing (DifferentialEquations.jl), as well as more general-purpose programming (HTTP.jl, Dash.jl).
We will now consider one of the libraries that will be accompanying us throughout the book to make visualizations, Plots.jl.
To install Plots.jl, we enter the Julia package manager mode by typing `]` at the REPL, as we saw earlier, and add the package by its name, without the `.jl` suffix:
``` {.julia}
julia> ]
(@v1.5) pkg> add Plots
```
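The same installation can also be done from ordinary Julia code, for example inside a script, through the `Pkg` standard library. A minimal sketch:

```julia
# Equivalent to typing `add Plots` in the package manager mode
using Pkg
Pkg.add("Plots")
```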
There are other great plotting packages, such as Gadfly.jl and VegaLite.jl, but Plots.jl is a good choice to get you started.
### Plotting with Plots.jl
Let's import the library with the `using` keyword and start making some plots.
We will plot the first ten numbers of the Fibonacci sequence using the `scatter()` function:
```{julia}
begin
    using Plots
    sequence = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
    scatter(sequence, xlabel="n", ylabel="Fibonacci(n)", color="purple", label=false, size=(450, 300))
end
```
The only essential argument of `scatter()` in the example above is the first one, `sequence`, which tells the function what data to plot.
The other arguments only customize the appearance of the figure.
We used `scatter()` because we want a discrete plot of our sequence.
If we wanted a continuous line instead, we could use `plot()`.
Let's apply it to our Fibonacci sequence:
```{julia}
plot(sequence, xlabel="x", ylabel="Fibonacci", linewidth=3, label=false, color="green", size=(450, 300))
```
```{julia}
begin
    plot(sequence, xlabel="x", ylabel="Fibonacci", linewidth=3, label=false, color="green", size=(450, 300))
    scatter!(sequence, label=false, color="purple", size=(450, 300))
end
```
In the example above, the call to `plot()` creates a new plot.
The `scatter!()` call then modifies that plot in place (as the `!` in its name indicates), drawing the points on top of the line.
Had we called `scatter()` instead, a new plot would have been created and the two would not have been drawn together.
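The same idea can be written with an explicit plot object. A minimal sketch: mutating functions such as `scatter!` and `plot!` also accept the plot to modify as their first argument, which avoids relying on the "current plot". The variable name `p` is arbitrary.

```julia
using Plots

sequence = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Create a line plot and keep a reference to it
p = plot(sequence, xlabel="x", ylabel="Fibonacci", linewidth=3, label=false, color="green", size=(450, 300))

# Add the points to that specific plot, in place
scatter!(p, sequence, label=false, color="purple")

# Evaluating `p` in the REPL or a notebook displays the combined figure
p
```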
A nice feature of Plots.jl is that it lets us switch between plotting backends.
There are various plotting packages in Julia, each with its own special features and aesthetic flavour.
Plots.jl integrates these plotting libraries and acts as a common interface to communicate with them easily.
By default, the `GR` backend is used; in fact, it is the plotting engine that generated the plots we have made so far.
At the time of writing, the most used and maintained backends are the already mentioned `GR`, `Plotly/PlotlyJS`, `PyPlot`, `UnicodePlots` and `InspectDR`.
The backend you choose will depend on the particular situation you are facing.
For a detailed explanation on backends, we recommend you visit the Julia Plots [documentation](https://docs.juliaplots.org/latest/backends/).
Throughout the book we will focus on the `GR` backend, but as a demonstration of how easy it is to switch from one backend to another, consider the code below.
The only thing added to the plotting code we have already used is the `pyplot()` call, which changes the backend (this requires the PyPlot.jl package to be installed).
If you have coded in Python before, this backend will feel familiar, since it is built on Python's Matplotlib.
```{julia}
begin
    pyplot()
    plot(sequence, xlabel="x", ylabel="Fibonacci", linewidth=3, label=false, color="green", size=(450, 300))
    scatter!(sequence, label=false, color="purple", size=(450, 300))
end
```
Analogously, we can use the `plotlyjs` backend (which requires the PlotlyJS.jl package), especially well suited for interactive plots.
```{julia}
begin
    plotlyjs()
    plot(sequence, xlabel="x", ylabel="Fibonacci", linewidth=3, label=false, color="green", size=(450, 300))
    scatter!(sequence, label=false, color="purple", size=(450, 300))
end
```
Each backend supports a different set of features, so some plots can be produced by one backend but not by another.
For example, not all backends support 3D plots.
The details are well explained in the Plots.jl documentation.
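After experimenting with other backends, it is easy to check which one is active and to return to the default. A minimal sketch using the Plots.jl functions `backend()` and `gr()`:

```julia
using Plots

# Switch explicitly to the GR backend (the default one)
gr()

# backend() returns the backend currently in use
println(backend())
```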
"
### Introducing DataFrames.jl
When dealing with any type of data in large quantities, it is essential to have a framework to organize and manipulate it in an efficient way.
If you have used Python before, you have probably come across the pandas package and its dataframes.
In Julia, the DataFrames.jl package follows the same idea.
Dataframes are objects designed to structure tabular data conveniently.
You can think of them as a table, a matrix or a spreadsheet.
By convention, each row of a dataframe is one observation, and each column contains all the values of a given variable across all observations.
In other words, within a single row, each column holds one realization of a variable.
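To make the convention concrete, here is a tiny illustrative dataframe built with the keyword-argument constructor of DataFrames.jl; the names and ages are made-up data. Each row is one person (an observation) and each column is one variable:

```julia
using DataFrames

# Hypothetical data: each row is one observation, each column one variable
people = DataFrame(name = ["Ana", "Luis", "Marta"], age = [31, 25, 40])

println(nrow(people))  # 3 observations
println(ncol(people))  # 2 variables
```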
Let's see how to construct a dataframe and load data into it.
There are many ways to accomplish this.
Suppose we have some data in a matrix and we want to organize it in a dataframe.
First, we create some 'fake data' and load it into a Julia DataFrame:
```{julia}
begin
    using DataFrames, Random
    Random.seed!(123)
    fake_data = rand(5, 5) # a 5x5 matrix of random values drawn uniformly from [0, 1)
    # :auto asks DataFrames.jl to generate the column names x1, x2, ...
    df = DataFrame(fake_data, :auto)
end
```
As you can see, the column names were initialized as `x1`, `x2`, ...
We probably want to give them more meaningful names.
To do this, we have the `rename!()` function.
Remember that this function's name ends with a bang (`!`), so it modifies the dataframe in place. Be careful!
Below, we rename the columns of our dataframe:
```{julia}
rename!(df, ["one", "two", "three", "four", "five"])
```
The first argument of the function is the dataframe we want to modify, and the second is an array of strings containing the new name of each column, in order.
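Two related options from DataFrames.jl, sketched below: the column names can be passed directly when building a dataframe from a matrix, and a single column can be renamed with an `"old" => "new"` pair. The new name `"first"` is only an example.

```julia
using DataFrames, Random

Random.seed!(123)
fake_data = rand(5, 5)

# Column names can be given directly when building the DataFrame from a matrix
df = DataFrame(fake_data, ["one", "two", "three", "four", "five"])

# A single column can be renamed in place with an "old" => "new" pair
rename!(df, "one" => "first")
println(names(df))
```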
Another way to create a dataframe is to pass each column as a keyword argument, where the keyword is the column name and its value is an array or any other collection of data.
For example,
```{julia}
DataFrame(column1=1:10, column2=2:2:20, column3=3:3:30)
```
As you can see, the name of each keyword argument becomes the name of the corresponding column of the dataframe.
Furthermore, you can initialize an empty dataframe and add data to it later,
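Keyword arguments only allow column names that are valid Julia identifiers. DataFrames.jl also accepts `"name" => values` pairs, which allow arbitrary strings such as names with spaces. A minimal sketch with made-up column names:

```julia
using DataFrames

# Pairs of "column name" => values also build a DataFrame; this form allows
# column names that are not valid Julia identifiers, such as names with spaces
df_pairs = DataFrame("first column" => 1:3, "second column" => ["a", "b", "c"])
```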
```{julia}
begin
    df_ = DataFrame(Names = String[],
                    Countries = String[],
                    Ages = Int64[])
    df_ = vcat(df_, DataFrame(Names="Juan", Countries="Argentina", Ages=28))
end
```
We have used the `vcat()` function seen earlier to append the new row to the dataframe.
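Instead of concatenating a whole new dataframe, a row can also be appended in place with `push!`. A minimal sketch; the second row is made-up data:

```julia
using DataFrames

df_ = DataFrame(Names = String[], Countries = String[], Ages = Int64[])

# push! appends a row in place; a NamedTuple matches values to columns by name
push!(df_, (Names = "Juan", Countries = "Argentina", Ages = 28))
push!(df_, (Names = "Ana", Countries = "Chile", Ages = 34))
```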
You can also add a new column very easily,
```{julia}
begin
    df_.height = [1.72]
    df_
end
```
You can access data in a dataframe in various ways.
One way is by the column name.
For example,
```{julia}
df.three
```
```{julia}
df."three"
```
You can also index a dataframe as if it were a matrix.
Columns can be referred to either by their number or by their name,
```{julia}
df[1,:]
```
```{julia}
df[1:2, "one"]
```
```{julia}
df[3:5, ["two", "four", "five"]]
```
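A minimal sketch of two more indexing details, on a small made-up dataframe: columns can be selected by position, and the row selector matters when taking a whole column, since `:` returns a copy while `!` returns the column stored in the dataframe itself.

```julia
using DataFrames

df = DataFrame(one = [0.1, 0.7, 0.4], two = [1, 2, 3])

# Columns can be selected by position as well as by name
first_col = df[1:2, 1]          # same as df[1:2, "one"]

# df[:, "two"] returns a copy of the column, so changing it leaves df untouched
col_copy = df[:, "two"]
col_copy[1] = 100

# df[!, "two"] returns the stored column itself, without copying
col_direct = df[!, "two"]
```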
The column names can be retrieved with the `names()` function,
```{julia}
names(df)
```
Another useful tool for getting a quick overview of a dataframe, typically during exploratory analysis, is the `describe()` function.
It outputs summary statistics for each column, as you can see below,
```{julia}
describe(df)
```
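By default `describe()` computes a fixed set of statistics, but you can also pass the ones you want as extra arguments. A minimal sketch on a small made-up dataframe:

```julia
using DataFrames

df = DataFrame(one = [0.2, 0.6, 0.9], two = [1.0, 2.0, 3.0])

# describe accepts the statistics to compute as additional arguments
summary_stats = describe(df, :mean, :min, :max)
```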
To select data that satisfy certain conditions, you can use the `filter()` function.
Given a condition, this function discards all the rows for which the condition does not evaluate to `true`.
The condition is expressed as an anonymous function, passed as the first argument.
The second argument is the dataframe to which the filtering is applied.
In the example below, we keep only the rows whose value in the `one` column is greater than $0.5$.