-
Notifications
You must be signed in to change notification settings - Fork 1
/
06-Functional_Programming.Rmd
280 lines (180 loc) · 13.6 KB
/
06-Functional_Programming.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
# Functional Programming with R and Python
At this point we are ready to discuss the functional programming paradigm in R and Python. The other important paradigm, Object-Oriented, will be discussed in the following Chapter.
In using python in a functional paradigm, a script is built as a series of functions. Data is passed through one of more functions (ie. as series of functions) to carry out steps of a larger task set. When data is passed to a function to carry out some task or analysis on a dataset results may then printed out directly, stored in memory or on disk, or displayed in graphical form by passing the analysis to another set of functions such as those of matplotlib \index{matplotlib} for plotting etc.
A function is a set of stored programmatic statements stored in an accessible form for reuse. The function is called in R or Python by simply inserting it by name in a statement to be executed by one of the interpreters.
## Defining Functions
A function is defined in a specific format for each language. The definition of functions is straight forward with the previously introducted distinctions in code block syntax between R and Python. From the [overview chapter](Comparative_Overview.Rmd) it can be recalled that R uses C-like syntax to set out code blocks with brackets { }. Python uses pascalilike indentation to set out code blocks.
R by default returns the last value produced in the function. This can be controlled explicitly using the `return()` function. If the python function is required to provide return data, a return statement is used. If absent, by default, nothing is returned. In python parentheses may be required if the return 'value' is an expression or data stucture such as a list.
### R Scripting
```{r}
doArea <- function(radius){
pi * radius^2
}
```
With this R function the value of the result of the calculation is returned as result of the doArea function (see below). It can be assigned outside the function of used directly as a value.
### Python Scripting
```{python}
import math
def calcArea(radius):
area = math.pi * radius**2
return area
```
In the python script we must explicitly tell the function what to return or nothing will be.
## Calling and Using Functions
SO here we show direct printing of doArea result and assignment of the result plus 100 to x which is printed (but could be used as a value in some other way.)
### R Scripting
```{r}
radius = 4
print(doArea(radius))
x = doArea(4) + 100
print(x)
```
So the area is 50.255 (arbitary units) but x is 150.255 (units).
Lets see how we would do the same thing with python.
### Python Scripting
```{python}
import math
def calcArea(radius):
area = math.pi * radius**2
return area
print(calcArea(4))
```
Now, consider again the differences in how the functions were called to be used. COnsider first that the book is set-up as written in rMarkdown which is running in an instance of R. The memory of the code chunk and the underling R which rmarkdown is running are the same. In python terms, the namespace is the same. This is not the case with the python code chunks. We have to explicitly pass to (and retrieve from) the R namespace \index{namespace} for it to be accessible to the python chunk, running in it's on (python) namespace. As a result we had to redefine the function in the same namespace in which we were calling it.
With R code chunks the R memory space is open throughout but the python chunks name space is closed each time the python chunk closes. This is a feature of how I am using the R environment to write this book, not of python per se.
Further, recalling we are writing scripts, the print statements are required for both to display the result. We could have used the cat() function of course to display the R output as well.
```{r}
cat(x) # x is still in R memory here
```
Note that the result number [1] is not printed with R using cat() or the python print() functions.
There are more subtlies of memory management in calling one language from inside the other. These will come up below but for now knowing how things behave in the R/rMarkdown environment as above will cover our memory space issues.
## Writing Function Examples in R and Python
#### Example 1 A Surveyor's Function
Lets write a function in R and Python to calculate the height of a tower knowing the angle of elevation at a given distance from the base. This would be something we might need when sighting through a surveyors transit.
Given a distance from the base of 1000 meters, how high is a tower whose top was measured at an angle of 25 degrees at that distance.
```{r}
d = 1000 # distance in meters to base
ad = 25 # angle in degrees
# write a function to convert degrees to radians
deg2rad = function(angledeg){
angledeg * (pi/180)
}
# write function to calcular height from angle in degrees
ht = function(d, ad){
h = d*tan(deg2rad(ad))
}
# now call and output function with distance d and angle ad in degrees
cat("Tower Height in meters is: ", ht(d, ad), "\n")
```
So we have written 2 functions, one to convert degrees to radians, that is the expected measure of the angle going into ttigonometric functions of R. Then we wrote a function to accept degrees input and give the height in meters (the input distance units) The function definitions use the keyword 'function, followed by zero or more arguments and then the code block of the function is set out by brace brackets '{ }'. The function is assigned to a name like a variable.
Lets do it with python now,
```{python}
# set variables to problem givens
d = 1000 # distance in meters to base
ad = 25 # angle in degrees
# write a function to convert degrees to radians
def deg2rad(angledeg):
import math
return(angledeg * (math.pi/180))
# write function to calcular height from angle in degrees
def ht(d, ad):
import math
return (d*math.tan(deg2rad(ad)))
# now call and output function with distance d and angle ad in degrees
print("The height in meters of the tower is :", ht(d, ad))
```
In the python declarations the definition starts with the keyword 'def' a name and zero or more arguments and there is a colon ':' at the end of the first line. The code block of the function is also indented rather than set out by brace brakets '{ }'. The first line is the signature of the function.
Both functions return a value, typically the last value in the calculation chain for R but as an explicitly called return for python. Note for the two python functions one has a space after return and the other doesn't. This is because the bracket do not contain return parmeters but group the expression to be returned.
Also as previously alluded to pi in R is a core CONSTANT but in python it is a CONSTANT in the math module which must be imported to use it. We could have just explicitly imported only pi at the start and referred to it directly by name as in R.
```
# Python code
print(pi)
Gives an error:
Traceback (most recent call last):
File "C:\Users\medma\AppData\Local\Temp\RtmpYlYLtl\chunk-code-148c67392482.txt", line 1, in
<module>
print(pi)
NameError: name 'pi' is not defined
```
```{python}
from math import pi
print(pi)
```
Once again you must know where to go to get some constants and functions with python which are built-in in R. Check out the math module \index{math module} in the standard python library^3^. \index{standard python library}
## Using the Core R and Python Standard Libraries
\index{syntax}
Core R distributions currently install with a base set of function packages. The Table in the [Appendix 3](17-Appendix_3_Packages.Rmd) lists these.
The python standard libraries contain alot of modules which can be imported. They are grouped in modules by functionality. The functions contained in the modules can be looked up in the [Official Documentation](https://docs.python.org/3.7/) online or referring to the Hellmann^2^ book.
As done for the R base system in the Appendix, I also list the groups of python modules of the Standard Library in the Appendix 3 by kind as in the Python Documentation^3^, but generally including only occasional examples of the many functions each contain. Again a more exhaustive handling is available as indicated.
It's a good time to dive into the functions of the core libraries and try some more sophisticated R and Python tasks. We'll first compare loading and saving datasets in both languages. R has file functions for saving dataframes as comma separated variables, csv, files. We will then load this data back into R and python to compare these tasks in each.
### R Scripting
The \index{dataframe} \index{airquality} is a built in dataset. Rather than display the whole tales head(n) \index{head(n)} displays the first n lines (default n=6). Similarly tail(n) \index{tail(n)} would display the last n lines. The named dataframe 'table' is created from airquality data. Then we tell R to write a file names airquality.csv with write.csv \index{write.csv}. R automatically closes the file after writing is complete.
```{r}
# loading dataset from base R called
head(airquality)
table = airquality
write.csv(table,file="airquality.csv")
```
### Python Scripting
We need the io module for the file access function. The open() function opens the file "airquality.csv" for reading only ("r") and creates a filehandle, we called file. Now, file is a handle object so here is a first nod to OOP dotted function syntax to read a line from it. Something more functionally intuitive, like `readline(file)` will yield an error. Because calling of R function use dot syntax calls this is not new to us here. Next chapter moves fully to OOP paradigm.
Then we print out what we read - a single line from file (in fact, the first line)
```{python}
# Needs io module functions
import io
#load csv data with Python
file = open("airquality.csv", "r")
aline = file.readline()
print(aline)
aline = file.readline()
print(aline)
file.close
```
When we read a line again, we read the next line. Python readline maintains a pointer it moves as we continue to readlines. The Two lines we printer were the column names as we saved them and a line of comma separated values.
Lets read the airquality.csv file back into R assigning a name "newtable" to the dataframe the read.csv \index{read.csv} command creates to store the data loaded.
```{r}
# Load csv data with R
newtab = read.csv(file="airquality.csv", header=TRUE)
head(newtab)
```
R automatically loads the csv data back into a dataframe. However, the python readline reads the data as a data stream (a stream of characters) which we told it to store in the aline variable. python does not in this case parse the stream int any special structure - that is left to us to do.
We see something else too. R wrote the row numbers to the file (because we didn't say otherwise) and assigned a new set again when we loaded the data again.
### More Function Examples
Basic plotting functions are part of the r-base system we can easily plot a function,
\index{syntax}
```{r}
# Create a function to make a line plot or squares of x between 1 and 8
linePlotMySquares <-function(type = "l", ...) {
x = c(1,2,3,4,5,6,7,8)
y = x**2
plot(x, y, type = type, ...)
}
# Call our function
linePlotMySquares()
```
Python does not have a basic plotting function and the Standard Library has not provided one either. You casn write your own but there is a popular python library from the Scipy \index{Scipy} folks called Matplotlib which once installed from the shell with PIP \index{PIP},
```{}
> pip install matplotlib
OR
> python -m pip install matplotlib
```
is available for import, and it contains a module called pyplot which has a simple plot function
\index{syntax}
```{python}
from matplotlib import pyplot as plt # import pyplot library module from matplotlib
x = list(range(1,11)) # make a sequence of numbers from 1 through 8, called x
y = [0,] # create a list, must have at least one element to concat to
for val in x: # loop through list x to calculate squares
y = y + [val**2,] # concatinate each square to list y as calculated
y = y[1:11] # select the values 1 to 8 from y, call them y
plt.plot(x,y) # plot the x vs y pairs
plt.show() # show the plot
```
There are a few things to discuss here. First is the structure of the matplotlib library. Python code for import are module files (which end in the .py extension like any python scripts or programs).
> A script or program is generally a directly executable piece of code, run by itself. A module is generally a library, imported by other pieces of code.
> Note that there's no internal distinction -- both are executable and importable, although library code often won't do anything (or will just run its unit tests) when executed directly and importing code designed to be a script will cause it to execute, hence the common if __name__ == "__main__" test.^17^
So what we are doing with `from matplotlib import pyplot as plt` is saying: "from the file matploylib \index{matplotlib}(.py) \index{pyplot} import the functions in the pyplot class and call the namespace containing them plt" \index{namespace}so that we can call them.
**Note:** PIP \index{PIP} is a module, but runs independently from the command line because it contains script with a `__main__` function accessed via pip3.exe to setup and start the process.
It also runs from command line as a module executed by python with the -m switch, by
```{}
python -m pip install <some package>
```
the -m switch \index{-m switch} tells python to run the module as a script. Most library modules are not executable as python scripts or if they are, run self-tests called by a main() function.