`apply()` is roughly 4x faster than the two `for`-loop approaches, as it avoids the overhead of an explicit Python `for` loop.
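For reference, the sketch below shows what the row-wise `apply()` pattern looks like. The column names (`x`, `y`) and the body of `pythagoras()` are placeholders chosen for illustration; the lesson's actual definitions sit outside this excerpt.

```python
import numpy as np
import pandas as pd

# Illustrative data: column names are assumed for this sketch.
df = pd.DataFrame({"x": np.random.rand(10_000), "y": np.random.rand(10_000)})

def pythagoras(row):
    # Called once per row; 'row' arrives as a pandas Series.
    return (row["x"] ** 2 + row["y"] ** 2) ** 0.5

# axis=1 passes each row (not each column) to the function.
df["distance"] = df.apply(pythagoras, axis=1)
```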
However, rows don't exist in memory as arrays (columns do!), so `apply()` does not take advantage of NumPy's vectorisation. You may be able to go a step further and avoid explicitly operating on rows entirely by passing only the required columns to NumPy.
::::::::::::::::::::::::::::::::::::: challenge
We can extract the individual columns of the DataFrame. These are of type `pandas.Series`, which supports array broadcasting, just like a NumPy array.
Instead of using the `pythagoras(row)` function, can you write a vectorised version of this calculation?
It won't always be possible to take full advantage of vectorisation; for example, your calculation may involve conditional logic.
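One common way to keep a conditional calculation vectorised is `numpy.where()`, which evaluates both branches on whole columns and selects element-wise. The sketch below is illustrative only: the column names and the condition are assumptions, not part of this lesson's exercise.

```python
import numpy as np
import pandas as pd

# Assumed example data; column names are placeholders.
df = pd.DataFrame({"x": [3.0, -1.0, 4.0], "y": [4.0, 2.0, -3.0]})

# A row-wise version might use an if/else inside the loop. The vectorised
# equivalent computes both outcomes on whole columns and picks per element.
df["distance"] = np.where(
    (df["x"] > 0) & (df["y"] > 0),         # condition, evaluated element-wise
    np.sqrt(df["x"] ** 2 + df["y"] ** 2),  # value where the condition is True
    np.nan,                                # value where the condition is False
)
```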
An alternative approach is to convert your DataFrame to a Python dictionary using `to_dict(orient='index')`. This creates a nested dictionary, where each row becomes an inner dictionary keyed by column name. The rows can then be processed with a list comprehension.
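A minimal sketch of what that might look like follows; the column names and the distance calculation are placeholders for illustration, not the lesson's actual code.

```python
import pandas as pd

# Assumed example data; column names are placeholders.
df = pd.DataFrame({"x": [3.0, 6.0, 5.0], "y": [4.0, 8.0, 12.0]})

# to_dict(orient='index') returns {row_label: {column: value, ...}, ...}
rows = df.to_dict(orient="index")

# Each 'row' here is a plain dict, so per-row access is cheaper than
# constructing a pandas Series for every row.
df["distance"] = [
    (row["x"] ** 2 + row["y"] ** 2) ** 0.5 for row in rows.values()
]
```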