Scientific Computing is an extra course I teach in the JetBrains and Constructor University (Bremen) Software, Data and Technology bachelor program in the fall semester of the 2024-25 academic year. This repository will contain the course materials (lecture notes, notebooks), available under the MIT license.
The course aims to bridge the gap between pure mathematics courses and machine learning. We will discuss various concepts, some already known to the students and some new, in an applied and computational context. We will also learn how to use various programming tools to interact with mathematical objects and visualize them.
The students are expected to know Calculus (of one and many variables), Linear Algebra and be able to program in Python.
Each topic takes approximately one week (possibly more): two lessons of 75 minutes each.
Jupyter notebooks: pro and contra, best practices. Package managers: pip and conda. Virtual environments. Reproducibility issues. Basic numerical and data science packages: numpy, scipy, matplotlib, pandas. Introduction to numpy: array vs. list, why do we need arrays, idea of vectorized calculations. Plotting of basic graphs with matplotlib.pyplot. Interactive widgets, %matplotlib widget. Animations with IPython.display.
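To illustrate the array vs. list distinction and the idea of vectorized calculations, here is a minimal sketch. The variable names are illustrative and not taken from the course materials.

```python
import numpy as np

values = list(range(1, 6))

# Pure-Python approach: an explicit loop over list elements.
squares_list = [v * v for v in values]

# Vectorized approach: one expression acts on the whole array at once.
arr = np.array(values)
squares_arr = arr * arr

# The operator `*` means different things: a list is repeated,
# while an array is multiplied elementwise.
repeated = values * 2   # list of length 10
doubled = arr * 2       # array with every element doubled
```

Both approaches give the same squares. The vectorized one runs its loop inside compiled numpy code, which is the main reason arrays are preferred for numerical work.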
Numerical types in Python. int vs. np.int64. Floating point arithmetic. NaN and Inf. 0.1 + 0.2 != 0.3. Numerical precision. Machine epsilon. Numerical (in)stability. Example: numerical derivative. Calculation in logarithms, logsumexp, example: softmax.
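A short sketch of the floating point effects listed above, followed by a softmax computed in logarithms. The helper functions are written by hand for illustration and are assumptions of this example, not course code.

```python
import math
import numpy as np

# 0.1 and 0.2 have no exact binary representation, so the sum is slightly off.
exact_equal = (0.1 + 0.2 == 0.3)
close_enough = math.isclose(0.1 + 0.2, 0.3)

# Machine epsilon for float64: the gap between 1.0 and the next representable number.
eps = np.finfo(np.float64).eps

def logsumexp(x):
    """Compute log(sum(exp(x))) stably by shifting by the maximum."""
    x = np.asarray(x, dtype=np.float64)
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def softmax(x):
    """Softmax computed in logarithms to avoid overflow."""
    x = np.asarray(x, dtype=np.float64)
    return np.exp(x - logsumexp(x))

# A naive np.exp(logits) would overflow to inf for values this large.
logits = np.array([1000.0, 1001.0, 1002.0])
probs = softmax(logits)
```

Subtracting the maximum before exponentiating keeps every exponent non-positive, so the result stays finite and still sums to one.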
Shape, reshaping data. Advanced indexing, np.where. Broadcasting rules. Matrix multiplication. np.linalg. How to get rid of for loops with numpy magic. Memory cost of vectorization. Processing by batch.
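As an illustration of replacing for loops with broadcasting, the sketch below computes pairwise distances between points in two ways. The data is random and the names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(5, 2))   # 5 points in the plane

# Loop version: fill the distance matrix one entry at a time.
n = points.shape[0]
dist_loop = np.empty((n, n))
for i in range(n):
    for j in range(n):
        dist_loop[i, j] = np.sqrt(np.sum((points[i] - points[j]) ** 2))

# Broadcasting version: shapes (5, 1, 2) and (1, 5, 2) broadcast to (5, 5, 2).
diff = points[:, None, :] - points[None, :, :]
dist_vec = np.sqrt(np.sum(diff ** 2, axis=-1))

# np.where chooses elementwise between two alternatives based on a condition.
capped = np.where(dist_vec > 1.0, 1.0, dist_vec)
```

The intermediate array diff holds n * n * d numbers. This is the memory cost of vectorization: for large n it may not fit in memory, which is one motivation for processing by batch.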
pd.Series and pd.DataFrame. Constructing dataframes. Indexing. Queries. Grouping and aggregating. Concatenating and merging dataframes. Melting and pivoting.
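A small sketch of grouping, pivoting and melting on a made-up dataframe. The column names and values are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "student": ["A", "A", "B", "B"],
    "task": ["t1", "t2", "t1", "t2"],
    "points": [3, 5, 4, 2],
})

# Grouping and aggregating: total points per student.
totals = df.groupby("student")["points"].sum()

# Pivoting: long to wide, one row per student and one column per task.
wide = df.pivot(index="student", columns="task", values="points")

# Melting: wide back to long.
long = wide.reset_index().melt(
    id_vars="student", var_name="task", value_name="points"
)
```

Pivoting and melting are inverse operations up to row order, so the long table contains the same records as the original df.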
Introduction to plotting in Python. Example: gradient descent.
How to choose the best visualization method: case study.
Linear systems. Over- and underdetermined systems. Pseudoinverse, pseudosolution, normal pseudosolution. Geometrical interpretation. Matrix factorizations. Application of factorizations to solution of a linear system.
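For an overdetermined system, the pseudosolution can be obtained in several equivalent ways. The sketch below uses a made-up line-fitting problem and compares the pseudoinverse, a least-squares solver and a QR factorization.

```python
import numpy as np

# Overdetermined system: 4 equations, 2 unknowns (fit y = a + b * t).
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
A = np.column_stack([np.ones_like(t), t])

# Pseudosolution via the Moore-Penrose pseudoinverse.
x_pinv = np.linalg.pinv(A) @ y

# The same pseudosolution via a least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

# And via QR factorization: A = QR, then solve R x = Q^T y.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ y)

# Geometric interpretation: the residual is orthogonal to the columns of A.
residual = y - A @ x_pinv
```

All three results agree here because A has full column rank. The orthogonality of the residual to the column space of A is the geometric interpretation of the pseudosolution.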
Theory recap: optimization of a function of one variable, constrained optimization, Lagrange multipliers. Numerical optimization of a function of several variables. Gradient descent. Example: gradient descent near a minimum with an ill-conditioned Hessian. Newton's method. Optimization of matrix-vector functions. Optimization out of the box: scipy.optimize.
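The effect of an ill-conditioned Hessian on gradient descent can be seen on a simple quadratic. This is a minimal sketch with a hypothetical Hessian and step size, not the example used in the lectures.

```python
import numpy as np

# Quadratic f(x) = 0.5 * x^T H x with condition number 100.
H = np.diag([1.0, 100.0])

def grad(x):
    """Gradient of the quadratic f."""
    return H @ x

def gradient_descent(x0, lr, steps):
    """Plain gradient descent with a fixed step size."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

x0 = [1.0, 1.0]

# The step size must stay below 2 / 100 for stability along the steep
# direction, which makes progress along the flat direction slow.
x_gd = gradient_descent(x0, lr=0.01, steps=100)

# Newton's method rescales the gradient by the inverse Hessian;
# for a quadratic it reaches the minimum in a single step.
x_newton = np.array(x0) - np.linalg.solve(H, grad(np.array(x0)))
```

After 100 steps the first coordinate of x_gd has only shrunk by the factor 0.99 ** 100, roughly 0.37, while the Newton step lands at the minimum immediately.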
Random variables. Discrete and continuous distributions. Cumulative distribution function. Expected value, variance. Generating (pseudo)random numbers. Seed. np.random and scipy.stats. Probability density function and histogram. Rejection sampling. Transformation of random variables. Entropy. KL-divergence.
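Rejection sampling can be sketched as follows. The target density 2x on [0, 1] and the uniform proposal are choices made for this example only.

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed for reproducibility

def target_pdf(x):
    """Density 2x on [0, 1], chosen for illustration."""
    return 2.0 * x

def rejection_sample(n, bound=2.0):
    """Draw n samples from target_pdf using uniform proposals on [0, 1].

    `bound` must be an upper bound of target_pdf on [0, 1].
    """
    samples = []
    while len(samples) < n:
        x = rng.uniform(0.0, 1.0)     # proposal point
        u = rng.uniform(0.0, bound)   # height under the envelope
        if u < target_pdf(x):         # accept points under the density curve
            samples.append(x)
    return np.array(samples)

samples = rejection_sample(10_000)
```

The exact mean of this distribution is 2/3, so samples.mean() should be close to that value. About half of the proposals are rejected, because the area under the density is half the area of the bounding rectangle.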
Independent and non-independent random variables. Joint distribution (continuous + continuous, continuous + discrete, discrete + discrete). Covariance and correlation. Variance-covariance matrix and correlation matrix. Conditional distribution, conditional expectation. Central limit theorem. Monte-Carlo integration. Mutual information.
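Monte Carlo integration replaces an integral with a sample average. As a hedged illustration, the sketch below estimates the area of the unit disk, which equals pi, from uniform points in a square. The sample size is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Uniform points in the square [-1, 1] x [-1, 1], which has area 4.
xy = rng.uniform(-1.0, 1.0, size=(n, 2))

# Indicator of the unit disk; its mean is the fraction of the square covered.
inside = np.sum(xy ** 2, axis=1) <= 1.0

pi_estimate = 4.0 * inside.mean()

# By the central limit theorem, the error shrinks like 1 / sqrt(n).
std_error = 4.0 * inside.std(ddof=1) / np.sqrt(n)
```

The standard error gives the typical size of the deviation of pi_estimate from the true value. Halving it requires four times as many samples.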
More on matrix factorization. Symmetric matrices and quadratic forms. Positive definite matrices, their properties. SVD decomposition, low-rank approximations. Example: covariance matrix as a quadratic form.
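A low-rank approximation via SVD can be sketched on a synthetic matrix. The matrix sizes, rank and noise level below are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# A 6 x 5 matrix of rank 2, plus small noise.
M = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))
M_noisy = M + 1e-3 * rng.normal(size=M.shape)

# Thin SVD: M_noisy = U @ diag(s) @ Vt, singular values in decreasing order.
U, s, Vt = np.linalg.svd(M_noisy, full_matrices=False)

# Keep only the k largest singular values.
k = 2
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius norm of the approximation error.
err = np.linalg.norm(M_noisy - M_k)
```

By the Eckart-Young theorem, M_k is the best rank-k approximation in the Frobenius norm, and err equals the square root of the sum of the squared discarded singular values.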
Random variables and samples. Statistical estimates. Consistency, unbiasedness. Likelihood. MLE.
Statistical hypothesis testing. Type I and type II errors. Student's t-test. Permutation test.
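A permutation test for a difference in means can be sketched as follows. The two samples are synthetic, and the function name and number of permutations are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=50)
b = rng.normal(loc=1.5, scale=1.0, size=50)   # sample with a shifted mean

def permutation_test(x, y, n_perm=2000):
    """Two-sided permutation test for a difference in means."""
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        # Under the null hypothesis the group labels are exchangeable.
        perm = rng.permutation(pooled)
        diff = abs(perm[:len(x)].mean() - perm[len(x):].mean())
        if diff >= observed:
            count += 1
    # Adding 1 to both counts keeps the estimated p-value strictly positive.
    return (count + 1) / (n_perm + 1)

p_value = permutation_test(a, b)
```

A small p_value means that a difference as large as the observed one rarely arises when the labels are shuffled, which is evidence against the null hypothesis of equal distributions.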
There will be a homework assignment every week. Each assignment consists of problems of varying difficulty, and each problem states how many points a complete and correct solution earns. Some problems are marked as bonus. The final grade (on a scale from 0 to 10) is the sum of all earned points (including points for bonus problems), multiplied by 10 and divided by the sum of all points available for non-bonus problems.
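The grading formula can be written as a short function. This is a sketch with a hypothetical function name and made-up point values. The text does not say whether a result above 10 is capped, so the sketch does not cap it.

```python
def final_grade(earned, max_regular):
    """Final grade from the formula above.

    earned: points earned on all problems, including bonus problems.
    max_regular: maximum points of the non-bonus problems only.
    """
    return 10 * sum(earned) / sum(max_regular)

# Three regular problems worth 2, 3 and 5 points; the student earns
# 2, 3 and 4 on them, plus 1 point on a bonus problem.
grade = final_grade([2, 3, 4, 1], [2, 3, 5])
```

In this example the bonus point compensates for the point lost on the third problem, so the grade is 10.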
Each homework assignment has a due date and a hard deadline. If the assignment is submitted after the due date but before the hard deadline, its score is multiplied by