Changing how we deal with Regularization (#55)
* change Matrix inversions for linear system

* correct docs to new inversion

* updated deprecated examples

* works well, but why

* allow users to invert input regularization

* format

* nicer formatting

* test pass, resolve typo, and format

* remove comment blocks
odunbar authored Apr 13, 2024
1 parent c399a6f commit 08ebbc6
Showing 13 changed files with 215 additions and 288 deletions.
6 changes: 3 additions & 3 deletions docs/src/parallelism.md
@@ -3,16 +3,16 @@
### Explicit bottlenecks
- By far the highest computational demand for high-dimensional and/or large data systems is the building of the system matrix, particularly the multiplication ``\Phi^T\Phi`` (for scalar output) and ``\Phi_{n,i,m}\Phi_{n,j,m}`` (for vector output). These can be accelerated by multithreading (see below).
- For a large number of features, the inversion `inv(factorization(system_matrix))` is noticeable, though typically still small in the regime `n_features << n_samples * output_dim`.
- For the vector case, the use of non-diagonal ``\Lambda_{i,j}`` for regularization cannot take advantage of certain matrix-tensor identities and may also cause substantial slow-down. (currently)
- For the vector case, the square output-dimensional regularization matrix ``\Lambda`` must be inverted. For high-dimensional output spaces, a diagonal approximation avoids this.
- Prediction bottlenecks are largely due to allocations and matrix multiplications. Please see our `predict!()` methods, which allow users to pass in preallocated arrays. This is very beneficial for repeated predictions.
- For systems without enough regularization, positive definiteness may need to be enforced. If done too often, this has a non-negligible cost, as it involves calculating eigenvalues of the non-p.d. matrix (in particular `abs(min(eigval))`, which is then added to the diagonal). It is better to add more regularization into ``\Lambda``; a sketch of the eigenvalue shift is given after this list.
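
A minimal sketch of that eigenvalue shift, assuming a plain symmetric matrix (the function `enforce_pd` and the tolerance `tol` are illustrative names, not part of the package API):

```julia
using LinearAlgebra

# Illustration only: make a symmetric matrix positive definite by shifting its
# diagonal by abs(min(eigval)) plus a small tolerance. Computing the smallest
# eigenvalue is the costly step referred to in the bullet above.
function enforce_pd(A::AbstractMatrix; tol = 1e-12)
    λ_min = eigmin(Symmetric(A))
    return λ_min > 0 ? A : A + (abs(λ_min) + tol) * I
end
```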

### Implicit bottlenecks
- The optimization of hyperparameters is a costly operation that may require construction and evaluation of thousands of `RandomFeatureMethod`s. The dimensionality (i.e. complexity) of this task will depend on how many free parameters are taken to be within a distribution though. ``\mathcal{O}(1000s)`` parameters may take several hours to optimize (on multiple threads).
- The optimization of hyperparameters is a costly operation that may require construction and evaluation of thousands of `RandomFeatureMethod`s. The dimensionality (i.e., complexity) of this task depends, however, on how many free parameters are taken to be within a distribution. ``\mathcal{O}(1000s)`` parameters may take hours to optimize, even on multiple threads.

# Parallelism/memory
- We make use of [`Tullio.jl`](https://github.com/mcabbott/Tullio.jl) which comes with in-built memory management. We are phasing out our own batches in favour of using this for now.
- [`Tullio.jl`(https://github.com/mcabbott/Tullio.jl) comes with multithreading routines, Simply call the code with `julia --project -t n_threads` to take advantage of this. Depending on problem size you may wish to use your own external threading, Tullio will greedily steal threads in this case. To prevent this interference we provide a keyword argument:
- [`Tullio.jl`](https://github.com/mcabbott/Tullio.jl) comes with multithreading routines. Simply call the code with `julia --project -t n_threads` to take advantage of this. Depending on problem size you may wish to use your own external threading; Tullio will greedily steal threads in this case. To prevent this interference we provide a keyword argument:
```julia
RandomFeatureMethod(... ; tullio_threading=false) # serial threading during the build and fit! methods
predict(...; tullio_threading=false) # serial threading for prediction
4 changes: 2 additions & 2 deletions docs/src/setting_up_scalar.md
@@ -111,9 +111,9 @@ The keyword `feature_parameters = Dict("sigma" => a)`, can be included to set th

The `RandomFeatureMethod` sets up the training problem to learn coefficients ``\beta\in\mathbb{R}^m`` from input-output training data ``(x,y)=\{(x_i,y_i)\}_{i=1}^n``, ``y_i \in \mathbb{R}`` and parameters ``\theta = \{\theta_j\}_{j=1}^m``:
```math
(\frac{1}{m}\Phi^T(x;\theta) \Phi(x;\theta) + \lambda I) \beta = \Phi^T(x;\theta)y
(\frac{1}{\lambda m}\Phi^T(x;\theta) \Phi(x;\theta) + I) \beta = \frac{1}{\lambda}\Phi^T(x;\theta)y
```
Where ``\lambda I`` is a regularization.
Here ``\lambda > 0`` is a regularization parameter; the system has been divided through by ``\lambda``, which turns the regularization term into the ``+I``. The equation is written in this way to be consistent with the vector-output case.
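
As a rough illustration only (assumed shapes, not the package's internal solver): with a feature matrix ``\Phi`` of size `n × m`, the system above could be solved directly as follows.

```julia
using LinearAlgebra

# Sketch: solve (Φ'Φ/(λm) + I) β = Φ'y / λ for the coefficients β.
# Φ is n × m (samples × features), y has length n, λ > 0.
function solve_scalar_coeffs(Φ::AbstractMatrix, y::AbstractVector, λ::Real)
    m = size(Φ, 2)
    lhs = (Φ' * Φ) ./ (λ * m) + I      # m × m, symmetric positive definite for λ > 0
    rhs = (Φ' * y) ./ λ
    return cholesky(Symmetric(lhs)) \ rhs
end
```
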
```julia
rfm = RandomFeatureMethod(
sf;
18 changes: 5 additions & 13 deletions docs/src/setting_up_vector.md
@@ -115,20 +115,11 @@ The keyword `feature_parameters = Dict("sigma" => a)`, can be included to set th

## Method

The `RandomFeatureMethod` sets up the training problem to learn coefficients ``\beta\in\mathbb{R}^m`` from input-output training data ``(x,y)=\{(x_i,y_i)\}_{i=1}^n``, ``y_i \in \mathbb{R}^p`` and parameters ``\theta = \{\theta_j\}_{j=1}^m``. Regularization is provided through ``\Lambda = \lambda \otimes I_{n\times n}`` from a user-provided `p-by-p` positive-definite regularization matrix ``\lambda``. In Einstein summation notation the method solves the following system
The `RandomFeatureMethod` sets up the training problem to learn coefficients ``\beta\in\mathbb{R}^m`` from input-output training data ``(x,y)=\{(x_i,y_i)\}_{i=1}^n``, ``y_i \in \mathbb{R}^p`` and parameters ``\theta = \{\theta_j\}_{j=1}^m``. Regularization is provided through ``\Lambda``, a user-provided `p-by-p` positive-definite regularization matrix (or scaled identity). In Einstein summation notation the method solves the following system
```math
(\frac{1}{m}\Phi_{n,i,p}(x;\theta) \Phi_{n,j,p}(x;\theta) + R_{i,j}) \beta_j = \Phi(x;\theta)_{n,i,p}y_{n,p}
(\frac{1}{m}\Phi_{n,i,p}(x;\theta) \, [I_{n,m} \otimes \Lambda^{-1}_{p,q}]\, \Phi_{n,j,q}(x;\theta) + I_{i,j}) \beta_j = \Phi_{n,i,p}(x;\theta)\,[ I_{n,m} \otimes \Lambda^{-1}_{p,q}]\, y_{n,q}
```
and where the regularization ``R`` is built from ``\Lambda`` by solving the non-square system
```math
\Phi_{m,j,q} R_{i,j} = \Phi_{n,i,p} \Lambda_{p,q,n,m}
```
In the case the ``\lambda`` is provided as a `Real` or `UniformScaling` then ``R`` is (consistently) replaced with ``\lambda I_{m\times m}``. The authors typically recommend replacing non-diagonal ``\lambda`` with ``\mathrm{det}(\lambda)^{\frac{1}{p}}I``, which often provides a reasonable approximation, as there is additional computational expense through solving this un-batched system of equations.

!!! note "The nonsquare system"
If one chooses to solve for ``R``, note that it is also only defined for dimensions ``m < np``. For stability we also perform the following: we use a truncated SVD for when the rank of the system is less than ``m``, and we also symmetrize the resulting matrix.


So long as ``\Lambda`` is easily invertible, this system is solved efficiently.
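
As a rough illustration (assumed array shapes; not the package implementation), the system above could be assembled as follows when ``\Lambda`` is cheap to invert:

```julia
using LinearAlgebra

# Sketch: Φ is n × m × p (samples × features × outputs), y is n × p, Λ is p × p p.d.
# Build the m × m system (1/m) Σ Φ[n,i,p] Λ⁻¹[p,q] Φ[n,j,q] + I and its right-hand side.
function solve_vector_coeffs(Φ::AbstractArray{<:Real,3}, y::AbstractMatrix, Λ::AbstractMatrix)
    n, m, p = size(Φ)
    Λinv = inv(Λ)                    # cheap when Λ is small or diagonal
    lhs = zeros(m, m) + I
    rhs = zeros(m)
    for k in 1:n
        Φk = Φ[k, :, :]              # m × p block for sample k
        lhs .+= (Φk * Λinv * Φk') ./ m
        rhs .+= Φk * (Λinv * y[k, :])
    end
    return Symmetric(lhs) \ rhs      # coefficients β
end
```
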

```julia
rfm = RandomFeatureMethod(
@@ -141,7 +132,8 @@ One can select batch sizes to balance the space-time (memory-process) trade-off.

!!! warning "Conditioning"
The problem is ill-conditioned without regularization.
If you encounters a Singular or Positive-definite exceptions, try increasing the constant scaling `regularization`
If you encounter Singular or Positive-definite exceptions or warnings, try increasing the constant scaling `regularization`.
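
For example (values purely illustrative, and `vff` standing in for an already-constructed vector feature object):

```julia
using LinearAlgebra: I
using RandomFeatures  # assumed module name for this package

# Retry the fit with a larger scaled-identity regularization, e.g. 1e-4 → 1e-2
rfm = RandomFeatureMethod(vff, regularization = 1e-2 * I)
```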




11 changes: 6 additions & 5 deletions examples/Learn_hyperparameters/1d_to_1d_regression_direct.jl
@@ -86,7 +86,7 @@ function calculate_mean_and_coeffs(
batch_inputs = batch_generator(itest, test_batch_size, dims = 2) # input_dim x batch_size

#we want to calc lambda/m * coeffs^2 in the end
pred_mean = predictive_mean(rfm, fitted_features, DataContainer(itest))
pred_mean, features = predictive_mean(rfm, fitted_features, DataContainer(itest))
scaled_coeffs = sqrt(1 / n_features) * get_coeffs(fitted_features)
return pred_mean, scaled_coeffs

@@ -152,13 +152,13 @@
## Begin Script, define problem setting

println("starting script")
date_of_run = Date(2022, 9, 14)
date_of_run = Date(2024, 4, 10)


# Target function
ftest(x::AbstractVecOrMat) = exp.(-0.5 * x .^ 2) .* (x .^ 4 - x .^ 3 - x .^ 2 + x .- 1)

n_data = 20 * 4
n_data = 20 * 2
noise_sd = 0.1

x = rand(rng, Uniform(-3, 3), n_data)
@@ -285,7 +285,7 @@ for (idx, sd, feature_type) in zip(collect(1:length(σ_c)), σ_c, feature_types)
end

push!(rfms, RandomFeatureMethod(sf, batch_sizes = batch_sizes, regularization = regularizer))
push!(fits, fit(rfms[end], io_pairs_test, decomposition_type = "qr"))
push!(fits, fit(rfms[end], io_pairs_test))
end

if PLOT_FLAG
@@ -311,7 +311,8 @@
for (idx, rfm, fit, feature_type, clr) in zip(collect(1:length(σ_c)), rfms, fits, feature_types, clrs)

pred_mean, pred_cov = predict(rfm, fit, DataContainer(xplt))
pred_cov = max.(pred_cov, 0.0)
pred_cov = pred_cov[1, 1, :] #just variances
pred_cov = max.(pred_cov, 0.0) #not normally needed..
plot!(
xplt',
pred_mean',
@@ -157,7 +157,7 @@ end
## Begin Script, define problem setting

println("starting script")
date_of_run = Date(2022, 9, 14)
date_of_run = Date(2024, 4, 10)


# Target function
@@ -254,7 +254,7 @@ for (idx, type) in enumerate(feature_types)
string(i) *
", Error: " *
string(err[i]) *
", with parameter mean" *
", with parameter mean " *
string(mean(transform_unconstrained_to_constrained(priors, get_u_final(ekiobj[1])), dims = 2)[:, 1]),
" and sd ",
string(sqrt.(var(transform_unconstrained_to_constrained(priors, get_u_final(ekiobj[1])), dims = 2))[:, 1]),
@@ -298,7 +298,7 @@ for (idx, sd, feature_type) in zip(collect(1:length(σ_c)), σ_c, feature_types)
end

push!(rfms, RandomFeatureMethod(sf, batch_sizes = batch_sizes, regularization = regularizer))
push!(fits, fit(rfms[end], io_pairs_test, decomposition_type = "qr"))
push!(fits, fit(rfms[end], io_pairs_test))
end

if PLOT_FLAG
11 changes: 6 additions & 5 deletions examples/Learn_hyperparameters/nd_to_1d_regression_direct.jl
@@ -93,13 +93,13 @@ function calculate_mean_and_coeffs(

# build and fit the RF
rfm = RFM_from_hyperparameters(rng, l, regularizer, n_features, batch_sizes, input_dim)
fitted_features = fit(rfm, io_train_cost, decomposition_type = "qr")
fitted_features = fit(rfm, io_train_cost)

test_batch_size = get_batch_size(rfm, "test")
batch_inputs = batch_generator(itest, test_batch_size, dims = 2) # input_dim x batch_size

#we want to calc 1/var(y-mean)^2 + lambda/m * coeffs^2 in the end
pred_mean = predictive_mean(rfm, fitted_features, DataContainer(itest))
pred_mean, features = predictive_mean(rfm, fitted_features, DataContainer(itest))
scaled_coeffs = sqrt(1 / n_features) * get_coeffs(fitted_features)
return pred_mean, scaled_coeffs

@@ -169,7 +169,7 @@

## Begin Script, define problem setting
println("Begin script")
date_of_run = Date(2022, 9, 14)
date_of_run = Date(2024, 4, 10)
input_dim = 6
println("Number of input dimensions: ", input_dim)

@@ -344,7 +344,7 @@ sff = ScalarFourierFeature(n_features_test, feature_sampler)
#second case with batching

rfm_batch = RandomFeatureMethod(sff, batch_sizes = batch_sizes, regularization = regularizer)
fitted_batched_features = fit(rfm_batch, io_pairs_test, decomposition_type = "qr")
fitted_batched_features = fit(rfm_batch, io_pairs_test)

if PLOT_FLAG
#plot slice through one dimensions, others fixed to 0
@@ -362,7 +362,8 @@
yslice = ftest_nd_to_1d(xslicenew)

pred_mean_slice, pred_cov_slice = predict(rfm_batch, fitted_batched_features, DataContainer(xslicenew))
pred_cov_slice = max.(pred_cov_slice, 0.0)
pred_cov_slice = max.(pred_cov_slice[1, 1, :], 0.0)

plt = plot(
xrange,
yslice',
@@ -93,7 +93,7 @@ function calculate_mean_cov_and_coeffs(

# build and fit the RF
rfm = RFM_from_hyperparameters(rng, l, regularizer, n_features, batch_sizes, input_dim)
fitted_features = fit(rfm, io_train_cost, decomposition_type = "svd")
fitted_features = fit(rfm, io_train_cost)

test_batch_size = get_batch_size(rfm, "test")
batch_inputs = batch_generator(itest, test_batch_size, dims = 2) # input_dim x batch_size
@@ -174,7 +174,7 @@

## Begin Script, define problem setting
println("Begin script")
date_of_run = Date(2022, 9, 22)
date_of_run = Date(2024, 4, 10)
input_dim_list = [8]

for input_dim in input_dim_list
@@ -368,7 +368,7 @@ for input_dim in input_dim_list
#second case with batching

rfm_batch = RandomFeatureMethod(sff, batch_sizes = batch_sizes, regularization = regularizer)
fitted_batched_features = fit(rfm_batch, io_pairs_test, decomposition_type = "svd")
fitted_batched_features = fit(rfm_batch, io_pairs_test)

if PLOT_FLAG
#plot slice through one dimensions, others fixed to 0
@@ -93,7 +93,7 @@ function calculate_mean_cov_and_coeffs(

# build and fit the RF
rfm = RFM_from_hyperparameters(rng, l, regularizer, n_features, batch_sizes, input_dim)
fitted_features = fit(rfm, io_train_cost, decomposition_type = "qr")
fitted_features = fit(rfm, io_train_cost)

test_batch_size = get_batch_size(rfm, "test")
batch_inputs = batch_generator(itest, test_batch_size, dims = 2) # input_dim x batch_size
@@ -173,7 +173,7 @@

## Begin Script, define problem setting
println("Begin script")
date_of_run = Date(2022, 9, 14)
date_of_run = Date(2024, 4, 10)
input_dim = 8
println("Number of input dimensions: ", input_dim)

@@ -352,7 +352,7 @@ sff = ScalarFourierFeature(n_features_test, feature_sampler)
#second case with batching

rfm_batch = RandomFeatureMethod(sff, batch_sizes = batch_sizes, regularization = regularizer)
fitted_batched_features = fit(rfm_batch, io_pairs_test, decomposition_type = "svd")
fitted_batched_features = fit(rfm_batch, io_pairs_test)

if PLOT_FLAG
#plot slice through one dimensions, others fixed to 0
@@ -368,9 +368,8 @@
xslicenew[direction, :] = xrange

yslice = ftest_nd_to_1d(xslicenew)

pred_mean_slice, pred_cov_slice = predict(rfm_batch, fitted_batched_features, DataContainer(xslicenew))
pred_cov_slice[1, 1, :] = max.(pred_cov_slice[1, 1, :], 0.0)
pred_cov_slice = max.(pred_cov_slice[1, 1, :], 0.0)
plt = plot(
xrange,
yslice',
@@ -384,7 +383,7 @@
plot!(
xrange,
pred_mean_slice',
ribbon = [2 * sqrt.(pred_cov_slice[1, 1, :]); 2 * sqrt.(pred_cov_slice[1, 1, :])]',
ribbon = [2 * sqrt.(pred_cov_slice); 2 * sqrt.(pred_cov_slice)]',
label = "Fourier",
color = "blue",
)
