3 changes: 2 additions & 1 deletion .github/workflows/ci.yml
@@ -21,7 +21,8 @@ jobs:
- name: Install Python dependencies
run: |
python3 -m pip install --upgrade pip
- python3 -m pip install numpy scipy matplotlib pytest pytest-cov future torch gpyopt scikit-learn scikit-learn-extra
+ python3 -m pip install numpy scipy matplotlib pytest pytest-cov future torch bayesian-optimization scikit-learn packaging
+ python3 -m pip install scikit-learn-extra || echo "scikit-learn-extra installation failed but it's optional"

- name: Test with pytest
env:
4 changes: 2 additions & 2 deletions .github/workflows/sphinx-build.yml
@@ -15,8 +15,8 @@ jobs:
- name: Create the new documentation
uses: ammaraskar/sphinx-action@master
with:
- pre-build-command: "python -m pip install pip sphinx_rtd_theme numpy
- scipy matplotlib GPy GPyOpt torch scikit-learn scikit-learn-extra -U"
+ pre-build-command: "python -m pip install pip sphinx_rtd_theme 'numpy<2.0.0'
+ scipy matplotlib bayesian-optimization torch scikit-learn scikit-learn-extra -U"
docs-folder: "docs/"

- name: Deploy
2 changes: 1 addition & 1 deletion .travis.yml
@@ -53,7 +53,7 @@ install:
- pip install numpy scipy matplotlib pip nose sphinx
- pip install setuptools
- pip install coveralls coverage
- - pip install torch GPy GPyOpt
+ - pip install torch bayesian-optimization
- pip install scikit-learn==0.23.2
- pip install scikit-learn-extra
- python setup.py install
4 changes: 3 additions & 1 deletion README.md
@@ -41,11 +41,13 @@
## Description
**ATHENA** is a Python package for the reduction of high-dimensional parameter spaces in the context of numerical analysis. It allows the use of several dimensionality reduction techniques such as Active Subspaces (AS), Kernel-based Active Subspaces (KAS), and Nonlinear Level-set Learning (NLL). It is particularly suited for the study of parametric PDEs, for sensitivity analysis, and for the approximation of engineering quantities of interest. It can handle both scalar and vectorial high-dimensional functions, making it also a useful tool to reduce the burden of computationally intensive optimization tasks.

+ As of version 0.1.3, ATHENA uses the `bayesian-optimization` package instead of `GPyOpt` for Bayesian stochastic optimization in the feature map tuning process.

See the [**Examples and Tutorials**](#examples-and-tutorials) section below and the [**tutorials folder**](tutorials/README.md) to get an idea of the potential of this package. Also check out the SISSA mathLab [medium publication](https://medium.com/sissa-mathlab), where you can find stories about ATHENA (search within the publication page).


## Dependencies and installation
- **ATHENA** requires `numpy`, `matplotlib`, `scipy`, `torch`, `GPyOpt`,
+ **ATHENA** requires `numpy`, `matplotlib`, `scipy`, `torch`, `bayesian-optimization`,
`scikit-learn`, `scikit-learn-extra`, `sphinx` (for the documentation) and `pytest` (for local tests).
The code is compatible with Python 3.8 and above. It can be installed directly
from the source code or via pip.
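The description above is easiest to grasp with a concrete call sequence. Below is a minimal, hypothetical sketch of the AS workflow: the `ActiveSubspaces(dim, ...)` constructor matches the diff of `athena/active.py` in this PR, while the `fit`/`transform` calls and the quadratic toy data are assumptions about the public API, not part of this changeset.

```python
# Hypothetical usage sketch; only the constructor signature is confirmed
# by this PR, the rest is assumed from ATHENA's public API.
import numpy as np
from athena.active import ActiveSubspaces

rng = np.random.RandomState(42)
x = rng.uniform(-1, 1, size=(200, 10))  # 200 samples in a 10D parameter space
w = rng.randn(10)
f = (x @ w)**2                          # scalar quantity of interest
df = 2 * (x @ w)[:, None] * w           # its exact gradients, shape (200, 10)

ss = ActiveSubspaces(dim=1)             # look for a 1D active subspace
ss.fit(inputs=x, outputs=f, gradients=df)
x_active = ss.transform(x)[0]           # projection onto the active subspace
```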
9 changes: 5 additions & 4 deletions athena/active.py
@@ -31,6 +31,7 @@ class ActiveSubspaces(Subspaces):
Hristache, et al.
:param int n_boot: number of bootstrap samples. Default is 100.
"""

def __init__(self, dim, method='exact', n_boot=100):
super().__init__(dim, method, n_boot)

@@ -313,13 +314,13 @@ def _hit_and_run_inactive(self, reduced_input, n_points):
f, g = b - np.dot(A, z0), np.dot(A, d)

# find an upper bound on the step
- min_ind = np.logical_and(g <= 0,
- f < -np.sqrt(np.finfo(np.float64).eps))
+ min_ind = np.logical_and(g <= 0, f
+ < -np.sqrt(np.finfo(np.float64).eps))
eps_max = np.amin(f[min_ind] / g[min_ind])

# find a lower bound on the step
- max_ind = np.logical_and(g > 0,
- f < -np.sqrt(np.finfo(np.float64).eps))
+ max_ind = np.logical_and(g > 0, f
+ < -np.sqrt(np.finfo(np.float64).eps))
eps_min = np.amax(f[max_ind] / g[max_ind])

# randomly sample eps
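For readers unfamiliar with the hit-and-run step above, here is a self-contained sketch of the same slack computation (`f = b - A z0`, `g = A d`) for a polytope `{z : A z <= b}`; the function name `step_interval` and the box example are illustrative, not ATHENA's API.

```python
# Illustrative sketch of the feasible-step computation used by hit-and-run.
import numpy as np

def step_interval(A, b, z0, d):
    """Return (eps_min, eps_max) keeping z0 + eps * d inside {z : A z <= b}."""
    f = b - A @ z0  # slack of each constraint at the current point (>= 0)
    g = A @ d       # rate at which each slack is consumed along d
    eps_max = np.min(f[g > 0] / g[g > 0])  # tightest upper bound on the step
    eps_min = np.max(f[g < 0] / g[g < 0])  # tightest lower bound (<= 0)
    return eps_min, eps_max

# Example: the unit box [-1, 1]^2 written as A z <= b
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
rng = np.random.default_rng(0)
z0 = np.zeros(2)
d = rng.standard_normal(2)
lo, hi = step_interval(A, b, z0, d)
z1 = z0 + rng.uniform(lo, hi) * d  # next iterate, still inside the box
```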
177 changes: 177 additions & 0 deletions athena/compatibility.py
@@ -0,0 +1,177 @@
"""
Compatibility layer for handling different package versions.
This module provides uniform interfaces for functionality that might
depend on specific versions of packages or alternative implementations.
"""
import numpy as np
import warnings
from packaging import version

# Check if scikit-learn-extra's KMedoids is usable
# with the current NumPy version
SKLEARN_EXTRA_AVAILABLE = False
try:
import sklearn_extra
from sklearn_extra.cluster import KMedoids as SklearnExtraKMedoids
SKLEARN_EXTRA_AVAILABLE = True

# Check if NumPy version is compatible with sklearn_extra
if version.parse(np.__version__) >= version.parse('2.0.0'):
warnings.warn(
"You are using NumPy >= 2.0.0 with scikit-learn-extra which may "
"cause compatibility issues. If you encounter errors, consider "
"using the built-in KMedoids implementation in ATHENA.")
except ImportError:
SklearnExtraKMedoids = None


# Implementation based on scikit-learn's KMeans but adapted for KMedoids
class KMedoids:
"""
K-Medoids clustering.

A custom implementation that doesn't rely on scikit-learn-extra, thus
ensuring compatibility with NumPy 2.0+.

Parameters
----------
n_clusters : int, default=8
The number of clusters to form as well as the number of medoids to generate.

init : {'k-medoids++', 'random'} or array of shape (n_clusters, n_features), default='k-medoids++'
Method for initialization.

max_iter : int, default=300
Maximum number of iterations of the k-medoids algorithm for a single run.

random_state : int, RandomState instance or None, default=None
Determines random number generation for centroid initialization.
"""

def __init__(self,
n_clusters=8,
init='k-medoids++',
max_iter=300,
random_state=None):
self.n_clusters = n_clusters
self.init = init
self.max_iter = max_iter
self.random_state = random_state
self.cluster_centers_ = None
self.labels_ = None
self.inertia_ = None
self.n_iter_ = 0

def _init_medoids(self, X):
"""Initialize the medoids."""
rng = np.random.RandomState(self.random_state)
n_samples = X.shape[0]

if isinstance(self.init, str) and self.init == 'random':
# Random selection
indices = rng.permutation(n_samples)[:self.n_clusters]
self.cluster_centers_ = X[indices].copy()
elif isinstance(self.init, str) and self.init == 'k-medoids++':
# Implementation of k-medoids++ initialization
# Choose the first medoid randomly
indices = np.zeros(self.n_clusters, dtype=int)
indices[0] = rng.randint(n_samples)

# Calculate distances to the first medoid
distances = np.sum((X - X[indices[0]])**2, axis=1)

# Choose remaining medoids
for i in range(1, self.n_clusters):
# Choose point with probability proportional to distance squared
probs = distances / np.sum(distances)
indices[i] = rng.choice(n_samples, p=probs)

# Update distances
new_dist = np.sum((X - X[indices[i]])**2, axis=1)
distances = np.minimum(distances, new_dist)

self.cluster_centers_ = X[indices].copy()
else:
# Use provided initial medoids
self.cluster_centers_ = np.asarray(self.init, dtype=X.dtype)

def fit(self, X):
"""Compute k-medoids clustering."""
X = np.asarray(X)
self._init_medoids(X)

best_labels = None
best_inertia = float('inf')
best_centers = None

for i in range(self.max_iter):
# Assign each point to closest medoid
distances = np.zeros((X.shape[0], self.n_clusters))
for j in range(self.n_clusters):
distances[:, j] = np.sum((X - self.cluster_centers_[j])**2,
axis=1)

labels = np.argmin(distances, axis=1)

# Update medoids
old_centers = self.cluster_centers_.copy()

# For each cluster, update medoid to be the point minimizing inertia
for j in range(self.n_clusters):
cluster_points = X[labels == j]
if len(cluster_points) > 0:
# Compute pairwise distances within cluster
inertias = np.zeros(len(cluster_points))
for k, point in enumerate(cluster_points):
inertias[k] = np.sum(
np.sum((cluster_points - point)**2, axis=1))

# Choose point with minimal inertia as new medoid
min_idx = np.argmin(inertias)
self.cluster_centers_[j] = cluster_points[min_idx].copy()

# Compute inertia
inertia = 0
for j in range(self.n_clusters):
cluster_points = X[labels == j]
if len(cluster_points) > 0:
inertia += np.sum(
np.sum((cluster_points - self.cluster_centers_[j])**2,
axis=1))

# Store best result
if inertia < best_inertia:
best_inertia = inertia
best_labels = labels
best_centers = self.cluster_centers_.copy()

# Check for convergence
center_shift = np.sum(
np.sqrt(np.sum((old_centers - self.cluster_centers_)**2,
axis=1)))
if center_shift < 1e-4:
break

self.labels_ = best_labels
self.cluster_centers_ = best_centers
self.inertia_ = best_inertia
self.n_iter_ = i + 1

return self

def predict(self, X):
"""Predict the closest cluster for each sample in X."""
X = np.asarray(X)
distances = np.zeros((X.shape[0], self.n_clusters))
for j in range(self.n_clusters):
distances[:, j] = np.sum((X - self.cluster_centers_[j])**2, axis=1)

return np.argmin(distances, axis=1)


# Export the appropriate KMedoids implementation
if SKLEARN_EXTRA_AVAILABLE and version.parse(
np.__version__) < version.parse('2.0.0'):
# Use sklearn-extra's implementation when available and NumPy < 2.0
KMedoids = SklearnExtraKMedoids
# Otherwise use our implementation which is compatible with NumPy 2.0+
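A short usage sketch of the new compatibility layer; the import path matches this PR, while the two-blob data and the printed attributes are illustrative only. The same code runs whether `KMedoids` resolves to scikit-learn-extra's class or to the pure-NumPy fallback.

```python
# Works with either KMedoids implementation exported by the module.
import numpy as np
from athena.compatibility import KMedoids

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 5])  # two synthetic blobs

km = KMedoids(n_clusters=2, random_state=0).fit(X)
print(km.cluster_centers_)  # one actual data point per cluster (the medoids)
print(km.predict(X[:5]))    # cluster index for the first five samples
```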
57 changes: 39 additions & 18 deletions athena/feature_map.py
@@ -13,7 +13,7 @@
from functools import partial
import numpy as np
from scipy.optimize import brute, dual_annealing
- import GPyOpt
+ from bayes_opt import BayesianOptimization
from .projection_factory import ProjectionFactory


@@ -39,6 +39,7 @@ class FeatureMap():

:raises TypeError
"""

def __init__(self, distr, bias, input_dim, n_features, params, sigma_f):
if callable(distr):
self.distr = distr
@@ -114,7 +115,7 @@ def tune_pr_matrix(self,
"""
Tune the parameters of the spectral distribution. Three methods are
available: log-grid-search (brute), annealing (dual_annealing) and
- Bayesian stochastic optimization (bso) from GpyOpt. The default object
+ Bayesian stochastic optimization (bso) from the bayesian-optimization
+ package. The default objective
function to optimize is athena.utils.average_rrmse, which uses a
cross-validation procedure from athena.utils; see the example and tutorial 06_kernel-based_AS.

@@ -199,22 +200,42 @@
maxiter=maxiter,
no_local_search=False).x
elif method == 'bso':
- bounds = [{
- 'name': f'var_{str(i)}',
- 'type': 'continuous',
- 'domain': [bound.start, bound.stop],
- } for i, bound in enumerate(bounds)]
- func_obj = partial(func, best=best, **fn_args)
- bopt = GPyOpt.methods.BayesianOptimization(func_obj,
- domain=bounds,
- model_type='GP',
- acquisition_type='EI',
- exact_feval=True)
- bopt.run_optimization(max_iter=maxiter,
- max_time=3600,
- eps=1e-16,
- verbosity=False)
- self.params = 10**bopt.x_opt
+ # Reformat bounds for the BayesianOptimization package: it takes a dict
+ # mapping parameter names to (low, high) tuples, unlike GPyOpt's list of
+ # dicts with 'name', 'type' and 'domain' keys.
+ bounds_dict = {
+ f'var_{i}': (bound.start, bound.stop)
+ for i, bound in enumerate(bounds)
+ }
+
+ # Wrap the objective to bridge the API difference: BayesianOptimization
+ # passes parameters as keyword arguments rather than an array, and it
+ # maximizes by default, so the score is negated to turn our minimization
+ # into a maximization.
+ def bayes_wrapper(**kwargs):
+ x = np.array([kwargs[f'var_{i}'] for i in range(len(bounds))])
+ return -func(x, best, **fn_args)
+
+ # Initialize the optimizer with the wrapper and the parameter bounds
+ optimizer = BayesianOptimization(
+ f=bayes_wrapper,
+ pbounds=bounds_dict,
+ random_state=42  # for reproducible results
+ )
+
+ # init_points: steps of random exploration;
+ # n_iter: steps of Bayesian optimization
+ optimizer.maximize(init_points=2, n_iter=maxiter)
+
+ # optimizer.max holds the best score and parameters found; extract them
+ # and apply the 10** transformation used by the original implementation
+ best_params = [
+ optimizer.max['params'][f'var_{i}'] for i in range(len(bounds))
+ ]
+ self.params = 10**np.array(best_params)
else:
raise ValueError(
"Method argument can only be 'brute' or 'dual_annealing' or 'bso'."
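The same migration pattern can be checked in isolation against the `bayes_opt` API (`BayesianOptimization`, `maximize`, `optimizer.max`); in this sketch the quadratic objective and its bounds are made up, only the API calls mirror the diff above.

```python
# Standalone sketch of the GPyOpt -> bayes_opt pattern: named bounds,
# keyword-argument objective, and a sign flip to minimize.
import numpy as np
from bayes_opt import BayesianOptimization

def loss(x):
    return float(np.sum((x - 1.0)**2))  # toy objective, minimum at x = [1, 1]

pbounds = {'var_0': (-2.0, 2.0), 'var_1': (-2.0, 2.0)}

def wrapper(**kwargs):
    x = np.array([kwargs['var_0'], kwargs['var_1']])
    return -loss(x)  # negate because bayes_opt maximizes

optimizer = BayesianOptimization(f=wrapper, pbounds=pbounds, random_state=42)
optimizer.maximize(init_points=2, n_iter=10)
x_opt = np.array([optimizer.max['params']['var_0'],
                  optimizer.max['params']['var_1']])  # close to [1, 1]
```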
1 change: 1 addition & 0 deletions athena/kas.py
@@ -55,6 +55,7 @@ class KernelActiveSubspaces(Subspaces):
:cvar numpy.ndarray metric: metric matrix for vectorial active
subspaces.
"""

def __init__(self,
dim,
feature_map=None,