16 changes: 13 additions & 3 deletions README.md
Expand Up @@ -8,7 +8,7 @@
This is a Python package designed to facilitate correcting for distributional bias during cross-validation. It was recently shown that removing a fraction of a dataset into a testing fold can artificially create a shift in label averages across training folds that is inversely correlated with that of their corresponding test folds. We have demonstrated that most machine learning models' results suffer from this bias, which this package resolves by subsampling points from within the training set to remove any differences in label average across training folds. To begin using RebalancedCV, we recommend reading its [documentation pages](https://korem-lab.github.io/RebalancedCV/).
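The bias described above can be illustrated in a few lines of NumPy (a standalone sketch, not part of the package): holding out one sample shifts the training-fold label mean in the direction opposite to the held-out label.

```python
import numpy as np

# Illustration of distributional bias in leave-one-out splitting:
# removing one sample shifts the training-fold label mean in the
# direction opposite to the held-out label.
y = np.array([0, 0, 1, 1, 0, 1])
for i in range(len(y)):
    train_mean = np.delete(y, i).mean()
    shift = train_mean - y.mean()
    # held-out label 1 -> training mean drops; held-out 0 -> it rises
    assert (shift < 0) == (y[i] == 1)
```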


All classes from this package provide train/test indices to split data into train/test sets while rebalancing the training set to account for distributional bias. This package is designed to enable automated rebalancing for the cross-validation implementations in formats similar to scikit-learn's `LeaveOneOut`, `StratifiedKFold`, and `LeavePOut`, through the `RebalancedCV` classes `RebalancedLeaveOneOut`, `RebalancedLeaveOneOutRegression`, `RebalancedKFold`, and `RebalancedLeavePOut`. These Rebalanced classes are designed to work in the exact same code structure and implementation use cases as their scikit-learn equivalents, with the only difference being a subsampling within the provided training indices.
All classes from this package provide train/test indices to split data into train/test sets while rebalancing the training set to account for distributional bias. This package is designed to enable automated rebalancing for the cross-validation implementations in formats similar to scikit-learn's `LeaveOneOut`, `StratifiedKFold`, `LeavePOut`, and `LeaveOneGroupOut`, through the `RebalancedCV` classes `RebalancedLeaveOneOut`, `RebalancedLeaveOneOutRegression`, `RebalancedKFold`, `RebalancedLeavePOut`, and `RebalancedLeaveOneGroupOut`. These Rebalanced classes are designed to work in the exact same code structure and implementation use cases as their scikit-learn equivalents, with the only difference being a subsampling within the provided training indices.

For any support using RebalancedCV, please use our <a href="https://github.com/korem-lab/RebalancedCV/issues">issues page</a> or email: gia2105@columbia.edu.

Expand Down Expand Up @@ -114,7 +114,17 @@ Provides train/test indices to split data in train/test sets with rebalancing to
##### **Parameters**
p : int
Size of the test sets. Must be strictly less than one half of the number of samples.


### RebalancedLeaveOneGroupOut

Provides train/test indices to split data in train/test sets with rebalancing when splitting by **groups**. Each fold holds out one group as the test set and uses the rest for training; the training set is then subsampled so that every fold has the same number of samples per class (avoiding distributional bias). The test set is never subsampled (full left-out group). The `groups` parameter is **required** (same as sklearn's LeaveOneGroupOut). At least two groups are needed.

**When to use rebalancing:** Use **RebalancedLeaveOneGroupOut** when you want comparable training conditions across folds (e.g., when reporting an average over groups, or when comparing per-group performance on an even footing), or when groups are merely a blocking factor and you care about unbiased overall or class-wise metrics. **When not to:** Use plain **LeaveOneGroupOut** when you only care about performance on each left-out group and are not aggregating in a way that is sensitive to train-fold balance, or when you prefer a realistic training composition per fold. If groups already have similar class distributions, rebalancing is optional but does not hurt.

See `sklearn.model_selection.LeaveOneGroupOut` for leave-one-group-out cross-validation.

##### **Parameters**
No parameters are used for this class. `groups` must be passed to `split(X, y, groups)` and `get_n_splits(groups=groups)`.
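A standalone NumPy sketch (not the package's internal code) of the rebalancing target described above: for each class, every training fold keeps the total class count minus the largest per-group class count, so all folds end up with identical training class counts.

```python
import numpy as np

# Sketch of the per-class training count that makes every fold balanced:
# total class count minus the largest count held by any single group.
y = np.array([0, 0, 1, 1, 0, 1])
groups = np.array([1, 1, 1, 2, 2, 2])

labels, y_enc = np.unique(y, return_inverse=True)
total = np.bincount(y_enc)                                  # [3, 3]
per_group = np.array([np.bincount(y_enc[groups == g], minlength=len(labels))
                      for g in np.unique(groups)])          # [[2, 1], [1, 2]]
min_train = total - per_group.max(axis=0)                   # [1, 1]
```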

### RebalancedLeaveOneOutRegression

Expand All @@ -131,7 +141,7 @@ All of this package's classes use the `split` method, which takes the following parameters:
`y` : array-like of shape (n_samples,); The target variable for supervised learning problems. At least two observations per class are needed for RebalancedLeaveOneOut

`groups` : array-like of shape (n_samples,), default=None; Group labels for the samples used while splitting the dataset into
train/test set.
train/test set. Required for RebalancedLeaveOneGroupOut; optional (and ignored) for other classes.

`seed` : Integer, default=None; can be specified to enforce consistency in the subsampling

Expand Down
8 changes: 7 additions & 1 deletion rebalancedcv/__init__.py
@@ -1,3 +1,9 @@
__version__ = "0.0.1"
from .classification import RebalancedLeaveOneOut, RebalancedKFold, RebalancedLeavePOut, MulticlassRebalancedLeaveOneOut
from .classification import (
RebalancedLeaveOneOut,
RebalancedKFold,
RebalancedLeavePOut,
MulticlassRebalancedLeaveOneOut,
RebalancedLeaveOneGroupOut,
)
from .regression import RebalancedLeaveOneOutRegression
224 changes: 208 additions & 16 deletions rebalancedcv/classification.py
Expand Up @@ -2,6 +2,7 @@
from sklearn.model_selection import BaseCrossValidator
from sklearn.utils.validation import _num_samples, check_array, column_or_1d, indexable
from sklearn.utils.multiclass import type_of_target
import numpy as np
import warnings

import numbers
from sklearn.utils.validation import _deprecate_positional_args
Expand Down Expand Up @@ -656,20 +657,211 @@ def get_n_splits(self, X, y, groups=None):
if X is None:
raise ValueError("The 'X' parameter should not be None.")
return _num_samples(X)


















class RebalancedLeaveOneGroupOut(BaseCrossValidator):
"""Rebalanced Leave-One-Group-Out cross-validator.

Provides train/test indices to split data such that each training set is
comprised of all samples except ones belonging to one specific group,
with subsampling so that every training fold has the same number of
samples per class (avoiding distributional bias). Rebalancing is
applied only to the training set; the test set is always the full
left-out group. Arbitrary domain-specific group information is provided
as an array of integers that encodes the group of each sample. For
instance the groups could be the year of collection of the samples and
thus allow for cross-validation against time-based splits.

The ``groups`` parameter is required (same as sklearn's
``LeaveOneGroupOut``). At least two groups are required. For
rebalancing to be non-degenerate, every class should appear in at least
two groups; if a class has no samples in a training fold, it is omitted
from that fold's training set and a warning is issued.

Notes
-----
Splits are ordered according to the index of the group left out. The
first split has testing set consisting of the group whose index in
``groups`` is lowest, and so on.

Use this class when you want leave-one-group-out *and* need to remove
training-fold label imbalance (e.g. comparing models or tuning
hyperparameters). Use plain ``LeaveOneGroupOut`` when you only care
about generalization to a new group.

See Also
--------
sklearn.model_selection.LeaveOneGroupOut : Leave-one-group-out without
training rebalancing.
sklearn.model_selection.GroupKFold : K-fold variant with
non-overlapping groups.

Examples
--------
>>> import numpy as np
>>> from rebalancedcv import RebalancedLeaveOneGroupOut
>>> X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
>>> y = np.array([0, 0, 1, 1, 0, 1])
>>> groups = np.array([1, 1, 1, 2, 2, 2])
>>> rlogo = RebalancedLeaveOneGroupOut()
>>> rlogo.get_n_splits(groups=groups)
2
>>> print(rlogo)
RebalancedLeaveOneGroupOut()
>>> for i, (train_index, test_index) in enumerate(rlogo.split(X, y, groups, seed=42)):
... print(f"Fold {i}:")
... print(f" Train: index={train_index}")
... print(f" Test: index={test_index}")
Fold 0:
Train: index=[4 5]
Test: index=[0 1 2]
Fold 1:
Train: index=[0 2]
Test: index=[3 4 5]
"""

def _iter_test_masks(self, X, y, groups):
if groups is None:
raise ValueError("The 'groups' parameter should not be None.")
# We make a copy of groups to avoid side-effects during iteration
groups = check_array(
groups, input_name="groups", copy=True, ensure_2d=False, dtype=None
)
unique_groups = np.unique(groups)
if len(unique_groups) <= 1:
raise ValueError(
"The groups parameter contains fewer than 2 unique groups "
"(%s). RebalancedLeaveOneGroupOut expects at least 2."
% unique_groups
)
for i in unique_groups:
yield groups == i

def split(self, X, y, groups=None, seed=None):
"""Generate indices to split data into training and test set.

Parameters
----------
X : array-like of shape (n_samples, n_features)
Training data, where `n_samples` is the number of samples and
`n_features` is the number of features.

y : array-like of shape (n_samples,)
The target variable for supervised learning problems.

groups : array-like of shape (n_samples,)
Group labels for the samples used while splitting the dataset
into train/test set. Must be specified.

seed : int or None, default=None
Random seed for subsampling reproducibility.

Yields
------
train : ndarray
The training set indices for that split (subsampled for
consistent class balance).

test : ndarray
The testing set indices for that split (full left-out group).
"""
if groups is None:
raise ValueError("The 'groups' parameter should not be None.")
if seed is not None:
np.random.seed(seed)

X, y, groups = indexable(X, y, groups)
n_samples = _num_samples(X)
groups = np.asarray(groups)
y = np.asarray(y)
type_of_target_y = type_of_target(y)
if type_of_target_y not in ("binary", "multiclass"):
raise ValueError(
"Supported target types are: binary, multiclass. Got {!r}."
.format(type_of_target_y)
)
y = column_or_1d(y)

# Encode labels as 0, 1, ... for bincount (works for any dtype, binary or multiclass)
unique_labels, y_encoded = np.unique(y, return_inverse=True)
n_classes = len(unique_labels)
total_count = np.bincount(y_encoded, minlength=n_classes) # (n_classes,) note that y_encoded are indices, not actual labels

unique_groups = np.unique(groups)
# Per-group class counts: (n_groups, n_classes)
group_class_count = np.zeros((len(unique_groups), n_classes), dtype=int)
for ig, g in enumerate(unique_groups):
mask = groups == g
group_class_count[ig] = np.bincount(y_encoded[mask], minlength=n_classes)

# Minimum training samples per class across all folds (same in every fold after subsample)
# For fold leaving out group g: train count for class k = total_count[k] - group_class_count[g,k]
min_train_count = total_count - group_class_count.max(axis=0) # (n_classes,)

# Warn if any class has no samples in the training set for at least one fold (all in one group)
omitted = np.where(min_train_count <= 0)[0]
if len(omitted) > 0:
omitted_labels = [unique_labels[k] for k in omitted]
warnings.warn(
"The following classes have no samples in the training set for at least one fold "
"(all samples belong to a single group) and are omitted from the rebalanced "
"training set: {}. Consider checking group/label alignment.".format(omitted_labels),
UserWarning,
stacklevel=2,
)

indices = np.arange(n_samples)
for test_mask in self._iter_test_masks(X, y, groups):
train_mask = ~test_mask
train_index = indices[train_mask]
test_index = indices[test_mask]

# Subsample training set so each class has min_train_count[k] samples
train_parts = []
for k in range(n_classes):
n_k = int(min_train_count[k])
if n_k <= 0: # if the class has no samples in the training set, skip it
continue
train_k = train_index[y_encoded[train_index] == k] # indices of the samples of class k in the training set
if len(train_k) < n_k:
class_label = unique_labels[k]
raise ValueError(
"Fold has {} samples of class '{}' in train but need {} (rebalancing impossible)."
.format(len(train_k), class_label, n_k)
)
train_parts.append(
np.random.choice(train_k, size=n_k, replace=False)
)
if train_parts:
train_index = np.sort(np.concatenate(train_parts))
else:
train_index = np.array([], dtype=int)

yield train_index, test_index

def get_n_splits(self, X=None, y=None, groups=None):
"""Returns the number of splitting iterations in the cross-validator.

Parameters
----------
X : array-like of shape (n_samples, n_features), default=None
Always ignored, exists for API compatibility.

y : array-like of shape (n_samples,), default=None
Always ignored, exists for API compatibility.

groups : array-like of shape (n_samples,), default=None
Group labels for the samples used while splitting the dataset
into train/test set. This 'groups' parameter must always be
specified to calculate the number of splits, though the other
parameters can be omitted.

Returns
-------
n_splits : int
Returns the number of splitting iterations in the cross-validator.
"""
if groups is None:
raise ValueError("The 'groups' parameter should not be None.")
groups = check_array(groups, input_name="groups", ensure_2d=False, dtype=None)
return len(np.unique(groups))
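The per-fold subsampling loop above can be sketched outside the class as follows (an illustration under the same logic, not the package's own code): for each left-out group, draw the same number of training samples of each class without replacement, so every fold's training class counts are identical.

```python
import numpy as np

# Sketch of per-fold rebalanced subsampling for leave-one-group-out:
# each fold holds out one group and draws min_train[k] samples of class k
# from the remaining groups, without replacement.
rng = np.random.default_rng(0)
y = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2, 0, 1, 2])
groups = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

labels, y_enc = np.unique(y, return_inverse=True)
total = np.bincount(y_enc)
per_group = np.array([np.bincount(y_enc[groups == g], minlength=len(labels))
                      for g in np.unique(groups)])
min_train = total - per_group.max(axis=0)   # same training count per class in every fold

fold_counts = []
for g in np.unique(groups):
    train = np.where(groups != g)[0]        # all samples outside the left-out group
    picked = np.concatenate([
        rng.choice(train[y_enc[train] == k], size=min_train[k], replace=False)
        for k in range(len(labels))
    ])
    fold_counts.append(np.bincount(y_enc[picked], minlength=len(labels)).tolist())
```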
47 changes: 45 additions & 2 deletions rebalancedcv/tests/test_rebalancedcv.py
Expand Up @@ -6,7 +6,7 @@
from sklearn.model_selection import LeaveOneOut
from rebalancedcv import RebalancedLeaveOneOut, RebalancedKFold, \
RebalancedLeavePOut, RebalancedLeaveOneOutRegression, \
MulticlassRebalancedLeaveOneOut
MulticlassRebalancedLeaveOneOut, RebalancedLeaveOneGroupOut

from sklearn.metrics import roc_auc_score

Expand Down Expand Up @@ -121,7 +121,50 @@ def run_regression_cv(self,
def test_all_classification_cvs(self):
for cv in [RebalancedLeaveOneOut, RebalancedKFold, RebalancedLeavePOut, MulticlassRebalancedLeaveOneOut]:
self.run_classification_cv(cv)


def test_rebalanced_leave_one_group_out(self):
rlogo = RebalancedLeaveOneGroupOut()

## --- API: groups required ---
with self.assertRaises(ValueError):
rlogo.get_n_splits(groups=None)
with self.assertRaises(ValueError):
list(rlogo.split(np.random.rand(6, 2), np.array([0, 0, 1, 1, 0, 1]), groups=None))

## --- Binary: 6 samples, 2 groups of 3 ---
np.random.seed(1)
n_samples, n_features = 6, 2
X = np.random.rand(n_samples, n_features)
y = np.array([0, 0, 1, 1, 0, 1])
groups = np.array([1, 1, 1, 2, 2, 2])
self.assertEqual(rlogo.get_n_splits(groups=groups), 2)
train_means = []
for train_index, test_index in rlogo.split(X, y, groups, seed=1):
self.assertEqual(len(np.unique(groups[test_index])), 1,
"each test set should be exactly one group")
train_means.append(y[train_index].mean())
self.assertTrue(np.max(train_means) == np.min(train_means),
"train class balance should be identical across folds (binary)")

## --- Multi-class: 12 samples, 3 classes, 3 groups ---
np.random.seed(2)
X = np.random.rand(12, 3)
y = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2, 0, 1, 2]) # 4 per class
groups = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])
self.assertEqual(rlogo.get_n_splits(groups=groups), 3)
train_class_counts = []
for train_index, test_index in rlogo.split(X, y, groups, seed=2):
self.assertEqual(len(np.unique(groups[test_index])), 1,
"each test set should be exactly one group")
## rebalancing: same number of each class in train every fold
counts = np.bincount(y[train_index], minlength=3)
train_class_counts.append(tuple(counts))
self.assertEqual(len(set(train_class_counts)), 1,
"train class counts should be identical across folds (multi-class)")
## sanity: every class should appear in each fold's rebalanced training set
for counts in train_class_counts:
self.assertTrue(all(c > 0 for c in counts),
"every class should have at least one training sample per fold")

def test_all_regression_cvs(self):
for cv in [RebalancedLeaveOneOutRegression,
]:
Expand Down