[Refactor] Made CrossValTypes, HoldoutValTypes to have split functions directly #108

nabenabe0928 · 2021-02-22T20:13:15Z

I made these changes while keeping the diff as small as possible.


nabenabe0928 · 2021-02-22T20:17:54Z

autoPyTorch/datasets/resampling_strategy.py

+class HoldoutValTypes(Enum):
+    """The type of hold out validation (refer to CrossValTypes' doc-string)"""
+    holdout_validation = partial(HoldoutValFuncs.holdout_validation)
+    stratified_holdout_validation = partial(HoldoutValFuncs.stratified_holdout_validation)


Major change: IntEnum -> Enum, so that each member holds its split function directly.
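A minimal sketch of why IntEnum had to become Enum, using a hypothetical placeholder _holdout_split rather than autoPyTorch's real split functions. An IntEnum member must be an int, so a callable value raises TypeError, whereas a plain Enum stores it. The partial wrapper matters because a bare def in an Enum body is treated as a method, not a member. The _member helper is an extra assumption for portability: functools.partial is slated to behave like a method descriptor in newer Python versions (3.13 already warns about it inside Enum bodies), and enum.member() (Python 3.11+) keeps it a member explicitly.

```python
import enum
from enum import Enum, IntEnum
from functools import partial

# enum.member() exists on Python 3.11+; on older versions a partial is
# already stored as a member value, so an identity wrapper is enough.
_member = getattr(enum, "member", lambda value: value)


def _holdout_split(val_share, indices, stratify=None):
    # Hypothetical placeholder: the last val_share fraction is the validation set.
    n_val = int(round(len(indices) * val_share))
    return indices[:len(indices) - n_val], indices[len(indices) - n_val:]


# IntEnum members must be ints, so a callable value is rejected.
try:
    class BrokenTypes(IntEnum):
        holdout_validation = _member(partial(_holdout_split))
    intenum_error = None
except TypeError as err:
    intenum_error = err


# A bare function in an Enum body becomes a method, not a member.
class MethodOnly(Enum):
    def holdout_validation(self, val_share, indices, stratify=None):
        return _holdout_split(val_share, indices, stratify)


# A plain Enum accepts the wrapped callable as a member value.
class HoldoutValTypes(Enum):
    holdout_validation = _member(partial(_holdout_split))
```

Here MethodOnly ends up with no members at all, while HoldoutValTypes.holdout_validation.value is the callable split function.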

nabenabe0928 · 2021-02-22T20:19:15Z

autoPyTorch/datasets/resampling_strategy.py

+
+    def __call__(self, val_share: float, indices: np.ndarray, stratify: Optional[Any]
+                 ) -> Tuple[np.ndarray, np.ndarray]:
+        return self.value(val_share=val_share, indices=indices, stratify=stratify)


Now the function can be called directly, e.g. HoldoutValTypes.holdout_validation(val_share=..., indices=..., stratify=...).


nabenabe0928 · 2021-02-22T20:23:57Z

autoPyTorch/datasets/resampling_strategy.py

    def __call__(self,
                 num_splits: int,
                 indices: np.ndarray,
                 stratify: Optional[Any]) -> List[Tuple[np.ndarray, np.ndarray]]:
        ...


-class HoldOutFunc(Protocol):
+class HoldoutValFunc(Protocol):


Since we often use the name holdout_validator, I renamed HoldOutFunc to HoldoutValFunc for consistency.
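A short sketch of how the two callable protocols can be used for typing split functions. The protocol signatures follow the diff above; k_fold is a hypothetical placeholder that cuts contiguous folds without shuffling, not the PR's implementation, and runtime_checkable is added here only so the example can be checked at runtime.

```python
from typing import Any, List, Optional, Protocol, Tuple, runtime_checkable

import numpy as np


@runtime_checkable
class HoldoutValFunc(Protocol):
    def __call__(self, val_share: float, indices: np.ndarray,
                 stratify: Optional[Any]) -> Tuple[np.ndarray, np.ndarray]:
        ...


@runtime_checkable
class CrossValFunc(Protocol):
    def __call__(self, num_splits: int, indices: np.ndarray,
                 stratify: Optional[Any]) -> List[Tuple[np.ndarray, np.ndarray]]:
        ...


def k_fold(num_splits: int, indices: np.ndarray,
           stratify: Optional[Any] = None) -> List[Tuple[np.ndarray, np.ndarray]]:
    # Hypothetical placeholder: contiguous folds in the given index order.
    folds = np.array_split(indices, num_splits)
    # Each fold is the validation set once; the other folds form the training set.
    return [(np.concatenate(folds[:i] + folds[i + 1:]), fold)
            for i, fold in enumerate(folds)]


# A static type checker verifies that k_fold matches the CrossValFunc signature.
splitter: CrossValFunc = k_fold
splits = splitter(num_splits=3, indices=np.arange(6), stratify=None)
```

At runtime, isinstance(k_fold, CrossValFunc) only checks that a __call__ attribute exists, so the signature match is enforced by a tool such as mypy rather than by the interpreter.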


Since the previous code shuffled the indices before splitting (shuffle = True by default), the outputs of the CV and Holdout test cases no longer matched the original ones. More specifically, I could reproduce the original outputs by bringing back the following:
1. _get_indices in BaseDataset
2. True as the default value of self.shuffle in BaseDataset
3. shuffle = True in KFold instead of using ShuffleSplit

Note that KFold(shuffle=True) and ShuffleSplit() are not identical: even with the same random_state, their splits do not match.
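A minimal numpy sketch of why toggling the default shuffle changes expected test values. The helper holdout_split is hypothetical and stands in for neither BaseDataset nor scikit-learn: without shuffling, the validation set is a deterministic tail of the index array; with shuffling, it depends on the seed, and the same seed reproduces it only when the same shuffling procedure is used.

```python
import numpy as np


def holdout_split(indices, val_share, shuffle, random_state=None):
    # Hypothetical helper: optionally permute the indices, then take the
    # last val_share fraction as the validation set.
    if shuffle:
        indices = np.random.RandomState(random_state).permutation(indices)
    n_val = int(round(len(indices) * val_share))
    return indices[:len(indices) - n_val], indices[len(indices) - n_val:]


indices = np.arange(10)
# Without shuffling, the validation set is always the tail of the array.
_, val_plain = holdout_split(indices, 0.3, shuffle=False)
# With shuffling, it depends on the seed; the same seed reproduces it.
_, val_a = holdout_split(indices, 0.3, shuffle=True, random_state=0)
_, val_b = holdout_split(indices, 0.3, shuffle=True, random_state=0)
```

A seed only fixes the random draws, not how a splitter consumes them, so two different shuffling procedures can yield different splits for the same random_state.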

ravinkohli · 2021-05-20T13:12:41Z

autoPyTorch/datasets/resampling_strategy.py

-                                indices: np.ndarray,
-                                **kwargs: Any
-                                ) -> List[Tuple[np.ndarray, np.ndarray]]:
+    Additionally, HoldoutValTypes.<function> can be called directly.


Can you add an example of calling it directly?
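A sketch of the kind of usage example the docstring could carry. The split functions _holdout_validation and _stratified_holdout_validation are hypothetical placeholders (a tail split, applied per class label for the stratified variant), and stratify defaulting to None is an assumption so that non-stratified calls can omit it. Note that __call__ has to return self.value(...); without the return, calling a member evaluates to None.

```python
import enum
from enum import Enum
from functools import partial
from typing import Any, Optional, Tuple

import numpy as np

# enum.member() exists on Python 3.11+; older versions need no wrapper.
_member = getattr(enum, "member", lambda value: value)


def _holdout_validation(val_share, indices, stratify=None):
    # Hypothetical placeholder: the tail of the index array is the validation set.
    n_val = int(round(len(indices) * val_share))
    return indices[:len(indices) - n_val], indices[len(indices) - n_val:]


def _stratified_holdout_validation(val_share, indices, stratify=None):
    # Hypothetical placeholder: apply the tail split within each class label.
    labels = np.asarray(stratify)
    parts = [_holdout_validation(val_share, indices[labels == label])
             for label in np.unique(labels)]
    return (np.sort(np.concatenate([train for train, _ in parts])),
            np.sort(np.concatenate([val for _, val in parts])))


class HoldoutValTypes(Enum):
    """The type of hold-out validation.

    Each member holds its split function, so it can be called directly:

    >>> import numpy as np
    >>> y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    >>> train, val = HoldoutValTypes.stratified_holdout_validation(
    ...     val_share=0.5, indices=np.arange(8), stratify=y)
    >>> val.tolist()
    [2, 3, 6, 7]
    """

    holdout_validation = _member(partial(_holdout_validation))
    stratified_holdout_validation = _member(partial(_stratified_holdout_validation))

    def __call__(self, val_share: float, indices: np.ndarray,
                 stratify: Optional[Any] = None) -> Tuple[np.ndarray, np.ndarray]:
        # Delegate to the stored split function and return its result.
        return self.value(val_share=val_share, indices=indices, stratify=stratify)
```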

ravinkohli · 2021-05-20T13:17:59Z

autoPyTorch/datasets/resampling_strategy.py

+
+
+class CrossValFuncs():
+    # (shuffle, is_stratify) -> split_fn


Can we also have documentation here, similar to HoldoutValFuncs?
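One possible shape for that documentation, as a hedged sketch: the (shuffle, is_stratify) -> split_fn key comes from the comment in the diff, but the split_fns attribute name, the round-robin stratification, and the fixed seed 0 are hypothetical placeholders rather than the PR's implementation.

```python
from functools import partial
from typing import Any, Callable, Dict, List, Optional, Tuple

import numpy as np

SplitList = List[Tuple[np.ndarray, np.ndarray]]


def _to_splits(folds: List[np.ndarray]) -> SplitList:
    # Each fold is the validation set once; the remaining folds form the training set.
    return [(np.concatenate(folds[:i] + folds[i + 1:]), fold)
            for i, fold in enumerate(folds)]


def _k_fold(num_splits: int, indices: np.ndarray, stratify: Optional[Any] = None,
            shuffle: bool = False) -> SplitList:
    # Optionally permute (fixed seed 0, an arbitrary choice), then cut contiguous folds.
    order = np.random.RandomState(0).permutation(indices) if shuffle else indices
    return _to_splits(np.array_split(order, num_splits))


def _stratified_k_fold(num_splits: int, indices: np.ndarray, stratify: Optional[Any] = None,
                       shuffle: bool = False) -> SplitList:
    # Deal each class's indices round-robin over the folds to keep class ratios.
    rng = np.random.RandomState(0)
    labels = np.asarray(stratify)
    buckets: List[List[int]] = [[] for _ in range(num_splits)]
    for label in np.unique(labels):
        members = indices[labels == label]
        if shuffle:
            members = rng.permutation(members)
        for pos, idx in enumerate(members):
            buckets[pos % num_splits].append(int(idx))
    return _to_splits([np.sort(np.array(b, dtype=int)) for b in buckets])


class CrossValFuncs:
    """Split functions for cross-validation (hypothetical sketch).

    Every split function takes (num_splits, indices, stratify) and returns a
    list of num_splits (train_indices, val_indices) tuples. split_fns maps the
    key (shuffle, is_stratify) to the function for that combination.
    """

    # (shuffle, is_stratify) -> split_fn
    split_fns: Dict[Tuple[bool, bool], Callable[..., SplitList]] = {
        (False, False): partial(_k_fold, shuffle=False),
        (True, False): partial(_k_fold, shuffle=True),
        (False, True): partial(_stratified_k_fold, shuffle=False),
        (True, True): partial(_stratified_k_fold, shuffle=True),
    }


y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
splits = CrossValFuncs.split_fns[(False, True)](num_splits=2, indices=np.arange(8), stratify=y)
```

Keeping one dictionary keyed by (shuffle, is_stratify) puts the choice of splitter in a single documented place instead of scattering if/else branches across callers.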

ravinkohli

Hey, thanks a lot for this PR. I have left a few suggestions. Could you also state the reason for this PR? What issues were there in the previous implementation, and how does this PR solve them?

nabenabe0928 commented Feb 22, 2021


nabenabe0928 added the refactoring Improvement of readability and abstract codes label Feb 26, 2021

nabenabe0928 force-pushed the refactoring-base-dataset_splitting-functions_major-change branch from 62b326e to c7fd2d5 on March 15, 2021 20:54

nabenabe0928 mentioned this pull request Mar 17, 2021

Refactoring base dataset splitting functions #106

Merged

nabenabe0928 force-pushed the refactoring-base-dataset_splitting-functions_major-change branch from c7fd2d5 to a7e8a7f on March 18, 2021 23:10

franchuterivera changed the base branch from refactor_development to development May 7, 2021 09:02

nabenabe0928 force-pushed the refactoring-base-dataset_splitting-functions_major-change branch 2 times, most recently from 1e82b21 to d313a48 on May 10, 2021 03:30

nabenabe0928 added 12 commits May 19, 2021 14:08

[refactor] Update the split functions to be able to call function directly

b803770

[feat] Deprecate shuffle inside BaseDataset and enable only in split funcs

4d901f9

[fix] Fix flake8 and mypy issues

6c31f61

[fix] Fix mypy issues

b7d3531

[fix] Fix most test cases

910e7d4

[fix] Bring back the data generator shuffle

8c9b895

[fix] Fix the test value caused by putting back the shuffle generator

93e6862

[fix] Fix pytest errors

bef4323

[refactor] Change files so that we can see the difference easier

2d2ebb8

[refactor] Gather kwargs for get splits for CV and Holdout

8d90b85

[fix] Fix a mypy issue

6ef981d

nabenabe0928 force-pushed the refactoring-base-dataset_splitting-functions_major-change branch from af8059b to 6ef981d on May 19, 2021 05:13

nabenabe0928 requested review from franchuterivera and ravinkohli May 20, 2021 03:06

ravinkohli reviewed May 20, 2021
