[refactor] Separate all_nan_columns detection #401

Open
wants to merge 27 commits into base: development

Commits on Dec 21, 2021

  1. [feat] Support statistics print by adding results manager object (automl#334)
    
    * [feat] Support statistics print by adding results manager object
    
    * [refactor] Make SearchResults extract run_history at __init__
    
    Since the search results should not be kept forever,
    I made this class take run_history in __init__ so that
    the extraction is called implicitly inside.
    With this change, calling the extraction from outside is not recommended.
    However, you can still call it from outside; to prevent a mix-up of
    the environment, self.clear() will be called.
    
    * [fix] Separate those changes into PR#336
    
    * [fix] Fix so that test_loss includes all the metrics
    
    * [enhance] Strengthen the test for sprint and SearchResults
    
    * [fix] Fix an issue in documentation
    
    * [enhance] Increase the coverage
    
    * [refactor] Separate the test for results_manager to organize the structure
    
    * [test] Add the test for get_incumbent_Result
    
    * [test] Remove the previous test_get_incumbent and see the coverage
    
    * [fix] [test] Fix reversion of metric and strengthen the test cases
    
    * [fix] Fix flake8 issues and increase coverage
    
    * [fix] Address Ravin's comments
    
    * [enhance] Increase the coverage
    
    * [fix] Fix a flake8 issue
    nabenabe0928 authored and ravinkohli committed Dec 21, 2021
    d498677
  2. [doc] Add the workflow of the Auto-Pytorch (automl#285)

    * [doc] Add workflow of the AutoPytorch
    
    * [doc] Address Ravin's comment
    nabenabe0928 authored and ravinkohli committed Dec 21, 2021
    4d28006
  3. 54ee98e
  4. [feat] Add an object that realizes the perf over time viz (automl#331)

    * [feat] Add an object that realizes the perf over time viz
    
    * [fix] Modify TODOs and add comments to avoid complications
    
    * [refactor] [feat] Format visualizer API and integrate this feature into BaseTask
    
    * [refactor] Separate a shared raise error process as a function
    
    * [refactor] Gather params in Dataclass to look smarter
    
    * [refactor] Merge extraction from history to the result manager
    
    Since this feature was added in a previous PR, we now rely on it
    to extract the history.
    To handle the issue of ordering by start time, I added a
    sort-by-end-time feature.
    
    * [feat] Merge the viz in the latest version
    
    * [fix] Fix nan --> worst val so that we can always handle values as numbers
    
    * [fix] Fix mypy issues
    
    * [test] Add test for get_start_time
    
    * [test] Add test for order by end time
    
    * [test] Add tests for ensemble results
    
    * [test] Add tests for merging ensemble results and run history
    
    * [test] Add the tests in the case of ensemble_results is None
    
    * [fix] Alternate datetime to timestamp in tests to pass universally
    
    Since the mapping from timestamp to datetime varies across machines,
    the tests failed in the previous version.
    In this version, we changed the datetimes in the tests to fixed
    timestamps so that the tests pass on every machine.
    
    * [fix] Fix status_msg --> status_type because it does not need to be str
    
    * [fix] Change the name for homogeneity
    
    * [fix] Fix based on the file name change
    
    * [test] Add tests for set_plot_args
    
    * [test] Add tests for plot_perf_over_time in BaseTask
    
    * [refactor] Replace redundant lines by pytest parametrization
    
    * [test] Add tests for _get_perf_and_time
    
    * [fix] Remove viz attribute based on Ravin's comment
    
    * [fix] Fix doc-string based on Ravin's comments
    
    * [refactor] Hide color label settings extraction in dataclass
    
    Since this process makes the method in BaseTask redundant and this was
    pointed out by Ravin, I made this process a method of dataclass so that
    we can easily fetch this information.
    Note that since the color and label information always depend on the
    optimization results, we always need to pass metric results to ensure
    we only get related keys.
    
    * [test] Add tests for color label dicts extraction
    
    * [test] Add tests for checking if plt.show is called or not
    
    * [refactor] Address Ravin's comments and add TODO for the refactoring
    
    * [refactor] Change KeyError in EnsembleResults to empty
    
    Since it is inconvenient to be unable to instantiate EnsembleResults
    when we do not have any history,
    I changed the functionality so that we can still instantiate it even when
    the results are empty.
    In this case, we have empty arrays, which also matches the developers'
    intuition.
    
    * [refactor] Prohibit external updates to make objects more robust
    
    * [fix] Remove a member variable _opt_scores since it is confusing
    
    Since opt_scores are taken from the cost in run_history while metric_dict
    is taken from additional_info, it was unclear which one to refer to
    for which value. By removing this, we can always refer to additional_info
    when fetching information, and metrics are always available as raw
    values. Although I changed a lot, the functionality did not change and
    it is now easier to add other functionality.
    
    * [example] Add an example of how to plot performance over time
    
    * [fix] Fix unexpected train loss when using cross validation
    
    * [fix] Remove __main__ from example based on Ravin's comment
    
    * [fix] Move results_xxx to utils from API
    
    * [enhance] Change example for the plot over time to save fig
    
    Since plt.show() does not work in some environments,
    I changed the example so that everyone can at least run this example.
    nabenabe0928 authored and ravinkohli committed Dec 21, 2021
    6992609
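
The timestamp fix described in the commit above can be illustrated with a minimal sketch. It is not the test code from the PR: `FIXED_TIMESTAMP` is a hypothetical value chosen for illustration. It only shows why a fixed timestamp is machine-independent while a naive datetime derived from it depends on the local time zone.

```python
from datetime import datetime, timezone

# Hypothetical fixed timestamp, used only for illustration.
FIXED_TIMESTAMP = 1_640_000_000.0

# Interpreting the timestamp in UTC yields the same datetime on every machine.
utc_dt = datetime.fromtimestamp(FIXED_TIMESTAMP, tz=timezone.utc)

# Without tz, the result is a naive datetime in the machine's local time zone,
# so its wall-clock fields can differ from one machine to another.
local_dt = datetime.fromtimestamp(FIXED_TIMESTAMP)

# Converting the aware datetime back recovers the original timestamp.
round_trip = utc_dt.timestamp()
```

Comparing timestamps (or time-zone-aware datetimes) in tests avoids depending on the time zone of the machine that runs them.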
  5. Cleanup of simple_imputer (automl#346)

    * cleanup of simple_imputer
    
    * Fixed doc and typo
    
    * Fixed docs
    
    * Made changes, added test
    
    * Fixed init statement
    
    * Fixed docs
    
    * Flake'd
    eddiebergman authored and ravinkohli committed Dec 21, 2021
    0ae9cbf
  6. [feat] Add the option to save a figure in plot setting params (automl#351)
    
    * [feat] Add the option to save a figure in plot setting params
    
    Since users in non-GUI environments would like to avoid using the
    show method of matplotlib, I added the option to call savefig so that
    users can complete the operations inside AutoPytorch.
    
    * [doc] Add a comment for non-GUI based computer in plot_perf_over_time method
    
    * [test] Add a test to check the priority of show and savefig
    
    Since plt.savefig and plt.show do not work at the same time due to the
    matplotlib design, we need to check that show is not called
    when a figname is specified. We could raise an error, but plot
    is typically called at the end of an optimization, so I wanted
    to avoid raising an error and just stuck to a check by tests.
    nabenabe0928 authored and ravinkohli committed Dec 21, 2021
    fd001a6
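
The priority between saving and showing described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the AutoPytorch implementation: `finish_plot`, `PlotBackend`, and `RecordingBackend` are hypothetical names, and the backend is injected so that the behaviour can be checked without a display (`matplotlib.pyplot` offers `savefig` and `show` functions of this shape).

```python
from typing import List, Optional, Protocol, Tuple


class PlotBackend(Protocol):
    """Anything offering savefig/show, e.g. the matplotlib.pyplot module."""

    def savefig(self, fname: str) -> None: ...

    def show(self) -> None: ...


def finish_plot(backend: PlotBackend, fig_name: Optional[str] = None) -> str:
    """Save when a file name is given; otherwise show. Never do both."""
    if fig_name is not None:
        backend.savefig(fig_name)
        return "saved"
    backend.show()
    return "shown"


class RecordingBackend:
    """Hypothetical stand-in that records calls instead of drawing."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, ...]] = []

    def savefig(self, fname: str) -> None:
        self.calls.append(("savefig", fname))

    def show(self) -> None:
        self.calls.append(("show",))
```

A test in the spirit of the commit would pass a recording backend together with a file name and then assert that `show` was never called.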
  7. Update workflow files (automl#363)

    * update workflow files
    
    * Remove double quotes
    
    * Exclude python 3.10
    
    * Fix mypy compliance check
    
    * Added PEP 561 compliance
    
    * Add py.typed to MANIFEST for dist
    
    * Update .github/workflows/dist.yml
    
    Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>
    
    Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>
    eddiebergman and ravinkohli committed Dec 21, 2021
    aa927a3
  8. [ADD] fit pipeline honoring API constraints with tests (automl#348)

    * Add fit pipeline with tests
    
    * Add documentation for get dataset
    
    * update documentation
    
    * fix tests
    
    * remove permutation importance from visualisation example
    
    * change disable_file_output
    
    * add
    
    * fix flake
    
    * fix test and examples
    
    * change type of disable_file_output
    
    * Address comments from eddie
    
    * fix docstring in api
    
    * fix tests for base api
    
    * fix tests for base api
    
    * fix tests after rebase
    
    * reduce dataset size in example
    
    * remove optional from doc string
    
    * Handle unsuccessful fitting of pipeline better
    
    * fix flake in tests
    
    * change to default configuration for documentation
    
    * add warning for no ensemble created when y_optimization in disable_file_output
    
    * reduce budget for single configuration
    
    * address comments from eddie
    
    * address comments from shuhei
    
    * Add autoPyTorchEnum
    
    * fix flake in tests
    
    * address comments from shuhei
    
    * Apply suggestions from code review
    
    Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
    
    * fix flake
    
    * use **dataset_kwargs
    
    * fix flake
    
    * change to enforce keyword args
    
    Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
    ravinkohli and nabenabe0928 committed Dec 21, 2021
    62e9764
  9. [ADD] Docker publish workflow (automl#357)

    * Add workflow for publishing docker image to github packages and dockerhub
    
    * add docker installation to docs
    
    * add workflow dispatch
    ravinkohli committed Dec 21, 2021
    e3aeb55
  10. fix error after merge

    ravinkohli committed Dec 21, 2021
    f612f46

Commits on Jan 24, 2022

  1. Fix 361 (automl#367)

    * check if N==0, and handle this case
    
    * change position of comment
    
    * Address comments from shuhei
    ravinkohli authored Jan 24, 2022
    c0fb82e

Commits on Jan 25, 2022

  1. [ADD] Test evaluator (automl#368)

    * add test evaluator
    
    * add no resampling and other changes for test evaluator
    
    * finalise changes for test_evaluator, TODO: tests
    
    * add tests for new functionality
    
    * fix flake and mypy
    
    * add documentation for the evaluator
    
    * add NoResampling to fit_pipeline
    
    * raise error when trying to construct ensemble with noresampling
    
    * fix tests
    
    * reduce fit_pipeline accuracy check
    
    * Apply suggestions from code review
    
    Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
    
    * address comments from shuhei
    
    * fix bug in base data loader
    
    * fix bug in data loader for val set
    
    * fix bugs introduced in suggestions
    
    * fix flake
    
    * fix bug in test preprocessing
    
    * fix bug in test data loader
    
    * merge tests for evaluators and change listcomp in get_best_epoch
    
    * rename resampling strategies
    
    * add test for get dataset
    
    Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
    ravinkohli and nabenabe0928 authored Jan 25, 2022
    6554702

Commits on Jan 27, 2022

  1. [fix] Hotfix debug no training in simple intensifier (automl#370)

    * [fix] Fix the no-training-issue when using simple intensifier
    
    * [test] Add a test for the modification
    
    * [fix] Modify the default budget so that the budget is compatible
    
    Since the previous version did not consider the provided budget_type
    when determining the default budget, I modified this part so that
    the default budgets for epochs and runtime are not mixed up.
    Note that since the default pipeline config defines epochs as the
    default budget, I also followed this rule when taking the default value.
    
    * [fix] Fix a mypy error
    
    * [fix] Change the total runtime for single config in the example
    
    Since the training sometimes does not finish in time,
    I increased the total runtime for the training so that the training
    fits in the given amount of time.
    
    * [fix] [refactor] Fix the SMAC requirement and refactor some conditions
    nabenabe0928 authored Jan 27, 2022
    224aa44

Commits on Jan 31, 2022

  1. bd4fabf

Commits on Feb 9, 2022

  1. [ADD] variance thresholding (automl#373)

    * add variance thresholding
    
    * fix flake and mypy
    
    * Apply suggestions from code review
    
    Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
    
    Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
    ravinkohli and nabenabe0928 authored Feb 9, 2022
    466bc18
  2. [ADD] scalers from autosklearn (automl#372)

    * Add new scalers
    
    * fix flake and mypy
    
    * Apply suggestions from code review
    
    Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
    
    * add robust scaler
    
    * fix documentation
    
    * remove power transformer from feature preprocessing
    
    * fix tests
    
    * check for default in include and exclude
    
    * Apply suggestions from code review
    
    Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
    
    Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
    ravinkohli and nabenabe0928 authored Feb 9, 2022
    2601421
  3. [FIX] Remove redundant categorical imputation (automl#375)

    * remove categorical strategy from simple imputer
    
    * fix tests
    
    * address comments from eddie
    
    * fix flake and mypy error
    
    * fix test cases for imputation
    ravinkohli authored Feb 9, 2022
    ba9c86a
  4. [feat] Add coalescer (automl#376)

    * [fix] Add check dataset in transform as well for test dataset, which does not require fit
    * [test] Migrate tests from Francisco's PR without modifications
    * [fix] Modify so that tests pass
    * [test] Increase the coverage
    nabenabe0928 authored Feb 9, 2022
    bf264d6

Commits on Feb 18, 2022

  1. Fix: keyword arguments to submit (automl#384)

    * Fix: keyword arguments to submit
    
    * Fix: Missing param for implementing AbstractTA
    
    * Fix: Typing of multi_objectives
    
    * Add: multi_objectives to each ExecuteTaFuncWithQueue
    eddiebergman authored Feb 18, 2022
    b5c1757

Commits on Feb 23, 2022

  1. [FIX] Datamanager in memory (automl#382)

    * remove datamanager instances from evaluation and smbo
    
    * fix flake
    
    * Apply suggestions from code review
    
    Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
    
    * fix flake
    
    Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
    ravinkohli and nabenabe0928 authored Feb 23, 2022
    4a0c773
  2. [feat] Add new task inference for APT (automl#386)

    * [fix] Fix the task inference issue mentioned in automl#352
    
    Since sklearn task inference regards targets with integers as
    a classification task, I modified target_validator so that we always
    cast targets for regression to float.
    This workaround is mentioned in the reference below:
    scikit-learn/scikit-learn#8952
    
    * [fix] [test] Add a small number to label for regression and add tests
    
    Since target labels are required to be float and sklearn requires
    a non-zero fractional part, I added a workaround that adds a nearly
    minimal fraction to the array so that we avoid a mis-inference
    of the task type by sklearn.
    In addition, I added tests to check that we get the expected results in
    extreme cases.
    
    * [fix] [test] Adapt the modification of targets to scipy.sparse.xxx_matrix
    
    * [fix] Address Ravin's comments and loosen the small number choice
    nabenabe0928 authored Feb 23, 2022
    2306c45
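
The regression-target workaround described in the commit above can be sketched in plain Python. This is a simplified stand-in, not the code from the PR or from sklearn: `looks_integral` and `nudge_targets` are hypothetical helpers, the check only mimics the idea that targets without a fractional part are read as class labels, and `eps=1e-6` is an assumed value.

```python
from typing import List, Sequence


def looks_integral(targets: Sequence[float]) -> bool:
    """Return True when no target has a fractional part.

    Simplified stand-in for a type-inference rule that treats such
    targets as class labels rather than regression values.
    """
    return all(float(t).is_integer() for t in targets)


def nudge_targets(targets: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Cast targets to float and add a small fraction (eps is an assumption)."""
    return [float(t) + eps for t in targets]


labels = [0, 1, 2, 3]

# Casting alone keeps the values integral, so they still look like classes.
as_float = [float(t) for t in labels]

# Adding a small fraction makes the values non-integral.
nudged = nudge_targets(labels)

# For very large magnitudes a fixed eps is absorbed by float rounding,
# which is one reason extreme cases deserve their own tests.
huge = nudge_targets([1e17])
```

The last case hints at why the choice of the small number matters: a constant that works for small labels may have no effect on very large targets.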

Commits on Feb 25, 2022

  1. dafd480
  2. [ADD] dataset compression (automl#387)

    * Initial implementation without tests
    
    * add tests and make necessary changes
    
    * improve documentation
    
    * fix tests
    
    * Apply suggestions from code review
    
    Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
    
    * undo change in  as it causes tests to fail
    
    * change name from InputValidator to input_validator
    
    * extract statements to methods
    
    * refactor code
    
    * check if mapping is the same as expected
    
    * update precision reduction for dataframes and tests
    
    * fix flake
    
    Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
    ravinkohli and nabenabe0928 authored Feb 25, 2022
    a679b09

Commits on Mar 2, 2022

  1. 1b8e76a

Commits on Mar 3, 2022

  1. [ADD] feature preprocessors from autosklearn (automl#378)

    * in progress
    
    * add remaining preprocessors
    
    * fix flake and mypy after rebase
    
    * Fix tests and add documentation
    
    * fix tests bug
    
    * fix bug in tests
    
    * fix bug where search space updates were not honoured
    
    * handle check for score func in feature preprocessors
    
    * address comments from shuhei
    
    * apply suggestions from code review
    
    * add documentation for feature preprocessors with percent to int value range
    
    * fix tests
    
    * fix tests
    
    * address comments from shuhei
    
    * fix tests which fail due to scaler
    ravinkohli authored Mar 3, 2022
    048656e
  2. 13fa571
  3. [fix] [test] Fix an error in test

    The previous version provided non-sparse data at training time and sparse
    data at test time, which is invalid.
    This commit fixes that error.
    nabenabe0928 committed Mar 3, 2022
    56f58b4