Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add tqdm to KElbowVisualizer #1070

Closed
wants to merge 2 commits into from

Conversation

senovr
Copy link

@senovr senovr commented Jun 8, 2020

This PR adds tqdm (visualizer of the progress bar) for the KElbovVisualizer class.
This allows user to better estimate time required for fitting when iterating through the number of clusters.

I have made the following changes:

1.modified a .fit() method of KElbovVisualizer class
for k in tqdm(self.k_values_):
2.add tqdm in requirements.txt file

Sample Code and Plot

The only difference is displaying of progress bar.
Below you may find a snapshot from my Jupiter notebook:
image

TODOs and questions

feel free to contact me @senovr

@bbengfort
Copy link
Member

Hi @senovr thank you so much for your first-time contribution to Yellowbrick, we really appreciate you putting time and thought into our package and diving into the code - this is essential for Yellowbrick to grow and thrive!

I certainly feel and understand the pain point of how long it takes to produce a visualization (or fit a scikit-learn model) and without feedback, it is sometimes tempting to assume that things are stuck and not moving. This is a problem for KElbow, RFECV, and Alpha selection which fit many models and also the long-running fit models like Manifolds and ParallelCoordinates. In several places, we've looked to improve the performance of the models in order to minimize this effect PR #1048 and PC fast param.

Yellowbrick is intended to be used as a library package and because of that we do our best to minimize the dependencies that users have to install alongside it (currently we only really depend on scitkit-learn and matplotlib and their dependencies). We also assume that Yellowbrick will be used in a variety of different user contexts including but not limited to:

  1. CLI programs that automate model fitting
  2. Jupyter notebooks
  3. Celery distributed task queues
  4. Headless servers in the background

Each of these contexts has very different stdout requirements, and although tqdm is effective at the first two environments, it is not a small library and not a one-size fits all library. Because of this we cannot make it a standard dependency of Yellowbrick.

One possible compromise is that we can make tqdm an optional dependency that does not affect the code or operation of Yellowbrick. If it is installed, then we can optionally make it available to the user (you can see an example of an optional dependency in the commented out portion of requirements.txt. I'm happy to discuss moving in this direction, or I'm happy to invite you to join a discussion about an application-grade form of Yellowbrick that might have different dependencies. Otherwise, it's great to have this documented in our PRs even if we close it without merging. Let me know how you'd like to proceed!

@rebeccabilbro rebeccabilbro added the gone-stale PR or issue has not seen activity in >30 days and/or develop branch has since diverged significantly label Aug 4, 2020
@rebeccabilbro
Copy link
Member

Going to go ahead and archive this since there hasn't been much activity for a while; thanks again for your interest in Yellowbrick!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gone-stale PR or issue has not seen activity in >30 days and/or develop branch has since diverged significantly
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants