[good first issue - intermediate] Plot Materializer for Sklearn Plots #456

Closed
skrawcz opened this issue Oct 12, 2023 · 2 comments
Labels
good first issue Good for newcomers hacktoberfest Hacktoberfest issues help wanted Extra attention is needed

Comments

@skrawcz
Collaborator

skrawcz commented Oct 12, 2023

Is your feature request related to a problem? Please describe.
When debugging machine learning pipelines, it is common to produce plots as the pipeline runs. You can do this with Hamilton, but we believe the ergonomics could be improved with materializers.

Describe the solution you'd like
Given the following Hamilton functions:

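import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.base import ClassifierMixin
from sklearn.metrics import ConfusionMatrixDisplay

def y_pred(clf: ClassifierMixin, X_test: pd.DataFrame) -> pd.Series:
    # predict() returns an ndarray; wrap it so the output matches the annotation
    return pd.Series(clf.predict(X_test), index=X_test.index)

def confusion_matrix(y_test: pd.Series, y_pred: pd.Series) -> np.ndarray:
    # call through the module so this node's name does not shadow sklearn's function
    return metrics.confusion_matrix(y_test, y_pred)

def cm_display(confusion_matrix: np.ndarray) -> ConfusionMatrixDisplay:
    # plot() returns the display object itself
    return ConfusionMatrixDisplay(confusion_matrix).plot()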

We want to capture cm_display and save the produced plot somewhere.

One possible materializer API:

to.png(
    id="foo_bar_confusion_matrix",
    dependencies=["cm_display"],
    path="./foo_bar_confusion_matrix.png",
    height=12,
    width=12,  # some sort of kwargs...
)

That would handle taking the object and creating a PNG from it.
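
To make the idea concrete, here is a minimal sketch of what such a saver might do internally. `PlotPngSaver`, its fields, and `save_data` are hypothetical names used for illustration and are not Hamilton's actual implementation. The only library behavior assumed is that scikit-learn display objects expose a matplotlib figure as `figure_` once `plot()` has been called.

```python
import dataclasses
import os
from typing import Any, Dict, Optional

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay


@dataclasses.dataclass
class PlotPngSaver:
    """Hypothetical saver: takes a display object and writes a PNG to `path`."""

    path: str
    width: Optional[float] = None   # inches, assumed meaning of the proposed kwarg
    height: Optional[float] = None  # inches, assumed meaning of the proposed kwarg
    savefig_kwargs: Dict[str, Any] = dataclasses.field(default_factory=dict)

    def save_data(self, display: Any) -> Dict[str, Any]:
        # Display objects only get `figure_` after plot(), so plot if needed.
        if not hasattr(display, "figure_"):
            display = display.plot()
        fig = display.figure_
        if self.width is not None and self.height is not None:
            fig.set_size_inches(self.width, self.height)
        fig.savefig(self.path, format="png", **self.savefig_kwargs)
        # Return some metadata about what was written.
        return {"path": self.path, "size_bytes": os.path.getsize(self.path)}
```

With something like this registered under the `png` key, the dataflow function only has to return the display object, and the path and sizing stay with whoever calls the driver.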

We should try to handle a few of the common plot types in sklearn.
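
As a rough illustration of why several plot types can share one code path: the scikit-learn display classes below all expose the underlying matplotlib figure as `figure_` after `plot()`. The toy labels and scores, the choice of these three classes, and the output file names are assumptions made for the example.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import numpy as np
from sklearn.metrics import (
    ConfusionMatrixDisplay,
    PrecisionRecallDisplay,
    RocCurveDisplay,
    confusion_matrix,
    precision_recall_curve,
    roc_curve,
)

# Toy binary labels and scores, made up for the example.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])
y_hat = (y_score > 0.5).astype(int)

fpr, tpr, _ = roc_curve(y_true, y_score)
precision, recall, _ = precision_recall_curve(y_true, y_score)

displays = {
    "confusion_matrix": ConfusionMatrixDisplay(confusion_matrix(y_true, y_hat)),
    "roc_curve": RocCurveDisplay(fpr=fpr, tpr=tpr),
    "precision_recall": PrecisionRecallDisplay(precision=precision, recall=recall),
}

out_dir = tempfile.mkdtemp()
saved_paths = {}
for name, display in displays.items():
    display.plot()  # populates display.figure_
    path = os.path.join(out_dir, f"{name}.png")
    display.figure_.savefig(path)  # same call works for every display type
    saved_paths[name] = path
```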

Describe alternatives you've considered
You can do this inside the function itself, but I think the materializer pattern helps abstract this.
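
For comparison, the in-function alternative could look like the sketch below. It assumes matplotlib is installed and relies on the `figure_` attribute that scikit-learn display objects expose after `plot()`. The `output_path` parameter is hypothetical; it is the kind of I/O concern that ends up wired into the dataflow with this approach.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay


def cm_display(confusion_matrix: np.ndarray, output_path: str) -> ConfusionMatrixDisplay:
    """Builds the plot and saves it as a side effect (hypothetical `output_path` input)."""
    display = ConfusionMatrixDisplay(confusion_matrix).plot()
    # `figure_` is the matplotlib Figure created by plot().
    display.figure_.savefig(output_path)
    return display
```

Here the function now needs an extra input and always writes a file, whether or not the caller wants one.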

@skrawcz skrawcz added good first issue Good for newcomers help wanted Extra attention is needed hacktoberfest Hacktoberfest issues labels Oct 12, 2023
@VPraharsha03 VPraharsha03 mentioned this issue Oct 17, 2023
10 tasks
skrawcz pushed a commit that referenced this issue Oct 23, 2023
Adds the ability to save a SKLearn plot as a PNG easily.

Adding plots is something you may want in development and not production,
or vice-versa. Having this as a materializer allows one to easily inject it
at driver time, so you don't have to modify your logic for concerns
that needn't be hard coded into your dataflow. But there's nothing
wrong with putting it in your dataflow; having it as a materializer
just gives you a little more freedom to decouple concerns.

--- Squashed commits

* add sklear plot saver

* remove unnecessary comment

* Fix review points

* use attrib method

* added tests, fix data saver method

* added test for calibration display

* update req

* handle keyword arguments

* Selective import of display classes

* make compatible with py3.7

* add version checking

* change to 3.8

* add checking to test func

* add additional display classes

* fix failing 3.7 build

* using skipif

* fix signature

* add to plugins_modules list

---------

Co-authored-by: Vivek Praharsha <vpraharsha@outlook.com>
Co-authored-by: vpraharsha03 <vpr03>
@luahan77m

Hi, I'm new to open source and want to try to solve some simple problems. Can you assign this issue to me? I would appreciate it!

@skrawcz
Collaborator Author

skrawcz commented Oct 30, 2023

@luahan77m sorry this issue has actually been done. I forgot to close it. Apologies.

If you want a simple task - #247 - is a good one to get started with. It will require you to become familiar with the process of contributing to open source and so I recommend that as a place to start.

@skrawcz skrawcz closed this as completed Oct 30, 2023

2 participants