Unknown label type in autosplit #2

Open
rogue26 opened this issue Dec 2, 2019 · 1 comment

rogue26 commented Dec 2, 2019

I get the following error when trying to use the autosplit feature, regardless of whether I split on a numerical or categorical variable.

2019-12-02 20:08:06,514 ERROR Traceback (most recent call last):
  File "<string>", line 182, in auto_split
  File "/home/dataiku/dss/plugins/installed/decision-tree-builder/python-lib/dku_idtb_decision_tree/autosplit.py", line 24, in autosplit
    return compute_splits(df[[feature]], df[target], max_splits)
  File "/home/dataiku/dss/plugins/installed/decision-tree-builder/python-lib/dku_idtb_decision_tree/autosplit.py", line 58, in compute_splits
    tree_estimator.fit(feature_df, target_col)
  File "/home/dataiku/dataiku-dss-5.1.2/python.packages/sklearn/tree/tree.py", line 801, in fit
    X_idx_sorted=X_idx_sorted)
  File "/home/dataiku/dataiku-dss-5.1.2/python.packages/sklearn/tree/tree.py", line 140, in fit
    check_classification_targets(y)
  File "/home/dataiku/dataiku-dss-5.1.2/python.packages/sklearn/utils/multiclass.py", line 171, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'

Could there be a problem in my data causing this? Alternatively, I wonder if the autosplit module should use DecisionTreeRegressor rather than DecisionTreeClassifier.

Contributor

AgatheG commented Dec 3, 2019

Hi,

This issue is caused by your target column containing floats, which scikit-learn treats as a continuous target. Autosplit works with any discrete target values (booleans, integers or strings).
If you are doing multiclass classification with floats as classes, a workaround is to convert your classes to strings (for instance, by adding a string prefix in a Prepare recipe) so that scikit-learn handles them as class labels.

Please note that for now, we do not support regression but only classification (binary or multiclass); thus we use DecisionTreeClassifier.
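
For anyone who wants to try the workaround outside of a Prepare recipe, here is a minimal Python sketch of the same idea. The function names and the `class_` prefix are hypothetical and are not part of the plugin. It assumes the float values really are class labels rather than measurements.

```python
def to_string_classes(values, prefix="class_"):
    """Turn float class labels into strings so they read as discrete classes.

    The prefix is an arbitrary, hypothetical choice; any string works.
    """
    return [f"{prefix}{v}" for v in values]


def fit_classifier_on_string_classes(features, float_targets):
    """Fit a DecisionTreeClassifier on targets converted to strings.

    scikit-learn is imported inside the function so that the helper
    above stays usable without it.
    """
    from sklearn.tree import DecisionTreeClassifier

    tree = DecisionTreeClassifier()
    # String targets are accepted as class labels, unlike non-integer floats.
    tree.fit(features, to_string_classes(float_targets))
    return tree
```

For example, `fit_classifier_on_string_classes([[1], [2], [3], [4]], [0.5, 0.5, 1.5, 1.5])` should fit without error, whereas passing the raw floats as the target reproduces the `Unknown label type: 'continuous'` error from the traceback above.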
