Update plr.py : classifier prediction method option added to ml_l #283

PragyanTiwari · 2025-01-07T07:34:10Z

Description

Added a classifier condition for the ml_l learner, so that the prediction method (predict or predict_proba) is set according to the learner type.

Reference to Issues or PRs

PR for the issue: #236

PR Checklist

Please fill out this PR checklist (see our contributing guidelines for details).

The title of the pull request summarizes the changes made.
The PR contains a detailed description of all changes and additions.
References to related issues or PRs are added.
The code passes all (unit) tests.
Enhancements or new features are equipped with unit tests.
The changes adhere to the PEP8 standards.

SvenKlaassen

Would it also be possible to add a test, e.g. test_plr_binary, to test the setting with binary outcomes? This might require adjusting the manual function fit_plr_single_split:

doubleml-for-py/doubleml/plm/tests/_utils_plr_manual.py

Line 94 in 9d8f50f

    
    def fit_plr_single_split(y, x, d, learner_l, learner_m, learner_g, smpls, score,

SvenKlaassen · 2025-01-07T09:47:23Z

doubleml/plm/plr.py

@PragyanTiwari Thanks.
I would suggest slight changes.
The learner type is checked via the _check_learner() method. To allow for both options, I would suggest defining

ml_l_is_classifier = self._check_learner(ml_l, 'ml_l', regressor=True, classifier=True)

This could then be used directly in the _predict_method.

Further, it would be great to raise a UserWarning if ml_l is a classifier, to alert the user that this model treats the probability as additive.
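
A minimal sketch of the suggestion above, using only the standard library. The function name `choose_ml_l_predict_method` and the warning text are hypothetical; in the package, the boolean would come from `_check_learner` and the result would presumably be stored in `_predict_method['ml_l']`.

```python
import warnings


def choose_ml_l_predict_method(ml_l_is_classifier: bool) -> str:
    """Return the prediction method name for ml_l (hypothetical helper)."""
    if ml_l_is_classifier:
        # Alert the user: the PLR model treats the predicted probability as additive.
        warnings.warn(
            "The ml_l learner was identified as classifier. "
            "Its predicted probabilities are treated as additive in this model.",
            UserWarning,
        )
        return "predict_proba"
    return "predict"
```

A regressor passes through silently, while a classifier yields 'predict_proba' together with the warning.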

Hi @SvenKlaassen,

If this is the case, then both ml_l_is_classifier and ml_m_is_classifier would likely behave similarly when setting the _predict_method.

Since ml_l_is_classifier also requires the same if-else logic as ml_m_is_classifier (

doubleml-for-py/doubleml/plm/plr.py

Lines 131 to 138 in 9d8f50f

    if ml_m_is_classifier:
        if self._dml_data.binary_treats.all():
            self._predict_method['ml_m'] = 'predict_proba'
        else:
            raise ValueError(f'The ml_m learner {str(ml_m)} was identified as classifier '
                             'but at least one treatment variable is not binary with values 0 and 1.')
    else:
        self._predict_method['ml_m'] = 'predict'

),

Which option do you prefer?

1. Define a separate function that takes the model as a parameter and sets its _predict_method accordingly for that particular model, i.e. ml_l or ml_m.

2. Simply replicate the same conditions for ml_l_is_classifier.

Which approach do you think is more maintainable in the long run?

Approach 2 might be slightly preferred, as the condition logic is not exactly the same.
Since only one outcome is possible in the package, ml_l does not need to check whether all treatments are binary.
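
For comparison, option 1 could be sketched roughly as below. The helper `resolve_predict_method` and its parameters are hypothetical and not part of the package; the caller passes the result of whichever binary check applies, which is where the two learners differ (all treatments binary for ml_m, the single outcome binary for ml_l).

```python
def resolve_predict_method(is_classifier: bool, is_binary: bool, not_binary_msg: str) -> str:
    """Hypothetical shared helper: pick the prediction method for one learner."""
    if not is_classifier:
        return "predict"
    if is_binary:
        return "predict_proba"
    raise ValueError(not_binary_msg)


# Stand-in values; in the package these would come from the data object.
binary_treats = [True, True]  # one flag per treatment variable
binary_outcome = True         # the single outcome variable

predict_method = {
    "ml_m": resolve_predict_method(
        True, all(binary_treats),
        "at least one treatment variable is not binary with values 0 and 1."),
    "ml_l": resolve_predict_method(
        True, binary_outcome,
        "the outcome variable is not binary with values 0 and 1."),
}
```

The helper stays generic only because the differing check is pushed to the call site, which is one reason simply replicating the branch may be just as readable here.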

Indeed, we need a test for ml_l as a classifier to check binary outcomes, so test_plr_binary becomes essential. For now, I've made some basic changes; please take a look.

Thanks. That looks fine.
I think a check on the binary outcome should look similar to this

doubleml-for-py/doubleml/irm/irm.py

Lines 147 to 151 in 92b057d

    if obj_dml_data.binary_outcome:
        self._predict_method = {'ml_g': 'predict_proba', 'ml_m': 'predict_proba'}
    else:
        raise ValueError(f'The ml_g learner {str(ml_g)} was identified as classifier '
                         'but the outcome variable is not binary with values 0 and 1.')

Further, the tests should at least check the UserWarning and the exception raised if the outcome is not binary.
I will not have time to look into changes immediately.
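
The two checks requested above could be exercised along these lines. This is a sketch with a hypothetical stand-in function, `ml_l_predict_method`, rather than the real model constructor, and it uses only the standard library; an actual test in the repository would presumably use pytest and real data objects.

```python
import warnings


def ml_l_predict_method(is_classifier: bool, binary_outcome: bool) -> str:
    """Hypothetical stand-in for the ml_l branch discussed in this thread."""
    if not is_classifier:
        return "predict"
    if not binary_outcome:
        raise ValueError("The ml_l learner was identified as classifier "
                         "but the outcome variable is not binary with values 0 and 1.")
    warnings.warn("The ml_l learner was identified as classifier; "
                  "predicted probabilities are treated as additive.", UserWarning)
    return "predict_proba"


def check_warning_on_binary_outcome() -> bool:
    """True if a classifier with a binary outcome triggers exactly one UserWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        method = ml_l_predict_method(is_classifier=True, binary_outcome=True)
    return method == "predict_proba" and [w.category for w in caught] == [UserWarning]


def check_error_on_non_binary_outcome() -> bool:
    """True if a classifier with a non-binary outcome raises ValueError."""
    try:
        ml_l_predict_method(is_classifier=True, binary_outcome=False)
    except ValueError:
        return True
    return False
```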

Base condition written for ml_l (classifier).

PragyanTiwari · 2025-01-12T17:10:19Z

If you want to proceed with this pull request, then please reply. I would be happy to reopen this pull request.

SvenKlaassen requested changes Jan 7, 2025


PragyanTiwari closed this Jan 11, 2025

PragyanTiwari force-pushed the main branch from f0466d5 to 92b057d Compare January 11, 2025 09:07

PragyanTiwari added 2 commits January 11, 2025 14:43

Update plr.py

ade053c

Update plr.py

a44c724

PragyanTiwari reopened this Jan 11, 2025

PragyanTiwari added 3 commits January 11, 2025 18:37

Update plr.py

375bdf7

Base condition written for ml_l (classifier).

Update plr.py

6819f2d

Update plr.py

15fba49

PragyanTiwari closed this Jan 12, 2025


PragyanTiwari commented Jan 7, 2025

SvenKlaassen left a comment

SvenKlaassen Jan 7, 2025

PragyanTiwari Jan 10, 2025

SvenKlaassen Jan 11, 2025

PragyanTiwari Jan 11, 2025

SvenKlaassen Jan 13, 2025

PragyanTiwari commented Jan 12, 2025

