Separate state transition validation from non-state transition validations #12120

Open
wants to merge 7 commits into base: master

todor-ivanov
Contributor

todor-ivanov commented Sep 27, 2024

Fixes #12037

Status

READY

Description

SUPERSEDES: #12146 && #12148

With the current change, we allow SiteLists-related actions in ReqMgr2 for the statuses: staging, acquired, running-open.

Before making a call to central services to change any of the request parameters, an additional step is executed to check which parameter modifications are allowed for the given status, and whether the new values provided with the REST call actually differ from the workflow parameters already defined in central CouchDB.
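A minimal Python sketch of that pre-check, under stated assumptions: the per-status map `ALLOWED_ARGS_BY_STATUS`, the helper name `filter_update_args`, and the choice to raise on disallowed keys are all hypothetical and illustrative, not the ReqMgr2 implementation. Arguments not modifiable in the current status are rejected, and arguments whose value already matches the stored one are dropped before any central-services call.

```python
# Hypothetical map of modifiable arguments per status (illustrative only;
# the real ReqMgr2 map may contain different keys).
ALLOWED_ARGS_BY_STATUS = {
    "staging": {"RequestPriority", "SiteWhitelist", "SiteBlacklist"},
    "acquired": {"RequestPriority", "SiteWhitelist", "SiteBlacklist"},
    "running-open": {"RequestPriority", "SiteWhitelist", "SiteBlacklist"},
}


def filter_update_args(status, request_args, stored_args):
    """Return only the allowed arguments whose value differs from the stored one.

    Raises ValueError if an argument is not modifiable in `status`.
    """
    allowed = ALLOWED_ARGS_BY_STATUS.get(status, set())
    not_allowed = sorted(set(request_args) - allowed)
    if not_allowed:
        raise ValueError(f"Arguments not modifiable in status {status!r}: {not_allowed}")
    # Skip values that already match what is stored for the workflow
    return {key: value for key, value in request_args.items()
            if stored_args.get(key) != value}


stored = {"RequestPriority": 100000, "SiteWhitelist": ["T1_US_FNAL"]}
update = {"RequestPriority": 200000, "SiteWhitelist": ["T1_US_FNAL"]}
print(filter_update_args("running-open", update, stored))  # {'RequestPriority': 200000}
```

Statuses missing from the map allow nothing in this sketch, so any update attempted in them is rejected.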

With the current change, we no longer ignore the rest of the request arguments provided with the REST call when the RequestPriority is changed. See: #8457 (comment)

The current change is also meant to address the issue with vanishing parameters during assignment of ACDC workflows, as explained here: #12037 (comment). The issue would have manifested itself for regular workflows as well, had they experienced any parameter change in the state transition from assignment-approved to assigned. Currently this step is handled by Unified (and I believe no parameter change happens during this automated step), and the only manual intervention we perform at this state transition is for ACDC workflows, hence why we noticed the misbehavior with an ACDC workflow.

With the current PR, we propose separate logic in the validation function for state-transition and non-state-transition workflow updates.
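The split can be pictured as a simple dispatcher. This is a hypothetical sketch, not the ReqMgr2 code: the function names only mirror the roles of `_handleAssignmentStateTransition` and `_handleNoStatusUpdate`, and treating a `RequestStatus` equal to the current status as a non-state update is an assumption.

```python
def handle_state_transition(current_status, request_args):
    """Placeholder for the status-changing path (e.g. assignment)."""
    return {"path": "state-transition",
            "from": current_status,
            "to": request_args["RequestStatus"]}


def handle_no_status_update(current_status, request_args):
    """Placeholder for updates that keep the current status."""
    return {"path": "no-status-update",
            "status": current_status,
            "args": sorted(k for k in request_args if k != "RequestStatus")}


def update_request(current_status, request_args):
    """Route an update depending on whether it changes RequestStatus."""
    new_status = request_args.get("RequestStatus")
    if new_status is not None and new_status != current_status:
        return handle_state_transition(current_status, request_args)
    return handle_no_status_update(current_status, request_args)


print(update_request("assignment-approved", {"RequestStatus": "assigned", "Team": "t"})["path"])
# state-transition
print(update_request("running-open", {"RequestPriority": 200000})["path"])
# no-status-update
```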

Is it backward compatible (if not, which system does it affect?)

YES

Related PRs

This PR cherry-picks 3 pull requests that were recently reverted. The changes were originally provided by:
#12077
#12108
#12111

External dependencies / deployment changes

No

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failure
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 1 warning or error that must be fixed
    • 7 warnings
    • 46 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 1 comment to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15251/artifact/artifacts/PullRequestReport.html

Contributor

amaltaro left a comment


@todor-ivanov as we have already found 2 issues with this development, I would like to ask you to run a representative validation for this fix. By representative, I can think of at least the following use cases:

  • create and assign standard workflow
  • create and assign ACDC workflow
  • change priority for active workflow
  • change site lists for active workflow
  • change priority and site lists for active workflow

Once we cover all these use cases, then you could look into fixing the unit tests as well.

@cmsdmwmbot

Can one of the admins verify this patch?

@amaltaro
Contributor

amaltaro commented Oct 3, 2024

@todor-ivanov I am trying to organize the WM central services upgrade, and I need to know how this validation is progressing and when you think you can finish it. If you think it could still take a few days, then we might actually revert the 2 changes that went in and give you the time you need to validate this.

@todor-ivanov
Contributor Author

@amaltaro I am pretty sure I will not be able to finish this before mid next week.

Add proper checks for allowed request properties changes && Stop reducing request_args only to RequestPriority for noStatusTransition actions

Extend allowed arguments including stat_keys and RequestStatus && Call the relevant modifiers from _handleNoStatusUpdate

Fix missing RequestName from request_args on multiple calls of validate_request_update_args

Add proper log messages for Sitelists changes

Remove redundant validation calls && Update reqdb couch with single action && Move reduceReport to Utils.

Typo

Remove forgotten commented lines of code

Review comments

Source files pylint fixes

Review comments

Unit tests

Unit tests pylint fixes

Unit tests - remove tests for reduceReport

@todor-ivanov
Contributor Author

todor-ivanov commented Oct 18, 2024

hi @amaltaro:
I have patched my central services at cmsweb-test1.cern.ch with both patches: #12148 && #12120

Coming back to the tests requested here: #12120 (review)

  • create and assign standard workflow - DONE
  • create and assign ACDC workflow - Waiting for a complete Workflow
  • change priority for active workflow - DONE
  • change site lists for active workflow - DONE
  • change priority and site lists for active workflow - DONE

Here are two of the workflows I used for these tests:

@todor-ivanov
Contributor Author

todor-ivanov commented Oct 18, 2024

Hi again @amaltaro, here are the results for an artificially created ACDC:

This is the workflow: https://cmsweb-test1.cern.ch/reqmgr2/fetch?rid=tivanov_ACDC_TaskChain_LumiMask_multiRun_SiteListsTest_v6_241018_134936_9617

  • The workflow is properly transitioned from new to assignment-approved.
  • If I try to change any parameter without changing the status from assignment-approved, I get this error:
[18/Oct/2024:13:56:41]  SERVER REST ERROR WMCore.ReqMgr.DataStructs.RequestError.InvalidSpecParameterValue 5e296467497b0a2f6e4069a199ef7e28 (Invalid spec parameter value: There were unhandled arguments left for no-status update: ['TrustSitelists', 'TrustPUSitelists', 'CustodialSites', 'NonCustodialSites'])
[18/Oct/2024:13:56:41]    Traceback (most recent call last):
[18/Oct/2024:13:56:41]      File "/usr/local/lib/python3.8/site-packages/WMCore/REST/Server.py", line 749, in default
[18/Oct/2024:13:56:41]        return self._call(RESTArgs(list(args), kwargs))
[18/Oct/2024:13:56:41]      File "/usr/local/lib/python3.8/site-packages/WMCore/REST/Server.py", line 832, in _call
[18/Oct/2024:13:56:41]        obj = apiobj['call'](*safe.args, **safe.kwargs)
[18/Oct/2024:13:56:41]      File "/usr/local/lib/python3.8/site-packages/WMCore/ReqMgr/Service/Request.py", line 559, in put
[18/Oct/2024:13:56:41]        result = self._updateRequest(workload, request_args)
[18/Oct/2024:13:56:41]      File "/usr/local/lib/python3.8/site-packages/WMCore/ReqMgr/Service/Request.py", line 537, in _updateRequest
[18/Oct/2024:13:56:41]        report = self._handleNoStatusUpdate(workload, request_args, dn)
[18/Oct/2024:13:56:41]      File "/usr/local/lib/python3.8/site-packages/WMCore/ReqMgr/Service/Request.py", line 446, in _handleNoStatusUpdate
[18/Oct/2024:13:56:41]        raise InvalidSpecParameterValue(msg)
[18/Oct/2024:13:56:41]    WMCore.ReqMgr.DataStructs.RequestError.InvalidSpecParameterValue: InvalidSpecParameterValue 5e296467497b0a2f6e4069a199ef7e28 [HTTP 400, APP 1102, MSG "Invalid spec parameter value: There were unhandled arguments left for no-status update: ['TrustSitelists', 'TrustPUSitelists', 'CustodialSites', 'NonCustodialSites']", INFO None, ERR None]

This is somewhat expected: the assignment-approved map of allowed arguments contains a large set of arguments, which we indeed do not support as non-status-update arguments, but which we do support in combination with a status update.

  • If I move the workflow from assignment-approved to assigned together with a change of the site whitelist, the priority, or anything else, everything goes smoothly.
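A simplified sketch of why the no-status path fails in the first case: the handler only knows how to apply a few arguments and reports any leftovers, producing a message in the same format as the log above. `InvalidSpecParameterValue` is a local stand-in for the ReqMgr2 exception, and both the `NO_STATUS_HANDLED_ARGS` set and the `apply_no_status_args` helper are assumptions for illustration.

```python
class InvalidSpecParameterValue(Exception):
    """Local stand-in for the ReqMgr2 exception of the same name."""


# Assumed set of arguments the no-status-update path knows how to apply
NO_STATUS_HANDLED_ARGS = {"RequestPriority", "SiteWhitelist", "SiteBlacklist"}


def apply_no_status_args(request_args):
    """Collect known arguments; fail loudly on anything left unhandled."""
    handled, unhandled = {}, []
    for key, value in request_args.items():
        if key in NO_STATUS_HANDLED_ARGS:
            handled[key] = value
        else:
            unhandled.append(key)  # keeps the caller's argument order
    if unhandled:
        raise InvalidSpecParameterValue(
            f"There were unhandled arguments left for no-status update: {unhandled}")
    return handled


args = {"RequestPriority": 200000, "TrustSitelists": "False", "TrustPUSitelists": "True",
        "CustodialSites": "", "NonCustodialSites": ""}
try:
    apply_no_status_args(args)
except InvalidSpecParameterValue as exc:
    print(exc)
```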

@todor-ivanov
Contributor Author

@amaltaro

I ran yet another test, with and without the patch from this PR applied:

  • Without this patch, the workflow update is completely broken, as you reported.
  • With this patch, the workflow update proceeds only when the parameter change is combined with the state transition from assignment-approved to assigned.

This means we end up here:

report = self._handleAssignmentStateTransition(workload, request_args, dn)

instead of here:

report = self._handleNoStatusUpdate(workload, request_args, dn)

So we may need to either repeat the actions from _handleAssignmentStateTransition in _handleNoStatusUpdate, or stop treating the state-transition and non-state-transition arguments differently and unify those _handle* auxiliary methods, as I already suggested here: #12099 (comment)
Quote:

Of course, if you ask me - I am completely up for moving the whole logic to be implemented here in a more generic way .... for all status updates, then get rid of a big chunk of code covering custom cases ... and only make the proper calls to this generic method here from upstream modules (e.g. Request in the current case)

What do you think?

@todor-ivanov
Contributor Author

BTW, until now we did not notice the difference between workflow parameter changes with and without a status update from assignment-approved (which I explain in my previous comment) only because we completely ignored everything but RequestPriority in _handleNoStatusUpdate calls. So instead of failing the call when the provided arguments were not handled properly (as we correctly do now), we silently ignored everything else the user sent us. What I'd say is that the correct behavior is now exposed; the question is how we fix it. I listed the two possible paths I can see in my comment above, and we need to choose one of them.

@amaltaro any ideas?

@todor-ivanov
Contributor Author

todor-ivanov commented Oct 21, 2024

OK @amaltaro, I tested a really nasty workaround: calling the _handleAssignmentApprovedStateTransition actions from within the _handleNoStatusUpdate call. This works. I am providing the workaround in my latest commit: 62a6d43, but I must stress that I really do not like this approach.

Here are the logs from:

  • Changing some arguments of an ACDC workflow in assignment-approved status, without actually moving it to assigned:
[21/Oct/2024:12:35:26] reqmgr2-bcdccd8c6-hsmlj 188.185.122.76:54180 "GET /reqmgr2/data/request?status=assigned&detail=True HTTP/1.1" 200 OK [data: 1652 in 9632 out 35882 us ] [auth: ok ***]
[21/Oct/2024:12:35:29] reqmgr2-bcdccd8c6-hsmlj 127.0.0.1 "GET /reqmgr2/data/info HTTP/1.1" 200 OK [data: 296 in 668 out 28878 us ] [auth: OK "" "" ] [ref: "" "Go-http-client/1.1" ]
[21/Oct/2024:12:35:46]  Updating request "tivanov_ACDC_TaskChain_Prod_SiteListsTest_v6_241021_114436_6621" with these user-provided args: {'RequestPriority': 200000, 'Team': 'testbed-vocms0290', 'SiteWhitelist': '', 'SiteBlacklist': ['T1_DE_KIT', 'T1_ES_PIC', 'T1_FR_CCIN2P3', 'T1_IT_CNAF', 'T1_UK_RAL', 'T2_BE_IIHE', 'T2_BE_UCL'], 'AcquisitionEra': {'myTask1': 'RunIISummer20UL16wmLHEGENAPV', 'myTask2': 'RunIISummer20UL16SIMAPV', 'myTask3': 'RunIISummer20UL16DIGIPremixAPV', 'myTask4': 'RunIISummer20UL16HLTAPV', 'myTask5': 'RunIISummer20UL16RECOAPV', 'myTask6': 'RunIISummer20UL16MiniAODAPV'}, 'ProcessingString': {'myTask1': 'myTask1_TaskChain_Prod_SiteListsTest_v6', 'myTask2': 'myTask2_TaskChain_Prod_SiteListsTest_v6', 'myTask3': 'myTask3_TaskChain_Prod_SiteListsTest_v6', 'myTask4': 'myTask4_TaskChain_Prod_SiteListsTest_v6', 'myTask5': 'myTask5_TaskChain_Prod_SiteListsTest_v6', 'myTask6': 'myTask6_TaskChain_Prod_SiteListsTest_v6'}, 'ProcessingVersion': {'myTask1': 11, 'myTask2': 12, 'myTask3': 13, 'myTask4': 14, 'myTask5': 15, 'myTask6': 16}, 'Dashboard': 'production', 'MergedLFNBase': '/store/backfill/1', 'TrustSitelists': 'False', 'UnmergedLFNBase': '/store/unmerged', 'MinMergeSize': 2147483648, 'MaxMergeSize': 4294967296, 'MaxMergeEvents': 100000000, 'BlockCloseMaxWaitTime': 66400, 'BlockCloseMaxFiles': 500, 'BlockCloseMaxEvents': 25000000, 'BlockCloseMaxSize': 5000000000000, 'SoftTimeout': 129600, 'GracePeriod': 300, 'TrustPUSitelists': 'True', 'CustodialSites': '', 'NonCustodialSites': '', 'Override': {'eos-lfn-prefix': 'root://eoscms.cern.ch//eos/cms/store/logs/prod/recent/PRODUCTION'}, 'SubscriptionPriority': 'Low'}
[21/Oct/2024:12:35:46]  Updated priority of "tivanov_ACDC_TaskChain_Prod_SiteListsTest_v6_241021_114436_6621" to: 200000
[21/Oct/2024:12:35:46]  Unhandled argument for no-status update: Team
[21/Oct/2024:12:35:46]  Updated SiteWhitelist of "tivanov_ACDC_TaskChain_Prod_SiteListsTest_v6_241021_114436_6621", with:  
[21/Oct/2024:12:35:46]  Updated SiteBlacklist of "tivanov_ACDC_TaskChain_Prod_SiteListsTest_v6_241021_114436_6621", with:  ['T1_DE_KIT', 'T1_ES_PIC', 'T1_FR_CCIN2P3', 'T1_IT_CNAF', 'T1_UK_RAL', 'T2_BE_IIHE', 'T2_BE_UCL']
[21/Oct/2024:12:35:46]  Unhandled argument for no-status update: TrustPUSitelists
[21/Oct/2024:12:35:46]  CurrentRequest status: assignment-approved
[21/Oct/2024:12:35:46]  Handling assignment-approved arguments differently!
[21/Oct/2024:12:35:46]  Assign request tivanov_ACDC_TaskChain_Prod_SiteListsTest_v6_241021_114436_6621, input args: {'RequestPriority': 200000, 'Team': 'testbed-vocms0290', 'SiteWhitelist': '', 'SiteBlacklist': ['T1_DE_KIT', 'T1_ES_PIC', 'T1_FR_CCIN2P3', 'T1_IT_CNAF', 'T1_UK_RAL', 'T2_BE_IIHE', 'T2_BE_UCL'], 'TrustPUSitelists': 'True'} ...
[21/Oct/2024:12:35:46] reqmgr2-bcdccd8c6-hsmlj 188.185.16.223:35786 "PUT /reqmgr2/data/request/tivanov_ACDC_TaskChain_Prod_SiteListsTest_v6_241021_114436_6621 HTTP/1.1" 200 OK [data: 3598 in 110 out 455516 us ] [auth: ok ***]
  • Transitioning the same workflow from assignment-approved to assigned, again with a few arguments changed:
[21/Oct/2024:12:40:36] reqmgr2-bcdccd8c6-hsmlj 188.184.96.94:20730 "GET /reqmgr2/data/wmagentconfig/vocms0290.cern.ch HTTP/1.1" 200 OK [data: 1567 in 306 out 16027 us ] [auth: ok "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=cmst1/CN=718748/CN=Robot: cms t1" "" ] [ref: "https://cmsweb-test1.cern.ch" "WMCore.Services.Requests/v002" ]
[21/Oct/2024:12:40:55]  Updating request "tivanov_ACDC_TaskChain_Prod_SiteListsTest_v6_241021_114436_6621" with these user-provided args: {'RequestStatus': 'assigned', 'RequestPriority': 200000, 'Team': 'testbed-vocms0290', 'SiteWhitelist': ['T1_US_FNAL', 'T2_CH_CERN'], 'SiteBlacklist': '', 'AcquisitionEra': {'myTask1': 'RunIISummer20UL16wmLHEGENAPV', 'myTask2': 'RunIISummer20UL16SIMAPV', 'myTask3': 'RunIISummer20UL16DIGIPremixAPV', 'myTask4': 'RunIISummer20UL16HLTAPV', 'myTask5': 'RunIISummer20UL16RECOAPV', 'myTask6': 'RunIISummer20UL16MiniAODAPV'}, 'ProcessingString': {'myTask1': 'myTask1_TaskChain_Prod_SiteListsTest_v6', 'myTask2': 'myTask2_TaskChain_Prod_SiteListsTest_v6', 'myTask3': 'myTask3_TaskChain_Prod_SiteListsTest_v6', 'myTask4': 'myTask4_TaskChain_Prod_SiteListsTest_v6', 'myTask5': 'myTask5_TaskChain_Prod_SiteListsTest_v6', 'myTask6': 'myTask6_TaskChain_Prod_SiteListsTest_v6'}, 'ProcessingVersion': {'myTask1': 11, 'myTask2': 12, 'myTask3': 13, 'myTask4': 14, 'myTask5': 15, 'myTask6': 16}, 'Dashboard': 'production', 'MergedLFNBase': '/store/backfill/1', 'TrustSitelists': 'False', 'UnmergedLFNBase': '/store/unmerged', 'MinMergeSize': 2147483648, 'MaxMergeSize': 4294967296, 'MaxMergeEvents': 100000000, 'BlockCloseMaxWaitTime': 66400, 'BlockCloseMaxFiles': 500, 'BlockCloseMaxEvents': 25000000, 'BlockCloseMaxSize': 5000000000000, 'SoftTimeout': 129600, 'GracePeriod': 300, 'TrustPUSitelists': 'True', 'CustodialSites': '', 'NonCustodialSites': '', 'Override': {'eos-lfn-prefix': 'root://eoscms.cern.ch//eos/cms/store/logs/prod/recent/PRODUCTION'}, 'SubscriptionPriority': 'Low'}
[21/Oct/2024:12:40:55]  Assign request tivanov_ACDC_TaskChain_Prod_SiteListsTest_v6_241021_114436_6621, input args: {'RequestStatus': 'assigned', 'RequestPriority': 200000, 'Team': 'testbed-vocms0290', 'SiteWhitelist': ['T1_US_FNAL', 'T2_CH_CERN'], 'SiteBlacklist': [], 'AcquisitionEra': {'myTask1': 'RunIISummer20UL16wmLHEGENAPV', 'myTask2': 'RunIISummer20UL16SIMAPV', 'myTask3': 'RunIISummer20UL16DIGIPremixAPV', 'myTask4': 'RunIISummer20UL16HLTAPV', 'myTask5': 'RunIISummer20UL16RECOAPV', 'myTask6': 'RunIISummer20UL16MiniAODAPV'}, 'ProcessingString': {'myTask1': 'myTask1_TaskChain_Prod_SiteListsTest_v6', 'myTask2': 'myTask2_TaskChain_Prod_SiteListsTest_v6', 'myTask3': 'myTask3_TaskChain_Prod_SiteListsTest_v6', 'myTask4': 'myTask4_TaskChain_Prod_SiteListsTest_v6', 'myTask5': 'myTask5_TaskChain_Prod_SiteListsTest_v6', 'myTask6': 'myTask6_TaskChain_Prod_SiteListsTest_v6'}, 'ProcessingVersion': {'myTask1': 11, 'myTask2': 12, 'myTask3': 13, 'myTask4': 14, 'myTask5': 15, 'myTask6': 16}, 'Dashboard': 'production', 'MergedLFNBase': '/store/backfill/1', 'TrustSitelists': False, 'UnmergedLFNBase': '/store/unmerged', 'MinMergeSize': 2147483648, 'MaxMergeSize': 4294967296, 'MaxMergeEvents': 100000000, 'BlockCloseMaxWaitTime': 66400, 'BlockCloseMaxFiles': 500, 'BlockCloseMaxEvents': 25000000, 'BlockCloseMaxSize': 5000000000000, 'SoftTimeout': 129600, 'GracePeriod': 300, 'TrustPUSitelists': True, 'CustodialSites': [], 'NonCustodialSites': [], 'Override': {'eos-lfn-prefix': 'root://eoscms.cern.ch//eos/cms/store/logs/prod/recent/PRODUCTION'}, 'SubscriptionPriority': 'Low', 'HardTimeout': 129900} ...
[21/Oct/2024:12:40:56] reqmgr2-bcdccd8c6-hsmlj 188.185.16.223:59098 "PUT /reqmgr2/data/request/tivanov_ACDC_TaskChain_Prod_SiteListsTest_v6_241021_114436_6621 HTTP/1.1" 200 OK [data: 3561 in 110 out 236476 us ] [auth: ok
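Comparing the user-provided args with the "Assign request ... input args" line above shows the kind of normalization applied before assignment: the strings 'True'/'False' become booleans, empty-string site lists become [], and a HardTimeout of 129900 appears, which is consistent with SoftTimeout + GracePeriod (129600 + 300). The sketch below reproduces only those observed conversions; the helper name `normalize_assign_args` and the HardTimeout formula are inferred from these logs, not taken from the WMCore source.

```python
def normalize_assign_args(args):
    """Mimic the conversions visible in the logs (illustrative only)."""
    normalized = dict(args)
    # The web form sends booleans as the strings 'True' / 'False'
    for key in ("TrustSitelists", "TrustPUSitelists"):
        if isinstance(normalized.get(key), str):
            normalized[key] = normalized[key] == "True"
    # An empty string for a site list means "no sites"
    for key in ("SiteWhitelist", "SiteBlacklist", "CustodialSites", "NonCustodialSites"):
        if normalized.get(key) == "":
            normalized[key] = []
    # Assumption inferred from the logs: HardTimeout = SoftTimeout + GracePeriod
    if "SoftTimeout" in normalized and "GracePeriod" in normalized:
        normalized["HardTimeout"] = normalized["SoftTimeout"] + normalized["GracePeriod"]
    return normalized


args = {"TrustSitelists": "False", "TrustPUSitelists": "True", "SiteBlacklist": "",
        "CustodialSites": "", "SoftTimeout": 129600, "GracePeriod": 300}
result = normalize_assign_args(args)
print(result["TrustSitelists"], result["SiteBlacklist"], result["HardTimeout"])
# False [] 129900
```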

There is a side effect, though. Besides the fact that we end up in a spiral of nested method calls that could instead be arranged sequentially, skipping the irrelevant ones, the side effect I am speaking of is something I think I have seen in the past: once you update any of the request parameters in the web interface while the ACDC is in assignment-approved, without updating the status to assigned, the web interface loses the default values for Team and SiteWhitelist. You then need to be careful not to hit the Submit button without checking those, because otherwise you'll end up with the following error:

[21/Oct/2024:12:40:30]  Updating request "tivanov_ACDC_TaskChain_Prod_SiteListsTest_v6_241021_114436_6621" with these user-provided args: {'RequestStatus': 'assigned', 'RequestPriority': 200000, 'Team': '', 'SiteWhitelist': ['T1_US_FNAL', 'T2_CH_CERN'], 'SiteBlacklist': '', 'AcquisitionEra': {'myTask1': 'RunIISummer20UL16wmLHEGENAPV', 'myTask2': 'RunIISummer20UL16SIMAPV', 'myTask3': 'RunIISummer20UL16DIGIPremixAPV', 'myTask4': 'RunIISummer20UL16HLTAPV', 'myTask5': 'RunIISummer20UL16RECOAPV', 'myTask6': 'RunIISummer20UL16MiniAODAPV'}, 'ProcessingString': {'myTask1': 'myTask1_TaskChain_Prod_SiteListsTest_v6', 'myTask2': 'myTask2_TaskChain_Prod_SiteListsTest_v6', 'myTask3': 'myTask3_TaskChain_Prod_SiteListsTest_v6', 'myTask4': 'myTask4_TaskChain_Prod_SiteListsTest_v6', 'myTask5': 'myTask5_TaskChain_Prod_SiteListsTest_v6', 'myTask6': 'myTask6_TaskChain_Prod_SiteListsTest_v6'}, 'ProcessingVersion': {'myTask1': 11, 'myTask2': 12, 'myTask3': 13, 'myTask4': 14, 'myTask5': 15, 'myTask6': 16}, 'Dashboard': 'production', 'MergedLFNBase': '/store/backfill/1', 'TrustSitelists': 'False', 'UnmergedLFNBase': '/store/unmerged', 'MinMergeSize': 2147483648, 'MaxMergeSize': 4294967296, 'MaxMergeEvents': 100000000, 'BlockCloseMaxWaitTime': 66400, 'BlockCloseMaxFiles': 500, 'BlockCloseMaxEvents': 25000000, 'BlockCloseMaxSize': 5000000000000, 'SoftTimeout': 129600, 'GracePeriod': 300, 'TrustPUSitelists': 'True', 'CustodialSites': '', 'NonCustodialSites': '', 'Override': {'eos-lfn-prefix': 'root://eoscms.cern.ch//eos/cms/store/logs/prod/recent/PRODUCTION'}, 'SubscriptionPriority': 'Low'}
[21/Oct/2024:12:40:30]  Error: Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/WMCore/ReqMgr/Service/Request.py", line 189, in validate
    self._validateRequestBase(param, safe, validate_request_update_args, requestName)
  File "/usr/local/lib/python3.8/site-packages/WMCore/ReqMgr/Service/Request.py", line 101, in _validateRequestBase
    workload, r_args = valFunc(args, self.config, self.reqmgr_db_service, param)
  File "/usr/local/lib/python3.8/site-packages/WMCore/ReqMgr/Utils/Validation.py", line 102, in validate_request_update_args
    workload.validateArgumentForAssignment(request_args)
  File "/usr/local/lib/python3.8/site-packages/WMCore/WMSpec/WMWorkload.py", line 1945, in validateArgumentForAssignment
    validateArgumentsUpdate(schema, argumentDefinition)
  File "/usr/local/lib/python3.8/site-packages/WMCore/WMSpec/WMWorkloadTools.py", line 293, in validateArgumentsUpdate
    _validateArgumentOptions(arguments, argumentDefinition, "assign_optional")
  File "/usr/local/lib/python3.8/site-packages/WMCore/WMSpec/WMWorkloadTools.py", line 160, in _validateArgumentOptions
    arguments[arg] = _validateArgument(arg, arguments[arg], argDef)
  File "/usr/local/lib/python3.8/site-packages/WMCore/WMSpec/WMWorkloadTools.py", line 101, in _validateArgument
    _validateArgFunction(argument, value, argumentDefinition["validate"])
  File "/usr/local/lib/python3.8/site-packages/WMCore/WMSpec/WMWorkloadTools.py", line 133, in _validateArgFunction
    raise WMSpecFactoryException(msg)
WMCore.WMSpec.WMSpecErrors.WMSpecFactoryException: <@========== WMException Start ==========@>
Exception Class: WMSpecFactoryException
Message: Argument 'Team' with value '', doesn't pass the validate function.
It's definition is:
                              "validate": lambda x: len(x) > 0},

	ClassName : None
	ModuleName : WMCore.WMSpec.WMWorkloadTools
	MethodName : _validateArgFunction
	ClassInstance : None
	FileName : /usr/local/lib/python3.8/site-packages/WMCore/WMSpec/WMWorkloadTools.py
	LineNumber : 133
	ErrorNr : 0

Traceback: 

<@---------- WMException End ----------@>

[21/Oct/2024:12:40:30]  SERVER REST ERROR WMCore.ReqMgr.DataStructs.RequestError.InvalidSpecParameterValue b605a54fcc92e4fbf18b20eabe763378 (Invalid spec parameter value: Argument 'Team' with value '', doesn't pass the validate function.
It's definition is:
                              "validate": lambda x: len(x) > 0},
)
[21/Oct/2024:12:40:30]    Traceback (most recent call last):
[21/Oct/2024:12:40:30]      File "/usr/local/lib/python3.8/site-packages/WMCore/REST/Server.py", line 749, in default
[21/Oct/2024:12:40:30]        return self._call(RESTArgs(list(args), kwargs))
[21/Oct/2024:12:40:30]      File "/usr/local/lib/python3.8/site-packages/WMCore/REST/Server.py", line 828, in _call
[21/Oct/2024:12:40:30]        v(apiobj, request.method, api, param, safe)
[21/Oct/2024:12:40:30]      File "/usr/local/lib/python3.8/site-packages/WMCore/ReqMgr/Service/Request.py", line 224, in validate
[21/Oct/2024:12:40:30]        raise InvalidSpecParameterValue(msg) from None
[21/Oct/2024:12:40:30]    WMCore.ReqMgr.DataStructs.RequestError.InvalidSpecParameterValue: InvalidSpecParameterValue b605a54fcc92e4fbf18b20eabe763378 [HTTP 400, APP 1102, MSG 'Invalid spec parameter value: Argument \'Team\' with value \'\', doesn\'t pass the validate function.\nIt\'s definition is:\n                              "validate": lambda x: len(x) > 0},\n', INFO None, ERR None]
[21/Oct/2024:12:40:30] reqmgr2-bcdccd8c6-hsmlj 188.185.16.223:33620 "PUT /reqmgr2/data/request/tivanov_ACDC_TaskChain_Prod_SiteListsTest_v6_241021_114436_6621 HTTP/1.1" 400 Bad Request [data: 3544 in 895 out 53804 us ] [auth: ok  ***]
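The failure comes from the `Team` validator shown in the traceback, `lambda x: len(x) > 0`, which rejects the empty string the Web UI sends once the default is lost. A minimal stand-alone sketch of that check follows; `WMSpecFactoryException` is redefined locally, and `validate_argument` with its `ARGUMENT_DEFINITIONS` dict is a hypothetical simplification of WMCore's `_validateArgFunction`, not its actual code.

```python
class WMSpecFactoryException(Exception):
    """Local stand-in for the WMCore exception of the same name."""


# Only the Team validator is taken from the traceback; the dict layout is assumed
ARGUMENT_DEFINITIONS = {"Team": {"validate": lambda x: len(x) > 0}}


def validate_argument(name, value, definitions=ARGUMENT_DEFINITIONS):
    """Run the argument's validate function and raise if it returns False."""
    validate = definitions[name].get("validate")
    if validate is not None and not validate(value):
        raise WMSpecFactoryException(
            f"Argument '{name}' with value {value!r}, doesn't pass the validate function.")
    return value


print(validate_argument("Team", "testbed-vocms0290"))  # testbed-vocms0290
try:
    validate_argument("Team", "")
except WMSpecFactoryException as exc:
    print(exc)  # Argument 'Team' with value '', doesn't pass the validate function.
```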

todor-ivanov force-pushed the feature_SeteWhitelist_SupportChangeInReqmgr2_fix-12307 branch from 0111d4d to 62a6d43 on October 21, 2024 13:18
@amaltaro
Contributor

@todor-ivanov given that this PR was started while we had the 3 pull requests merged into master/head, wouldn't it make this development cleaner if we applied the changes in this PR on top of #12148?

Or, having a second look at the commits in this one, it looks like it contains all the commits from #12148. If that is the case, should we close out #12148 to avoid any possible confusion?

In addition, please make sure the initial PR description is up-to-date.

@todor-ivanov
Contributor Author

todor-ivanov commented Oct 23, 2024

Hi @amaltaro, that was my idea as well. I am very much in favor of closing #12148. While testing the fix for the broken assignment-approved transition, provided here with my last two commits, I had already rebased on top of #12148, so the current PR contains everything that was there as well.

I'll merge the two PR descriptions as well.

@amaltaro
Contributor

And before I forget to mention it again: when validating this feature a couple of weeks ago, I noticed that we could update the site lists in the ReqMgr2 Web UI as well.

It looks like the original idea for get_modifiable_properties was to render fields in the Web UI, rather than to serve as an authoritative list of parameters that are allowed to change in each status.

If we want to repurpose that, I think it is fine. But I am not very comfortable with allowing users to change site lists in the Web UI. It's very much error-prone, and given the cost of this operation across the system, we need to keep it as low as we can.

That said, @todor-ivanov, can you please check with the P&R team whether they actually need this feature in the Web UI as well? If they are happy having this feature only through the REST API, programmatically, that would be the best IMO.

@todor-ivanov
Contributor Author

Hi @amaltaro, reverting this at this stage would require significantly refactoring the whole idea behind the change. I am not even sure it would be possible, because this is how the arguments are validated, at least according to my understanding of the code at the current stage. We were simply ignoring everything sent with the user's request and only setting the priority field (all the rest we were setting to zero). I am not sure we are actually changing any behavior, so to me it seems we are all good. And I actually find it quite useful to have this exposed in the Web UI. In the end, making a feature available through one interface and not through the other would be yet one more special case we would have to remember forever.

@amaltaro
Contributor

The way the REST and Web UI are constructed is different, so there is no conditional statement involved in this story.

I also agree that having it in the Web UI is useful. But my concern is about typos and mistakes by using the Web UI, which could lead to errors. Right now, the Web UI is only used for ACDC assignment, AFAIK.
I am curious to know the position of the P&R team on this. Can you please communicate it with Ahmed/Hassan?

@todor-ivanov
Copy link
Contributor Author

todor-ivanov commented Oct 23, 2024

typos and mistakes by using the Web UI,

The SiteLists are drop-down menus.

@todor-ivanov
Contributor Author

And we got the P&R reply: they'd prefer to have it in the Web UI as well.

@amaltaro
Contributor

Thank you Todor. That would have been my answer as well, from the user perspective.

@khurtado
Copy link
Contributor

test this please

@anpicci
Copy link
Contributor

anpicci commented Oct 28, 2024

@amaltaro @todor-ivanov I propose to go ahead with the proposed solution, while clarifying to operators that they will be accountable for any disruption caused by improper use of this functionality.

I have only one question:

  • This solution prevents anyone from picking a site that isn't part of our resource site pool, am I correct? For example, I can select T1_FNAL_US, but I cannot select T2_FNAL_US

Btw, @todor-ivanov there are some failing checks for Jenkins

@todor-ivanov
Contributor Author

Hi @anpicci:

This solution prevents anyone from picking a site that isn't part of our resource site pool, am I correct? For example, I can select T1_FNAL_US, but I cannot select T2_FNAL_US

Yes. The site names are predefined and already populated in the list of possibilities:
[Screenshot: siteLists_assignment-approved_WEBUI]
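The drop-down menus restrict the choices in the Web UI. For updates sent through the REST API, a server-side membership check would play the same role; the sketch below is hypothetical (the helper name `validate_site_list` and how the known-site list is obtained are assumptions, not part of this PR).

```python
def validate_site_list(sites, known_sites):
    """Return the sites as a list, rejecting names outside the known pool."""
    unknown = sorted(set(sites) - set(known_sites))
    if unknown:
        raise ValueError(f"Unknown site names: {unknown}")
    return list(sites)


known = ["T1_US_FNAL", "T2_CH_CERN"]  # hypothetical excerpt of the site pool
print(validate_site_list(["T1_US_FNAL"], known))  # ['T1_US_FNAL']
```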

Btw, @todor-ivanov there are some failing checks for Jenkins

I cannot make sense of these so far. @khurtado, how should I read them? I see you have cancelled the tests; are they reliable at this stage?

Successfully merging this pull request may close these issues.

Support change to SiteWhitelist/SiteBlacklist in ReqMgr2 for active workflows
5 participants