Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLI: add option --clean-workdir to verdi node delete #6756

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
66 changes: 65 additions & 1 deletion src/aiida/cmdline/commands/cmd_node.py
Original file line number Diff line number Diff line change
Expand Up @@ -328,9 +328,14 @@ def extras(nodes, keys, fmt, identifier, raw):
@click.argument('identifier', nargs=-1, metavar='NODES')
@options.DRY_RUN()
@options.FORCE()
@click.option(
'--clean-workdir',
is_flag=True,
Copy link
Member

@unkcpz unkcpz Feb 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put default to True is backward incompatible and potentially danger.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's exactly what this command does :-)
It define it as a flag with is_flag=True so the default value is False

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. Then could you write default=False to be more explicit?

help='Also clean the remote work directory. (only applies to workflows and CalcJobNodes)',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only applies to workflows and CalcJobNodes

It is better to revise this message because this applies to any node, as a node can be an input of the workchain or calcjob.

e.g., but could be improved.

Suggested change
help='Also clean the remote work directory. (only applies to workflows and CalcJobNodes)',
help='Also clean the remote work directories associated with the nodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point I wanted to make clear, is that doesn't apply if one pass a RemoteData node.
Just to respect traversal rule, and to make it easier, RemoteData directories can only be cleaned if the parent calcjob is passed to the function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point I wanted to make clear, is that doesn't apply if one pass a RemoteData node.

I think it will also work if you pass a RemoteData node, because the associated process node will be deleted thus the RemoteData node itself will be cleaned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I can improve the help message.

)
@options.graph_traversal_rules(GraphTraversalRules.DELETE.value)
@with_dbenv()
def node_delete(identifier, dry_run, force, **traversal_rules):
def node_delete(identifier, dry_run, force, clean_workdir, **traversal_rules):
"""Delete nodes from the provenance graph.

This will not only delete the nodes explicitly provided via the command line, but will also include
Expand All @@ -349,6 +354,65 @@ def node_delete(identifier, dry_run, force, **traversal_rules):
except ValueError:
pks.append(NodeEntityLoader.load_entity(obj).pk)

if clean_workdir:
from aiida.manage import get_manager
from aiida.orm import AuthInfo, CalcJobNode, QueryBuilder, User, load_computer
from aiida.orm.utils.remote import get_calcjob_remote_paths
from aiida.tools.graph.graph_traversers import get_nodes_delete

backend = get_manager().get_backend()
# For here we ignore missing nodes will be raised via func:``delete_nodes`` in the next block
pks_set_to_delete = get_nodes_delete(
pks, get_links=False, missing_callback=lambda missing_pks: None, backend=backend, **traversal_rules
)['nodes']

qb = QueryBuilder()
qb.append(CalcJobNode, filters={'id': {'in': pks_set_to_delete}}, project='id')
calcjobs_pks = [result[0] for result in qb.all()]

if not calcjobs_pks:
echo.echo_report('--clean-workdir ignored. No CalcJobNode associated with the given node, found.')
Comment on lines +385 to +386
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can do early return here to avoid nested if..else.

else:
path_mapping = get_calcjob_remote_paths(
calcjobs_pks,
only_not_cleaned=True,
)

if not path_mapping:
echo.echo_report('--clean-workdir ignored. CalcJobNode work directories are already cleaned.')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early return here as well?

else:
descendant_pks = [remote_folder.pk for paths in path_mapping.values() for remote_folder in paths]

if not force and not dry_run:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early return?

echo.echo_warning(
f'YOU ARE ABOUT TO CLEAN {len(descendant_pks)} REMOTE DIRECTORIES! ' 'THIS CANNOT BE UNDONE!'
)
echo.echo_info(
'Remote directories of nodes with the following pks would be cleaned: '
+ ' '.join(map(str, descendant_pks))
)
click.confirm('Shall I continue?', abort=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The command asks for confirmation twice: 1) cleaning remote folders and 2) deleting nodes. I would prefer a single confirmation prompt by merging the messages into one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason I did this, was to provide additional flexibility so to only clean Remote directories without actually deleting the nodes.
For instance in #4693 that might be useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it makes sense to me. 👍


if dry_run:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

early return as well?

echo.echo_report(
'Remote folders of these node are marked for deletion: ' + ' '.join(map(str, descendant_pks))
)
else:
user = User.collection.get_default()
counter = 0
for computer_uuid, paths in path_mapping.items():
computer = load_computer(uuid=computer_uuid)
transport = AuthInfo.collection.get(
dbcomputer_id=computer.pk, aiidauser_id=user.pk
).get_transport()

with transport:
for remote_folder in paths:
remote_folder._clean(transport=transport)
Copy link
Member

@superstar54 superstar54 Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if the remote folder is already cleaned or not, either here or inside the _clean method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That already been done via flag only_not_cleaned:

path_mapping = get_calcjob_remote_paths(calcjobs_pks, only_not_cleaned=True,)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good!

counter += 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar code is used in calcjob_cleanworkdir. Consider refactoring into a reusable function to avoid duplication.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wanted modularize, but then I thought it's also used only in these two functions (here and calcjob_cleanworkdir), Also it's specific codeme, I'm not sure if it worth to modularize it. 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good to modularize as much as we can 😆 .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean it reduces the readability. If it's only for two use case.
But ok, I'll see what I can do. ;)


echo.echo_report(f'{counter} remote folders cleaned on {computer.label}')

def _dry_run_callback(pks):
if not pks or force:
return False
Expand Down
Loading