
Add --recursive option to fs ls command #513

Open · wants to merge 4 commits into main
Conversation

@justinTM commented Jul 8, 2022

No description provided.

@pietern (Contributor) left a comment

Thanks for your PR, this looks useful.

Two general comments:

  • It seems to me that recursive implies absolute, or at least relative to the specified path. As is, it would allow for recursively showing the basename only, which doesn't make a lot of sense. Can you make the behavior of non-absolute recursive listing show the relative paths?
  • Please add a couple tests for this behavior.
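
A possible shape for that relative-path behavior (an illustration only, not code from this PR; relative_entries is a hypothetical helper, while list_files, is_dir, and basename mirror names used elsewhere in the diff):

import posixpath

def relative_entries(client, root, prefix=''):
    # Yield paths relative to the directory the user asked to list, so a
    # recursive listing of dbfs:/foo shows bar/baz.txt rather than just baz.txt.
    for f in client.list_files(root):
        rel = posixpath.join(prefix, f.basename)
        if f.is_dir:
            yield from relative_entries(client, posixpath.join(root, f.basename), rel)
        else:
            yield rel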

if f.is_dir:
    recursive_echo(this_dbfs_path.join(f.basename))

recursive_echo(dbfs_path) if recursive else echo_path(dbfs_path)
Contributor:

This modifies existing behavior to no longer list files in the specified path, but just the path itself.

Author:

hi @pietern thank you for your helpful feedback! i believe i fixed this in the latest commit:

def _recursive_list(self, **kwargs):


def echo_path(files):
    table = tabulate([f.to_row(is_long_form=l, is_absolute=absolute) for f in files],
                     tablefmt='plain')
Contributor:

What happens to tabulate if the max width of these files is different?

Author:

@pietern thanks again. i believe tabulate will only be called once now after the latest commit, and the max file width should be a single value.
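
For illustration (not part of the PR): python-tabulate computes column widths per call, so calling it once per directory would align each directory's rows independently, while a single call over all collected files aligns every row with one set of widths:

from tabulate import tabulate

short_rows = [('a.txt', 10)]
long_rows = [('deeply/nested/dir/part-00000.parquet', 123456)]

# Two separate calls: each table gets its own column widths.
print(tabulate(short_rows, tablefmt='plain'))
print(tabulate(long_rows, tablefmt='plain'))

# One call over the combined rows: a single width is used for every row.
print(tabulate(short_rows + long_rows, tablefmt='plain'))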

@pietern requested a review from stormwindy on July 11, 2022
Comment on lines 67 to 69
for f in files:
    if f.is_dir:
        recursive_echo(this_dbfs_path.join(f.basename))
Collaborator:

Do we have a rough estimation of how many more requests this would generate? Do we have customers who are already doing this manually, so that we are just giving them a shortcut?

Author:

hi @stormwindy thanks for your review. i truly have no idea how many more requests it will generate; it's totally dependent on customer file structure. it will be 1 extra request for each directory in the path and all of their subdirectories.
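
(Worked example of that arithmetic: recursively listing a root that contains 3 directories, each with 2 subdirectories, means 1 + 3 + 6 = 10 list requests in total, one for the root plus one per directory at any depth.)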

@stormwindy (Collaborator) Jul 14, 2022

I see. What are the use cases for this, at least in your perspective? Unless there is a crazy deep folder structure this shouldn't cause any problems.

Author:

getting filepaths of all worker logs in a cluster: #512

@stormwindy (Collaborator) left a comment

Overall the idea to add this option is good on our end. I am also happy with the overall implementation direction. I have left one note about the code. After that @pietern can decide if/when to approve the PR. Thanks a lot for the contribution.

databricks_cli/dbfs/cli.py (outdated, resolved)
Commit: fix Dbfs files assignment
@stormwindy requested a review from pietern on July 20, 2022
@pietern (Contributor) left a comment

@justinTM Thanks for addressing my earlier comment. You refactored a bunch of the PR and I think there are correctness issues. Please have a look.

paths = self.client.list_files(*args, **kwargs)
files = [p for p in paths if not p.is_dir]
for p in paths:
    files = files + self._recursive_list(p) if p.is_dir else files
Contributor:

I had to read this a couple of times to understand what you're doing. I think it should be simplified to iterate over paths just once, with an if p.is_dir branch handling a directory and an else branch handling a file.

Contributor:

Separate comment: this also needs to pass along the headers from the **kwargs to be consistent with the list_files API today. Otherwise they are used on the first call but not on later calls.
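
Taken together, the two comments above point at something like the following single-pass shape, forwarding the caller's keyword arguments (headers included) to every nested call. This is only a sketch: the names (_recursive_list, self.client.list_files, is_dir) are taken from the snippet above, and the real signatures in the PR may differ.

def _recursive_list(self, path, **kwargs):
    # Forward **kwargs (e.g. headers=...) so nested calls behave like the first one.
    files = []
    for p in self.client.list_files(path, **kwargs):
        if p.is_dir:
            files.extend(self._recursive_list(p, **kwargs))
        else:
            files.append(p)
    return files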

assert len(files) == 2
assert TEST_FILE_INFO0 == files[0]
assert TEST_FILE_INFO1 == files[1]

Contributor:

There doesn't seem to be a test for a real recursive call.

The variable TEST_FILE_JSON2 is unused. When you make a test that uses it and does a real recursive call, I expect it to fail for the reason I commented about above: the mismatch between FileInfo and DbfsPath.
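
A self-contained sketch of the kind of test being asked for; recursive_list below stands in for the PR's _recursive_list, and the mocked client is an assumption about how a suite could stub DBFS list calls rather than the project's actual fixtures:

from unittest import mock

def recursive_list(client, path, **kwargs):
    # Same single-pass shape sketched earlier: files are collected, directories recursed into.
    files = []
    for p in client.list_files(path, **kwargs):
        if p.is_dir:
            files.extend(recursive_list(client, p, **kwargs))
        else:
            files.append(p)
    return files

def test_recursive_list_descends_into_directories():
    subdir = mock.Mock(is_dir=True)
    file_a = mock.Mock(is_dir=False)
    file_b = mock.Mock(is_dir=False)
    client = mock.Mock()
    # First call lists the root (one directory); second call lists that directory.
    client.list_files.side_effect = [[subdir], [file_a, file_b]]
    assert recursive_list(client, 'dbfs:/root') == [file_a, file_b]

test_recursive_list_descends_into_directories()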
