Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update/hackathon feedback #35

Merged
merged 7 commits into from
Oct 4, 2024
Merged

Update/hackathon feedback #35

merged 7 commits into from
Oct 4, 2024

Conversation

LTDakin
Copy link
Contributor

@LTDakin LTDakin commented Oct 3, 2024

Based on developer feedback from the hackathon, certain aspects of the experience stood to be improved

  1. get_hdu now works on a fits file path instead of doing the fetching for it from the archive. This way developers can fetch the file once and use the get_hdu utility to work with different headers without re-fetching the whole file.
  2. current_percent was renamed to operation progress due to the many confusions about what it was representing the percent for
  3. new function create_output() that accepts a numpy array and then creates the FITS, Jpgs, and uploads it to the bucket. Developers didn't want to have to do all of this manually and once they had finished mutating the numpy array were ready to send output. All the old functions remain for edge cases where you would want more control over the process.
  4. create_fits now takes a comment argument to better track what operations were performed and their inputs. Could help an instructor verify that a student ran the right operations or help a developer debug a broken product.

Copy link
Collaborator

@mgdaily mgdaily left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few places where the logic is starting to get complicated, and without some tests of these new functions directly, I am starting to worry about the maintainability of this

@@ -112,3 +109,18 @@ def scale_points(height_1: int, width_1: int, height_2: int, width_2: int, x_poi
x_points = width_2 - x_points

return x_points, y_points

def create_output(cache_key, np_array=None, fits_file=None, large_jpg=None, small_jpg=None, index=None, comment=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're doing a lot here for the programmer, and there's a ton of combinations of args here that can result in errors that aren't being handled. What happens, for instance, if np_array is none and fits_file is none? Presumably save_fits_and_thumbnails will be called and it will error out. There's nothing indicating to the developer that certain args in certain combinations are required, nor is it clear what exceptions will be raised.

I think we can create a class that does some input validation, throws errors if things are mal-defined and has some nice helper methods to abstract some of the work away from the developer - maybe we can whiteboard this out tomorrow?


log = logging.getLogger()
log.setLevel(logging.INFO)

def get_hdu(basename: str, extension: str = 'SCI', source: str = 'archive') -> list[fits.HDUList]:
def get_hdu(path: str, extension: str = 'SCI') -> list[fits.HDUList]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, I think a lot of these get_hdu, get_fits, etc... could live in a class that abstracts them away into a very simple, testable interface. I'm having trouble following what a few of these functions do and how they should be called, so maybe we can meet about this tomorrow?

@mock.patch('datalab.datalab_session.data_operations.median.save_fits_and_thumbnails')
@mock.patch('datalab.datalab_session.data_operations.median.create_jpgs')
@mock.patch('datalab.datalab_session.data_operations.data_operation.get_fits')
@mock.patch('datalab.datalab_session.file_utils.save_fits_and_thumbnails')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I had noticed, but glossed over before, is the amount of mocking we're having to do which points to the fact that the code isn't easily testable (components are tightly coupled, for instance) and means we should consider re-factoring a bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from this, for Test classes that will mock a whole set of things to test a data operation for instance, you can mock out the common things at the class level so you don't have to repeat them for each test in the class.

Copy link
Contributor

@jnation3406 jnation3406 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but I would like to maintain updating progress while downloading input files

image_data_list.append(sci_hdu.data)

if percent is not None and cur_percent is not None:
self.set_percent_completion(cur_percent + index/total_files * percent)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of the operation progress is used when downloading input files to work on. It seems bad to remove any reporting of that progress. I.e. if you do an operation with 100 images, you will sit there with 0% progress for a long while before it finishes loading all 100 images. Can we retain the reporting of progress up to some maximum based on the number of images its downloading?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.. it's a lot harder to track the progress like you said, downloading is most of the work for many operations. I also was dubious about removing the download percent. Next time I should call a meeting to go over the feedback on whether we should action on it or not. Customer isn't always right haha

fits_paths = []
for file in rgb_input_list:
fits_paths.append(get_fits(file.get('basename')))
self.set_operation_progress(self.get_operation_progress() + 0.2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a generic comment and I know we had already been doing this before, but calling the .get_operation_progress() every time we want to set the value causes an extra lookup into the remote cache, which is wasteful. We should probably just maintain a local variable with the operation progress in the class and use this during the operate method so we don't waste time retrieving it from the cache each time we set it.

@mock.patch('datalab.datalab_session.data_operations.median.save_fits_and_thumbnails')
@mock.patch('datalab.datalab_session.data_operations.median.create_jpgs')
@mock.patch('datalab.datalab_session.data_operations.data_operation.get_fits')
@mock.patch('datalab.datalab_session.file_utils.save_fits_and_thumbnails')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from this, for Test classes that will mock a whole set of things to test a data operation for instance, you can mock out the common things at the class level so you don't have to repeat them for each test in the class.

…on progress but boost to performance with not checking the cache so often
@LTDakin LTDakin merged commit 92b7a11 into main Oct 4, 2024
3 checks passed
@LTDakin LTDakin deleted the update/hackathon-feedback branch October 4, 2024 19:09
@LTDakin LTDakin restored the update/hackathon-feedback branch October 4, 2024 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants