Update/hackathon feedback #35
Conversation
…rk of creating fits, jpgs, and storing the output for you
There are a few places where the logic is starting to get complicated, and without tests that exercise these new functions directly, I'm starting to worry about the maintainability of this.
```diff
@@ -112,3 +109,18 @@ def scale_points(height_1: int, width_1: int, height_2: int, width_2: int, x_poi
     x_points = width_2 - x_points

     return x_points, y_points

+def create_output(cache_key, np_array=None, fits_file=None, large_jpg=None, small_jpg=None, index=None, comment=None):
```
I think we're doing a lot here for the programmer, and there's a ton of combinations of args that can result in errors that aren't being handled. What happens, for instance, if np_array is None and fits_file is None? Presumably save_fits_and_thumbnails will be called and it will error out. Nothing indicates to the developer that certain args are required in certain combinations, nor is it clear what exceptions will be raised.
I think we can create a class that does some input validation, throws errors if things are mal-defined, and has some nice helper methods to abstract some of the work away from the developer. Maybe we can whiteboard this out tomorrow?
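Not part of the PR, but a minimal sketch of what such a class could look like; the name OutputHandler and the delegation step are hypothetical, with the argument names taken from the create_output signature above:

```python
import numpy as np

class OutputHandler:
    """Hypothetical sketch: validate inputs up front instead of letting a
    bad combination of args fail deep inside save_fits_and_thumbnails."""

    def __init__(self, cache_key, np_array=None, fits_file=None, comment=None):
        if np_array is None and fits_file is None:
            raise ValueError('either np_array or fits_file is required')
        if np_array is not None and not isinstance(np_array, np.ndarray):
            raise TypeError('np_array must be a numpy.ndarray')
        self.cache_key = cache_key
        self.np_array = np_array
        self.fits_file = fits_file
        self.comment = comment

    def create_output(self):
        # Inputs are known-good by this point; delegate to the existing
        # helpers (FITS creation, jpg generation, bucket upload).
        ...
```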
```diff
 log = logging.getLogger()
 log.setLevel(logging.INFO)

-def get_hdu(basename: str, extension: str = 'SCI', source: str = 'archive') -> list[fits.HDUList]:
+def get_hdu(path: str, extension: str = 'SCI') -> list[fits.HDUList]:
```
Similarly, I think a lot of these (get_hdu, get_fits, etc.) could live in a class that abstracts them away into a very simple, testable interface. I'm having trouble following what a few of these functions do and how they should be called, so maybe we can meet about this tomorrow?
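As a rough illustration (the class and method names are made up), the per-file fetching and HDU access could hide behind one small object:

```python
from astropy.io import fits

class InputFitsFile:
    """Hypothetical sketch: wrap one input file so fetching happens once
    and HDU access has a single, easily tested entry point."""

    def __init__(self, path: str):
        self.path = path

    def hdu_data(self, extension: str = 'SCI'):
        # Copy the data out so the file handle can close safely.
        with fits.open(self.path) as hdu_list:
            return hdu_list[extension].data.copy()
```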
```python
@mock.patch('datalab.datalab_session.data_operations.median.save_fits_and_thumbnails')
@mock.patch('datalab.datalab_session.data_operations.median.create_jpgs')
@mock.patch('datalab.datalab_session.data_operations.data_operation.get_fits')
@mock.patch('datalab.datalab_session.file_utils.save_fits_and_thumbnails')
```
Something I had noticed but glossed over before is the amount of mocking we're having to do. It points to the fact that the code isn't easily testable (components are tightly coupled, for instance), which means we should consider refactoring a bit.
Aside from this, for test classes that mock a whole set of things to test a data operation, you can mock out the common things at the class level so you don't have to repeat them for each test in the class.
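For example, unittest.mock applies a class decorator to every test* method, so the patches above could be written once (the patch targets are copied from this diff; the test class name and bodies are made up):

```python
from unittest import TestCase, mock

@mock.patch('datalab.datalab_session.data_operations.median.save_fits_and_thumbnails')
@mock.patch('datalab.datalab_session.data_operations.median.create_jpgs')
class TestMedianOperation(TestCase):
    # Class-level patches cover every test* method; the mocks arrive as
    # extra arguments in bottom-up decorator order.
    def test_operate(self, mock_create_jpgs, mock_save_fits_and_thumbnails):
        mock_create_jpgs.return_value = ('large.jpg', 'small.jpg')
        mock_save_fits_and_thumbnails.return_value = 'cache_key/output.fits'
        ...
```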
Looks good, but I would like to keep updating progress while input files are downloading.
```diff
 image_data_list.append(sci_hdu.data)

-if percent is not None and cur_percent is not None:
-    self.set_percent_completion(cur_percent + index/total_files * percent)
```
A lot of the operation progress is used when downloading the input files to work on. It seems bad to remove any reporting of that progress; i.e. if you do an operation with 100 images, you will sit there with 0% progress for a long while before it finishes loading all 100 images. Can we retain the reporting of progress up to some maximum based on the number of images it's downloading?
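One way that could look inside operate (a sketch; the cap value and loop shape are assumptions, reusing get_fits and set_operation_progress from the diffs in this thread):

```python
DOWNLOAD_PROGRESS_CAP = 0.5  # assumption: downloads count for at most half the bar

fits_paths = []
for index, file in enumerate(input_files, start=1):
    fits_paths.append(get_fits(file.get('basename')))
    # Download progress scales into [0, cap] no matter how many files there are
    self.set_operation_progress(index / len(input_files) * DOWNLOAD_PROGRESS_CAP)
```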
Yeah, it's a lot harder to track the progress. Like you said, downloading is most of the work for many operations, and I was also dubious about removing the download percent. Next time I should call a meeting to go over the feedback and decide whether we should act on it. The customer isn't always right, haha.
```python
fits_paths = []
for file in rgb_input_list:
    fits_paths.append(get_fits(file.get('basename')))
    self.set_operation_progress(self.get_operation_progress() + 0.2)
```
This is a generic comment, and I know we had already been doing this before, but calling .get_operation_progress() every time we want to set the value causes an extra lookup into the remote cache, which is wasteful. We should probably maintain a local variable with the operation progress in the class and use it during the operate method so we don't waste time retrieving the value from the cache each time we set it.
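A sketch of that pattern, rewriting the loop above (rgb_input_list and the 0.2 step come from the snippet; the structure is an assumption); only set_operation_progress touches the cache now:

```python
def operate(self):
    progress = 0.0  # local mirror of progress: one cache write per update, zero cache reads
    fits_paths = []
    for file in rgb_input_list:
        fits_paths.append(get_fits(file.get('basename')))
        progress += 0.2
        self.set_operation_progress(progress)
```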
…on progress but boost to performance with not checking the cache so often
Based on developer feedback from the hackathon, several aspects of the experience stood to be improved:
- get_hdu now works on a FITS file path instead of doing the fetching from the archive for you. This way developers can fetch the file once and use the get_hdu utility to work with different headers without re-fetching the whole file.
- New create_output() that accepts a numpy array, then creates the FITS, creates the JPGs, and uploads the output to the bucket. Developers didn't want to do all of this manually once they had finished mutating the numpy array and were ready to send output. All the old functions remain for edge cases where you want more control over the process (see the usage sketch below).
- create_fits now takes a comment argument to better track what operations were performed and their inputs. This could help an instructor verify that a student ran the right operations, or help a developer debug a broken product.
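A hypothetical usage of the new helper (the array name and comment text are made up):

```python
# One call builds the FITS, generates both JPG sizes, and uploads to the bucket.
output = create_output(cache_key, np_array=stacked_median, comment='median of 3 input frames')
```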