JP-3121: initial Code to Implement C-Extensions for Ramp Fit (#156)

* Adding initial C code to ramp fitting. Adding the setup.py necessary to install C extensions for ramp fitting. Adding first attempt at C code. Adding a setup.cfg file to be used for setup. Updating calling location and C code. Updated include ordering, as well as returning NoneType. Can compile calling setup.py directly and running a basic script. Still cannot run 'pip install -e .'. This generates a failure saying it's unable to find the numpy module, raising ModuleNotFoundError. Updating setup. Fixing install files to properly work with C extension framework in ramp fitting. Changing names. Updating C code to parse input parameters. Adding ramp handling for each pixel. Updating ramp data structure and pixel ramp data structure. Getting a pixel ramp and printing it to the screen. Updating code style and adding a simple linked list to handle segment computations. Completed median rate computation without flags and without special cases. Cleaning up the code and comments. The non-special case median rate computation now works. Commenting out calls to C extension function. Putting functions in alphabetical order to make them easier to navigate. Alphabetizing the functions to make them easier to search. Finding a bug in the median rate computation. Updating setup and NumPy macro for C based on code review. Fixed the local copy of the DQ for an integration for median rate computation. Completed the median rate calculations that accounts for flags in the ramp. Beginning to update type checking for ndarrays passed to C. Figuring out endianness solution. Still need to figure out how to detect endianness. Checking for and computing byteswapping makes things slower than python. Testing and comparing python to C code. Working on the weighting of a segment for ramp fitting. Continuing with weighted fitting. Finished the segment computations. Removed 'real_t' typedef to make code cleaner.
Finished pixel ramp computations but the read noise computation is different from python code. JIC commit. JIC commit. Debugging the segment computation of the read noise variance. Updated the read noise computations for normal segments. Updated the documentation for ramp fitting to make the documentation more clear on the details of the computations. Removed extra blank lines from CI test file. Creating output base arrays and fix use of pixeldq in pixel_ramp. Packaged up output results from the C computations to be passed back to the python code. Adding a square root to the final computation of the combined error for the rate and rateints products. Successful CI tests for basic ramps run through the C extension. JIC commit. Fixing segment pruner. Adding test cases for C extension. Adding cases. Updated the final DQ array. Started implementing the optional results product. Moving debugging print statements. Adding optional results memory management during ramp fitting. Outputting the optional results product. Some of the values still need to be computed. Updating computing the optional results product. Updating dimensions of the pedestal array. Adding pedestal calculation. Updating where the group time divide happens. Adding special case computation and testing. Tracing bug in special case. Working out slope computation bug. Forcing the use of C extension to ensure CI tests use the C extensions. Updating tests and DQ flag computations. Adding special case for one group segment. Working on special cases. Updating one group ramp testing. The variance computation is correct now, but need to investigate the slope computation, which may also be wrong in python. Working on a new test. Rearranging code to make it easier to read. Refactoring module API. Splitting ramp data getter into multiple, smaller functions. Updating tests, as well as refactoring the module interface. Updating the flags for suppressed ramps.
Changing the C interface to make it simpler and easier to read. Cleaning up old code and adding one group suppression flagging. Modifying setup.py to get it to properly install C and cython stuff. Modifying setup to get it to work, since I am not sure how to resolve the conflicts resulting from the use of C and cython. Updating invalid integrations test. ZEROFRAME test works. Suppressed one group tests work. Updating return code from C extension. Updating test_2_group_cases testing. Bugs were found in the python code, so those should be corrected first before finishing the C code. Updating code and tests for python to account for invalid integrations and invalid groups for median rate calculations. Updating error in median rate computation. Investigating differences on branch with main branch. Properly updating ols_fit.py from main branch updates. Finishing up C translation. Will need to further investigate two group ramp special case for rateints (see test_invalid_integrations). Updating the setup.py file to properly install the cython and c extension modules. Updating tests and setup.py format. Removing unneeded comments. Removing debugging imports. Fixing ZEROFRAME logic bug, as well as removing debugging code. All STCAL CI tests are passing with the C code. Need to check the JWST tests. Updating segment slope calculation for NaN values. Updating computation of read noise variance for bad gain value. Updating the debugging comments, as well as finishing the first group orphan test in JWST CI testing. Updating how the pedestal gets computed. Updating median rate computation for case 1 in JWST. Updating slope fitter to pass case 4 in the JWST CI test. Updating debugging functions. JIC. Changing variable name for failing test. Cleaning up the code. Skipping case 5 which fails for C due to architectural differences. Updating the handling of bad gain values. Base ramp fit and case testing pass. Updating the computation of SNR to handle non-positive values.
Updating the ramp fit OLS code for use of C code. Changing declaration statements causing build failures. Added debugging macros. Removing importation of debugging functions used for debugging. Removed endian handlers for C, since it should be handled in python. Changing switch to use python code. Endianness now handled in python, instead of the C-extension. Adding C code usage flag and comment to figure out how to change the dtype of a byte swapped ndarray. Adding a toggle to switch between double and floats for internal C computation. Using doubles causes some problems with testing, which need to be corrected or debugged for use of doubles vs floats. Removing float switch for uint32_t type. Switching to doubles in the C code from floats fixed a test case. Updating how pixel DQ's from input ramp models propagate to rateints DQ's. Removing debugging import from CI test module. Removing unneeded comments. Updating debugging methods with more intermediate variables and functions. Commenting out debugging code. Commenting out debugging code. Updating median rate check for segment variance computation for a 2 group ramp. Updated the median rate computation to account for NaN differences in the first two group difference. Adding debugging method to RampData class. Changing variable names to make them more descriptive. Updating debugging functions for the RampData class and updating the C median rate computation, fixing many differences with the python code for MIRI image regression test. Expanding computation for ease of reading. Updating the computation of SNR and power of a segment. Updating debugging function. Updating slope fitter to properly handle the first difference NaN case. Pruning segments of unnecessary one group segments. Style changes to make the code easier to read and debug. Added checking of large total variance integration. Final draft commit of C code for ramp fitting. Endianness correction for CHARGELOSS processing. Updating the changelog.
Added the 'sys' import that got deleted during a rebase. Also, updated the C extension switch to make sure the C extension runs for regression testing on Jenkins. Changing initialization due to regression testing build failures. Adding the average dark current to ramp fitting C extension, fixing all but a handful of differences in regression tests, which may just be expected differences. Adding a comment for possible update of Poisson variance computation using the average dark current. Updating comments and changing dark current usage due to STCAL PR #254. Cleaning up code.
* Updating the dark current processing and skipping a failing test.
* Cleaning up code.
* Changing long integration test to have the correct dtype for the ramp data arrays.
* Updating comments.
* Updating the STCAL code to use the algorithm class variable in the JWST RampFitStep class to call the C code vs the python code. To call the C code, the algorithm class variable needs to be set to 'OLS_C'.
* Removing unnecessary symbol.
* Simplifying returns and removing labels.
* Checking return values and properly deallocating memory for the optional results product.
* Checking return values and managing memory when errors occur for packaging the results.
* Removing endianness references to ramp data getters.
* Removing usage of a PyDataMem_FREE in favor of Py_XDECREF.
* Initializing variable at declaration.
* Updating C code for better memory management. Updated change log. Removed print statements. Updated comments in testing. There is a problem in the '_cases' negative average dark test.
* Removing unused functions.
* Removing unused functions and updating comments.
* Cleaning up the code, removing unused functions, and expanding comments.
* Cleaning up code. Updating comments.
* Updating comments.
* Updating comments.
* Adding extension build tests according to code review feedback.
* Changing the use of C as default to use of the python code as default.
* Updating the use of the C code to be able to be selected programmatically, but the python code is chosen by default.
* Adding logging for processing time.
* Forcing python code usage.
* Default to python code, but the C extension can be selected by using the 'OLS_C' algorithm.
* Updating comments.
* Expand comments related to endian issues for the C extension.
* Expanding comments for clarity.
* Using incorrect variable name causing errors.

---------

Co-authored-by: Howard Bushouse <bushouse@stsci.edu>
spacetelescope · May 20, 2024 · 3cc52f1
1 parent ccd93bb

Showing 11 changed files with 4,105 additions and 143 deletions.
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
@@ -23,6 +23,6 @@ jobs:
         # Until we have arm64 runners, we can't automatically test arm64 wheels
         - cp3*-macosx_arm64
       sdist: true
-      test_command: python -c "from stcal.ramp_fitting.ols_cas22 import _ramp, _jump, _fit"
+      test_command: python -c "from stcal.ramp_fitting.ols_cas22 import _ramp, _jump, _fit; from stcal.ramp_fitting import slope_fitter"
     secrets:
       pypi_token: ${{ secrets.PYPI_PASSWORD_STSCI_MAINTAINER }}
diff --git a/CHANGES.rst b/CHANGES.rst
@@ -24,6 +24,8 @@ ramp_fitting
   for catching all-zero variance cases when average dark current was not
   specified. [#255]
 
+- Refactor ramp fitting using a C extension to improve performance. [#156]
+
 1.7.0 (2024-03-25)
 ==================
 

diff --git a/docs/stcal/ramp_fitting/description.rst b/docs/stcal/ramp_fitting/description.rst
@@ -149,21 +149,27 @@ the least-squares fit is calculated with the first and last samples. In most pra
 cases, the data will fall somewhere in between, where the weighting is scaled between the
 two extremes.
 
-The signal-to-noise ratio :math:`S` used for weighting selection is calculated from the
-last sample as:
+
+For segment :math:`k` of length :math:`n`, which includes groups :math:`[g_{k}, ...,
+g_{k+n-1}]`, the signal-to-noise ratio :math:`S` used for weighting selection is
+calculated from the last sample as:
 
 .. math::
     S = \frac{data \times gain} { \sqrt{(read\_noise)^2 + (data \times gain) } } \,,
 
+where :math:`data = g_{k+n-1} - g_{k}`.
+
 The weighting for a sample :math:`i` is given as:
 
 .. math::
-    w_i = (i - i_{midpoint})^P \,,
+    w_i = \frac{ [(i - i_{midpoint}) / i_{midpoint}]^P }{ (read\_noise)^2 } \,,
+
+where  :math:`i_{midpoint} = \frac{n-1}{2}` and :math:`i = 0, 1, ..., n-1`.
 
-where :math:`i_{midpoint}` is the the sample number of the midpoint of the sequence, and
-:math:`P` is the exponent applied to weights, determined by the value of :math:`S`. Fixsen
-et al. 2000 found that defining a small number of P values to apply to values of S was
-sufficient; they are given as:
+
+Here :math:`i_{midpoint}` is the sample number of the midpoint of the sequence, and :math:`P` is the exponent
+applied to weights, determined by the value of :math:`S`. Fixsen et al. 2000 found that
+defining a small number of P values to apply to values of S was sufficient; they are given as:
 
 +-------------------+------------------------+----------+
 | Minimum S         | Maximum S              | P        |
@@ -185,12 +191,14 @@ Segment-specific Computations
 +++++++++++++++++++++++++++++
 The variance of the slope of a segment due to read noise is:
 
-.. math::
-   var^R_{s} = \frac{12 \ R^2 }{ (ngroups_{s}^3 - ngroups_{s})(tgroup^2) } \,,
+.. math::  
+   var^R_{s} = \frac{12 \ R^2 }{ (ngroups_{s}^3 - ngroups_{s})(tgroup^2)(gain^2) } \,,
 
-where :math:`R` is the noise in the difference between 2 frames,
-:math:`ngroups_{s}` is the number of groups in the segment, and :math:`tgroup` is the group
-time in seconds (from the keyword TGROUP).
+where :math:`R` is the noise in the difference between 2 frames, 
+:math:`ngroups_{s}` is the number of groups in the segment, and :math:`tgroup` is the group 
+time in seconds (from the keyword TGROUP).  The divide by gain converts to
+:math:`DN`.  For the special case where a segment has length 1, the
+:math:`ngroups_{s}` is set to :math:`2`.
 
 The variance of the slope in a segment due to Poisson noise is:
 
@@ -258,10 +266,10 @@ The combined variance of the slope is the sum of the variances:
 The square-root of the combined variance is stored in the ERR array of the output product.
 
 The overall slope depends on the slope and the combined variance of the slope of each integration's
-segments, and hence is a sum over integrations and segments:
+segments, so it is a sum over integration values computed from the segments:
 
-.. math::
-    slope_{o} = \frac{ \sum_{i,s}{ \frac{slope_{i,s}} {var^C_{i,s}}}} { \sum_{i,s}{ \frac{1} {var^C_{i,s}}}}
+.. math::    
+    slope_{o} = \frac{ \sum_{i}{ \frac{slope_{i}} {var^C_{i}}}} { \sum_{i}{ \frac{1} {var^C_{i}}}}
 
 
 .. _ramp_error_propagation:
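
The updated formulas in `description.rst` can be sketched in a few lines of NumPy. This is only an illustration of the documented equations, not the code in `slope_fitter.c` or `ols_fit.py`: the function names are hypothetical, the exponent `power` (the documented :math:`P`) is passed in rather than looked up from the S table, and two choices are assumptions not stated in the diff, namely returning S = 0 for a non-positive signal and taking the absolute value of the centered index so the weights stay non-negative.

```python
import numpy as np


def segment_snr(first, last, gain, read_noise):
    """Signal-to-noise ratio S, with data = last group - first group (DN)."""
    signal = (last - first) * gain
    if signal <= 0.0:
        # Assumption: a non-positive signal is treated as S = 0.
        return 0.0
    return signal / np.sqrt(read_noise**2 + signal)


def segment_weights(n, power, read_noise):
    """Weights w_i for i = 0..n-1 of a segment with n >= 2 groups."""
    i = np.arange(n)
    midpoint = (n - 1) / 2.0
    # Assumption: abs() keeps the weights non-negative and real-valued.
    return np.abs((i - midpoint) / midpoint) ** power / read_noise**2


def read_noise_variance(noise, ngroups, tgroup, gain):
    """Read-noise variance of a segment slope, including the gain^2 term."""
    n = max(ngroups, 2)  # a length-1 segment is treated as ngroups = 2
    return 12.0 * noise**2 / ((n**3 - n) * tgroup**2 * gain**2)


def overall_slope(slopes, variances):
    """Inverse-variance weighted mean of the per-integration slopes."""
    slopes = np.asarray(slopes, dtype=float)
    inv_var = 1.0 / np.asarray(variances, dtype=float)
    return np.sum(slopes * inv_var) / np.sum(inv_var)
```

For example, `read_noise_variance(2.0, 1, 1.0, 1.0)` and `read_noise_variance(2.0, 2, 1.0, 1.0)` give the same value, which is the length-1 special case described in the revised text.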

diff --git a/setup.py b/setup.py
@@ -6,6 +6,17 @@
 Options.docstrings = True
 Options.annotate = False
 
+# package_data values are glob patterns relative to each specific subpackage.
+package_data = {
+    "stcal.ramp_fitting.src": ["*.c"],
+}
+
+# Setup C module include directories
+include_dirs = [np.get_include()]
+
+# Setup C module macros
+define_macros = [("NUMPY", "1")]
+
 # importing these extension modules is tested in `.github/workflows/build.yml`; 
 # when adding new modules here, make sure to add them to the `test_command` entry there
 extensions = [
@@ -27,6 +38,12 @@
         include_dirs=[np.get_include()],
         language="c++",
     ),
+    Extension(
+        "stcal.ramp_fitting.slope_fitter",
+        ["src/stcal/ramp_fitting/src/slope_fitter.c"],
+        include_dirs=include_dirs,
+        define_macros=define_macros,
+    ),
 ]
 
 setup(ext_modules=cythonize(extensions))
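
As a rough illustration of what the new `Extension` entry asks the compiler to do, the sketch below rebuilds the same object outside of `setup.py` and prints the preprocessor flags it implies. `preprocessor_flags` is a hypothetical helper written for this note, and the `-I`/`-D` spelling assumes a GCC- or Clang-style compiler; the real build goes through setuptools and may add further flags.

```python
import numpy as np
from setuptools import Extension

# Same arguments as the entry added to setup.py in this commit.
slope_fitter = Extension(
    "stcal.ramp_fitting.slope_fitter",
    ["src/stcal/ramp_fitting/src/slope_fitter.c"],
    include_dirs=[np.get_include()],
    define_macros=[("NUMPY", "1")],
)


def preprocessor_flags(ext):
    """Hypothetical helper: GCC/Clang-style flags implied by an Extension."""
    flags = [f"-I{path}" for path in ext.include_dirs]
    for name, value in ext.define_macros:
        # A macro with value None is defined without a value.
        flags.append(f"-D{name}" if value is None else f"-D{name}={value}")
    return flags


print(preprocessor_flags(slope_fitter))
```

The NumPy include directory is what lets `slope_fitter.c` find the NumPy C headers, and the `NUMPY` macro is visible to the C source at compile time.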