I know there's an existing issue #48 that covers reduce with drift mode, but I thought it would be cleaner to open a new one here to keep track of my efforts.
I'm looking at ways to speed up reduce, with two common use cases:
- Full-frame, unbinned data is very slow to process
- Ultra-fast drift mode reductions can't keep up with the data
I've started looking at some profiling for both cases. First off, simply timing how long each frame takes to get reduced. This is only running on my laptop, so the absolute times probably aren't that useful, but it's a benchmark to get started with.
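The timing itself is nothing clever: just the wall-clock gap per processed frame, averaged over a run. A minimal, self-contained sketch of the idea, where `process_frame` is a stand-in rather than an actual pipeline call:

```python
import time
import numpy as np

def process_frame(frame):
    # stand-in for the real per-frame work done inside reduce
    time.sleep(0.01)

dts = []
last = time.perf_counter()
for frame in range(50):          # stand-in for frames coming off a run
    process_frame(frame)
    now = time.perf_counter()
    dts.append(now - last)
    last = now

print(f"dt = {np.mean(dts):.3f} +/- {np.std(dts):.3f} s per frame")
```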
FF, 1x1 images (using a random standard star observation):
- Without plotting: dt = 0.64 $\pm$ 0.18 s
- With plotting: dt = 0.66 $\pm$ 0.13 s
- Actual time between frames: 3.06 s

Drift mode (using one of Vik's FRB observations):
- Without plotting: dt = 0.043 $\pm$ 0.026 s
- With plotting: dt = 0.088 $\pm$ 0.041 s
- Actual time between frames: 0.013 s
For the FF images we're obviously well within the frame time, but over half a second for each frame feels painfully slow. Plotting makes very little difference.
On the other hand, it's well known that reduce can't keep up with the instrument in drift mode, and here it's very clear: without plotting we're ~3.1x slower than real time, while with plotting we're ~6.5x slower. The code is already pretty well optimised, so reaching those speedups might be a challenge.
I've gone through and done some line profiling of the code called in reduce. These are all with plotting disabled for now, since for FF runs there's no difference and drift mode runs already need improving before we start considering plotting.
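For anyone wanting to reproduce this kind of measurement, `line_profiler` is one option. A minimal, self-contained illustration of its programmatic API, with dummy functions standing in for the pipeline ones (e.g. `reduction.ccdproc`, `reduction.moveApers`, `fitting.fitMoffat`):

```python
from line_profiler import LineProfiler

def inner(n):
    # dummy standing in for a hot function, e.g. fitting.fitMoffat
    return sum(i * i for i in range(n))

def outer(n):
    # dummy standing in for a caller, e.g. reduction.ccdproc
    total = 0
    for _ in range(10):
        total += inner(n)
    return total

lp = LineProfiler()
lp.add_function(inner)      # register the functions to profile line by line
profiled = lp(outer)        # wrap the top-level driver
profiled(10000)
lp.print_stats()            # per-line hit counts and timings
```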
- ~90% of the time in `scripts.reduce()` is spent calling the `reduction.ProcessCCDs` processor class, which is no surprise since that does the actual reduction.
- 99.7% of the time in `reduction.ProcessCCDs.__call__()` goes straight to the `reduction.ccdproc` function.
- ~90% of the time in `reduction.ccdproc()` goes to the `moveApers` function, with only 10% going to the actual extraction function (e.g. `reduction.extractFlux`).
- ~90% of the time in `reduction.moveApers()` goes to the profile fit (in this case `fitting.fitMoffat`) via `fitting.combFit` (which just selects the Moffat or Gaussian fit function).
- Finally, in `fitting.fitMoffat()` 90% of the time is spent in the least-squares fit, which uses `scipy.optimize.least_squares()`.
So there are a few other losses along the way, but the majority (~66%) of the reduction time goes into the Moffat fit used to adjust the aperture positions, which I think isn't unexpected. That's the main place to look for extra speed. The fitting code is already very well optimised by Tom: the actual model functions `fitting.moffat()`, `fitting.dmoffat()` etc. use Numba JIT compilation, and (at least in the `ndiv=0` case) there isn't much obvious left to improve. But this is only an initial look.
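For context on what that hot path actually does, here's a stripped-down, self-contained sketch of the kind of fit involved: a 2D Moffat profile passed to `scipy.optimize.least_squares`. This is a generic illustration only, not the pipeline's `fitting.fitMoffat`:

```python
import numpy as np
from scipy.optimize import least_squares

def moffat2d(params, x, y):
    # generic 2D Moffat profile on a constant sky:
    # sky + height * (1 + r^2/alpha^2)^(-beta)
    sky, height, xcen, ycen, alpha, beta = params
    rsq = (x - xcen) ** 2 + (y - ycen) ** 2
    return sky + height * (1.0 + rsq / alpha ** 2) ** (-beta)

def residuals(params, x, y, data, sigma):
    # flattened, weighted residuals for least_squares
    return ((moffat2d(params, x, y) - data) / sigma).ravel()

# synthetic 31x31 stamp with noise, standing in for a window around an aperture
ny, nx = 31, 31
y, x = np.mgrid[0:ny, 0:nx]
true = np.array([10.0, 500.0, 15.2, 14.8, 3.0, 2.5])
rng = np.random.default_rng(42)
data = moffat2d(true, x, y) + rng.normal(0.0, 5.0, size=(ny, nx))
sigma = np.full_like(data, 5.0)

p0 = np.array([0.0, 400.0, 14.0, 16.0, 2.0, 2.0])  # rough initial guess
res = least_squares(residuals, p0, args=(x, y, data, sigma), method="lm")
print("fitted params:", res.x)
print("function evaluations:", res.nfev)
```

Most of the solver's time goes into repeated model/Jacobian evaluations, which is presumably why the Numba-compiled `fitting.moffat()` / `fitting.dmoffat()` matter so much here.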
Incidentally, while going through the fitting module I did find one awkward bug introduced in 03783f6 (there are actually two errors I traced back to that commit; the minor one is that the Gaussian fit has `nfev += nfev` rather than `nfev += res.nfev`). The more serious error (I think) is in `fitMoffat`, where the doubled rejection threshold that is only supposed to apply to the first pass is actually used in all cases:
Lines 403 to 408 in 817f2fa:

```python
# reject any above the defined threshold
if first_fit:
    # first fit carried out with higher threshold for safety
    sigma[ok & (np.abs(resid) > 2*sfac*thresh)] *= -1
else:
    sigma[ok & (np.abs(resid) > 2*sfac*thresh)] *= -1
```
`fitGaussian` has the correct code (lines 1205 to 1209 in 817f2fa):

```python
if first_fit:
    # first fit carried out with higher threshold for safety
    sigma[ok & (np.abs(resid) > 2*sfac*thresh)] *= -1
else:
    sigma[ok & (np.abs(resid) > sfac*thresh)] *= -1
```
This means that the default RMS rejection threshold of `fit_thresh=8` in any `.red` file has effectively been acting as 16 when using the Moffat profile.
I'm bringing this up here because when I tried fixing the bug and removing the wayward `2*`, the code slowed down massively, going from 0.6 s per loop for FF data to 1.2 s! That's a bit concerning, but I need to dig into it more to understand exactly what's going on and why that change makes such a difference.
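For reference, the fix I tried simply mirrors the `fitGaussian` branch, i.e. dropping the `2*` from the `else` case:

```python
# reject any above the defined threshold
if first_fit:
    # first fit carried out with higher threshold for safety
    sigma[ok & (np.abs(resid) > 2*sfac*thresh)] *= -1
else:
    sigma[ok & (np.abs(resid) > sfac*thresh)] *= -1
```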