Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bilinear #157

Merged
merged 7 commits into from
Nov 27, 2023
Merged

Bilinear #157

merged 7 commits into from
Nov 27, 2023

Conversation

amaurea
Copy link
Contributor

@amaurea amaurea commented Aug 24, 2023

Support for bilinear mapmaking. Standard nearest neighbor mapmaking is still the default, and its performance is unchanged in my tests.

@amaurea
Copy link
Contributor Author

amaurea commented Aug 24, 2023

I'll make pull requests for sotodlib and pwg-scripts that depend on this when this one is accepted.

@amaurea amaurea requested a review from mhasself August 25, 2023 07:32
Copy link
Member

@mhasself mhasself left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preliminary review, see comments -- haven't been able to run this yet due to conda / glibc conflicts... but will try on a different system tomorrow.

I think my biggest concern is interface breakage -- it seems like this changes the shape of the RangesMatrix "threads" object, even in the case of interpol="nn". So existing sotodlib.coords code, if it inspected threads for some reasons, would probably choke. I am not sure whether any of the important code does so. Alas there don't seem to be any sotodlib unit tests that probe pmat, so that doesn't help us.

Comment on lines 504 to 512
// For nearest neighbor this basically reduces to GetPixel
// FIXME: In this function and its siblings (including GetPixel),
// the use of int() is buggy for // x < 0 or y < 0 because int
// rounds towards zero, not towards -inf.int(floor()) would be
// correct, but probably also slightly slow.
int ix = int(coords[0] / cdelt[1] + crpix[1] - 1 + 0.5);
if (ix < 0 || ix >= naxis[1]) return 0;
int iy = int(coords[1] / cdelt[0] + crpix[0] - 1 + 0.5);
if (iy < 0 || iy >= naxis[0]) return 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unnecessary regression. In the original code, the bounds check was done on the float computation result, before the int truncation. That was safe (though I should have had a comment warning about that).

Comment on lines 528 to 529
int x1 = int(x);
int y1 = int(y);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like leaving in the trunc-towards-zero bug. How about:

int x1 = int(x) - int(x < 0);
int y1 = int(y) - int(y < 0);

@@ -444,6 +449,8 @@ class Pixelizor2_Flat<NonTiled> {
pixel_index[0] = int(iy);
pixel_index[1] = int(ix);
}
inline
int GetPixels(int i_det, int i_time, const double *coords, int pixinds[interp_count][index_count], FSIGNAL pixweights[interp_count]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of interp_count here is giving me compilation failures:

/home/mhasse/work/code/simonso/so3g/src/Projection.cxx:453:76: error: size of array ‘pixinds’ is not an integral constant-expression
  453 |     int GetPixels(int i_det, int i_time, const double *coords, int pixinds[interp_count][index_count], FSIGNAL pixweights[interp_count]);
      |                                                                            ^~~~~~~~~~~~
/home/mhasse/work/code/simonso/so3g/src/Projection.cxx:453:123: error: size of array ‘pixweights’ is not an integral constant-expression
  453 | e, const double *coords, int pixinds[interp_count][index_count], FSIGNAL pixweights[interp_count]);
      |                                                                                     ^~~~~~~~~~~~

/.../so3g/src/Projection.cxx:646:76: error: size of array ‘pixinds’ is not an integral constant-expression
  646 |     int GetPixels(int i_det, int i_time, const double *coords, int pixinds[interp_count][index_count], FSIGNAL pixweights[interp_count]);
      |                                                                            ^~~~~~~~~~~~
/.../so3g/src/Projection.cxx:646:123: error: size of array ‘pixweights’ is not an integral constant-expression
  646 | e, const double *coords, int pixinds[interp_count][index_count], FSIGNAL pixweights[interp_count]);
      |                                                                                     ^~~~~~~~~~~~

I see why what you did could work, if the compiler were smart enough ... but it does not work with my gcc 13.2.1.

Can you make interp_count a property of the Interpol class, and use that instead? (e.g. see what is done with comp_count). That worked for me -- in the main class templates (lines 397 and 557) I set

    static const int interp_count = Interpol::count;

and then didn't have to specialize the values afterwards.

Copy link
Contributor Author

@amaurea amaurea Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It all works for me though! This is all tested and runs, and produces the right output. interp_count is an integral constant-expression. I think somehow something must be missing from this pull request since you're getting inconsistent results from what I get, when we both use gcc.

That said, I think having it in interpol is a good idea (though it could technically be more limited, as the number of pixels that are touched in a given interpolation scheme could in theory on the pixelization itself. That doesn't seem very likely in practice though).

I used gcc 9.3.1, btw. Very odd that the newer compiler should be stupider.

for (int imap=0; imap<S::comp_count; ++imap)
*_pixelizor.pix(imap, pixel_offset) += sig * pf[imap] * det_wt;
const FSIGNAL sig = _signalspace->data_ptr[i_det][_signalspace->steps[0]*i_time];
// This is where things change for bilinear mapmaking.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment block is narrative from the perspective of someone modifying the nearest-neighbor code for bilinear. Maybe edit a bit for future readers?

}
}
return ivals;
}

// WIP for bilinear mapmaking.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useful discussion, perhaps edit a bit and move to top of file?

@@ -578,6 +592,7 @@ def get_active_tiles(self, assembly, assign=False):
q_native = self._cache_q_fp_to_native(assembly.Q)
# This returns a G3VectorInt of length n_tiles giving count of hits per tile.
hits = np.array(projeng.tile_hits(q_native, assembly.dets))
print("hits", hits)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stray print

@amaurea
Copy link
Contributor Author

amaurea commented Sep 5, 2023

I think my biggest concern is interface breakage -- it seems like this changes the shape of the RangesMatrix "threads" object, even in the case of interpol="nn". So existing sotodlib.coords code, if it inspected threads for some reasons, would probably choke. I am not sure whether any of the important code does so. Alas there don't seem to be any sotodlib unit tests that probe pmat, so that doesn't help us.

Yes, it does make a breaking change, but I went through all sotodlib code I could find that uses this, and updated it accordingly. Did I miss anything? Edit: Oh, right. I haven't made that pull request yet because it depends on this one. Chicken-and-egg.

I think this breaking change is worth it because the new structure is both general enough to handle basically any parallelization scheme while also having practically no overhead. I think it would be a bad idea to drag around two different shapes for the threads object depending on the interpolation used or the threading approach used. Maybe we can discuss this in a meeting if you disagree.

Edit: The old threads object was 3d with shape [parallel_groups][ndet][nrange], and was used like this (pseudo-code):

omp_parallel_for each group in threads:
   process_group(group)

The new one is 4d with shape [serial_groups][parallel_groups][ndet][nrange], and is used like this

for each bunch in threds:
   omp_parallel_for each group in bunch:
      process_group(group)

This allows for cases where not everything can be done in parallel at once, but there are still subsets of data that it is safe to do in parallel. This interface is very general, so it can support many different parallelization schemes, including the old one. However, as you say it is a breaking change in the interface. It would be possible to shoehorn this functionality into the old interface like this:

nbunch = ceil(len(threads)/nthread)
for i in range(nbunch):
   omp_parallel_for each group in threads[i*nthread:(i+1)*nthread]:
      process_group(group)

By using empty groups as necessary, this construction should support everything the 4d object can without needing any changes to the interface. However, I think it's much less clear what it does, so I think this could be error-prone in the future. What do you think?

Copy link
Member

@mhasself mhasself left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's get this out there!

@mhasself mhasself merged commit a21add6 into master Nov 27, 2023
4 checks passed
@mhasself mhasself deleted the bilinear branch November 27, 2023 10:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants