-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Solvent mask partial occ 202410 #345
base: master
Are you sure you want to change the base?
Solvent mask partial occ 202410 #345
Conversation
occupancies and alternate conformations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Considering altlocs is a clear improvement.
I added several comments, but haven't reviewed the main changes yet.
I'll go through it this week.
If you want to change anything in the meantime, force-push it into your branch.
@@ -301,7 +301,9 @@ struct Ccp4 : public Ccp4Base { | |||
read_ccp4_file(input.path()); | |||
} | |||
|
|||
void write_ccp4_map(const std::string& path) const; | |||
/// @param path filename for output CCP4/MRC map | |||
/// @param mode_out (optional) map mode (default of -1 leaves it at the current value); typical values are 0 (binary/integer mask) or 2 (float map) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are doxygen @param
comments wrappable? Currently max. line length in gemmi is 100.
assert(ccp4_header.size() >= 256); | ||
fileptr_t f = file_open(path.c_str(), "wb"); | ||
std::fwrite(ccp4_header.data(), 4, ccp4_header.size(), f.get()); | ||
int mode = header_i32(4); | ||
if (mode == 0) | ||
if (mode_out < 0 ) mode_out = mode; | ||
if (mode_out == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if mode_out
is different than mode
, the file will be incorrect, because ccp4_header.data()
written above has mode not matching the data.
I think the new arg is not needed, because you set mode before calling write_ccp4_map anyway.
use_points_around<true>(fctr, radius, [&](T& ref, double) { ref += value; }, false); | ||
else | ||
use_points_around<false>(fctr, radius, [&](T& ref, double) { ref += value; }, false); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(comment for all the set_points_around_to_*
functions)
if all these functions were to be added, we'd need a general function to avoid duplication. A function that would take Position, radius, function and bool use_pbc
. But here it'd be simpler to move one function which is used to solmask.hpp. It reduces to two lines:
Fractional fctr = unit_cell.fractionalize(ctr);
use_points_around<true>(fctr, radius, [&](T& ref, double) { if(value<ref) {ref = value;} }, false);
} | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also rather have it in solmask.hpp, to not increase the size of grid.hpp.
In C++17 it will be just two lines if inlined:
for (auto& d : data)
d = std::clamp(d, min_value, max_value);
@@ -70,7 +70,7 @@ const option::Descriptor Usage[] = { | |||
}; | |||
|
|||
void print_cell_parameters(const char* prefix, const gemmi::UnitCell& cell) { | |||
printf("%s %g %7g %7g %6g %6g %6g\n", prefix, | |||
printf("%s %7g %7g %7g %6g %6g %6g\n", prefix, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the previous PR got included here
@@ -32,6 +33,8 @@ const option::Descriptor Usage[] = { | |||
CommonUsage[Help], | |||
CommonUsage[Version], | |||
CommonUsage[Verbose], | |||
{ Verbosity, 0, "", "verbosity", Arg::Int, | |||
" --verbosity=I \tVerbosity setting." }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having both --verbose
and --verbosity
is confusing. The former is sufficient, because it can be used multiple times , e.g. -vvv
. You can change how many times it was used with
options[Verbose].count()
@@ -468,19 +470,22 @@ void Ccp4<T>::set_extent(const Box<Fractional>& box) { | |||
grid.axis_order = AxisOrder::Unknown; | |||
} | |||
|
|||
/// @param path filename for output CCP4/MRC map | |||
/// @param mode_out (optional) map mode (default of -1 leaves it at the current value); typical values are 0 (binary/integer mask) or 2 (float map) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exact copy of the previous comment. Does doxygen require it in both places?
/// @param value value to use (unless use_atom_occupancy is set to true) | ||
/// @param ignore_hydrogen should we skip hydrogen atoms | ||
/// @param ignore_zero_occupancy_atoms should we skip atoms with occupancy of zero | ||
/// @param use_atom_occupancy should we use the occupancy of each atom (to build up a non-binary mask) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A higher level interface that wraps this function SolventMasker::mask_points().
If this function has documentation in Doxygen, users will expect that the API is stable and then they'll be upset if it changes. The functions that are documented also change from time to time, but I try to minimize it.
mask.set_points_around(atom.pos, radius, value); | ||
bool ignore_zero_occupancy_atoms, | ||
bool use_atom_occupancy, | ||
int verbosity = 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll need to think how to simplify it. The code gets repeated between this function and mask_points_in_varied_radius. And both functions became quite long.
I'll need to check how these functions are used in the current code. Perhaps arg ignore_zero_occupancy_atoms
can be removed. Masks with real numbers have been used before, although only for "soft masks" that change smoothly as a function of radius. That idea wasn't fully tested. I don't know if "soft masks" can improve refinement.
bool use_atom_occupancy, | ||
int verbosity = 0) { | ||
|
||
if (verbosity>0) printf("\n#### constant-radius masker\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's an output for debugging it's fine.
For output that's meant for end users, it's a bit harder. Library functions can be called from a GUI app, or from web app, so the output should to be possible to redirect. In most of the functions I just avoid printing anything. If it's necessary to print something, recently I started using struct Logger
.
New features to create non-binary masks when dealing with partial
occupancies and alternate conformations.
The new command-line argument (to "gemmi mask") --set-occupancy triggers the switch from a binary (0,1) mask computation to one that accommodates partially occupied atoms. In that case we need to loop over all atoms and find (for each grid point) the maximum occupancy of any atom hitting it.
To accommodate the current method of running a flood-fill algorithm (for island remvavl), we need to create an intermediate binary grid ... at least that is the way I found it to work.
There might easily be more elegant ways of doing some of this, but the bottom line is: it seems to work and do exactly what is wanted and therefore improves the computation of the bulk solvent contribution during refinement ... at least for BUSTER (https://www.globalphasing.com/buster/)