-
Notifications
You must be signed in to change notification settings - Fork 452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ACES2 Output Transforms #1983
ACES2 Output Transforms #1983
Conversation
Thank you so much for doing this Remi! Everything aside from ACES2CPUHelpers.h looks perfect. And please assume that OCIO will move to C++14 minimum if that allows a cleaner implementation. I understand, as you wrote above, that the ACES2CPUHelpers.h module is a direct port of the CTL. But wow, this still really feels like research code. I'm surprised it's only 8x slower than ACES 1, there are so many unnecessary conversions being done. I know you wrote above that optimization is still a TODO, but it really seems like that is going to be a huge task in order to get something reasonably performant. I don't think it's simply a matter of pre-multiplying a few matrices and swapping in the fast power function. For example, I see things like:
And that seems like it should be replaced by a spline or something. But building these approximations would take time and may not be exact. I'm very worried about the amount of work here, given that OCIO 2.4 is supposed to release in only a few months. It seems like we need more people working on this. I feel like we should ask the ACES WG to provide CTL that is closer to being ready to implement? |
bc3be88
to
d36b1e0
Compare
Pushed an update with my current work in progress, this mainly split the new FixedFunctionOps into 3 for JMh conversion, Tonescale / ChromaCompress and GamutCompress. The code still follows the CTL from aces-dev and has not been optimized further yet (as discussed). One point that will probably be interesting to agree on is the FixedFunctionOps split strategy, as this will be hard to change later and might impact the optimization. There is still some duplication of work here and there (like Matrix that could be merged) and the GPU path also has texture duplication where ChromaCompression and GamutCompression uses the same tables. |
Awesome @remia! I built some test wheels on test.pypi.org: https://test.pypi.org/project/opencolorio/ The Windows tests were breaking so I bypassed them: ea1229e |
Looking at the builtins, we have those currently:
It would be useful for me to have all of them, even if they are null ops so that we can move on the config side. I'm becoming mildly concerned about the 15 September deadline considering the quantity of work left. |
d23243c
to
1c4e83c
Compare
@KelSolaar I added more (if not all) transforms in my latest commit and started a spreadsheet to list everything I have so far. I'm not opposed to changing the list or names if any feedback comes. Note that I dropped the distinction between VIDEO and CINEMA as there is not really any difference I think so far (eg. no surround compensation in the CTL). |
Thanks for the updates Remi! This looks really good. I have a few comments:
|
Once fixed, I'm also getting these additional warnings + error in HLSL:
The error stems from EDIT: Actually, |
Just built new wheels, will check the builtins soon: https://test.pypi.org/project/opencolorio/2.4.0.dev2/ |
Can see a bunch of stuff in the builtins :)
|
Thanks for the feedback!
I just did a re-order pass, now both tabs should be aligned and sorted by CTL names.
Based on the other names, especially the D60 ones, the part after the "in" is meant to be the encoding primaries (of the container space), which are needed for the white scaling step. For example
I think everything is here, sorry for the misleading wording!
Thanks for sharing those warnings / errors, that's really helpful as I don't have an easy way to check HLSL. I agree the texture sampling in the while loop is currently far from ideal (probably even from a performance only standpoint). In theory I think there would not be a need to compute derivative here as we are really doing NEAREST sampling on the texture, not needing to pick a minification or magnification filter (but there might very well be other reasons I'm unaware of), in my GLSL tests I used texelFetch() for example, but to maintain compatibility with older implementations we can't rely on that, I assume HLSL may have similar features. I'm still working on more comprehensive comparison against CTLs on actual images, seeing some GPU texture sampling issues as well on some transforms that might be related. |
I also think it would be preferable not to include "D60" in the name as that could be confusing. In the ACES repo, the white point used in the name is the parameter value for encoding white point, so is the value that needs to be passed to the Just saying "XYZ", without a specific white is, as you say, @doug-walker, implicitly normalised to Illuminant E. |
I started to fill-up the spreadsheet: https://docs.google.com/spreadsheets/d/1z3xsy3sF0I-8AN_tkMOEjHlAs13ba7VAVhrE8v4WIyo/edit?gid=1761235937#gid=1761235937 This will require cross-checking but I think that we are missing quite a few builtins and I could not assign some of them:
I haven't had a chance to check the code yet to find out what they are. That being said, the assignment/filling is not exactly trivial as the name are not always close. |
Hi @KelSolaar, The assignment of ViewTransform to CTLs should be available in the spreadsheet shared earlier. I agree the names are not directly matching in some cases, for example all the 48 nits use the same parameters as the 100 nits, an alternative would be to duplicate the ViewTransform in OCIO to have them under different names but not sure I'd like that. |
Ah thanks! I actually thought about the spreadsheet after posting, it was late and I was intending to check today. Thanks for the 48 nits explanation, that makes sense to me, I will adjust our other spreadsheets accordingly and see how far I was :D |
1c4e83c
to
7b07576
Compare
I pushed an update that should fix the HLSL shader compilation issue by extracting the hue component from the texture into a constant array, as suggested by @KevinJW. Also included some minor updates to keep up with the CTL code (though this will have to be more thoroughly reviewed). Also sharing a script and OCIO config I have been using to compare CTL and OCIO results on images. This is a very simple script so far and is mostly valuable for the CTL to OCIO display / view mapping with the accompanying OCIO test config. I updated the reference CTL to use the parametric chroma norm. From what I can tell, here are a list of remaining TODOs before merging in the hope to share the workload:
|
@remia: As of 6585f18, I can only see the 31 builtins from the comment above. Is this expected or should I have the 54 from the spreadsheet? I haven't had time to check the code to see what is going on. |
Yes this is correct, multiple CTLs can share the same Builtin transform if only the display encoding changes. I added a count of unique values below the tables in the spreadsheet, it indeed sum to 31. |
Thanks, I just checked too, I have an issue in the generator, I need to look at what is going on there instead. |
Good catch, I fixed that now, hopefully should work! |
Yes it works, All the builtins were found! I need to add the displays on the config spreadsheet and I should be good. |
Just pulled the branch and I get a new faiilure that I didn't previously ... [ctest] ERROR: test_style (FixedFunctionTransformTest.FixedFunctionTransformTest.test_style) |
b0d7f29
to
be02e2e
Compare
28fdacc
to
778d408
Compare
Signed-off-by: Rémi Achard <remiachard@gmail.com>
d4bb175
to
bcd7382
Compare
Signed-off-by: Rémi Achard <remiachard@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remi, this is truly amazing, incredibly thorough work.
I think you made excellent design decisions, especially considering the complexity of the work. In the end, I think opting for fewer FixedFunctions is the correct decision; and I really love the parameterization -- peak luminance + output chromaticities + nothing else, which makes building higher level functions around 'em much more intuitive. In a config, it's nice to see an array of numbers that actually feels super human-friendly, which is not something I thought I'd find myself saying about the interfaces to ACES2 stuff.
I've been messing with this PR in python for the past few hours -- everything behaves exactly as expected. I can't really speak to the efficiency of the implementations, other than to say stuff feels fast, and the generated shader code looks smart.
Really nice work, man. No notes, LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've completed a more detailed pass through this awesome work. Thank you so much Remi for helping us find a path up the very steep mountain that is ACES 2!
ss.newLine() << "hwrap = hwrap - floor(hwrap / 360.f) * 360.f;"; | ||
ss.newLine() << "hwrap = (hwrap < 0.0f) ? hwrap + 360.f : hwrap;"; | ||
|
||
ss.newLine() << ss.floatDecl("i_lo") << " = floor(hwrap);"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't clear on why no base index to add? Not in the CPU version (which looks a bit different) but you have it below for the gamma table. But didn't have time to really study it, so I'm sure you're correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm following the CTL code here but I don't remember exactly why the ReachM table doesn't have the additional 2 entries wrap like the Gamut cusp and the Gamma hull. I'm not clear why the wrap around is needed for the Gamma hull as well. @nick-shaw probably knows, think it is related to ampas/aces-core#148.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That table is evenly spaced in h
, unlike the other cusp tables, and so the lookup code is different. I seem to remember that @scottdyer had a problem making the wrap-around work in one case. Perhaps he never worked out what his issue with that one was, so left it as is. The extra entries are really an implementation efficiency, not a core part of the algorithm.
|
||
MatrixOpData::MatrixArrayPtr matrixToXYZ | ||
= build_conversion_matrix_to_XYZ_D65(limiting_pri, ADAPTATION_NONE); | ||
CreateMatrixOp(ops, matrixToXYZ, TRANSFORM_DIR_FORWARD); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the limiting_pri is sometimes D60, it does seem that the ADAPTATION_NONE is the correct option to use to match what the CTL is doing for the current set of output transforms. But I am puzzling over it.
All of the current output transforms fall in two sets: either D65, or D60 in a D65 container. Unlike ACES 1, aside from the DCDM, there are no displays that are not calibrated to D65. However, if there was a desire to add a D60 calibrated display (or some other white point), I was trying to think how that would work. The way we have been doing the built-in ACES configs, the non-D65 displays have a Bradford adaptation from the D65 of the reference space to their calibration white.
I think we could use the D65 view transforms for a rendition that will be at the display's calibration white point but if we want to provide a true D65 or D60 version, we would need new view transforms. Those view transforms may need to undo the Bradford adaptation in the display.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me the concept of "white point simulation" does not map easily to OCIO v2 viewing transforms workflows. I'm not sure if there is an ideal solution with the current system. As a further option, maybe one could also create a dedicated display for the D60 Sim in D65 case where the display side processing would chromatically adapt from D65 to D60, then apply a white scaling factor like is done in the builtin currently. This is assuming the builtin always "correctly" apply a chromatic adaptation to D65 as intended by cie_xyz_d65. This should allow to avoid duplicated view transforms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @doug-walker! I'll have a look at all your points in detail, but answering some of them already to keep discussion going.
|
||
MatrixOpData::MatrixArrayPtr matrixToXYZ | ||
= build_conversion_matrix_to_XYZ_D65(limiting_pri, ADAPTATION_NONE); | ||
CreateMatrixOp(ops, matrixToXYZ, TRANSFORM_DIR_FORWARD); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me the concept of "white point simulation" does not map easily to OCIO v2 viewing transforms workflows. I'm not sure if there is an ideal solution with the current system. As a further option, maybe one could also create a dedicated display for the D60 Sim in D65 case where the display side processing would chromatically adapt from D65 to D60, then apply a white scaling factor like is done in the builtin currently. This is assuming the builtin always "correctly" apply a chromatic adaptation to D65 as intended by cie_xyz_d65. This should allow to avoid duplicated view transforms?
Signed-off-by: Rémi Achard <remiachard@gmail.com>
Signed-off-by: Rémi Achard <remiachard@gmail.com>
Signed-off-by: Rémi Achard <remiachard@gmail.com>
I did some profiling on the CPU path and GPU timings and didn't find regression since earlier results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates!
Hi Remi, do you have any additional commits planned? When you're done, please go ahead and merge it. :) |
Good to merge, I updated the branch against main, feel free to do it when the tests are completed! |
Fantastic job Remi! |
Opening this PR early even though it's very much a work in progress / prototype so far, the code is more or less imported directly from the CTL.
Next steps includes: