Implementation of a new softmax version with PLANE_WISE mode support #3022
base: master
Conversation
…PU & CUDA) and Add `add_to` Parameter
* remove using namespace std from headers
* more std::
* more std::
* more std:: on windows stuff
* remove uses of using namespace std::chrono
* do not use C++17 features
* Add Davis suggestion
* revert some more stuff
* revert removing include
* more std::chrono stuff
Ha yeah, same here as the other PR. Is this ready for review? It's conflicted with master. Maybe they are all ready for review but just need to be merged with master?
Absolutely. Thank you, Davis.
Sorry I took so long to come back to this. Been a busy few weeks :| Anyway, I just pulled this and tried to run the unit tests but got compile errors. I.e. I did
Since you merged one of the new layers introduced to implement the attention mechanism in Dlib, I've noticed new branch conflicts appearing. I imagine the compilation issue comes from that, as everything works fine on my end. I'm looking into it again to make the necessary adjustments and I'll let you know.
This PR introduces a new implementation of the softmax layer, offering increased flexibility and better support for Large Language Models (LLMs) and other applications requiring 2D tensor processing.
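To make the PLANE_WISE idea concrete for the 2D-tensor case mentioned above, here is a minimal, self-contained sketch. It is my own illustration, not code from the PR: each row of a 2D score matrix is turned into its own probability distribution, which is the operation attention layers in LLM-style models need.

```cpp
#include <dlib/matrix.h>
#include <algorithm>
#include <cmath>
#include <iostream>

// Hedged illustration of the PLANE_WISE idea on a plain 2D matrix: softmax
// each row independently (with max-subtraction for numerical stability).
// This mirrors what the layer would do for every (nr x nc) plane; it is not
// the PR's implementation.
dlib::matrix<float> plane_wise_softmax(const dlib::matrix<float>& scores)
{
    dlib::matrix<float> out(scores.nr(), scores.nc());
    for (long r = 0; r < scores.nr(); ++r)
    {
        float row_max = scores(r, 0);
        for (long c = 1; c < scores.nc(); ++c)
            row_max = std::max(row_max, scores(r, c));

        float sum = 0;
        for (long c = 0; c < scores.nc(); ++c)
        {
            out(r, c) = std::exp(scores(r, c) - row_max);
            sum += out(r, c);
        }
        for (long c = 0; c < scores.nc(); ++c)
            out(r, c) /= sum;   // each row now sums to 1
    }
    return out;
}

int main()
{
    dlib::matrix<float> attn(2, 3);
    attn = 1, 2, 3,
           0, 0, 0;             // two query rows, three key columns
    std::cout << plane_wise_softmax(attn);
}
```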
Main changes:

- Added a `mode` parameter to the `softmax()` and `softmax_gradient()` utility functions.
- Updated the `softmax` and `softmaxm` aliases to use the new class.

Change details:

- Added a `softmax_mode` enumeration with CHANNEL_WISE and PLANE_WISE options.
- Modified the `softmax_` class to account for the operating mode.
- Updated the `softmax()` and `softmax_gradient()` functions to process data differently based on the chosen mode (a sketch of the resulting interface follows this list).
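The enumeration and the extended utility-function signatures could look roughly like the sketch below. The enumerator names come from the description above; the namespace placement, parameter order, and the CHANNEL_WISE default are my assumptions, not the PR's actual declarations.

```cpp
// Hedged sketch -- not the PR's code.  Enumerator names are taken from the
// PR description; everything else (placement, defaults, parameter order)
// is an assumption.
enum class softmax_mode
{
    CHANNEL_WISE,   // normalize across the k channels at each (r, c) location
    PLANE_WISE      // normalize each row of every (nr x nc) plane independently
};

// The utility functions described above would then take the mode as an extra
// argument, presumably defaulting to CHANNEL_WISE so existing call sites keep
// their behavior, e.g. (hypothetical signatures):
//
//   void softmax(tensor& dest, const tensor& src,
//                softmax_mode mode = softmax_mode::CHANNEL_WISE);
//
//   void softmax_gradient(tensor& grad, const tensor& dest,
//                         const tensor& gradient_input,
//                         softmax_mode mode = softmax_mode::CHANNEL_WISE);
```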
Compatibility:

- Backward compatibility is preserved for existing code using `softmax`.
- Users can rely on `softmax` or `softmaxm` to benefit from the improvements (see the usage sketch at the end of this description).

Tests:
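To round out the description, here is a rough sketch of how the two aliases might appear in a network definition. It is not taken from the PR and is not part of its test suite: the layers around the softmax, and the assumption that `softmaxm` is the alias selecting PLANE_WISE mode, are mine.

```cpp
#include <dlib/dnn.h>
#include <iostream>

using namespace dlib;

// CHANNEL_WISE is the behavior existing users already get from softmax:
// the k channels at each spatial location form one distribution, e.g. the
// class scores produced by a fully connected layer.
using channel_wise_net = softmax<fc<10, input<matrix<float>>>>;

// PLANE_WISE is what attention score matrices need: every row of each
// (nr x nc) plane is normalized on its own.  This assumes softmaxm is the
// alias that selects PLANE_WISE mode, as the description suggests.
using plane_wise_net = softmaxm<input<matrix<float>>>;

int main()
{
    matrix<float> scores(4, 6);            // e.g. 4 query rows x 6 key columns
    scores = 1;                            // dummy attention scores

    plane_wise_net net;
    const tensor& out = net(scores);       // forward pass on one sample
    std::cout << mat(out) << std::endl;    // view the 1x1x4x6 output as a matrix
}
```

If the defaults behave as described under Compatibility, existing classification networks built on the plain `softmax` alias should not need any changes.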