
Implementation of a new softmax version with PLANE_WISE mode support #3022

Open
Cydral wants to merge 16 commits into master

Conversation

@Cydral (Contributor) commented Sep 27, 2024

This PR introduces a new implementation of the softmax layer, offering increased flexibility and better support for Large Language Models (LLMs) and other applications requiring 2D tensor processing.

Main changes:

  1. Addition of a mode parameter to the softmax() and softmax_gradient() utility functions.
  2. Implementation of PLANE_WISE mode in addition to the existing CHANNEL_WISE mode.
  3. Update of the softmax and softmaxm aliases to use the new class (an interface sketch follows this list).
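
A minimal interface sketch of what these changes could look like, using only the names given in this description. The actual header may differ in namespaces, default arguments, and in how the mode reaches the softmax_ class (shown here as a template parameter, which is an assumption made for the sketch):

```cpp
// Interface outline only -- not the PR's actual code.
enum class softmax_mode { CHANNEL_WISE, PLANE_WISE };

// The tensor utility functions gain a mode parameter that defaults to the
// original behavior, so existing callers keep compiling unchanged.
void softmax(tensor& dest, const tensor& src,
             softmax_mode mode = softmax_mode::CHANNEL_WISE);
void softmax_gradient(tensor& grad, const tensor& dest, const tensor& gradient_input,
                      softmax_mode mode = softmax_mode::CHANNEL_WISE);

// Layer aliases: softmax keeps the old channel-wise behavior, softmaxm selects
// the new plane-wise behavior. Wiring the mode through a template parameter of
// softmax_ is an assumption for illustration.
template <typename SUBNET> using softmax  = add_layer<softmax_<softmax_mode::CHANNEL_WISE>, SUBNET>;
template <typename SUBNET> using softmaxm = add_layer<softmax_<softmax_mode::PLANE_WISE>, SUBNET>;
```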

Change details:

  • Addition of a softmax_mode enumeration with CHANNEL_WISE and PLANE_WISE options.
  • Modification of the softmax_ class to account for the operating mode.
  • Update of softmax() and softmax_gradient() functions to process data differently based on the chosen mode (the reference sketch after this list illustrates both behaviors).
  • Adaptation of comments and documentation to reflect the new behaviors.
  • Update of unit tests to cover both operating modes.
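
To make the two behaviors concrete, here is a small self-contained reference in plain C++ (not the PR's CPU/CUDA code) over a buffer laid out like a dlib tensor, i.e. samples × channels (k) × rows (nr) × columns (nc). It assumes PLANE_WISE means each row of every (sample, channel) plane is normalized independently, which is the shape attention-score matrices need, while CHANNEL_WISE keeps the existing normalization across channels at every spatial location:

```cpp
// Self-contained illustration of the two modes over a dense buffer laid out
// like a dlib tensor: num_samples (n) x channels (k) x rows (nr) x cols (nc).
// A reference sketch of the intended semantics, not the PR's implementation.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

enum class softmax_mode { CHANNEL_WISE, PLANE_WISE };

void softmax_reference(std::vector<float>& data,
                       std::size_t n, std::size_t k, std::size_t nr, std::size_t nc,
                       softmax_mode mode)
{
    // Index into the row-major (n, k, nr, nc) layout.
    auto idx = [&](std::size_t s, std::size_t ch, std::size_t r, std::size_t c)
    { return ((s * k + ch) * nr + r) * nc + c; };

    if (mode == softmax_mode::CHANNEL_WISE)
    {
        // Existing behavior: normalize across the k channels at each (r, c).
        for (std::size_t s = 0; s < n; ++s)
        for (std::size_t r = 0; r < nr; ++r)
        for (std::size_t c = 0; c < nc; ++c)
        {
            float m = data[idx(s, 0, r, c)];
            for (std::size_t ch = 1; ch < k; ++ch) m = std::max(m, data[idx(s, ch, r, c)]);
            float sum = 0;
            for (std::size_t ch = 0; ch < k; ++ch)
            {
                data[idx(s, ch, r, c)] = std::exp(data[idx(s, ch, r, c)] - m);
                sum += data[idx(s, ch, r, c)];
            }
            for (std::size_t ch = 0; ch < k; ++ch) data[idx(s, ch, r, c)] /= sum;
        }
    }
    else
    {
        // PLANE_WISE: treat each (sample, channel) plane as a 2D matrix and
        // normalize every row independently, as attention-score matrices need.
        for (std::size_t s = 0; s < n; ++s)
        for (std::size_t ch = 0; ch < k; ++ch)
        for (std::size_t r = 0; r < nr; ++r)
        {
            float m = data[idx(s, ch, r, 0)];
            for (std::size_t c = 1; c < nc; ++c) m = std::max(m, data[idx(s, ch, r, c)]);
            float sum = 0;
            for (std::size_t c = 0; c < nc; ++c)
            {
                data[idx(s, ch, r, c)] = std::exp(data[idx(s, ch, r, c)] - m);
                sum += data[idx(s, ch, r, c)];
            }
            for (std::size_t c = 0; c < nc; ++c) data[idx(s, ch, r, c)] /= sum;
        }
    }
}
```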

Compatibility:

  • This update is backward compatible with existing code using the old softmax.
  • Users can easily switch to the new softmax or softmaxm to benefit from the improvements (a short usage sketch follows this list).
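
For illustration only, a switch could look like the following; softmax and softmaxm are the aliases named above, while the surrounding fc layer and its dimensions are just example scaffolding:

```cpp
#include <dlib/dnn.h>

using namespace dlib;

// Existing definitions compile and behave as before: softmax stays channel-wise.
template <typename SUBNET> using classic_head = softmax<fc<10, SUBNET>>;

// Code that produces 2D score matrices (e.g. attention weights over a sequence)
// can opt into per-row normalization simply by substituting softmaxm.
template <typename SUBNET> using attention_weights = softmaxm<SUBNET>;
```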

Tests:

  • New unit tests added to verify correct behavior of both modes (a sketch of the kind of check involved appears after this list).
  • Regression tests performed to ensure existing functionalities are not affected.
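
As an indication of what such a test might check, the snippet below verifies that every row of every plane sums to 1 after a PLANE_WISE pass. The mode-taking tt::softmax overload is the one this PR describes; its exact signature is assumed:

```cpp
#include <dlib/dnn.h>
#include <cassert>
#include <cmath>

using namespace dlib;

void check_plane_wise_rows_sum_to_one()
{
    resizable_tensor src(2, 3, 4, 5), dest;
    dest.copy_size(src);
    tt::tensor_rand rnd;
    rnd.fill_uniform(src);

    // Mode-taking overload as described in this PR (signature assumed).
    tt::softmax(dest, src, softmax_mode::PLANE_WISE);

    // In PLANE_WISE mode every row of every (sample, channel) plane should be
    // a probability distribution, i.e. sum to 1.
    const float* d = dest.host();
    const long long rows = dest.num_samples() * dest.k() * dest.nr();
    for (long long r = 0; r < rows; ++r)
    {
        float sum = 0;
        for (long long c = 0; c < dest.nc(); ++c)
            sum += d[r * dest.nc() + c];
        assert(std::abs(sum - 1.0f) < 1e-4f);
    }
}
```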

@Cydral changed the title from "Implementation of a new softmax version with plane-wise mode support" to "Implementation of a new softmax version with PLANE_WISE mode support" on Sep 27, 2024
@davisking (Owner) commented:

Ha yeah, same here as the other PR. This ready for review? It's conflicted with master. Maybe they are all ready for review but just need to be merged with master?

@Cydral (Contributor, Author) commented Sep 30, 2024

> Ha yeah, same here as the other PR. This ready for review? It's conflicted with master. Maybe they are all ready for review but just need to be merged with master?

Absolutely. Thank you Davis.

@davisking (Owner) commented:

Sorry I took so long to come back to this. Been a busy few weeks :|

Anyway, I just pulled this and tried to run the unit tests but got compile errors. I.e. I did make -j6 dtest && ./dtest --test_dnn to run the dnn tests. I'm doing it on a machine with cuda so it's building the cuda parts but those have some errors. Be sure to test all these on such a machine :D

@Cydral (Contributor, Author) commented Oct 17, 2024

> Sorry I took so long to come back to this. Been a busy few weeks :|
>
> Anyway, I just pulled this and tried to run the unit tests but got compile errors. I.e. I did make -j6 dtest && ./dtest --test_dnn to run the dnn tests. I'm doing it on a machine with cuda so it's building the cuda parts but those have some errors. Be sure to test all these on such a machine :D

Since you merged one of the new layers introduced to implement the attention mechanism in Dlib, I've noticed new branch conflicts appearing. I imagine the compilation issue comes from that, as everything works fine on my end. I'm looking into it again to make the necessary adjustments and I'll let you know.
