-
Notifications
You must be signed in to change notification settings - Fork 33
[Quantization][Decompression] Fix QDQ for dynamic quant; Update NVFP4 Compression Params #407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this doesn't support 4d activations, you should add asserts making sure that the ndims matches expectations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to understand this first, sorry
In the future it might be nice to take a step back and make a decision about when tensors need to be reshaped during the qdq process. Maybe rather than reshaping all of in The implementation below is what it might look like for both activations and weights. Some of this is wrong (it's late for me) but this function ensures the last dim is the granularity you want to quantize by. def reshape_for_groups(func):
def wrapper(x, args, ...):
assert x.ndim >= 2
if args.strategy == "token":
pass
if args.strategy == "channel":
x = x.unsqueeze(-1)
if args.strategy in ("group", "tensor_group"):
num_groups = x.size(-1) // args.group_size
x = x.unflatten(-1, (num_groups, args.group_size))
if args.strategy == "block":
block_height, block1_width = args.block_structure
x = x.unfold(-2, block_height, block_height) # [num_horiz, x.dim[-1], block_height]
x = x.unfold(-2, block1_width, block1_width) # [num_horiz, num_vert, block_height, block_width]
x = flatten(-4, -3) # [num_blocks, block_height x block_width]
x = func(x, args, ...)
if args.strategy == "token":
pass
if args.strategy == "channel":
x = x.squeeze(-1)
if args.strategy in ("group", "tensor_group"):
return x.flatten(-2, -1)
if args.strategy == "block":
x = torch.cat(x, dim=-2)
x = torch.cat(x, dim=-2)
return wrapper
@reshape_for_groups
def _process_quantization(x, args, ...):
if do_quantize:
...
if do_dequantize:
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
… Compression Params (#407) * add compression param; update qdq for batch greater than 1 * make generic * fix tests * remove incorrect line change; make generic * update
* add utilities Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add additional tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add utils and tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Implement transform factories Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add permutations Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add delete_offload_module Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * key inverses by weight Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * standardize random hadamard Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * prepend input hooks Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * apply sqrt division first Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use divided hadamards Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix typo Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add random option Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use random seeds, rename matrix multiply Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add deterministic generation to random matrix Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix perm math Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * update docstrings Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * update docstrings Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * cleanup Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * cleanup 2 Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * make seed optional Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove iterable check and missing return value Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Remove unrelated changes * simplify code Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * implement apply, use in tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use hadamards database file Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * try manifest Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * try setup, update hadamards list Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix setup Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add docstrings, cleanup Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix setup, thank you @dbarbuzzi Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove numpy, add tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * solidify dtype, add gpu tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix docstring Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add device option Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * construct on execution device, cache on offload device Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * save construction device changes for later Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * construct on execution device, cache on offload device * cite nja sloane Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove dreg Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * put on device via safe_open Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * nits and docstrings Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * update docstring Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Merge * merge with construct: construct in float32 Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * construct with same dtype, constructing on fp32 found no difference Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove unnecessary imports Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * bugfixes (#375) Signed-off-by: Brian Dellabetta <bdellabe@redhat.com> * use factory_kwargs Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add frozen dict to deps Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix style Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * merge Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use delete_offload_module Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add docstrign Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use parametrize Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * populate _dynamic_tied_weights_keys Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * ensure serializable Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove extra space Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * apply style Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * merge dregs * skip offloading tests until transformers changes land Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use set Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * [Quantization][Decompression] Fix QDQ for dynamic quant; Update NVFP4 Compression Params (#407) * add compression param; update qdq for batch greater than 1 * make generic * fix tests * remove incorrect line change; make generic * update * serialize Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix typo, comment Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Signed-off-by: Brian Dellabetta <bdellabe@redhat.com> Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
* add utilities Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add additional tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add utils and tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Implement transform factories Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add permutations Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add delete_offload_module Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * key inverses by weight Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * standardize random hadamard Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * prepend input hooks Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * apply sqrt division first Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use divided hadamards Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix typo Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add random option Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use random seeds, rename matrix multiply Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add deterministic generation to random matrix Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix perm math Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * update docstrings Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * update docstrings Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * cleanup Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * cleanup 2 Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * make seed optional Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove iterable check and missing return value Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Remove unrelated changes * simplify code Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * implement apply, use in tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use hadamards database file Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * try manifest Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * try setup, update hadamards list Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix setup Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add docstrings, cleanup Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix setup, thank you @dbarbuzzi Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove numpy, add tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * solidify dtype, add gpu tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix docstring Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add device option Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * construct on execution device, cache on offload device Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * save construction device changes for later Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * construct on execution device, cache on offload device * cite nja sloane Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove dreg Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * put on device via safe_open Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * nits and docstrings Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * update docstring Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Merge * merge with construct: construct in float32 Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * construct with same dtype, constructing on fp32 found no difference Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove unnecessary imports Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * bugfixes (#375) Signed-off-by: Brian Dellabetta <bdellabe@redhat.com> * use factory_kwargs Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add frozen dict to deps Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix style Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * merge Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use delete_offload_module Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add docstrign Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use parametrize Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * populate _dynamic_tied_weights_keys Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * ensure serializable Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove extra space Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * apply style Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * merge dregs * skip offloading tests until transformers changes land Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use set Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * [Quantization][Decompression] Fix QDQ for dynamic quant; Update NVFP4 Compression Params (#407) * add compression param; update qdq for batch greater than 1 * make generic * fix tests * remove incorrect line change; make generic * update * serialize Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix typo, comment Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * include format Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Signed-off-by: Brian Dellabetta <bdellabe@redhat.com> Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
… Compression Params (neuralmagic#407) * add compression param; update qdq for batch greater than 1 * make generic * fix tests * remove incorrect line change; make generic * update
* add utilities Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add additional tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add utils and tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Implement transform factories Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add permutations Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add delete_offload_module Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * key inverses by weight Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * standardize random hadamard Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * prepend input hooks Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * apply sqrt division first Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use divided hadamards Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix typo Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add random option Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use random seeds, rename matrix multiply Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add deterministic generation to random matrix Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix perm math Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * update docstrings Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * update docstrings Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * cleanup Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * cleanup 2 Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * make seed optional Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove iterable check and missing return value Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Remove unrelated changes * simplify code Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * implement apply, use in tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use hadamards database file Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * try manifest Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * try setup, update hadamards list Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix setup Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add docstrings, cleanup Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix setup, thank you @dbarbuzzi Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove numpy, add tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * solidify dtype, add gpu tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix docstring Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add device option Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * construct on execution device, cache on offload device Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * save construction device changes for later Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * construct on execution device, cache on offload device * cite nja sloane Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove dreg Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * put on device via safe_open Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * nits and docstrings Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * update docstring Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Merge * merge with construct: construct in float32 Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * construct with same dtype, constructing on fp32 found no difference Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove unnecessary imports Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * bugfixes (neuralmagic#375) Signed-off-by: Brian Dellabetta <bdellabe@redhat.com> * use factory_kwargs Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add frozen dict to deps Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix style Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * merge Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use delete_offload_module Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add docstrign Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use parametrize Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * populate _dynamic_tied_weights_keys Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * ensure serializable Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove extra space Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * apply style Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * merge dregs * skip offloading tests until transformers changes land Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use set Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * [Quantization][Decompression] Fix QDQ for dynamic quant; Update NVFP4 Compression Params (neuralmagic#407) * add compression param; update qdq for batch greater than 1 * make generic * fix tests * remove incorrect line change; make generic * update * serialize Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix typo, comment Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Signed-off-by: Brian Dellabetta <bdellabe@redhat.com> Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
* add utilities Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add additional tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add utils and tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Implement transform factories Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add permutations Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add delete_offload_module Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * key inverses by weight Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * standardize random hadamard Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * prepend input hooks Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * apply sqrt division first Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use divided hadamards Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix typo Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add random option Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use random seeds, rename matrix multiply Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add deterministic generation to random matrix Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix perm math Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * update docstrings Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * update docstrings Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * cleanup Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * cleanup 2 Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * make seed optional Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove iterable check and missing return value Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Remove unrelated changes * simplify code Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * implement apply, use in tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use hadamards database file Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * try manifest Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * try setup, update hadamards list Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix setup Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add docstrings, cleanup Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix setup, thank you @dbarbuzzi Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove numpy, add tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * solidify dtype, add gpu tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix docstring Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add device option Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * construct on execution device, cache on offload device Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * save construction device changes for later Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * construct on execution device, cache on offload device * cite nja sloane Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove dreg Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * put on device via safe_open Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * nits and docstrings Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * update docstring Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Merge * merge with construct: construct in float32 Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * construct with same dtype, constructing on fp32 found no difference Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove unnecessary imports Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * bugfixes (neuralmagic#375) Signed-off-by: Brian Dellabetta <bdellabe@redhat.com> * use factory_kwargs Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add frozen dict to deps Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix style Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * merge Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use delete_offload_module Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add docstrign Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use parametrize Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * populate _dynamic_tied_weights_keys Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * ensure serializable Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove extra space Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * apply style Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * merge dregs * skip offloading tests until transformers changes land Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use set Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * [Quantization][Decompression] Fix QDQ for dynamic quant; Update NVFP4 Compression Params (neuralmagic#407) * add compression param; update qdq for batch greater than 1 * make generic * fix tests * remove incorrect line change; make generic * update * serialize Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix typo, comment Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * include format Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Signed-off-by: Brian Dellabetta <bdellabe@redhat.com> Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
Uh oh!
There was an error while loading. Please reload this page.