refactor: Un-nest nn.Sequential in ResidualBlock.hidden_layer #133

TeddyHuang-00 · 2024-08-29T02:00:17Z

Thank you for your brilliant work! I saw that the official PyTorch implementation was added a few days ago, so I wrote a script for converting the PAX model checkpoints into PyTorch ones. I hope this is helpful for you as well as for other researchers!

This PR consists of the following changes:

Add a script for converting PAX model checkpoints to PyTorch state dicts
Update the README with instructions on how to use the conversion script
Change the nested nn.Sequential in ResidualBlock.hidden_layer to match the layout of the other child modules, ResidualBlock.output_layer and ResidualBlock.residual_layer

The original hidden_layer in ResidualBlock consists of an nn.Linear and an nn.SiLU wrapped in an nn.Sequential. Separating them does not change the computation, but it makes the layer structure consistent with the other child modules, output_layer and residual_layer.
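To illustrate the difference between the two layouts, here is a minimal sketch rather than the actual code in this PR. The class names NestedResidualBlock and FlatResidualBlock, the attribute name activation, the layer sizes, and the forward pass (output plus residual) are all assumptions made for the example. It shows that the nested layout stores the linear weights under hidden_layer.0.*, the flat layout stores them under hidden_layer.*, and both produce the same output once the weights are copied across.

```python
import torch
from torch import nn


class NestedResidualBlock(nn.Module):
    """Layout before the change: hidden_layer wraps Linear + SiLU."""

    def __init__(self, input_dims: int, hidden_dims: int, output_dims: int):
        super().__init__()
        self.hidden_layer = nn.Sequential(
            nn.Linear(input_dims, hidden_dims),
            nn.SiLU(),
        )
        self.output_layer = nn.Linear(hidden_dims, output_dims)
        self.residual_layer = nn.Linear(input_dims, output_dims)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumed forward pass: projected hidden state plus a linear skip path.
        return self.output_layer(self.hidden_layer(x)) + self.residual_layer(x)


class FlatResidualBlock(nn.Module):
    """Layout after the change: hidden_layer is a plain Linear."""

    def __init__(self, input_dims: int, hidden_dims: int, output_dims: int):
        super().__init__()
        self.hidden_layer = nn.Linear(input_dims, hidden_dims)
        self.activation = nn.SiLU()  # hypothetical attribute name
        self.output_layer = nn.Linear(hidden_dims, output_dims)
        self.residual_layer = nn.Linear(input_dims, output_dims)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hidden = self.activation(self.hidden_layer(x))
        return self.output_layer(hidden) + self.residual_layer(x)


nested = NestedResidualBlock(4, 8, 2)
flat = FlatResidualBlock(4, 8, 2)

# nn.Sequential registers its children under "0", "1", ..., so only the
# hidden_layer prefix differs between the two state dicts.
remapped = {
    key.replace("hidden_layer.0.", "hidden_layer."): value
    for key, value in nested.state_dict().items()
}
flat.load_state_dict(remapped)

x = torch.randn(3, 4)
same_output = torch.allclose(nested(x), flat(x))
nested_keys = sorted(nested.state_dict().keys())
flat_keys = sorted(flat.state_dict().keys())
```

With the flat layout, all three linear layers expose their parameters as name.weight and name.bias, so a checkpoint converter could treat them uniformly instead of special-casing the extra index for hidden_layer.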

google-cla · 2024-08-29T02:00:21Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

TeddyHuang-00 · 2024-08-29T02:07:59Z

A side note: I don't know how, or whether, the PyTorch model will be included in the published package, or whether it will be an optional feature like timesfm[pytorch]. I put the conversion script there only because there seemed to be no better place for it.

rajatsen91 · 2024-08-29T03:42:56Z

Hi @TeddyHuang-00, thanks and nice work. I am OK with merging the changes to the residual block. For converting the weights, we already have a version of that. The reason we have not checked it in is that we are contemplating uploading the PyTorch weights directly to Hugging Face. If you can split the pull request, I can check in the residual block change. For convert_weights, we need to think a little bit more.

TeddyHuang-00 · 2024-08-29T04:03:36Z

Hi @rajatsen91, glad to hear that you already have a working solution. I have updated this PR; please let me know if I can help with the PyTorch version. I am glad to help!

refactor: Unnest nn.Sequential for simpler handling

271aecf

The original hidden_layer in ResidualBlock consists of an nn.Linear and an nn.SiLU wrapped in an nn.Sequential. Separating them does not change the computation, but it makes the layer structure consistent with the other child modules, output_layer and residual_layer.

TeddyHuang-00 force-pushed the feat-convert-model branch from 08eaa76 to 271aecf · August 29, 2024 04:01

TeddyHuang-00 changed the title ~~feat: convert model to PyTorch~~ refactor: Un-nest nn.Sequential in ResidualBlock.hidden_layer Aug 29, 2024
