Update LLM docs #352

Merged
jonatanklosko merged 2 commits into main from jk-docs on Feb 26, 2024
Conversation

jonatanklosko (Member) commented on Feb 26, 2024

I did another iteration of this. Currently, running Llama 7B with params on the GPU requires 16GiB of memory. Keeping params on the CPU + lazy transfers requires 15.12GiB, a saving that is almost negligible, and given that it adds something like a 4x inference latency, I think it's no longer worth mentioning. Sidenote: lazy transfers don't really change anything here, which is what I would expect, since generation loops over the model and therefore all params need to be on the GPU anyway. I'm not sure how not having params on the GPU makes a difference at all, since they can't be garbage collected early either, but the difference is very tiny anyway.
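For context, the two setups being compared look roughly like this (a sketch assuming the guide's usual Bumblebee + EXLA serving; the repository id, sequence length and other options are illustrative rather than the exact snippet from the docs):

```elixir
repo = {:hf, "meta-llama/Llama-2-7b-chat-hf"}

# (a) params loaded straight into GPU memory (default EXLA backend):
# {:ok, model_info} = Bumblebee.load_model(repo, type: :bf16)

# (b) params kept on the CPU; EXLA transfers them to the GPU at run time
{:ok, model_info} =
  Bumblebee.load_model(repo, type: :bf16, backend: {EXLA.Backend, client: :host})

{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
{:ok, generation_config} = Bumblebee.load_generation_config(repo)

serving =
  Bumblebee.Text.generation(model_info, tokenizer, generation_config,
    compile: [batch_size: 1, sequence_length: 1024],
    # lazy_transfers: :always goes with setup (b); with (a) it would be dropped
    defn_options: [compiler: EXLA, lazy_transfers: :always]
  )

Nx.Serving.run(serving, "What is the meaning of life?")
```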

Note that for Stable Diffusion, params on the CPU + lazy transfers have more impact, because it uses several models: once one model finishes, its params can be garbage collected and the next model's params can be loaded lazily, so there it does make sense.
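To illustrate, a rough sketch of that Stable Diffusion setup with params kept on the host (the component paths, filenames and options are assumptions based on a typical Bumblebee Stable Diffusion pipeline, not the exact docs snippet):

```elixir
repo_id = "CompVis/stable-diffusion-v1-4"
host = {EXLA.Backend, client: :host}

{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/clip-vit-large-patch14"})
{:ok, clip} = Bumblebee.load_model({:hf, repo_id, subdir: "text_encoder"}, backend: host)

{:ok, unet} =
  Bumblebee.load_model({:hf, repo_id, subdir: "unet"},
    params_filename: "diffusion_pytorch_model.bin",
    backend: host
  )

{:ok, vae} =
  Bumblebee.load_model({:hf, repo_id, subdir: "vae"},
    architecture: :decoder,
    params_filename: "diffusion_pytorch_model.bin",
    backend: host
  )

{:ok, scheduler} = Bumblebee.load_scheduler({:hf, repo_id, subdir: "scheduler"})

serving =
  Bumblebee.Diffusion.StableDiffusion.text_to_image(clip, unet, vae, tokenizer, scheduler,
    num_steps: 20,
    compile: [batch_size: 1, sequence_length: 60],
    # The models above run one after another, so params of a finished model can
    # be garbage collected while the next model's params are transferred lazily.
    defn_options: [compiler: EXLA, lazy_transfers: :always]
  )

Nx.Serving.run(serving, "numbat in forest, detailed, digital art")
```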

I also added an example with Mistral.

jonatanklosko merged commit 9d84d45 into main on Feb 26, 2024
2 checks passed
jonatanklosko deleted the jk-docs branch on February 26, 2024 at 15:26