Updates for regenerating text #83

MinaAlmasi opened this issue Dec 18, 2024 · 0 comments

I would suggest the following steps for when we want to rerun the text generation with new models:

Do before re-generating

  • Update vLLM and the other necessary packages, so we can also bump the Python version.

    Everything currently runs in the Coder Python 1.87.2 app on UCloud, which ships Python 3.10. There have been 9 updates to the UCloud app since then.

  • Look into whether vLLM has added a "min_tokens" parameter.

    Currently, I compute string lengths and re-generate in a for loop (up to n = 20 times) to avoid generations shorter than the desired token count for each task. This hacky workaround becomes unnecessary if a built-in option now exists.
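For reference, the retry workaround and its built-in replacement could look roughly like this (a sketch, not the repo's actual code: `generate` stands in for a single vLLM call returning one completion string, and word count stands in for token count; recent vLLM releases expose a `min_tokens` field on `SamplingParams`, but check the installed version before relying on it):

```python
def generate_with_retries(generate, prompt, min_len, max_tries=20):
    """Current hacky approach: re-sample until the completion is long enough."""
    completion = ""
    for _ in range(max_tries):
        completion = generate(prompt)
        if len(completion.split()) >= min_len:
            break
    return completion

# With a recent vLLM, the whole loop collapses into a sampling parameter:
#
#   from vllm import SamplingParams
#   params = SamplingParams(min_tokens=min_len, max_tokens=1024)
#   outputs = llm.generate(prompts, params)
```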

  • Consider using full model names instead of the current short-hands.

    E.g., use stabilityai/StableBeluga-7B (or StableBeluga-7B) instead of beluga7b.
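A hypothetical helper for the transition: map the repo's current short-hands to full Hugging Face IDs and accept either form. Only the beluga7b → stabilityai/StableBeluga-7B pair is taken from this issue; any other entries would need to be filled in, and the function name is made up.

```python
# Assumed mapping; extend with the repo's remaining short-hands.
SHORTHAND_TO_HF_ID = {
    "beluga7b": "stabilityai/StableBeluga-7B",
}

def resolve_model_name(name: str) -> str:
    """Return the full HF ID for a known short-hand, or pass a full ID through."""
    return SHORTHAND_TO_HF_ID.get(name, name)
```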

  • Remove model names as prefixes on the completions column.

    Back when I started the project, I somehow thought it was a good idea to prefix the column name with the model (e.g. "beluga7b_completions"), a prefix I ultimately strip in the make_dataset folder to standardise formats across models. The column should just be called completions.
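The standardisation step described above could be sketched like this (a hypothetical function, not the repo's make_dataset code; the "<model>_completions" pattern is from this issue):

```python
import pandas as pd

def standardise_completions_column(df: pd.DataFrame, model_name: str) -> pd.DataFrame:
    """Rename e.g. 'beluga7b_completions' to a plain 'completions' column."""
    prefixed = f"{model_name}_completions"
    if prefixed in df.columns:
        df = df.rename(columns={prefixed: "completions"})
    return df
```

If the prefixes are dropped at generation time instead, this cleanup step disappears entirely.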

After re-generating

  • Remove the HF pipeline.

    At the time of coding, I also added the option of generating via the HF interface. For simplicity, I think we should remove it: it is not needed for the scope of the project (especially if we want to split the repos at some point).

  • Run embeddings with a smaller model.

    I used nvidia/NV-Embed-v2 because it scored highest on MTEB, but it is a heavy model - is it overkill for a baseline? I switched from FP32 to FP16 precision to make it less memory-hungry, and could run it with a batch size of 16 on the new NVIDIA L40 GPUs.
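As a back-of-the-envelope check on the FP32 → FP16 switch (numpy stands in for the actual model tensors here, and the 4096 embedding dimension is an assumption about NV-Embed-v2's output size):

```python
import numpy as np

# One batch of embeddings at the two precisions; FP16 uses 2 bytes per
# value instead of 4, halving the memory footprint.
batch, dim = 16, 4096
fp32_batch = np.zeros((batch, dim), dtype=np.float32)
fp16_batch = fp32_batch.astype(np.float16)
print(fp32_batch.nbytes // fp16_batch.nbytes)  # → 2
```

A genuinely smaller model would of course save far more than the 2x from halving precision, which is the argument for swapping NV-Embed-v2 out for a lighter baseline.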
