This tutorial will guide you through the process of using SLaM to generate responses for your specific use case.
- SLaM: SLaM is a framework for human evaluation of language models for different tasks. It is designed to be flexible and easy to use, and it is built using jaclang.
- Human Evaluation: Human evaluation is the process of evaluating a language model's performance by asking humans to pick the best output from a given set (the identity of the model behind each output is hidden from the evaluators). This is done to understand how well the model performs and to compare different models on a given task.
- Task: The task is the specific problem that the language model is trying to solve. For example, the task could be to generate a summary of a given text, or to generate a response to a given prompt.
- Language Model: A language model is a model trained to generate text. It is trained on a large corpus of text and generates text similar to that corpus.
- Prompt: The prompt is the input to the language model. It is the text that the language model uses to generate the output. For example, the prompt could be a question, and the output could be the answer to the question.
- Response: The response is the output of the language model. It is the text that is generated by the language model based on the prompt.
Follow the steps given in the README to install SLaM and its dependencies.
The first step is to run the Query Engine. The Query Engine is a web server that provides an API for generating responses from a language model. You can run the Query Engine using the following command:
```bash
uvicorn query_engine:serv_action --reload
```
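Once the server starts, you can sanity-check it from another terminal. By default uvicorn binds to `127.0.0.1:8000`; adjust the address if you passed `--host` or `--port`. Any HTTP response, even a 404, confirms the server is up:

```bash
# Assumes uvicorn's default bind address; change it if you passed --host/--port.
curl -I http://127.0.0.1:8000/
```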
NOTICE: If you are using OpenAI's GPT-4, you need to set up the API key. You can do this by setting the `OPENAI_API_KEY` environment variable.
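For example, on Linux or macOS you can export the key in the same shell session before starting the Query Engine (the key below is a placeholder, not a real value):

```bash
# Placeholder key; replace with your own OpenAI API key.
export OPENAI_API_KEY="sk-..."
```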
NOTICE: If you are using Ollama's LLMs, you need to have Ollama installed and the Ollama server running. You can do this by running the following commands:

```bash
curl https://ollama.ai/install.sh | sh
ollama serve
```
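If you plan to evaluate a specific Ollama model, make sure it is available locally first; `ollama pull` downloads it (the model name below is just an example):

```bash
# "llama2" is an example model name; pull whichever model you plan to evaluate.
ollama pull llama2
```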
The next step is to select the settings for generating the responses. This includes selecting the language models to use, the prompt for the task, and the number of responses to generate, among other options.
You can select the settings using the `Generator` tab in the Admin Panel.
- Language Models: The language models to use for generating the responses. You can select the language models from the list of available language models.
- Number of Samples: The number of responses to generate for each language model. (Recommended: 10)
- Temperature: The temperature to use for generating the responses. (Recommended: 0.7)
- Prompt: You can use a prompt template here as well, but make sure to fill in the input prompt values in the `Prompt Inputs Values` section (see the example after this list).
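As a concrete illustration, a summarization task might pair a template containing a placeholder with a value supplied under `Prompt Inputs Values`. The template and variable name below are hypothetical; check the Admin Panel for the exact placeholder syntax SLaM expects:

```text
Prompt template:     Summarize the following text in two sentences: {text}
Prompt input value:  text = <the document you want summarized>
```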
NOTICE: If you want to use a unique identifier for the task, you can set the `Run ID` in the settings. This will help you track the responses generated for the specific task.
Once you have selected the settings, you can generate the responses by clicking the `Generate Responses` button and waiting until the responses for all the models are generated.
INFO: The responses will be saved in the `runs/<run_id>` folder in the root directory of SLaM.
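After a run completes, you can inspect the saved outputs from the command line. The path comes from the Run ID you set (or the one SLaM generates); the exact file layout inside the folder may vary by SLaM version:

```bash
# Replace <run_id> with your Run ID to list the saved responses for that run.
ls runs/<run_id>/
```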
- How to use SLaM for Human Evaluation: A follow-up tutorial that guides you through the process of using SLaM for human evaluation for your specific use case.