TODO: Add a description of the project.
This repo was originally a fork of Hugging Face's BLOOM inference demos, ported to its own repository to allow for more flexibility in the future.
pip install -e .
Currently, the client must be run from a compute node on the tir cluster. If you don't have access to the tir cluster, please contact your advisor to request access.
Run the following commands, where tir-x-xx is the node currently running the lti-llm process.
The first parameter, text, corresponds to the prompt that will be force-decoded by the model. The function returns a list of Output objects, one for every prompt in the input list.
import llm_client
client = llm_client.Client(address="tir-x-xx")  # address of the node running the server
outputs = client.prompt("CMU's PhD students are")
print(outputs[0].text)  # the generated continuation
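Since one Output is returned per prompt, several prompts can be sent in a single call. A minimal sketch, assuming prompt also accepts a list of strings:

import llm_client
client = llm_client.Client(address="tir-x-xx")
outputs = client.prompt([
    "CMU's PhD students are",
    "Pittsburgh is known for",
])
for output in outputs:
    print(output.text)  # one completion per input prompt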
It is also possible to obtain the raw logit scores / output distribution from the model.
import llm_client
client = llm_client.Client(address="tir-x-xx")
outputs = client.prompt("CMU's PhD students are", output_scores=True)
print(outputs[0].scores.shape)
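The scores can be used, for instance, to inspect the model's most likely next tokens. A minimal sketch, assuming scores is a PyTorch tensor of logits with shape (num_generated_tokens, vocab_size); this shape is an assumption, not documented behavior:

import torch
import llm_client
client = llm_client.Client(address="tir-x-xx")
outputs = client.prompt("CMU's PhD students are", output_scores=True)
# Assumption: scores is a (num_generated_tokens, vocab_size) logit tensor.
probs = torch.softmax(outputs[0].scores, dim=-1)
top_probs, top_ids = probs[0].topk(5)  # five most likely first generated tokens
print(top_ids.tolist(), top_probs.tolist())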
Similarly, it is possible to obtain the raw hidden states from the model.
import llm_client
client = llm_client.Client(address="tir-x-xx")
outputs = client.prompt("CMU's PhD students are", output_hidden_states=True)
for i, layer in enumerate(outputs[0].hidden_states):
    print(f"Layer {i}: {layer.shape}")
The remaining arguments largely mirror those of Hugging Face Transformers' model.generate function. However, not all of model.generate's arguments are supported, and better documentation of the ones that are will be provided in the future.
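For example, generate-style decoding arguments can be passed through prompt. A minimal sketch; whether these particular kwargs (max_new_tokens, do_sample, temperature) are forwarded to model.generate by the server is an assumption:

import llm_client
client = llm_client.Client(address="tir-x-xx")
outputs = client.prompt(
    "CMU's PhD students are",
    max_new_tokens=64,   # assumed to be forwarded to model.generate
    do_sample=True,
    temperature=0.7,
)
print(outputs[0].text)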