  • Running a Llama 3.1 llamafile and its prompt style:

    ./llama3.1-8b-instruct.llamafile --temp 0 -ngl 9999 -c 0 -p 'Who is the 45th president?<|end_header_id|>' --silent-prompt 2>/dev/null
    • -ngl 9999 = offload all model layers to the GPU (the value is the number of layers to offload; an oversized number offloads everything)
    • -c 0 = use the model's maximum context size instead of a fixed token budget
    • Note the <|end_header_id|> at the end of the prompt. Without it the model never produces a stop token and the output continues forever. The full chat template is sketched below.
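    For reference, the full Llama 3.1 instruct chat template looks roughly like this (a sketch of Meta's published prompt format; the single-turn prompt above is a shortcut built from the same special tokens):

      <|begin_of_text|><|start_header_id|>user<|end_header_id|>

      Who is the 45th president?<|eot_id|><|start_header_id|>assistant<|end_header_id|>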
  • Copy a prompt file to a server that has a llamafile, run it over SSH, and capture the output locally (wrapped into a single script below):

    scp "./llm_prompt.txt" llmserver:~/
    ssh llmserver "./llama3.1-8b-instruct.llamafile --temp 0.5 -c 0 -ngl 9999 --cli --silent-prompt --file ./llm_prompt.txt" | tee "./llm_output.txt"
    ssh llmserver "rm -v ./llm_prompt.txt"
  • Systemd service file:
    [Unit]
    Description=Run Llamafile in server mode
    After=network.target

    [Service]
    Type=simple
    ExecStart=/run_llamafile.sh
    Restart=always
    Environment="HOME=/root"

    [Install]
    WantedBy=default.target
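    To install the unit (assuming it is saved as /etc/systemd/system/llamafile.service; the filename is an assumption):

      sudo systemctl daemon-reload
      sudo systemctl enable --now llamafile.service
      systemctl status llamafile.service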
  • Bash script called by the service file (per ExecStart above, it must be saved at /run_llamafile.sh and made executable):
    #!/usr/bin/env bash
    # Serve the model over HTTP on all interfaces; same flags as the CLI examples above.
    /home/ubuntu/llama3.1-8b-instruct.llamafile --server --nobrowser -ngl 999 --host 0.0.0.0 -c 0
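    Once the service is up, the server can be queried over HTTP. llamafile's server mode exposes an OpenAI-compatible chat endpoint, by default on port 8080 (host and port here assume the defaults; the "model" value is just a label for the single loaded model):

      curl http://localhost:8080/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{
              "model": "llama3.1-8b-instruct",
              "messages": [{"role": "user", "content": "Who is the 45th president?"}],
              "temperature": 0
            }'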