Quickstart example not working #489

Open · 2 of 4 tasks
jmorenobl opened this issue May 23, 2024 · 3 comments
Comments

jmorenobl commented May 23, 2024

System Info

Pre-built Docker image on g4dn.xlarge with Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.2.0 (Amazon Linux 2)

Information

  • [x] Docker
  • [ ] The CLI directly

Tasks

  • [x] An officially supported command
  • [ ] My own modifications

Reproduction

Execute the example on the home page:
model=mistralai/Mistral-7B-Instruct-v0.1
volume=$PWD/data

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/predibase/lorax:main --model-id $model

Expected behavior

A server starts successfully, serving Mistral.
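Once up, the server could be verified with a request along the lines of the REST example from the same quickstart (a sketch; the prompt text is illustrative):

curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs": "[INST] What is the capital of France? [/INST]", "parameters": {"max_new_tokens": 64}}' \
    -H 'Content-Type: application/json'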

@rusenask

What's the error you're seeing, and the logs?

@jmorenobl (Author)

Just by executing this:

model=mistralai/Mistral-7B-Instruct-v0.1
volume=$PWD/data

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/predibase/lorax:main --model-id $model

I get the following error:

2024-05-27T11:40:55.235184Z  INFO lorax_launcher: Args { model_id: "mistralai/Mistral-7B-Instruct-v0.1", adapter_id: None, source: "hub", default_adapter_source: None, adapter_source: "hub", revision: None, validation_workers: 2, sharded: None, embedding_model: None, num_shard: None, quantize: None, compile: false, speculative_tokens: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_active_adapters: 1024, adapter_cycle_time_s: 2, adapter_memory_fraction: 0.1, hostname: "252cfb445bd6", port: 80, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], cors_allow_header: [], cors_expose_header: [], cors_allow_method: [], cors_allow_credentials: None, watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: false }
2024-05-27T11:40:55.235284Z  INFO download: lorax_launcher: Starting download process.
2024-05-27T11:40:58.738448Z ERROR download: lorax_launcher: Download encountered an error: 
Traceback (most recent call last):

  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 270, in hf_raise_for_status
    response.raise_for_status()

  File "/opt/conda/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)

requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/mistralai/Mistral-7B-Instruct-v0.1


The above exception was the direct cause of the following exception:


Traceback (most recent call last):

  File "/opt/conda/bin/lorax-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 124, in download_weights
    _download_weights(model_id, revision, extension, auto_convert, source, api_token)

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/weights.py", line 447, in download_weights
    model_source.weight_files()

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/hub.py", line 179, in weight_files
    return weight_files(self.model_id, self.revision, extension, self.api_token)

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/hub.py", line 69, in weight_files
    filenames = weight_hub_files(model_id, revision, extension, api_token)

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/hub.py", line 34, in weight_hub_files
    info = api.model_info(model_id, revision=revision)

  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)

  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 1922, in model_info
    hf_raise_for_status(r)

  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
    raise GatedRepoError(message, response) from e

huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-6654714a-7fc4308c45580b7328298ca4;876ae9f0-b94e-4384-a4a9-fd3139261aa7)

Cannot access gated repo for url https://huggingface.co/api/models/mistralai/Mistral-7B-Instruct-v0.1.
Access to model mistralai/Mistral-7B-Instruct-v0.1 is restricted. You must be authenticated to access it.

Error: DownloadError

If I add my token and execute it this way:

model=mistralai/Mistral-7B-Instruct-v0.1
volume=$PWD/data

docker run --gpus all --shm-size 1g -p 8080:80 \
    -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
    -v $volume:/data \
    ghcr.io/predibase/lorax:main --model-id $model

I get the following error:

2024-05-27T11:45:16.081927Z  INFO lorax_launcher: Args { model_id: "mistralai/Mistral-7B-Instruct-v0.1", adapter_id: None, source: "hub", default_adapter_source: None, adapter_source: "hub", revision: None, validation_workers: 2, sharded: None, embedding_model: None, num_shard: None, quantize: None, compile: false, speculative_tokens: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_active_adapters: 1024, adapter_cycle_time_s: 2, adapter_memory_fraction: 0.1, hostname: "8b4a73dd40ee", port: 80, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], cors_allow_header: [], cors_expose_header: [], cors_allow_method: [], cors_allow_credentials: None, watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: false }
2024-05-27T11:45:16.082040Z  INFO download: lorax_launcher: Starting download process.
Error: DownloadError
2024-05-27T11:45:18.784725Z ERROR download: lorax_launcher: Download encountered an error: 
Traceback (most recent call last):

  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 270, in hf_raise_for_status
    response.raise_for_status()

  File "/opt/conda/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/models/mistralai/Mistral-7B-Instruct-v0.1


The above exception was the direct cause of the following exception:


Traceback (most recent call last):

  File "/opt/conda/bin/lorax-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 124, in download_weights
    _download_weights(model_id, revision, extension, auto_convert, source, api_token)

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/weights.py", line 447, in download_weights
    model_source.weight_files()

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/hub.py", line 179, in weight_files
    return weight_files(self.model_id, self.revision, extension, self.api_token)

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/hub.py", line 69, in weight_files
    filenames = weight_hub_files(model_id, revision, extension, api_token)

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/hub.py", line 34, in weight_hub_files
    info = api.model_info(model_id, revision=revision)

  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)

  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 1922, in model_info
    hf_raise_for_status(r)

  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
    raise GatedRepoError(message, response) from e

huggingface_hub.utils._errors.GatedRepoError: 403 Client Error. (Request ID: Root=1-6654724e-68506e57290cd75960e9177c;090b4284-8d6f-441f-916f-2fa3e5ab57c3)

Cannot access gated repo for url https://huggingface.co/api/models/mistralai/Mistral-7B-Instruct-v0.1.
Access to model mistralai/Mistral-7B-Instruct-v0.1 is restricted and you are not in the authorized list. Visit https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1 to ask for access.

That's fine, since I don't have access to that model, but I would suggest using a different model for the quickstart. I tried Phi-3, since it's not a gated model, but that didn't work either:

$ export model=microsoft/Phi-3-small-8k-instruct
$ docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/predibase/lorax:main --model-id $model
2024-05-27T11:52:11.862867Z  INFO lorax_launcher: Args { model_id: "microsoft/Phi-3-small-8k-instruct", adapter_id: None, source: "hub", default_adapter_source: None, adapter_source: "hub", revision: None, validation_workers: 2, sharded: None, embedding_model: None, num_shard: None, quantize: None, compile: false, speculative_tokens: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_active_adapters: 1024, adapter_cycle_time_s: 2, adapter_memory_fraction: 0.1, hostname: "99cd7f793e22", port: 80, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], cors_allow_header: [], cors_expose_header: [], cors_allow_method: [], cors_allow_credentials: None, watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: false }
2024-05-27T11:52:11.862989Z  INFO download: lorax_launcher: Starting download process.
2024-05-27T11:52:14.407194Z  INFO lorax_launcher: hub.py:121 Download file: model-00001-of-00004.safetensors

2024-05-27T11:52:31.144230Z  INFO lorax_launcher: hub.py:130 Downloaded /data/models--microsoft--Phi-3-small-8k-instruct/snapshots/1adb635233ffce9e13385862a4111606d4382762/model-00001-of-00004.safetensors in 0:00:16.

2024-05-27T11:52:31.144324Z  INFO lorax_launcher: hub.py:150 Download: [1/4] -- ETA: 0:00:48

2024-05-27T11:52:31.156145Z  INFO lorax_launcher: hub.py:121 Download file: model-00002-of-00004.safetensors

2024-05-27T11:53:19.520065Z  INFO lorax_launcher: hub.py:130 Downloaded /data/models--microsoft--Phi-3-small-8k-instruct/snapshots/1adb635233ffce9e13385862a4111606d4382762/model-00002-of-00004.safetensors in 0:00:48.

2024-05-27T11:53:19.520245Z  INFO lorax_launcher: hub.py:150 Download: [2/4] -- ETA: 0:01:05

2024-05-27T11:53:19.520607Z  INFO lorax_launcher: hub.py:121 Download file: model-00003-of-00004.safetensors

2024-05-27T11:54:38.927900Z  INFO lorax_launcher: hub.py:130 Downloaded /data/models--microsoft--Phi-3-small-8k-instruct/snapshots/1adb635233ffce9e13385862a4111606d4382762/model-00003-of-00004.safetensors in 0:01:19.

2024-05-27T11:54:38.928020Z  INFO lorax_launcher: hub.py:150 Download: [3/4] -- ETA: 0:00:48

2024-05-27T11:54:38.928307Z  INFO lorax_launcher: hub.py:121 Download file: model-00004-of-00004.safetensors

2024-05-27T11:54:46.533325Z  INFO lorax_launcher: hub.py:130 Downloaded /data/models--microsoft--Phi-3-small-8k-instruct/snapshots/1adb635233ffce9e13385862a4111606d4382762/model-00004-of-00004.safetensors in 0:00:07.

2024-05-27T11:54:46.533440Z  INFO lorax_launcher: hub.py:150 Download: [4/4] -- ETA: 0

2024-05-27T11:54:47.004372Z  INFO download: lorax_launcher: Successfully downloaded weights.
2024-05-27T11:54:47.004674Z  INFO shard-manager: lorax_launcher: Starting shard rank=0
2024-05-27T11:54:52.749530Z ERROR lorax_launcher: server.py:265 Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/bin/lorax-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 84, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 318, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 252, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/__init__.py", line 328, in get_model
    raise ValueError(f"Unsupported model type {model_type}")
ValueError: Unsupported model type phi3small

2024-05-27T11:54:53.511126Z ERROR shard-manager: lorax_launcher: Shard complete standard error output:

Traceback (most recent call last):

  File "/opt/conda/bin/lorax-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 84, in serve
    server.serve(

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 318, in serve
    asyncio.run(

  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 252, in serve_inner
    model = get_model(

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/__init__.py", line 328, in get_model
    raise ValueError(f"Unsupported model type {model_type}")

ValueError: Unsupported model type phi3small
 rank=0
2024-05-27T11:54:53.609061Z ERROR lorax_launcher: Shard 0 failed to start
2024-05-27T11:54:53.609080Z  INFO lorax_launcher: Shutting down shards
Error: ShardCannotStart

Is there any model I could use to start playing around with LoRAX?

@peterschmidt85

@jmorenobl You just need to log in to the Hugging Face Hub, open the model page, accept the EULA, and then run LoRAX, passing your own HUGGING_FACE_HUB_TOKEN env variable.
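For example, something like this (a sketch; the token value is a placeholder for a read-access token from your Hugging Face account settings, and the rest matches the command you already ran):

export HUGGING_FACE_HUB_TOKEN=hf_...   # placeholder; use your own read token
model=mistralai/Mistral-7B-Instruct-v0.1
volume=$PWD/data

docker run --gpus all --shm-size 1g -p 8080:80 \
    -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
    -v $volume:/data \
    ghcr.io/predibase/lorax:main --model-id $model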
