Quickstart example not working #489

Open · 2 of 4 tasks
jmorenobl opened this issue May 23, 2024 · 3 comments
Comments

jmorenobl commented May 23, 2024

System Info

Pre-built Docker image on g4dn.xlarge with Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.2.0 (Amazon Linux 2)

Information

  • [x] Docker
  • [ ] The CLI directly

Tasks

  • [x] An officially supported command
  • [ ] My own modifications

Reproduction

Execute the example on the home page:
model=mistralai/Mistral-7B-Instruct-v0.1
volume=$PWD/data

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/predibase/lorax:main --model-id $model

Expected behavior

A server starts successfully, serving Mistral.
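Once up, the server could be verified with a request along the lines of the REST example from the same quickstart (a sketch; the prompt text is illustrative):

curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs": "[INST] What is the capital of France? [/INST]", "parameters": {"max_new_tokens": 64}}' \
    -H 'Content-Type: application/json'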

@rusenask

What's the error you're seeing, and the logs?

@jmorenobl (Author)

Just by executing this:

model=mistralai/Mistral-7B-Instruct-v0.1
volume=$PWD/data

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/predibase/lorax:main --model-id $model

I get the following error:

2024-05-27T11:40:55.235184Z  INFO lorax_launcher: Args { model_id: "mistralai/Mistral-7B-Instruct-v0.1", adapter_id: None, source: "hub", default_adapter_source: None, adapter_source: "hub", revision: None, validation_workers: 2, sharded: None, embedding_model: None, num_shard: None, quantize: None, compile: false, speculative_tokens: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_active_adapters: 1024, adapter_cycle_time_s: 2, adapter_memory_fraction: 0.1, hostname: "252cfb445bd6", port: 80, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], cors_allow_header: [], cors_expose_header: [], cors_allow_method: [], cors_allow_credentials: None, watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: false }
2024-05-27T11:40:55.235284Z  INFO download: lorax_launcher: Starting download process.
2024-05-27T11:40:58.738448Z ERROR download: lorax_launcher: Download encountered an error: 
Traceback (most recent call last):

  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 270, in hf_raise_for_status
    response.raise_for_status()

  File "/opt/conda/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)

requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/mistralai/Mistral-7B-Instruct-v0.1


The above exception was the direct cause of the following exception:


Traceback (most recent call last):

  File "/opt/conda/bin/lorax-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 124, in download_weights
    _download_weights(model_id, revision, extension, auto_convert, source, api_token)

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/weights.py", line 447, in download_weights
    model_source.weight_files()

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/hub.py", line 179, in weight_files
    return weight_files(self.model_id, self.revision, extension, self.api_token)

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/hub.py", line 69, in weight_files
    filenames = weight_hub_files(model_id, revision, extension, api_token)

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/hub.py", line 34, in weight_hub_files
    info = api.model_info(model_id, revision=revision)

  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)

  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 1922, in model_info
    hf_raise_for_status(r)

  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
    raise GatedRepoError(message, response) from e

huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-6654714a-7fc4308c45580b7328298ca4;876ae9f0-b94e-4384-a4a9-fd3139261aa7)

Cannot access gated repo for url https://huggingface.co/api/models/mistralai/Mistral-7B-Instruct-v0.1.
Access to model mistralai/Mistral-7B-Instruct-v0.1 is restricted. You must be authenticated to access it.

Error: DownloadError

If I add my token and execute it this way:

model=mistralai/Mistral-7B-Instruct-v0.1
volume=$PWD/data

docker run --gpus all --shm-size 1g -p 8080:80 \
    -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
    -v $volume:/data \
    ghcr.io/predibase/lorax:main --model-id $model

I get the following error:

2024-05-27T11:45:16.081927Z  INFO lorax_launcher: Args { model_id: "mistralai/Mistral-7B-Instruct-v0.1", adapter_id: None, source: "hub", default_adapter_source: None, adapter_source: "hub", revision: None, validation_workers: 2, sharded: None, embedding_model: None, num_shard: None, quantize: None, compile: false, speculative_tokens: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_active_adapters: 1024, adapter_cycle_time_s: 2, adapter_memory_fraction: 0.1, hostname: "8b4a73dd40ee", port: 80, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], cors_allow_header: [], cors_expose_header: [], cors_allow_method: [], cors_allow_credentials: None, watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: false }
2024-05-27T11:45:16.082040Z  INFO download: lorax_launcher: Starting download process.
Error: DownloadError
2024-05-27T11:45:18.784725Z ERROR download: lorax_launcher: Download encountered an error: 
Traceback (most recent call last):

  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 270, in hf_raise_for_status
    response.raise_for_status()

  File "/opt/conda/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/models/mistralai/Mistral-7B-Instruct-v0.1


The above exception was the direct cause of the following exception:


Traceback (most recent call last):

  File "/opt/conda/bin/lorax-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 124, in download_weights
    _download_weights(model_id, revision, extension, auto_convert, source, api_token)

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/weights.py", line 447, in download_weights
    model_source.weight_files()

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/hub.py", line 179, in weight_files
    return weight_files(self.model_id, self.revision, extension, self.api_token)

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/hub.py", line 69, in weight_files
    filenames = weight_hub_files(model_id, revision, extension, api_token)

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/hub.py", line 34, in weight_hub_files
    info = api.model_info(model_id, revision=revision)

  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)

  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 1922, in model_info
    hf_raise_for_status(r)

  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
    raise GatedRepoError(message, response) from e

huggingface_hub.utils._errors.GatedRepoError: 403 Client Error. (Request ID: Root=1-6654724e-68506e57290cd75960e9177c;090b4284-8d6f-441f-916f-2fa3e5ab57c3)

Cannot access gated repo for url https://huggingface.co/api/models/mistralai/Mistral-7B-Instruct-v0.1.
Access to model mistralai/Mistral-7B-Instruct-v0.1 is restricted and you are not in the authorized list. Visit https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1 to ask for access.

That's fine, since I don't have access to that model, but I would suggest using a different model for the quickstart. I tried Phi-3, since it's not a gated model, but that didn't work either:

$ export model=microsoft/Phi-3-small-8k-instruct
$ docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/predibase/lorax:main --model-id $model
2024-05-27T11:52:11.862867Z  INFO lorax_launcher: Args { model_id: "microsoft/Phi-3-small-8k-instruct", adapter_id: None, source: "hub", default_adapter_source: None, adapter_source: "hub", revision: None, validation_workers: 2, sharded: None, embedding_model: None, num_shard: None, quantize: None, compile: false, speculative_tokens: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_active_adapters: 1024, adapter_cycle_time_s: 2, adapter_memory_fraction: 0.1, hostname: "99cd7f793e22", port: 80, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], cors_allow_header: [], cors_expose_header: [], cors_allow_method: [], cors_allow_credentials: None, watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: false }
2024-05-27T11:52:11.862989Z  INFO download: lorax_launcher: Starting download process.
2024-05-27T11:52:14.407194Z  INFO lorax_launcher: hub.py:121 Download file: model-00001-of-00004.safetensors

2024-05-27T11:52:31.144230Z  INFO lorax_launcher: hub.py:130 Downloaded /data/models--microsoft--Phi-3-small-8k-instruct/snapshots/1adb635233ffce9e13385862a4111606d4382762/model-00001-of-00004.safetensors in 0:00:16.

2024-05-27T11:52:31.144324Z  INFO lorax_launcher: hub.py:150 Download: [1/4] -- ETA: 0:00:48

2024-05-27T11:52:31.156145Z  INFO lorax_launcher: hub.py:121 Download file: model-00002-of-00004.safetensors

2024-05-27T11:53:19.520065Z  INFO lorax_launcher: hub.py:130 Downloaded /data/models--microsoft--Phi-3-small-8k-instruct/snapshots/1adb635233ffce9e13385862a4111606d4382762/model-00002-of-00004.safetensors in 0:00:48.

2024-05-27T11:53:19.520245Z  INFO lorax_launcher: hub.py:150 Download: [2/4] -- ETA: 0:01:05

2024-05-27T11:53:19.520607Z  INFO lorax_launcher: hub.py:121 Download file: model-00003-of-00004.safetensors

2024-05-27T11:54:38.927900Z  INFO lorax_launcher: hub.py:130 Downloaded /data/models--microsoft--Phi-3-small-8k-instruct/snapshots/1adb635233ffce9e13385862a4111606d4382762/model-00003-of-00004.safetensors in 0:01:19.

2024-05-27T11:54:38.928020Z  INFO lorax_launcher: hub.py:150 Download: [3/4] -- ETA: 0:00:48

2024-05-27T11:54:38.928307Z  INFO lorax_launcher: hub.py:121 Download file: model-00004-of-00004.safetensors

2024-05-27T11:54:46.533325Z  INFO lorax_launcher: hub.py:130 Downloaded /data/models--microsoft--Phi-3-small-8k-instruct/snapshots/1adb635233ffce9e13385862a4111606d4382762/model-00004-of-00004.safetensors in 0:00:07.

2024-05-27T11:54:46.533440Z  INFO lorax_launcher: hub.py:150 Download: [4/4] -- ETA: 0

2024-05-27T11:54:47.004372Z  INFO download: lorax_launcher: Successfully downloaded weights.
2024-05-27T11:54:47.004674Z  INFO shard-manager: lorax_launcher: Starting shard rank=0
2024-05-27T11:54:52.749530Z ERROR lorax_launcher: server.py:265 Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/bin/lorax-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 84, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 318, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 252, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/__init__.py", line 328, in get_model
    raise ValueError(f"Unsupported model type {model_type}")
ValueError: Unsupported model type phi3small

2024-05-27T11:54:53.511126Z ERROR shard-manager: lorax_launcher: Shard complete standard error output:

Traceback (most recent call last):

  File "/opt/conda/bin/lorax-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 84, in serve
    server.serve(

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 318, in serve
    asyncio.run(

  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 252, in serve_inner
    model = get_model(

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/__init__.py", line 328, in get_model
    raise ValueError(f"Unsupported model type {model_type}")

ValueError: Unsupported model type phi3small
 rank=0
2024-05-27T11:54:53.609061Z ERROR lorax_launcher: Shard 0 failed to start
2024-05-27T11:54:53.609080Z  INFO lorax_launcher: Shutting down shards
Error: ShardCannotStart

Is there any model I could use to start playing around with LoRAX?

@peterschmidt85

@jmorenobl You just need to log in to the Hugging Face Hub, open the model page, accept the EULA, and then run LoRAX, passing your own HUGGING_FACE_HUB_TOKEN env variable.
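For example, something like this (a sketch; the token value is a placeholder for a read-access token from your Hugging Face account settings, and the rest matches the command you already ran):

export HUGGING_FACE_HUB_TOKEN=hf_...   # placeholder; use your own read token
model=mistralai/Mistral-7B-Instruct-v0.1
volume=$PWD/data

docker run --gpus all --shm-size 1g -p 8080:80 \
    -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
    -v $volume:/data \
    ghcr.io/predibase/lorax:main --model-id $model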
