
Llama2 70B Benchmark #248

Open
ninano1208 opened this issue Feb 20, 2025 · 3 comments

Comments

@ninano1208

I am unable to run the benchmark due to the following two issues and need help resolving them.
I am using ROCm 6.1, and ROCm itself is installed and working correctly.

/app/mlc/bin/python3 main.py --scenario Offline --dataset-path /root/MLC/repos/local/cache/download-file_a50938f4/open_orca/open_orca_gpt4_tokenized_llama.sampled_24576.pkl.gz --device rocm --total-sample-count 10 --user-conf '/root/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/5083167cafc74fa18c4e30e3eed9e796.conf' --output-log-dir /root/MLC/repos/local/cache/get-mlperf-inference-results-dir_98334fda/test_results/5cfda71d8e53-reference-rocm-pytorch-v2.6.0.dev20241122-default_config/llama2-70b-99/offline/performance/run_1 --dtype float32 --model-path /root/MLC/repos/local/cache/get-ml-model-llama2_adcd8091 2>&1 | tee '/root/MLC/repos/local/cache/get-mlperf-inference-results-dir_98334fda/test_results/5cfda71d8e53-reference-rocm-pytorch-v2.6.0.dev20241122-default_config/llama2-70b-99/offline/performance/run_1/console.out'; echo ${PIPESTATUS[0]} > exitstatus
usage: main.py [-h] [--scenario {Offline,Server}] [--model-path MODEL_PATH]
[--dataset-path DATASET_PATH] [--accuracy] [--dtype DTYPE]
[--device {cpu,cuda:0}] [--audit-conf AUDIT_CONF]
[--user-conf USER_CONF]
[--total-sample-count TOTAL_SAMPLE_COUNT]
[--batch-size BATCH_SIZE] [--output-log-dir OUTPUT_LOG_DIR]
[--enable-log-trace] [--num-workers NUM_WORKERS] [--vllm]
[--api-model-name API_MODEL_NAME] [--api-server API_SERVER]
[--lg-model-name {llama2-70b,llama2-70b-interactive}]
main.py: error: argument --device: invalid choice: 'rocm' (choose from cpu, cuda:0)
Traceback (most recent call last):
File "/app/mlc/bin/mlcr", line 8, in
sys.exit(mlcr())
^^^^^^
File "/app/mlc/lib/python3.12/site-packages/mlc/main.py", line 1715, in mlcr
main()
File "/app/mlc/lib/python3.12/site-packages/mlc/main.py", line 1797, in main
res = method(run_args)
^^^^^^^^^^^^^^^^
File "/app/mlc/lib/python3.12/site-packages/mlc/main.py", line 1529, in run
return self.call_script_module_function("run", run_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/mlc/lib/python3.12/site-packages/mlc/main.py", line 1509, in call_script_module_function
result = automation_instance.run(run_args) # Pass args to the run method
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 225, in run
r = self._run(i)
^^^^^^^^^^^^
File "/root/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 1772, in _run
r = customize_code.preprocess(ii)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/MLC/repos/mlcommons@mlperf-automations/script/run-mlperf-inference-app/customize.py", line 284, in preprocess
r = mlc.access(ii)
^^^^^^^^^^^^^^
File "/app/mlc/lib/python3.12/site-packages/mlc/main.py", line 92, in access
result = method(options)
^^^^^^^^^^^^^^^
File "/app/mlc/lib/python3.12/site-packages/mlc/main.py", line 1529, in run
return self.call_script_module_function("run", run_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/mlc/lib/python3.12/site-packages/mlc/main.py", line 1509, in call_script_module_function
result = automation_instance.run(run_args) # Pass args to the run method
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 225, in run
r = self._run(i)
^^^^^^^^^^^^
File "/root/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 1842, in _run
r = self._call_run_deps(prehook_deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3532, in _call_run_deps
r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3702, in _run_deps
r = self.action_object.access(ii)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/mlc/lib/python3.12/site-packages/mlc/main.py", line 92, in access
result = method(options)
^^^^^^^^^^^^^^^
File "/app/mlc/lib/python3.12/site-packages/mlc/main.py", line 1529, in run
return self.call_script_module_function("run", run_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/mlc/lib/python3.12/site-packages/mlc/main.py", line 1509, in call_script_module_function
result = automation_instance.run(run_args) # Pass args to the run method
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 225, in run
r = self._run(i)
^^^^^^^^^^^^
File "/root/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 1858, in _run
r = prepare_and_run_script_with_postprocessing(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 5488, in prepare_and_run_script_with_postprocessing
r = script_automation._call_run_deps(posthook_deps, local_env_keys, local_env_keys_from_meta, env, state, const, const_state,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3532, in _call_run_deps
r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3702, in _run_deps
r = self.action_object.access(ii)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/mlc/lib/python3.12/site-packages/mlc/main.py", line 92, in access
result = method(options)
^^^^^^^^^^^^^^^
File "/app/mlc/lib/python3.12/site-packages/mlc/main.py", line 1529, in run
return self.call_script_module_function("run", run_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/mlc/lib/python3.12/site-packages/mlc/main.py", line 1509, in call_script_module_function
result = automation_instance.run(run_args) # Pass args to the run method
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 225, in run
r = self._run(i)
^^^^^^^^^^^^
File "/root/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 1885, in _run
r = self._run_deps(post_deps, clean_env_keys_post_deps, env, state, const, const_state, add_deps_recursive, recursion_spaces,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3702, in _run_deps
r = self.action_object.access(ii)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/mlc/lib/python3.12/site-packages/mlc/main.py", line 92, in access
result = method(options)
^^^^^^^^^^^^^^^
File "/app/mlc/lib/python3.12/site-packages/mlc/main.py", line 1529, in run
return self.call_script_module_function("run", run_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/mlc/lib/python3.12/site-packages/mlc/main.py", line 1519, in call_script_module_function
raise ScriptExecutionError(f"Script {function_name} execution failed. Error : {error}")
mlc.main.ScriptExecutionError: Script run execution failed. Error : MLC script failed (name = benchmark-program, return code = 512)

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Please file an issue at https://github.com/mlcommons/mlperf-automations/issues along with the full MLC command being run and the relevant
or full console log.
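The failure above comes from argparse's `choices` validation: `main.py` only allows `cpu` and `cuda:0` for `--device`, so `rocm` is rejected before the benchmark even starts. A minimal, hypothetical sketch of that behavior (mirroring the usage text, not the actual `main.py` source):

```python
import argparse

# Hypothetical reproduction of the failure: "rocm" is rejected because
# it is not in the --device choices list shown in the usage message.
parser = argparse.ArgumentParser()
parser.add_argument("--device", choices=["cpu", "cuda:0"], default="cpu")

try:
    parser.parse_args(["--device", "rocm"])
except SystemExit:
    # argparse prints "invalid choice: 'rocm'" to stderr and exits
    print("rejected: rocm")

# One way a harness could accept ROCm is to extend the choices list:
parser = argparse.ArgumentParser()
parser.add_argument("--device", choices=["cpu", "cuda:0", "rocm"], default="cpu")
print(parser.parse_args(["--device", "rocm"]).device)  # rocm
```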

@anandhu-eng
Contributor

Hi @ninano1208 , I suppose you have followed this documentation. Could you replace --device=rocm with --device=cuda in your run command? It should pick up the GPU for execution even though the ROCm library is installed.
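The suggestion works because ROCm builds of PyTorch expose the HIP backend through the `torch.cuda` namespace, so `"cuda:0"` can address an AMD GPU. A minimal sketch of that selection logic (`pick_device` is a hypothetical helper; `torch` is imported only if present):

```python
# ROCm builds of PyTorch expose the HIP backend through the torch.cuda
# namespace, so "cuda:0" can address an AMD GPU as well as an NVIDIA one.
try:
    import torch
except ImportError:
    torch = None  # allow this sketch to run without PyTorch installed

def pick_device() -> str:
    """Return "cuda:0" when a CUDA or ROCm/HIP GPU is visible, else "cpu"."""
    if torch is not None and torch.cuda.is_available():
        return "cuda:0"
    return "cpu"

print(pick_device())
```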

@anandhu-eng
Contributor

Apologies for the previous reply; that would install the CUDA dependency on the system instead of ROCm. A PR containing the fix has been raised here. There will be no modification to the run command.

@anandhu-eng
Contributor

@ninano1208 could you please try again? The changes should now be reflected.
