[jump-ci] Allow testing multiple presets #646

Merged on Jan 23, 2025 (4 commits)

Conversation

kpouget (Contributor) commented Jan 21, 2025

No description provided.

openshift-ci bot commented Jan 21, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from kpouget. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kpouget kpouget force-pushed the metal branch 2 times, most recently from 09113f6 to ac2975e Compare January 21, 2025 11:20

kpouget commented Jan 21, 2025

/test jump-ci icelake fine_tuning
/var multi_run.args: [[fms, gating], ilab, ray]
/var matbench.lts.opensearch.export.enabled: false
/var tests.fine_tuning.test_settings.gpu: 0
/only test_ci
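
The comment above combines three kinds of directives: `/test` (positional arguments), `/var` (a configuration override) and `/only` (the step to run). As a rough illustration of how such a comment could be split up, here is a minimal sketch. The function name `parse_pr_comment` and the returned dictionary layout are assumptions made for this example, not the bot's actual implementation, and values are kept as raw strings rather than parsed as YAML.

```python
def parse_pr_comment(body):
    """Split a PR comment into its /test, /var and /only directives.

    Hypothetical helper: the real bot's parsing rules are not shown in this thread.
    """
    parsed = {"test": [], "vars": {}, "only": []}
    for line in body.splitlines():
        line = line.strip()
        if line.startswith("/test "):
            parsed["test"] = line.split()[1:]
        elif line.startswith("/var "):
            # Split on the first ': ' only, so values may contain brackets or colons.
            key, _, value = line[len("/var "):].partition(": ")
            parsed["vars"][key.strip()] = value.strip()
        elif line.startswith("/only "):
            parsed["only"] = line.split()[1:]
    return parsed


comment = """\
/test jump-ci icelake fine_tuning
/var multi_run.args: [[fms, gating], ilab, ray]
/var matbench.lts.opensearch.export.enabled: false
/var tests.fine_tuning.test_settings.gpu: 0
/only test_ci
"""
directives = parse_pr_comment(comment)
print(directives["test"])
print(directives["vars"]["multi_run.args"])
```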

kpouget commented Jan 21, 2025

/test jump-ci icelake fine_tuning
/var multi_run.args: [[fms, gating], ilab, ray]
/var matbench.lts.opensearch.export.enabled: false
/var tests.fine_tuning.test_settings.gpu: 0
/only test_ci

kpouget commented Jan 21, 2025

/test jump-ci icelake fine_tuning
/var multi_run.args: [[fms, gating], ilab, ray]
/var matbench.lts.opensearch.export.enabled: false
/var tests.fine_tuning.test_settings.gpu: 0
/only test_ci

kpouget commented Jan 21, 2025

/test jump-ci icelake fine_tuning
/var multi_run.args: [[fms, gating], ilab, ray]
/var matbench.lts.opensearch.export.enabled: false
/var tests.fine_tuning.test_settings.gpu: 0
/only test_ci

topsail-bot bot commented Jan 21, 2025

🟢 Test of 'fine_tuning test test_ci' succeeded after 00 hours 05 minutes 51 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_2: fms
PR_POSITIONAL_ARG_3: gating
matbench.lts.opensearch.export.enabled: false
tests.fine_tuning.test_settings.gpu: 0
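
The `multi_run.args` value `[[fms, gating], ilab, ray]` mixes a nested list with bare names, and the bot posts one report per entry. A minimal sketch of one way to normalize such a value into one preset list per run follows. The name `expand_multi_run` is hypothetical and the logic is an assumption. In particular, the bot reports in this thread also list `gating` for the `ilab` and `ray` runs, which this sketch does not try to model.

```python
def expand_multi_run(entries):
    """Return one list of preset names per run (hypothetical normalization)."""
    runs = []
    for entry in entries:
        # A bare string is treated as a run with a single preset.
        if isinstance(entry, (list, tuple)):
            runs.append(list(entry))
        else:
            runs.append([entry])
    return runs


runs = expand_multi_run([["fms", "gating"], "ilab", "ray"])
for index, presets in enumerate(runs):
    print(f"run {index}: presets={presets}")
```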

topsail-bot bot commented Jan 21, 2025

🔴 Test of 'fine_tuning test test_ci' failed after 00 hours 19 minutes 37 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_2: ilab
PR_POSITIONAL_ARG_3: gating
matbench.lts.opensearch.export.enabled: false
tests.fine_tuning.test_settings.gpu: 0

Failure indicator:

/logs/artifacts/004__ilab_fine_tuning/000__fine_tuning__run_fine_tuning_job/FAILURE | [000__fine_tuning__run_fine_tuning_job] ./run_toolbox.py from_config fine_tuning run_fine_tuning_job --extra={'name': 'ilab', 'pod_count': 1, 'model_name': 'granite-3.0-8b-instruct', 'dataset_name': 'ilab_skills_data.jsonl', 'gpu': 0, 'shared_memory': 20} --> 2
/logs/artifacts/004__ilab_fine_tuning/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/004__ilab_fine_tuning" ./run_toolbox.py from_config fine_tuning run_fine_tuning_job --extra="{'name': 'ilab', 'pod_count': 1, 'model_name': 'granite-3.0-8b-instruct', 'dataset_name': 'ilab_skills_data.jsonl', 'gpu': 0, 'shared_memory': 20}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 151, in _run_test
    run.run_toolbox_from_config("fine_tuning", "run_fine_tuning_job",
  File "/opt/topsail/src/projects/core/library/run.py", line 65, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 121, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]
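
The failure above comes from a command prefixed with `set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;` and surfaces as a `CalledProcessError` raised from `subprocess.run`. The following is a minimal sketch of a wrapper with that behavior. It assumes `shell=True`, `check=True` and `/bin/bash` as the shell; the actual keyword arguments used by `run.py` are not visible in the truncated traceback.

```python
import subprocess

# Same strict-mode prefix as the one visible in the failure message.
STRICT_PREFIX = "set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;"


def run(command, **kwargs):
    """Run `command` under bash strict mode and raise on a non-zero exit status."""
    return subprocess.run(
        STRICT_PREFIX + command,
        shell=True,
        executable="/bin/bash",  # pipefail and errtrace are bash options
        check=True,              # turns a non-zero exit status into CalledProcessError
        **kwargs,
    )


try:
    run("exit 2")
except subprocess.CalledProcessError as exc:
    print(f"command failed with exit status {exc.returncode}")
```

With `check=True`, the exit status of the shell command is carried by the exception's `returncode` attribute, which matches the "returned non-zero exit status 2" wording in the report.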

topsail-bot bot commented Jan 21, 2025

🔴 Test of 'fine_tuning test test_ci' failed after 00 hours 00 minutes 52 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_2: ray
PR_POSITIONAL_ARG_3: gating
matbench.lts.opensearch.export.enabled: false
tests.fine_tuning.test_settings.gpu: 0

Failure indicator:

/logs/artifacts/003__ray__ray-benchmark/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'ray', 'pod_count': 1, 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 0, 'dataset_replication': 1} --> 2
/logs/artifacts/003__ray__ray-benchmark/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/003__ray__ray-benchmark" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'ray', 'pod_count': 1, 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 0, 'dataset_replication': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 154, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 65, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 121, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]
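
The traceback refers to a helper named `_dict_to_run_toolbox_args` whose body is not shown, and the failing command contains `--extra="{...}"` with what looks like the `repr` of a Python dictionary. A minimal sketch of a conversion that would produce that shape is given below. The function `dict_to_cli_args` is a hypothetical stand-in, not the project's code, and it does no shell escaping.

```python
def dict_to_cli_args(args):
    """Render {'extra': {...}} as --extra="{...}" style flags (illustrative only)."""
    # str() of a dict gives the single-quoted form seen in the failure message.
    return " ".join(f'--{key}="{value}"' for key, value in args.items())


extra = {"name": "ray", "pod_count": 2, "gpu": 0}
print(dict_to_cli_args({"extra": extra}))
```

Values that themselves contain double quotes would break this naive quoting; `shlex.quote` from the standard library is the usual way to make such a string safe for a shell.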

kpouget commented Jan 21, 2025

/test jump-ci icelake fine_tuning fms
/only test_ci

topsail-bot bot commented Jan 21, 2025

🔴 Test of 'fine_tuning test test_ci' failed after 00 hours 05 minutes 23 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_1: fms

Failure indicator:

/logs/artifacts/004__fms_fine_tuning/000__fine_tuning__run_fine_tuning_job/FAILURE | [000__fine_tuning__run_fine_tuning_job] ./run_toolbox.py from_config fine_tuning run_fine_tuning_job --extra={'name': 'fine-tuning', 'pod_count': 1, 'model_name': 'bloom-560m', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1, 'dataset_response_template': '\n### Label:'} --> 2
/logs/artifacts/004__fms_fine_tuning/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/004__fms_fine_tuning" ./run_toolbox.py from_config fine_tuning run_fine_tuning_job --extra="{'name': 'fine-tuning', 'pod_count': 1, 'model_name': 'bloom-560m', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1, 'dataset_response_template': '\n### Label:'}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 151, in _run_test
    run.run_toolbox_from_config("fine_tuning", "run_fine_tuning_job",
  File "/opt/topsail/src/projects/core/library/run.py", line 65, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 121, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

kpouget commented Jan 21, 2025

/test jump-ci icelake fine_tuning fms
/only test_ci
/var tests.fine_tuning.test_settings.gpu: 0

topsail-bot bot commented Jan 21, 2025

🟢 Test of 'fine_tuning test test_ci' succeeded after 00 hours 05 minutes 20 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_1: fms
tests.fine_tuning.test_settings.gpu: 0
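
The only difference between the failing `fms` run and this passing one is the `tests.fine_tuning.test_settings.gpu: 0` override (the failing command was launched with `'gpu': 1`). A dotted key like this is commonly applied to a nested configuration as sketched below. The `set_dotted` helper and the shape of `config` are assumptions for illustration only.

```python
def set_dotted(config, dotted_key, value):
    """Set config['a']['b']['c'] = value for the dotted key 'a.b.c'."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        # Create intermediate dictionaries when they are missing.
        node = node.setdefault(part, {})
    node[leaf] = value


# Hypothetical configuration fragment; gpu=1 mirrors the value seen in the failing run.
config = {"tests": {"fine_tuning": {"test_settings": {"gpu": 1}}}}
set_dotted(config, "tests.fine_tuning.test_settings.gpu", 0)
print(config["tests"]["fine_tuning"]["test_settings"])
```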

kpouget commented Jan 21, 2025

/test jump-ci icelake fine_tuning
/var multi_run.args: [[fms, gating], ray_bench__iperf]
/only test_ci
/var tests.fine_tuning.test_settings.gpu: 0
/var matbench.lts.opensearch.export.enabled: false

topsail-bot bot commented Jan 21, 2025

🟢 Test of 'fine_tuning test test_ci' succeeded after 00 hours 05 minutes 48 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_2: fms
PR_POSITIONAL_ARG_3: gating
matbench.lts.opensearch.export.enabled: false
tests.fine_tuning.test_settings.gpu: 0

topsail-bot bot commented Jan 21, 2025

🔴 Test of 'fine_tuning test test_ci' failed after 00 hours 04 minutes 15 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_2: ray_bench__iperf
PR_POSITIONAL_ARG_3: gating
matbench.lts.opensearch.export.enabled: false
tests.fine_tuning.test_settings.gpu: 0

Failure indicator:

/logs/artifacts/003__ray__ray-benchmark/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'ray', 'pod_count': 2, 'gpu': 0, 'node_selector_key': 'nvidia.com/gpu.present', 'node_selector_value': 'true', 'hyper_parameters': {'flavor': 'iperf'}} --> 2
/logs/artifacts/003__ray__ray-benchmark/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/003__ray__ray-benchmark" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'ray', 'pod_count': 2, 'gpu': 0, 'node_selector_key': 'nvidia.com/gpu.present', 'node_selector_value': 'true', 'hyper_parameters': {'flavor': 'iperf'}}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 154, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 65, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 121, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

kpouget commented Jan 22, 2025

/test jump-ci icelake fine_tuning
/var multi_run.args: [[fms, gating], ray_bench__iperf]
/only test_ci
/var tests.fine_tuning.test_settings.gpu: 0
/var matbench.lts.opensearch.export.enabled: false
/var tests.fine_tuning.test_settings.node_selector_key: beta.kubernetes.io/os
/var tests.fine_tuning.test_settings.node_selector_value: linux

topsail-bot bot commented Jan 22, 2025

🔴 Test of 'fine_tuning test test_ci' failed after 00 hours 03 minutes 09 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_2: fms
PR_POSITIONAL_ARG_3: gating
matbench.lts.opensearch.export.enabled: false
tests.fine_tuning.test_settings.gpu: 0
tests.fine_tuning.test_settings.node_selector_key: beta.kubernetes.io/os
tests.fine_tuning.test_settings.node_selector_value: linux

Failure indicator:

/logs/artifacts/004__fms_fine_tuning/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/004__fms_fine_tuning" ./run_toolbox.py from_config fine_tuning run_fine_tuning_job --extra="{'name': 'fine-tuning', 'pod_count': 1, 'model_name': 'bloom-560m', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 0, 'dataset_replication': 1, 'node_selector_key': 'beta.kubernetes.io/os', 'node_selector_value': 'linux', 'dataset_response_template': '\n### Label:'}"' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 151, in _run_test
    run.run_toolbox_from_config("fine_tuning", "run_fine_tuning_job",
  File "/opt/topsail/src/projects/core/library/run.py", line 65, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 121, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,

[...]

topsail-bot bot commented Jan 22, 2025

🔴 Test of 'fine_tuning test test_ci' failed after 00 hours 04 minutes 13 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_2: ray_bench__iperf
PR_POSITIONAL_ARG_3: gating
matbench.lts.opensearch.export.enabled: false
tests.fine_tuning.test_settings.gpu: 0
tests.fine_tuning.test_settings.node_selector_key: beta.kubernetes.io/os
tests.fine_tuning.test_settings.node_selector_value: linux

Failure indicator:

/logs/artifacts/003__ray__ray-benchmark/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'ray', 'pod_count': 2, 'gpu': 0, 'node_selector_key': 'beta.kubernetes.io/os', 'node_selector_value': 'linux', 'hyper_parameters': {'flavor': 'iperf'}} --> 2
/logs/artifacts/003__ray__ray-benchmark/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/003__ray__ray-benchmark" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'ray', 'pod_count': 2, 'gpu': 0, 'node_selector_key': 'beta.kubernetes.io/os', 'node_selector_value': 'linux', 'hyper_parameters': {'flavor': 'iperf'}}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 154, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 65, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 121, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

kpouget commented Jan 22, 2025

/test jump-ci icelake fine_tuning
/var multi_run.args: [[fms, gating], [fms]]
/only test_ci
/var tests.fine_tuning.test_settings.gpu: 0
/var matbench.lts.opensearch.export.enabled: false

topsail-bot bot commented Jan 22, 2025

🟢 Test of 'fine_tuning test test_ci' succeeded after 00 hours 05 minutes 49 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_2: fms
PR_POSITIONAL_ARG_3: gating
matbench.lts.opensearch.export.enabled: false
tests.fine_tuning.test_settings.gpu: 0

topsail-bot bot commented Jan 22, 2025

🟢 Test of 'fine_tuning test test_ci' succeeded after 00 hours 05 minutes 49 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_2: fms
PR_POSITIONAL_ARG_3: gating
matbench.lts.opensearch.export.enabled: false
tests.fine_tuning.test_settings.gpu: 0

kpouget commented Jan 22, 2025

/test jump-ci icelake skeleton
/var multi_run.args: [single, single]
/only prepare_ci

kpouget commented Jan 22, 2025

/test jump-ci icelake skeleton
/var multi_run.args: [single, single]
/only test_ci

topsail-bot bot commented Jan 22, 2025

🟢 Test of 'skeleton test test_ci' succeeded after 00 hours 05 minutes 55 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_2: single

topsail-bot bot commented Jan 22, 2025

🟢 Test of 'skeleton test test_ci' succeeded after 00 hours 05 minutes 55 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_2: single

kpouget commented Jan 23, 2025

/test jump-ci icelake skeleton
/var multi_run.args: [single, single]
/only test_ci
/var tests.sleep_duration: 5 # minutes

topsail-bot bot commented Jan 23, 2025

🟢 Test of 'skeleton test test_ci' succeeded after 00 hours 05 minutes 56 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_2: single
tests.sleep_duration: 5
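
The command was given as `/var tests.sleep_duration: 5 # minutes`, while the report lists `tests.sleep_duration: 5`, so the trailing comment is dropped somewhere along the way. A simplistic sketch of that step is shown below. `parse_var` is a hypothetical helper, and a real YAML parser would handle more cases (for instance a `#` inside a quoted value).

```python
def parse_var(text):
    """Parse 'key: value  # comment' into (key, value), dropping the comment."""
    text = text.split(" #", 1)[0]      # naive inline-comment removal
    key, _, raw = text.partition(": ")
    raw = raw.strip()
    try:
        value = int(raw)               # keep integers as numbers
    except ValueError:
        value = raw                    # anything else stays a string
    return key.strip(), value


print(parse_var("tests.sleep_duration: 5 # minutes"))
```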

topsail-bot bot commented Jan 23, 2025

🟢 Test of 'skeleton test test_ci' succeeded after 00 hours 05 minutes 53 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_2: single
tests.sleep_duration: 5

kpouget commented Jan 23, 2025

/test jump-ci icelake skeleton
/var multi_run.args: [single, single]
/only test_ci
/var tests.sleep_duration: 1 # minutes

topsail-bot bot commented Jan 23, 2025

🟢 Test of 'skeleton test test_ci' succeeded after 00 hours 01 minutes 50 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_2: single
tests.sleep_duration: 1

topsail-bot bot commented Jan 23, 2025

🟢 Test of 'skeleton test test_ci' succeeded after 00 hours 01 minutes 48 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_2: single
tests.sleep_duration: 1

kpouget commented Jan 23, 2025

/test jump-ci icelake skeleton
/var multi_run.args: [single, single]
/only test_ci
/var tests.sleep_duration: 1 # minutes

topsail-bot bot commented Jan 23, 2025

🟢 Test of 'skeleton test test_ci' succeeded after 00 hours 01 minutes 54 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_2: single
tests.sleep_duration: 1

topsail-bot bot commented Jan 23, 2025

🟢 Test of 'skeleton test test_ci' succeeded after 00 hours 01 minutes 48 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_2: single
tests.sleep_duration: 1

kpouget commented Jan 23, 2025

/test jump-ci icelake skeleton
/only test_ci
/var tests.sleep_duration: 1 # minutes

topsail-bot bot commented Jan 23, 2025

🟢 Test of 'skeleton test test_ci' succeeded after 00 hours 01 minutes 48 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
tests.sleep_duration: 1

kpouget commented Jan 23, 2025

Tests passed ❤️, merging.

@kpouget kpouget enabled auto-merge January 23, 2025 17:11
@kpouget kpouget merged commit edd0221 into openshift-psap:main Jan 23, 2025
6 checks passed
@kpouget kpouget deleted the metal branch January 23, 2025 17:12