@AdnanQureshi3
Contributor

#470
Solved the issue: Implement a New Workflow
feat: add SudokuWorkflow example implementation and register in default_mapping

Checklist

Added sudoku_workflow.py with full docstrings and inline comments

  • Implemented simple single-turn Sudoku solving workflow
  • Added reward calculation based on exact match
  • Registered workflow in trinity.common.workflows.__init__ in alphabetical order
  • Ensured code follows style and passes pre-commit checks

If more custom workflows need to be implemented, please let me know.

@pan-x-c
Collaborator

pan-x-c commented Jan 19, 2026

Sorry for the late reply. The current workflow code structure basically meets the requirements, but there is still some room for improvement in the details. For example:

  1. Currently, self.board is directly represented as an array, rather than using a string format similar to the frozen lake render for the prompt. This may affect the model's understanding.
  2. The generate function can be further improved. Although the difficulty (number of empty cells) can be adjusted, all questions are generated based on a single standard answer, which may lead to overfitting.
  3. The current design fills only one cell per step. While this is simple, considering that Sudoku has many empty cells, it may result in too many interaction rounds, overly long context, and increased training costs. You might consider allowing the model to fill in multiple numbers at a time. Of course, this is just a personal suggestion, and the actual effect needs to be verified in practice.
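The three suggestions above can be sketched in a few lines of Python. This is a minimal illustration, not the code from this PR: the names `random_solution`, `make_puzzle`, `render_board`, and `apply_moves`, the `.` placeholder for empty cells, and the `(row, col, value)` reply format are all assumptions made for the example. The generator shuffles bands, lines within bands, and digit labels of a base pattern so that puzzles do not all derive from one answer; it does not check that a puzzle has a unique solution.

```python
import random
import re


def random_solution(box=3, rng=random):
    """Return a random solved grid of side box*box (box=2 gives 4x4)."""
    n = box * box

    def pattern(r, c):
        # Base pattern that is a valid solved grid for any box size.
        return (box * (r % box) + r // box + c) % n

    def shuffled_lines():
        # Shuffle bands, then lines inside each band; validity is preserved.
        bands = rng.sample(range(box), box)
        return [b * box + i for b in bands for i in rng.sample(range(box), box)]

    rows, cols = shuffled_lines(), shuffled_lines()
    digits = rng.sample(range(1, n + 1), n)  # random relabelling of digits
    return [[digits[pattern(r, c)] for c in cols] for r in rows]


def make_puzzle(solution, holes, rng=random):
    """Copy the solution and blank out `holes` cells (0 marks an empty cell)."""
    n = len(solution)
    puzzle = [row[:] for row in solution]
    for idx in rng.sample(range(n * n), holes):
        puzzle[idx // n][idx % n] = 0
    return puzzle


def render_board(board, box=3):
    """Render the board as text, with '.' for empty cells and box separators."""
    rows = [
        " | ".join(
            " ".join(str(v) if v else "." for v in row[i:i + box])
            for i in range(0, len(row), box)
        )
        for row in board
    ]
    out = []
    for r, text in enumerate(rows):
        if r and r % box == 0:
            out.append("-" * len(text))  # horizontal line between bands
        out.append(text)
    return "\n".join(out)


# Hypothetical reply format: one or more "(row, col, value)" triples, 1-based.
MOVE_RE = re.compile(r"\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)")


def apply_moves(board, reply):
    """Fill every in-range move that targets an empty cell; return the count."""
    n = len(board)
    applied = 0
    for r, c, v in (map(int, m) for m in MOVE_RE.findall(reply)):
        if 1 <= r <= n and 1 <= c <= n and 1 <= v <= n and board[r - 1][c - 1] == 0:
            board[r - 1][c - 1] = v
            applied += 1
    return applied
```

Accepting several triples per reply reduces the number of interaction rounds; whether that helps training in practice would still need to be verified, as noted above.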

If resources permit, I recommend running the workflow locally in debug mode to observe:

  1. Whether the workflow can complete the game without errors (regardless of correctness).
  2. Whether it can solve the Sudoku correctly with a certain probability (if all answers are wrong, RL training cannot proceed).
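To illustrate the second point: under a group-relative advantage (an assumption here; the thread does not say which algorithm will be used), a group of rollouts that all receive the same reward produces zero advantage for every rollout, so there is nothing to learn from. The function name `group_advantages` and the `eps` value are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same puzzle."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# All rollouts failed: every advantage is 0.0, so there is no learning signal.
print(group_advantages([0.0, 0.0, 0.0, 0.0]))

# One rollout solved the puzzle: it gets a positive advantage, the rest negative.
print(group_advantages([0.0, 0.0, 1.0, 0.0]))
```

This is why a smaller board or fewer empty cells may be needed at first, so that at least some rollouts succeed.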

Additionally, since this example is relatively complex, I suggest converting some samples into unit tests to ensure the correctness of each module.
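As a rough sketch of what such unit tests could look like, the example below defines a small judge and a few pytest-style checks on a 4x4 board. The function `is_consistent` and the test names are hypothetical and are not the code or tests from this PR; the assumed convention is that 0 marks an empty cell.

```python
def is_consistent(board, box=3):
    """True if no row, column, or box repeats a non-zero digit.

    Empty cells (0) are ignored, so an incomplete board can still pass.
    """
    n = box * box
    units = [[(r, c) for c in range(n)] for r in range(n)]   # rows
    units += [[(r, c) for r in range(n)] for c in range(n)]  # columns
    units += [                                               # boxes
        [(br + dr, bc + dc) for dr in range(box) for dc in range(box)]
        for br in range(0, n, box)
        for bc in range(0, n, box)
    ]
    for unit in units:
        seen = [board[r][c] for r, c in unit if board[r][c] != 0]
        if len(seen) != len(set(seen)):
            return False
    return True


def empty_4x4():
    return [[0] * 4 for _ in range(4)]


def test_incomplete_board_is_allowed():
    board = empty_4x4()
    board[0][0] = 1
    assert is_consistent(board, box=2)


def test_row_violation_is_detected():
    board = empty_4x4()
    board[0][0] = board[0][3] = 1  # same row, different boxes
    assert not is_consistent(board, box=2)


def test_column_violation_is_detected():
    board = empty_4x4()
    board[0][0] = board[2][0] = 1  # same column, different boxes
    assert not is_consistent(board, box=2)


def test_box_violation_is_detected():
    board = empty_4x4()
    board[0][0] = board[1][1] = 1  # same box, different row and column
    assert not is_consistent(board, box=2)
```

Each violation test places the duplicate so that only one of the three rules is broken, which makes a failure point at the specific check that regressed.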

@pan-x-c
Collaborator

pan-x-c commented Jan 19, 2026

If you find the 9x9 setting too difficult, you can try a 4x4 or 6x6 setting instead.

This Leaderboard may help you to build the workflow

@AdnanQureshi3
Contributor Author

I’ve improved it further and addressed the earlier points. Please have a look and let me know if anything looks wrong or if further changes are needed.

@pan-x-c
Collaborator

pan-x-c commented Jan 21, 2026

Good job! Only a few final tasks remain:

  1. Package the current files together and place them in the trinity.common.workflows.envs.sudoku directory, and modify the imports accordingly.
  2. Create a new file sudoku_test.py under tests/common to verify the correctness of the generator and judge.
  3. (Recommended) Write a test case to verify the correctness of the sudoku workflow.
  4. (If possible) Provide a 4x4 version of sudoku.

@AdnanQureshi3
Contributor Author

I’ve made these changes: the Sudoku files are packaged under trinity.common.workflows.envs.sudoku, tests for the generator and judge have been added, and a 4×4 Sudoku version is also supported. Please have a look and let me know if anything needs adjustment.

@pan-x-c
Collaborator

pan-x-c commented Jan 22, 2026

/unittest-module-common

@github-actions

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 53 | 52 | 0 | 1 | 0 | 0 | 15m 22s |

Skipped

Tests Status
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async skipped ⏭️

Tests

Test Name Status Flaky Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 39.1s
tests/common/config_test.py::TestConfig::test_chat_template_path 74ms
tests/common/config_test.py::TestConfig::test_config_flatten 31ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 153ms
tests/common/config_test.py::TestConfig::test_default_workflow 72ms
tests/common/config_test.py::TestConfig::test_load_default_config 4.3s
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly 73ms
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation 74ms
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster 1.9s
tests/common/experience_test.py::TestEID::test_eid_properties 1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 1ms
tests/common/experience_test.py::TestExperience::test_assertions 1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 1ms
tests/common/experience_test.py::TestExperience::test_gather 1ms
tests/common/experience_test.py::TestExperience::test_gather_with_token_level_reward 1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 14ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 2ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_to_dict 1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields 1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion 1ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution 1ms
tests/common/sudoku_test.py::test_9x9_generator_creates_holes 1ms
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled 1ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board 1ms
tests/common/sudoku_test.py::test_judge_detects_row_violation 1ms
tests/common/sudoku_test.py::test_judge_detects_column_violation 1ms
tests/common/sudoku_test.py::test_judge_detects_block_violation 1ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution 1ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled 1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation 1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation 1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 1m 21s
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 1m 9s
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 38.1s
tests/common/vllm_test.py::TestModelLen_0::test_model_len 1m 1s
tests/common/vllm_test.py::TestModelLen_1::test_model_len 54.8s
tests/common/vllm_test.py::TestModelLen_2::test_model_len 54.6s
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len 54.8s
tests/common/vllm_test.py::TestAPIServer::test_api 26.5s
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api 23.9s
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async 57.3s
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async ⏭️ 1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 305ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 295ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 55.9s
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 55.6s
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate 2m 38s
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api 1m 16s

Github Test Reporter by CTRF 💚

Collaborator

@pan-x-c pan-x-c left a comment

Thanks again for your contribution. LGTM.

@pan-x-c pan-x-c merged commit 837f826 into agentscope-ai:main Jan 23, 2026
1 check passed