feature(workflows): added an agentic workflow for sudoku #481

AdnanQureshi3 · 2026-01-16T12:06:30Z

Resolves #470 (Implement a New Workflow)
feat: add SudokuWorkflow example implementation and register in default_mapping

Checklist

Added sudoku_workflow.py with full docstrings and inline comments
Implemented a simple single-turn Sudoku-solving workflow
Added a reward calculation based on exact match
Registered the workflow in trinity.common.workflows.__init__ in alphabetical order
Ensured the code follows the style guide and passes pre-commit checks

If more custom workflows need to be implemented, please let me know.

trinity/common/workflows/envs/sudoku/sudoku_workflow.py

trinity/common/workflows/sudoku_generator.py

trinity/common/workflows/sudoku_workflow.py
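For reference, an exact-match reward of the kind listed in the checklist could look like the sketch below. It is not the code from this PR: the helpers `parse_grid` and `exact_match_reward` are hypothetical, and the sketch assumes the model writes its final grid last, row by row, as plain digits.

```python
import re
from typing import List, Optional

Grid = List[List[int]]


def parse_grid(text: str, size: int = 9) -> Optional[Grid]:
    """Read the last size * size digits in the response as a grid, row by row.

    Returns None when the response contains too few digits.
    """
    digits = [int(d) for d in re.findall(r"\d", text)]
    if len(digits) < size * size:
        return None
    digits = digits[-size * size:]  # assumption: the final grid comes last
    return [digits[r * size:(r + 1) * size] for r in range(size)]


def exact_match_reward(response: str, solution: Grid) -> float:
    """Return 1.0 if the parsed grid equals the reference solution, else 0.0."""
    grid = parse_grid(response, size=len(solution))
    return 1.0 if grid == solution else 0.0
```

An all-or-nothing reward like this is simple to verify, but it gives no partial credit, which is one reason the reviewer later asks whether the model can solve puzzles "with a certain probability".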

pan-x-c · 2026-01-19T13:13:39Z

Sorry for the late reply. The current workflow code structure basically meets the requirements, but there is still some room for improvement in the details. For example:

Currently, self.board is passed to the prompt directly as an array, rather than rendered as a string the way the frozen lake environment is rendered. This may hurt the model's understanding of the board.
The generate function can be further improved. Although the difficulty (number of empty cells) can be adjusted, all puzzles are derived from a single standard solution grid, which may lead to overfitting.
The current design fills only one cell per step. While simple, this may lead to many interaction rounds, overly long contexts, and higher training costs, since a Sudoku puzzle has many empty cells. You might consider allowing the model to fill in multiple numbers at a time. Of course, this is just a personal suggestion, and its actual effect needs to be verified in practice.

If resources permit, I recommend running the workflow locally in debug mode to observe:

Whether the workflow can complete the game without errors (regardless of correctness).
Whether it can solve the Sudoku correctly with a certain probability (if all answers are wrong, RL training cannot proceed).

Additionally, since this example is relatively complex, I suggest converting some samples into unit tests to ensure the correctness of each module.
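A minimal sketch of the first and third suggestions, assuming a hypothetical answer format in which the model lists any number of moves as `(row, col, value)` tuples with 1-based indices. The helper names `render_board`, `parse_moves`, and `apply_moves` are illustrative and do not come from the PR or from the frozen lake code; empty cells are assumed to be stored as 0 and rendered as `.`.

```python
import re
from typing import List, Tuple

Grid = List[List[int]]
Move = Tuple[int, int, int]  # (row, col, value); rows and columns are 1-based

MOVE_RE = re.compile(r"\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)")


def render_board(board: Grid, box_rows: int, box_cols: int) -> str:
    """Render the board as text: '.' for empty cells, '|' and '-' between boxes."""
    rows = []
    for row in board:
        cells = [str(v) if v else "." for v in row]
        groups = [" ".join(cells[i:i + box_cols]) for i in range(0, len(cells), box_cols)]
        rows.append(" | ".join(groups))
    lines = []
    for r, text in enumerate(rows):
        if r > 0 and r % box_rows == 0:
            lines.append("-" * len(text))  # horizontal rule between bands of boxes
        lines.append(text)
    return "\n".join(lines)


def parse_moves(text: str) -> List[Move]:
    """Extract every '(row, col, value)' tuple from the model response."""
    return [(int(r), int(c), int(v)) for r, c, v in MOVE_RE.findall(text)]


def apply_moves(board: Grid, puzzle: Grid, moves: List[Move]) -> Tuple[Grid, int]:
    """Apply several moves in one step and return (new_board, number_applied).

    Moves outside the board, with out-of-range values, or targeting a clue
    from the original puzzle are skipped.
    """
    size = len(board)
    new_board = [row[:] for row in board]
    applied = 0
    for r, c, v in moves:
        if not (1 <= r <= size and 1 <= c <= size and 1 <= v <= size):
            continue
        if puzzle[r - 1][c - 1] != 0:
            continue  # never overwrite an original clue
        new_board[r - 1][c - 1] = v
        applied += 1
    return new_board, applied
```

Silently skipping invalid moves is only one possible choice; a workflow could instead count them as invalid actions and penalize them in the reward.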

pan-x-c · 2026-01-19T13:22:48Z

If you find the 9x9 setting too difficult, you can try a 4x4 or 6x6 setting instead.

This Leaderboard may help you to build the workflow
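One common way to address the single-solution concern while supporting 4x4, 6x6, and 9x9 boards with the same code is to start from a patterned valid grid and randomize it with validity-preserving transforms: relabel the digits, shuffle rows within bands and the bands themselves, and do the same for columns and stacks. The sketch below assumes boxes of `box_rows` × `box_cols` cells (2×2 for 4x4, 2×3 for 6x6, 3×3 for 9x9); `generate_solution` and `make_puzzle` are hypothetical names, not the PR's generator.

```python
import random
from typing import List

Grid = List[List[int]]


def generate_solution(box_rows: int, box_cols: int, rng: random.Random) -> Grid:
    """Return a random valid n x n solution grid, where n = box_rows * box_cols."""
    n = box_rows * box_cols

    def pattern(r: int, c: int) -> int:
        # Valid base grid: each row is a cyclic shift, and the shifts are chosen
        # so that every column and every box also holds each symbol exactly once.
        return (box_cols * (r % box_rows) + r // box_rows + c) % n

    digits = rng.sample(range(1, n + 1), n)  # random relabeling of the symbols
    # box_cols bands of box_rows rows: shuffle the bands, then rows inside each band.
    rows = [b * box_rows + i
            for b in rng.sample(range(box_cols), box_cols)
            for i in rng.sample(range(box_rows), box_rows)]
    # box_rows stacks of box_cols columns: shuffle the stacks, then columns inside each.
    cols = [s * box_cols + j
            for s in rng.sample(range(box_rows), box_rows)
            for j in rng.sample(range(box_cols), box_cols)]
    return [[digits[pattern(r, c)] for c in cols] for r in rows]


def make_puzzle(solution: Grid, holes: int, rng: random.Random) -> Grid:
    """Blank out `holes` random cells (0 = empty). Uniqueness is NOT checked."""
    n = len(solution)
    puzzle = [row[:] for row in solution]
    for idx in rng.sample(range(n * n), holes):
        puzzle[idx // n][idx % n] = 0
    return puzzle
```

Two caveats: every grid produced this way is still a symmetry transform of one base pattern, so a randomized backtracking fill gives more structural variety; and removing cells at random does not guarantee a unique solution, so an exact-match reward against the stored solution may reject valid alternative answers.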

AdnanQureshi3 · 2026-01-20T21:06:50Z

I’ve improved it further and addressed the earlier points. Please have a look and let me know if anything looks wrong or if further changes are needed.

pan-x-c · 2026-01-21T08:17:14Z

Good job! Only a few final tasks remain:

Package the current files together and place them in the trinity.common.workflows.envs.sudoku directory, and modify the imports accordingly.
Create a new file sudoku_test.py under tests/common to verify the correctness of the generator and judge.
(Recommended) Write a test case to verify the correctness of the sudoku workflow.
(If possible) Provide a 4x4 version of sudoku.
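A judge that tolerates incomplete boards, together with pytest-style tests in the spirit of the ones requested for tests/common/sudoku_test.py, could look like the sketch below. The function names `is_consistent` and `is_solved` are hypothetical, and empty cells are assumed to be stored as 0.

```python
from typing import List

Grid = List[List[int]]


def is_consistent(board: Grid, box_rows: int, box_cols: int) -> bool:
    """Return True if no row, column, or box repeats a non-zero digit.

    Empty cells (0) are ignored, so an incomplete but conflict-free board passes.
    """
    n = len(board)
    units = [list(row) for row in board]  # rows
    units += [[board[r][c] for r in range(n)] for c in range(n)]  # columns
    for r0 in range(0, n, box_rows):  # boxes
        for c0 in range(0, n, box_cols):
            units.append([board[r][c]
                          for r in range(r0, r0 + box_rows)
                          for c in range(c0, c0 + box_cols)])
    for unit in units:
        filled = [v for v in unit if v != 0]
        if len(filled) != len(set(filled)):
            return False
    return True


def is_solved(board: Grid, box_rows: int, box_cols: int) -> bool:
    """A board is solved when it is full and has no conflicts."""
    full = all(v != 0 for row in board for v in row)
    return full and is_consistent(board, box_rows, box_cols)


def test_judge_allows_incomplete_board():
    board = [[1, 0, 0, 4], [0, 0, 1, 0], [0, 1, 0, 0], [4, 0, 0, 1]]
    assert is_consistent(board, 2, 2)
    assert not is_solved(board, 2, 2)


def test_judge_detects_block_violation():
    # The two 1s are in different rows and columns but share the top-left 2x2 box.
    board = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    assert not is_consistent(board, 2, 2)
```

Checking rows, columns, and boxes through one list of "units" keeps the same judge valid for 4x4, 6x6, and 9x9 boards, since only the box dimensions change.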

AdnanQureshi3 · 2026-01-21T18:31:16Z

I’ve made these changes now: the Sudoku files are packaged under trinity.common.workflows.envs.sudoku, tests for the generator and judge have been added, and a 4×4 Sudoku version is also supported. Please have a look and let me know if anything needs adjustment.

pan-x-c · 2026-01-22T03:27:09Z

/unittest-module-common

github-actions · 2026-01-22T03:44:44Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
53	52	0	1	0	0	15m 22s

Skipped

Tests	Status
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async	skipped ⏭️

Tests

Test Name	Status	Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	✅	39.1s
tests/common/config_test.py::TestConfig::test_chat_template_path	✅	74ms
tests/common/config_test.py::TestConfig::test_config_flatten	✅	31ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	153ms
tests/common/config_test.py::TestConfig::test_default_workflow	✅	72ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	4.3s
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly	✅	73ms
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation	✅	74ms
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster	✅	1.9s
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather_with_token_level_reward	✅	1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion	✅	14ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	2ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion	✅	1ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution	✅	1ms
tests/common/sudoku_test.py::test_9x9_generator_creates_holes	✅	1ms
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled	✅	1ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board	✅	1ms
tests/common/sudoku_test.py::test_judge_detects_row_violation	✅	1ms
tests/common/sudoku_test.py::test_judge_detects_column_violation	✅	1ms
tests/common/sudoku_test.py::test_judge_detects_block_violation	✅	1ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution	✅	1ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled	✅	1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation	✅	1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	1m 21s
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	1m 9s
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	38.1s
tests/common/vllm_test.py::TestModelLen_0::test_model_len	✅	1m 1s
tests/common/vllm_test.py::TestModelLen_1::test_model_len	✅	54.8s
tests/common/vllm_test.py::TestModelLen_2::test_model_len	✅	54.6s
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len	✅	54.8s
tests/common/vllm_test.py::TestAPIServer::test_api	✅	26.5s
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api	✅	23.9s
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async	✅	57.3s
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async	⏭️	1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask	✅	305ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools	✅	295ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	55.9s
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	✅	55.6s
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate	✅	2m 38s
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api	✅	1m 16s

Github Test Reporter by CTRF 💚

pan-x-c

Thanks again for your contribution. LGTM.

feature(workflows): added a agentic workflow for sudoku

1bdcf82

pan-x-c reviewed Jan 16, 2026

trinity/common/workflows/envs/sudoku/sudoku_workflow.py

feature: added Sudoku generator, and judge

fe71316

pan-x-c reviewed Jan 17, 2026

trinity/common/workflows/sudoku_generator.py (outdated)

trinity/common/workflows/sudoku_workflow.py (outdated)

feature(workflows): improve SudokuWorkflow prompt and generator

3b8796d

improved sudoku workflow and generator

bf1156a

created a package for sudoku and also added tests

dfd5382

pan-x-c approved these changes Jan 23, 2026

pan-x-c merged commit 837f826 into agentscope-ai:main Jan 23, 2026
1 check passed
