The project started when @MuWinds decided to build an AI agent for fun and practice.
The agent is not intended to stay limited to BUUCTF, so the challenge descriptions are currently provided manually (mostly because of laziness).
Vision: become the trusted teammate of every CTF player—and if the agent can eventually solve challenges on its own, even better.
- End-to-end automated solving, including problem analysis, target exploration, code execution, and flag extraction.
- Interactive solving flow in the command line.
- Built-in tooling to run Python locally or over SSH on a prepared Linux host.
- Extensible framework for adding CTF tools.
- Customisable prompts and model configurations.
- Clone the repository:

  ```shell
  git clone https://github.com/MuWinds/BUUCTF_Agent.git
  ```

- Install dependencies:

  ```shell
  pip install -r .\requirements.txt
  ```
- (Optional) Configure a Docker container. This sets up the execution environment for the agent. You can prepare your own virtual machine, or use the provided Dockerfile; just make sure Docker is installed first.

  1. Build the image:

     ```shell
     docker build -t ctf_agent .
     ```

  2. Run the image and map the container's SSH port 22 to port 2201 on the host:

     ```shell
     docker run -itd -p 2201:22 ctf_agent
     ```

  If you create the container from the Dockerfile in this repository, the SSH user is `root` and the password is `ctfagent`.
- Update the configuration file `config.json` with your tooling preferences. Below is an example that uses the SiliconFlow API (OpenAI-compatible mode):

  ```json
  {
      "llm": {
          "analyzer": {
              "model": "deepseek-ai/DeepSeek-R1",
              "api_key": "",
              "api_base": "https://api.siliconflow.cn/"
          },
          "solve_agent": {
              "model": "deepseek-ai/DeepSeek-V3",
              "api_key": "",
              "api_base": "https://api.siliconflow.cn/"
          },
          "pre_processor": {
              "model": "Qwen/Qwen3-8B",
              "api_key": "",
              "api_base": "https://api.siliconflow.cn/"
          }
      },
      "max_history_steps": 15,
      "compression_threshold": 7,
      "tool_config": {
          "ssh_shell": {
              "host": "127.0.0.1",
              "port": 22,
              "username": "",
              "password": ""
          },
          "python": {}
      }
  }
  ```

  In the `llm` section, `analyzer` handles reasoning about outputs, `solve_agent` executes the solving steps, and `pre_processor` performs lightweight text pre-processing; use a small, cost-effective model here. A chain-of-thought style model is recommended for `analyzer` to improve the quality of reasoning. The project currently only supports OpenAI-compatible APIs.
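The layout above can be consumed with Python's standard `json` module. The helper below is a hypothetical sketch (it is not part of the repository); it pulls the settings for one LLM role and fails loudly when a required key is missing:

```python
import json

# Hypothetical helper, not part of BUUCTF_Agent: shows how the
# config.json layout above could be read and validated.
def get_llm_config(cfg: dict, role: str) -> dict:
    """Return the model/api_key/api_base entry for one of
    analyzer, solve_agent, or pre_processor."""
    llm = cfg.get("llm", {})
    if role not in llm:
        raise KeyError(f"config.json has no llm.{role} section")
    entry = llm[role]
    for key in ("model", "api_key", "api_base"):
        if key not in entry:
            raise KeyError(f"llm.{role} is missing '{key}'")
    return entry

if __name__ == "__main__":
    with open("config.json", encoding="utf-8") as fh:
        cfg = json.load(fh)
    print(get_llm_config(cfg, "analyzer")["model"])
```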
- Run the agent:

  ```shell
  python .\main.py
  ```
- Allow running Python code in the local environment (done)
- Support more tooling, e.g. binary analysis, beyond web and crypto challenges
- Provide a polished interface such as a web front-end or Qt desktop GUI
- Add a RAG knowledge base
- Use different LLMs for different tools or tasks (reasoning vs. code generation) (done)
- Improve MCP support
- Automate interactions with additional online judges so challenge text does not need to be entered manually
- Support attachments (done): place files in the project root under `attachments`
Python execution and SSH access to a prepared Linux box are available out of the box. If you want to add your own tooling, start here.
Inside the `ctf_tool` directory you will find `base_tool.py`:
```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class BaseTool(ABC):
    @abstractmethod
    def execute(self, *args, **kwargs) -> Tuple[str, str]:
        """Run the tool and return stdout/stderr."""
        pass

    @property
    @abstractmethod
    def function_config(self) -> Dict:
        """Describe the function-call schema exposed to the agent."""
        pass
```

Every custom tool must implement `execute` and `function_config`.
- `execute` performs the actual action and returns a tuple of `(stdout, stderr)`; the order is flexible, but both values should be provided.
- `function_config` exposes the tool through function calling so the agent can discover when to use it. The method must be decorated with `@property`, and the returned structure follows a consistent schema. Example for a remote shell:
```python
@property
def function_config(self) -> Dict:
    return {
        "type": "function",
        "function": {
            "name": "execute_shell_command",
            "description": "Run a shell command on the remote server. curl, sqlmap, nmap, openssl, and other common tools are available.",
            "parameters": {
                "type": "object",
                "properties": {
                    "purpose": {
                        "type": "string",
                        "description": "Why this step is being executed."
                    },
                    "content": {
                        "type": "string",
                        "description": "The shell command to run."
                    }
                },
                "required": ["purpose", "content"]
            }
        }
    }
```

Because the agent can execute shell commands, do not let it run on a machine that stores important data. There is no guarantee that an LLM will not suggest something destructive like `rm -rf /*`. Use a disposable environment or the provided Dockerfile to stay safe.
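Putting the two pieces together, a complete custom tool might look like the sketch below. `Base64DecodeTool` is a hypothetical example, not a tool shipped with the project, and `BaseTool` is redeclared here only so the snippet is self-contained; in the repository you would import it from `ctf_tool`:

```python
import base64
import binascii
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class BaseTool(ABC):
    # Redeclared for a self-contained example; import from ctf_tool in practice.
    @abstractmethod
    def execute(self, *args, **kwargs) -> Tuple[str, str]:
        pass

    @property
    @abstractmethod
    def function_config(self) -> Dict:
        pass


class Base64DecodeTool(BaseTool):
    """Hypothetical example tool: decodes a base64 string for the agent."""

    def execute(self, purpose: str = "", content: str = "") -> Tuple[str, str]:
        try:
            decoded = base64.b64decode(content).decode("utf-8", errors="replace")
            return decoded, ""  # (stdout, stderr)
        except binascii.Error as exc:
            return "", f"decode failed: {exc}"

    @property
    def function_config(self) -> Dict:
        # Same schema shape as the remote-shell example above.
        return {
            "type": "function",
            "function": {
                "name": "base64_decode",
                "description": "Decode a base64-encoded string.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "purpose": {
                            "type": "string",
                            "description": "Why this step is being executed."
                        },
                        "content": {
                            "type": "string",
                            "description": "The base64 text to decode."
                        }
                    },
                    "required": ["purpose", "content"]
                }
            }
        }
```

On failure the tool returns the error message in the second slot of the tuple, so the agent can read it back and try a different approach instead of crashing.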
QQ group:

