Paper: IEEE S&P 2026
ConcoLLMic is the first language- and theory-agnostic concolic executor powered by LLM agents. Unlike traditional symbolic execution tools that require language-specific implementations and struggle with constraint solving, ConcoLLMic:
- Works with any programming language and environment interactions — supports C, C++, Python, Java, ..., and even multi-language systems; requires no additional environment modeling and handles them efficiently.
- Handles diverse constraint theories — including floating-point arithmetic, strings, structured data, and bit-level operations.
- Achieves superior coverage — in ~4 hours, ConcoLLMic covers 115%-233% more branches than state-of-the-art tools like KLEE and 81% more than AFL++ running for 48 hours.
- Finds real bugs — has already discovered 10+ previously unknown vulnerabilities in real-world softwares.
How it works: ConcoLLMic uses LLM agents to
- instrument code at source level with logging statements like
fprintf(stderr, ...); - summarize path constraints flexibly in natural language, code, or SMT formulas;
- solve constraints intelligently — directly reasoning constraints over semantics and delegating to grounded solvers when needed.
This eliminates the need for hand-crafted language-specific symbolic interpreters and environment modeling as well as expensive constraint solving.
Example instrumentation of src/count.c from FP-Bench, by inserting fprintf at key points in the execution. Output in stderr allows us to reconstruct the dynamic execution path.
This shows one round of concolic execution with ConcoLLMic. Compared to conventional tools with verbose, implementation-level formulas:
start = atof(I[0]) ∧ end = atof(I[1]) ∧ start < end ∧ cur[0] = start ∧
(∀ 0 ≤ i < cnt, cur[i] ≠ end) ∧ cur[cnt] = end ∧ cnt ≤ 20
ConcoLLMic summarizes the path constraints concisely using high-level natural-language:
The number of representable FP values in the range must be ≤ 20.
- Python 3.10 or higher
- API key for Claude-3.7-Sonnet (from Anthropic)
- Clone the repository
git clone https://github.com/ConcoLLMic/ConcoLLMic.git
cd ConcoLLMic- Install Python dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt- Set up your API key
export ANTHROPIC_API_KEY="your_api_key_here"By default, ConcoLLMic uses Claude-3.7-Sonnet. You can configure other models in ACE.py (see setup_model()).
Let's walk through a complete example using a C program from FP-Bench that counts floating-point values in a range.
ConcoLLMic instruments source code by inserting trace statements. For the provided examples, instrumented code is already available in code_example/instr/.
To instrument from scratch:
rm ./code_example/instr/count.c
python3 ACE.py instrument \
--src_dir ./code_example/src/ \
--out_dir ./code_example/instr/ \
--instr_languages c,cpp,python,javaNote: The --instr_languages flag is only used for filtering files by extension. No additional language-specific support is needed.
Compile the instrumented code normally:
gcc -o ./code_example/instr/count ./code_example/instr/count.cRun the concolic execution agent with a test harness (code_example/harness/count.py):
python3 ACE.py run \
--project_dir ./code_example/instr/ \
--execution ./code_example/harness/count.py \
--out ./out/ \
--rounds 2 \
--parallel_num 3Parameters:
--project_dir: Directory containing the instrumented code and the compiled program--execution: Test harness (a Python script that executes the target program and returns stderr + exit code)--out: Output directory for generated test cases and logs--rounds: Number of concolic execution rounds (default: unlimited until coverage plateau)--parallel_num: Maximum number of concurrent test case generations per round
Expected cost: ~$0.40 for 2 rounds with Claude-3.7-Sonnet
Output:
./out/ConcoLLMic_*.log— detailed execution log./out/queue/id:*.yaml— generated test cases with metadata
Analyze testing statistics (tokens used, costs):
python3 ACE.py run_data --out_dir ./out/To measure code coverage or verify bugs:
- Compile the original (non-instrumented) code with coverage:
cd ./code_example/src/
gcc --coverage -o count count.c
lcov -z -d .
cd ../../- Replay generated test cases:
python3 ACE.py replay \
./out/ \
./code_example/src/ \
./coverage.csv \
--cov_script ./code_example/src/coverage.shOutput:
./coverage.csv— time-series coverage data for each test case./code_example/src/count.c.gcov— line-by-line coverage report
The gcov report will show that the bug at line 67 has been triggered.
ConcoLLMic provides a unified CLI interface:
python3 ACE.py <command> [options]| Command | Description |
|---|---|
instrument |
Instrument source code for tracing |
run |
Execute concolic testing |
replay |
Replay generated test cases |
instrument_data |
Analyze instrumentation statistics |
run_data |
Display testing statistics and costs |
For detailed options, run:
python3 ACE.py <command> --helpOr visit the documentation.
To reproduce the paper's evaluation results or run ConcoLLMic on real-world benchmarks, see the experiments/README.md.
We provide Docker-based setups for:
- C/C++ benchmarks
- Multi-language systems
- Comparison with other tools — KLEE, KLEE-Pending, AFL++, SymCC, SymSan
This project is dual licensed:
- Non-commercial use: Licensed under GPLv3 for academic and non-commercial purposes
- Commercial use: Please contact the authors for commercial licensing
For full license details, see LICENSE.
For commercial licensing inquiries, contact:
- Abhik Roychoudhury (abhik@nus.edu.sg)
- Cristian Cadar (c.cadar@imperial.ac.uk)
- Jonathan Tan (jonathan_tan@nus.edu.sg)
@inproceedings{luo2026concollmic,
title={Agentic Concolic Execution},
author={Luo, Zhengxiong and Zhao, Huan and Wolff, Dylan and Cadar, Cristian and Roychoudhury, Abhik},
booktitle={2026 IEEE Symposium on Security and Privacy (S\&P)},
pages={1--19},
year={2026},
organization={IEEE}
}


