ZoryaTrace is an artificial intelligence algorithm designed to analyze texts and determine whether their content is AI-generated. ZoryaTrace leverages individual user data to determine whether an LLM was used to generate a text. In mathematical terms, ZoryaTrace uses a Naive Bayes classifier, which is based on Bayes' theorem, to achieve this goal.
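For reference, the textbook form of the Naive Bayes rule is given here. In this notation (chosen for illustration, not taken from the ZoryaTrace source), $c$ is a class such as "neutral" or "suspicious" and $w_1, \dots, w_n$ are the features extracted from a sentence. Bayes' theorem relates the posterior to the prior and the likelihood, and the "naive" assumption treats the features as independent given the class:

$$
P(c \mid w_1, \dots, w_n) = \frac{P(c)\, P(w_1, \dots, w_n \mid c)}{P(w_1, \dots, w_n)}
\quad\text{with}\quad
P(w_1, \dots, w_n \mid c) \approx \prod_{i=1}^{n} P(w_i \mid c)
$$

The predicted class is then the one that maximizes $P(c) \prod_{i=1}^{n} P(w_i \mid c)$, since the denominator is the same for every class.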
🚧 Currently in HEAVY development – many features are still being built and refined.
📜 Status (13 April 2025): The data extractor, the algorithm (backend) and the interface (zorya.py) have been developed. I've also written the full documentation. I still have to implement the default AI-generated training data.
- Features
- Disclaimer
- Limitations
- Installation
- Usage
- Algorithm Overview
- Resources
- Contributing
- License
- Credits
- AI Detection - Analyze text to classify AI-generated content.
- User-Based Analysis - Adapt detection based on individual user patterns.
- Scalability - Designed to handle large-scale text processing.
| Feature | ZoryaTrace | Other Solutions |
|---|---|---|
| Open Source | ✅ Transparent and modifiable | ❌ Often proprietary and closed-source |
| Privacy-focused | ✅ No user tracking, fully local processing | ❌ Often cloud-based, collects user data |
| Security | ✅ Local execution, no external data leaks | ❌ Data sent to third-party servers |
| Efficiency | ✅ Optimized TF-IDF & Naive Bayes, few seconds analysis | |
| Ease of Use | ✅ Easy setup | |
| Customization | ✅ Fully customizable training dataset | ❌ Limited or no customization options |
| Lightweight | ✅ Minimal dependencies, runs on low-end devices | ❌ Heavy dependencies, requires cloud infrastructure |
| No API Limits | ✅ Works offline, no request limits | ❌ API-based, limited free requests |
| Modern LLM detection | ❌ May struggle with highly creative AI texts from modern LLMs | ✅ Highly creative AI text is often better detected |
| Human/AI merged content | ❌ May struggle with human-edited AI content | ✅ Human/AI merged content may be better supported |
Zorya refers to two (sometimes three) deities in Slavic mythology — Zorya Utrennyaya (Morning Star) and Zorya Vechernyaya (Evening Star), occasionally joined by Zorya Polunochnaya (Midnight Star). These celestial sisters serve as guardians of Simargl, a cosmic hound chained to the star Polaris. If the chain ever breaks, it is said that the universe would be destroyed.
The Zoryas are tasked with watching the sky, opening the gates of the Sun each morning and closing them each night. They represent constant vigilance, protection against unseen threats, and balance between light and darkness.
ZoryaTrace draws inspiration from this mythos: just as the Zorya monitor the heavens for signs of cosmic disruption, the tool monitors digital texts to detect traces of artificial generation. The goal is not to judge, but to provide early signals, protect informational integrity, and offer clarity.
Please Read Carefully
ZoryaTrace is provided as an experimental tool for content analysis and educational research purposes. By using this software, you acknowledge and agree to the following:
- The results produced by ZoryaTrace are probabilistic in nature and do not constitute definitive or authoritative assessments.
- The tool may generate false positives or false negatives, and must not be solely relied upon for critical decisions.
- ZoryaTrace is not certified for legal, regulatory, compliance, or forensic use.
- The accuracy and reliability of outputs depend heavily on the quality and relevance of the underlying data.
- The developers, contributors, and associated entities assume no liability for any direct or indirect damages, losses, or consequences resulting from the use or interpretation of the software’s outputs.
- Use of this tool is entirely at the user's own risk.
- Always conduct a human review of any content flagged or analyzed by the tool.
- For professional or domain-specific usage, consider retraining the model on relevant datasets.
- Maintain proper audit logs and documentation for transparency and accountability.
- Ensure that your use of the software complies with applicable laws and ethical guidelines in your jurisdiction.
By proceeding, you acknowledge your understanding of these limitations and agree to use ZoryaTrace responsibly.
🪫 As explained above, ZoryaTrace is still under development. This means that you can't download executables directly at the moment; you have to run it in a Python environment, as explained below. In addition, some datasets are not currently provided, so you need to supply your own to run ZoryaTrace. This is temporary, as the project is evolving very quickly.
git clone https://github.com/Malwprotector/ZoryaTrace.git
cd ZoryaTrace
pip install -r requirements.txt
Then run it with:
python3 zorya.py
Windows, Linux and Mac binaries will be available here soon!
- Python 3.8+
- 1GB RAM minimum (4GB recommended)
- 200MB disk space
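Before installing, it can help to confirm that the interpreter meets the Python requirement listed above. The following is a minimal sketch; the helper name `python_is_supported` is hypothetical and is not part of ZoryaTrace.

```python
import sys

# Minimum interpreter version, taken from the requirements list above.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    found = sys.version.split()[0]
    if python_is_supported():
        print("Python version OK:", found)
    else:
        print("Python 3.8 or newer is required, found", found)
```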
Main application window with tab navigation
- You'll need to create training data before analyzing anything. But don't worry, everything is explained step by step! ✨
- Click "Add human written PDF(s)" to import your human-written data. You can import several files: the aim is to import as many PDFs written by a human as possible. If you want to focus on a particular subject or language, import data for that subject or language.
- Toggle the checkbox to use the default suspicious data or to add custom data. You don't have to import AI-generated data, because ZoryaTrace comes with this data. However, as mentioned above, if you're working on a particular subject and you have AI-generated text for it, you can import that too.
- View loaded files in the listbox.
- Click "Create Training Data".
- The training data file is named `data.csv`, and it is saved in the same directory as ZoryaTrace.
- Monitor progress in the status bar (it's often extremely fast).
- A completion notification will appear!
- Now that you have created the training data, you can use the algorithm freely. Make sure you don't delete or move your `data.csv` training data file.
- This is ZoryaTrace's main function. Here, you submit written files, and the algorithm analyses them to determine the percentage of content generated by AI.
- Click "Select PDF File".
- Choose document to analyze.
- Filename appears in status.
- Click "Analyze PDF".
- Watch real-time progress bar (often very fast).
- Results display with:
- 🔴 Red highlighted suspicious sentences.
- 🟢 Green neutral text.
- Summary statistics.
- Paste or type text in the input box.
- Click "Classify Text".
- View results:
- 🟢 Green = Neutral.
- 🔴 Red = Suspicious.
- Click "Run Algorithm Test"
- View metrics:
- Precision
- Recall
- F-score
- Accuracy
- Sample test sentences with predictions
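The four metrics listed above can be computed from a list of true labels and a list of predicted labels. The sketch below shows the standard definitions, treating "suspicious" as the positive class; the function name `classification_metrics` and the label strings are assumptions made for illustration, and ZoryaTrace's own test routine may compute or report them differently.

```python
def classification_metrics(y_true, y_pred, positive="suspicious"):
    """Compute precision, recall, F1 score and accuracy for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted or seen.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    truth = ["suspicious", "suspicious", "neutral", "neutral", "suspicious"]
    predicted = ["suspicious", "neutral", "neutral", "suspicious", "suspicious"]
    for name, value in classification_metrics(truth, predicted).items():
        print(f"{name}: {value:.3f}")
```

Precision answers "of the sentences flagged as suspicious, how many really were?", while recall answers "of the truly suspicious sentences, how many were flagged?". The F-score used here is the harmonic mean of the two (F1).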
Problem: "Training data not loaded"
Solution:
- Navigate to Data Preparation tab.
- Ensure both human-written and AI-generated samples are loaded.
- Verify that `data.csv` exists in the working directory.
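The last check can also be done from a Python shell. This is a small sketch that only confirms the file is present and counts its non-empty rows; it makes no assumption about the column layout of `data.csv`, which is not documented here. The helper name `training_data_status` is hypothetical.

```python
import csv
from pathlib import Path


def training_data_status(path="data.csv"):
    """Return (exists, non_empty_row_count) for a training data file."""
    file = Path(path)
    if not file.is_file():
        return False, 0
    with file.open(newline="", encoding="utf-8") as handle:
        # Count every non-empty row, including a header row if there is one.
        rows = sum(1 for row in csv.reader(handle) if row)
    return True, rows


if __name__ == "__main__":
    exists, rows = training_data_status()
    if exists:
        print(f"data.csv found with {rows} non-empty rows")
    else:
        print("data.csv not found in the working directory")
```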
Problem: Slow PDF processing
Solution:
- Split large documents (>50 pages).
- Close other memory-intensive applications.
- Use simpler PDFs (avoid scanned documents).
ZoryaTrace employs a hybrid NLP pipeline combining:
- Sentence Segmentation: Advanced regex patterns handle complex punctuation
- Tokenization: NLTK's word_tokenize with custom modifications
- Normalization:
- Case folding (lowercasing)
- Porter stemming
- Stop word removal (customizable list)
- TF-IDF Vectorization:
- Term Frequency-Inverse Document Frequency weighting
- Adaptive n-gram range (1-3 grams)
- Sublinear TF scaling
- Lexical Features:
- Sentence length analysis
- Word rarity scoring
- Syntactic pattern matching
- Naive Bayes Classifier with TF-IDF features
- Dual Probability Models:
- Neutral content profile
- Suspicious content profile
- Adaptive Thresholding:
- Dynamic classification boundaries
- Confidence scoring
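To make the pipeline above more concrete, here is a compact, dependency-free sketch of the same idea: sentence splitting, tokenization with stop word removal, 1-3 gram TF-IDF with sublinear TF, and a Naive Bayes classifier over those weights. It is not ZoryaTrace's actual code. Several simplifications are assumptions made for illustration: regular expressions stand in for NLTK's `word_tokenize`, Porter stemming and the lexical features are omitted, the stop word list is a tiny placeholder, the IDF and Laplace smoothing formulas are common defaults, and a fixed threshold replaces adaptive thresholding. All names (`TfidfNaiveBayes`, `classify`, and so on) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "it"}  # placeholder list


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (simple stand-in)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase, keep alphabetic tokens, drop stop words (no stemming)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def ngrams(tokens, max_n=3):
    """Return all 1..max_n grams as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


class TfidfNaiveBayes:
    def fit(self, sentences, labels):
        docs = [ngrams(tokenize(s)) for s in sentences]
        doc_freq = Counter(term for doc in docs for term in set(doc))
        # Smoothed IDF: log((1 + N) / (1 + df)) + 1
        self.idf = {t: math.log((1 + len(docs)) / (1 + df)) + 1
                    for t, df in doc_freq.items()}
        self.class_counts = Counter(labels)
        self.weights = defaultdict(Counter)  # summed TF-IDF weight per class
        for doc, label in zip(docs, labels):
            for term, weight in self._tfidf(doc).items():
                self.weights[label][term] += weight
        self.totals = {c: sum(self.weights[c].values()) for c in self.class_counts}
        return self

    def _tfidf(self, doc):
        # Sublinear TF: 1 + log(count); terms unseen in training are ignored.
        return {t: (1 + math.log(c)) * self.idf[t]
                for t, c in Counter(doc).items() if t in self.idf}

    def predict_proba(self, sentence):
        features = self._tfidf(ngrams(tokenize(sentence)))
        vocab_size = len(self.idf)
        n_docs = sum(self.class_counts.values())
        log_scores = {}
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)  # class prior
            for term, weight in features.items():
                # Laplace-smoothed likelihood of the term under class c.
                likelihood = (self.weights[c][term] + 1) / (self.totals[c] + vocab_size)
                score += weight * math.log(likelihood)
            log_scores[c] = score
        # Normalize in a numerically stable way to get class probabilities.
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {c: v / norm for c, v in exp_scores.items()}


def classify(model, sentence, threshold=0.5):
    """Label a sentence 'suspicious' when its probability reaches the threshold."""
    p = model.predict_proba(sentence).get("suspicious", 0.0)
    return ("suspicious" if p >= threshold else "neutral"), p


if __name__ == "__main__":
    train = [
        ("I walked my dog near the river yesterday.", "neutral"),
        ("My grandmother baked bread this morning.", "neutral"),
        ("In conclusion, it is important to note the following.", "suspicious"),
        ("Furthermore, it is important to consider several factors.", "suspicious"),
    ]
    model = TfidfNaiveBayes().fit([s for s, _ in train], [l for _, l in train])
    text = "My dog walked near the river. It is important to note several factors."
    for sentence in split_sentences(text):
        print(classify(model, sentence), sentence)
```

With only four training sentences, this toy model simply memorizes phrasing; meaningful results depend on a much larger and more representative training set, as noted in the disclaimer.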
I haven't added my resources yet. I will probably write something about ZoryaTrace.
Contributions are welcome! Feel free to submit:
- Bug reports
- Feature requests
- Performance optimizations
- Additional language support
Thank you!
ZoryaTrace is licensed under CC BY-NC-SA 4.0.
Developed with <3 by me. Special thanks to contributors and testers who help improve ZoryaTrace (there are none yet).






