Text compression for TEXT_MESSAGE_COMPRESSED_APP (portnum 7): 2-7x more text per packet #9970

dimapanov · 2026-03-22T05:47:11Z

Portnum 7 has been unused since Unishox2 was removed due to buffer overflow (#3841, PR #3606). I built an alternative approach that avoids those issues.

How it works: a character-level 11-gram language model combined with arithmetic coding. The model predicts the next character from its context, and the coder spends fewer bits on predictable characters. Unlike zlib and Unishox2, which look for patterns inside the message itself, this approach uses external language statistics, so even short messages compress well.
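A minimal sketch of the technique, not the mesh-compressor code: a toy character model (longest context seen in training, add-one smoothing, printable ASCII only; all hypothetical simplifications) drives an exact arithmetic coder built on Python's `fractions.Fraction`. Production coders use fixed-width integer ranges with renormalization instead; exact fractions keep the sketch short and visibly lossless at the cost of speed. The decoder needs the character count to know when to stop, which is why a length header matters. Here `order` loosely stands in for the n-gram order (an assumption).

```python
import math
from collections import Counter, defaultdict
from fractions import Fraction

# Hypothetical simplification: printable ASCII only (the real model is multilingual).
ALPHABET = [chr(c) for c in range(32, 127)]
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def train(corpus, order):
    """Count next-character frequencies for every context of length 0..order."""
    counts = defaultdict(Counter)
    for text in corpus:
        for i, ch in enumerate(text):
            for k in range(min(order, i) + 1):
                counts[text[i - k:i]][ch] += 1
    return counts


def frequencies(counts, history, order):
    """Add-one smoothed counts taken from the longest context suffix seen in training."""
    for k in range(min(order, len(history)), -1, -1):
        ctx = history[len(history) - k:]
        if ctx in counts:
            seen = counts[ctx]
            return [seen[ch] + 1 for ch in ALPHABET]
    return [1] * len(ALPHABET)  # untrained model: uniform


def encode(text, counts, order):
    """Shrink [low, low + width) once per character, then emit the shortest point inside it."""
    low, width = Fraction(0), Fraction(1)
    for i, ch in enumerate(text):  # raises KeyError for characters outside ALPHABET
        freqs = frequencies(counts, text[:i], order)
        total, idx = sum(freqs), INDEX[ch]
        low += width * Fraction(sum(freqs[:idx]), total)
        width *= Fraction(freqs[idx], total)  # likely characters shrink the interval less
    n = 0  # number of output bits
    while True:
        m = math.ceil(low * 2**n)  # smallest n-bit binary fraction >= low
        if Fraction(m, 2**n) < low + width:
            break
        n += 1
    nbytes = (n + 7) // 8
    return (m << (8 * nbytes - n)).to_bytes(nbytes, "big")


def decode(payload, length, counts, order):
    """Rebuild exactly `length` characters; the length comes from the packet header."""
    x = Fraction(int.from_bytes(payload, "big"), 2 ** (8 * len(payload)))
    low, width = Fraction(0), Fraction(1)
    out = ""
    for _ in range(length):
        freqs = frequencies(counts, out, order)
        total, cum = sum(freqs), 0
        for idx, f in enumerate(freqs):
            sub_low = low + width * Fraction(cum, total)
            sub_width = width * Fraction(f, total)
            if x < sub_low + sub_width:  # x falls inside this character's slice
                break
            cum += f
        out += ALPHABET[idx]
        low, width = sub_low, sub_width
    return out
```

A message the toy model saw during training typically encodes to fewer bytes than its UTF-8 form, and `decode(payload, len(msg), model, order)` returns it unchanged; unseen text still round-trips, just with less savings.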

Results on typical mesh messages:

Message	UTF-8	Unishox2	n-gram+AC
Check channel 5	15 B	11 B	7 B
Battery 40%, power save	39 B	26 B	12 B
Long English (104 chars)	104 B	65 B	52 B

100% lossless, verified on 2000 test messages.

Safety: the compressed format stores the original text length in its header, so decompression is always bounded. There are no unbounded buffer writes and no expansion beyond input + 1 byte, with a graceful fallback to the original text if the compressed form would be larger.
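A minimal sketch of such framing, assuming a hypothetical layout the post does not spell out: byte 0 is a mode flag, and compressed frames carry the character count in byte 1. A raw frame costs exactly input + 1 bytes, the compressed frame is only used when it is smaller, and the decoder rejects any payload whose decoded length disagrees with the header. zlib appears only as a runnable stand-in codec; the n-gram coder would plug in through the same two callables.

```python
import zlib
from typing import Callable

MODE_RAW = 0x00         # hypothetical flag values, not a Meshtastic spec
MODE_COMPRESSED = 0x01
MAX_CHARS = 255         # one-byte length field bounds the decoder's output

Compress = Callable[[str], bytes]
Decompress = Callable[[bytes, int], str]


def frame(text: str, compress: Compress) -> bytes:
    """Return the smaller of a raw frame and a compressed frame."""
    raw = bytes([MODE_RAW]) + text.encode("utf-8")  # never more than input + 1 byte
    if len(text) > MAX_CHARS:
        return raw
    packed = bytes([MODE_COMPRESSED, len(text)]) + compress(text)
    return packed if len(packed) < len(raw) else raw


def unframe(packet: bytes, decompress: Decompress) -> str:
    if not packet:
        raise ValueError("empty packet")
    mode = packet[0]
    if mode == MODE_RAW:
        return packet[1:].decode("utf-8")
    if mode == MODE_COMPRESSED and len(packet) >= 2:
        n_chars = packet[1]
        text = decompress(packet[2:], n_chars)
        if len(text) != n_chars:
            raise ValueError("decoded length does not match header")
        return text
    raise ValueError("unknown or truncated frame")


# Stand-in codec so the sketch runs; it is zlib, NOT the n-gram coder.
def zlib_compress(text: str) -> bytes:
    return zlib.compress(text.encode("utf-8"), 9)


def zlib_decompress(payload: bytes, n_chars: int) -> str:
    if n_chars == 0:
        return ""  # max_length=0 would mean "unlimited" in zlib
    # Cap output at 4 bytes per character (UTF-8 worst case).
    data = zlib.decompressobj().decompress(payload, 4 * n_chars)
    return data.decode("utf-8", errors="replace")
```

Short messages fall back to the raw frame because the stand-in codec's overhead outweighs its savings; repetitive long text takes the compressed path.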

Architecture: compression runs in the client apps (Android/iOS/Web), not on the ESP32. The model needs ~15 MB of RAM, which doesn't fit on an ESP32 but is easy for a phone. The radio just relays the bytes, so no firmware changes are needed.

Working demo: https://dimapanov.github.io/mesh-compressor/
Code: https://github.com/dimapanov/mesh-compressor

Questions:

Is there interest in reactivating portnum 7 with a new algorithm?
Would a PR to the Android/iOS/Web apps be welcome, or is this better as a standalone tool?
Any concerns about model size?

dimapanov · 2026-03-22T05:59:36Z

Standalone devices (MUI/BaseUI without a phone)

Got this question on Reddit — how would this work on devices not connected to a phone?
The primary architecture is client-side (phone/browser), but there's a potential path for standalone ESP32 devices too. ESP32 supports memory-mapped flash (esp_partition_mmap), which lets you read the model directly from flash without loading it into RAM — only ~1-2 KB needed for the decoder state.
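One way to make a model readable in place (an assumption, not necessarily the repo's file format) is to store it as sorted fixed-width records and binary-search them, so lookups never build an in-memory table. In C firmware the buffer would be the flash region mapped via esp_partition_mmap; in this Python sketch a `bytes` object stands in for it. The record layout below is hypothetical.

```python
import struct
import zlib

# Hypothetical on-flash layout (not the repo's format): CRC32 of the context,
# Unicode code point of the next character, 16-bit count. Little-endian, no padding.
RECORD = struct.Struct("<IIH")


def build_table(counts):
    """Offline step: flatten {context: {char: count}} into sorted fixed-width records."""
    rows = sorted(
        (zlib.crc32(ctx.encode("utf-8")), ord(ch), min(n, 0xFFFF))
        for ctx, followers in counts.items()
        for ch, n in followers.items()
    )
    return b"".join(RECORD.pack(*row) for row in rows)


def lookup(table, context, ch):
    """Binary-search the read-only table in place; only a few integers are held in RAM.

    CRC32 collisions would merge two contexts; a real format would need to handle that.
    """
    view = memoryview(table)
    target = (zlib.crc32(context.encode("utf-8")), ord(ch))
    lo, hi = 0, len(view) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        ctx_hash, code_point, count = RECORD.unpack_from(view, mid * RECORD.size)
        if (ctx_hash, code_point) == target:
            return count
        if (ctx_hash, code_point) < target:
            lo = mid + 1
        else:
            hi = mid
    return 0
```

Each lookup touches O(log n) records and keeps only loop variables in RAM, which is the property that makes a flash-resident model plausible.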

The full model is 94 MB — way too big. But with pruning:

Order 4, threshold 5 → 3.0 MB — fits in flash
Order 3, threshold 5 → 1.4 MB — fits easily
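A sketch of order/threshold pruning over a `{context: Counter}` table. The post does not define the parameters precisely, so this assumes `threshold` is a minimum total count per context and `order` is the maximum context length kept.

```python
from collections import Counter


def prune(counts, order, threshold):
    """Drop contexts longer than `order` or seen fewer than `threshold` times in total.

    Assumptions: `threshold` is a minimum total count per context, and `order` is
    the maximum context length. The empty context is always kept as a fallback so
    every character the model knows stays codable.
    """
    pruned = {}
    for ctx, followers in counts.items():
        if ctx == "" or (len(ctx) <= order and sum(followers.values()) >= threshold):
            pruned[ctx] = Counter(followers)
    return pruned
```

A coder that backs off to the longest surviving context suffix keeps working after pruning; it just predicts from a shorter context wherever a long one was removed.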

Compression quality would drop from 5-7x to roughly 2-3x, but still a significant improvement over raw UTF-8.
To be clear — this is a hypothesis. The whole repo is a proof of concept. The algorithm works and is verified (2000/2000 lossless), but running a pruned model from flash on actual ESP32 hardware still needs to be built and tested. If there's interest, I'm happy to explore this direction.

2 replies

dimapanov Mar 22, 2026
Author

Update 2: WOW found a 2.8 MB model that matches full model quality — ESP32 flash is feasible

Ran a systematic search across 72 order×threshold combinations (full results: https://github.com/dimapanov/mesh-compressor/blob/main/autoresearch/search_results.tsv).

Surprising finding:
Order=9 with aggressive pruning (threshold=50) compresses slightly better than the full order=11 model — while being 5x smaller.

Full model (order=11, thr=5): BPC 3.225, 13.5 MB binary, 518K contexts
Pruned model (order=9, thr=50): BPC 3.216, 2.8 MB binary, 63K contexts
Aggressive pruning removes noisy low-count contexts that hurt prediction more than they help. Fewer contexts, better accuracy.
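BPC (bits per character) is the average ideal code length of held-out text under the model, and an arithmetic coder lands within a couple of bits of that ideal total per message. A small sketch of the metric; the `prob` callable is a hypothetical interface standing in for whichever model is being evaluated.

```python
import math
from typing import Callable, Iterable

Prob = Callable[[str, str], float]  # hypothetical: (history, next_char) -> probability


def bits_per_char(texts: Iterable[str], prob: Prob) -> float:
    """Held-out bits per character: -(1/N) * sum(log2 p(c_i | history))."""
    total_bits, n_chars = 0.0, 0
    for text in texts:
        for i, ch in enumerate(text):
            total_bits -= math.log2(prob(text[:i], ch))
            n_chars += 1
    return total_bits / n_chars if n_chars else 0.0


def estimated_payload_bytes(n_chars: int, bpc: float) -> int:
    """Rough payload size for a message of n_chars at the given average BPC."""
    return math.ceil(n_chars * bpc / 8)
```

At the reported 3.216 BPC, a 15-character message averages about 15 × 3.216 ≈ 48 bits, roughly 6 to 7 bytes before framing; individual messages vary with how predictable they are.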
2.8 MB fits in ESP32 flash. With esp_partition_mmap the model is read directly from flash — only ~1-2 KB RAM needed for the decoder state. This means standalone devices with MUI/BaseUI could potentially run compression without a phone.

dimapanov Mar 22, 2026
Author

Update 3: Universal 10-language model — multilingual is solved
Got feedback on Reddit that supporting only Russian/English is a dealbreaker. Fair point. Ran experiments this weekend.
Result: One universal model covers 10 languages at 74-84% compression. Only 1-3% worse than per-language models — not worth the complexity of separate firmware builds.

Language	Compression
Arabic	84%
Japanese	79%
Korean	79%
Russian, English, Spanish	75-77%
German, French, Portuguese, Chinese	74-75%
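The percentages are not defined in the post. Assuming they are size reductions relative to UTF-8, a reduction s maps to a compression ratio r as follows, which falls inside the 2-7x range in the title:

```latex
r = \frac{\text{UTF-8 bytes}}{\text{compressed bytes}} = \frac{1}{1 - s},
\qquad s = 0.84 \;\Rightarrow\; r = 6.25,
\qquad s = 0.74 \;\Rightarrow\; r \approx 3.85
```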
Model size: 3.5 MB binary (5.3 MB JSON). Still fits on ESP32 via flash mmap. T-Deck/T-Pager (16 MB flash) have room to spare. Heltec V3 (8 MB) needs a custom partition table — layout is in the README.
Other updates since the original post:

Detailed board compatibility table — T-Deck, T-Pager, T-Beam S3, Heltec V3/V4, Station G2 all work. nRF52840 boards (T-Echo) can't fit the model but can relay compressed packets
Firmware-first strategy — compression won't ship in client apps until standalone devices can decode natively
Web demo updated with the universal model — try Spanish, Arabic, Japanese, etc.
Full experiment data and methodology in the repo
Adding more languages just means adding training data. The architecture is language-agnostic. Community contributions of real-world message datasets in any language are welcome.

korbinianbauer · 2026-03-26T13:08:19Z

What would this mean for standalone nodes w/o a companion phone?

0 replies
