Perf/precompile regex v2 #793

Open
chillum-codeX wants to merge 1 commit into sipeed:refactor/channel-system from chillum-codeX:perf/precompile-regex-v2
@chillum-codeX

perf: pre-compile regex patterns in markdownToTelegramHTML

Problem

The markdownToTelegramHTML(), extractCodeBlocks(), and extractInlineCodes() functions in pkg/channels/telegram.go together compile 10 regex patterns from scratch on every call, because each regexp.MustCompile() sits inside a function body.

Each regexp.MustCompile() allocates ~2-4 KB on the heap for the compiled automaton. For a single outbound message, this creates ~20 KB of unnecessary heap allocation that immediately becomes garbage, increasing GC pressure.

Solution

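Move all 10 regexp.MustCompile() calls to package-level var declarations. The patterns are then compiled once, during package initialization, and reused across all calls. This is safe because a Go regexp.Regexp is safe for concurrent use by multiple goroutines (the exception is configuration methods such as Longest, which this code does not call).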
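The sharing claim above can be exercised with a small sketch: several goroutines call ReplaceAllString on one package-level pattern at the same time. The pattern is reStrike from this PR; the `<s>$1</s>` replacement and the strikeAll helper are assumptions made for illustration, not code from telegram.go.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// reStrike is compiled once at package initialization and shared by all callers.
var reStrike = regexp.MustCompile(`~~(.+?)~~`)

// strikeAll (hypothetical helper) converts each input in its own goroutine,
// all of them reading the same *regexp.Regexp concurrently.
func strikeAll(inputs []string) []string {
	out := make([]string, len(inputs))
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in string) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no lock is needed.
			out[i] = reStrike.ReplaceAllString(in, "<s>$1</s>")
		}(i, in)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(strikeAll([]string{"~~a~~", "b ~~c~~", "plain"}))
}
```

Running this with `go run -race` is one way to confirm that the shared pattern introduces no data race.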

Before

func markdownToTelegramHTML(text string) string {
    text = regexp.MustCompile(`^#{1,6}\s+(.+)$`).ReplaceAllString(text, "$1")
    text = regexp.MustCompile(`^>\s*(.*)$`).ReplaceAllString(text, "$1")
    // ... 6 more inline compilations
}

func extractCodeBlocks(text string) codeBlockMatch {
    re := regexp.MustCompile("```[\\w]*\\n?([\\s\\S]*?)```")
    // ...
}

After

var (
    reHeader     = regexp.MustCompile(`^#{1,6}\s+(.+)$`)
    reBlockquote = regexp.MustCompile(`^>\s*(.*)$`)
    reLink       = regexp.MustCompile(`\[([^\]]+)\]\(([^)]+)\)`)
    reBold       = regexp.MustCompile(`\*\*(.+?)\*\*`)
    reBoldAlt    = regexp.MustCompile(`__(.+?)__`)
    reItalic     = regexp.MustCompile(`_([^_]+)_`)
    reStrike     = regexp.MustCompile(`~~(.+?)~~`)
    reListItem   = regexp.MustCompile(`^[-*]\s+`)
    reCodeBlock  = regexp.MustCompile("```[\\w]*\\n?([\\s\\S]*?)```")
    reInlineCode = regexp.MustCompile("`([^`]+)`")
)

func markdownToTelegramHTML(text string) string {
    text = reHeader.ReplaceAllString(text, "$1")
    text = reBlockquote.ReplaceAllString(text, "$1")
    // ...
}

Impact

  • Eliminates ~20 KB heap allocation per outbound message
  • Reduces GC pressure under high message throughput
  • Zero behavioral change — all regex patterns are identical
  • Measured context: Independent benchmarking (100-1000 msg bursts) showed the telegram-slim build achieves 1-3 MB RSS on Linux. This optimization reduces the per-message GC cost, further tightening the memory footprint.
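The per-call allocation figures above can be checked locally with a sketch along these lines. It uses one pattern from this PR (reBold) and compares an inline compile against the package-level variable via testing.Benchmark. The function names and the `<b>$1</b>` replacement are assumptions for illustration, and the reported numbers will vary with Go version and platform.

```go
package main

import (
	"fmt"
	"regexp"
	"testing"
)

// reBold is the package-level form proposed in this PR.
var reBold = regexp.MustCompile(`\*\*(.+?)\*\*`)

// boldInline recompiles the pattern on every call (the "before" shape).
func boldInline(s string) string {
	return regexp.MustCompile(`\*\*(.+?)\*\*`).ReplaceAllString(s, "<b>$1</b>")
}

// boldPrecompiled reuses the shared pattern (the "after" shape).
func boldPrecompiled(s string) string {
	return reBold.ReplaceAllString(s, "<b>$1</b>")
}

// measure reports heap bytes and allocation count per call of f.
func measure(f func(string) string) (bytesPerOp, allocsPerOp int64) {
	// Init registers the testing flags; it is needed when Benchmark is
	// called outside "go test" and is a no-op on repeated calls.
	testing.Init()
	r := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			f("a **bold** word")
		}
	})
	return r.AllocedBytesPerOp(), r.AllocsPerOp()
}

func main() {
	ib, ia := measure(boldInline)
	pb, pa := measure(boldPrecompiled)
	fmt.Printf("inline:      %d B/op, %d allocs/op\n", ib, ia)
	fmt.Printf("precompiled: %d B/op, %d allocs/op\n", pb, pa)
}
```

This measures a single pattern in isolation; the total for a real message depends on how many of the 10 patterns run and on the message content.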

Testing

  • Build succeeds: go build -tags "telegram pprof smallbuf" ./cmd/picoclaw
  • No functional change — identical regex patterns, same replacement logic
  • regexp.Regexp is documented as goroutine-safe
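A build check alone does not show that behavior is unchanged. One way to back the "no functional change" claim is sketched below: for each moved pattern, compare the package-level variable's source text (Regexp.String()) with the literal that used to be compiled inline, then compare outputs on sample inputs. The three patterns are copied from this PR; the moved type, the cases table, the replacement templates, and the sample inputs are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Package-level patterns, copied from the PR's var block (subset).
var (
	reLink   = regexp.MustCompile(`\[([^\]]+)\]\(([^)]+)\)`)
	reBold   = regexp.MustCompile(`\*\*(.+?)\*\*`)
	reStrike = regexp.MustCompile(`~~(.+?)~~`)
)

// moved pairs a package-level pattern with the literal that was previously
// compiled inline. The repl templates are hypothetical.
type moved struct {
	name string
	pre  *regexp.Regexp
	src  string
	repl string
}

var cases = []moved{
	{"link", reLink, `\[([^\]]+)\]\(([^)]+)\)`, `<a href="$2">$1</a>`},
	{"bold", reBold, `\*\*(.+?)\*\*`, `<b>$1</b>`},
	{"strike", reStrike, `~~(.+?)~~`, `<s>$1</s>`},
}

// mismatches returns the names of patterns whose source text or output
// differs between the inline and precompiled forms.
func mismatches(inputs []string) []string {
	var bad []string
	for _, c := range cases {
		if c.pre.String() != c.src {
			bad = append(bad, c.name)
			continue
		}
		inline := regexp.MustCompile(c.src)
		for _, in := range inputs {
			if inline.ReplaceAllString(in, c.repl) != c.pre.ReplaceAllString(in, c.repl) {
				bad = append(bad, c.name)
				break
			}
		}
	}
	return bad
}

func main() {
	inputs := []string{"**bold** and ~~gone~~", "[docs](https://example.com)", "plain"}
	fmt.Println("mismatches:", mismatches(inputs))
}
```

In the repository this would more naturally live in a _test.go file next to telegram.go and cover all 10 patterns.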

References

Move 10 regexp.MustCompile() calls from inside functions to package-level
var declarations. Eliminates ~20 KB of heap allocation per outbound message.

- markdownToTelegramHTML: 8 patterns pre-compiled
- extractCodeBlocks: 1 pattern pre-compiled
- extractInlineCodes: 1 pattern pre-compiled
- removeBotMention @username regex stays inline (runtime-dependent)

regexp.Regexp is goroutine-safe, so pre-compiled patterns are safe to share.
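To illustrate why a username-dependent pattern cannot move to a package-level var, here is a minimal sketch. The stripMention function is a hypothetical stand-in; the actual removeBotMention implementation is not shown in this PR. The pattern embeds a value known only at runtime, so it is built per call, with regexp.QuoteMeta guarding against metacharacters in the username.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// stripMention is a hypothetical stand-in for removeBotMention.
// The username is only known at runtime, so the pattern cannot be a
// package-level constant expression.
func stripMention(text, username string) string {
	// QuoteMeta escapes any regex metacharacters in the username.
	// (?i) makes the match case-insensitive; \b stops at a word boundary
	// so "@bot" does not match inside "@bot2".
	re := regexp.MustCompile(`(?i)@` + regexp.QuoteMeta(username) + `\b`)
	return strings.TrimSpace(re.ReplaceAllString(text, ""))
}

func main() {
	fmt.Println(stripMention("@PicoBot hello", "picobot"))
}
```

If the username is fixed for the lifetime of the process, compiling this pattern once after the username is resolved and storing it alongside the bot state would be a possible follow-up; whether that is worthwhile here is not established by this PR.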

Ref: https://github.com/chillum-codeX/picoclaw-under-the-hood
chillum-codeX changed the base branch from main to refactor/channel-system on February 26, 2026, 02:29
@yumosx
Contributor

yumosx commented Feb 26, 2026

It seems to be a duplicate: #687
