@GeiserX GeiserX commented Jan 28, 2026

Summary

Major release with schema normalization, new whitelist filtering mode, and several bug fixes.

Issues Addressed

Breaking Changes

Schema Normalization (addresses #45)

Media metadata moved from messages table to dedicated media table:

Removed columns from messages:

  • media_type
  • media_id
  • media_path

API response format changed:

// Before (v5.x)
{ "media_type": "photo", "media_path": "/path/file.jpg" }

// After (v6.0.0)
{ "media": { "type": "photo", "file_path": "/path/file.jpg", "file_size": 12345 } }
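A client that has to read both response shapes during an upgrade could normalize them first. The following is a minimal sketch under that assumption; the function name `normalize_media` is hypothetical and not part of this project, and only the field names are taken from the two examples above.

```python
from typing import Any, Optional


def normalize_media(message: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return media info in the v6.0.0 nested shape, or None if there is no media."""
    # v6.0.0: media is already a nested object on the message.
    media = message.get("media")
    if isinstance(media, dict):
        return media

    # v5.x: flat media_* fields on the message itself.
    if message.get("media_type") is not None:
        return {
            "type": message["media_type"],
            "file_path": message.get("media_path"),
        }

    # No media attached.
    return None
```

The v5.x format shown above carries no file size, so the sketch leaves `file_size` out rather than guessing a value.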

Removed LISTEN_ALBUMS (fixes #46)

Albums are now automatically handled via grouped_id in the NewMessage handler. The separate Album event handler was redundant and has been removed.
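To illustrate why a shared grouped_id is enough to reconstruct albums, here is a minimal sketch of grouping already-stored messages. It is not the project's viewer code; the function name `group_albums` and the plain-dict message shape are assumptions made for the example.

```python
from typing import Any


def group_albums(messages: list[dict[str, Any]]) -> list[list[dict[str, Any]]]:
    """Group messages so that items sharing a grouped_id form one album.

    Messages without a grouped_id become single-item groups. Input order
    is preserved; an album appears at the position of its first message.
    """
    albums: dict[Any, list[dict[str, Any]]] = {}
    result: list[list[dict[str, Any]]] = []
    for msg in messages:
        gid = msg.get("grouped_id")
        if gid is None:
            result.append([msg])
        elif gid in albums:
            # Same album as an earlier message: extend the existing group.
            albums[gid].append(msg)
        else:
            # First message of a new album.
            albums[gid] = [msg]
            result.append(albums[gid])
    return result
```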

New Features

CHAT_IDS Whitelist Mode (fixes #48)

A new, simple way to back up only specific chats:

# Backup ONLY these 2 chats - nothing else
CHAT_IDS=-1001234567890,-1009876543210

When CHAT_IDS is set, ALL other filtering options are ignored.

| Mode | Variable | Behavior |
|------|----------|----------|
| Whitelist (new) | CHAT_IDS=id1,id2 | Backup ONLY these chats |
| Type-based (default) | CHAT_TYPES=channels | Backup ALL chats of that type |

This addresses the common confusion where users expected *_INCLUDE_CHAT_IDS to act as a whitelist.
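As a rough illustration of how a comma-separated CHAT_IDS value could select the mode, consider the sketch below. The helper names `parse_chat_ids` and `load_filter_mode` are hypothetical; the actual parsing lives in src/config.py and may differ.

```python
import os
from typing import Mapping


def parse_chat_ids(raw: str) -> set[int]:
    """Parse a comma-separated list of chat IDs, skipping empty entries."""
    # int() tolerates surrounding whitespace, so "-100, -200" parses fine.
    return {int(part) for part in raw.split(",") if part.strip()}


def load_filter_mode(env: Mapping[str, str] = os.environ) -> tuple[str, set[int]]:
    """Return ("whitelist", ids) when CHAT_IDS is set, else ("type-based", set())."""
    chat_ids = parse_chat_ids(env.get("CHAT_IDS", ""))
    if chat_ids:
        return "whitelist", chat_ids
    return "type-based", set()
```

In this sketch an empty or unset CHAT_IDS falls back to type-based mode, which matches the table above.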

Database Changes

Foreign Keys Added

  • media(message_id, chat_id) → messages(id, chat_id) with CASCADE delete
  • reactions.user_id → users.id with SET NULL on delete
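The effect of these two constraints can be demonstrated with an in-memory SQLite database. The table definitions below are heavily simplified assumptions for illustration, not the project's real schema; only the key columns and the delete rules come from the list above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE messages (
        id INTEGER NOT NULL,
        chat_id INTEGER NOT NULL,
        PRIMARY KEY (id, chat_id)
    );
    CREATE TABLE media (
        id INTEGER PRIMARY KEY,
        message_id INTEGER NOT NULL,
        chat_id INTEGER NOT NULL,
        FOREIGN KEY (message_id, chat_id)
            REFERENCES messages (id, chat_id) ON DELETE CASCADE
    );
    CREATE TABLE reactions (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users (id) ON DELETE SET NULL
    );
    INSERT INTO users VALUES (42);
    INSERT INTO messages VALUES (1, -100);
    INSERT INTO media VALUES (10, 1, -100);
    INSERT INTO reactions VALUES (5, 42);
    """
)

# Deleting the message removes its media row (CASCADE).
conn.execute("DELETE FROM messages WHERE id = 1 AND chat_id = -100")
# Deleting the user keeps the reaction but clears its user_id (SET NULL).
conn.execute("DELETE FROM users WHERE id = 42")

media_left = conn.execute("SELECT COUNT(*) FROM media").fetchone()[0]
reaction_user = conn.execute("SELECT user_id FROM reactions WHERE id = 5").fetchone()[0]
```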

New Indexes

  • idx_messages_reply_to - Fast reply lookups
  • idx_media_downloaded - Find undownloaded media
  • idx_media_type - Filter by media type
  • idx_reactions_user - User reaction queries
  • idx_chats_username / idx_users_username - Username lookups

Migration

Automatic via Alembic. The migration:

  1. Creates new indexes
  2. Adds foreign key constraints
  3. Drops redundant columns from messages table

Back up your database before upgrading.
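For SQLite deployments, one possible way to take that backup is Python's built-in online backup API. This is a sketch only: the function name `backup_sqlite` and the file paths are assumptions, and PostgreSQL users would need a different tool such as pg_dump.

```python
import sqlite3
from pathlib import Path


def backup_sqlite(source: Path, destination: Path) -> None:
    """Copy a SQLite database to `destination` using the online backup API."""
    src = sqlite3.connect(source)
    dst = sqlite3.connect(destination)
    try:
        # Connection.backup copies the whole database page by page.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

A call might look like `backup_sqlite(Path("archive.db"), Path("archive.pre-v6.db"))`, where both file names are placeholders.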

Test Plan

  • Schema migration runs successfully
  • Viewer displays messages with media correctly
  • Albums display as grids via grouped_id
  • CHAT_IDS whitelist mode works
  • Listener respects whitelist mode
  • Test on production-like data volume

BREAKING CHANGE: Media metadata is now stored exclusively in the media
table. The media_type, media_id, and media_path columns have been removed
from the messages table.

Changes:
- Remove media_type, media_id, media_path from messages table
- Add proper FK constraints (sender_id->users, media->messages, reactions->users)
- Add performance indexes for reply lookups, media filtering, username searches
- Update API to return media as nested object instead of flat fields
- Update frontend to use msg.media?.type syntax
- Service messages now use raw_data.service_type
- Polls now detected by raw_data.poll presence
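
With the flat media columns gone, a consumer has to derive the message kind from raw_data and the nested media object. A minimal sketch of that logic follows; the function name `message_kind`, the returned labels, and the order of the checks are assumptions, and only the field names come from the list above.

```python
from typing import Any


def message_kind(msg: dict[str, Any]) -> str:
    """Classify a message as 'service', 'poll', a media type, or 'text'."""
    raw = msg.get("raw_data") or {}
    # Service messages carry raw_data.service_type.
    if raw.get("service_type"):
        return "service"
    # Polls are detected by the presence of raw_data.poll.
    if raw.get("poll") is not None:
        return "poll"
    # Everything else with a nested media object reports that object's type.
    media = msg.get("media")
    if media:
        return media.get("type", "media")
    return "text"
```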

Migration:
- Alembic migration 005 handles data migration automatically
- Creates backup table for rollback capability
- Supports both SQLite and PostgreSQL
@github-actions

🐳 Dev images published!

  • drumsergio/telegram-archive:dev
  • drumsergio/telegram-archive-viewer:dev

The dev/test instance will pick up these changes automatically (Portainer GitOps).

To test locally:

docker pull drumsergio/telegram-archive:dev
docker pull drumsergio/telegram-archive-viewer:dev

sender_id in messages can contain channel/group IDs (negative values)
that don't exist in the users table. Removed the FK constraint but
kept the ORM relationship for joins.

Albums are now handled entirely via grouped_id in the NewMessage handler:
- Added grouped_id capture to listener's NewMessage handler
- Removed separate Album event handler (was causing duplicate processing)
- Removed LISTEN_ALBUMS config option

The viewer already groups messages by grouped_id, so albums display correctly
without a separate handler. This simplifies the codebase and fixes issue #46.

The docker-compose.yml was using `${CHAT_TYPES:-default}` syntax which
treats empty values the same as unset, always using the default.

Changed to `${CHAT_TYPES-default}` (without colon) which only uses the
default when the variable is completely unset, allowing CHAT_TYPES=
(empty) to work for whitelist-only mode as documented.

Also updated README with a note explaining this syntax difference for
users who may have older docker-compose.yml files.
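
The difference between the two substitution forms can be emulated in a few lines of Python. This is only a model of the behavior described above, not Compose's implementation; the function `expand` and its parameters are made up for the example.

```python
from typing import Mapping, Optional


def expand(env: Mapping[str, str], name: str, default: str, colon: bool) -> str:
    """Emulate ${NAME:-default} (colon=True) and ${NAME-default} (colon=False)."""
    value: Optional[str] = env.get(name)
    if value is None:
        # Unset: both forms fall back to the default.
        return default
    if colon and value == "":
        # Empty: only the colon form falls back to the default.
        return default
    return value
```

With `CHAT_TYPES` set to an empty string, the colon form still yields the default while the colon-free form yields the empty string, which is what whitelist-only mode relies on.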

## Problem
Users expected `CHANNELS_INCLUDE_CHAT_IDS` to work as a whitelist (back up
ONLY these chats), but it was actually additive (ALSO back up these chats
in addition to what CHAT_TYPES selects). This caused confusion and made
it difficult to back up only specific chats.

## Solution
Added a new `CHAT_IDS` environment variable that provides true whitelist
mode:

```bash
# Backup ONLY these 2 chats - nothing else
CHAT_IDS=-1001234567890,-1009876543210
```

When `CHAT_IDS` is set, ALL other filtering options are ignored.

## Two Filtering Modes

| Mode | When | How it works |
|------|------|--------------|
| Whitelist | CHAT_IDS set | Backup ONLY listed chats |
| Type-based | CHAT_IDS not set | Use CHAT_TYPES + INCLUDE/EXCLUDE |
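
The decision between the two modes could look roughly like the sketch below. It is not the code in src/listener.py: the signature is hypothetical, and letting an exclude entry win over an include entry in type-based mode is an assumption.

```python
def should_process_chat(
    chat_id: int,
    chat_type: str,
    *,
    chat_ids: set[int],
    chat_types: set[str],
    include_ids: set[int],
    exclude_ids: set[int],
) -> bool:
    """Decide whether a chat should be backed up under either filtering mode."""
    # Whitelist mode: CHAT_IDS is set, so every other option is ignored.
    if chat_ids:
        return chat_id in chat_ids
    # Type-based mode. Assumption: excludes take precedence over includes.
    if chat_id in exclude_ids:
        return False
    # Includes are additive on top of what CHAT_TYPES selects.
    if chat_id in include_ids:
        return True
    return chat_type in chat_types
```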

## Files Changed
- src/config.py: Parse CHAT_IDS, add whitelist_mode flag
- src/listener.py: Respect whitelist mode in _should_process_chat
- docker-compose.yml: Add CHAT_IDS variable with documentation
- README.md: Rewrite chat filtering section with clear examples
- .env.example: Update with new CHAT_IDS option
- docs/CHANGELOG.md: Document the feature

Closes #48
@GeiserX GeiserX changed the title from "feat!: v6.0.0 schema normalization - remove media duplication" to "feat!: v6.0.0 - schema normalization, whitelist mode, and bug fixes" Jan 28, 2026

@GeiserX GeiserX merged commit 76072ae into master Jan 28, 2026
2 of 3 checks passed
@GeiserX GeiserX deleted the ai/v6-schema-refactor branch January 28, 2026 18:47

Development

Successfully merging this pull request may close these issues.

  • [Bug]: Whitelist-only mode does not work
  • [Question]: Environment variable - LISTEN_ALBUMS
