This Work Is Pre-Alpha And Not Ready for End Users
You can use https://raw.githubusercontent.com/davedean/lidmeta/refs/heads/main/scripts/switch_metadata_server.sh to switch.
eg: ./switch_metadata_erver.sh --db ./lidarr.db --url https://example.com/api/v1/
Warnings:
- You should expect lidmeta to ruin your Lidarr install and your music collection !!
- No support or assistance will be available.
- No guarantees of service availability are made.
A high-performance metadata provider for Lidarr that processes MusicBrainz dump data into a normalized, efficient format.
LidMeta makes it possible to run your own Lidarr-compatible metadata service completely offline on ordinary hardware. The official metadata server requires roughly 300 GB of storage when mirrored locally; LidMeta brings that down to ≈30 GB by shipping pre-flattened JSON files for artist & album endpoints and a compact SQLite database for search.
Key benefits:
- Works without internet access once the dataset has been generated
- Order-of-magnitude smaller on-disk footprint (≈90 % reduction)
- Fully compatible with Lidarr’s existing API
This project processes MusicBrainz JSON dumps to create a comprehensive, normalized metadata cache for Lidarr. The system achieves storage efficiency while maintaining most metadata coverage including:
- ✅ Artist Information: Name, type, genres, aliases, biography
- ✅ Album Metadata: Title, type, release date, genres, disambiguation
- ✅ Release Details: Multiple releases per album with formats, countries, labels
- ✅ Track Listings: Complete track data with durations
- ✅ Rich Metadata: Images, ratings, links, aliases
- ≈90% size reduction compared with a full mirror (300 GB → 30 GB)
- Eliminates redundancy in raw dump data
- Optimized for Lidarr field requirements
- All release types: Studio albums, live albums, EPs, singles, compilations
- Multiple releases per album: Different formats, countries, labels
- Complete track data: Titles, durations, positions
- Rich metadata: Genres, ratings, images, links
- Compatible data structure with existing mappers
- All required fields included
- Direct API integration ready
- Complete metadata coverage confirmed
- Lidarr compatibility verified
The LidMeta Server is composed of three core services that work together to deliver high-performance metadata searching. The system is designed to be deployed as a cohesive stack using Docker Compose, but each service can be run independently for testing and development.
LidMeta is designed around a minimal runtime stack backed by an offline build step:
graph TD
A["Lidarr"] --> P["Proxy (dev only)"]
P --> C["Caddy"]
C --> S["Search Server"]
C --> J["Static JSON (artist / album)"]
subgraph "Offline"
D["Data Processor"]
end
D --> J
D --> DB["Search SQLite DB"]
The architecture includes:
- Caddy Proxy: A reverse proxy that routes requests from Lidarr to the appropriate backend service.
- Search Service: The primary endpoint for Lidarr's search queries. It uses a normalized SQLite database to return metadata results quickly.
- Capture Proxy: A secondary endpoint that captures Lidarr's requests and saves them as test fixtures. This allows for contract testing and validation.
- Data Processor: An offline component responsible for processing raw MusicBrainz data dumps into a normalized format.
The data processing strategy relies on building a reverse index to efficiently map artists and release groups to their corresponding releases. This approach allows for single-pass processing of the massive MusicBrainz release file, which is too large to fit into memory.
The process is as follows:
- Build Indexes: Create reverse indexes for artists and release groups, mapping their MBIDs to the line number where they appear in the raw data files.
- Stream Releases: Stream the compressed release file, which contains over 4.8 million releases.
- Normalize Data: For each release, use the reverse indexes to look up the corresponding artist and release group information.
- Enrich and Save: Enrich the release data with the full artist and release group metadata, and save the normalized output to a new file.
This strategy has been proven to achieve a 99.3% reduction in data size while maintaining complete data coverage. This strategy yields an approximate 10× reduction in storage requirements compared with mirroring the full MusicBrainz dataset.
To get started with the LidMeta Server, clone the repository and use the provided Make commands to build and run the services.
# Clone the repository
git clone https://github.com/your-username/lidarr-metadata-server.git
cd lidarr-metadata-server
# download a musicbrainz json dump
`manual steps`
# Build and run the service stack
make upThis will start the Caddy proxy, search service, and capture proxy using Docker Compose.
Tell Lidarr where to find our server by running sqlite ./deploy/lidarr_config/lidarr.db.
INSERT INTO Config (Key, Value) VALUES ('metadatasource', 'http://proxy:5000/api/v1/');
The primary way to interact with the server is through the Lidarr interface. Once the services are running, you can configure Lidarr to use the local metadata provider.
The repository includes a suite of tests to ensure the system is working correctly. You can run the tests using the following Make command:
# Run all tests
make testlidarr-metadata-server/
├── capture_proxy/ # Captures Lidarr requests for contract testing
├── data_processor/ # Processes raw MusicBrainz data
├── schema_validator/ # Validates data against the Lidarr API schema
├── search_service/ # Handles search queries from Lidarr
├── tests/ # Unit and integration tests
├── tools/ # Helper scripts for development and data analysis
├── Caddyfile # Configuration for the Caddy reverse proxy
├── docker-compose.yml # Docker Compose file for running the service stack
├── Makefile # Makefile for common development tasks
└── pyproject.toml # Project dependencies and configuration
The following sections provide more detail on the individual components of the system.
The data_processor is responsible for converting the raw MusicBrainz data dumps into a normalized format that can be efficiently searched. It uses a streaming approach to handle the large volume of data and builds reverse indexes to avoid memory issues.
The capture_proxy is a simple proxy that sits between Lidarr and the search service. It captures requests and saves them as test fixtures, which can be used to validate the system's behavior.
The caddy serves the static artist/album json files, and forwards requests to the search service.
The search_service is a lightweight Flask application that exposes a search endpoint for Lidarr. It queries the normalized SQLite database and returns results in a format that Lidarr can understand.
Contributions are welcome! If you'd like to improve the LidMeta Server, please follow these steps:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for more information.
- MusicBrainz for providing the comprehensive dump data
- Lidarr team for the excellent metadata provider architecture
- Community contributors for testing and feedback