Database adapter for the SoundFingerprinting algorithm, backed by the LMDB database. It is fast, persistent, and safe from data corruption.
Beware that this adapter supports only audio fingerprint storage. It should be possible to implement video storage as well, but I do not have time to work on that functionality. The library is open for contributions.
To get the library, install it from NuGet:
Install-Package SoundFingerprinting.Extensions.LMDB
or using the dotnet CLI:
dotnet add package SoundFingerprinting.Extensions.LMDB
As required by the dependent library Spreads.LMDB, you have to provide the native LMDB library yourself (matching your application's target architecture). Take the proper native library from here and make sure it always gets copied to your compiled application folder (the simplest way is to add the file to the project and mark it to be copied on build; in Visual Studio this is the "Copy to Output Directory" property).
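As a rough sketch of the "copy on build" step, an SDK-style project file could contain an item like the one below. The file name lmdb.dll is only a placeholder assumption; use the actual name of the native library you downloaded.

```xml
<ItemGroup>
  <!-- "lmdb.dll" is a hypothetical file name; replace it with your native library -->
  <None Include="lmdb.dll">
    <!-- Copy the file to the output folder whenever it is newer than the existing copy -->
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </None>
</ItemGroup>
```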
To use the LMDB database with SoundFingerprinting, create an LMDBModelService object and use it in the algorithm, like this:
var audioService = new SoundFingerprintingAudioService();
using (var modelService = new LMDBModelService("db"))
{
    var track = new TrackData("GBBKS1200164", "Adele", "Skyfall", "Skyfall", 2012, 290);

    // store track metadata in the datasource
    var trackReference = modelService.InsertTrack(track);

    // create hashed fingerprints
    var hashedFingerprints = FingerprintCommandBuilder.Instance
        .BuildFingerprintCommand()
        .From(pathToAudioFile)
        .UsingServices(audioService)
        .Hash()
        .Result;

    // store hashes in the database for later retrieval
    modelService.InsertHashDataForTrack(hashedFingerprints, trackReference);
}
The parameter of the LMDBModelService constructor is a path to a directory. LMDB will create its files in this directory.
You need to build your application targeting the x64 architecture, since LMDB supports only that. On x86 (32-bit) you will encounter runtime errors!
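One possible way to pin the architecture, assuming an SDK-style project file, is to set the standard MSBuild PlatformTarget property. This is a minimal sketch; your build setup may select the platform differently (for example through solution configurations).

```xml
<PropertyGroup>
  <!-- Build the application as a 64-bit process -->
  <PlatformTarget>x64</PlatformTarget>
</PropertyGroup>
```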
It is VERY important to dispose modelService after use (ideally, keep one instance for the whole application lifetime and dispose it when the application closes). Failing to dispose it might trigger a memory dump that tries to dump the whole virtual memory of the process. The memory-mapped file is part of that virtual memory, so it is dumped as well. This can leave the system unresponsive for up to a couple of minutes!
LMDB itself is a very fast key-value database based on a B+ tree and a memory-mapped file.
This storage is slow to write (inserts and deletes are single-threaded; the required locks are already in the code) but very fast at random reads (reading is very efficient in highly concurrent environments).
LMDB is a file-based database, so no network protocol is used for communication. The downside is that the database cannot be shared between machines: because of how memory-mapped files work, using an LMDB database file over a network share is forbidden (more on this in the LMDB documentation).
Huge thanks to all library creators for making this all possible.
- SoundFingerprinting
- LMDB
- Spreads.LMDB (.NET wrapper over LMDB)
- MessagePack (extremely fast binary serializer)
The benchmark (source is in the repo) uses 10 sample tracks. LMDBModelService is around 10-20% slower than InMemoryService, which is decent enough. I'm still working on optimizing allocation count and overall performance.
I'm using this adapter in production with 4000 tracks in the database. As far as I can tell, the 10-20% performance difference still applies to a dataset of that size. I'd love it if somebody could test this on a bigger dataset and share their experience.
As you can see in the benchmark, .NET Core is much better optimized for this adapter. Scaled performance is only slightly better, but allocations can get extremely low: from 250MB on .NET Framework to 1.4KB on .NET Core. This is because .NET Core can take advantage of the Span<T> and Memory<T> constructs, which allow zero-copy reads from the LMDB database. I strongly recommend using .NET Core to get the best performance and allocation count.
Full benchmark results are available here.
Everyone is welcome to open Issues and Pull Requests. You can contact me by email at jakub.nekro@gmail.com if you have further questions, opinions, or feature ideas.
The framework is provided under the MIT license.