Skip to content

Set cardinality estimates using HyperLogLog implementation

Notifications You must be signed in to change notification settings

mattbornski/cardinality

 
 

Repository files navigation

Cardinality estimation using HyperLogLog algorithm Build Status

The HyperLogLog algorithm estimates the cardinality of the data set (i.e. number of distinct elements in the data set) without having to store the actual elements seen, which would be required for a naive unique count implementation. In order to achieve a high degree of accuracy with a low memory footprint, a good hash algorithm must be chosen.

Installation

npm install cardinality

Usage

Extending

Recognizing that other people might not use the algorithm in the exact same way I do, I have attempted to preserve the integrity of the core algorithm while allowing end-users to extend many pieces of the implementation; in particular, the hash algorithm and the storage mechanisms are designed to be easily replaced in a modular fashion.

Known extensions:

Credits

Many tech bloggers and scalability evangelists have been writing about HyperLogLog and related ideas recently; however, this work is principally derived from the following pieces of work:

  1. [http://github.com/sedictor/loglog](The GitHub repo of reference PHP and Javascript implementations of the LogLog and HyperLogLog algorithms by Vadim Semenov), from which this repository was originally forked.

  2. The paper by Philippe Flajolet, Éric Fusy, Olivier Gandouet and Frédéric Meunier entitled "HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm", available http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf as well as blob/master/HyperLogLog.pdf for your reference.

  3. (For future work) [http://hal.archives-ouvertes.fr/docs/00/46/53/13/PDF/sliding_HyperLogLog.pdf](a description of a minor HyperLogLog variation which provides for sliding windows of estimation)

About

Set cardinality estimates using HyperLogLog implementation

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • JavaScript 100.0%