diff --git a/docs/directory.conf b/docs/directory.conf
index d38f436e..bb91c683 100644
--- a/docs/directory.conf
+++ b/docs/directory.conf
@@ -6,7 +6,9 @@ laika.navigationOrder = [
   diet.md
   discrete.md
   disjointsets.md
+  hashmap.md
+  hashset.md
   predicate.md
   range.md
   set.md
-]
\ No newline at end of file
+]
diff --git a/docs/hashmap.md b/docs/hashmap.md
new file mode 100644
index 00000000..548dfd8a
--- /dev/null
+++ b/docs/hashmap.md
@@ -0,0 +1,193 @@
+# HashMap
+
+`HashMap` is an immutable hash map using [`cats.kernel.Hash`](https://typelevel.org/cats/api/cats/kernel/Hash.html) for hashing.
+It is implemented using the **CHAMP encoding** (Compressed Hash-Array Mapped Prefix Tree).
+
+CHAMP is an efficient persistent data structure that combines **bit-mapped indexing** and **structural sharing**, providing high performance for functional programming.
+
+---
+
+## Internal Representation
+
+The structure of `HashMap` is based on the observation that hash codes can be viewed as **prefix trees** of bits.
+Each key’s hash code is divided into 5-bit segments (for 32 possible branches).
+Each segment determines which path to take down the tree.
+
+For example, a 32-bit hash is broken into up to **7 segments** (since 7 × 5 bits = 35 bits, covering all bits of the hash).
+
+Each node in this tree (a **BitMapNode**) stores:
+- A *bitmap* indicating positions of key-value pairs.
+- Another *bitmap* indicating positions of child nodes.
+
+When two keys share the same hash, they are stored in a **CollisionNode**, which simply holds all colliding key-value pairs.
+
+This design ensures:
+- Efficient updates and lookups (`O(1)` average)
+- Memory efficiency through **bitmaps**
+- Full immutability and **structural sharing**
+
+---
+
+## Best and Worst Case Analysis
+
+| Case | Description | Space | Time Complexity |
+|------|--------------|--------|-----------------|
+| **Best Case** | Hashes are uniformly distributed (no collisions). The structure is shallow and branchless. | O(n) | O(1) average for lookup, insert, remove |
+| **Worst Case** | All keys share identical hash codes. Stored as a single `CollisionNode`. | O(n) | O(n) per lookup due to linear scan within collisions |
+
+In practice, using a good `Hash` instance (such as `Hash.fromUniversalHashCode`) avoids pathological cases.
+
+---
+
+## Supported Operations
+
+- `empty`: create an empty map
+- `apply`: create a map from key–value pairs
+- `fromSeq`: build from a Scala sequence
+- `fromIterableOnce`: build from any iterable collection
+- `fromFoldable`: build from a Cats Foldable
+- `contains`: test whether a key exists
+- `get`: get the value associated with a key
+- `getOrElse`: get the value or return a default
+- `updated`: add or update a key–value pair
+- `removed`: remove a key
+- `iterator`: iterate over key–value pairs
+- `keysIterator`: iterate over keys
+- `valuesIterator`: iterate over values
+- `===`: type-safe equality check using `Eq`
+- `hash`: compute a hash using `cats.kernel.Hash`
+- `show`: string representation using `cats.Show`
+
+---
+
+## `HashMap` is *showable* and *comparable* so you can call `show` or `===` on it.
+
+---
+
+## Example usage
+
+Start by creating an empty HashMap:
+
+```scala mdoc
+import cats._
+import cats.implicits._
+import cats.collections._
+
+val hm = HashMap.empty[Int, String]
+hm.isEmpty
+hm.show
+```
+
+Add some key-value pairs:
+
+```scala mdoc
+val hm2 = hm.updated(1, "One").updated(2, "Two")
+hm2.show
+```
+
+You can check for existence and get values:
+
+```scala mdoc
+hm2.contains(1)
+hm2.contains(3)
+
+hm2.get(1)
+hm2.getOrElse(3, "Unknown")
+```
+
+If we remove an element, we get a new map:
+
+```scala mdoc
+val hm3 = hm2.removed(1)
+hm3.show
+```
+
+Building a map directly:
+
+```scala mdoc
+val hm4 = HashMap(1 -> "A", 2 -> "B", 3 -> "C")
+hm4.show
+```
+
+Creating from a collection:
+
+```scala mdoc:nest
+val seqMap = HashMap.fromSeq(Seq(10 -> "X", 20 -> "Y", 30 -> "Z"))
+seqMap.contains(20)
+seqMap.get(30)
+```
+
+Using Cats abstractions:
+
+```scala mdoc:nest
+val doubled = seqMap.unorderedTraverse(v => Option(v + v))
+doubled.map(_.show)
+```
+
+---
+
+## Internal Visualization
+
+Consider inserting keys `1`, `2`, and `33` into an empty `HashMap`.
+
+Their (simplified) hash codes might look like this (in binary):
+
+```
+1 => 00001 00000 ...
+2 => 00010 00000 ...
+33 => 00001 00001 ...
+```
+
+- The **first 5 bits** decide the *branch position* at the root level.
+- Keys `1` and `33` share the prefix `00001`, so they go into the same branch.
+- Within that branch, the next 5 bits are compared:
+  - `1` continues at sub-index `00000`
+  - `33` continues at sub-index `00001`
+
+The structure becomes:
+
+```
+Root
+ ├── [00001] → Node
+ │   ├── [00000] = (1 -> "One")
+ │   └── [00001] = (33 -> "Thirty-Three")
+ └── [00010] = (2 -> "Two")
+```
+
+If `1` and `33` had identical hashes, they’d be stored together in a `CollisionNode`:
+```
+CollisionNode(hash=..., values=[(1 -> "One"), (33 -> "Thirty-Three")])
+```
+
+This structure allows **fast lookups** by traversing at most one path determined by hash segments.
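+
+The branch selection walked through above can be sketched in plain Scala. This is an illustration only, not the library's code: the names `HashSegments`, `BitsPerLevel`, `Mask` and `indexAt` are hypothetical, each key is treated as its own hash code (no hash mixing), and segments are assumed to be taken from the least-significant bits first, which is the order the simplified binary listing uses.
+
+```scala
+object HashSegments {
+  // Assumed constants: 5 bits per level, giving 32 branches per node.
+  val BitsPerLevel = 5
+  val Mask = (1 << BitsPerLevel) - 1 // 0x1f
+
+  // Branch index of `hash` at trie depth `depth` (0 = root),
+  // consuming the least-significant 5 bits first.
+  def indexAt(hash: Int, depth: Int): Int =
+    (hash >>> (BitsPerLevel * depth)) & Mask
+
+  def main(args: Array[String]): Unit = {
+    // Treat each key as its own hash code, as in the simplified example.
+    for (key <- List(1, 2, 33)) {
+      val path = (0 until 2).map(d => indexAt(key, d)).mkString(" -> ")
+      println(s"$key: $path")
+    }
+    // prints:
+    // 1: 1 -> 0
+    // 2: 2 -> 0
+    // 33: 1 -> 1
+  }
+}
+```
+
+Under these assumptions, keys `1` and `33` share root index 1 and only diverge at the second level, matching the tree drawn above.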
+
+---
+
+## Example of Equality and Hashing
+
+```scala mdoc
+import cats.kernel.instances.int._
+import cats.kernel.instances.string._
+
+val a = HashMap(1 -> "A", 2 -> "B")
+val b = HashMap(1 -> "A", 2 -> "B")
+
+a === b // true
+
+a.hash == b.hash // consistent hashing
+```
+
+---
+
+## Summary
+
+| Feature | Description |
+|----------|--------------|
+| **Type** | Immutable HashMap |
+| **Hashing** | Uses `cats.kernel.Hash` |
+| **Implementation** | CHAMP (Compressed Hash-Array Mapped Prefix Tree) |
+| **Handles** | Key collisions using `CollisionNode` |
+| **Typeclasses** | `Eq`, `Hash`, `Show`, `UnorderedTraverse`, `CommutativeMonoid` |
+| **Complexity** | O(1) average lookup/update |
+| **Immutable** | Yes — all updates return a new structure |
+
diff --git a/docs/hashset.md b/docs/hashset.md
new file mode 100644
index 00000000..901dc528
--- /dev/null
+++ b/docs/hashset.md
@@ -0,0 +1,194 @@
+# HashSet
+
+`HashSet` is an immutable hash set implemented with the **CHAMP** (Compressed Hash-Array Mapped Prefix-tree) encoding.
+It stores an unordered collection of unique elements of type `A` and relies on a `cats.kernel.Hash[A]` instance for hashing and typeclass-aware equality.
+
+This Cats Collections implementation is derived from Scala’s immutable `HashSet` and adapted to integrate with Cats typeclasses.
+
+The CHAMP trie splits 32-bit hashes into 5-bit partitions at successive depths, using compact bitmaps and small arrays to represent node contents. When many elements map to the same 5-bit segment (or to the same 32-bit hash), the implementation uses sub-nodes or a collision node to preserve correctness.
+
+---
+
+## Best and Worst Case Analysis
+
+- **Best case (well distributed hashes)**
+  Most operations — lookup (`contains`), insertion (`add`), deletion (`remove`) — run in expected **O(1)** time on average.
+
+- **Worst case (heavy hash collisions or adversarial hashes)**
+  Many elements colliding to the same 32-bit hash lead to a `CollisionNode` collection; operations can degrade toward **O(n)** for the colliding elements.
+
+Memory usage is efficient for broad distributions due to compact bitmaps and structural sharing; however collision nodes and copying on updates can increase memory footprint in degenerate cases.
+
+---
+
+## Supported Operations
+
+- `empty` — create an empty `HashSet`.
+- `apply(as: A*)` / `fromSeq` / `fromIterableOnce` / `fromFoldable` — construct a `HashSet` from collections.
+- `iterator` — one-time iterator over elements.
+- `size` — number of elements.
+- `isEmpty` / `nonEmpty` — emptiness checks.
+- `foreach(f)` — iterate for side effects.
+- `contains(value)` — membership test.
+- `add(value)` / `+` — return a new set with `value` added.
+- `remove(value)` / `-` — return a new set with `value` removed.
+- `union(set)` / `union(iterable)` — union of sets.
+- `diff(set)` / `diff(iterable)` — difference (this \ that).
+- `intersect(set)` — set intersection.
+- `filter(f)` / `filterNot(f)` — retain / drop elements by predicate.
+- `toSet` — convert to a standard Scala `Set` wrapper (`WrappedHashSet`).
+- `===(that)` — typesafe equality (uses `Eq` semantics).
+- `equals`, `hashCode`, `toString` — standard JVM-style operations.
+- `show` (via `Show` instance) — pretty printing using `cats.Show`.
+- `improve(hash: Int)` — hash mixing helper (private utility).
+
+---
+
+## `HashSet` is *showable* and integrates with Cats typeclasses
+
+`HashSet` supports and provides instances for several Cats and Cats-Kernel typeclasses:
+
+- `UnorderedFoldable[HashSet]`
+- `CommutativeMonoid[HashSet[A]]` (union)
+- `Show[HashSet[A]]`
+- `Hash[HashSet[A]]`
+- `DistributiveLattice[HashSet[A]]` (join = union, meet = intersection)
+
+There are also concrete monoid implementations:
+- `HashSetUnionMonoid[A]` — combines by union.
+- `HashSetIntersectionMonoid[A]` — combines by intersection.
+
+---
+
+## Internal structure (developer reference — Node API)
+
+### `HashSet.Node[A]` (abstract)
+
+Defines the common API for CHAMP nodes.
+
+- `allElementsCount`, `valueCount`, `nodeCount`, `size`
+- `getValue(index)`, `getNode(index)`
+- `hasNodes`, `hasValues`
+- `foreach(f)`, `contains(element, hash, depth)`, `add`, `remove`, `===`
+- `sizeHint`: approximation used for deletions
+
+### `CollisionNode[A]`
+
+Handles multiple elements sharing the same hash.
+Contains `collisionHash`, `contents: NonEmptyVector[A]`, and implements `contains`, `add`, `remove`, `===`.
+
+### `BitMapNode[A]`
+
+Main trie node with `valueMap`, `nodeMap`, and `contents: Array[Any]`.
+Provides value and node indexing, bitmap-based lookup, and efficient merges of subtrees.
+
+---
+
+## Iterator
+
+`HashSet.Iterator[A]` performs depth-first traversal without recursion, using fixed-size arrays for stack and cursor management.
+`hasNext` and `next()` follow standard Scala iterator semantics.
+
+---
+
+## Companion utilities and factories
+
+- `improve(hash: Int)` — hash mixing utility.
+- `empty[A]`, `apply[A](as: A*)`, `fromSeq`, `fromIterableOnce`, `fromFoldable` — constructors for creating `HashSet`s.
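+
+The bitmap-based lookup mentioned for `BitMapNode` above can be illustrated with a small standalone sketch. It is not the library's implementation: `BitmapIndexing`, `SketchNode` and `valueAt` are hypothetical names, the node holds only inline values (no child nodes), and the real layout of `contents` may differ. It shows only the general technique of using a population count over a bitmap to index a compact array.
+
+```scala
+object BitmapIndexing {
+  // Hypothetical, simplified node: a 32-bit bitmap marks which of the
+  // 32 branches hold a value; `values` stores only the occupied slots.
+  final case class SketchNode(valueMap: Int, values: Vector[String]) {
+    // Look up the value stored for a 5-bit branch index (0 to 31), if any.
+    def valueAt(branch: Int): Option[String] = {
+      val bit = 1 << branch
+      if ((valueMap & bit) == 0) None
+      else {
+        // The number of occupied branches below `branch` is the
+        // position of this value in the compact array.
+        val slot = Integer.bitCount(valueMap & (bit - 1))
+        Some(values(slot))
+      }
+    }
+  }
+
+  def main(args: Array[String]): Unit = {
+    // Branches 1, 4 and 9 are occupied; the array holds just three entries.
+    val node = SketchNode((1 << 1) | (1 << 4) | (1 << 9), Vector("a", "b", "c"))
+    println(node.valueAt(4)) // Some(b)
+    println(node.valueAt(5)) // None
+  }
+}
+```
+
+The point of the design is that an almost-empty node costs one `Int` plus its occupied slots rather than a 32-element array.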
+
+---
+
+## Typeclass instances provided (complete list)
+
+- `UnorderedFoldable[HashSet]`
+- `CommutativeMonoid[HashSet[A]]`
+- `Show[HashSet[A]]`
+- `Hash[HashSet[A]]`
+- `DistributiveLattice[HashSet[A]]`
+
+Monoid variants:
+- `HashSetUnionMonoid[A]`
+- `HashSetIntersectionMonoid[A]`
+
+---
+
+## Examples
+
+```scala mdoc
+import cats.collections._
+import cats.implicits._
+
+val s = HashSet.empty[Int]
+s.isEmpty
+s.size
+s.show
+```
+
+Add and check:
+
+```scala mdoc
+val s1 = s.add(10).add(5).add(20)
+s1.contains(10)
+s1.show
+```
+
+Remove elements:
+
+```scala mdoc
+val s2 = s1.remove(10)
+s2.show
+```
+
+Union / intersection / difference:
+
+```scala mdoc
+val a = HashSet(1, 2, 3)
+val b = HashSet(3, 4, 5)
+(a.union(b)).show
+(a.intersect(b)).show
+(a.union(b).diff(b)).show
+```
+
+Filter:
+
+```scala mdoc
+val u = a.union(b)
+u.filter(_ % 2 == 0).show
+u.filterNot(_ % 2 == 0).show
+```
+
+Constructors:
+
+```scala mdoc
+HashSet.fromSeq(Seq(1, 2, 3)).show
+HashSet.fromIterableOnce(Vector(10, 11, 10)).show
+HashSet.fromFoldable(Option(42)).show
+```
+
+Iterator and conversion:
+
+```scala mdoc
+val it = a.iterator
+it.hasNext
+it.next()
+a.toSet
+```
+
+Equality and show:
+
+```scala mdoc
+val x = HashSet(1,2,3)
+val y = HashSet(3,2,1)
+x === y
+x.hashCode
+x.show
+```
+
+---
+
+## Summary
+
+- `HashSet` is an immutable, CHAMP-based set implementation optimized for fast, type-safe hashing.
+- It supports full set algebra and integrates with Cats typeclasses.
+- Efficient, persistent structure with compact memory layout and safe updates.
+