From 4337d98c9fe8a1806de08f19a3b367aa5fc52447 Mon Sep 17 00:00:00 2001 From: sbarhin <102897404+sbarhin@users.noreply.github.com> Date: Thu, 23 Oct 2025 23:23:50 +0000 Subject: [PATCH 01/15] Create hashmap.md --- docs/hashmap.md | 1 + 1 file changed, 1 insertion(+) create mode 100644 docs/hashmap.md diff --git a/docs/hashmap.md b/docs/hashmap.md new file mode 100644 index 00000000..8b137891 --- /dev/null +++ b/docs/hashmap.md @@ -0,0 +1 @@ + From c019098424d2dba8375f6250607adb68094aac4d Mon Sep 17 00:00:00 2001 From: sbarhin <102897404+sbarhin@users.noreply.github.com> Date: Thu, 23 Oct 2025 23:24:23 +0000 Subject: [PATCH 02/15] Create hashset.md --- docs/hashset.md | 1 + 1 file changed, 1 insertion(+) create mode 100644 docs/hashset.md diff --git a/docs/hashset.md b/docs/hashset.md new file mode 100644 index 00000000..8b137891 --- /dev/null +++ b/docs/hashset.md @@ -0,0 +1 @@ + From 0e020cdd5d12ffe844536fdce1a9829dbc329a36 Mon Sep 17 00:00:00 2001 From: sbarhin <102897404+sbarhin@users.noreply.github.com> Date: Sat, 25 Oct 2025 22:53:42 +0000 Subject: [PATCH 03/15] Enhance HashMap documentation with usage and performance Added detailed documentation for HashMap, including its structure, usage examples, performance characteristics, and when to use it. --- docs/hashmap.md | 84 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 84 insertions(+) diff --git a/docs/hashmap.md b/docs/hashmap.md index 8b137891..bbe225fe 100644 --- a/docs/hashmap.md +++ b/docs/hashmap.md @@ -1 +1,85 @@ +# HashMap +`HashMap` is an **immutable**, high-performance map implementation in `cats.collections`. + + +It uses a hashing system from `cats.kernel.Hash` to keep track of its keys. Under the hood, it relies on a clever data structure called **CHAMP**. This setup helps it deliver fast lookups, efficient updates, and minimal memory overhead, all while preserving immutability. 
+ +## How CHAMP Powers HashMap + +**CHAMP** (Compressed Hash-Array Mapped Prefix-tree) is a modern trie-based design optimized for immutable hash tables. CHAMP uses: + +- **Bitmap-compressed nodes** to track occupied slots, reducing memory waste +- **5-bit hash chunks** to navigate a 32-ary trie (log₃₂ depth) which keeps the tree shallow for fast lookups and updates +- **Structural sharing** to reuse unchanged subtrees during updates, saving memory and time +- **Cache-friendly layouts** store data close together, making access faster + + + +## Usage +`HashMap[K, V]` stores key–value pairs: +- **K** = key (e.g., a name, ID, or lookup label) +- **V** = value (e.g., a score, setting, or associated data) + +### a. Create an empty HashMap + +```scala mdoc +import scala.collection.immutable.HashMap + +// Create an empty HashMap +val emptyScores = HashMap.empty[String, Int] +println(emptyScores) //HashMap() + +``` +### b. Add entries +```scala mdoc +val scores = HashMap("Alice" -> 95, "Bob" -> 88) +println(scores.size) // 2 +println(emptyScores ++ scores) //HashMap(Bob -> 88, Alice -> 95) +``` + +### c. Update value +```scala mdoc +val updateBobScore = scores.updated("Bob", 70) +println(updateBobScore.get("Bob")) // Some(70) +println(updateBobScore) //HashMap(Alice -> 95, Bob -> 70) +``` + +### d. Remove an entry +```scala mdoc +val withoutBob = scores.removed("Bob") +println(withoutBob.size) // 1 +println(withoutBob.contains("Bob")) //false +``` + +Every operation on an immutable HashMap creates a new instance — the original map (scores) is never changed. + +## Performance Characteristics + +- Fast operations: Lookups, inserts, updates, and deletes are all very quick. + +- Predictable speed: Performance stays consistent as your data grows. + +- Low memory use: Only stores what’s needed and shares unchanged parts when you make a new version. 
+ + + +## When to Use HashMap + +Prefer `HashMap` over Scala’s standard immutable `Map` when you: + +- Work in a **purely functional** codebase (e.g., with Cats, ZIO, or fs2) +- Need **frequent updates** without sacrificing performance +- Value **predictable memory usage** and **thread safety** via immutability +- Build interpreters, caches, or stateful pipelines that rely on persistent data + + +## References + +- Steindorfer, M. J. (2019). + **[Efficient Immutable Collections](https://michael.steindorfer.name/publications/phd-thesis-efficient-immutable-collections.pdf)**. + PhD Thesis, Vrije Universiteit Amsterdam. + + +- **Optimizing Hash-Array Mapped Tries for +Fast and Lean Immutable JVM Collections** - https://michael.steindorfer.name/publications/oopsla15.pdf?utm_source From a066fdc57d25c94acc436948c9272683a8ba413f Mon Sep 17 00:00:00 2001 From: sbarhin <102897404+sbarhin@users.noreply.github.com> Date: Sat, 25 Oct 2025 23:14:38 +0000 Subject: [PATCH 04/15] Enhance HashSet documentation with usage and features Added detailed documentation for HashSet, including its structure, usage examples, performance characteristics, and when to use it. --- docs/hashset.md | 89 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 89 insertions(+) diff --git a/docs/hashset.md b/docs/hashset.md index 8b137891..bd353628 100644 --- a/docs/hashset.md +++ b/docs/hashset.md @@ -1 +1,90 @@ +# HashSet +`HashSet` is an **immutable**, high-performance set implementation in `cats.collections`. + + +HashSet uses `cats.kernel.Hash` to hash its elements and is built on the **CHAMP** data structure. +This gives it fast lookups, efficient updates, and low memory use without any mutation + + +## How CHAMP Powers HashSet + +**CHAMP** (Compressed Hash-Array Mapped Prefix-tree) is a modern trie-based design optimized for immutable hash tables.
CHAMP uses: + +- **Bitmap-compressed nodes** to track occupied slots, reducing memory waste +- **5-bit hash chunks** to navigate a 32-ary trie (log₃₂ depth) which keeps the tree shallow for fast lookups and updates +- **Structural sharing** to reuse unchanged subtrees during updates, saving memory and time +- **Cache-friendly layouts** store data close together, making access faster + + + +## Usage +`HashSet[A]` holds unique values. No duplicates. No order. Immutable. + + +### a. Create an empty set + +```scala mdoc +import scala.collection.immutable.HashSet + +// Create an empty HashMap +val nofruits = HashSet.empty[String] +println(nofruits) + +``` +### b. Add items +```scala mdoc +val fruits = nofruits + "apple" + "banana" +println(fruits.size) // 2 +println(fruits) //HashSet(banana, apple) +``` + +### c. Check if an item is in the set +```scala mdoc +println(fruits.contains("apple")) // true +``` + +### d. Remove an item +```scala mdoc +val withoutApple = fruits - "apple" +println(withoutApple.size) // 1 +println(withoutApple) //HashSet(banana) +``` + +### e. Set Operations +```scala +val otherFruits = HashSet("cherry", "banana") +val union = fruits ++ otherFruits +println(union) // HashSet("apple", "banana", "cherry") +val intersection = fruits & otherFruits +println(intersection) // HashSet("banana") +val difference = fruits -- otherFruits +println(difference) // HashSet("apple") +``` + +## Performance Characteristics + +- Fast membership tests: Checking if an item is in the set takes near-constant time. +- Quick adds and removes: Adding or removing elements is efficient, thanks to structural sharing. +- Low memory footprint: Reuses unchanged parts of the set when you create a new version, no wasted space. +. 
+ + + +## When to Use HashSet + +Prefer `HashSet` over Scala’s standard immutable `Set` when you: + +- Work in a **purely functional** codebase (e.g., with Cats, ZIO, or fs2) +- Need **frequent updates** without sacrificing performance +- Value **predictable memory usage** and **thread safety** via immutability +- Build interpreters, caches, or stateful pipelines that rely on persistent data + + +## References + +- Steindorfer, M. J. (2019). + **[Efficient Immutable Collections](https://michael.steindorfer.name/publications/phd-thesis-efficient-immutable-collections.pdf)**. + PhD Thesis, Vrije Universiteit Amsterdam. + + **HashSet | Piotr Kosmowski** - https://kospiotr.github.io/docs/notes/development/data_structures/hash_set/?utm_source From c1e6e0be70e51fae66d335d358351c3bebef3ed2 Mon Sep 17 00:00:00 2001 From: sbarhin <102897404+sbarhin@users.noreply.github.com> Date: Sat, 25 Oct 2025 23:33:11 +0000 Subject: [PATCH 05/15] Update example for removing an entry in hashmap --- docs/hashmap.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/hashmap.md b/docs/hashmap.md index bbe225fe..0c76ef06 100644 --- a/docs/hashmap.md +++ b/docs/hashmap.md @@ -47,7 +47,7 @@ println(updateBobScore) //HashMap(Alice -> 95, Bob -> 70) ### d. Remove an entry ```scala mdoc -val withoutBob = scores.removed("Bob") +val withoutBob = scores - "Bob" println(withoutBob.size) // 1 println(withoutBob.contains("Bob")) //false ``` From 892ab36b2461707cdda82ef07f69d6ce89cb3ca0 Mon Sep 17 00:00:00 2001 From: sbarhin <102897404+sbarhin@users.noreply.github.com> Date: Mon, 27 Oct 2025 21:55:41 +0000 Subject: [PATCH 06/15] Modify HashMap documentation with Cats Collections Updated import statement and added Cats Collections reference. 
--- docs/hashmap.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/hashmap.md b/docs/hashmap.md index 0c76ef06..62557393 100644 --- a/docs/hashmap.md +++ b/docs/hashmap.md @@ -24,7 +24,7 @@ It uses a hashing system from `cats.kernel.Hash` to keep track of its keys. Unde ### a. Create an empty HashMap ```scala mdoc -import scala.collection.immutable.HashMap +import cats._, cats.implicits._, cats.collections._, cats.collections.syntax.all._ // Create an empty HashMap val emptyScores = HashMap.empty[String, Int] @@ -80,6 +80,7 @@ Prefer `HashMap` over Scala’s standard immutable `Map` when you: **[Efficient Immutable Collections](https://michael.steindorfer.name/publications/phd-thesis-efficient-immutable-collections.pdf)**. PhD Thesis, Vrije Universiteit Amsterdam. +- **Cats Collections** – https://typelevel.org/cats-collections/ - **Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections** - https://michael.steindorfer.name/publications/oopsla15.pdf?utm_source From afe7e2f52763cd688f61e01bb5995557eb74886b Mon Sep 17 00:00:00 2001 From: sbarhin <102897404+sbarhin@users.noreply.github.com> Date: Mon, 27 Oct 2025 21:56:50 +0000 Subject: [PATCH 07/15] Modify HashSet documentation and add Cats Collections link Updated import statement for HashSet and added a reference to Cats Collections. --- docs/hashset.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/hashset.md b/docs/hashset.md index bd353628..0ab1f570 100644 --- a/docs/hashset.md +++ b/docs/hashset.md @@ -25,7 +25,7 @@ This gives it fast lookups, efficient updates, and low memory use without any mu ### a.
Create an empty set ```scala mdoc -import scala.collection.immutable.HashSet +import cats._, cats.implicits._, cats.collections._, cats.collections.syntax.all._ // Create an empty HashMap val nofruits = HashSet.empty[String] @@ -87,4 +87,6 @@ Prefer `HashSet` over Scala’s standard immutable `Set` when you: **[Efficient Immutable Collections](https://michael.steindorfer.name/publications/phd-thesis-efficient-immutable-collections.pdf)**. PhD Thesis, Vrije Universiteit Amsterdam. +- **Cats Collections** – https://typelevel.org/cats-collections/ + **HashSet | Piotr Kosmowski** - https://kospiotr.github.io/docs/notes/development/data_structures/hash_set/?utm_source From 2ffd9638df1a18531eb8da9e887905ed32a60c92 Mon Sep 17 00:00:00 2001 From: sbarhin <102897404+sbarhin@users.noreply.github.com> Date: Mon, 27 Oct 2025 23:29:55 +0000 Subject: [PATCH 08/15] Add files via upload --- hashmap.md | 192 ++++++++++++++++++++++++++++++++++++++++++++++++++++ hashset.md | 193 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 385 insertions(+) create mode 100644 hashmap.md create mode 100644 hashset.md diff --git a/hashmap.md b/hashmap.md new file mode 100644 index 00000000..a9a0d3a4 --- /dev/null +++ b/hashmap.md @@ -0,0 +1,192 @@ +# HashMap + +`HashMap` is an immutable hash map using [`cats.kernel.Hash`](https://typelevel.org/cats/api/cats/kernel/Hash.html) for hashing. +It is implemented using the **CHAMP encoding** (Compressed Hash-Array Mapped Prefix Tree). + +CHAMP is an efficient persistent data structure that combines **bit-mapped indexing** and **structural sharing**, providing high performance for functional programming. + +--- + +## Internal Representation + +The structure of `HashMap` is based on the observation that hash codes can be viewed as **prefix trees** of bits. +Each key’s hash code is divided into 5-bit segments (for 32 possible branches). +Each segment determines which path to take down the tree. 
+ +For example, a 32-bit hash is broken into up to **7 segments** (6 × 5 = 30 bits, so a seventh, partial segment covers the remaining 2 bits). + +Each node in this tree (a **BitMapNode**) stores: +- A *bitmap* indicating positions of key-value pairs. +- Another *bitmap* indicating positions of child nodes. + +When two keys have identical 32-bit hashes, they are stored in a **CollisionNode**, which simply holds all colliding key-value pairs. + +This design ensures: +- Efficient updates and lookups (`O(1)` average) +- Memory efficiency through **bitmaps** +- Full immutability and **structural sharing** + +--- + +## Best and Worst Case Analysis + +| Case | Description | Space | Time Complexity | +|------|--------------|--------|-----------------| +| **Best Case** | Hashes are uniformly distributed (no collisions). The trie stays shallow. | O(n) | O(1) average for lookup, insert, remove | +| **Worst Case** | All keys share identical hash codes. Stored as a single `CollisionNode`. | O(n) | O(n) per lookup due to linear scan within collisions | + +In practice, a `Hash` instance that spreads hash codes well (for example `Hash.fromUniversalHashCode` for a type with a good `hashCode`) makes such pathological cases unlikely. + +--- + +## Supported Operations + +- `empty`: create an empty map +- `apply`: create a map from key–value pairs +- `fromSeq`: build from a Scala sequence +- `fromIterableOnce`: build from any iterable collection +- `fromFoldable`: build from a Cats Foldable +- `contains`: test whether a key exists +- `get`: get the value associated with a key +- `getOrElse`: get the value or return a default +- `updated`: add or update a key–value pair +- `removed`: remove a key +- `iterator`: iterate over key–value pairs +- `keysIterator`: iterate over keys +- `valuesIterator`: iterate over values +- `===`: type-safe equality check using `Eq` +- `hash`: compute a hash using `cats.kernel.Hash` +- `show`: string representation using `cats.Show` + +--- + +## `HashMap` is *showable* and *comparable* so you can call `show` or `===` on it.
+ +--- + +## Example usage + +Start by creating an empty HashMap: + +```scala mdoc +import cats._ +import cats.implicits._ +import cats.collections._ + +val hm = HashMap.empty[Int, String] +hm.isEmpty +hm.show +``` + +Add some key-value pairs: + +```scala mdoc +val hm2 = hm.updated(1, "One").updated(2, "Two") +hm2.show +``` + +You can check for existence and get values: + +```scala mdoc +hm2.contains(1) +hm2.contains(3) + +hm2.get(1) +hm2.getOrElse(3, "Unknown") +``` + +If we remove an element, we get a new map: + +```scala mdoc +val hm3 = hm2.removed(1) +hm3.show +``` + +Building a map directly: + +```scala mdoc +val hm4 = HashMap(1 -> "A", 2 -> "B", 3 -> "C") +hm4.show +``` + +Creating from a collection: + +```scala mdoc:nest +val seqMap = HashMap.fromSeq(Seq(10 -> "X", 20 -> "Y", 30 -> "Z")) +seqMap.contains(20) +seqMap.get(30) +``` + +Using Cats abstractions: + +```scala mdoc:nest +val doubled = seqMap.unorderedTraverse(v => Option(v + v)) +doubled.map(_.show) +``` + +--- + +## Internal Visualization + +Consider inserting keys `1`, `2`, and `33` into an empty `HashMap`. + +Their (simplified) hash codes might look like this (in binary): + +``` +1 => 00001 00000 ... +2 => 00010 00000 ... +33 => 00001 00001 ... +``` + +- The **first 5 bits** decide the *branch position* at the root level. +- Keys `1` and `33` share the prefix `00001`, so they go into the same branch. +- Within that branch, the next 5 bits are compared: + - `1` continues at sub-index `00000` + - `33` continues at sub-index `00001` + +The structure becomes: + +``` +Root + ├── [00001] → Node + │ ├── [00000] = (1 -> "One") + │ └── [00001] = (33 -> "Thirty-Three") + └── [00010] = (2 -> "Two") +``` + +If `1` and `33` had identical hashes, they’d be stored together in a `CollisionNode`: +``` +CollisionNode(hash=..., values=[(1 -> "One"), (33 -> "Thirty-Three")]) +``` + +This structure allows **fast lookups** by traversing at most one path determined by hash segments. 
+ +--- + +## Example of Equality and Hashing + +```scala mdoc +import cats.kernel.instances.int._ +import cats.kernel.instances.string._ + +val a = HashMap(1 -> "A", 2 -> "B") +val b = HashMap(1 -> "A", 2 -> "B") + +a === b // true + +a.hash == b.hash // consistent hashing +``` + +--- + +## Summary + +| Feature | Description | +|----------|--------------| +| **Type** | Immutable HashMap | +| **Hashing** | Uses `cats.kernel.Hash` | +| **Implementation** | CHAMP (Compressed Hash-Array Mapped Prefix Tree) | +| **Handles** | Key collisions using `CollisionNode` | +| **Typeclasses** | `Eq`, `Hash`, `Show`, `UnorderedTraverse`, `CommutativeMonoid` | +| **Complexity** | O(1) average lookup/update | +| **Immutable** | Yes — all updates return a new structure | diff --git a/hashset.md b/hashset.md new file mode 100644 index 00000000..30da505e --- /dev/null +++ b/hashset.md @@ -0,0 +1,193 @@ +# HashSet + +`HashSet` is an immutable hash set implemented with the **CHAMP** (Compressed Hash-Array Mapped Prefix-tree) encoding. +It stores an unordered collection of unique elements of type `A` and relies on a `cats.kernel.Hash[A]` instance for hashing and typeclass-aware equality. + +This Cats Collections implementation is derived from Scala’s immutable `HashSet` and adapted to integrate with Cats typeclasses. + +The CHAMP trie splits 32-bit hashes into 5-bit partitions at successive depths, using compact bitmaps and small arrays to represent node contents. When many elements map to the same 5-bit segment (or to the same 32-bit hash), the implementation uses sub-nodes or a collision node to preserve correctness. + +--- + +## Best and Worst Case Analysis + +- **Best case (well distributed hashes)** + Most operations — lookup (`contains`), insertion (`add`), deletion (`remove`) — run in expected **O(1)** time on average. 
+ +- **Worst case (heavy hash collisions or adversarial hashes)** + Many elements colliding to the same 32-bit hash lead to a `CollisionNode` collection; operations can degrade toward **O(n)** for the colliding elements. + +Memory usage is efficient for broad distributions due to compact bitmaps and structural sharing; however collision nodes and copying on updates can increase memory footprint in degenerate cases. + +--- + +## Supported Operations + +- `empty` — create an empty `HashSet`. +- `apply(as: A*)` / `fromSeq` / `fromIterableOnce` / `fromFoldable` — construct a `HashSet` from collections. +- `iterator` — one-time iterator over elements. +- `size` — number of elements. +- `isEmpty` / `nonEmpty` — emptiness checks. +- `foreach(f)` — iterate for side effects. +- `contains(value)` — membership test. +- `add(value)` / `+` — return a new set with `value` added. +- `remove(value)` / `-` — return a new set with `value` removed. +- `union(set)` / `union(iterable)` — union of sets. +- `diff(set)` / `diff(iterable)` — difference (this \ that). +- `intersect(set)` — set intersection. +- `filter(f)` / `filterNot(f)` — retain / drop elements by predicate. +- `toSet` — convert to a standard Scala `Set` wrapper (`WrappedHashSet`). +- `===(that)` — typesafe equality (uses `Eq` semantics). +- `equals`, `hashCode`, `toString` — standard JVM-style operations. +- `show` (via `Show` instance) — pretty printing using `cats.Show`. +- `improve(hash: Int)` — hash mixing helper (private utility). + +--- + +## `HashSet` is *showable* and integrates with Cats typeclasses + +`HashSet` supports and provides instances for several Cats and Cats-Kernel typeclasses: + +- `UnorderedFoldable[HashSet]` +- `CommutativeMonoid[HashSet[A]]` (union) +- `Show[HashSet[A]]` +- `Hash[HashSet[A]]` +- `DistributiveLattice[HashSet[A]]` (join = union, meet = intersection) + +There are also concrete monoid implementations: +- `HashSetUnionMonoid[A]` — combines by union. 
+- `HashSetIntersectionMonoid[A]` — combines by intersection. + +--- + +## Internal structure (developer reference — Node API) + +### `HashSet.Node[A]` (abstract) + +Defines the common API for CHAMP nodes. + +- `allElementsCount`, `valueCount`, `nodeCount`, `size` +- `getValue(index)`, `getNode(index)` +- `hasNodes`, `hasValues` +- `foreach(f)`, `contains(element, hash, depth)`, `add`, `remove`, `===` +- `sizeHint`: approximation used for deletions + +### `CollisionNode[A]` + +Handles multiple elements sharing the same hash. +Contains `collisionHash`, `contents: NonEmptyVector[A]`, and implements `contains`, `add`, `remove`, `===`. + +### `BitMapNode[A]` + +Main trie node with `valueMap`, `nodeMap`, and `contents: Array[Any]`. +Provides value and node indexing, bitmap-based lookup, and efficient merges of subtrees. + +--- + +## Iterator + +`HashSet.Iterator[A]` performs depth-first traversal without recursion, using fixed-size arrays for stack and cursor management. +`hasNext` and `next()` follow standard Scala iterator semantics. + +--- + +## Companion utilities and factories + +- `improve(hash: Int)` — hash mixing utility. +- `empty[A]`, `apply[A](as: A*)`, `fromSeq`, `fromIterableOnce`, `fromFoldable` — constructors for creating `HashSet`s. 
+ +--- + +## Typeclass instances provided (complete list) + +- `UnorderedFoldable[HashSet]` +- `CommutativeMonoid[HashSet[A]]` +- `Show[HashSet[A]]` +- `Hash[HashSet[A]]` +- `DistributiveLattice[HashSet[A]]` + +Monoid variants: +- `HashSetUnionMonoid[A]` +- `HashSetIntersectionMonoid[A]` + +--- + +## Examples + +```scala mdoc +import cats.collections._ +import cats.implicits._ + +val s = HashSet.empty[Int] +s.isEmpty +s.size +s.show +``` + +Add and check: + +```scala mdoc +val s1 = s.add(10).add(5).add(20) +s1.contains(10) +s1.show +``` + +Remove elements: + +```scala mdoc +val s2 = s1.remove(10) +s2.show +``` + +Union / intersection / difference: + +```scala mdoc +val a = HashSet(1, 2, 3) +val b = HashSet(3, 4, 5) +(a.union(b)).show +(a.intersect(b)).show +(a.union(b).diff(b)).show +``` + +Filter: + +```scala mdoc +val u = a.union(b) +u.filter(_ % 2 == 0).show +u.filterNot(_ % 2 == 0).show +``` + +Constructors: + +```scala mdoc +HashSet.fromSeq(Seq(1, 2, 3)).show +HashSet.fromIterableOnce(Vector(10, 11, 10)).show +HashSet.fromFoldable(Option(42)).show +``` + +Iterator and conversion: + +```scala mdoc +val it = a.iterator +it.hasNext +it.next() +a.toSet +``` + +Equality and show: + +```scala mdoc +val x = HashSet(1,2,3) +val y = HashSet(3,2,1) +x === y +x.hashCode +x.show +``` + +--- + +## Summary + +- `HashSet` is an immutable, CHAMP-based set implementation optimized for fast, type-safe hashing. +- It supports full set algebra and integrates with Cats typeclasses. +- Efficient, persistent structure with compact memory layout and safe updates. From 993c776ca30b11208d9c91e8b8f82d038bd9cef0 Mon Sep 17 00:00:00 2001 From: sbarhin <102897404+sbarhin@users.noreply.github.com> Date: Mon, 27 Oct 2025 23:37:39 +0000 Subject: [PATCH 09/15] Enhance HashMap documentation with detailed insights Expanded documentation on HashMap's internal representation, performance characteristics, and usage examples. 
Added sections on best and worst case analysis, supported operations, and when to use HashMap. --- docs/hashmap.md | 198 ++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 158 insertions(+), 40 deletions(-) diff --git a/docs/hashmap.md b/docs/hashmap.md index 62557393..a2dcfc04 100644 --- a/docs/hashmap.md +++ b/docs/hashmap.md @@ -1,77 +1,195 @@ # HashMap -`HashMap` is an **immutable**, high-performance map implementation in `cats.collections`. +`HashMap` is an immutable hash map using [`cats.kernel.Hash`](https://typelevel.org/cats/api/cats/kernel/Hash.html) for hashing. +It is implemented using the **CHAMP encoding** (Compressed Hash-Array Mapped Prefix Tree). +CHAMP is an efficient persistent data structure that combines **bit-mapped indexing** and **structural sharing**, providing high performance for functional programming. -It uses a hashing system from `cats.kernel.Hash` to keep track of its keys. Under the hood, it relies on a clever data structure called **CHAMP**. This setup helps it deliver fast lookups, efficient updates, and minimal memory overhead, all while preserving immutability. +--- -## How CHAMP Powers HashMap +## Internal Representation -**CHAMP** (Compressed Hash-Array Mapped Prefix-tree) is a modern trie-based design optimized for immutable hash tables. CHAMP uses: +The structure of `HashMap` is based on the observation that hash codes can be viewed as **prefix trees** of bits. +Each key’s hash code is divided into 5-bit segments (for 32 possible branches). +Each segment determines which path to take down the tree. 
-- -- **Bitmap-compressed nodes** to track occupied slots, reducing memory waste -- **5-bit hash chunks** to navigate a 32-ary trie (log₃₂ depth) which keeps the tree shallow for fast lookups and updates -- **Structural sharing** to reuse unchanged subtrees during updates, saving memory and time -- **Cache-friendly layouts** store data close together, making access faster +For example, a 32-bit hash is broken into up to **7 segments** (6 × 5 = 30 bits, so a seventh, partial segment covers the remaining 2 bits). +Each node in this tree (a **BitMapNode**) stores: +- A *bitmap* indicating positions of key-value pairs. +- Another *bitmap* indicating positions of child nodes. +When two keys have identical 32-bit hashes, they are stored in a **CollisionNode**, which simply holds all colliding key-value pairs. -## Usage -`HashMap[K, V]` stores key–value pairs: -- **K** = key (e.g., a name, ID, or lookup label) -- **V** = value (e.g., a score, setting, or associated data) +This design ensures: +- Efficient updates and lookups (`O(1)` average) +- Memory efficiency through **bitmaps** +- Full immutability and **structural sharing** -### a. Create an empty HashMap +--- + +## Best and Worst Case Analysis + +| Case | Description | Space | Time Complexity | +|------|--------------|--------|-----------------| +| **Best Case** | Hashes are uniformly distributed (no collisions). The trie stays shallow. | O(n) | O(1) average for lookup, insert, remove | +| **Worst Case** | All keys share identical hash codes. Stored as a single `CollisionNode`. | O(n) | O(n) per lookup due to linear scan within collisions | + +In practice, a `Hash` instance that spreads hash codes well (for example `Hash.fromUniversalHashCode` for a type with a good `hashCode`) makes such pathological cases unlikely.
+ +--- + +## Supported Operations + +- `empty`: create an empty map +- `apply`: create a map from key–value pairs +- `fromSeq`: build from a Scala sequence +- `fromIterableOnce`: build from any iterable collection +- `fromFoldable`: build from a Cats Foldable +- `contains`: test whether a key exists +- `get`: get the value associated with a key +- `getOrElse`: get the value or return a default +- `updated`: add or update a key–value pair +- `removed`: remove a key +- `iterator`: iterate over key–value pairs +- `keysIterator`: iterate over keys +- `valuesIterator`: iterate over values +- `===`: type-safe equality check using `Eq` +- `hash`: compute a hash using `cats.kernel.Hash` +- `show`: string representation using `cats.Show` + +--- + +## `HashMap` is *showable* and *comparable* so you can call `show` or `===` on it. + +--- + +## Example usage + +Start by creating an empty HashMap: ```scala mdoc -import cats._, cats.implicits._, cats.collections._, cats.collections.syntax.all._ +import cats._ +import cats.implicits._ +import cats.collections._ -// Create an empty HashMap -val emptyScores = HashMap.empty[String, Int] -println(emptyScores) //HashMap() +val hm = HashMap.empty[Int, String] +hm.isEmpty +hm.show +``` + +Add some key-value pairs: +```scala mdoc +val hm2 = hm.updated(1, "One").updated(2, "Two") +hm2.show ``` -### b. Add entries + +You can check for existence and get values: + ```scala mdoc -val scores = HashMap("Alice" -> 95, "Bob" -> 88) -println(scores.size) // 2 -println(emptyScores ++ scores) //HashMap(Bob -> 88, Alice -> 95) +hm2.contains(1) +hm2.contains(3) + +hm2.get(1) +hm2.getOrElse(3, "Unknown") ``` -### c. Update value +If we remove an element, we get a new map: + ```scala mdoc -val updateBobScore = scores.updated("Bob", 70) -println(updateBobScore.get("Bob")) // Some(70) -println(updateBobScore) //HashMap(Alice -> 95, Bob -> 70) +val hm3 = hm2.removed(1) +hm3.show ``` -### d. 
Remove an entry +Building a map directly: + ```scala mdoc -val withoutBob = scores - "Bob" -println(withoutBob.size) // 1 -println(withoutBob.contains("Bob")) //false +val hm4 = HashMap(1 -> "A", 2 -> "B", 3 -> "C") +hm4.show +``` + +Creating from a collection: + +```scala mdoc:nest +val seqMap = HashMap.fromSeq(Seq(10 -> "X", 20 -> "Y", 30 -> "Z")) +seqMap.contains(20) +seqMap.get(30) +``` + +Using Cats abstractions: + +```scala mdoc:nest +val doubled = seqMap.unorderedTraverse(v => Option(v + v)) +doubled.map(_.show) ``` -Every operation on an immutable HashMap creates a new instance — the original map (scores) is never changed. +--- -## Performance Characteristics +## Internal Visualization -- Fast operations: Lookups, inserts, updates, and deletes are all very quick. +Consider inserting keys `1`, `2`, and `33` into an empty `HashMap`. -- Predictable speed: Performance stays consistent as your data grows. +Their (simplified) hash codes might look like this (in binary): -- Low memory use: Only stores what’s needed and shares unchanged parts when you make a new version. +``` +1 => 00001 00000 ... +2 => 00010 00000 ... +33 => 00001 00001 ... +``` +- The **first 5 bits** decide the *branch position* at the root level. +- Keys `1` and `33` share the prefix `00001`, so they go into the same branch. +- Within that branch, the next 5 bits are compared: + - `1` continues at sub-index `00000` + - `33` continues at sub-index `00001` +The structure becomes: + +``` +Root + ├── [00001] → Node + │ ├── [00000] = (1 -> "One") + │ └── [00001] = (33 -> "Thirty-Three") + └── [00010] = (2 -> "Two") +``` + +If `1` and `33` had identical hashes, they’d be stored together in a `CollisionNode`: +``` +CollisionNode(hash=..., values=[(1 -> "One"), (33 -> "Thirty-Three")]) +``` + +This structure allows **fast lookups** by traversing at most one path determined by hash segments. 
+ +--- + +## Example of Equality and Hashing + +```scala mdoc +import cats.kernel.instances.int._ +import cats.kernel.instances.string._ + +val a = HashMap(1 -> "A", 2 -> "B") +val b = HashMap(1 -> "A", 2 -> "B") + +a === b // true + +a.hash == b.hash // consistent hashing +``` -## When to Use HashMap +--- -Prefer `HashMap` over Scala’s standard immutable `Map` when you: +## Summary -- Work in a **purely functional** codebase (e.g., with Cats, ZIO, or fs2) -- Need **frequent updates** without sacrificing performance -- Value **predictable memory usage** and **thread safety** via immutability -- Build interpreters, caches, or stateful pipelines that rely on persistent data +| Feature | Description | +|----------|--------------| +| **Type** | Immutable HashMap | +| **Hashing** | Uses `cats.kernel.Hash` | +| **Implementation** | CHAMP (Compressed Hash-Array Mapped Prefix Tree) | +| **Handles** | Key collisions using `CollisionNode` | +| **Typeclasses** | `Eq`, `Hash`, `Show`, `UnorderedTraverse`, `CommutativeMonoid` | +| **Complexity** | O(1) average lookup/update | +| **Immutable** | Yes — all updates return a new structure | ## References From 3b37c69e2597699ddcde4a27062942007958c429 Mon Sep 17 00:00:00 2001 From: sbarhin <102897404+sbarhin@users.noreply.github.com> Date: Mon, 27 Oct 2025 23:40:10 +0000 Subject: [PATCH 10/15] Enhance HashSet documentation with detailed explanations Expanded the documentation for HashSet, detailing its implementation, performance characteristics, and usage examples. --- docs/hashset.md | 205 +++++++++++++++++++++++++++++++++++++----------- 1 file changed, 159 insertions(+), 46 deletions(-) diff --git a/docs/hashset.md b/docs/hashset.md index 0ab1f570..f5acaa69 100644 --- a/docs/hashset.md +++ b/docs/hashset.md @@ -1,84 +1,197 @@ # HashSet -`HashSet` is an **immutable**, high-performance set implementation in `cats.collections`. 
+`HashSet` is an immutable hash set implemented with the **CHAMP** (Compressed Hash-Array Mapped Prefix-tree) encoding. +It stores an unordered collection of unique elements of type `A` and relies on a `cats.kernel.Hash[A]` instance for hashing and typeclass-aware equality. +This Cats Collections implementation is derived from Scala’s immutable `HashSet` and adapted to integrate with Cats typeclasses. -HashSet uses `cats.kernel.Hash` to hash its elements and is built on the **CHAMP** data structure. -This gives it fast lookups, efficient updates, and low memory use without any mutation +The CHAMP trie splits 32-bit hashes into 5-bit partitions at successive depths, using compact bitmaps and small arrays to represent node contents. When many elements map to the same 5-bit segment (or to the same 32-bit hash), the implementation uses sub-nodes or a collision node to preserve correctness. +--- -## How CHAMP Powers HashSet +## Best and Worst Case Analysis -**CHAMP** (Compressed Hash-Array Mapped Prefix-tree) is a modern trie-based design optimized for immutable hash tables. CHAMP uses: +- **Best case (well distributed hashes)** + Most operations — lookup (`contains`), insertion (`add`), deletion (`remove`) — run in expected **O(1)** time on average. -- **Bitmap-compressed nodes** to track occupied slots, reducing memory waste -- **5-bit hash chunks** to navigate a 32-ary trie (log₃₂ depth) which keeps the tree shallow for fast lookups and updates -- **Structural sharing** to reuse unchanged subtrees during updates, saving memory and time -- **Cache-friendly layouts** store data close together, making access faster +- **Worst case (heavy hash collisions or adversarial hashes)** + Many elements colliding to the same 32-bit hash lead to a `CollisionNode` collection; operations can degrade toward **O(n)** for the colliding elements. 
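The worst case can be illustrated with a hypothetical collision bucket written in plain Scala. `CollisionBucketSketch` is an illustrative name and not the library's `CollisionNode`. The sketch stores a `Vector` rather than a `NonEmptyVector` and uses Scala's universal `==` where the library would use the equality from the `Hash[A]` instance. It shows why lookups inside one bucket are linear: once every element has the same full hash, no remaining hash bits can separate them.

```scala
// Hypothetical model of a collision bucket: every element here shares one
// full 32-bit hash, so membership falls back to a linear equality scan.
final case class CollisionBucketSketch[A](collisionHash: Int, contents: Vector[A]) {
  // O(n) in the number of colliding elements.
  def contains(a: A): Boolean = contents.contains(a)

  // Returns a new bucket; the receiver is never mutated.
  def add(a: A): CollisionBucketSketch[A] =
    if (contains(a)) this else copy(contents = contents :+ a)

  def remove(a: A): CollisionBucketSketch[A] =
    copy(contents = contents.filterNot(_ == a))
}

object CollisionDemo {
  def main(args: Array[String]): Unit = {
    // Pretend every key hashes to 0 (a degenerate or adversarial Hash).
    val empty = CollisionBucketSketch[String](0, Vector.empty)
    val bucket = List("a", "b", "c", "b").foldLeft(empty)((acc, k) => acc.add(k))
    println(bucket.contents)             // Vector(a, b, c)
    println(bucket.contains("c"))        // true
    println(bucket.remove("b").contents) // Vector(a, c)
  }
}
```

Correctness is preserved even in this degenerate case. Only performance degrades, which is why a well-distributed `Hash` instance matters.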
+Memory usage is efficient for broad distributions due to compact bitmaps and structural sharing; however, collision nodes and copying on updates can increase the memory footprint in degenerate cases. +--- -## Usage -`HashSet[A]` holds unique values. No duplicates. No order. Immutable. +## Supported Operations +- `empty` — create an empty `HashSet`. +- `apply(as: A*)` / `fromSeq` / `fromIterableOnce` / `fromFoldable` — construct a `HashSet` from collections. +- `iterator` — one-time iterator over elements. +- `size` — number of elements. +- `isEmpty` / `nonEmpty` — emptiness checks. +- `foreach(f)` — iterate for side effects. +- `contains(value)` — membership test. +- `add(value)` / `+` — return a new set with `value` added. +- `remove(value)` / `-` — return a new set with `value` removed. +- `union(set)` / `union(iterable)` — union of sets. +- `diff(set)` / `diff(iterable)` — difference (this \ that). +- `intersect(set)` — set intersection. +- `filter(f)` / `filterNot(f)` — retain / drop elements by predicate. +- `toSet` — convert to a standard Scala `Set` wrapper (`WrappedHashSet`). +- `===(that)` — type-safe equality (uses `Eq` semantics). +- `equals`, `hashCode`, `toString` — standard JVM-style operations. +- `show` (via `Show` instance) — pretty printing using `cats.Show`. +- `improve(hash: Int)` — hash mixing helper (private utility). -### a. Create an empty set +--- + +## `HashSet` is *showable* and integrates with Cats typeclasses + +`HashSet` supports and provides instances for several Cats and Cats-Kernel typeclasses: + +- `UnorderedFoldable[HashSet]` +- `CommutativeMonoid[HashSet[A]]` (union) +- `Show[HashSet[A]]` +- `Hash[HashSet[A]]` +- `DistributiveLattice[HashSet[A]]` (join = union, meet = intersection) + +There are also concrete monoid implementations: +- `HashSetUnionMonoid[A]` — combines by union. +- `HashSetIntersectionMonoid[A]` — combines by intersection.
+ +--- + +## Internal structure (developer reference — Node API) + +### `HashSet.Node[A]` (abstract) + +Defines the common API for CHAMP nodes. + +- `allElementsCount`, `valueCount`, `nodeCount`, `size` +- `getValue(index)`, `getNode(index)` +- `hasNodes`, `hasValues` +- `foreach(f)`, `contains(element, hash, depth)`, `add`, `remove`, `===` +- `sizeHint`: approximation used for deletions + +### `CollisionNode[A]` + +Handles multiple elements sharing the same hash. +Contains `collisionHash`, `contents: NonEmptyVector[A]`, and implements `contains`, `add`, `remove`, `===`. + +### `BitMapNode[A]` + +Main trie node with `valueMap`, `nodeMap`, and `contents: Array[Any]`. +Provides value and node indexing, bitmap-based lookup, and efficient merges of subtrees. + +--- + +## Iterator + +`HashSet.Iterator[A]` performs depth-first traversal without recursion, using fixed-size arrays for stack and cursor management. +`hasNext` and `next()` follow standard Scala iterator semantics. + +--- + +## Companion utilities and factories + +- `improve(hash: Int)` — hash mixing utility. +- `empty[A]`, `apply[A](as: A*)`, `fromSeq`, `fromIterableOnce`, `fromFoldable` — constructors for creating `HashSet`s. + +--- + +## Typeclass instances provided (complete list) + +- `UnorderedFoldable[HashSet]` +- `CommutativeMonoid[HashSet[A]]` +- `Show[HashSet[A]]` +- `Hash[HashSet[A]]` +- `DistributiveLattice[HashSet[A]]` + +Monoid variants: +- `HashSetUnionMonoid[A]` +- `HashSetIntersectionMonoid[A]` + +--- + +## Examples ```scala mdoc -import cats._, cats.implicits._, cats.collections._, cats.collections.syntax.all._ +import cats.collections._ +import cats.implicits._ -// Create an empty HashMap -val nofruits = HashSet.empty[String] -println(nofruits) +val s = HashSet.empty[Int] +s.isEmpty +s.size +s.show +``` + +Add and check: +```scala mdoc +val s1 = s.add(10).add(5).add(20) +s1.contains(10) +s1.show ``` -### b. 
Add items + +Remove elements: + ```scala mdoc -val fruits = nofruits + "apple" + "banana" -println(fruits.size) // 2 -println(fruits) //HashSet(banana, apple) +val s2 = s1.remove(10) +s2.show ``` -### c. Check if an item is in the set +Union / intersection / difference: + ```scala mdoc -println(fruits.contains("apple")) // true +val a = HashSet(1, 2, 3) +val b = HashSet(3, 4, 5) +(a.union(b)).show +(a.intersect(b)).show +(a.union(b).diff(b)).show ``` -### d. Remove an item +Filter: + ```scala mdoc -val withoutApple = fruits - "apple" -println(withoutApple.size) // 1 -println(withoutApple) //HashSet(banana) +val u = a.union(b) +u.filter(_ % 2 == 0).show +u.filterNot(_ % 2 == 0).show ``` -### e. Set Operations -```scala -val otherFruits = HashSet("cherry", "banana") -val union = fruits ++ otherFruits -println(union) // HashSet("apple", "banana", "cherry") -val intersection = fruits & otherFruits -println(intersection) // HashSet("banana") -val difference = fruits -- otherFruits -println(difference) // HashSet("apple") +Constructors: + +```scala mdoc +HashSet.fromSeq(Seq(1, 2, 3)).show +HashSet.fromIterableOnce(Vector(10, 11, 10)).show +HashSet.fromFoldable(Option(42)).show ``` -## Performance Characteristics +Iterator and conversion: -- Fast membership tests: Checking if an item is in the set takes near-constant time. -- Quick adds and removes: Adding or removing elements is efficient, thanks to structural sharing. -- Low memory footprint: Reuses unchanged parts of the set when you create a new version, no wasted space. -. +```scala mdoc +val it = a.iterator +it.hasNext +it.next() +a.toSet +``` + +Equality and show: +```scala mdoc +val x = HashSet(1,2,3) +val y = HashSet(3,2,1) +x === y +x.hashCode +x.show +``` +--- -## When to Use HashSet +## Summary -Prefer `HashSet` over Scala’s standard immutable `Set` when you: +- `HashSet` is an immutable, CHAMP-based set implementation optimized for fast, type-safe hashing. 
+- It supports full set algebra and integrates with Cats typeclasses. +- Efficient, persistent structure with compact memory layout and safe updates. -- Work in a **purely functional** codebase (e.g., with Cats, ZIO, or fs2) -- Need **frequent updates** without sacrificing performance -- Value **predictable memory usage** and **thread safety** via immutability -- Build interpreters, caches, or stateful pipelines that rely on persistent data ## References From 034ebf17ed5f62a4bcb3d5ed82cc920a56af9f39 Mon Sep 17 00:00:00 2001 From: sbarhin <102897404+sbarhin@users.noreply.github.com> Date: Mon, 27 Oct 2025 23:46:06 +0000 Subject: [PATCH 11/15] Update hashmap.md --- docs/hashmap.md | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/docs/hashmap.md b/docs/hashmap.md index a2dcfc04..548dfd8a 100644 --- a/docs/hashmap.md +++ b/docs/hashmap.md @@ -191,14 +191,3 @@ a.hash == b.hash // consistent hashing | **Complexity** | O(1) average lookup/update | | **Immutable** | Yes — all updates return a new structure | - -## References - -- Steindorfer, M. J. (2019). - **[Efficient Immutable Collections](https://michael.steindorfer.name/publications/phd-thesis-efficient-immutable-collections.pdf)**. - PhD Thesis, Vrije Universiteit Amsterdam. - -- **Cats Collections** – https://typelevel.org/cats-collections/ - -- **ptimizing Hash-Array Mapped Tries for -Fast and Lean Immutable JVM Collections** - https://michael.steindorfer.name/publications/oopsla15.pdf?utm_source From a5a11402cc81fb99a632ee9d7040bc3ad835e6a8 Mon Sep 17 00:00:00 2001 From: sbarhin <102897404+sbarhin@users.noreply.github.com> Date: Mon, 27 Oct 2025 23:46:46 +0000 Subject: [PATCH 12/15] Update hashset.md Removed references section from HashSet documentation. 
--- docs/hashset.md | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/docs/hashset.md b/docs/hashset.md index f5acaa69..901dc528 100644 --- a/docs/hashset.md +++ b/docs/hashset.md @@ -192,14 +192,3 @@ x.show - It supports full set algebra and integrates with Cats typeclasses. - Efficient, persistent structure with compact memory layout and safe updates. - - -## References - -- Steindorfer, M. J. (2019). - **[Efficient Immutable Collections](https://michael.steindorfer.name/publications/phd-thesis-efficient-immutable-collections.pdf)**. - PhD Thesis, Vrije Universiteit Amsterdam. - -- **Cats Collections** – https://typelevel.org/cats-collections/ - - **HashSet | Piotr Kosmowski** - https://kospiotr.github.io/docs/notes/development/data_structures/hash_set/?utm_source From c0883eb7dbc9ec3c3db20c119aacc1c566673c97 Mon Sep 17 00:00:00 2001 From: sbarhin <102897404+sbarhin@users.noreply.github.com> Date: Mon, 27 Oct 2025 23:55:17 +0000 Subject: [PATCH 13/15] Delete hashmap.md --- hashmap.md | 192 ----------------------------------------------------- 1 file changed, 192 deletions(-) delete mode 100644 hashmap.md diff --git a/hashmap.md b/hashmap.md deleted file mode 100644 index a9a0d3a4..00000000 --- a/hashmap.md +++ /dev/null @@ -1,192 +0,0 @@ -# HashMap - -`HashMap` is an immutable hash map using [`cats.kernel.Hash`](https://typelevel.org/cats/api/cats/kernel/Hash.html) for hashing. -It is implemented using the **CHAMP encoding** (Compressed Hash-Array Mapped Prefix Tree). - -CHAMP is an efficient persistent data structure that combines **bit-mapped indexing** and **structural sharing**, providing high performance for functional programming. - ---- - -## Internal Representation - -The structure of `HashMap` is based on the observation that hash codes can be viewed as **prefix trees** of bits. -Each key’s hash code is divided into 5-bit segments (for 32 possible branches). -Each segment determines which path to take down the tree. 
- -For example, a 32-bit hash is broken into up to **7 segments** (since 7 × 5 bits = 35 bits, covering all bits of the hash). - -Each node in this tree (a **BitMapNode**) stores: -- A *bitmap* indicating positions of key-value pairs. -- Another *bitmap* indicating positions of child nodes. - -When two keys share the same hash, they are stored in a **CollisionNode**, which simply holds all colliding key-value pairs. - -This design ensures: -- Efficient updates and lookups (`O(1)` average) -- Memory efficiency through **bitmaps** -- Full immutability and **structural sharing** - ---- - -## Best and Worst Case Analysis - -| Case | Description | Space | Time Complexity | -|------|--------------|--------|-----------------| -| **Best Case** | Hashes are uniformly distributed (no collisions). The structure is shallow and branchless. | O(n) | O(1) average for lookup, insert, remove | -| **Worst Case** | All keys share identical hash codes. Stored as a single `CollisionNode`. | O(n) | O(n) per lookup due to linear scan within collisions | - -In practice, using a good `Hash` instance (such as `Hash.fromUniversalHashCode`) avoids pathological cases. - ---- - -## Supported Operations - -- `empty`: create an empty map -- `apply`: create a map from key–value pairs -- `fromSeq`: build from a Scala sequence -- `fromIterableOnce`: build from any iterable collection -- `fromFoldable`: build from a Cats Foldable -- `contains`: test whether a key exists -- `get`: get the value associated with a key -- `getOrElse`: get the value or return a default -- `updated`: add or update a key–value pair -- `removed`: remove a key -- `iterator`: iterate over key–value pairs -- `keysIterator`: iterate over keys -- `valuesIterator`: iterate over values -- `===`: type-safe equality check using `Eq` -- `hash`: compute a hash using `cats.kernel.Hash` -- `show`: string representation using `cats.Show` - ---- - -## `HashMap` is *showable* and *comparable* so you can call `show` or `===` on it. 
- ---- - -## Example usage - -Start by creating an empty HashMap: - -```scala mdoc -import cats._ -import cats.implicits._ -import cats.collections._ - -val hm = HashMap.empty[Int, String] -hm.isEmpty -hm.show -``` - -Add some key-value pairs: - -```scala mdoc -val hm2 = hm.updated(1, "One").updated(2, "Two") -hm2.show -``` - -You can check for existence and get values: - -```scala mdoc -hm2.contains(1) -hm2.contains(3) - -hm2.get(1) -hm2.getOrElse(3, "Unknown") -``` - -If we remove an element, we get a new map: - -```scala mdoc -val hm3 = hm2.removed(1) -hm3.show -``` - -Building a map directly: - -```scala mdoc -val hm4 = HashMap(1 -> "A", 2 -> "B", 3 -> "C") -hm4.show -``` - -Creating from a collection: - -```scala mdoc:nest -val seqMap = HashMap.fromSeq(Seq(10 -> "X", 20 -> "Y", 30 -> "Z")) -seqMap.contains(20) -seqMap.get(30) -``` - -Using Cats abstractions: - -```scala mdoc:nest -val doubled = seqMap.unorderedTraverse(v => Option(v + v)) -doubled.map(_.show) -``` - ---- - -## Internal Visualization - -Consider inserting keys `1`, `2`, and `33` into an empty `HashMap`. - -Their (simplified) hash codes might look like this (in binary): - -``` -1 => 00001 00000 ... -2 => 00010 00000 ... -33 => 00001 00001 ... -``` - -- The **first 5 bits** decide the *branch position* at the root level. -- Keys `1` and `33` share the prefix `00001`, so they go into the same branch. -- Within that branch, the next 5 bits are compared: - - `1` continues at sub-index `00000` - - `33` continues at sub-index `00001` - -The structure becomes: - -``` -Root - ├── [00001] → Node - │ ├── [00000] = (1 -> "One") - │ └── [00001] = (33 -> "Thirty-Three") - └── [00010] = (2 -> "Two") -``` - -If `1` and `33` had identical hashes, they’d be stored together in a `CollisionNode`: -``` -CollisionNode(hash=..., values=[(1 -> "One"), (33 -> "Thirty-Three")]) -``` - -This structure allows **fast lookups** by traversing at most one path determined by hash segments. 
- ---- - -## Example of Equality and Hashing - -```scala mdoc -import cats.kernel.instances.int._ -import cats.kernel.instances.string._ - -val a = HashMap(1 -> "A", 2 -> "B") -val b = HashMap(1 -> "A", 2 -> "B") - -a === b // true - -a.hash == b.hash // consistent hashing -``` - ---- - -## Summary - -| Feature | Description | -|----------|--------------| -| **Type** | Immutable HashMap | -| **Hashing** | Uses `cats.kernel.Hash` | -| **Implementation** | CHAMP (Compressed Hash-Array Mapped Prefix Tree) | -| **Handles** | Key collisions using `CollisionNode` | -| **Typeclasses** | `Eq`, `Hash`, `Show`, `UnorderedTraverse`, `CommutativeMonoid` | -| **Complexity** | O(1) average lookup/update | -| **Immutable** | Yes — all updates return a new structure | From 12e6e9bcb98c89e513665e458262d90e9e2e19fa Mon Sep 17 00:00:00 2001 From: sbarhin <102897404+sbarhin@users.noreply.github.com> Date: Mon, 27 Oct 2025 23:55:44 +0000 Subject: [PATCH 14/15] Delete hashset.md --- hashset.md | 193 ----------------------------------------------------- 1 file changed, 193 deletions(-) delete mode 100644 hashset.md diff --git a/hashset.md b/hashset.md deleted file mode 100644 index 30da505e..00000000 --- a/hashset.md +++ /dev/null @@ -1,193 +0,0 @@ -# HashSet - -`HashSet` is an immutable hash set implemented with the **CHAMP** (Compressed Hash-Array Mapped Prefix-tree) encoding. -It stores an unordered collection of unique elements of type `A` and relies on a `cats.kernel.Hash[A]` instance for hashing and typeclass-aware equality. - -This Cats Collections implementation is derived from Scala’s immutable `HashSet` and adapted to integrate with Cats typeclasses. - -The CHAMP trie splits 32-bit hashes into 5-bit partitions at successive depths, using compact bitmaps and small arrays to represent node contents. When many elements map to the same 5-bit segment (or to the same 32-bit hash), the implementation uses sub-nodes or a collision node to preserve correctness. 
- ---- - -## Best and Worst Case Analysis - -- **Best case (well distributed hashes)** - Most operations — lookup (`contains`), insertion (`add`), deletion (`remove`) — run in expected **O(1)** time on average. - -- **Worst case (heavy hash collisions or adversarial hashes)** - Many elements colliding to the same 32-bit hash lead to a `CollisionNode` collection; operations can degrade toward **O(n)** for the colliding elements. - -Memory usage is efficient for broad distributions due to compact bitmaps and structural sharing; however collision nodes and copying on updates can increase memory footprint in degenerate cases. - ---- - -## Supported Operations - -- `empty` — create an empty `HashSet`. -- `apply(as: A*)` / `fromSeq` / `fromIterableOnce` / `fromFoldable` — construct a `HashSet` from collections. -- `iterator` — one-time iterator over elements. -- `size` — number of elements. -- `isEmpty` / `nonEmpty` — emptiness checks. -- `foreach(f)` — iterate for side effects. -- `contains(value)` — membership test. -- `add(value)` / `+` — return a new set with `value` added. -- `remove(value)` / `-` — return a new set with `value` removed. -- `union(set)` / `union(iterable)` — union of sets. -- `diff(set)` / `diff(iterable)` — difference (this \ that). -- `intersect(set)` — set intersection. -- `filter(f)` / `filterNot(f)` — retain / drop elements by predicate. -- `toSet` — convert to a standard Scala `Set` wrapper (`WrappedHashSet`). -- `===(that)` — typesafe equality (uses `Eq` semantics). -- `equals`, `hashCode`, `toString` — standard JVM-style operations. -- `show` (via `Show` instance) — pretty printing using `cats.Show`. -- `improve(hash: Int)` — hash mixing helper (private utility). 
- ---- - -## `HashSet` is *showable* and integrates with Cats typeclasses - -`HashSet` supports and provides instances for several Cats and Cats-Kernel typeclasses: - -- `UnorderedFoldable[HashSet]` -- `CommutativeMonoid[HashSet[A]]` (union) -- `Show[HashSet[A]]` -- `Hash[HashSet[A]]` -- `DistributiveLattice[HashSet[A]]` (join = union, meet = intersection) - -There are also concrete monoid implementations: -- `HashSetUnionMonoid[A]` — combines by union. -- `HashSetIntersectionMonoid[A]` — combines by intersection. - ---- - -## Internal structure (developer reference — Node API) - -### `HashSet.Node[A]` (abstract) - -Defines the common API for CHAMP nodes. - -- `allElementsCount`, `valueCount`, `nodeCount`, `size` -- `getValue(index)`, `getNode(index)` -- `hasNodes`, `hasValues` -- `foreach(f)`, `contains(element, hash, depth)`, `add`, `remove`, `===` -- `sizeHint`: approximation used for deletions - -### `CollisionNode[A]` - -Handles multiple elements sharing the same hash. -Contains `collisionHash`, `contents: NonEmptyVector[A]`, and implements `contains`, `add`, `remove`, `===`. - -### `BitMapNode[A]` - -Main trie node with `valueMap`, `nodeMap`, and `contents: Array[Any]`. -Provides value and node indexing, bitmap-based lookup, and efficient merges of subtrees. - ---- - -## Iterator - -`HashSet.Iterator[A]` performs depth-first traversal without recursion, using fixed-size arrays for stack and cursor management. -`hasNext` and `next()` follow standard Scala iterator semantics. - ---- - -## Companion utilities and factories - -- `improve(hash: Int)` — hash mixing utility. -- `empty[A]`, `apply[A](as: A*)`, `fromSeq`, `fromIterableOnce`, `fromFoldable` — constructors for creating `HashSet`s. 
- ---- - -## Typeclass instances provided (complete list) - -- `UnorderedFoldable[HashSet]` -- `CommutativeMonoid[HashSet[A]]` -- `Show[HashSet[A]]` -- `Hash[HashSet[A]]` -- `DistributiveLattice[HashSet[A]]` - -Monoid variants: -- `HashSetUnionMonoid[A]` -- `HashSetIntersectionMonoid[A]` - ---- - -## Examples - -```scala mdoc -import cats.collections._ -import cats.implicits._ - -val s = HashSet.empty[Int] -s.isEmpty -s.size -s.show -``` - -Add and check: - -```scala mdoc -val s1 = s.add(10).add(5).add(20) -s1.contains(10) -s1.show -``` - -Remove elements: - -```scala mdoc -val s2 = s1.remove(10) -s2.show -``` - -Union / intersection / difference: - -```scala mdoc -val a = HashSet(1, 2, 3) -val b = HashSet(3, 4, 5) -(a.union(b)).show -(a.intersect(b)).show -(a.union(b).diff(b)).show -``` - -Filter: - -```scala mdoc -val u = a.union(b) -u.filter(_ % 2 == 0).show -u.filterNot(_ % 2 == 0).show -``` - -Constructors: - -```scala mdoc -HashSet.fromSeq(Seq(1, 2, 3)).show -HashSet.fromIterableOnce(Vector(10, 11, 10)).show -HashSet.fromFoldable(Option(42)).show -``` - -Iterator and conversion: - -```scala mdoc -val it = a.iterator -it.hasNext -it.next() -a.toSet -``` - -Equality and show: - -```scala mdoc -val x = HashSet(1,2,3) -val y = HashSet(3,2,1) -x === y -x.hashCode -x.show -``` - ---- - -## Summary - -- `HashSet` is an immutable, CHAMP-based set implementation optimized for fast, type-safe hashing. -- It supports full set algebra and integrates with Cats typeclasses. -- Efficient, persistent structure with compact memory layout and safe updates. 
From b69183e51e0e5b902e2c593849e087c876098be6 Mon Sep 17 00:00:00 2001 From: sbarhin <102897404+sbarhin@users.noreply.github.com> Date: Tue, 28 Oct 2025 11:54:11 +0000 Subject: [PATCH 15/15] Add hashmap.md and hashset.md to directory.conf --- docs/directory.conf | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/directory.conf b/docs/directory.conf index d38f436e..bb91c683 100644 --- a/docs/directory.conf +++ b/docs/directory.conf @@ -6,7 +6,9 @@ laika.navigationOrder = [ diet.md discrete.md disjointsets.md + hashmap.md + hashset.md predicate.md range.md set.md -] \ No newline at end of file +]