Merge pull request #2 from wbrown/kdcache

KD Cache, fixes, optimizations, cleanup
wbrown · Aug 5, 2024 · 686acf1
2 parents 7c3a21e + 6c2ddfb
commit 686acf1
Showing 11 changed files with 635 additions and 221 deletions.
diff --git a/README.md b/README.md
@@ -1,40 +1,136 @@
 img2ansi
 ========
-Image to ANSI conversion:
-
-Major features:
-* Modified Atkinson dithering
-* Edge detection
-* ANSI compression
-* Subcharacter block rendering with 2 colors per character.
-* Separate foreground and background palette for terminals that support it.
-* Color quantization
-* Maximum file size targeting
-* For IRC: line limits
 
+## Block-Based ANSI Art Dithering Algorithm (Brown Dithering Algorithm)
+
+This project implements a unique dithering algorithm specifically designed
+for converting images into ANSI art. Unlike traditional dithering methods,
+my approach uses a block-based processing technique optimized for terminal
+and text-based display.
+
+## Key Features
+
+1. **Block-Based Processing**: Operates on 2x2 pixel blocks instead of
+   individual pixels, allowing for more complex patterns within a single
+   character cell.
+
+2. **ANSI Color Quantization**: Utilizes a specialized color quantization
+   scheme tailored for the ANSI color palette, ensuring optimal color
+   representation in terminal environments.
+
+3. **Unicode Block Character Selection**: Chooses the best Unicode block
+   character to represent each 2x2 pixel block, maximizing the detail in the
+   final ANSI art.
+
+4. **Dual-Color Representation**: Each block is represented by both a
+   foreground and background color, enabling more nuanced color transitions
+   and detail.
+
+5. **Edge Detection Integration**: Incorporates edge detection to adjust
+   error distribution, preserving important image details.
+
+6. **Optimized for Text Output**: Designed to produce ANSI escape code
+   sequences, making it ideal for terminal-based image display.
+
+## How It Works
+
+The algorithm processes the input image in 2x2 blocks, determining the best
+Unicode character and color combination to represent each block. It then
+uses a modified error diffusion technique inspired by Floyd-Steinberg
+dithering to distribute quantization errors to neighboring blocks.
+
+This approach results in high-quality ANSI art that preserves the detail
+and color of the original image while optimizing for the constraints of
+text-based display.
+
+## Requirements
 Requires OpenCV 4 to be installed.
 
+## Example Output
+
+The examples below are 80-column-wide images, with a scale factor of 2. The
+first example uses the default 16-color ANSI palette, the second uses the
+JetBrains color scheme, and the third uses the 256-color scheme.
+
+![Baboon ANSI Art - 16 colors](examples/baboon_16.png)
+![Baboon ANSI Art - JetBrains](examples/baboon_jb.png)
+![Baboon ANSI Art - 256 colors](examples/baboon_256.png)
+
+## Installation
+To build the program, run the following command:
+
+```sh
+go get -u github.com/wbrown/img2ansi
+```
+
+## Usage
+`./img2ansi -input <input> [-output <output>] [-width <width>]
+[-scale <scale>] [-quantization <quantization>] [-maxchars <maxchars>]
+[-8bit] [-jb] [-table] [-kdsearch <kdsearch>] [-cache_threshold <threshold>]`
+
+**Performance**
+
+The following performance options are available; they trade speed against
+quality. The defaults are chosen as a good balance between the two. If you
+want the absolute best quality, set both the `-kdsearch` option and the
+`-cache_threshold` option to `0`. This may cause the program to take
+multiple minutes to run.
+
+* `-kdsearch <int>`: Number of nearest neighbors to search in KD-tree, `0` to
+  disable (default `50`)
+
+The KD search option is the number of nearest neighbors to search in the
+KD-tree for the block cache. A value of `0` will disable the KD-tree search
+and the cache.
+
+* `-cache_threshold <float>`: Threshold for block cache (default `40`)
+
+The block cache is a cache of the block characters that are used to render the
+image. The cache is used to speed up the program by not having to recompute
+the blocks for each 2x2 pixel block in the image. It is a fuzzy cache, so it
+is thresholded on error distance from the target block.
+
+**Colors**
+
+By default the program uses the 16-color ANSI palette, split into 8 foreground
+colors and 8 background colors. The `-8bit` option can be used to enable 256
+color mode. The `-jb` option can be used to use the JetBrains color scheme,
+which allows for separate foreground and background palettes to effectively
+double the number of colors available.
+
+The program performs well without quantization, but if you want to reduce the
+number of colors in the output, you can use the `-quantization` option. The
+default is `256` colors. This is not the number of output colors, but the
+number of colors used in the quantization step.
+
+**Image Size**
+
+The `-width` option sets the target width of the output image and is the
+primary factor in determining the output ANSI dimensions. The default
+`-scale` is `2`, which approximately halves the height of the output to
+compensate for the fact that characters are taller than they are wide.
+
 ```
-  -edge int
-    	Color difference threshold for edge detection skipping (default 100)
+  -8bit
+    	Use 8-bit ANSI colors (256 colors)
+  -cache_threshold float
+    	Threshold for block cache (default 40)
   -input string
     	Path to the input image file (required)
+  -jb
+    	Use JetBrains color scheme
+  -kdsearch int
+    	Number of nearest neighbors to search in KD-tree, 0 to disable (default 50)
   -maxchars int
     	Maximum number of characters in the output (default 1048576)
-  -maxline int
-    	Maximum number of characters in a line, 0 for no limit
   -output string
     	Path to save the output (if not specified, prints to stdout)
   -quantization int
     	Quantization factor (default 256)
   -scale float
-    	Scale factor for the output image (default 3)
-  -separate
-    	Use separate palettes for foreground and background colors
-  -shading
-    	Enable shading for more detailed output
-  -space
-    	Convert block characters to spaces
+    	Scale factor for the output image (default 2)
+  -table
+    	Print ANSI color table
   -width int
-    	Target width of the output image (default 100)
+    	Target width of the output image (default 80)
 ```
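The block-matching step described in the README's "How It Works" section can be sketched as follows. This is a minimal brute-force illustration, not the code in this PR: the `RGB` type here, the `glyphs` table, and the `bestBlock` function are hypothetical names introduced for the example, and it omits the error diffusion, edge weighting, KD-tree search, and caching that the README mentions. For one 2x2 block it tries every foreground/background pair from a palette and every quadrant pattern, and keeps the combination with the lowest squared color error.

```go
package main

import (
	"fmt"
	"math"
)

// RGB is a hypothetical 8-bit color triple used only for this sketch.
type RGB struct{ R, G, B uint8 }

// dist2 returns the squared Euclidean distance between two colors.
func dist2(a, b RGB) float64 {
	dr := float64(a.R) - float64(b.R)
	dg := float64(a.G) - float64(b.G)
	db := float64(a.B) - float64(b.B)
	return dr*dr + dg*dg + db*db
}

// glyphs maps a 4-bit quadrant mask to a Unicode block character.
// Bit 3 = upper left, bit 2 = upper right, bit 1 = lower left,
// bit 0 = lower right. A set bit means the quadrant uses the foreground.
var glyphs = [16]rune{
	' ', '▗', '▖', '▄', '▝', '▐', '▞', '▟',
	'▘', '▚', '▌', '▙', '▀', '▜', '▛', '█',
}

// bestBlock exhaustively searches palette pairs and quadrant masks for the
// combination that best approximates block (ordered UL, UR, LL, LR).
func bestBlock(block [4]RGB, palette []RGB) (rune, RGB, RGB, float64) {
	bestErr := math.MaxFloat64
	var bestRune rune
	var bestFg, bestBg RGB
	for _, fg := range palette {
		for _, bg := range palette {
			for mask := 0; mask < 16; mask++ {
				var e float64
				for i := 0; i < 4; i++ {
					if mask&(8>>i) != 0 {
						e += dist2(block[i], fg)
					} else {
						e += dist2(block[i], bg)
					}
				}
				if e < bestErr {
					bestErr = e
					bestRune, bestFg, bestBg = glyphs[mask], fg, bg
				}
			}
		}
	}
	return bestRune, bestFg, bestBg, bestErr
}

func main() {
	black, white := RGB{0, 0, 0}, RGB{255, 255, 255}
	// White top row, black bottom row.
	block := [4]RGB{white, white, black, black}
	r, fg, bg, e := bestBlock(block, []RGB{black, white})
	fmt.Printf("%c fg=%v bg=%v err=%.0f\n", r, fg, bg, e)
}
```

The exhaustive search costs 16 x len(palette)^2 error evaluations per block, which is presumably why the PR adds a KD-tree search and a block cache for larger palettes.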
diff --git a/ansi.go b/ansi.go
@@ -7,7 +7,7 @@ import (
 
 // compressANSI compresses an ANSI image by combining adjacent blocks with
 // the same foreground and background colors. The function takes an ANSI
-// image as a string and returns the compressed ANSI image as a string.
+// image as a string and returns the more efficient ANSI image as a string.
 func compressANSI(ansiImage string) string {
 	var compressed strings.Builder
 	var currentFg, currentBg, currentBlock string
@@ -34,17 +34,21 @@ func compressANSI(ansiImage string) string {
 				fg = ""
 			}
 
+			// If any color or block changes, write the current block
+			// and start a new one
 			if fg != currentFg || bg != currentBg || block != currentBlock {
 				if count > 0 {
 					compressed.WriteString(
-						formatANSICode(currentFg, currentBg, currentBlock, count))
+						formatANSICode(
+							currentFg, currentBg, currentBlock, count))
 				}
 				currentFg, currentBg, currentBlock = fg, bg, block
 				count = 1
 			} else {
 				count++
 			}
 		}
+		// Write the last block of the line
 		if count > 0 {
 			compressed.WriteString(
 				formatANSICode(currentFg, currentBg, currentBlock, count))
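The run-merging idea in `compressANSI` can be illustrated in isolation. The sketch below is an assumption-laden stand-in, not the function in this file: the `cell` type and `compressRow` are hypothetical, the input is already parsed into cells rather than being an ANSI string, and the escape sequence layout is a plain SGR form that may differ from what `formatANSICode` emits. It writes one color sequence per run of identical cells instead of one per cell.

```go
package main

import (
	"fmt"
	"strings"
)

// cell is a hypothetical parsed character cell: SGR color parameters
// (for example "31" for red foreground, "44" for blue background) and
// the glyph to draw.
type cell struct {
	fg, bg string
	glyph  string
}

// compressRow emits one SGR sequence per run of identical cells,
// followed by the repeated glyph, and resets attributes at the end.
func compressRow(row []cell) string {
	var out strings.Builder
	for i := 0; i < len(row); {
		// Advance j to the end of the run that starts at i.
		j := i
		for j < len(row) && row[j] == row[i] {
			j++
		}
		fmt.Fprintf(&out, "\x1b[%s;%sm%s",
			row[i].fg, row[i].bg, strings.Repeat(row[i].glyph, j-i))
		i = j
	}
	out.WriteString("\x1b[0m")
	return out.String()
}

func main() {
	row := []cell{
		{"31", "44", "▀"}, {"31", "44", "▀"}, {"31", "44", "▀"},
		{"32", "44", "▄"},
	}
	fmt.Printf("%q\n", compressRow(row))
}
```

For rows with long runs of a single color pair, this cuts the number of escape sequences from one per cell to one per run.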

diff --git a/approximatecache.go b/approximatecache.go
@@ -0,0 +1,110 @@
+package main
+
+import (
+	"math"
+)
+
+// ApproximateCache is a map of Uint256 to lookupEntry
+// that is used to store approximate matches for a given
+// block of 4 RGB values. Approximate matches are performed
+// by comparing the error of a given match to a threshold
+// value.
+//
+// The key of the map is a Uint256, which is a 256-bit
+// unsigned integer that is used to represent the foreground
+// and background colors of a block of 4 RGB values.
+//
+// There may be multiple matches for a given key, so the
+// value of the map is a lookupEntry, which is a struct
+// that contains a slice of Match structs.
+type ApproximateCache map[Uint256]lookupEntry
+
+// Match is a struct that contains the rune, foreground
+// color, background color, and error of a match. The error
+// is a float64 value that represents the difference between
+// the actual block of 4 RGB values and the pair of foreground
+// and background colors encoded in the key as an Uint256.
+type Match struct {
+	Rune  rune
+	FG    RGB
+	BG    RGB
+	Error float64
+}
+
+type lookupEntry struct {
+	Matches []Match
+}
+
+// addEntry adds a new entry to the cache. The entry is
+// represented by a key, which is a Uint256, and a Match
+// struct that contains the rune, foreground color, background
+// color, and error of the match.
+func (cache ApproximateCache) addEntry(
+	k Uint256,
+	r rune,
+	fg RGB,
+	bg RGB,
+	block [4]RGB,
+	isEdge bool,
+) {
+	newMatch := Match{
+		Rune: r,
+		FG:   fg,
+		BG:   bg,
+		Error: calculateBlockError(
+			block,
+			getQuadrantsForRune(r),
+			fg,
+			bg,
+			isEdge,
+		),
+	}
+	if entry, exists := lookupTable[k]; exists {
+		// Create a new slice with the appended match
+		updatedMatches := append(entry.Matches, newMatch)
+		// Update the map with the new slice
+		lookupTable[k] = lookupEntry{Matches: updatedMatches}
+	} else {
+		lookupTable[k] = lookupEntry{Matches: []Match{newMatch}}
+	}
+}
+
+// getEntry retrieves an entry from the cache. The entry is
+// represented by a key, which is a Uint256, and a block of
+// 4 RGB values. The function returns the rune, foreground
+// color, background color, and a boolean value indicating
+// whether the entry was found in the cache.
+//
+// There may be multiple matches for a given key, so the
+// function returns the match with the lowest error value.
+func (cache ApproximateCache) getEntry(
+	k Uint256,
+	block [4]RGB,
+	isEdge bool,
+) (rune, RGB, RGB, bool) {
+	baseThreshold := cacheThreshold
+	if isEdge {
+		baseThreshold *= 0.7
+	}
+	lowestError := math.MaxFloat64
+	var bestMatch *Match = nil
+	if entry, exists := lookupTable[k]; exists {
+		for i, match := range entry.Matches {
+			// Recalculate error for this match
+			matchError := calculateBlockError(block,
+				getQuadrantsForRune(match.Rune), match.FG, match.BG, isEdge)
+			if matchError < baseThreshold && matchError < lowestError {
+				lowestError = matchError
+				// Point at the slice element rather than the range variable,
+				// which Go versions before 1.22 reuse on every iteration.
+				bestMatch = &entry.Matches[i]
+			}
+		}
+		if bestMatch != nil {
+			lookupHits++
+			return bestMatch.Rune, bestMatch.FG, bestMatch.BG, true
+		}
+	}
+	lookupMisses++
+	return 0, RGB{}, RGB{}, false
+}
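The thresholded lookup in `getEntry` can be reduced to a small standalone sketch. The names `candidate` and `bestUnder` are hypothetical, and the stored `Error` field stands in for the value that the real code recomputes with `calculateBlockError` for each requested block. The 0.7 multiplier for edge blocks is taken from the hunk above.

```go
package main

import "fmt"

// candidate is a hypothetical cached match with its error against
// the block being looked up.
type candidate struct {
	Rune  rune
	Error float64
}

// bestUnder returns the lowest-error candidate whose error is below
// threshold. Edge blocks use a tighter threshold (70% of the base),
// so they fall through to a full search more often.
func bestUnder(cands []candidate, threshold float64, isEdge bool) (candidate, bool) {
	if isEdge {
		threshold *= 0.7
	}
	best := -1
	for i := range cands {
		if cands[i].Error >= threshold {
			continue
		}
		if best < 0 || cands[i].Error < cands[best].Error {
			best = i
		}
	}
	if best < 0 {
		return candidate{}, false // cache miss
	}
	return cands[best], true
}

func main() {
	cands := []candidate{{'▀', 35}, {'▄', 20}, {'█', 50}}
	hit, ok := bestUnder(cands, 40, false)
	fmt.Printf("%c %v\n", hit.Rune, ok)
}
```

Tracking the winner by index avoids holding a pointer to the range variable, which Go versions before 1.22 reuse across iterations.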
diff --git a/examples/baboon_16.png b/examples/baboon_16.png
diff --git a/examples/baboon_256.png b/examples/baboon_256.png
diff --git a/examples/baboon_jb.png b/examples/baboon_jb.png
diff --git a/image.go b/image.go
@@ -82,13 +82,24 @@ func saveToPNG(img gocv.Mat, filename string) error {
 // prepareForANSI prepares an image for conversion to ANSI art. The function
 // takes an input image, the target width and height for the output image, and
 // returns the resized image and the edges detected in the image.
-func prepareForANSI(img gocv.Mat, width, height int) (resized, edges gocv.Mat) {
+//
+// It uses area interpolation for downscaling to an intermediate size, detects
+// edges on the intermediate image, and resizes both the intermediate image
+// and the edges to the final size. It also applies a very mild sharpening to
+// the resized image.
+func prepareForANSI(
+	img gocv.Mat,
+	width,
+	height int,
+) (
+	resized, edges gocv.Mat,
+) {
 	intermediate := gocv.NewMat()
 	resized = gocv.NewMat()
 	edges = gocv.NewMat()
 
 	// Use area interpolation for downscaling to an intermediate size
-	intermediateWidth := width * 4 // or another multiplier that gives results
+	intermediateWidth := width * 4
 	intermediateHeight := height * 4
 	gocv.Resize(img,
 		&intermediate,