Hybrid Images

Introduction

Hybrid images are images that morph from one object to another depending on the viewing distance. Since high frequencies are more visible from up close than low frequencies, and the opposite is true for far away, that means we can take any two images and combine them, in a way that one will be visible up close, and the other from far away. We will be using the approach described in the SIGGRAPH 2006 paper by Oliva, Torralba, and Schyns.

Rust

Once again we are using Rust, but this time we’re using the image and imageproc libraries, as they are more full featured than Photon-rs. Included out of the box with these libraries are ways of applying 3x3 filters using filter3x3() and a built in gaussian blur via gaussian_blur_f32.

Finally, we also use the handy rustfft library, which provides native fast Fourier transforms.

Text Creation

In this project, alongside normal photographs, we will also create text hybrid images of user input. To do so, we create a function that will create a basic text image given a string:

fn draw_message(
  msg: String,
  width: u32,
  height: u32,
  x: u32,
  y: u32,
  scale: Scale,
  color: image::Rgba<u8>,
) -> DynamicImage {
    // Load font
    let font_data: &[u8] = include_bytes!("/usr/share/fonts/FuturaLT-Bold.ttf");
    let font: Font<'static> = Font::try_from_bytes(font_data).unwrap();

    //Create blank canvas
    let canvas: RgbaImage = ImageBuffer::new(width, height);
    let mut img = DynamicImage::ImageRgba8(canvas);

    //Draw text
    draw_text_mut(&mut img, color, x, y, scale, &font, &msg);
    img
}

So if we call this function and pass in “WELCOME” we will get:

And “GOODBYE”:

Creating Low and High Pass Filters

Low Pass Filter

The simplest way to create a low pass filter for an image, is to apply a Gaussian blur, as this will average the photo together, meaning that high frequencies will be either lowered or removed altogether, while low frequencies will persist.

Luckily, this function is already written for us, so we will wrap it in a simple function, where img is the image, and amt is standard deviation of the Gaussian filter:

fn low_pass(img: DynamicImage, amt: f32) -> DynamicImage {
    DynamicImage::ImageRgba8(gaussian_blur_f32(&img.to_rgba8(), amt))
}

If we apply this to our “GOODBYE” image from before, with a standard deviation of 8, we see we get a blurred image:

Why a standard deviation of 8? After some testing, I realized that anything above 8 will result in some letters that aren’t legible. For example look at the E at the end of GOODBYE. It’s starting to look like a left brace [, and the G almost looks like an O. We could use something less, but the smaller the standard deviation, the less blurred it will look, and that means more higher frequencies.

Now if we compare the Fourier transforms of the before and after, we see we’re successful:

Before:

After Blur:

We see that in the blurred FFT, the only frequencies remaining are those around the edges, which are the low frequencies. Luckily since this filter was built in, there was not much for me to fiddle with, except to determine the highest amount of blur I could use before the letters weren’t legible.

High Pass Filter

A high pass filter is a bit trickier, but luckily we can use the recommendation from the SIGGRAPH paper, and create a filter by subtracting the impulse of the image, minus the Gaussian blur. The impulse filter simply exemplifies the bright spots of an image, so high frequencies become higher, and lower frequencies are diminished. As described before, the Gaussian is a low pass filter, so by subtracting it, we are removing the low frequencies.

It’d also be nice to control the impulse filter, so we have a function called laplacian, that sets the center of the kernel to some amount, meaning we can control how much we want the “impulse” to be:

fn laplacian(amt: f32) -> [f32; 9] {
    let mut v = identity_minus_laplacian;
    v[4] *= amt;
    v
}

And then our high pass function:

fn high_pass(img: DynamicImage, amt: f32,amt2:f32) -> DynamicImage {
    // Create impulse image
    let img_impulse = filter3x3(&img, &laplacian(amt));

    // Create blurred
    let img_low = low_pass(img, amt2);
    // calculate the difference by subtracting one channel from the other
    // Impulse - Gaussian
    let diff = map_colors2(&img_impulse, &img_low, |mut p, q| {
        //Clamp keeps operations in bounds
        p.apply2(&q, |c1, c2| clamp_sub(c1, c2, u8::MAX));
        // Keep alpha at 255
        p.0[3] = 255;
        p
    });
    DynamicImage::ImageRgba8(diff)
}

If we apply it to our goodbye image with 5 as the kernel center and 8.0 as the blur we get:

We see that the edges are now more defined, while the centers of the letters are dimmer. We used 5 as the center, as that’s standard, and worked best for letters after some experimentation. If we compare the FFT of both images, we’ll see what happened:

Before:

After:

The difference is less obvious here, but we can see that it seems like the image is more grainy, the corners are dimmer, but the center is slightly brighter, which makes sense, as that means the higher frequencies are now more represented.

Overlapping images

Next, we must overlay the images on top of each other. On my first approach, I would average the images together. Although this worked, I found that when I simply added them together I’d get more of an effect that I liked. The reason behind it makes sense, when averaged together, we’d essentially get an average of the high and low frequencies, while adding them simply overlayed them. I found that averaging worked well for creating hybrid images that looked like a combination of two images, but adding them together gave more of the distance morphing effect, which I preferred. Our function is relatively simple:

fn overlay(a: DynamicImage, b: DynamicImage) -> DynamicImage {
    let diff = map_colors2(&a, &b, |mut p, q| {
        // add both channels together, and clamp it so its <= 255
        p.apply2(&q, |c1, c2| (clamp_add(c1, c2, u8::MAX)));
        // Don't touch the alpha channel!
        p.0[3] = 255;
        p
    });
    DynamicImage::ImageRgba8(diff)
}

If we overlay our two images, we should get a morphing image:

And we see if we get really close to the screen, we can almost only read “WELCOME”, which is good since we’re so close. If we’re far away, our image tells us goodbye! As if you’re far away, the high frequencies prevail, and all we can read is goodbye. Finally, if we look at the FFT of this image, we see it’s a combination of the past two (note the bright corners from the GOODBYE FFT and the diamond zigzags from the HELLO FFT):

Colors

Something of note of the previous result is that the colors are red and green, not black and white like most text. This isn’t by accident, as in my research for this project I had another thought: red is a higher frequency than green, so maybe if we use red for the close text and another color for the further, it’s results should be enhanced:

Enhanced colors:

And we see that the welcome disappears much quicker, and the goodbye is much more visible. This means that we might be able to get away with blurring the GOODBYE a bit more, meaning when we’re closer it should disappear faster:

This definitely has the most “motion”, or transforms the most out of all we’ve seen, but the downside is that the GOODBYE makes you feel like you need glasses.

And finally, we can see the same image but black and white:

Three images

Finally, I wanted to see if I could combine three images, and the results were somewhat successful. Here is the result without any prompt, see if you can tell what the three phrases are:

Color:

Black and White:

Far away it should read 123, closer ABC, and really close should be DEF. I found that three images was trickier, as you had to blur the far away one more, and the closest one had to have the high pass filter run enough that it was a bit dark. It seems to work to me, the main drawback being that the transitions aren’t crisp, we can see the 123 easily when at medium distance, and the 123 and ABC while very close, meaning that it’s not super clear. The strategy for this was to simply run the high pass filter even higher for the “nearest” image, as a combination of the two didn’t work well, being visible no matter what instead of only at a certain time. I still do like the effect that there are three phrases to discover instead of two though. It feels like the third phrase is hidden, unless you look closely, while the other two are obvious.

Finally, if we look at the FFT of this image, we see we were somewhat successful:

123 after filter: