We have a pretty basic data structure to store images in our library. The image struct stores the image metadata, like width, height, and number of channels. It also contains the image data stored as a floating point array. You can check it out in `src/image.h`; it looks like this:
```c
typedef struct{
    int h, w, c;
    float *data;
} image;
```
We have also provided some functions for loading and saving images. Use the function:
```c
image im = load_image("image.jpg");
```
to load a new image. To save an image use:
```c
save_image(im, "output");
```
which will save the image as `output.jpg`. If you want to make a new image with dimensions Width x Height x Channels you can call:
```c
image im = make_image(w,h,c);
```
You should also use:
```c
free_image(im);
```
when you are done with an image, so its memory gets freed. You can check out how all this is implemented in `src/load_image.c`. You probably shouldn't change anything in this file. We use the `stb_image` library for the actual loading and saving of JPEGs because that is, like, REALLY complicated. I think. I've never tried. Anywho....
You'll be modifying the file `src/process_image.c`. We've also included a Python compatibility library. `uwimg.py` includes the code to access your C library from Python. `tryit.py` has some example code you can run. We will build the library using `make`. Simply run the command:
```
make
```
after you make any changes to the code. Then you can quickly test your changes by running:
```
./uwimg test
```
You can also try running the example Python code to generate some images:
```
python tryit.py
```
The most basic operation we want to do is change the pixels in an image. As we talked about in class, we represent an image as a 3 dimensional tensor. We have spatial information as well as multiple channels which combine together to form a color image:
The convention is that the coordinate system starts at the top left of the image, like so:
In our `data` array we store the image in CHW format. The first pixel in `data` is at channel 0, row 0, column 0. The next pixel is channel 0, row 0, column 1, then channel 0, row 0, column 2, etc.
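To make that layout concrete, here is the index arithmetic it implies (just an illustration; `pixel_at` is a throwaway name, the real accessors are the ones you'll write next):

```c
#include "image.h"

// CHW layout: each channel is a contiguous block of w*h floats, and within
// a channel the pixels are stored row by row. So (column x, row y, channel c)
// lives at offset c*w*h + y*w + x.
static float pixel_at(image im, int x, int y, int c)
{
    return im.data[c*im.w*im.h + y*im.w + x];
}
```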
Your first task is to fill out these two functions in `src/process_image.c`:
```c
float get_pixel(image im, int x, int y, int c);
void set_pixel(image im, int x, int y, int c, float v);
```
`get_pixel` should return the pixel value at column `x`, row `y`, and channel `c`. `set_pixel` should set that pixel to the value `v`. You will need to do bounds checking to make sure the coordinates are valid for the image. `set_pixel` should simply return without doing anything if you pass in invalid coordinates. For `get_pixel` we will perform padding on the image. There are a number of possible padding strategies:

We will use the `clamp` padding strategy. This means that if the programmer asks for a pixel at column -3, use column 0, and if they ask for column 300 when the image is only 256x256, use column 255 (because of zero-based indexing).
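If you want something to sanity-check against, here is one minimal way the pair could look with clamp padding. Treat it as a sketch, not the official solution:

```c
#include "image.h"

float get_pixel(image im, int x, int y, int c)
{
    // Clamp padding: snap out-of-range coordinates to the nearest valid index.
    if (x < 0) x = 0;
    if (y < 0) y = 0;
    if (c < 0) c = 0;
    if (x >= im.w) x = im.w - 1;
    if (y >= im.h) y = im.h - 1;
    if (c >= im.c) c = im.c - 1;
    return im.data[c*im.w*im.h + y*im.w + x];
}

void set_pixel(image im, int x, int y, int c, float v)
{
    // Invalid coordinates: do nothing rather than pad.
    if (x < 0 || y < 0 || c < 0 || x >= im.w || y >= im.h || c >= im.c) return;
    im.data[c*im.w*im.h + y*im.w + x] = v;
}
```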
We can test out our pixel-setting code on the dog image by removing all of the red channel. See lines 3-8 in `tryit.py`:
```python
# 1. Getting and setting pixels
im = load_image("data/dog.jpg")
for row in range(im.h):
    for col in range(im.w):
        set_pixel(im, col, row, 0, 0)
save_image(im, "figs/dog_no_red")
```
Then try running it. Check out our very not red dog:
Sometimes you have an image and you want to copy it! To do this we should make a new image of the same size and then fill in the data array in the new image. You could do this by getting and setting pixels, by looping over the whole array and just copying the floats (pop quiz: if the image is 256x256x3, how many total pixels are there?), or by using the built-in memory copying function `memcpy`.

Fill in the function `image copy_image(image im)` in `src/process_image.c` with your code.
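For reference, a minimal `memcpy`-based sketch (assuming `make_image` allocates a `w*h*c` float array, as the provided code does):

```c
#include <string.h>
#include "image.h"

image copy_image(image im)
{
    image copy = make_image(im.w, im.h, im.c);
    // The whole image is one contiguous block of w*h*c floats,
    // so a single memcpy suffices.
    memcpy(copy.data, im.data, im.w*im.h*im.c*sizeof(float));
    return copy;
}
```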
Now let's start messing with some images! People like making images grayscale. It makes them look... old? Or something? Let's do it.
Remember how humans don't see all colors equally? Here's the chart to remind you:
This actually makes a huge difference in practice. Here's a colorbar we may want to convert:
If we convert it using an equally weighted mean K = (R+G+B)/3 we get a conversion that doesn't match our perceptions of the given colors:
Instead we are going to use a weighted sum. Now, there are a few ways to do this. If we wanted the most accurate conversion it would take a fair amount of work. sRGB uses [gamma compression][1] so we would first want to convert the color to linear RGB and then calculate relative luminance.
But we don't care about being toooo accurate so we'll just do the quick and easy version instead. Video engineers use a calculation called [luma][2] to find an approximation of perceptual intensity when encoding video signal, we'll use that to convert our image to grayscale. It operates directly on the gamma compressed sRGB values that we already have! We simply perform a weighted sum:
Y' = 0.299 R' + 0.587 G' + 0.114 B'
Using this conversion technique we get a pretty good grayscale image! Now we can run `tryit.py` to output `graybar.jpg`. See lines 10-13:
```python
# 3. Grayscale image
im = load_image("data/colorbar.png")
graybar = rgb_to_grayscale(im)
save_image(graybar, "graybar")
```
Implement this conversion for the function `rgb_to_grayscale`. Return a new image that is the same size but has only one channel, containing the calculated luma values.
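A minimal sketch of the conversion, assuming the image has 3 channels stored in R, G, B order:

```c
#include "image.h"

image rgb_to_grayscale(image im)
{
    image gray = make_image(im.w, im.h, 1);
    for (int y = 0; y < im.h; ++y){
        for (int x = 0; x < im.w; ++x){
            float r = get_pixel(im, x, y, 0);
            float g = get_pixel(im, x, y, 1);
            float b = get_pixel(im, x, y, 2);
            // Luma: weighted sum of the gamma-compressed sRGB values.
            set_pixel(gray, x, y, 0, 0.299*r + 0.587*g + 0.114*b);
        }
    }
    return gray;
}
```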
Now let's write a function to add a constant factor to a channel in an image. We can use this across every channel in the image to make the image brighter or darker. We could also use it to, say, shift an image to be more or less of a given color.
Fill in the code for `void shift_image(image im, int c, float v);`. It should add `v` to every pixel in channel `c` in the image.
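One possible sketch, reusing your pixel accessors:

```c
#include "image.h"

void shift_image(image im, int c, float v)
{
    // Add v to every pixel in channel c; no clamping here, that's clamp_image's job.
    for (int y = 0; y < im.h; ++y){
        for (int x = 0; x < im.w; ++x){
            set_pixel(im, x, y, c, get_pixel(im, x, y, c) + v);
        }
    }
}
```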
Now we can try shifting all the channels in an image by .4, or 40%. See lines 15-20 in `tryit.py`:
```python
# 4. Shift Image
im = load_image("data/dog.jpg")
shift_image(im, 0, .4)
shift_image(im, 1, .4)
shift_image(im, 2, .4)
save_image(im, "overflow")
```
But wait, when we look at the resulting image `overflow.jpg` we see something bad has happened! The light areas of the image went past 1, and when we saved the image back to disk it overflowed and made weird patterns:
Our image pixel values have to be bounded. Generally images are stored as byte arrays where each red, green, or blue value is an unsigned byte between 0 and 255. 0 represents none of that color light and 255 represents that primary color light turned up as much as possible.
We represent our images using floating point values between 0 and 1. However, we still have to convert between our floating point representation and the byte arrays that are stored on disk. In the example above, our pixel values got above 1 so when we converted them back to byte arrays and saved them to disk they overflowed the byte data type and went back to very small values. That's why the very bright areas of the image looped around and became dark.
We want to make sure the pixel values in the image stay between 0 and 1. Implement clamping on the image so that any value below zero gets set to zero and any value above 1 gets set to one. Fill in `void clamp_image(image im);` to modify the image in-place.
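A minimal sketch that walks the raw data array directly:

```c
#include "image.h"

void clamp_image(image im)
{
    // Clamp every float in the data array to [0, 1], in place.
    for (int i = 0; i < im.w*im.h*im.c; ++i){
        if (im.data[i] < 0) im.data[i] = 0;
        if (im.data[i] > 1) im.data[i] = 1;
    }
}
```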
Then when we clamp the shifted image and save it we see much better results; see lines 22-24 in `tryit.py`:
```python
# 5. Clamp Image
clamp_image(im)
save_image(im, "fixed")
```
and the resulting image, `fixed.jpg`:
So far we've been focusing on RGB and grayscale images. But there are other colorspaces out there we may want to play around with, like Hue, Saturation, and Value (HSV). We will be translating the cubical colorspace of sRGB to the cylinder of hue, saturation, and value:

Hue can be thought of as the base color of a pixel. Saturation is the intensity of the color compared to white (the least saturated color). The Value is the perceived brightness of a pixel compared to black. You can try out this demo to get a better feel for the differences between these two colorspaces. For a geometric interpretation of what this transformation looks like:
Now, to be sure, there are lots of issues with this colorspace. But it's still fun to play around with and relatively easy to implement. The easiest component to calculate is the Value; it's just the largest of the 3 RGB components:
V = max(R,G,B)
Next we can calculate Saturation. This is a measure of how much color is in the pixel compared to neutral white/gray. Neutral colors have the same amount of each three color components, so to calculate saturation we see how far the color is from being even across each component. First we find the minimum value
m = min(R,G,B)
Then we see how far apart the min and max are:
C = V - m
and the Saturation will be the ratio between the difference and how large the max is:
S = C / V
Except if R, G, and B are all 0. Because then V would be 0 and we don't want to divide by that, so just set the saturation to 0 in that case.
Finally, to calculate Hue we want to calculate how far around the color hexagon our target color is.
We start counting at Red. Each step to a point on the hexagon counts as 1 unit distance. The distance between points is given by the relative ratios of the secondary colors. We can use the piecewise hue formula from Wikipedia, reproduced in the sketch below.
There is no "correct" Hue if C = 0 because all of the channels are equal so the color is a shade of gray, right in the center of the cylinder. However, for now let's just set H = 0 if C = 0 because then your implementation will match mine.
Notice that we are going to have H = [0,1) and it should circle around if it gets too large or goes negative. Thus we check to see if it is negative and add one if it is. This is slightly different than other methods where H is between 0 and 6 or 0 and 360. We will store the H, S, and V components in the same image, so simply replace the R channel with H, the G channel with S, etc.
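Putting the pieces together, an in-place conversion might look roughly like the sketch below. The `void rgb_to_hsv(image im)` signature matches how `tryit.py` calls it, but double-check it against the header you were given. The piecewise hue formula from Wikipedia is reproduced in the comments:

```c
#include <math.h>
#include "image.h"

// Sketch of an in-place RGB -> HSV conversion following the recipe above.
// Hue (before scaling to [0,1)) is:
//   H' = (G-B)/C        if V == R
//   H' = (B-R)/C + 2    if V == G
//   H' = (R-G)/C + 4    if V == B
// then H = H'/6, adding 1 if the result is negative.
void rgb_to_hsv(image im)
{
    for (int y = 0; y < im.h; ++y){
        for (int x = 0; x < im.w; ++x){
            float r = get_pixel(im, x, y, 0);
            float g = get_pixel(im, x, y, 1);
            float b = get_pixel(im, x, y, 2);

            float v = fmaxf(r, fmaxf(g, b));   // Value
            float m = fminf(r, fminf(g, b));
            float c = v - m;                   // Chroma
            float s = (v == 0) ? 0 : c / v;    // Saturation

            float h = 0;                       // Hue; stays 0 when C == 0
            if (c != 0){
                if (v == r)      h = (g - b) / c;
                else if (v == g) h = (b - r) / c + 2;
                else             h = (r - g) / c + 4;
                h /= 6;
                if (h < 0) h += 1;
            }
            // Overwrite R, G, B with H, S, V.
            set_pixel(im, x, y, 0, h);
            set_pixel(im, x, y, 1, s);
            set_pixel(im, x, y, 2, v);
        }
    }
}
```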
Ok, now do it all backwards in `hsv_to_rgb`!
Finally, when you're done we can mess with some images! In `tryit.py` we convert an image to HSV, increase the saturation, then convert it back; see lines 26-32:
```python
# 6-7. Colorspace and saturation
im = load_image("data/dog.jpg")
rgb_to_hsv(im)
shift_image(im, 1, .2)
clamp_image(im)
hsv_to_rgb(im)
save_image(im, "dog_saturated")
```
Hey that's exciting! Play around with it a little bit, see what you can make. Note that with the above method we do get some artifacts because we are trying to increase the saturation in areas that have very little color. Instead of shifting the saturation, you could scale the saturation by some value to get smoother results!
Implement `void scale_image(image im, int c, float v);` to scale a channel by a certain amount. This will give us better saturation results. Note, you will have to add the necessary lines to the header and the Python library; it should be very similar to what's already there for `shift_image`. Now if we scale saturation by 2 instead of just shifting it all up, we get much better results:
```python
im = load_image("data/dog.jpg")
rgb_to_hsv(im)
scale_image(im, 1, 2)
clamp_image(im)
hsv_to_rgb(im)
save_image(im, "dog_scale_saturated")
```
Implement RGB to Hue, Chroma, Lightness, a perceptually more accurate version of Hue, Saturation, Value. Note, this will involve gamma decompression, converting to CIEXYZ, converting to CIELUV, converting to HCL, and the reverse transformations. The upside is a similar colorspace to HSV but with better perceptual properties!
We've been talking a lot about resizing and interpolation in class, now's your time to do it! To resize we'll need some interpolation methods and a function to create a new image and fill it in with our interpolation methods.
- Fill in `float nn_interpolate(image im, float x, float y, int c);` in `src/resize_image.c`. It should perform nearest-neighbor interpolation. Remember to round to the closest `int`, not just type-cast, because in C a cast truncates towards zero.
- Fill in `image nn_resize(image im, int w, int h);` (see the sketch after this list). It should:
    - Create a new image that is `w x h` and has the same number of channels as `im`
    - Loop over the pixels and map back to the old coordinates
    - Use nearest-neighbor interpolation to fill in the image
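A rough sketch of both functions. The coordinate mapping `old = (new + 0.5) * old_size/new_size - 0.5` is the "match the pixel-box edges" mapping from lecture; if your class used a different mapping, adjust accordingly:

```c
#include <math.h>
#include "image.h"

float nn_interpolate(image im, float x, float y, int c)
{
    // Round to the nearest pixel center; get_pixel's clamp padding
    // handles coordinates that round just outside the image.
    return get_pixel(im, (int) roundf(x), (int) roundf(y), c);
}

image nn_resize(image im, int w, int h)
{
    image out = make_image(w, h, im.c);
    float ax = (float) im.w / w;
    float ay = (float) im.h / h;
    for (int c = 0; c < im.c; ++c){
        for (int y = 0; y < h; ++y){
            for (int x = 0; x < w; ++x){
                // Map each new pixel center back into the old coordinate frame.
                float ox = (x + 0.5f) * ax - 0.5f;
                float oy = (y + 0.5f) * ay - 0.5f;
                set_pixel(out, x, y, c, nn_interpolate(im, ox, oy, c));
            }
        }
    }
    return out;
}
```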
Now you should be able to run the following Python command:
```python
from uwimg import *
im = load_image("data/dogsmall.jpg")
a = nn_resize(im, im.w*4, im.h*4)
save_image(a, "dog4x-nn")
```
Your image should look something like:
Finally, fill in the similar functions `bilinear_interpolate` and `bilinear_resize` to perform bilinear interpolation.
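A sketch of `bilinear_interpolate`; `bilinear_resize` then has exactly the same structure as `nn_resize`, just calling this function instead:

```c
#include <math.h>
#include "image.h"

float bilinear_interpolate(image im, float x, float y, int c)
{
    // The four pixel centers surrounding (x, y).
    int x0 = (int) floorf(x), y0 = (int) floorf(y);
    int x1 = x0 + 1,          y1 = y0 + 1;
    float dx = x - x0, dy = y - y0;

    float v00 = get_pixel(im, x0, y0, c);
    float v10 = get_pixel(im, x1, y0, c);
    float v01 = get_pixel(im, x0, y1, c);
    float v11 = get_pixel(im, x1, y1, c);

    // Weight each neighbor by the area of the opposite sub-rectangle.
    return v00*(1-dx)*(1-dy) + v10*dx*(1-dy) + v01*(1-dx)*dy + v11*dx*dy;
}
```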
Try it out again in Python:
```python
from uwimg import *
im = load_image("data/dogsmall.jpg")
a = bilinear_resize(im, im.w*4, im.h*4)
save_image(a, "dog4x-bl")
```
These functions will work fine for small changes in size, but when we try to make our image smaller, say a thumbnail, we get very noisy results:
```python
from uwimg import *
im = load_image("data/dog.jpg")
a = nn_resize(im, im.w//7, im.h//7)
save_image(a, "dog7th-bl")
```
As we discussed, we need to filter before we do this extreme resize operation!
We'll start out by filtering the image with a box filter. There are very fast ways of performing this operation but instead, we'll do the naive thing and implement it as a convolution because it will generalize to other filters as well!
Ok, bear with me. We want to create a box filter, which as discussed in class looks like this:
One way to do this is make an image, fill it in with all 1s, and then normalize it. That's what we'll do because the normalization function may be useful in the future!
First fill in `void l1_normalize(image im)`. This should normalize an image to sum to 1.

Next fill in `image make_box_filter(int w)`. We will only use square box filters, so just make your filter `w x w`. It should be a square image with one channel, with uniform entries that sum to 1.
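A minimal sketch of the two functions (the early return for an all-zero image is my own choice, since that case isn't specified):

```c
#include "image.h"

void l1_normalize(image im)
{
    // Divide every entry by the total so the whole image sums to 1.
    float sum = 0;
    for (int i = 0; i < im.w*im.h*im.c; ++i) sum += im.data[i];
    if (sum == 0) return;   // nothing sensible to do for an all-zero image
    for (int i = 0; i < im.w*im.h*im.c; ++i) im.data[i] /= sum;
}

image make_box_filter(int w)
{
    // A w x w, single-channel filter of uniform weights that sum to 1.
    image f = make_image(w, w, 1);
    for (int i = 0; i < w*w; ++i) f.data[i] = 1;
    l1_normalize(f);
    return f;
}
```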
Now it's time to fill in `image convolve_image(image im, image filter, int preserve)`. For this function we have a few scenarios. With normal convolutions we do a weighted sum over an area of the image. With multiple channels in the input image there are a few possible cases we want to handle:
- If `filter` and `im` have the same number of channels then it's just a normal convolution. We sum over spatial and channel dimensions and produce a 1-channel image. UNLESS:
- If `preserve` is set to 1 we should produce an image with the same number of channels as the input. This is useful if, for example, we want to run a box filter over an RGB image and get out an RGB image. This means each channel in the image will be filtered by the corresponding channel in the filter. UNLESS:
- If the `filter` only has one channel but `im` has multiple channels we want to apply the filter to each of those channels. Then we either sum between channels or not depending on if `preserve` is set.
Also, `filter` better have either the same number of channels as `im` or have 1 channel. I check this with an `assert`.
We are calling this a convolution but you don't need to flip the filter or anything (we're actually doing a cross-correlation). Just apply it to the image as we discussed in class:
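Here is a rough sketch of how the cases above can fit together. It leans on `get_pixel`'s clamp padding for the image borders and is not meant as the reference implementation:

```c
#include <assert.h>
#include "image.h"

image convolve_image(image im, image filter, int preserve)
{
    assert(filter.c == im.c || filter.c == 1);

    // preserve == 1: one output channel per input channel.
    // preserve == 0: sum the filtered channels into a single output channel.
    image out = make_image(im.w, im.h, preserve ? im.c : 1);

    for (int y = 0; y < im.h; ++y){
        for (int x = 0; x < im.w; ++x){
            float total = 0;
            for (int c = 0; c < im.c; ++c){
                // A 1-channel filter is reused for every image channel.
                int fc = (filter.c == 1) ? 0 : c;
                float sum = 0;
                for (int fy = 0; fy < filter.h; ++fy){
                    for (int fx = 0; fx < filter.w; ++fx){
                        // Center the filter on (x, y); get_pixel's clamp
                        // padding handles pixels that fall off the edge.
                        sum += get_pixel(filter, fx, fy, fc)
                             * get_pixel(im, x + fx - filter.w/2,
                                             y + fy - filter.h/2, c);
                    }
                }
                if (preserve) set_pixel(out, x, y, c, sum);
                else total += sum;
            }
            if (!preserve) set_pixel(out, x, y, 0, total);
        }
    }
    return out;
}
```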
Once you are done, test out your convolution by filtering our image! We need to use `preserve` because we want to produce an image that is still RGB.
```python
from uwimg import *
im = load_image("data/dog.jpg")
f = make_box_filter(7)
blur = convolve_image(im, f, 1)
save_image(blur, "dog-box7")
```
We'll get some output that looks like this:
Now we can use this to perform our thumbnail operation:
```python
from uwimg import *
im = load_image("data/dog.jpg")
f = make_box_filter(7)
blur = convolve_image(im, f, 1)
thumb = nn_resize(blur, blur.w//7, blur.h//7)
save_image(thumb, "dogthumb")
```
Look at how much better our new resized thumbnail is!
Resize | Blur and Resize |
---|---|
Fill in the functions `image make_highpass_filter()`, `image make_sharpen_filter()`, and `image make_emboss_filter()` to return the example kernels we covered in class. Try them out on some images! After you have, answer Questions 2.2.1 and 2.2.2 in the source file (put your answers right there in the file).
Highpass | Sharpen | Emboss |
---|---|---|
Implement `image make_gaussian_filter(float sigma)`, which will take a standard deviation value and return a filter that smooths using a Gaussian with that sigma. How big should the filter be, you ask? 99% of the probability mass of a Gaussian is within +/- 3 standard deviations, so make the kernel 6 times the size of sigma. But we also want an odd size, so make it the next highest odd integer from 6x sigma.
We need to fill in our kernel with some values. Use the probability density function for a 2D Gaussian:

G(x,y) = exp(-(x^2 + y^2) / (2 sigma^2)) / (2 pi sigma^2)
Technically this isn't perfect, what we would really want to do is integrate over the area covered by each cell in the filter. But that's much more complicated and this is a decent estimate. Remember though, this is a blurring filter so we want all the weights to sum to 1. If only we had a function for that....
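Here is a sketch that follows the sizing rule and density formula above (the `6.2831853` constant is just 2π written out so the snippet stands alone):

```c
#include <math.h>
#include "image.h"

image make_gaussian_filter(float sigma)
{
    // Kernel size: the next odd integer that is at least 6 sigma.
    int size = (int) ceilf(6 * sigma);
    if (size % 2 == 0) size += 1;

    image f = make_image(size, size, 1);
    int center = size / 2;
    float two_pi = 6.2831853f;
    for (int y = 0; y < size; ++y){
        for (int x = 0; x < size; ++x){
            float dx = x - center, dy = y - center;
            // 2D Gaussian density evaluated at the center of each cell.
            float v = expf(-(dx*dx + dy*dy) / (2*sigma*sigma))
                    / (two_pi*sigma*sigma);
            set_pixel(f, x, y, 0, v);
        }
    }
    l1_normalize(f);   // it's a blurring filter, so the weights must sum to 1
    return f;
}
```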
Now you should be able to try out your new blurring function! It should have much less noise than the box filter:
```python
from uwimg import *
im = load_image("data/dog.jpg")
f = make_gaussian_filter(2)
blur = convolve_image(im, f, 1)
save_image(blur, "dog-gauss2")
```
Gaussian filters are cool because they are a true low-pass filter for the image. This means when we run them on an image we only get the low-frequency changes in an image like color. Conversely, we can subtract this low-frequency information from the original image to get the high frequency information!
Using this frequency separation we can do some pretty neat stuff. For example, check out this tutorial on retouching skin in Photoshop (but only if you want to).
We can also make really trippy images that look different depending on if you are close or far away from them. That's what we'll be doing. They are hybrid images that take low frequency information from one image and high frequency info from another. Here's a picture of.... what exactly?
Small | Medium | Large |
---|---|---|
If you don't believe my resizing, check out `figs/marilyn-einstein.png` and view it from far away and up close. Sorta neat, right?
Your job is to produce a similar image. But instead of famous dead people we'll be using famous fictional people! In particular, we'll be exposing the secret (but totally canon) sub-plot of the Harry Potter franchise that Dumbledore is a time-traveling Ron Weasley. Don't trust me?? The images don't lie! Wake up sheeple!
Small | Large |
---|---|
For this task you'll have to extract the high frequency and low frequency from some images. You already know how to get low frequency, using your gaussian filter. To get high frequency you just subtract the low frequency data from the original image.
Fill in `image add_image(image a, image b)` and `image sub_image(image a, image b)` so we can perform our transformations. They should probably include some checks that the images are the same size and such.
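A minimal sketch of both, with the size checks done via `assert`:

```c
#include <assert.h>
#include "image.h"

image add_image(image a, image b)
{
    assert(a.w == b.w && a.h == b.h && a.c == b.c);
    image out = make_image(a.w, a.h, a.c);
    for (int i = 0; i < a.w*a.h*a.c; ++i) out.data[i] = a.data[i] + b.data[i];
    return out;
}

image sub_image(image a, image b)
{
    assert(a.w == b.w && a.h == b.h && a.c == b.c);
    image out = make_image(a.w, a.h, a.c);
    for (int i = 0; i < a.w*a.h*a.c; ++i) out.data[i] = a.data[i] - b.data[i];
    return out;
}
```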
Now we should be able to run something like this:
```python
from uwimg import *
im = load_image("data/dog.jpg")
f = make_gaussian_filter(2)
lfreq = convolve_image(im, f, 1)
hfreq = im - lfreq
reconstruct = lfreq + hfreq
save_image(lfreq, "low-frequency")
save_image(hfreq, "high-frequency")
save_image(reconstruct, "reconstruct")
```
Low frequency | High frequency | Reconstruction |
---|---|---|
Note that the high-frequency image overflows when we save it to disk. Is this a problem for us? Why or why not?
Use these functions to recreate your own Ronbledore image. You will need to tune your standard deviations for the gaussians you use. You will probably need different values for each image to get it to look good.
The Sobel filters are cool because we can use them to estimate the image gradients and the direction of those gradients. They should be straightforward now that you all are such pros at image filtering.

First implement the functions to make our Sobel filters. They are for estimating the gradient in the x and y directions:
Gx | Gy |
---|---|
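For reference, the commonly used 3x3 Sobel kernels are shown in the sketch below; double-check them against the versions from class. The function names `make_gx_filter`/`make_gy_filter` are an assumption, so use whatever names your header actually declares:

```c
#include "image.h"

// The standard 3x3 Sobel kernels:
//        [ -1  0  1 ]           [ -1 -2 -1 ]
//   Gx = [ -2  0  2 ]      Gy = [  0  0  0 ]
//        [ -1  0  1 ]           [  1  2  1 ]
image make_gx_filter()
{
    image f = make_image(3, 3, 1);
    float k[9] = {-1, 0, 1, -2, 0, 2, -1, 0, 1};
    for (int i = 0; i < 9; ++i) f.data[i] = k[i];
    return f;
}

image make_gy_filter()
{
    image f = make_image(3, 3, 1);
    float k[9] = {-1, -2, -1, 0, 0, 0, 1, 2, 1};
    for (int i = 0; i < 9; ++i) f.data[i] = k[i];
    return f;
}
```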
To visualize our Sobel operator we'll want another normalization strategy: feature normalization. This strategy is simple; we just want to scale the image so all values lie in [0, 1]. In particular we will be rescaling the image by subtracting the minimum from all values and dividing by the range of the data. If the range is zero you should just set the whole image to 0 (don't divide by 0, that's bad).
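A minimal sketch of `feature_normalize` (the name matches the call in the Python example below):

```c
#include "image.h"

void feature_normalize(image im)
{
    // Rescale all values to [0, 1]: subtract the minimum, divide by the range.
    float min = im.data[0], max = im.data[0];
    for (int i = 0; i < im.w*im.h*im.c; ++i){
        if (im.data[i] < min) min = im.data[i];
        if (im.data[i] > max) max = im.data[i];
    }
    float range = max - min;
    for (int i = 0; i < im.w*im.h*im.c; ++i){
        im.data[i] = (range == 0) ? 0 : (im.data[i] - min) / range;
    }
}
```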
Fill in the function `image *sobel_image(image im)`. It should return two images, the gradient magnitude and direction. The strategy can be found here.
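A rough sketch, reusing the (assumed) `make_gx_filter`/`make_gy_filter` constructors from above and `convolve_image` with `preserve` off so each gradient collapses to one channel:

```c
#include <math.h>
#include <stdlib.h>
#include "image.h"

image *sobel_image(image im)
{
    // Returned array: [0] = gradient magnitude, [1] = gradient direction.
    image *res = calloc(2, sizeof(image));
    image gx_f = make_gx_filter();
    image gy_f = make_gy_filter();
    image gx = convolve_image(im, gx_f, 0);
    image gy = convolve_image(im, gy_f, 0);

    res[0] = make_image(im.w, im.h, 1);
    res[1] = make_image(im.w, im.h, 1);
    for (int i = 0; i < im.w*im.h; ++i){
        res[0].data[i] = sqrtf(gx.data[i]*gx.data[i] + gy.data[i]*gy.data[i]);
        res[1].data[i] = atan2f(gy.data[i], gx.data[i]);
    }

    free_image(gx_f); free_image(gy_f);
    free_image(gx);   free_image(gy);
    return res;
}
```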
We can visualize our magnitude using our normalization function:
```python
from uwimg import *
im = load_image("data/dog.jpg")
res = sobel_image(im)
mag = res[0]
feature_normalize(mag)
save_image(mag, "magnitude")
```
Which results in:
Now, using your Sobel filter, try to make a cool, stylized image. Fill in the function `image colorize_sobel(image im)`. I used the magnitude to specify the saturation and value of an image and the angle to specify the hue, but you can do whatever you want (as long as it looks cool). I also used some smoothing:
Turn in your `resize_image.c`, `filter_image.c`, `ronbledore.jpg`, and `sobel.jpg` on Canvas under Assignment 1.