This is one of the 100+ free recipes of the IPython Cookbook, Second Edition, by Cyrille Rossant, a guide to numerical computing and data science in the Jupyter Notebook. The ebook and printed book are available for purchase at Packt Publishing.
▶ Text on GitHub with an CC-BY-NC-ND license
▶ Code on GitHub with an MIT license
Chapter 11 : Image and Audio Processing
In the previous chapter, we covered signal processing techniques for one-dimensional, time-dependent signals. In this chapter, we will see signal processing techniques for images and sounds.
Generic signal processing techniques can be applied to images and sounds, but many image or audio processing tasks require specialized algorithms. For example, we will see algorithms for segmenting images, detecting points of interest in an image, or detecting faces. We will also hear the effect of linear filters on speech sounds.
scikit-image is one of the main image processing packages in Python. We will use it in most of the image processing recipes in this chapter. For more on scikit-image, refer to http://scikit-image.org.
We will also use OpenCV (http://opencv.org), a computer vision library in C++ that has a Python wrapper.
In this introduction, we will discuss the particularities of images and sounds from a signal processing point of view.
A grayscale image is a bidimensional signal represented by a function,
On a computer, images are digitally sampled. The intensities are not real values, but integers or floating point numbers. On one hand, the mathematical formulation of continuous functions allows us to apply analytical tools such as derivatives and integrals. On the other hand, we need to take into account the digital nature of the images we deal with.
From a signal processing perspective, a sound is a time-dependent signal that has sufficient power in the hearing frequency range (about 20 Hz to 20 kHz). Then, according to the Nyquist-Shannon theorem (introduced in Chapter 10, Signal Processing), the sampling rate of a digital sound signal needs to be at least 40 kHz. A sampling rate of 44100 Hz is frequently chosen.
Here are a few references:
- Image processing on Wikipedia, available at https://en.wikipedia.org/wiki/Image_processing
- Numerical Tours, advanced image processing algorithms available at http://www.numerical-tours.com/python/
- Audio signal processing on Wikipedia, available at https://en.wikipedia.org/wiki/Audio_signal_processing
- Particularities of the 44100 Hz sampling rate explained at https://en.wikipedia.org/wiki/44,100_Hz