
Chapter 11: Image and Audio Processing

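In this chapter, we will cover the following topics:

Manipulating the exposure of an image
Applying filters to an image
Segmenting an image
Finding points of interest in an image
Detecting faces in an image
Applying filters to speech sounds
Synthesizing sounds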

In the previous chapter, we covered signal processing techniques for one-dimensional, time-dependent signals. In this chapter, we will see signal processing techniques for images and sounds.

Generic signal processing techniques can be applied to images and sounds, but many image or audio processing tasks require specialized algorithms. For example, we will see algorithms for segmenting images, detecting points of interest in an image, or detecting faces. We will also hear the effect of linear filters on speech sounds.

scikit-image is one of the main image processing packages in Python. We will use it in most of the image processing recipes in this chapter. For more on scikit-image, refer to http://scikit-image.org.

We will also use OpenCV (http://opencv.org), a computer vision library in C++ that has a Python wrapper.

In this introduction, we will discuss the particularities of images and sounds from a signal processing point of view.

Images

A grayscale image is a two-dimensional signal represented by a function, $f$, that maps each pixel to an intensity. For example, the intensity could be a real value between 0 (dark) and 1 (light). In a color image, this function maps each pixel to a triplet of intensities, generally the red, green, and blue (RGB) components.
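As a minimal sketch of this representation, the snippet below builds a small grayscale image and a small RGB image as NumPy arrays. The image size and the gradient values are made up for illustration; they do not come from the chapter's recipes.

```python
import numpy as np

# Hypothetical 4x6 grayscale image: one intensity in [0, 1] per pixel,
# stored as a 2D array indexed by (row, column).
height, width = 4, 6
gray = np.linspace(0.0, 1.0, height * width).reshape(height, width)

# Hypothetical RGB image: a triplet of intensities per pixel,
# stored as a 3D array indexed by (row, column, channel).
rgb = np.zeros((height, width, 3))
rgb[..., 0] = gray          # red channel follows the gradient
rgb[..., 2] = 1.0 - gray    # blue channel is the reversed gradient

print(gray.shape)   # (4, 6)
print(rgb.shape)    # (4, 6, 3)
print(rgb[0, 0])    # [0. 0. 1.]
```

Indexing `gray[i, j]` plays the role of evaluating $f$ at the pixel in row `i` and column `j`.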

On a computer, images are digitally sampled: the pixel grid is finite, and the intensities are not real values but integers or floating-point numbers. On the one hand, the mathematical formulation in terms of continuous functions allows us to apply analytical tools such as derivatives and integrals. On the other hand, we need to take into account the digital nature of the images we deal with.
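The following sketch illustrates both sides of this trade-off on a made-up 8-bit ramp image: the continuous derivative is approximated by a finite difference, and integer storage introduces pitfalls such as silent wraparound. The array contents and variable names are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical 8-bit image: a horizontal ramp with values 0, 51, ..., 255.
row = (np.arange(6) * 51).astype(np.uint8)
img_u8 = np.tile(row, (3, 1))              # shape (3, 6)

# Rescale to floating point intensities in [0, 1].
img_f = img_u8.astype(np.float64) / 255.0

# Analytical tool, digital version: the horizontal derivative becomes
# a finite difference between neighboring pixels.
dx = np.diff(img_f, axis=1)                # shape (3, 5), about 0.2 everywhere

# Digital pitfall: unsigned 8-bit arithmetic wraps around modulo 256
# instead of saturating at 255.
brighter = img_u8 + np.uint8(10)

print(dx[0])            # approximately [0.2 0.2 0.2 0.2 0.2]
print(brighter[0, -1])  # 9, not 265
```

Converting to floating point before doing arithmetic is one common way to sidestep the wraparound.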

Sounds

From a signal processing perspective, a sound is a time-dependent signal that has sufficient power in the audible frequency range (about 20 Hz to 20 kHz). According to the Nyquist-Shannon sampling theorem (introduced in Chapter 10, Signal Processing), capturing frequencies up to 20 kHz requires a sampling rate of at least twice that, so the sampling rate of a digital sound signal needs to be at least 40 kHz. A sampling rate of 44100 Hz is frequently chosen.
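To make the sampling constraint concrete, here is a small sketch that samples sine tones at 44100 Hz. The tone frequencies (440 Hz, 30 kHz) are arbitrary choices for illustration. It shows that a tone above half the sampling rate cannot be distinguished from a lower-frequency one once sampled.

```python
import numpy as np

fs = 44100                     # sampling rate in Hz
nyquist = fs / 2               # highest representable frequency: 22050 Hz
t = np.arange(fs) / fs         # sample times for one second of sound

# A 440 Hz tone is well below the Nyquist frequency and is captured faithfully.
tone = np.sin(2 * np.pi * 440 * t)

# A 30 kHz tone lies above the Nyquist frequency. Its samples coincide
# (up to sign) with those of a 44100 - 30000 = 14100 Hz tone: aliasing.
high = np.sin(2 * np.pi * 30000 * t)
alias = np.sin(2 * np.pi * 14100 * t)

print(len(tone))                              # 44100
print(nyquist)                                # 22050.0
print(np.allclose(high, -alias, atol=1e-6))   # True
```

Since 22050 Hz exceeds the 20 kHz upper limit of hearing, the 44100 Hz rate leaves a small margin above the audible range.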

References

Here are a few references:

Image processing on Wikipedia, available at https://en.wikipedia.org/wiki/Image_processing
Numerical Tours, advanced image processing algorithms available at http://www.numerical-tours.com/python/
Audio signal processing on Wikipedia, available at https://en.wikipedia.org/wiki/Audio_signal_processing
Particularities of the 44100 Hz sampling rate explained at https://en.wikipedia.org/wiki/44,100_Hz
