- In this example, the cat image is 248 pixels wide, 400 pixels tall, and has three color channels Red,Green,Blue.
- Therefore, the image consists of 248 x 400 x 3 numbers, or a total of 297,600 numbers.
- Each number is an integer that ranges from 0 (black) to 255 (white).
- The task of a machine learning algorithm is to turn this quarter of a million numbers into a single label, such as “cat”.
- Viewpoint Variation: An image can be viewed from many different perspective as the orientation of an object may differ with respect to the camera.
- Deformation: The shape of an object may be deformed in many different ways.
- Illumination: The object may have variation in illumination conditions causing the pixels to change drastically.
- Occlusion: Sometimes the object may only be partially visible in the image.
- Background Clutter: The objects of interest may blend into their environment, making them hard to identify.