
1 ‐ Video Basics


Before introducing the specifics of the MG Toolbox, we will give a quick introduction to video files and formats.

For researchers interested in studying humans and human motion, a regular video recording is often the easiest, fastest, and cheapest "motion capture" solution. Nowadays, everyone can access reasonably high-quality video cameras, even on their mobile phones. Professional-quality video cameras are also within reach for many researchers.

One of the positive things about video analysis is that it opens a broad range of analysis techniques, from purely qualitative to purely quantitative methods. For example, having a video recording that can be played back multiple times and at various speeds is very useful for visual inspection. And, as we shall see later, even a regular video recording can be used to extract meaningful quantitative motion data. Furthermore, it is common to use video recordings as a reference with sensor-based motion-tracking technologies. In such cases, the video recording can help qualitatively interpret numerical results.

What is a video file?

Let us start by understanding more about what a video file contains. Find a video file on your computer, for example, the file dance.avi from the MGT example folder. On most systems (Linux, Mac, Windows), you should see some basic information about the video content by selecting something like "properties" from the file inspector.

  • What are the dimensions?
  • What is the framerate?
  • What type of compression is used?

The important thing to understand here is that a digital video file can be seen as a series of still images. This allows various types of mathematical operations to be done on the file.

This is a typical file dialogue with information about a file:

Video file information

Here, we can see the dimensions (640 pixels wide, 480 pixels tall) and framerate (30 frames per second).
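
The same information can also be retrieved programmatically. Here is a minimal sketch using OpenCV (assuming the opencv-python package is installed and that dance.avi is in the current working directory):

```python
import cv2  # opencv-python

# Open the video file and query some basic properties of the video stream
cap = cv2.VideoCapture("dance.avi")

width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))    # pixels
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))  # pixels
fps = cap.get(cv2.CAP_PROP_FPS)                   # frames per second
frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))   # total number of frames

print(f"Dimensions: {width} x {height} pixels")
print(f"Framerate:  {fps} fps")
print(f"Duration:   {frames / fps:.1f} s ({frames} frames)")

cap.release()
```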

Compression

The video file's properties above show that the video stream has been compressed with the H.264 standard. This is the most common video compression codec these days. It is a lossy codec, meaning it throws away lots of data when compressing the file. The H.264 standard is also a time-based compression codec, meaning that it compares frames over time and only stores the information that changes between so-called keyframes. This is an efficient way of creating good-looking videos, but it is less ideal for analytical purposes.

For analysis, we often prefer Motion JPEG (MJPEG), a format that stores a complete image for each frame. This leads to larger video files but faster and easier processing.

In MGT, we rely on the fast and versatile FFmpeg, which enables us to work with basically any video codec or container. For some functions that rely more on computer vision techniques or render text-based analysis from a video, we rely on OpenCV. Since the latter works best with MJPEG encoding and .AVI containers, your videos will automatically be converted to this format if you use those functions.
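
If you want to do such a conversion manually, one option is to call FFmpeg yourself. The following is a minimal sketch, not the exact command MGT uses internally, assuming the ffmpeg executable is on your system path and using a hypothetical input file name and an example quality value:

```python
import subprocess

# Re-encode a video to MJPEG in an .AVI container.
# Lower -q:v values mean higher quality (and larger files).
subprocess.run([
    "ffmpeg",
    "-i", "dance.mp4",   # hypothetical input file
    "-c:v", "mjpeg",     # Motion JPEG: every frame stored as a complete image
    "-q:v", "3",         # example quality setting
    "-an",               # drop the audio stream (optional)
    "dance_mjpeg.avi",   # output file
], check=True)
```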

Video as a stream of numbers

As this figure illustrates, a video file is just a series of matrices with numbers:

A video file is just a collection of numbers

In a standard 8-bit video file, each pixel is stored as a number between 0 and 255, where 0 means black and 255 means white. Color images typically have three planes (one each for red, green, and blue), or four if an alpha channel is included, while greyscale images only need one plane.
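
To see this in practice, you can load a single frame into memory and inspect the numbers directly. Here is a minimal sketch using OpenCV (the shapes shown in the comments assume the 640x480 example file above):

```python
import cv2

# Read the first frame of the video into memory
cap = cv2.VideoCapture("dance.avi")
ok, frame = cap.read()
cap.release()

# OpenCV returns the frame as an array of 8-bit integers.
# A colour frame has three planes (blue, green, red in OpenCV's ordering).
print(frame.shape)  # e.g. (480, 640, 3): height, width, planes
print(frame.dtype)  # uint8, i.e. values between 0 and 255

# Convert to greyscale: one plane, where 0 means black and 255 means white
grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
print(grey.shape)              # e.g. (480, 640)
print(grey.min(), grey.max())  # pixel values within the 0-255 range
```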

Recording video for analysis

Remember that a video recording meant for analytical purposes is quite different from one shot for documentary or artistic purposes. The latter usually aims at creating an aesthetically pleasing result, which often includes continuous variation in the shots through changes in lighting, background, zooming, panning, and so on. A video recording for analysis is quite the opposite: it is best recorded in a controlled studio or lab setting with as few camera changes as possible. This ensures that the focus is on the content of the recording, that is, the human motion, and not on the motion of the camera or the environment.

Even though a controlled environment may be the best choice from a purely scientific point of view, it is also possible to obtain useful recordings for analytical purposes out in the field. This, however, requires some planning and attention to detail. Here are a few things to consider:

  • Foreground/background: Place the subject in front of a background that is as plain as possible, so that it is easy to distinguish between the important and unimportant elements in the image. Avoid backgrounds with moving objects, since these may influence the analysis.

  • Lighting: Avoid changing lights, as they will influence the final video. In dark locations, or if the lights change rapidly (such as at a disco or club concert), it may be worth recording with an infrared camera. Some consumer cameras have a "night mode" that serves the same purpose. Even though the visual results of such recordings may be unsatisfactory, they can still work well for computer-based motion analysis.

  • Camera placement: Place the camera on a tripod, and avoid moving it while recording. Panning and zooming make it more challenging to analyze the content of the recordings later. If both overview images and close-ups are needed, it is better to use two (or more) cameras to capture different parts of the scene.

  • Image quality: In principle, it is best to record at the highest spatial (number of pixels), temporal (frames per second), and compression (format and ratio) quality the camera allows for. In practice, however, the most important thing is to find a balance between image quality, file size, and processing time.

As mentioned, a video recording can be used as the starting point for both qualitative and quantitative analysis. We will look at several possibilities, moving from more qualitative visualization methods to advanced motion capture techniques.