This repository has been archived by the owner on Sep 16, 2024. It is now read-only.

INTRO: Open Visual Cloud Building Blocks

xwu2 edited this page Dec 19, 2019 · 1 revision

The four core building blocks used to build a visual cloud service are Encode, Decode, Inference, and Render. Each building block represents the underlying technology of one of the processes that make up a visual cloud service pipeline.

Developers can arrange the four basic building blocks into a variety of pipelines for different services. For example, a simple transcode service is realized with the decode + encode core building blocks. Inserting an inference building block (decode + inference + encode) results in a media analytics service, relevant for use cases that require intelligent content analysis, such as digital security and surveillance or ad insertion in user-generated content.
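The composition described above can be sketched in a few lines of Python. This is only an illustrative model, not part of any Open Visual Cloud API: the `Pipeline` class, its fields, and the `describe` method are hypothetical names introduced here to show how the same four blocks combine into different services.

```python
from dataclasses import dataclass
from typing import Tuple

# The four core building blocks named in the text.
BLOCKS = ("decode", "encode", "inference", "render")


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical model of a service as an ordered chain of building blocks."""
    name: str
    stages: Tuple[str, ...]

    def __post_init__(self):
        # Reject any stage that is not one of the four core blocks.
        unknown = [s for s in self.stages if s not in BLOCKS]
        if unknown:
            raise ValueError(f"unknown building block(s): {unknown}")

    def describe(self) -> str:
        # Render the pipeline in the "decode + encode" notation used above.
        return f"{self.name}: " + " + ".join(self.stages)


# The two example services from the text.
transcode = Pipeline("transcode", ("decode", "encode"))
media_analytics = Pipeline("media analytics", ("decode", "inference", "encode"))

print(transcode.describe())
print(media_analytics.describe())
```

In a real service each stage name would map to an actual component (for instance a decoder, an inference engine, or an encoder); here the stages are plain strings so the sketch stays self-contained.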

Intel is contributing to each core building block with new and existing projects and enhanced performance. After observing that encode is a required building block across all the visual cloud services, Intel released several Scalable Video Technology (SVT) encoder core libraries, along with interoperability with the x264 and x265 encoders, to support the ecosystem's needs. Additionally, the Intel® OpenVINO™ Toolkit and the Intel® Rendering Framework (https://software.intel.com/en-us/rendering-framework) make up the inference and render blocks, respectively.

ENCODE

At its most basic, encoding compresses video data to reduce its size. Because the visual cloud runs on video data, Encode is one of the key building blocks developers will use in constructing most visual cloud services and pipelines. As of 2019, AV1 has emerged as a new, commercially viable entrant, thanks to SVT-AV1.

A variety of individual open source ingredients make up each Open Visual Cloud Building Block. For Encode, some of these ingredients include:

  • Scalable Video Technology (SVT) - SVT is an encoding technology optimized for x86 processors. Supported codecs include HEVC, VP9, and AV1.
  • FFmpeg - FFmpeg is an open source project consisting of a vast software suite of libraries and programs for handling video, audio, and other multimedia files and streams.
  • x265 - x265 is an H.265 / HEVC video encoder application library, designed to encode video or images into an H.265 / HEVC encoded bitstream.
  • x264 - x264 is an open source software library and a command-line utility developed by VideoLAN for encoding video streams into the H.264/MPEG-4 AVC format.
  • Open WebRTC Toolkit - Open WebRTC Toolkit is an open source real-time media delivery framework, which includes comprehensive media processing functions on video and audio streams.
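As a rough illustration of how these ingredients fit together, the sketch below builds an FFmpeg command line that selects one of the encoder libraries listed above. It is a minimal example under stated assumptions: the helper `build_encode_command` and the file names are hypothetical, the command is only assembled (not executed), and each encoder name (`libx264`, `libx265`, `libsvtav1`) works only if the local FFmpeg build was configured with that library.

```python
import shlex
from typing import List

# FFmpeg encoder names for some of the libraries listed above.
# Assumption: the local FFmpeg build includes the chosen library.
ENCODERS = {
    "h264": "libx264",   # x264
    "hevc": "libx265",   # x265
    "av1": "libsvtav1",  # SVT-AV1
}


def build_encode_command(src: str, dst: str, codec: str) -> List[str]:
    """Hypothetical helper: return an ffmpeg argv list for the given codec."""
    if codec not in ENCODERS:
        raise ValueError(f"unsupported codec: {codec}")
    # -i names the input file; -c:v selects the video encoder.
    return ["ffmpeg", "-i", src, "-c:v", ENCODERS[codec], dst]


# Placeholder file names; nothing is read or written by this sketch.
cmd = build_encode_command("input.mp4", "output.mkv", "av1")
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run` on a machine where FFmpeg is installed. Encoder-specific options such as bitrate or preset are left out on purpose, since sensible values depend on the content and the service.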

DECODE

Decode is the decompression of encoded video data. Decode goes hand in hand with Encode: once video has been encoded, it must be decoded before it can be processed or viewed on a screen. Decode technology can be either hardware-based or software-based. As new HD video codecs reach the market, they will see greater adoption in hardware such as TVs, video cameras, and smartphones. As of early 2019, some of the open source decoders and products with decode support include:

  • VLC Media Player - VLC is a free and open source cross-platform multimedia player and framework that plays most multimedia files.
  • dav1d - dav1d is a new open source, cross-platform AV1 decoder focused on speed and correctness.
  • Mozilla Firefox - Firefox is an open source web browser that released AV1 support in Firefox 65 in January 2019.
  • Android Q - Google* has announced that Android* Q will support AV1 decode.
  • Google* Chrome - Google's Chrome/Chromium 70 web browser ships with AV1 video decoding support.

INFERENCE

An inference building block analyzes video content data. Artificial Intelligence (AI) applications use inference, based on deep learning neural networks, to perform many tasks, from facial recognition and ad insertion to smart city use cases such as street corner traffic management.

As video becomes more ubiquitous, the need to analyze what is shown in the video becomes more important. Using AI models, it is possible to train applications to search for specific patterns within a video (a person, a vehicle, a brand logo, etc.) and act on what they find.

Individual open source ingredients that make up the Inference building block can be found in the Intel OpenVINO Toolkit, and include:

  • Intel OpenVINO - OpenVINO, short for Open Visual Inference and Neural network Optimization, is a toolkit that provides developers with improved neural network performance on a variety of hardware (CPU, GPU, FPGA, VPU) and helps them further unlock cost-effective, real-time vision applications.
  • Open Model Zoo - Pre-trained deep learning models and samples for use in Intel OpenVINO.

RENDER

Video rendering is the process by which a computer processes information from a coded data source and uses that information to produce and display an image. This is usually in the context of creating a video animation or visualizing a large data set.

Individual open source ingredients for the Render building block are found in the Intel® Rendering Framework, which includes:

  • Intel Embree - High performance ray tracing kernels.
  • Intel OSPRay - Ray tracing based rendering engine for high-fidelity visualization.
  • Intel OpenSWR - Highly scalable software rasterizer for OpenGL*.
  • Intel Open Image Denoise - An open source library of denoising filters for images rendered with ray tracing.