Updates public APIs and READMEs wrt AV1

AxisCommunications · Dec 17, 2024 · a1ff38e
1 parent 8c0c682
Showing 8 changed files with 221 additions and 210 deletions.
diff --git a/README.md b/README.md
@@ -1,7 +1,7 @@
 *Copyright (C) 2021, Axis Communications AB, Lund, Sweden. All Rights Reserved.*
 
 # Signed Video Framework
-This repository holds the framework code of the feature Signed Video. The Signed Video feature secures the video from tampering after signing by adding cryptographic signatures to the video. Each video frame is hashed and signatures are generated repeatedly based on these hashes using a private key set by the signer. The signature data added to the video does not affect the video rendering. The data is added in a Supplemental Enhancement Information (SEI) NALU with type "user data unregistered". This SEI has a UUID of `5369676e-6564-2056-6964-656f2e2e2e30` in hexadecimal.
+This repository holds the framework code of the feature Signed Video. The Signed Video feature secures the video from tampering after signing by adding cryptographic signatures to the video. Each video frame is hashed and signatures are generated repeatedly based on these hashes using a private key set by the signer. The signature data added to the video does not affect the video rendering. The data is added in a Supplemental Enhancement Information (SEI) NALU with type "user data unregistered" (OBU Metadata of type "user private" for AV1). This SEI has a UUID of `5369676e-6564-2056-6964-656f2e2e2e30` in hexadecimal.
 
 A more detailed description of the Signed Video feature is found in [feature-description](./feature-description.md).
 

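Note on the UUID in the README.md hunk above: the 16 bytes are printable ASCII, which makes Signed Video payloads easy to spot in a hex dump. A minimal Python sketch (the variable names are illustrative, not part of the framework):

```python
import uuid

# UUID quoted in README.md for the Signed Video SEI / OBU Metadata.
SIGNED_VIDEO_UUID = uuid.UUID("5369676e-6564-2056-6964-656f2e2e2e30")

# Interpret the 16 raw bytes as ASCII text.
uuid_text = SIGNED_VIDEO_UUID.bytes.decode("ascii")
print(repr(uuid_text))  # 'Signed Video...0'
```
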
diff --git a/feature-description.md b/feature-description.md
@@ -14,32 +14,32 @@ A video consists of picture frames displayed at a certain frame rate. If these f
 
 In brief, the principle of signing documents is used, that is, collect information and sign the information using a Private encryption key. Then, packetize the produced signature together with some additional information. For validation, the user can then verify the information by using the signature and the corresponding Public key.
 
-On a high level, *Signed Video* hashes encoded video frames and on a regular basis creates a `document` representing these hashes and signs that `document`. This signature, together with the `document`, is added to the video using Supplementary Enhancement Information (SEI) frames.
+On a high level, *Signed Video* hashes encoded video frames and on a regular basis creates a `document` representing these hashes and signs that `document`. This signature, together with the `document`, is added to the video using Supplementary Enhancement Information (SEI) frames, or OBU Metadata for AV1.
 
 ## Limitations and properties
-*Signed Video* is currently only available for the video codec formats H264 and H265. Therefore, most of the description uses the Network Abstraction Layer (NAL) concept. Note that raw videos are not supported.
+*Signed Video* is currently only available for the video codec formats H.264, H.265 and AV1. Therefore, most of the description uses the Network Abstraction Layer (NAL) and OBU concept. Note that raw videos are not supported.
 
 Signing frequency. Signing is done upon transition between two Group of Pictures (GOP). For short GOP lengths the time between two GOP transitions may be shorter than the time it takes to perform the signing. Hence, there is a limit on how short GOPs the framework can allow for to be able to sign a video in real-time.
 
-Authenticity level. *Signed Video* supports two levels of authenticity; GOP level and NALU level. For GOP level, all frames between two signatures are treated in one single chunk and if the validation fails, all these frames are marked as not authentic even if it is due to a lost frame. For NALU level the framework can identify which frames are authentic or not, or even lost. This means that frame drops can be handled, but not packet losses. Losing packets in general means losing parts of a frame, hence is equivalent with modifying a frame. The cost for validating the authenticity of each individual NAL is an increase in bitrate.
+Authenticity level. *Signed Video* supports two levels of authenticity; GOP level and NALU/OBU level. For GOP level, all frames between two signatures are treated in one single chunk and if the validation fails, all these frames are marked as not authentic even if it is due to a lost frame. For NALU/OBU level the framework can identify which frames are authentic or not, or even lost. This means that frame drops can be handled, but not packet losses. Losing packets in general means losing parts of a frame, hence is equivalent with modifying a frame. The cost for validating the authenticity of each individual NAL is an increase in bitrate.
 
 ## Detailed description
-As mentioned, the framework currently only supports H264 and H265. These codec formats allow the user to add arbitrary data to a stream through SEI frames of type *user data unregistered*. *Signed Video* puts the produced signatures and additional metadata in such frames. These SEI frames are ignored by the decoder and will therefore not affect the video rendering.
-One obvious drawback is that it is easy to destroy the signed video and make it unsigned by simply dropping those SEI frames. In some cases this can also be beneficial if, e.g., the user is no longer interested in its authenticity.
-It is out of scope to protect against lost SEI frames.
+As mentioned, the framework currently only supports H.264, H.265 and AV1. These codec formats allow the user to add arbitrary data to a stream through SEI/OBU Metadata frames of type *user data unregistered*/*user private*. *Signed Video* puts the produced signatures and additional metadata in such frames. These SEI/OBU Metadata frames are ignored by the decoder and will therefore not affect the video rendering.
+One obvious drawback is that it is easy to destroy the signed video and make it unsigned by simply dropping those SEI/OBU Metadata frames. In some cases this can also be beneficial if, e.g., the user is no longer interested in its authenticity.
+It is out of scope to protect against lost SEI/OBU Metadata frames.
 
-All operations are done on the encoded video stream. Each picture frame is split into NAL Units (NALU) and *Signed Video* operates on these NALUs. NALUs that are not part of a picture frame are ignored. These NALUs are
+All operations are done on the encoded video stream. Each picture frame is split into NAL Units (NALU) (OBU for AV1) and *Signed Video* operates on these NALUs/OBUs. NALUs/OBUs that are not part of a picture frame are ignored. These NALUs/OBUs are
 - SPS/PPS/VPS
-- AUD
-- SEIs other than Signed Video specific
+- AUD/SH
+- SEIs other than Signed Video specific (for H.264 and H.265)
 
 Note that these can still affect the visual aspect of a video.
 
 ### Signing a GOP
 Without loss of generality, consider three consecutive GOPs each starting with an IDR (I-frame) followed by 4 non-IDRs (P-frames). In text format it would look like `IPPPPIPPPPIPPPP`.
-The signing information is collected in a SEI frame (`S`) and put just before the picture frame to follow the Access Unit (AU) format. Each I-frame will trigger a signing procedure and ideally the SEI is generated and available instantaneously and can be attached to the stream as `SIPPPPSIPPPPSIPPPP`.
+The signing information is collected in a SEI frame (`S`) and put just before the picture frame to follow the Access Unit (AU) format. Each I-frame will trigger a signing procedure and ideally the SEI/OBU Metadata is generated and available instantaneously and can be attached to the stream as `SIPPPPSIPPPPSIPPPP`.
 
-Each NALU is hashed using SHA-256, but not in a straightforward manner. Since every P-frame directly or indirectly refers to the I-frame starting the GOP they are linked together. Let `h(F)` denote the hash of a frame `F`, and `href = h(I)` is the hash of the first I-frame in a GOP and used as reference. Then each frame in a GOP is hashed according to `hash(F) = h(href, h(F))` where `href` and `h(F)` have been aligned in memory.
+Each NALU/OBU is hashed using SHA-256, but not in a straightforward manner. Since every P-frame directly or indirectly refers to the I-frame starting the GOP they are linked together. Let `h(F)` denote the hash of a frame `F`, and `href = h(I)` is the hash of the first I-frame in a GOP and used as reference. Then each frame in a GOP is hashed according to `hash(F) = h(href, h(F))` where `href` and `h(F)` have been aligned in memory.
 All hashes are collected in a list and together with some metadata form a `document`, which later will be signed.
 
 To preserve the order of GOPs the I-frame of the next GOP is also include in the list of hashes, that is, `hash(Inext) = h(href, h(Inext))` is added as well.
@@ -48,21 +48,15 @@ This `document` is then hashed, denoted _gop hash_ (`= h(document)`), and signed
 
 `signature = sign(h(document))`
 
-and together with the `document` itself is added to the stream in a SEI, that is, SEI = `document + signature`.
+and together with the `document` itself is added to the stream in a SEI/OBU Metadata, that is, SEI = `document + signature`.
 After signing, the next GOP is then initiated with a new `href` using the very same I-frame that closed the previous GOP.
 For the end user to validate the authenticity of a signed video the public key, associated with the private key used when signing, is needed. The *Signed Video Framework* supports including the public key as part of the metadata. This simplifies validating the authenticity of the video, but requires a separate logic to verify its origin.
 
 #### Signing at GOP authenticity level
-Transmitting the list of hashes can be too expensive in terms of an increased bitrate. The *Signed Video Framework* therefore offer a light version in GOP level as authenticitiy level. Instead of transmitting the hash list a single hash representing all the frames and the metadata is computed. This single hash is implemented recursively and is then signed.
+Transmitting the list of hashes can be too expensive in terms of an increased bitrate. The *Signed Video Framework* therefore offer a light version in GOP level as authenticitiy level. Instead of transmitting the hash list a single hash representing all the frames and the metadata is computed. This single hash is a hash of all the hashes in the GOP and is then signed.
+The GOP hash is initialized with a hashed salt `hash(0) = h(salt)`.
 
-The recursive operation is initialized with a hashed salt `hash(0) = h(salt)`. The next step is to add `href` as `hash(1) = h(hash(0), href)` and the n'th hash becomes `hash(n) = h(hash(n-1), hash(F_n))`, where `F_n` is the frame that produced the n'th hash. The last frame added to the recursive hash is `Inext` and a `document` is created just like above, but now without the hash list, hence it includes the metadata only.
-The recursive hash is then finalized with the hash of this `document` which now becomes the _gop hash_
-
-`hash(gop) = h(hash(N), hash(document))`
-
-The _gop hash_ is then signed by generating a signature. Combine the metadata, which by definition is the `document`, and the signature to form the SEI = `document + signature` where `signature = sign(hash(gop))`
-
-For NALU level and long GOP lengths, *Signed Video* automatically falls back to GOP level to avoid very large SEI frames.
+For NALU/OBU level and long GOP lengths, *Signed Video* automatically falls back to GOP level to avoid very large SEI frames.
 
 ### Metadata
 Part from the public key it is possible to add some signer specific information. That information is today locked to the fields
@@ -72,7 +66,7 @@ Part from the public key it is possible to add some signer specific information.
 - Manufacturer (Who is the signer, for example Axis Communication AB)
 - Address (Contact information of signer, e.g., url, email, mail)
 
-### SEI format
+### SEI/OBU Metadata format
 The framework uses the *user data unregistered* type of SEIs. These are organized as
 
 `| NALU header | payload size | UUID | payload | stop bit |`
@@ -85,5 +79,11 @@ The UUID is used to put a *Signed Video* identity to the SEI. The payload includ
 
 By definition the `document` includes everything from the NALU header to the signature tag, hence the entire frame is secured.
 
+For AV1 this looks like
+`| OBU header | payload size | metadata type | UUID | metadata | list of hashes | signature | stop bit |`
+
+`| -------------------------------- document ---------------------------------- | signature | stop bit |`
+
+
 ### Signing in a secure hardware
-When signing in hardware the signing itself may take some time and to avoid piling up frames *Signed Video Framework* supports the SEI frames being added at a later stage, but no later than at the next signing request. Using the example above, a signed video segment could look like `IPSPPPIPPPPSIPPPSP` where the `S`s show up delayed compared to the ideal case `SIPPPPSIPPPPSIPPPP`.
+When signing in hardware the signing itself may take some time and to avoid piling up frames *Signed Video Framework* supports the SEI/OBU Metadata frames being added at a later stage, but no later than at the next signing request. Using the example above, a signed video segment could look like `IPSPPPIPPPPSIPPPSP` where the `S`s show up delayed compared to the ideal case `SIPPPPSIPPPPSIPPPP`.
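Note on the linked hashing described under "Signing a GOP" in feature-description.md: the scheme `hash(F) = h(href, h(F))` can be sketched in Python as below. This is an illustration only, not the framework's implementation. It assumes that "aligned in memory" means plain concatenation, that the I-frame itself is hashed with the same formula, and that the `document` is the metadata followed by the hash list. The helper names `h`, `linked_hashes` and `gop_hash` are hypothetical, and the signing step is left out.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """SHA-256 over the concatenation of the given byte strings."""
    return hashlib.sha256(b"".join(parts)).digest()


def linked_hashes(gop: list[bytes], next_i_frame: bytes) -> list[bytes]:
    """Hash list for one GOP: hash(F) = h(href, h(F)), closed by Inext."""
    href = h(gop[0])  # reference hash: the I-frame that starts the GOP
    return [h(href, h(frame)) for frame in [*gop, next_i_frame]]


def gop_hash(hash_list: list[bytes], metadata: bytes) -> bytes:
    """h(document), with the document modelled as metadata + hash list."""
    document = metadata + b"".join(hash_list)
    return h(document)


# One GOP "IPPPP" plus the I-frame of the next GOP (placeholder payloads).
gop = [b"I0", b"P1", b"P2", b"P3", b"P4"]
hashes = linked_hashes(gop, b"I5")

# Modifying one P-frame changes only that frame's entry in the hash list.
tampered = linked_hashes([b"I0", b"P1", b"PX", b"P3", b"P4"], b"I5")
changed = [i for i, (a, b) in enumerate(zip(hashes, tampered)) if a != b]
print(changed)  # [2]
```

Because every entry depends on `href`, replacing the I-frame would change all entries of the GOP, while the per-frame list still lets a validator point at the single modified frame, which is the property the NALU/OBU authenticity level relies on.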
diff --git a/lib/README.md b/lib/README.md
@@ -27,8 +27,8 @@ plugin. The interfaces can be found in
 with both a threaded and an unthreaded signing plugin. When building the library with the meson
 structure in this repository, the library includes that plugin.
 
-Vendor specific code and APIs are typically handling extra metadata added to the SEI, which needs to
-be interpreted correctly when validating authenticity. With the meson option `vendor` the user can
-select which vendor(s) to include in the build. Typically, when building for signing the vendor for
-that camera is selected, whereas when building for validation all vendors are included. By default,
-all vendors are added.
+Vendor specific code and APIs are typically handling extra metadata added to the SEI/OBU Metadata,
+which needs to be interpreted correctly when validating authenticity. With the meson option `vendor`
+the user can select which vendor(s) to include in the build. Typically, when building for signing
+the vendor for that camera is selected, whereas when building for validation all vendors are
+included. By default, all vendors are added.
diff --git a/lib/src/README.md b/lib/src/README.md
@@ -6,24 +6,24 @@ APIs needed are located in [includes/](./includes/).
 
 ## Making your own validation application
 The APIs needed are [signed_video_common.h](./includes/signed_video_common.h) and
-[signed_video_auth.h](./includes/signed_video_auth.h). To validate a H264 or H265 video you need to
-split the video into NAL Units. For a detailed description and example code see
+[signed_video_auth.h](./includes/signed_video_auth.h). To validate a H.264, H.265 or AV1 video you
+need to split the video into NAL Units/OBUs. For a detailed description and example code see
 [signed_video_auth.h](./includes/signed_video_auth.h) or look at the validator in the
 [signed-video-framework-examples](https://github.com/AxisCommunications/signed-video-framework-examples)
 repository.
 
 ## Making your own signing application
 The APIs needed are [signed_video_common.h](./includes/signed_video_common.h) and
-[signed_video_sign.h](./includes/signed_video_sign.h). To sign a H264 or H265 video you need to
-split the video into NAL Units. Before signing can begin you need to configure the Signed Video
-session. Setting a private key is mandatory, but there are also possibilities to add some product
-information and what level of authentication to use. The public key, needed for validation, is
-automatically added to the stream.
+[signed_video_sign.h](./includes/signed_video_sign.h). To sign a H.264, H.265 or AV1 video you need
+to split the video into NAL Units/OBUs. Before signing can begin you need to configure the Signed
+Video session. Setting a private key is mandatory, but there are also possibilities to add some
+product information and what level of authentication to use. The public key, needed for validation,
+is automatically added to the stream.
 
-The Signed Video Framework generates SEI frames including signatures and other information. Getting
-them and instructions on how to add them to the current stream are handled through the API
-`signed_video_get_nalu_to_prepend()`. Note that the framework follows the Access Unit format of
-H264, hence SEI frames must prepend the current picture frame.
+The Signed Video Framework generates SEI/OBU Metadata frames including signatures and other
+information. Getting them and instructions on how to add them to the current stream are handled
+through the API `signed_video_get_nalu_to_prepend()`. Note that the framework follows the Access
+Unit format of H.264, hence SEI frames must prepend the current picture frame.
 
 For a detailed description and example code see
 [signed_video_sign.h](./includes/signed_video_sign.h) or look at the signer in the
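Note on the "split the video into NAL Units/OBUs" step in lib/src/README.md: for H.264 and H.265 byte streams in Annex B format, NAL Units are separated by start codes (`0x000001`, possibly preceded by extra zero bytes). A minimal Python sketch of that split follows. It is an assumption-laden illustration: it does not apply to AV1, where OBUs carry size fields instead of start codes, nor to container formats that length-prefix NAL Units, and `split_annex_b` is a hypothetical helper, not a framework API.

```python
import re

# Annex B start code: two or more zero bytes followed by 0x01.
START_CODE = re.compile(rb"\x00{2,}\x01")


def split_annex_b(stream: bytes) -> list[bytes]:
    """Return the NAL Units of an Annex B byte stream, start codes removed."""
    return [nalu for nalu in START_CODE.split(stream) if nalu]


# Toy H.264 stream: SPS (type 7), PPS (type 8), IDR slice (type 5).
stream = (
    b"\x00\x00\x00\x01\x67\xaa"
    b"\x00\x00\x01\x68\xbb"
    b"\x00\x00\x01\x65\xcc\xdd"
)
nalus = split_annex_b(stream)

# For H.264 the NAL Unit type is the low 5 bits of the first header byte.
nalu_types = [nalu[0] & 0x1F for nalu in nalus]
print(nalu_types)  # [7, 8, 5]
```

Each resulting NAL Unit would then be handed to the signing or validation API one at a time; see the headers referenced above for the actual function signatures.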