Precompiled PVA kernel example

sohamm17 · mdemircin · whom3 · mdemircin · commit 37253f538c2f · 2024-08-02T10:38:50.000+03:00
Co-authored-by: Mehmet Demircin <mdemircin@nvidia.com>
Co-authored-by: Wendell Hom <whom@nvidia.com>
Signed-off-by: sohams <sohams@nvidia.com>
diff --git a/applications/CMakeLists.txt b/applications/CMakeLists.txt
@@ -70,6 +70,8 @@ add_holohub_application(object_detection_torch)
 
 add_holohub_application(openigtlink_3dslicer DEPENDS OPERATORS openigtlink)
 
+add_holohub_application(precompiled_pva)
+
 add_holohub_application(prohawk_video_replayer DEPENDS OPERATORS prohawk_video_processing)
 
 add_holohub_application(qt_video_replayer DEPENDS OPERATORS qt_video npp_filter)
diff --git a/applications/precompiled_pva/CMakeLists.txt b/applications/precompiled_pva/CMakeLists.txt
@@ -0,0 +1,19 @@
+# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+cmake_minimum_required(VERSION 3.20)
+project(precompiled_pva)
+
+add_subdirectory(cpp)
diff --git a/applications/precompiled_pva/README.md b/applications/precompiled_pva/README.md
@@ -0,0 +1,73 @@
+# PVA-Accelerated Image Sharpening Application
+
+This application demonstrates how to use the [Programmable Vision Accelerator (PVA)](#about-pva) within a Holoscan
+application. It reads a video stream, applies a 2D unsharp mask filter, and renders the result with the
+Holoviz visualizer. The unsharp mask filtering runs on the PVA rather than the GPU, so this step adds almost no GPU
+workload. More generally, the example shows how pre-processing, post-processing, and image processing tasks can be offloaded from the GPU, leaving it free for more compute-intensive machine learning and artificial intelligence workloads.
+
+This example application processes a video stream, displaying two visualizer windows: one for the original stream and another for the stream enhanced with image sharpening via PVA.
+
+## About PVA
+
+PVA is a highly power-efficient VLIW processor integrated into NVIDIA Tegra platforms, specifically designed for advanced image processing and computer vision algorithms. The CUPVA SDK offers a comprehensive and unified programming model for PVA, enabling developers to create and optimize their own algorithms. For access to the SDK and further development opportunities, please contact NVIDIA.
+
+## Content
+
+- `main.cpp`: A C++ Holoscan application with an operator that loads and executes a precompiled PVA library implementing the unsharp masking algorithm. The CUPVA SDK and a CUPVA license are not required to run this HoloHub application; only the CUPVA runtime libraries installed under `/opt/nvidia/cupva-<version>` on the target are needed.
+- `pva_unsharp_mask/`: Contains the `pva_unsharp_mask.hpp` header, which declares the `PvaUnsharpMask` class. `PvaUnsharpMask` provides an `init` API, called once with the dimensions of the first input tensor, and a `process` API, called for every input tensor. The precompiled algorithm library, `libpva_unsharp_mask.a`, and its allow list file, `cupva_allowlist_pva_unsharp_mask`, are downloaded automatically by the CMake scripts.
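+
+The per-frame call sequence used by `PreCompiledPVAExecutor` in `main.cpp` can be illustrated with a stand-in class. `StubUnsharpMask` and `computeFrame` below are hypothetical: they mirror only the `isInitialized`/`init`/`process` calls visible in `main.cpp`, the exact signatures in `pva_unsharp_mask.hpp` are assumed, and no PVA work is performed.
+
+```cpp
+#include <cassert>
+#include <cstdint>
+#include <vector>
+
+// Hypothetical stand-in for PvaUnsharpMask (signatures assumed from main.cpp).
+// It records calls instead of running the precompiled PVA kernel.
+class StubUnsharpMask {
+ public:
+  bool isInitialized() const { return initialized_; }
+
+  // Pitch arguments are accepted to match main.cpp's call but unused here.
+  void init(int32_t width, int32_t height, int32_t /*inputLinePitch*/,
+            int32_t /*outputLinePitch*/) {
+    width_ = width;
+    height_ = height;
+    initialized_ = true;
+    ++initCount_;
+  }
+
+  void process(const uint8_t* input, uint8_t* output) {
+    assert(initialized_ && "init() must run before process()");
+    assert(input != nullptr && output != nullptr);
+    ++frameCount_;
+  }
+
+  int initCount() const { return initCount_; }
+  int frameCount() const { return frameCount_; }
+  int32_t width() const { return width_; }
+  int32_t height() const { return height_; }
+
+ private:
+  bool initialized_ = false;
+  int32_t width_ = 0;
+  int32_t height_ = 0;
+  int initCount_ = 0;
+  int frameCount_ = 0;
+};
+
+// Per-frame logic modeled on PreCompiledPVAExecutor::compute():
+// initialize lazily from the first frame, then process every frame.
+template <typename Task>
+void computeFrame(Task& task, const std::vector<uint8_t>& in, std::vector<uint8_t>& out,
+                  int32_t width, int32_t height) {
+  assert(in.size() == out.size());
+  // main.cpp passes the tensor width (shape[1]) for both line pitches.
+  if (!task.isInitialized()) { task.init(width, height, width, width); }
+  task.process(in.data(), out.data());
+}
+```
+
+Because `isInitialized()` guards the call, `init` runs only for the first frame; a stream whose resolution changed mid-run would still be processed with the dimensions passed to that first `init` call.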
+
+
+## Algorithm Overview
+
+The PreCompiledPVAExecutor operator performs an image sharpening operation in three steps:
+
+1. Convert the input RGB image to the NV24 color format.
+2. Apply a 5x5 unsharp mask filter on the luminance color plane.
+3. Convert the enhanced image back to the RGB format.
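+
+For readers without PVA hardware, the three steps can be sketched as a plain C++ CPU reference. This is not the precompiled PVA kernel: the full-range BT.601 color coefficients, the NV24 layout (a W*H luma plane followed by a W*H interleaved U/V plane), the 5x5 binomial blur with replicated borders, and the `amount` gain are all assumptions chosen for illustration.
+
+```cpp
+#include <algorithm>
+#include <cassert>
+#include <cstddef>
+#include <cstdint>
+#include <vector>
+
+// CPU reference sketch only; coefficients, kernel, and gain are assumptions.
+static uint8_t toU8(float v) {
+  // Round to nearest and saturate to [0, 255].
+  return static_cast<uint8_t>(std::clamp(v + 0.5f, 0.0f, 255.0f));
+}
+
+// Step 1: interleaved RGB (H x W x 3) -> NV24 (Y plane, then interleaved U/V plane).
+std::vector<uint8_t> rgbToNv24(const std::vector<uint8_t>& rgb, int w, int h) {
+  const std::size_t n = static_cast<std::size_t>(w) * h;
+  assert(rgb.size() == n * 3);
+  std::vector<uint8_t> nv24(n * 3);
+  uint8_t* uv = nv24.data() + n;
+  for (std::size_t i = 0; i < n; ++i) {
+    const float r = rgb[3 * i], g = rgb[3 * i + 1], b = rgb[3 * i + 2];
+    nv24[i] = toU8(0.299f * r + 0.587f * g + 0.114f * b);
+    uv[2 * i] = toU8(-0.168736f * r - 0.331264f * g + 0.5f * b + 128.0f);
+    uv[2 * i + 1] = toU8(0.5f * r - 0.418688f * g - 0.081312f * b + 128.0f);
+  }
+  return nv24;
+}
+
+// Step 2: 5x5 unsharp mask applied in place to the luma plane only.
+void unsharpLuma(std::vector<uint8_t>& nv24, int w, int h, float amount) {
+  static const float taps[5] = {1, 4, 6, 4, 1};  // separable; 2D weights sum to 256
+  std::vector<float> blurred(static_cast<std::size_t>(w) * h);
+  for (int row = 0; row < h; ++row) {
+    for (int col = 0; col < w; ++col) {
+      float acc = 0.0f;
+      for (int dy = -2; dy <= 2; ++dy) {
+        for (int dx = -2; dx <= 2; ++dx) {
+          const int yy = std::clamp(row + dy, 0, h - 1);  // replicate borders
+          const int xx = std::clamp(col + dx, 0, w - 1);
+          acc += taps[dy + 2] * taps[dx + 2] * nv24[yy * w + xx];
+        }
+      }
+      blurred[row * w + col] = acc / 256.0f;
+    }
+  }
+  // sharpened = Y + amount * (Y - blur(Y))
+  for (std::size_t i = 0; i < blurred.size(); ++i) {
+    const float luma = nv24[i];
+    nv24[i] = toU8(luma + amount * (luma - blurred[i]));
+  }
+}
+
+// Step 3: NV24 -> interleaved RGB.
+std::vector<uint8_t> nv24ToRgb(const std::vector<uint8_t>& nv24, int w, int h) {
+  const std::size_t n = static_cast<std::size_t>(w) * h;
+  const uint8_t* uv = nv24.data() + n;
+  std::vector<uint8_t> rgb(n * 3);
+  for (std::size_t i = 0; i < n; ++i) {
+    const float y = nv24[i], u = uv[2 * i] - 128.0f, v = uv[2 * i + 1] - 128.0f;
+    rgb[3 * i] = toU8(y + 1.402f * v);
+    rgb[3 * i + 1] = toU8(y - 0.344136f * u - 0.714136f * v);
+    rgb[3 * i + 2] = toU8(y + 1.772f * u);
+  }
+  return rgb;
+}
+
+// The full pipeline: RGB -> NV24 -> sharpen luma -> RGB.
+std::vector<uint8_t> sharpenRgb(const std::vector<uint8_t>& rgb, int w, int h, float amount) {
+  std::vector<uint8_t> nv24 = rgbToNv24(rgb, w, h);
+  unsharpLuma(nv24, w, h, amount);
+  return nv24ToRgb(nv24, w, h);
+}
+```
+
+The unsharp mask adds back `amount` times the difference between each pixel and its blurred neighborhood, so flat regions pass through unchanged while edges gain overshoot on both sides. Filtering only the luma plane sharpens brightness detail and leaves the chroma plane untouched.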
+
+The [VPI library](https://developer.nvidia.com/embedded/vpi) offers numerous algorithm examples that leverage the PVA as the backend.
+
+## Compiling the application
+
+Build the container image and launch the container:
+
+```
+$ ./dev_container build --img holohub:precompiled_pva --base_img nvcr.io/nvidia/clara-holoscan/holoscan:v2.1.0-dgpu --docker_file ./Dockerfile
+# Check which version of CUPVA is installed on your platform at /opt/nvidia
+$ ./dev_container launch --img holohub:precompiled_pva --docker_opts "-v /opt/nvidia/cupva-<version>:/opt/nvidia/cupva-<version> --device /dev/nvhost-ctrl-pva0:/dev/nvhost-ctrl-pva0 --device /dev/nvmap:/dev/nvmap --device /dev/dri/renderD129:/dev/dri/renderD129"
+```
+
+Inside the container, append the following directories to `LD_LIBRARY_PATH` (replace `cupva-2.5` with your installed CUPVA version if it differs):
+```
+# inside docker
+$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/aarch64-linux-gnu/tegra/:/opt/nvidia/cupva-2.5/lib/aarch64-linux-gnu/
+```
+
+Build the application inside docker:
+```
+$ ./run build precompiled_pva
+```
+
+## Running the application
+
+The application takes an endoscopy video stream as input, applies the unsharp mask filter, and shows the original
+and sharpened streams in two HoloViz windows.
+
+Before running the application, deploy the VPU application signature allow list on the target host (outside the container):
+```bash
+sudo cp <HOLOHUB_BUILD_DIR>/applications/precompiled_pva/cpp/pva_unsharp_mask/cupva_allowlist_pva_unsharp_mask /etc/pva/allow.d/cupva_allowlist_pva_unsharp_mask
+sudo pva_allow
+```
+
+Launch the same container you used to build the application:
+
+```
+$ ./dev_container launch --img holohub:precompiled_pva --docker_opts "-v /opt/nvidia/cupva-<version>:/opt/nvidia/cupva-<version> --device /dev/nvhost-ctrl-pva0:/dev/nvhost-ctrl-pva0 --device /dev/nvmap:/dev/nvmap --device /dev/dri/renderD129:/dev/dri/renderD129"
+
+# inside docker
+# don't forget the line below to export the environment variables
+$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/aarch64-linux-gnu/tegra/:/opt/nvidia/cupva-2.5/lib/aarch64-linux-gnu/
+$ ./run launch precompiled_pva
+```
+
+
+![PVA Example](pva_example.png)
diff --git a/applications/precompiled_pva/cpp/CMakeLists.txt b/applications/precompiled_pva/cpp/CMakeLists.txt
@@ -0,0 +1,67 @@
+#
+# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.
+#
+# NVIDIA Corporation and its licensors retain all intellectual property
+# and proprietary rights in and to this software, related documentation
+# and any modifications thereto.  Any use, reproduction, disclosure or
+# distribution of this software and related documentation without an express
+# license agreement from NVIDIA Corporation is strictly prohibited.
+#
+
+find_package(holoscan 1.0.3 REQUIRED CONFIG
+             PATHS "/opt/nvidia/holoscan" "/workspace/holoscan-sdk/install")
+
+add_executable(precompiled_pva
+  main.cpp
+)
+
+target_link_libraries(precompiled_pva
+  PRIVATE
+  holoscan::core
+  holoscan::ops::video_stream_replayer
+  holoscan::ops::video_stream_recorder
+  holoscan::ops::holoviz
+)
+
+add_library(pva_unsharp_mask STATIC IMPORTED)
+
+# Define the location in the build directory where libpva_unsharp_mask.a will be placed
+set(PVA_UNSHARP_MASK_LIB_DEST "${CMAKE_CURRENT_BINARY_DIR}/pva_unsharp_mask/libpva_unsharp_mask.a")
+# Define the destination path in the build directory for cupva_allowlist_pva_unsharp_mask
+set(CUPVA_ALLOWLIST_DEST "${CMAKE_CURRENT_BINARY_DIR}/pva_unsharp_mask/cupva_allowlist_pva_unsharp_mask")
+
+# Define the URL for downloading libpva_unsharp_mask.a if it is not already present in the build directory
+set(PVA_UNSHARP_MASK_URL "https://edge.urm.nvidia.com/artifactory/sw-holoscan-thirdparty-generic-local/pva/libpva_unsharp_mask.a")
+# Define the URL for downloading cupva_allowlist_pva_unsharp_mask
+set(CUPVA_ALLOWLIST_URL "https://edge.urm.nvidia.com/artifactory/sw-holoscan-thirdparty-generic-local/pva/cupva_allowlist_pva_unsharp_mask")
+
+# Define a custom target for preparing libpva_unsharp_mask.a and cupva_allowlist_pva_unsharp_mask
+add_custom_target(prepare_pva_dependencies
+  COMMAND ${CMAKE_COMMAND} -E cmake_echo_color --cyan "Preparing PVA dependencies..."
+  COMMAND ${CMAKE_COMMAND} -E make_directory "${CMAKE_CURRENT_BINARY_DIR}/pva_unsharp_mask"
+  COMMAND ${CMAKE_COMMAND} -E cmake_echo_color --green "Directory ensured at ${CMAKE_CURRENT_BINARY_DIR}/pva_unsharp_mask"
+  COMMAND ${CMAKE_COMMAND} -D PVA_UNSHARP_MASK_LIB_DEST="${PVA_UNSHARP_MASK_LIB_DEST}" -D PVA_UNSHARP_MASK_URL="${PVA_UNSHARP_MASK_URL}" -D CUPVA_ALLOWLIST_URL="${CUPVA_ALLOWLIST_URL}" -D CUPVA_ALLOWLIST_DEST="${CUPVA_ALLOWLIST_DEST}" -P "${CMAKE_CURRENT_LIST_DIR}/PreparePVADependencies.cmake"
+  COMMENT "Preparing libpva_unsharp_mask.a and cupva_allowlist_pva_unsharp_mask"
+)
+
+add_dependencies(pva_unsharp_mask prepare_pva_dependencies)
+
+# Update the IMPORTED_LOCATION to the new path in the build directory
+set_target_properties(pva_unsharp_mask PROPERTIES IMPORTED_LOCATION ${PVA_UNSHARP_MASK_LIB_DEST})
+
+# Adjust the version below to match the CUPVA version installed on your platform
+find_library(CUPVAHOST_LIB libcupva_host.so.2.5 PATHS /opt/nvidia/cupva-2.5/lib/aarch64-linux-gnu/ REQUIRED)
+
+target_link_libraries(precompiled_pva
+  PUBLIC
+  pva_unsharp_mask
+  ${CUPVAHOST_LIB}
+)
+
+# Copy the config to the binary directory
+add_custom_target(precompiled_pva_deps
+  COMMAND ${CMAKE_COMMAND} -E copy_if_different "${CMAKE_CURRENT_SOURCE_DIR}/main.yaml" ${CMAKE_CURRENT_BINARY_DIR}
+  DEPENDS "main.yaml"
+  BYPRODUCTS "main.yaml"
+)
+add_dependencies(precompiled_pva precompiled_pva_deps)
diff --git a/applications/precompiled_pva/cpp/PreparePVADependencies.cmake b/applications/precompiled_pva/cpp/PreparePVADependencies.cmake
@@ -0,0 +1,28 @@
+if(NOT EXISTS "${PVA_UNSHARP_MASK_LIB_DEST}")
+  # Download libpva_unsharp_mask.a using curl
+  message(STATUS "libpva_unsharp_mask.a not found in build directory. Downloading from ${PVA_UNSHARP_MASK_URL} using curl")
+  execute_process(COMMAND curl -L ${PVA_UNSHARP_MASK_URL} -o ${PVA_UNSHARP_MASK_LIB_DEST}
+                  RESULT_VARIABLE result
+                  OUTPUT_QUIET)
+  if(NOT result EQUAL "0")
+    message(FATAL_ERROR "Error downloading libpva_unsharp_mask.a using curl")
+  endif()
+  # Check if the downloaded file contains a "File not found" error message
+  file(READ ${PVA_UNSHARP_MASK_LIB_DEST} contents)
+  if(contents MATCHES "\"status\" : 404")
+    message(FATAL_ERROR "Downloaded file contains a 'File not found' error. Please check the URL and try again.")
+  endif()
+  # Download cupva_allowlist_pva_unsharp_mask using curl
+  message(STATUS "Downloading cupva_allowlist_pva_unsharp_mask from ${CUPVA_ALLOWLIST_URL} using curl")
+  execute_process(COMMAND curl -L ${CUPVA_ALLOWLIST_URL} -o ${CUPVA_ALLOWLIST_DEST}
+                  RESULT_VARIABLE result_allowlist
+                  OUTPUT_QUIET)
+  if(NOT result_allowlist EQUAL "0")
+    message(FATAL_ERROR "Error downloading cupva_allowlist_pva_unsharp_mask using curl")
+  endif()
+  # Check if the downloaded file contains a "File not found" error message
+  file(READ ${CUPVA_ALLOWLIST_DEST} contents_allowlist)
+  if(contents_allowlist MATCHES "\"status\" : 404")
+    message(FATAL_ERROR "Downloaded cupva_allowlist_pva_unsharp_mask contains a 'File not found' error. Please check the URL and try again.")
+  endif()
+endif()
diff --git a/applications/precompiled_pva/cpp/main.cpp b/applications/precompiled_pva/cpp/main.cpp
@@ -0,0 +1,146 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "gxf/std/tensor.hpp"
+#include "holoscan/holoscan.hpp"
+#include "pva_unsharp_mask/pva_unsharp_mask.hpp"
+
+#include <holoscan/operators/holoviz/holoviz.hpp>
+#include <holoscan/operators/video_stream_recorder/video_stream_recorder.hpp>
+#include <holoscan/operators/video_stream_replayer/video_stream_replayer.hpp>
+#include <holoscan/core/system/gpu_resource_monitor.hpp>
+
+#include <filesystem>
+#include <iostream>
+#include <stdexcept>
+#include <string>
+
+namespace holoscan::ops {
+class PreCompiledPVAExecutor : public Operator {
+ public:
+  HOLOSCAN_OPERATOR_FORWARD_ARGS(PreCompiledPVAExecutor);
+  PreCompiledPVAExecutor() = default;
+
+  void setup(OperatorSpec& spec) override {
+    spec.param(allocator_, "allocator", "Allocator", "Allocator to allocate output tensor.");
+    spec.input<gxf::Entity>("input");
+    spec.output<gxf::Entity>("output");
+  }
+  void compute(InputContext& op_input, OutputContext& op_output,
+               ExecutionContext& context) override {
+    auto maybe_input_message = op_input.receive<gxf::Entity>("input");
+    if (!maybe_input_message.has_value()) {
+      HOLOSCAN_LOG_ERROR("Failed to receive input message gxf::Entity");
+      return;
+    }
+    auto input_tensor = maybe_input_message.value().get<holoscan::Tensor>();
+    if (!input_tensor) {
+      HOLOSCAN_LOG_ERROR("Failed to receive holoscan::Tensor from input message gxf::Entity");
+      return;
+    }
+
+    // get handle to underlying nvidia::gxf::Allocator from std::shared_ptr<holoscan::Allocator>
+    auto allocator = nvidia::gxf::Handle<nvidia::gxf::Allocator>::Create(
+        fragment()->executor().context(), allocator_->gxf_cid());
+
+    // wrap the holoscan::Tensor's DLPack context in an nvidia::gxf::Tensor to use its shape APIs
+    nvidia::gxf::Tensor input_tensor_gxf{input_tensor->dl_ctx()};
+
+    auto out_message = CreateTensorMap(
+        context.context(),
+        allocator.value(),
+        {{"output",
+          nvidia::gxf::MemoryStorageType::kDevice,
+          input_tensor_gxf.shape(),
+          nvidia::gxf::PrimitiveType::kUnsigned8,
+          0,
+          nvidia::gxf::ComputeTrivialStrides(
+              input_tensor_gxf.shape(),
+              nvidia::gxf::PrimitiveTypeSize(nvidia::gxf::PrimitiveType::kUnsigned8))}},
+        false);
+
+    if (!out_message) { throw std::runtime_error("failed to create out_message"); }
+    const auto output_tensor = out_message.value().get<nvidia::gxf::Tensor>();
+    if (!output_tensor) { throw std::runtime_error("failed to create out_tensor"); }
+
+    uint8_t* input_tensor_data = static_cast<uint8_t*>(input_tensor->data());
+    uint8_t* output_tensor_data = static_cast<uint8_t*>(output_tensor.value()->pointer());
+    if (output_tensor_data == nullptr) {
+      throw std::runtime_error("Failed to allocate memory for the output image");
+    }
+
+    const int32_t imageWidth{static_cast<int32_t>(input_tensor->shape()[1])};
+    const int32_t imageHeight{static_cast<int32_t>(input_tensor->shape()[0])};
+    const int32_t inputLinePitch{static_cast<int32_t>(input_tensor->shape()[1])};
+    const int32_t outputLinePitch{static_cast<int32_t>(input_tensor->shape()[1])};
+
+    if (!pvaOperatorTask_.isInitialized()) {
+      pvaOperatorTask_.init(imageWidth, imageHeight, inputLinePitch, outputLinePitch);
+    }
+    pvaOperatorTask_.process(input_tensor_data, output_tensor_data);
+    auto result = gxf::Entity(std::move(out_message.value()));
+
+    op_output.emit(result, "output");
+  }
+
+ private:
+  Parameter<std::shared_ptr<Allocator>> allocator_;
+  PvaUnsharpMask pvaOperatorTask_;
+};
+}  // namespace holoscan::ops
+
+class App : public holoscan::Application {
+ public:
+  void compose() override {
+    using namespace holoscan;
+
+    uint32_t max_width{1920};
+    uint32_t max_height{1080};
+    int64_t source_block_size = max_width * max_height * 3;
+
+    std::shared_ptr<BlockMemoryPool> pva_allocator =
+        make_resource<BlockMemoryPool>("allocator", 1, source_block_size, 1);
+
+    auto precompiledpva = make_operator<ops::PreCompiledPVAExecutor>(
+        "precompiledpva", Arg("allocator") = pva_allocator);
+
+    auto source = make_operator<ops::VideoStreamReplayerOp>("replayer", from_config("replayer"));
+
+    auto recorder = make_operator<ops::VideoStreamRecorderOp>("recorder", from_config("recorder"));
+    auto visualizer1 = make_operator<ops::HolovizOp>(
+        "holoviz1", from_config("holoviz"), Arg("window_title") = std::string("Original Stream"));
+    auto visualizer2 =
+        make_operator<ops::HolovizOp>("holoviz2",
+                                      from_config("holoviz"),
+                                      Arg("window_title") = std::string("Image Sharpened Stream"));
+
+    add_flow(source, precompiledpva);
+    add_flow(source, visualizer1, {{"output", "receivers"}});
+    // add_flow(precompiledpva, recorder);
+    add_flow(precompiledpva, visualizer2, {{"output", "receivers"}});
+  }
+};
+
+int main(int argc, char** argv) {
+  auto app = holoscan::make_application<App>();
+
+  auto config_path = std::filesystem::canonical(argv[0]).parent_path();
+  config_path += "/main.yaml";
+  app->config(config_path);
+
+  app->run();
+
+  return 0;
+}
diff --git a/applications/precompiled_pva/cpp/main.yaml b/applications/precompiled_pva/cpp/main.yaml
@@ -0,0 +1,39 @@
+%YAML 1.2
+# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+---
+extensions:
+  - libgxf_std.so
+  - libgxf_cuda.so
+  - libgxf_multimedia.so
+  - libgxf_serialization.so
+
+replayer:
+  directory: /workspace/holohub/data/endoscopy
+  basename: "surgical_video"
+  frame_rate: 0   # as specified in timestamps
+  repeat: true    # default: false
+  realtime: true  # default: true
+  count: 0        # default: 0 (no frame count restriction)
+
+recorder:
+  directory: "/tmp"
+  basename: "surgical_video_sharpened"
+
+holoviz:
+  width: 854
+  height: 480
+
+
diff --git a/applications/precompiled_pva/cpp/metadata.json b/applications/precompiled_pva/cpp/metadata.json
diff --git a/applications/precompiled_pva/cpp/pva_unsharp_mask/pva_unsharp_mask.hpp b/applications/precompiled_pva/cpp/pva_unsharp_mask/pva_unsharp_mask.hpp
diff --git a/applications/precompiled_pva/pva_example.png b/applications/precompiled_pva/pva_example.png