@khalidbourr

The original convertDepth function had two issues:

- _mm_div_epi16 doesn't exist: there is no SSE/AVX integer division intrinsic
- The logic was incorrect: dividing float16-encoded bits as integers corrupts the values

This PR uses UE4's built-in FFloat16 class for portable float16→float32 conversion, then scales by 0.01 to convert cm→m.

With this change, the code:

- Compiles without errors
- Produces correct depth values
- No longer requires F16C CPU support or manual UE4 recompilation
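The approach described above can be sketched outside of UE4. The block below is a minimal, standalone illustration and not the PR's code: `halfToFloat` and `convertDepthPortable` are hypothetical names, and `halfToFloat` is a hand-written stand-in for what a portable float16→float32 conversion such as `FFloat16` has to do. It is not UE4's implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a portable float16 -> float32 conversion.
// Decodes the IEEE 754 binary16 bit pattern: 1 sign, 5 exponent, 10 mantissa bits.
float halfToFloat(std::uint16_t h)
{
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    const std::uint32_t exponent = (h >> 10) & 0x1Fu;
    const std::uint32_t mantissa = h & 0x3FFu;

    if (exponent == 0)
    {
        // Zero or subnormal: value is mantissa * 2^-24.
        const float value = std::ldexp(static_cast<float>(mantissa), -24);
        return sign ? -value : value;
    }

    std::uint32_t bits;
    if (exponent == 31)
    {
        bits = sign | 0x7F800000u | (mantissa << 13);  // infinity or NaN
    }
    else
    {
        // Rebias the exponent from 15 (float16) to 127 (float32).
        bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
    }

    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}

// Hypothetical helper: decode each float16 depth (cm), then scale to metres.
void convertDepthPortable(const std::uint16_t *in, float *out, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        out[i] = halfToFloat(in[i]) * 0.01f;
    }
}
```

For example, the float16 bit pattern `0x5640` encodes 100.0 (cm), which `convertDepthPortable` maps to roughly 1.0 (m). Dividing the raw bits as an integer instead (`0x5640 / 100`) yields a bit pattern with a zero exponent field, which decodes to a tiny subnormal rather than 1.0. This is the corruption the second point in the description refers to.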

…at16 conversion

- _mm_div_epi16 doesn't exist in any SSE/AVX instruction set
- Original code incorrectly performed integer division on float16 encoded bits
- Use UE4's FFloat16 for portable float16->float32 conversion
- Removes mandatory F16C CPU requirement
- Fixes depth unit conversion (cm to m) to work correctly
@Sanic
Contributor

Sanic commented Dec 8, 2025

Hi @khalidbourr, thanks for the PR!
I can't validate this functionality on my machine right now, but I was wondering whether this has a negative impact on compute time. Have you tested how fast it is in comparison? Will it slow down the overall image capturing compared to the old version?

@khalidbourr
Author

Dear @Sanic, I haven't evaluated it from that perspective yet. However, I did encounter build errors in Unreal Engine 4.27 on Linux, and this is the error message I received:

FStaticMeshLODResources &LODModel = StaticMesh->RenderData->LODResources[PaintingMeshLODIndex];
^
/home/vampiro/UnrealEngine-4.27/Engine/Source/Runtime/Engine/Classes/Engine/StaticMesh.h:519:2: note: 'RenderData' has been explicitly marked deprecated here
UE_DEPRECATED(4.27, "Please do not access this member directly; use UStaticMesh::GetRenderData() or UStaticMesh::SetRenderData().")
^
/home/vampiro/UnrealEngine-4.27/Engine/Source/Runtime/Core/Public/Misc/CoreMiscDefines.h:234:43: note: expanded from macro 'UE_DEPRECATED'
#define UE_DEPRECATED(Version, Message) [[deprecated(Message " Please update your code to the new API before upgrading to the next release, otherwise your project will no longer compile.")]]
^
In file included from /home/vampiro/Documents/Unreal Projects/AI4FOREST/Plugins/ROSIntegrationVision/Intermediate/Build/Linux/B4D820EA/UE4Editor/Development/ROSIntegrationVision/Module.ROSIntegrationVision.cpp:6:
/home/vampiro/Documents/Unreal Projects/AI4FOREST/Plugins/ROSIntegrationVision/Source/ROSIntegrationVision/Private/VisionComponent.cpp:754:4: error: use of undeclared identifier '_mm_div_epi16'; did you mean '_mm_min_epi16'?
_mm_div_epi16(
^~~~~~~~~~~~~
_mm_min_epi16
/home/vampiro/UnrealEngine-4.27/Engine/Extras/ThirdPartyNotUE/SDKs/HostLinux/Linux_x64/v19_clang-11.0.1-centos7/x86_64-unknown-linux-gnu/lib/clang/11.0.1/include/emmintrin.h:2412:1: note: '_mm_min_epi16' declared here
_mm_min_epi16(__m128i __a, __m128i __b)

@Sanic
Contributor

Sanic commented Dec 9, 2025

Alright.
Can you check in your log what the typical tick rate / delay is? There should be some debug output telling you how long generating and sending one sensor image tuple took.

@khalidbourr
Author

Once I do that I'll inform you.

@khalidbourr
Author

[Screenshots: 2025-12-11 02-22-43, 2025-12-11 02-22-08]

I tested the VisionComponent tick timing on Linux (Ubuntu, Intel i7 7th Gen, GTX 1050, UE4.27, ROS Melodic). Initially, F16C was not enabled: I confirmed this with objdump -d libUE4Editor-ROSIntegrationVision.so | grep -i vcvtph2ps, which showed no output. I then enabled F16C by adding -mf16c to LinuxToolChain.cs and modified convertDepth() to use the hardware intrinsic _mm_cvtph_ps() instead of the software FFloat16::GetFloat() loop. After rebuilding, objdump shows vcvtph2ps instructions, confirming that F16C code is being generated.

However, performance remains at ~1000 ms per tick (~1 FPS). Interestingly, the first ticks before ROS publishing starts are fast (~50 ms), but once publishing starts, the rate drops to ~1 FPS. I think the bottleneck may be in ReadPixels, ROS network I/O, or thread synchronization rather than in the depth conversion itself. I will check again; for now, my modification solves the build issue.
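One way to test the hypothesis that the depth conversion is not the bottleneck is to time the conversion in isolation on a synthetic frame. The block below is a minimal sketch under stated assumptions, not code from the plugin: `timeConversionMs`, its parameters, and the synthetic input are all hypothetical, and the converter is passed in as a callable so that either the FFloat16 loop or the F16C variant could be measured.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: calls convert(in, out, pixels) `repeats` times on a
// synthetic frame and returns the average wall-clock time per call in ms.
// Assumes pixels > 0 and repeats > 0.
template <typename Convert>
double timeConversionMs(Convert &&convert, std::size_t pixels, int repeats)
{
    // Every input pixel holds 0x5640, the float16 bit pattern for 100.0.
    std::vector<std::uint16_t> in(pixels, 0x5640);
    std::vector<float> out(pixels, 0.0f);

    // Reading a result into a volatile keeps the work from being optimized away.
    volatile float sink = 0.0f;

    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
    {
        convert(in.data(), out.data(), pixels);
        sink = sink + out[pixels / 2];
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

Calling it with a callable of the form `(const std::uint16_t *, float *, std::size_t)` and the real frame size, then comparing the result against the ~1000 ms tick, would indicate whether the conversion contributes meaningfully to the delay.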

@khalidbourr
Author

This is the convertDepth implementation I am currently using; it is not pushed yet:

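#include <immintrin.h>  // _mm_cvtph_ps requires F16C (build with -mf16c)

// Converts Width * Height float16 depth values (cm) to float32 (m).
// Assumes Width * Height is a multiple of 4 and that out is 16-byte aligned.
void UVisionComponent::convertDepth(const uint16_t *in, __m128 *out) const
{
    const size_t size = (Width * Height) / 4;
    const __m128 scale = _mm_set1_ps(0.01f);  // cm -> m

    for (size_t i = 0; i < size; ++i, in += 4, ++out)
    {
        // Load four half-floats (64 bits) into the low half of the register.
        const __m128i half4 = _mm_loadl_epi64(reinterpret_cast<const __m128i *>(in));
        // F16C hardware conversion: four half-floats to four floats in one instruction.
        const __m128 depth = _mm_cvtph_ps(half4);
        *out = _mm_mul_ps(depth, scale);
    }
}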
