Two segments intersection #334
Replies: 14 comments 83 replies
-
@alexisnaveros |
Beta Was this translation helpful? Give feedback.
-
Yes, it's about time this became a discussion. The rounding is still there; it just happens before the
EDIT: And it doesn't appear to be a compiler thing, as the assembly version rounding after the idiv is also faster. It's internal CPU scheduling stuff I assume. |
Beta Was this translation helpful? Give feedback.
-
Here's a modifiy _A that's fractionally faster. (It has only one division.)
EDIT: THIS ALGORITHM CAN DEGRADE BADLY WHEN THE 2 SEGMENTS APPROACH COLLINEAR, SO IT'S NOT RECOMMENDED. |
Beta Was this translation helpful? Give feedback.
-
Yup, that's good. Note that compilers do that on their own when compiling with -ffast-math, but I also like being explicit about that stuff. There's a new little problem with the int128 solutions (I'm using them somewhere); it can raise a SIGFPE signal when the int128_t multiplication product, after being divided by the int64_t, doesn't fit in a int64_t. And I think it's worth detecting when many high bits are all zeroes/ones, to branch and perform a double division, even on chips with fast idiv. When the branch is taken, it's just 10% slower than _A. I guess I'll post a PB7 soon. |
Beta Was this translation helpful? Give feedback.
-
Here's my testing code for these algorithms.
Edit:
|
Beta Was this translation helpful? Give feedback.
-
I appreciate your effort to make a robust algorithm for input data in the range within +/- 1 << 62. But does anyone really need this range? For me, the data within +/- 1 << 51 will be fine. And as I understand, this is also good for Angus. In this case, I think that one can use my initial function ('PB'), which is much faster than 'PB3'. For the input data in the range within +- 1<<30, one can write an even faster function. |
Beta Was this translation helpful? Give feedback.
-
Here's an update, (You can enable
With an input range of The threshold to use faster double-precision math is set up in a way that the maximum error is 0.125 (when edge lengths near 2^50) (to clarify what an error of 0.125 means, it means a true intersection point at 1000000000000007.4 might be rounded to 1000000000000008 instead of 1000000000000007) Benchmark, two short edges far from the origin:
Benchmark, two very long edges:
EDIT: I had written ranges based on INT_MAX, I meant INT64_MAX. |
Beta Was this translation helpful? Give feedback.
-
Edited: Nov 26
Known issues:
Code
|
Beta Was this translation helpful? Give feedback.
-
Okay, Sorry for the mess of Enabling Enabling the gizmos and goodies is about 25% slower. If they are all disabled, it's only 17% slower than inaccurate
Yeah, that needs a severe clean up. |
Beta Was this translation helpful? Give feedback.
-
Disregard Inspired by @sergey-239 's SM1, I switched to unsigned math, and that gives us a whole extra bit of numerical accuracy. For an input range of Performance may be 5% lower than
EDIT: Added mathOverflow_Mul64s() and optimized the worst case path (non-intersecting, huge alpha, tiny denom, overflow detection), it's now faster by some 10%. EDIT 2: Added inline assembly for proper after-shift rounding, making the extra rounding practically free. |
Beta Was this translation helpful? Give feedback.
-
I have cleaned up these experiments a bit for proper integration with code at work, here: I also added some quick MSVC support by testing through godbolt.org, and I could see that the optimization coming out of MSVC is very much horrible. It's not able to inline The API is pretty simple:
|
Beta Was this translation helpful? Give feedback.
-
This implementation should work on any platform that has a C++ compiler with 64-bit
|
Beta Was this translation helpful? Give feedback.
-
While I was trying to speedup SM3 and add all necessary checks for border cases I got that I need to start from scratch. So, I ended up with the following SM5 (I jumped over a version as I still plan to update the SM3 code for completeness) My attempt to speedup the division with 32-bit version of Instead, I borrowed an idea from @alexisnaveros to use FP math for products magnitudes that fits into 53 bits but for a larger number of suitable cases. The other thing I implemented is an estimation of intersection offsets from the base point (the l0p0) using FP math and then using division only for cases when this estimation differs from the true intersection point by more than one. This leaves approximately 30% cases for integer division. Also, I dropped the original intent to write the function completely in assembly language as It performs 30%-to-50% faster than mathLineLine_HitAnySafeRobust on my Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz notebook. However it is slower than
The thorough testing over 63^6 combinations is not performed yet, but the random cases I already passed during debugging gives difference with boost implementation not bigger than 1.41 where FP is used to calculate the intersection.
Code:
|
Beta Was this translation helpful? Give feedback.
-
Relevant to this discusssion, I've just added a new GetIntersectPoint benchmark test. |
Beta Was this translation helpful? Give feedback.
-
This thread created to continue the discussion of the best (most fast and accurate) two segments intersection implementation that started in #317
At some iteration @bahvalo in #317 (comment) proposed the algorithm that is derived from solving the two lines intersection equation (see https://en.wikipedia.org/wiki/Line%E2%80%93line_intersection for details).
To the moment with the @alexisnaveros's efforts and some ideas from myself we ended up with the following (please read the original thread to understand how did we get here):
Full inline assembly implementation:
Readers, please note, that this is not thoroughly tested yet and may contain bugs.
Beta Was this translation helpful? Give feedback.
All reactions