RFC: Signed Fixed-width Integer Support #3162
Comments
Are there considerations to support overflow detection?
I'm not aware of specific considerations here as of yet. The tinkering so far has focused on efficiency, avoiding signed integer-related UB we may inherit from C, and just general correctness. Does anyone know: is there any prior art in this space for the
The usual semantics for unsigned types are to wrap around, without overflow. (However, add with carry/subtract with borrow/extended multiplication are often supported.) The overflow semantics for signed types are quite different; they are well supported in LLVM and as builtins in major C compilers. So support is available but might require compiler support.
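To make the "builtins in major C compilers" point concrete, here is a minimal sketch using `__builtin_add_overflow`, which GCC and Clang both provide. The wrapper name `checked_add_i8` is made up for illustration and is not part of Lean's runtime.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: adds two 8-bit signed values.
   Returns true if the mathematical sum does not fit in int8_t.
   In either case *out receives the result wrapped to 8 bits. */
static bool checked_add_i8(int8_t a, int8_t b, int8_t *out) {
    /* GCC/Clang builtin: computes the sum as if with infinite precision,
       stores it truncated to the type of *out, and reports overflow. */
    return __builtin_add_overflow(a, b, out);
}
```

A caller can then decide whether to ignore the flag (wraparound) or act on it (panic, error value), which is the policy question raised above.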
The proposed
For the sign-sensitive operations, it specifically checks for and avoids the problematic cases which would trigger signed integer overflow (e.g., div and mod explicitly check the few cases where that can happen, so you get predictable albeit perhaps initially odd results like
Perhaps I misunderstood what you meant by "support overflow detection". The proposed design explicitly avoids signed integer overflow in the C sense and values will wrap around, but it doesn't attempt to dynamically detect specific cases at runtime in order to warn/error for the user.
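A small sketch of what "values will wrap around" means when a signed 8-bit value is carried in a `uint8_t`. The helper names (`i8_of_bits`, `bits_of_i8`, `wrapping_add_i8`) are hypothetical and only illustrate the idea; they are not taken from the draft.

```c
#include <stdint.h>

/* Reinterpret a uint8_t bit pattern as a two's-complement signed value.
   Written arithmetically so it does not depend on how the compiler
   narrows an out-of-range value to int8_t. */
static int8_t i8_of_bits(uint8_t u) {
    return (int8_t)(u < 128 ? (int)u : (int)u - 256);
}

/* Signed-to-unsigned conversion is defined modulo 256 by the C standard. */
static uint8_t bits_of_i8(int8_t x) {
    return (uint8_t)x;
}

/* Sign-insensitive addition: add the carriers as unsigned values and
   truncate to 8 bits. No signed overflow can occur here. */
static uint8_t wrapping_add_i8(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}
```

With these helpers, adding the carriers for 127 and 1 yields the carrier for -128, with no undefined behavior along the way.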
The runtime detection (with catch support?) is what I'm asking about. Just to be clear: I'm just asking whether this is currently considered, I don't think this is important for basic signed integer support.
This seems closely related to the Things like overflow detection and such might make more sense to implement on
Yes, that code was helpful to reference =) I just added a
I am increasingly curious if Lean could reasonably help a user avoid/detect overflow while using these sorts of "primitive"
C/C++ make signed integer overflow undefined but efficient tools are generally available. LLVM support: https://llvm.org/docs/LangRef.html#llvm-sadd-with-overflow-intrinsics A generic operation would have to trap overflow by checking the inputs.
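For the "checking the inputs" route, a portable sketch without compiler intrinsics might look like the following. It uses 32-bit operands, since 8-bit operands are promoted to `int` in C and cannot overflow during the addition itself. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when a + b would fall outside
   [INT32_MIN, INT32_MAX]. Only comparisons and subtractions that cannot
   themselves overflow are performed, so the check is free of UB. */
static bool add_would_overflow_i32(int32_t a, int32_t b) {
    if (b > 0) return a > INT32_MAX - b;  /* sum would exceed the maximum */
    if (b < 0) return a < INT32_MIN - b;  /* sum would drop below the minimum */
    return false;                         /* adding zero never overflows */
}
```

A generic operation could run this check first and trap (or return an error) before performing the actual addition.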
This is now implemented through #5790, #5885 and #5961. Additionally To the best of our knowledge we have addressed all UB concerns by both mimicking semantics of Additional things such as proper proof APIs for both
Awesome!! This is very useful, thank you 😃
Proposal
Lean currently supports unsigned fixed-width integers (`UInt8`, `UInt16`, `UInt32`, `UInt64`).
This proposal is regarding similar support for signed fixed-width integers that provide similar efficiency benefits to the `UIntN` types (e.g., native C representations).
Design Points for discussion
Semantics for overflow and avoiding UB.
`UIntN` types.
i. simply behave more like `Fin` and `BitVec` in Lean4 today and provide predictable wraparound behavior on overflow (there is a slight bias for these semantics in this RFC at the moment), or
ii. choose to detect and panic on such occurrences (perhaps similar to Rust in debug mode?)
C representation
`intn_t` types could make for the least surprising set of C definitions/FFI APIs
`uintn_t` types could also be made as they are already used by the compiler for some types
Where would `IntN` definitions ideally live in the short and long term?
`IntN` types or similar?
Approaches
Approach 1: `UIntN` Subtypes
A `UIntN` is used to represent each `IntN` type, e.g. here is `Int8`:
Efficient Representation
This approach already provides an efficient representation in compiled code with no compiler extensions by being defined directly in terms of `UInt8`. I.e., all compiled definitions operating on these `Int8`s will use `uint8_t` directly. Note that this does mean that FFI code will need to manually track which generated C definitions are using `uint8_t` for `Int8` and which are using `uint8_t` for some other type (e.g., `UInt8`, `Bool`, etc).
Operators
Arithmetic operations which are "sign-insensitive" can be defined directly in terms of their corresponding `UInt8` operation, e.g. `Int8.add`:
The remaining few sign-sensitive operations can be given the appropriate implementation in Lean (and backed with a more efficient C representation where needed), e.g., `Int8.div`:
and the corresponding C definition lean_int8_div:
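As a rough illustration of what a C definition along these lines could look like under Approach 1, here is a hypothetical sketch. It is not the draft's actual `lean_int8_div`; the name `sketch_int8_div` is invented, and two conventions are assumptions: division by zero returns 0, and division truncates toward zero as in C.

```c
#include <stdint.h>

/* Hypothetical sketch of signed 8-bit division on uint8_t carriers.
   Both operands and the result are two's-complement bit patterns. */
static uint8_t sketch_int8_div(uint8_t a, uint8_t b) {
    /* Decode the bit patterns into signed values held in a wider int. */
    int lhs = a < 128 ? (int)a : (int)a - 256;
    int rhs = b < 128 ? (int)b : (int)b - 256;

    /* Assumed convention: x / 0 = 0, so no division by zero is executed. */
    if (rhs == 0) return 0;

    /* The only quotient that does not fit in 8 bits is -128 / -1 = 128.
       Computing in int avoids overflow, and narrowing to uint8_t is
       defined modulo 256, so the result wraps back to the pattern of -128. */
    return (uint8_t)(lhs / rhs);
}
```

For example, `sketch_int8_div(0x80, 0xFF)` (that is, -128 / -1) returns `0x80`, the wraparound result rather than undefined behavior.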
Here's an initial draft for `Int8` for feedback here if we want to use this approach:
Potential pros
`Int8` ops simply use the same `UInt8` op
`UInt8` already provides many appropriate C definitions that are re-used
Potential cons
`IntN` types will all be in terms of `uintn_t` types and not `intn_t` types. The casts between a `uintn_t` and `intn_t` type are free and semantically correct. The bookkeeping to track which `uintn_t` means what in the FFI/C layer, however, could be painful and a common Lean4/FFI footgun.
Approach 2: `Fin` Subtype
A `Fin` is used to represent each `IntN` type, e.g. here is `Int8`:
Efficient Representation
This approach would require a compiler extension like what exists for `UIntN` types to use the expected `intn_t` C type. This would obviously mean more up front work, but the resulting compiled C/FFI code would be in terms of the expected signed integer type, which seems likely to be the least surprising choice for a user to encounter.
Operators
Arithmetic operations which are "sign-insensitive" can be defined directly in terms of their corresponding `Fin` operation in Lean and given the expected efficient representations in C, e.g. `Int8.add`:
The remaining few sign-sensitive operations can be given a more complex definition when appropriate in Lean (along with a more efficient C implementation where needed), e.g., `Int8.div`:
and the corresponding C definition lean_int8_div:
Here's an initial draft for `Int8` using `Fin` for feedback here if we want to use this approach:
Potential pros
`Fin` may be "more direct" or "more natural" than defining them in terms of unsigned fixed-width integers (debatable of course!)
Potential cons
`UIntN`
Approach 3: `Int` Subtypes
Each `IntN` type could also be defined as a subtype of `Int` similar to the following:
Zulip discussion steered me away from this and towards the aforementioned `UIntN` approach, so less experimenting has been done with this approach.
Potential pros
Potential cons
`UIntN` approach.
Community Feedback
The lack of signed fixed-width integer types has come up in discussion with Lean FRO members and has been discussed some in the lean4 dev community. It was also noted as a pain point in the community when working with external code during Lean Together 2024.
Thank you to those who have helped in discussing this topic so far.
Impact
Add 👍 to issues you consider important. If others benefit from the changes in this proposal being added, please ask them to add 👍 to it.
Related Work
The following also involve supporting signed fixed-width integer types:
`Nat`
`Int64` impl using the same proposed design of wrapping an unsigned int (`USize` rather than `UIntN`). The proposed `Int8` impl is similar with a few slight differences (e.g., fewer bitwise operations, leverages a few C definitions for efficiency, etc.)
`Std.Data.BitVec`