Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trust boundary specification #124

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Praetonus
Copy link
Member


### In the standard library

Unsafe maths functions on numeric primitives will be marked `unsafe-1`. There is nothing requiring to be marked `unsafe-2` or `unsafe-3` currently so these annotations will stay unused for now.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing requiring to be marked unsafe-2 or unsafe-3 currently

This doesn't seem quite right. We have standard library types that do FFI, and standard library types that do pointer math.

Am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to only mark functions that expose the unsafety to the user (i.e. that have a narrow contract). Array does raw pointer operations, but these operations are fully controlled by the preliminary checks that Array enforces and as a result can't cause problems (i.e. Array has a wide contract).

This is a generalisation of the checks for FFI. You can only use the FFI from a safe package, but this isn't transitive: a package being safe means that you trust that package to use the FFI correctly and to expose a safe interface to other packages. The same thing would apply to any unsafe operation.

Copy link
Member

@jemc jemc Apr 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this answer does more to answer my other comment than this one. Or maybe I just don't get it.

If we ignore pointer math for the moment and focus on FFI, can you explain why you don't think the FFI-using functions in the net package need to be marked with unsafe-3?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I can't cause my program to crash or to have some other kind of UB by using these functions since the inputs to the FFI calls are fully controlled. In addition, since you need to have access to AmbientAuth (or something derived from it) in order to use the stuff in net, there is no need for another layer of protection through --safe.

Copy link
Member

@jemc jemc Apr 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can the compiler verify that those inputs are controlled properly as you say?

It is my understanding of this RFC that the compiler is going to enforce that all functions that call FFI include the unsafe-3 annotation. Is that wrong? It sounds now like you're saying that adding the annotation is optional, based on whether you as the code author think you created a safe wrapper for the function.

If that's how this is intended to work, I think this is a step backward from how the --safe flag works now, since it enforces that no packages use FFI other than the ones you whitelist. If the new trust boundary works based on these annotations, and the annotations are optional, then you're not able to enforce anything, and an uncareful or unscrupulous package author can sneak an unsafe Pony API through a --safe-{x}-protected compilation simply by leaving out the annotations...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler will enforce that

  • Every package that calls unsafe-1 functions is marked --safe-1 or higher
  • Every package that calls unsafe-2 functions is marked --safe-2 or higher
  • Every package that calls unsafe-3 functions or uses the FFI is marked --safe-3 or higher

The compiler won't be able to enforce that the exposed interface is safe, but that doesn't change anything for the FFI. In the current state of the compiler and language, I can have a function that calls sqrt without any input validation, and even if I mark the package it is declared in as trusted, it won't prevent untrusted packages from using it with negative numbers and causing UB. This RFC will not change anything on that front.

Copy link
Member

@jemc jemc Apr 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks for spelling it out further. I understand what's being proposed now.

The part I didn't understand is the point that FFI calls would be treated as unsafe-3 function calls, without needing to be annotated, and that the annotation enforces a compiler requirement on the call site, rather than fulfilling a compiler requirement within the body.

After reading your comment here, and @plietar's comment below, I understand better.


### In the language

New syntax is needed to mark unsafe operations. We propose to use new annotations for this purpose: `unsafe-1`, `unsafe-2` and `unsafe-3`. These annotations map directly to the various levels of the trust boundary. These annotations are only allowed on function definitions, for example `fun \unsafe-1\ foo()`.
Copy link
Member

@jemc jemc Apr 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would these annotations apply transitively, like the question mark used for partial functions?

That is, if foo is marked with unsafe-1, am I required to put the same mark on any function that calls foo?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, these annotations aren't transitive for the reasons I explained in my reply to your other comment.

@plietar
Copy link

plietar commented Apr 23, 2018

I don't think there's much point treating (potential) undefined behaviour and FFI as separate unsafety levels. You can achieve one with the other. I'm also not sure guarding against unsafe-1 is very useful. As I understand it, the most it can do is cause a logic error.


If I understand the RFC correctly, I can call any function marked as unsafe-3 from my package, as long as it is built with the --safe-3. While this nice to prevent packages from unexpectedly using unsafe functions, it doesn't allow for easy auditing of packages.

I find Rust's model very nice. Unsafe operations (such as dereferencing a pointer, or calling an unsafe function) requires an unsafe block around the operation. This way I can easily audit a package and identify all places that call unsafe methods.

A common complaint about Rust's unsafe mechanism is the use of the keyword unsafe in both context, even though they have opposite meanings (as pointing out in the RFC), which leads to confusion.

I would suggest we use something like:

var p : Pointer[Foo ref] = ...
p() // error: Pointer's apply function is marked as unsafe
safe // We explicitly assert that what we're doing here is safe
  let x : Foo ref = p() // dereferences p
end

struct Pointer[A]
  fun \unsafe\ apply(): this->A => compile_intrinsic

In short, marking a function as unsafe means it has preconditions which are not encoded in the type system and cannot be verified by the compiler. A safe block declares that I have manually verified these preconditions, and order to use a safe block I must pass the --safe option for this package to the compiler.

@Praetonus
Copy link
Member Author

Praetonus commented Apr 24, 2018

@plietar Regarding the different safety levels for potential undefined behaviour and FFI, I'll give a concrete example of how this could be useful.

Suppose you have access to raw pointer operations in Pony with unsafe-2, and you make a package implementing some data structure that uses those pointer operations for efficiency. You'll need --safe-2 to build this package. Besides that, you have the net package, which uses the FFI to access the network, and thus requires --safe-3 to build.

In the old Pony playground, which AFAIK wasn't sandboxed in a container/VM, all builds were made with --safe in order to prevent access to files, the network, or other IO via the FFI. Suppose that we implemented this RFC for the old playground. It would be possible to build with --safe-2 and allow the data structure package to build, while still rejecting net. This would be fine because raw pointer operations can only cause your program to behave badly, they won't cause uncontrolled effects on the system.

If we were to have only one safety level for both undefined behaviour and the FFI we'd lose that capability.


Regarding unsafe-1, I think it is necessary since undefined results can provoke abnormal program termination. The most relevant example here is unsafe integer division by zero. Depending on the platform, it can raise a SIGFPE (which will terminate the program), terminate the program without a SIGFPE, return a perfectly defined value, throw an SEH exception, invoke a user-defined function (which can do any of the previous things), and probably some other things on exotic platforms.

It also needs to be separated from unsafe-2 because undefined results can't cause memory corruption (unless some user-defined function for handling division by zero causes memory corruption, but I think we can ignore that extreme edge case since FFI access (or linking the Pony program with some C code) would be needed to setup the function anyway).


I agree that it would be nice to have a way to audit packages for unsafe operations, but I'm not sure that a safe block is the best solution since we'd have to reserve that as a keyword. As an alternative, what do you think of requiring a call-site annotation on unsafe function calls, similar to partial functions? Your example would look like this:

var p : Pointer[Foo ref] = ...
p()
let x : Foo ref = p() \unsafe-2\

struct Pointer[A]
  fun \unsafe-2\ apply(): this->A => compile_intrinsic

Base automatically changed from master to main February 8, 2021 22:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants