-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Standardize compressed blob prefixes #135
base: main
Are you sure you want to change the base?
Standardize compressed blob prefixes #135
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I would follow the typical RFC format. The spec is currently outdated and not maintained by anyone.
That's some bad news. I mean, if I want to create my implementation of Polkadot from scratch, where do I go? |
|
||
The current approach to compressing binary blobs is defined in [subsection 2.6.2](https://spec.polkadot.network/chap-state#sect-loading-runtime-code) of Polkadot spec. It involves using `zstd` compression, and the resulting compressed blob is prefixed with a unique 64-bit magic value specified in that subsection. Said subsection only defines the means of compressing Wasm code blobs; no other compression procedure is currently defined by the spec. | ||
|
||
However, in practice, the current de facto protocol uses the said procedure to compress not only Wasm code blobs but also proofs-of-validity. Such a usage is not stipulated by the spec. Currently, having solely a compressed blob, it's impossible to tell what's inside it without decompression, a Wasm blob, or a PoV. That doesn't cause any problems in the current de facto protocol, as Wasm blobs and PoV blobs take completely different execution paths, and it's impossible to mix them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The argumentation "it's impossible to mix them" is not really that strong :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I'm just describing the current situation here, and that's something that should be fixed by implementing this RFC as well. Surely, we currently can mix them in the code, we just don't want to 🙂
@bkchr, returning to that "let's follow the typical RFC format" part and explaining why I'd like to keep it that formal. According to the Polkadot Whitepaper, "A system of RFCs, not unlike the Python Enhancement Proposals, will allow a means of publicly collaborating over protocol changes and upgrades". Thus, the RFCs are proposals to change the protocol. But what protocol? We must understand what we're changing. I understand that currently, we're more or less in limbo between the Polkadot Whitepaper and the JAM Gray paper, and that would be acceptable if we found ourselves in this situation, say, for a couple of months. But JAM is not arriving in a couple of months, and if the spec is not currently maintained, that means we're flying in the Bitcoin mode with an implementation-defined spec. What protocol is changed by any given RFC, then? That one which had been implemented by the time when the RFC was first published? Or will we count from the latest stable release? Again, if a team wants to start its very own implementation of Polkadot tomorrow, where does it start? What should they implement? Should they just read our Rust code and do the same? They may not even know Rust, after all. Or should they take the latest outdated spec and apply all the RFCs over it? Even the infamous EU bureaucracy doesn't treat you like that, you can always get a full text of legislation that is in force right now instead of getting the very first one from 1990s and a hundred amendments to compile everything yourself. That's why I believe the spec is super important. The spec should be a representation of the current state, and RFCs should propose changes to the spec, not the code. Yes, we're not living in a perfect world, and it may become more or less stale sometimes, but if we drop its support altogether, a lot of things lose sense, including alternative implementations. Alternatively, if we wanted to follow the IETF's RFC model, we could have a set of "root RFCs" defining the protocol and other RFCs amending the root ones, but I believe for a protocol as complex as Polkadot it's a counterproductive approach as the set of root RFCs would just become the spec on its own. The IETF's TCP protocol had 10 RFCs on its way before it was rewritten from scratch to include all the amendments, and that happened over a time span of 40+ years. We in Polkadot, on the other hand, have already had 60+ RFCs since Fellowship was formed, roughly in a year and a half. The IETF's approach wouldn't work here. I'll go ahead and tag @gavofyork here 'cause I believe it's a really bad situation. If the spec is not maintained, we should change it today or tomorrow at the latest. |
There are two things, the spec not being maintained and then there are like the standard RFC format. While I'm also not happy that the spec is not maintained and I would like to change this, it is complicated right now and I don't think this will change until we have JAM. Better to accept the reality instead of denying it. |
Polkadot Fellowship does not accept reality, it forms it! 😁 Fellowship is a resourceful organization and could acquire both internal and external resources to get that job done. What happens now is just a lack of understanding of the importance of the task. Having an up-to-date spec is a basic need, not a luxury. Otherwise, we don't hold our guarantees. One of them is "Polkadot is intended to be a free and open project, the protocol specification being under a Creative Commons license". And under that, we encourage people to create alternative implementations to incentivize decentralization. But if anyone wanted to do that right now, he would be fast to learn that the only way to do it is in "just-ask-Basti-how-it-works" mode. I heard you about this very RFC, and I'll fix it to be more typical, but I'm also willing to keep this discussion going and bring it up at the next in-person Fellowship meeting as I believe it's a huge design flaw in the whole process. RFCs should propose changes to the spec, and implemented RFCs MUST be reflected in the spec. That MUST just be a requirement for the implementation to pass the review. Just imagine yourself implementing the initial Polkadot with only outdated and unsupported Wasm specs in your hands. That would be a hell of an implementation. Do we have political parties in the Fellowship yet? I'm ready to establish one. "Witnesses of the Spec" or something. |
You are free to take this on and make it happen :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good.
Just need to be even more conservative when it comes to the time lines as this is not only affecting the relay chain or the collators, but every node.
I don't think a single person can do it because I don't believe there exists a single person who knows all the aspects of the current implementation in depth. I think I could design a pipeline that streamlines the spec updates. But first we should communicate and agree on the procedure anyway. Sounds like one more RFC by itself. |
I did not meant that you should keep the spec updated on your own. Someone needs to setup the process, find the people to update the spec, find out what is outdated etc. |
|
||
Currently, the only requirement for a compressed blob prefix is not to coincide with Wasm magic bytes (as stated in code comments). Changes proposed here increase prefix collision risk, given that arbitrary data may be compressed in the future. However, it must be taken into account that: | ||
* Collision probability per arbitrary blob is ≈5,4×10⁻²⁰ for a single random 64-bit prefix (current situation) and ≈2,17×10⁻¹⁹ for the proposed set of four 64-bit prefixes (proposed situation), which is still low enough; | ||
* The current de facto protocol uses the current compression implementation to compress PoVs, which are arbitrary binary data, so the collision risk already exists and is not introduced by changes proposed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to de-risk further, by deprecating blank/unprefixed payloads entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point 🤔 But I'm afraid that would involve changing the PoV format in the collator protocol which is one more change that could take years to propagate, and we'd need to support both versions all the way. Optimistically, it would take more time to stabilize than for JAM to arrive 🙂
Rendered